Benefits of Open Data

Learning Objectives

Communicate the benefits and challenges of Open Data and its effects on science

Introduction

In this lesson, we’ll discuss the benefits of Open Data, in particular its direct role in advancing Open Science. We will also discuss how Open Data can shape the scientific response to global emergencies and how it facilitates multidisciplinary work.

Open Data for the greater good

As we mentioned earlier, data plays a significant role in our day-to-day lives, and Open Data in particular has played a key role. If you pause to think about it, you may realize that Open Data is not only common in our society, but that you have probably benefited from it and used it yourself.

Here are some notable examples of Open Data that have positively impacted society at large:

Many countries and territories provide open access to a variety of socioeconomic information about the population, communities, and businesses in their jurisdiction. These data are often called census survey data and may include aggregated statistics on the gender, race, ethnicity, education, income, and health of a community. They are often used to understand the composition of a local neighborhood and are critical for informing decisions on resource allocation that sustain the community’s quality of life.

The changing climate poses a significant risk to our daily lives and is intensifying droughts, increasing flooding, and fueling devastating fires worldwide. Open data is therefore critical for providing life-saving information that helps us adapt to the changing climate and assess the climate risks of the places where we live. Government agencies (e.g., the National Oceanic and Atmospheric Administration in the U.S., the UK Met Office, and the European Centre for Medium-Range Weather Forecasts) have provided public access to long-term weather and climate information for decades. More recently, organizations have begun developing value-added open data products to advise society on the risks of a changing climate. One example is the flood and fire risk information for the United States developed by the non-profit First Street Foundation.

Open Data for better Open Science

Scientific discovery and innovation stand to gain a tremendous amount from Open Data. This impact stems directly from the many different inputs and methods that open access brings to investigating a problem. Specifically, three core components of Open Data drive this diverse scientific innovation and provide enormous societal and scientific benefits:

Validation:

Open Data that is easily accessible to other researchers allows for scrutiny, which helps uncover mistakes more quickly and builds confidence that the research was conducted using sound and ethical principles and methods. Evidence-based progress gives confidence in scientific results and ensures that the insights drawn from them can reliably inform future research.

Data that has been reviewed, maintained and scrutinized by many, as well as informed by diverse consultation, drives robust and thorough scientific pursuits.

This validation process is a key component of reproducibility, which is essential for building on prior research. Reproducibility is a cornerstone of scientific progress: it provides the baseline for checking results and for expanding on them with new experiments and questions.

Transparency:

Transparency builds on validation and scrutiny by facilitating them. It allows for early engagement with the data and helps ensure that the data were collected using sound and ethical principles, which are elaborated on in lessons 3 (Responsible Open Data) and 4 (The CARE and FAIR Principles).

This transparency allows for early intervention if unexpected harms arise. This is where the idea of multiple perspectives becomes important again.

Collaboration:

Open datasets are made available to all (see the Inclusivity section in lesson 1), which means new, robust insights are gathered at a faster pace: mistakes can be caught more easily, expensive data collection doesn’t need to be repeated, and researchers can build on the work of their peers. For example, scientists recently produced the first image of a black hole in our galaxy. This achievement was only possible through open collaboration and the sharing of telescope data by observatories located in different parts of the world [1].

The data isn’t limited to those within a specific field, nor is it exclusive to those with institutional access. Importantly, this means the data can be shared with researchers outside traditional academia, such as nurses, social workers, agronomists, journalists, and other communities, allowing insights to be drawn from varied perspectives.

The scope of research can easily be expanded to derive more holistic insights. For example, the Coupled Model Intercomparison Project (CMIP), which started in 1995, paved the way for understanding how climate change affects our daily lives, supporting investigations of factors such as malaria distribution in Africa, infrastructure and urban design, and the implications of climate change for the risk of epilepsy [2, 3].

Collating similar datasets and performing meta-analyses on them can provide a substantially stronger signal than any single dataset could on its own. This also facilitates convergence across scientific disciplines, increasing the value of the research.
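To make the signal argument in the paragraph above more concrete, here is a minimal sketch of one common pooling method, a fixed-effect inverse-variance meta-analysis, applied to three hypothetical datasets. The function name `fixed_effect_pool` and all of the numbers are illustrative assumptions, not taken from any study cited in this lesson. The fixed-effect model also assumes every dataset measures the same underlying effect; real meta-analyses often use random-effects models instead.

```python
import math


def fixed_effect_pool(estimates, std_errors):
    """Pool study estimates with inverse-variance (fixed-effect) weights.

    Hypothetical helper for illustration: each estimate is weighted by
    1 / se**2, so more precise datasets count for more.
    """
    weights = [1.0 / se**2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_weight
    # The pooled variance is 1 / (sum of weights), so the pooled standard
    # error is smaller than the standard error of any single dataset.
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical effect estimates and standard errors from three open datasets.
estimates = [0.30, 0.10, 0.25]
std_errors = [0.20, 0.15, 0.25]

pooled, pooled_se = fixed_effect_pool(estimates, std_errors)

for est, se in zip(estimates, std_errors):
    # z = estimate / standard error: how clearly the effect stands out from zero.
    print(f"single dataset: estimate = {est:.2f}, z = {est / se:.2f}")
print(f"pooled: estimate = {pooled:.3f}, se = {pooled_se:.3f}, z = {pooled / pooled_se:.2f}")
```

Because each weight is the inverse of a dataset’s variance, the pooled standard error is always smaller than that of any individual input. With these illustrative numbers, the pooled estimate stands out from zero more clearly than any single dataset does, although pooling cannot rescue datasets that genuinely disagree.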

Open Data to support policy change

Open data can lead to policy change that directly impacts the lives of communities, such as those likely to be the first to suffer from gradual changes in the Arctic. One study drew on OpenStreetMap data [4] to help map projected changes in the Arctic. These maps in turn emphasized the need for adaptation-based policies at community and regional levels, so that communities are not left unprepared if climate change causes a sudden and dramatic worsening of conditions.

Open Data in the face of global emergencies

The COVID-19 pandemic demonstrated to the world, in real time, how researchers collectively sharing their data (such as coronavirus genome data [5]) can lead to an unprecedented number of discoveries in a relatively short time. This directly accelerated vaccine development efforts and supported the timely control of COVID-19 infections [6]. These insights will continue to pay off as this research spurs future developments.

Data sharing has many benefits and can aid access to knowledge. However, it is also important to bear in mind where the data came from, who should have a say in its interpretation and use, and how it can be shared responsibly; more on that in lessons 3 and 4.

Open Data and public engagement (citizen science)

A citizen scientist is a member of the public or amateur scientist who collaborates with professional researchers, helping to gather data on a broader spatial and temporal scale than the researchers could achieve on their own [7, 8]. This sharing of responsibility helps members of the public engage in scientific pursuits that ultimately benefit them, and allows research to be conducted on a grander scale than would be possible with professional researchers alone. Citizen science is gaining popularity and is increasingly recognized as a valuable contribution to scientific advancement [9].

For example, volunteer citizen scientists in Beirut were recruited from 50 villages to help test water quality [10]. The volunteers were trained to conduct the tests, and the data they collected informed scientific research. In turn, the citizen scientists learned to better manage their water resources and were able to improve local conditions, creating a mutually beneficial interaction.

Open Data and decolonisation of knowledge

Free distribution of knowledge gives rise to increased participation in science. Open Data is central to fostering science that is inclusive and diverse, with direct and relevant benefits to impacted individuals and communities. This inclusivity is particularly important in the mission toward the decolonisation of knowledge [11].

In a world where knowledge can be a commodity, traded in the currency of published papers and hoarded datasets, exclusion from research can hold back a community’s progress in a knowledge-based economy.

Open Data, together with its positive side effect of decolonising knowledge, promotes and benefits from diverse perspectives through the purposeful inclusion of researchers from Africa, Latin America, and other underrepresented low- and middle-income countries. This inclusion dramatically changes who has access to work with and reuse data.

It can also become a powerful tool in the fight for visibility and credit. By fostering a global research culture of transparency and validation, in which the work of underrepresented groups is celebrated and compensated (for example, through due credit, or through much-needed vaccines in exchange for world-class genome sequencing in Africa), we can create a sustainable model that ensures underrepresented countries can keep contributing to global efforts such as the fight against infectious disease. It also gives marginalized groups (women, underrepresented communities, Indigenous scholars, non-Anglophone scholars, and scholars from less-advantaged countries) a voice in how the global and nuanced narrative of science develops. This broad participation and inclusion shows respect to the people and communities involved and helps raise the profile of the research.

That said, Open Data has been shown to further marginalize or exploit small-scale and community-driven initiatives, as in the case of African researchers who received neither due credit nor compensation for their genome sequencing during the COVID-19 pandemic [12]. This is explored further in the next lesson, where we introduce ways of mitigating the harms that can result from careless and irresponsible data sharing.

Summary

Open Data that is purposefully inclusive and open to scrutiny benefits scientific innovation by enabling a more diverse and robust scientific process that draws on multiple perspectives. It also allows mistaken insights to be identified early and unforeseen harms to impacted communities to be addressed promptly.

Open Data allows non-traditional researchers to contribute to scientific development and bring their unique insights to the table. Even with these benefits, we should remember that Open Data requires careful consideration of the possible downsides of making data open without due credit to, and consultation with, potentially vulnerable and/or marginalized communities. The next lesson discusses important considerations for the responsible collection, management, and use of open data by all stakeholders.

Assessment

Can you think of any examples where open data might help you answer a question, or answer a question that affects your community?

References

[1] https://eventhorizontelescope.org/
[2] https://oceanrep.geomar.de/id/eprint/12875/1/CMIP.pdf
[3] https://doi.org/10.1002/epi4.12359
[4] https://www.openstreetmap.org/
[5] https://www.nature.com/articles/d41586-021-00305-7
[6] https://www.nature.com/articles/d41586-020-01246-3
[7] https://www.oed.com/view/Entry/33513
[8] https://en.unesco.org/science-sustainable-future/open-science/recommendation
[9] https://ecsa.citizen-science.net/
[10] https://www.idrc.ca/en/book/contextualizing-openness-situating-open-science
[11] https://zenodo.org/record/3946773
[12] https://www.nature.com/articles/d41586-021-01194-6