Responsible Open Data

Learning Objectives

  • Recognize open data that is created responsibly
  • Appreciate how to use data responsibly

Introduction

Data is a precious resource that should be shared whenever possible. As demonstrated in the previous lesson, dramatic improvements can arise from Open Data and the decolonisation of knowledge by ensuring that data is open and available to all.

While Open Data benefits science and already provides enormous benefits to society, the misuse and inconsiderate sharing of data can have far-reaching harmful effects. There may also be cases where research data should be neither collected nor shared publicly, out of respect for legal frameworks and communities' needs. Understanding these potential harms requires reflection on the part of the research team and consultation with the people and communities impacted by the research.

In this lesson, we introduce the concept of Responsible Open Data: points to consider when deciding whether to make data open and when managing it once it is open. We also elaborate on ways to give impacted communities the opportunity to drive the scientific narrative and its direct impact on their lives. In the next lesson, we will discuss a framework for actively engaging with and acting on these considerations in your research (the CARE principles in lesson 4 - CARE and FAIR principles).

Empowering Individuals and Communities through Open Data

The needs of marginalized and underrepresented communities can be, and have been, ignored with respect to Open Data. Communities that are the participants in, or the main drivers of, some types of data collection tend to be invisible when it comes to publishing, as credit is taken by larger academic or institutional research groups.

Notable factors that contribute to the exploitation of marginalized and underrepresented communities, oftentimes with disastrous outcomes such as inappropriate use and sharing of data, include:

Lack of protective frameworks:

There are instances where it might not be appropriate to share data openly. Legal frameworks at the regional, national and international level must be taken into account, but these might not always be sufficient to protect contributors and communities from exploitation. In some instances no such framework exists, and the people who contribute to the content of the data might be open to exploitation. Whether or not a framework exists, careful, frequent, and ongoing communication with communities and contributors, and their direct involvement in any data decisions, is needed. Where consultation is not feasible, assume that the data should not be shared.

Lack of equitable participation:

When communities are left out of decisions about data that concerns them, they lose the chance to shape how that data is used. By contrast, Open Data that is shared with due consideration and consultation allows impacted communities to take charge and guide research in a way that best suits their narrative, values and needs. It gives these communities more autonomy to further their scientific development and to contribute to the larger field of open science.

Managing Research Data Responsibly

Many research disciplines work with personal data that can be used to identify an individual (see [3]). This type of data cannot be shared easily: it should be anonymized first, and anonymization is becoming increasingly difficult as technology develops rapidly. New technical advances may make it easier to recombine datasets and re-identify individuals. Some individuals or communities are more susceptible to exploitation, as described earlier.
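The re-identification risk described above can be illustrated with a small check. The sketch below is not a method from this lesson; it borrows the idea of k-anonymity, which asks how many records share the same combination of indirectly identifying attributes ("quasi-identifiers"). The records, the column names, and the helper functions are all hypothetical and made up for illustration. A record whose combination is unique could, in principle, be matched against another dataset that holds the same attributes alongside a name.

```python
from collections import Counter

# Hypothetical, made-up records. Names have been removed, so the table
# looks "anonymized", but indirect identifiers remain.
records = [
    {"postcode": "1011", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "1011", "birth_year": 1980, "sex": "F", "diagnosis": "diabetes"},
    {"postcode": "1012", "birth_year": 1975, "sex": "M", "diagnosis": "flu"},
    {"postcode": "1013", "birth_year": 1990, "sex": "F", "diagnosis": "asthma"},
]

# Assumption: these columns could also appear in some other, public dataset.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def group_sizes(rows, keys):
    """Count how many rows share each combination of values for `keys`."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def smallest_group(rows, keys):
    """Return k, the size of the smallest group of rows sharing `keys`."""
    return min(group_sizes(rows, keys).values())


def unique_rows(rows, keys):
    """Return the rows whose combination of `keys` occurs exactly once."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


k = smallest_group(records, QUASI_IDENTIFIERS)
at_risk = unique_rows(records, QUASI_IDENTIFIERS)
print(f"k = {k}; {len(at_risk)} of {len(records)} records are unique")
```

A small k does not prove that someone will be re-identified, and a large k does not guarantee safety. The check is only a first signal that a dataset needs closer review before it is shared.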

The accidental detrimental effects of Open Data may extend beyond individuals and affect others, e.g., endangered species or natural resources that should be protected [4]. For example, the local extinction of Goniurosaurus luii (Chinese cave geckos) in Vietnam was attributed to poaching activities that occurred shortly after data related to their discovery was published. This, in turn, prompted a call for scrutinizing Open Data sharing practices in the field of biodiversity [5].

Additionally, research can be carried out in collaboration with industry, generating commercially sensitive data, which may place restrictions on what can be shared. Research can be used for harmful purposes (see Ethos, lesson 2) or pose a risk to (inter)national security.

Several tools are available to help you decide what you can share publicly:

  • CARE and FAIR principles (lesson 4)
  • (Inter)national laws that apply to data sharing (lesson 6 - Sharing Open Data)
  • Guidelines/policies set up by your discipline or research institute (lesson 6 - Sharing Open Data)
  • License restrictions (lesson 6 - Sharing Open Data)

Summary

In summary, you may not always be able to share research data openly, and data that has been made open may carry ongoing management responsibilities. Where open sharing is not possible, the focus is placed on controlled and limited access with reuse in mind.

The CARE principles, presented in the next lesson, provide a framework for responsibly collecting data with all stakeholders in mind. The FAIR (Findable, Accessible, Interoperable, Reusable) principles, also described in the next lesson, provide guidelines for such controlled sharing and allow you to share part of the data without necessarily disclosing all of it.

Assessment

  • Can you think of a specific example in which releasing data could lead to harm? Which people and/or communities might you consult to determine this and discuss remedies?

  • Can you give an example of how a person could be re-identified from shared data?

References

  1. https://researchsupport.admin.ox.ac.uk/governance/ethics/resources/consent
  2. https://www.internationalgenome.org/sites/1000genomes.org/files/docs/Informed%20Consent%20Form%20Template.pdf
  3. https://the-turing-way.netlify.app/reproducible-research/rdm/rdm-personal.html
  4. https://doi.org/10.1038/s41559-018-0608-1
  5. https://doi.org/10.1126/science.aan1362