Responsible Open Data
Learning Objectives
- Recognize open data that is created responsibly
- Appreciate how to use data responsibly
Introduction
Data is a precious resource that should be shared whenever possible. As demonstrated in the previous lesson, dramatic improvements can arise from Open Data and the decolonisation of knowledge when data is open and available to all.
While Open Data advances science and already provides enormous benefits to society, the misuse and inconsiderate sharing of data can have far-reaching harmful effects. There may also be cases where research data should be neither collected nor shared publicly, out of respect for legal frameworks and communities' needs. Understanding these potential harms requires reflection on the part of the research team and consultation with the people and communities impacted by the research.
In this lesson, we introduce the concept of Responsible Open Data: a set of points to consider when making data open and when managing it once it is open. We also elaborate on ways to give impacted communities the opportunity to drive the scientific narrative and its direct impact on their lives. In the next lesson (lesson 4 - CARE and FAIR principles), we will discuss the CARE principles, a framework for actively engaging with and acting on these considerations in your research.
Empowering Individuals and Communities through Open Data
The needs of marginalized and underrepresented communities can be, and have been, ignored with respect to Open Data. Communities that participate in, or are the main drivers of, some types of data collection tend to be invisible when it comes to publishing, as credit is taken by researchers at larger academic or other institutions.
Notable factors that contribute to the exploitation of marginalized and underrepresented communities, and that often lead to disastrous outcomes such as the inappropriate use and sharing of data, include:
Lack of protective frameworks:
There are instances where it might not be appropriate to share data openly. Legal frameworks at the regional, national and international level must be taken into account; however, these might not always be sufficient to protect contributors and communities from exploitation. There may also be instances where no such frameworks exist, leaving the people who contribute to the content of the data open to exploitation. Whether or not a framework exists, careful, frequent, and ongoing communication with communities and contributors, and their direct involvement in any data decisions, is needed. Where consultation is not feasible, a blanket ban on sharing should be assumed.
Lack of proper informed consent:
Informed consent is an essential step in ethical research practice, and researchers are responsible for obtaining it before the research takes place. Informed consent allows participants to take part with a complete understanding of the research, without coercion or undue influence. This consent can be withdrawn at any time, without consequence [1]. While informed consent is an exceptionally important component of science, and of open science in general, the exact requirements for obtaining it are highly discipline-specific, and understanding these nuances is beyond the scope of this work.
With this in mind, it is important to understand that even true informed consent is not a one-off action. It requires ongoing consultation and education. This is especially important when data is put online for use and reuse: research and its impact change over time, so communities could be exposed to unexpected harms in the future. Measures therefore need to be in place so that consent can be withdrawn or altered without consequence to the communities at risk. Participants' understanding also needs to be ensured. The consent form of the open data 1000 Genomes Project [2] illustrates what a lack of understanding can lead to: the form contains a passage that most participants do not catch, yet by agreeing to have their blood samples used for an unlimited supply of DNA they open themselves to biocolonialism.
Lack of equitable participation:
Open Data that is shared with due consideration and consultation allows impacted communities to take charge and guide research in a way that best suits their narrative, values and needs. It gives these communities more autonomy to further their scientific development and to contribute to the larger field of open science. When communities are excluded from decisions about their data, this opportunity is lost.
Managing Research Data Responsibly
Many research disciplines work with personal data that can be used to identify an individual (see [3]). Such data cannot be shared easily: it should be anonymized first, and anonymization is becoming increasingly difficult as technology develops rapidly. New technical developments may make it easier to recombine datasets and re-identify individuals. As described earlier, some individuals and communities are more susceptible to exploitation than others.
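To make the risk of recombining datasets concrete, here is a minimal sketch of a linkage check in Python. Everything in it is hypothetical and invented for illustration: the records, the field names (`postcode`, `birth_year`, `sex`, `diagnosis`), and the helper functions. It assumes that names were removed from a released dataset but that a few indirectly identifying fields remain, and that a second, public dataset contains the same fields together with names.

```python
from collections import Counter

# Hypothetical "anonymized" release: names removed, indirect identifiers kept.
released = [
    {"postcode": "1011", "birth_year": 1984, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "1011", "birth_year": 1984, "sex": "F", "diagnosis": "diabetes"},
    {"postcode": "1012", "birth_year": 1990, "sex": "M", "diagnosis": "migraine"},
]

# Hypothetical public dataset that shares the same fields and includes names.
public = [
    {"name": "Person A", "postcode": "1012", "birth_year": 1990, "sex": "M"},
    {"name": "Person B", "postcode": "1011", "birth_year": 1984, "sex": "F"},
]

# Fields assumed (for this sketch) to be indirectly identifying.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def key(record):
    """Return the combination of indirectly identifying values for a record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def unique_combinations(records):
    """Return the combinations that occur exactly once in the records."""
    counts = Counter(key(r) for r in records)
    return {combo for combo, n in counts.items() if n == 1}


def link(released_records, public_records):
    """Pair names from the public data with released records that are unique."""
    unique = unique_combinations(released_records)
    by_key = {key(r): r for r in released_records if key(r) in unique}
    matches = []
    for person in public_records:
        combo = key(person)
        if combo in by_key:
            matches.append((person["name"], by_key[combo]["diagnosis"]))
    return matches


reidentified = link(released, public)
print(reidentified)
```

In this toy data, only the released record with a unique combination of values can be linked to a name; the two records that share a combination cannot be told apart. A real assessment would be more involved, since a match is only meaningful if the combination is also rare in the wider population, but the sketch shows why removing names alone may not be enough.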
The accidental detrimental effects of Open Data may extend beyond individuals and affect others, such as endangered species or natural resources that should be protected [4]. For example, the local extinction of Goniurosaurus luii (Chinese cave geckos) in Vietnam was attributed to poaching that occurred shortly after data related to their discovery was published. This, in turn, prompted a call for scrutinizing Open Data sharing practices in the field of biodiversity [5].
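One way to reduce this kind of risk, offered here only as an illustrative sketch and not as a method described in this lesson, is to coarsen location data before publishing it. The Python example below snaps coordinates to a grid cell so that the exact site is not disclosed. The function name `generalize`, the `cells_per_degree` parameter, and the coordinates are all hypothetical; the appropriate level of coarsening, or whether to publish locations at all, is a decision to make with domain experts and affected communities.

```python
import math


def generalize(lat, lon, cells_per_degree=10):
    """Snap a coordinate to the south-west corner of its grid cell.

    With cells_per_degree=10, each cell spans 0.1 degrees in each direction.
    """
    def snap(value):
        return math.floor(value * cells_per_degree) / cells_per_degree

    return snap(lat), snap(lon)


# Hypothetical observation; the coordinates are invented for illustration.
observation = {"species": "example species", "lat": 22.4567, "lon": 106.7891}

coarse_lat, coarse_lon = generalize(observation["lat"], observation["lon"])

# Publish only the coarsened location, never the exact one.
published = {
    "species": observation["species"],
    "lat": coarse_lat,
    "lon": coarse_lon,
}
print(published)
```

Coarsening trades precision for protection: the published record is still useful for broad distribution analyses, while the exact site stays with the data custodians, who can grant controlled access where appropriate.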
Additionally, research can be carried out in collaboration with industry, generating commercially sensitive data, which may place restrictions on what can be shared. Research can be used for harmful purposes (see Ethos, lesson 2) or pose a risk to (inter)national security.
There are several tools available that will help you make decisions about what you can share publicly:
- CARE and FAIR principles (lesson 4)
- (inter)national laws that apply to data sharing (lesson 6 - Sharing Open Data)
- Guidelines/policies set up by your discipline or research institute (lesson 6 - Sharing Open Data)
- License restrictions (lesson 6 - Sharing Open Data)
Summary
In summary, you may not always be able to share research data openly, and other responsibilities may come with managing data once it has been made open. In such instances, the focus is placed on controlled and limited access with reuse in mind.
The CARE principles, presented in the next lesson, provide a framework for responsibly collecting data with all stakeholders in mind. The FAIR (Findable, Accessible, Interoperable, Reusable) principles, also described in the next lesson, provide guidelines for this and allow you to share part of the data without necessarily disclosing all of it.
Assessment
Can you think of a specific example in which releasing data could lead to harm? Which people and/or communities might you consult to determine this and discuss remedies?
Can you give an example of how a person could be re-identified from shared data?
References
[1] https://researchsupport.admin.ox.ac.uk/governance/ethics/resources/consent#:~:text=Informed%20consent%20is%20one%20of,before%20they%20enter%20the%20research.
[2] https://www.internationalgenome.org/sites/1000genomes.org/files/docs/Informed%20Consent%20Form%20Template.pdf
[3] https://the-turing-way.netlify.app/reproducible-research/rdm/rdm-personal.html
[4] https://doi.org/10.1038/s41559-018-0608-1
[5] https://doi.org/10.1126/science.aan1362