Appendix: Finding Open Data
Openly shared data can only be reused if it can be found in the first place, so findability is a key step in accessing and using data. There are three main ways to find Open Data shared by researchers: repositories, web searches, and literature searches.
Repositories
Ideally, Open Data should be available in repositories where datasets are properly indexed and assigned a unique persistent identifier (as discussed in Lesson 6 – Sharing Open Data). This ensures that the data, along with its associated metadata and documentation, is unambiguously identifiable, searchable, and discoverable.
Therefore, the first step in finding Open Data related to your field is to identify discipline-specific repositories (if any exist) and search for datasets there (see Lesson 6.4 – Repositories and Other Sharing Methods).
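Persistent identifiers also make datasets findable by machines, not just people. As an illustration, the minimal Python sketch below asks the doi.org resolver for a DOI's metadata in a machine-readable format (CSL JSON) instead of the human-readable landing page, using DOI content negotiation. The example DOI is the Nucleic Acids Research database issue cited under Literature search; the function names are hypothetical, only the standard library is used, and the fields you get back depend on the agency that registered the DOI (e.g., Crossref or DataCite).

```python
import json
import urllib.request

DOI_RESOLVER = "https://doi.org/"
# Media type for CSL JSON metadata via DOI content negotiation.
CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_doi_metadata_request(doi: str) -> urllib.request.Request:
    """Build a request asking the resolver for metadata rather than the landing page.

    Hypothetical helper: accepts a bare DOI, a 'doi:' prefix, or a full doi.org URL.
    """
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return urllib.request.Request(DOI_RESOLVER + doi, headers={"Accept": CSL_JSON})


def fetch_doi_metadata(doi: str, timeout: float = 10.0) -> dict:
    """Resolve the DOI and parse the returned metadata (requires network access)."""
    with urllib.request.urlopen(build_doi_metadata_request(doi), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    req = build_doi_metadata_request("https://doi.org/10.1093/nar/gkab1195")
    print(req.full_url)              # https://doi.org/10.1093/nar/gkab1195
    print(req.get_header("Accept"))  # application/vnd.citationstyles.csl+json
```

With network access, `fetch_doi_metadata("10.1093/nar/gkab1195")` would return a dictionary; check the record you receive rather than assuming a particular set of fields.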
Find repositories in your field:
- re3data.org is a global registry of research data repositories covering a wide range of academic disciplines.
- FAIRsharing is a curated, informative, and educational resource on data and metadata standards, interrelated with databases and data policies.
- Repositories recommended by publishers (e.g., the Recommended Data Repositories lists from Scientific Data and PLOS ONE)
- The World Data System (WDS) is a network of data repositories.
Examples of generic repositories:
The Generalist Repository Comparison Chart is a tool you can use to decide where to store and share your FAIR data outside of your institutional repository. Dataverse has also published a comparative review of eight data repositories.
Web searches
To explore a wide variety of datasets from projects or on popular topics, a general-purpose search engine can be helpful. Some disciplines and large institutions, such as NASA and the National Institutes of Health's National Center for Biotechnology Information (NCBI), offer their own portals where you can search for their datasets, related publications, and often analysis tools (e.g., EMBL's European Bioinformatics Institute https://www.ebi.ac.uk/). A growing number of international and national data portals also support data discovery.
Generic data search portals:
- Google Dataset Search https://datasetsearch.research.google.com/
- Kaggle https://www.kaggle.com/datasets
- Wikidata https://www.wikidata.org/wiki/Wikidata:Main_Page
- Open Data Network https://www.opendatanetwork.com/
- Awesome Public Datasets https://github.com/awesomedata/awesome-public-datasets#readme
Examples of discipline-specific portals:
- NASA Earthdata https://www.earthdata.nasa.gov/
- CERN Open Data Portal https://opendata.cern.ch/
- NCBI (National Center for Biotechnology Information) https://www.ncbi.nlm.nih.gov/
- EMBL's European Bioinformatics Institute https://www.ebi.ac.uk/
- ICPSR (Inter-university Consortium for Political and Social Research) https://www.icpsr.umich.edu/web/pages/
- International Monetary Fund https://www.imf.org/en/Data
- NOAA Climate Data Online https://www.ncdc.noaa.gov/cdo-web/datasets
- FRED (Federal Reserve Economic Data) https://fred.stlouisfed.org/
- USGS EarthExplorer https://earthexplorer.usgs.gov/
- Open Science Data Cloud (OSDC) https://www.opensciencedatacloud.org/
- NASA Planetary Data System https://pds.nasa.gov/
Examples of national and international data portals:
- US federal data https://data.gov/
- EU Data Portal https://data.europa.eu/en
- WHO Global Health Observatory https://apps.who.int/gho/data/node.home
- The World Bank https://data.worldbank.org/
- data.gov.uk https://www.data.gov.uk/
- UNICEF https://data.unicef.org/
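Many data portals also offer an API, so a search can be scripted and repeated. For example, the Data.gov catalog (catalog.data.gov) runs on CKAN, an open-source data portal platform whose Action API provides a `package_search` endpoint. The sketch below assumes that endpoint and CKAN's standard response layout; the helper names are hypothetical, and other portals may use different base URLs or entirely different APIs, so check each portal's own documentation.

```python
import json
import urllib.parse
import urllib.request

# Assumption: catalog.data.gov serves the standard CKAN Action API at this base URL.
CKAN_ACTION_BASE = "https://catalog.data.gov/api/3/action/"


def package_search_url(query: str, rows: int = 10, base: str = CKAN_ACTION_BASE) -> str:
    """Build a CKAN package_search URL for a free-text query (hypothetical helper)."""
    params = urllib.parse.urlencode({"q": query, "rows": rows})
    return f"{base}package_search?{params}"


def search_datasets(query: str, rows: int = 10, base: str = CKAN_ACTION_BASE,
                    timeout: float = 10.0) -> list:
    """Return (title, name) pairs for matching datasets (requires network access)."""
    with urllib.request.urlopen(package_search_url(query, rows, base), timeout=timeout) as resp:
        payload = json.load(resp)
    # CKAN wraps responses as {"success": ..., "result": {"count": ..., "results": [...]}}.
    if not payload.get("success"):
        raise RuntimeError("CKAN API reported an unsuccessful request")
    return [(pkg.get("title", ""), pkg.get("name", "")) for pkg in payload["result"]["results"]]


if __name__ == "__main__":
    print(package_search_url("air quality", rows=5))
    # https://catalog.data.gov/api/3/action/package_search?q=air+quality&rows=5
```

Building the URL separately from fetching it keeps the query easy to inspect, log, or reuse in a browser before any network call is made.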
Literature search
Although it is not ideal, datasets are often attached to scholarly publications as supplementary material, or the text states where to find them (e.g., a GitHub repository or a personal or institutional website). While these datasets are openly available, they are often not properly indexed and are therefore neither very findable nor machine-readable. In addition, a growing number of journals and special collections or issues focus on describing and publishing data (e.g., the Nucleic Acids Research database issues https://doi.org/10.1093/nar/gkab1195, Scientific Data, and Earth System Science Data).
How hard it is to find academic publications depends on the discipline and field of study. In the life sciences and biomedical research, for instance, several repositories and search engines (e.g., PubMed, Europe PMC) index research outputs (e.g., publications, abstracts, references, and communications) from many journals.
In other disciplines (e.g., the arts and humanities), however, searches are often carried out with general search engines or research databases such as Google Scholar and JSTOR. In that case, it is advisable to ask library staff and community members for advice on where to find related literature and data (see the Help section in Lesson 5.4).
Generic:
- Google Scholar https://scholar.google.com
- Open Knowledge Maps: a visual interface for exploring interconnected topics together with relevant documents and concepts https://openknowledgemaps.org/
- JSTOR: a wide range of scholarly content https://www.jstor.org/
- ResearchGate https://www.researchgate.net/search
Discipline-specific:
- Europe PMC: life sciences literature https://europepmc.org/
- PubMed: biomedical literature https://pubmed.ncbi.nlm.nih.gov/
- arXiv: a free distribution service and open-access archive for scholarly preprints in physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics https://arxiv.org/
- bioRxiv: preprint server for biology https://www.biorxiv.org/
- Earth and space sciences: EarthArXiv (https://eartharxiv.org) and the Earth and Space Science Open Archive (https://essoar.org)
- ASAPbio: a catalog of preprint servers https://asapbio.org/preprint-servers
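Literature searches can be scripted as well. Europe PMC, for example, offers a REST search web service. The sketch below builds a query URL for it and, with network access, returns the titles and DOIs of matching records. The endpoint, the `query`, `format`, and `pageSize` parameters, and the `resultList`/`result` response layout reflect our understanding of the Europe PMC REST API and should be checked against its current documentation; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Europe PMC REST search endpoint and parameter names; verify against current docs.
EUROPEPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def europepmc_search_url(query: str, page_size: int = 25) -> str:
    """Build a Europe PMC search URL that requests JSON results (hypothetical helper)."""
    params = urllib.parse.urlencode({"query": query, "format": "json", "pageSize": page_size})
    return f"{EUROPEPMC_SEARCH}?{params}"


def search_europepmc(query: str, page_size: int = 25, timeout: float = 10.0) -> list:
    """Return (title, doi) pairs for matching records (requires network access).

    Not every record has a DOI, so the second element may be None.
    """
    with urllib.request.urlopen(europepmc_search_url(query, page_size), timeout=timeout) as resp:
        payload = json.load(resp)
    hits = payload.get("resultList", {}).get("result", [])
    return [(hit.get("title"), hit.get("doi")) for hit in hits]


if __name__ == "__main__":
    print(europepmc_search_url("soil microbiome", page_size=10))
    # https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=soil+microbiome&format=json&pageSize=10
```

Search results point to publications, not datasets; the data themselves still have to be located through the article's supplementary material or data availability statement.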