Open Science tools

Introduction to Open Science tools.

(What are Open Science tools? Why use Open Science tools? How do Open Science tools fit into the research lifecycle?)

This lesson is the first of OpenCore Module 5: Open Science Tools and Resources. This Module provides a collection of tools that are available to increase the visibility and discoverability of your project. It complements the previous OpenCore Modules (Ethos of Open Science, Open Data, Open Software, and Open Results) by enhancing the practical implementation of the Open Science concepts explained previously. While earlier modules focused on the concepts, advantages, and disadvantages of responsible Open Science practices, this module will focus more on the practical applications of responsible Open Science practices. We focus on a few key tools, and highlight how they fit across the research lifecycle.

In this first lesson, you will be introduced to the _What_ and the _Why_ of Open Science tools. First, we provide a definition of Open Science tools. Second, we discuss the differences between ‘open’ and ‘closed’ tools and highlight the advantages of using open tools. Third, we elaborate on the research lifecycle, and show how Open Science tools fit into a researcher’s project workflow.

What do we mean by “Open Science tools”?

We use the word “tools” to cover any type of resource or instrument that can be used to support your research. In this sense, tools can be a collection of useful resources that you might consult during your research, software that you could use to create and manage your data, or even human infrastructure, such as a community network that you could join to get more guidance and support on specific matters.

In this context, Open Science tools are any tools that enable and facilitate openness in research, and support responsible Open Science practices. It is important to note that Open Science tools are very often open source and/or free, but not necessarily.

What’s the difference between ‘open’ tools and ‘closed’ tools? Why use Open Science tools?

The difference between open and closed tools can be grasped intuitively by thinking of openness in terms of exchange with the environment. Bear in mind that this is not a black-and-white separation, but rather a spectrum of options.

When speaking of useful resources that you can re-use (such as text, visuals, audio, and video), it is important to pay attention to the license, which sets out the possibilities and conditions for re-use. If no license is indicated, you should assume that the material cannot be re-used without the author's permission. As indicated in 🔗 Module 1 Ethos of Open Science, Lesson 5🔗, Creative Commons licenses are one of the most common sets of open licenses applied to written content of any kind. They allow re-use, generally require attribution, and cover a spectrum of openness, from the least open variants to the most open (CC0, equivalent to public domain).

Software can be proprietary (“closed”) or open source. It is called open source when the original source code is made freely available and may be redistributed and modified. Generally, software has a separate set of licenses designed specifically for code projects that covers both the open distribution of the code itself as well as executable versions of the program which non-programmers can run. More information and details on open software can be found in the 🔗Open Software Module🔗.

Human infrastructure refers to a network of relationships between stakeholders interested in the conduct and outcomes of responsible Open Science (more on those stakeholders can be found in 🔗Module 1, Lesson 3🔗). Communities – or groups of people who share a geographical location, affiliation, common interest, or practice – play a key role in the human infrastructure aspect of Open Science. Like everything else, communities can vary in their degree of openness. A community can stay in touch through a mailing list, conference, meet-up, or messaging app. In an open community, anyone can join and is welcome to speak, decisions are made transparently, and communications are largely public. In a closed community, on the other hand, membership is restricted by invitation and/or a fee, resources and communications are not public, and decision processes are not necessarily transparent. More ideas on how to increase participation of stakeholders and how to build and lead inclusive communities can be found in 🔗Module 1, Lesson 3🔗 and 🔗this module, Lesson 4🔗.

Activity/exercise

Now let’s practice by looking at some typical case studies and solutions, reflecting on the benefits and obstacles of open and closed tools.

Case study #1: Closed vs open resources

Case study #2: Closed vs open software

You are a researcher who has been using MATLAB, a proprietary platform, to analyze data and create models. You are starting a new job at a different institution. Unfortunately, the new workplace does not have a license for MATLAB, so you cannot access your own code and data, which are stored in proprietary file formats, and you cannot continue your routine analysis workflow. What are your options now?

  • You can purchase an individual license for this proprietary software, or persuade the institute to purchase a group or campus-wide license.
  • You could consider using open source alternatives for programming and numerical computing, such as GNU Octave, Sage, or even the Python programming language and its scientific packages. This would not only save you money now, but also give you continuity if you move again to a different institution.

Case study #3: Closed vs open communities

  • Example:

Open Science tools provide numerous benefits, many of which have been discussed in the previous modules. For example, they can help you collaborate openly and share easily; organize and manage your work; track how your work is used and shared; and follow leading responsible Open Science practices.

Open Science practices make it easier to access existing tools and resources, which promotes collaboration between professionals with similar interests and research objects. For example, someone in Asia wanting to study Central African rainforest species could visit an online species database made available by other scientists. Physical distance is only one barrier: inequality in access to scientific resources has many other causes, from institutional barriers to paid content.

There are efficient and coordinated ways to share resources. One of them is 📖version control📖, a system that keeps track of every change made to one or more files over time. It also serves as a backup for your work. You might have already used it, for example if you have ever worked in Google Docs: it stores versions of your work as you type, and you can invite other users to work collaboratively in the same document while it keeps a record of all changes made by all users.

One broadly used tool for version control is Git. It tracks changes in a repository on the user’s machine and can synchronize them with copies hosted online (see https://git-scm.com/). Related hosting services include GitHub, GitLab, and Bitbucket. They store repositories online, where people can clone, edit, and review each other’s content.
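To make the idea of version control more concrete, here is a minimal sketch in Python of its core behaviour: every saved state is kept together with a short message, so that any earlier state can be recovered and compared with a later one. The `History` class and its method names are hypothetical and exist only for this illustration; Git and similar tools work quite differently internally and offer far more (branches, merging, collaboration).

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class History:
    """Toy version history for one text document (illustration only)."""
    versions: list = field(default_factory=list)  # (message, content) pairs, oldest first

    def commit(self, content: str, message: str) -> int:
        """Store a snapshot and return its version number (starting at 0)."""
        self.versions.append((message, content))
        return len(self.versions) - 1

    def checkout(self, version: int) -> str:
        """Return the content exactly as it was at the given version."""
        return self.versions[version][1]

    def log(self) -> list:
        """List version numbers with their messages, oldest first."""
        return [f"{i}: {msg}" for i, (msg, _) in enumerate(self.versions)]

    def diff(self, old: int, new: int) -> list:
        """Show which lines changed between two versions."""
        return list(difflib.unified_diff(
            self.checkout(old).splitlines(),
            self.checkout(new).splitlines(),
            fromfile=f"v{old}", tofile=f"v{new}", lineterm="",
        ))


history = History()
history.commit("Methods: draft", "first draft of methods")
history.commit("Methods: draft, revised after review", "address reviewer comments")

print(history.log())          # every change is recorded with its message
print(history.checkout(0))    # the first version is still recoverable
print("\n".join(history.diff(0, 1)))
```

Because nothing is ever overwritten, the history doubles as a backup, which is the property described above.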

Another way to share your work is by using standardized 📖workflows📖. A standardized workflow is typically a sequence of steps commonly used for a given purpose, such as accessing and manipulating genomic data. A good Open Science practice, then, is to share those workflows on platforms such as https://galaxyproject.org/, which allows any user to replay those steps right there for free, quickly and easily. That and other similar services enable you to see a step-by-step overview of what other researchers did, build on their work, and share your new ideas.
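As a rough illustration of what "a sequence of steps that others can replay" means, the sketch below writes a tiny workflow in Python as an ordered list of named steps and records the result of each one. The step functions, the sample temperature values, and the `run_workflow` helper are all made up for this example; this is not how Galaxy or any other specific platform represents workflows.

```python
def drop_missing(values):
    """Step 1: remove missing measurements (represented here as None)."""
    return [v for v in values if v is not None]


def to_celsius(values):
    """Step 2: convert temperatures from kelvin to degrees Celsius."""
    return [round(v - 273.15, 2) for v in values]


def mean(values):
    """Step 3: summarize the cleaned, converted values."""
    return round(sum(values) / len(values), 2)


# The workflow itself is just the ordered list of steps, so it can be
# shared, inspected, and re-run by someone else on their own data.
WORKFLOW = [drop_missing, to_celsius, mean]


def run_workflow(data, steps=WORKFLOW):
    """Apply each step in order and keep a log of intermediate results."""
    log = []
    result = data
    for step in steps:
        result = step(result)
        log.append((step.__name__, result))
    return result, log


result, log = run_workflow([293.15, None, 295.15])
for name, output in log:
    print(name, "->", output)
```

Keeping the log of intermediate results is what gives other researchers the step-by-step overview mentioned above.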

Including 📖metadata📖, the data that describes your data, can significantly enhance the findability of your research object. Some examples of metadata are the keywords associated with a publication, the time range and instrument name of a given observational data set, and the ORCID iD of a given person. Search interfaces use metadata to find a resource more quickly. For example, Google and other search engines read a shared metadata vocabulary called ‘Schema.org’ to understand and index the content of web pages (see https://schema.org/ for more information).

Many research fields have their own metadata standards (e.g. SPASE for space physics: https://spase-group.org/data/), but remember that each website you use has something similar behind that magnifying glass button. Taking the extra time to include some basic descriptors for your research object can make your contribution to your research field much more findable. Just as finding someone else’s work on the Internet might help you, making your own work more discoverable is a great contribution to Open Science!
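As one possible illustration of the metadata described above, the following Python sketch builds a small Schema.org `Dataset` record and prints it as JSON-LD, a format commonly used to embed Schema.org metadata in web pages. The property names (`name`, `description`, `keywords`, `temporalCoverage`, `license`, `creator`) come from the Schema.org vocabulary; all values, including the all-zero ORCID iD, are placeholders. The `REQUIRED` checklist is an assumption made for this example and not part of any standard.

```python
import json

# A hypothetical data set described with Schema.org terms.
dataset_metadata = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example magnetometer observations",       # placeholder title
    "description": "Placeholder description of a hypothetical observational data set.",
    "keywords": ["space physics", "magnetometer", "open data"],
    "temporalCoverage": "2020-01-01/2020-12-31",        # ISO 8601 time interval
    "license": "https://creativecommons.org/publicdomain/zero/1.0/",
    "creator": {
        "@type": "Person",
        "name": "Jane Researcher",                       # placeholder person
        "identifier": "https://orcid.org/0000-0000-0000-0000",  # placeholder ORCID iD
    },
}

# Assumed minimal checklist for this example, not an official requirement.
REQUIRED = ("name", "description", "keywords", "license", "creator")


def missing_fields(record, required=REQUIRED):
    """Return the checklist fields that are absent or empty in the record."""
    return [key for key in required if not record.get(key)]


json_ld = json.dumps(dataset_metadata, indent=2)
print(json_ld)
print("Missing fields:", missing_fields(dataset_metadata))
```

A check like `missing_fields` is a cheap way to notice, before sharing, that a basic descriptor such as the license or keywords was forgotten.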

Next, we’ll highlight how Open Science tools and resources fit into the research lifecycle.

How do Open Science tools fit into the research lifecycle?

Research in the modern scientific community is complex: it involves multiple stages, steps, contributors, and stakeholders. Frameworks and shared definitions help to structure, organize, and to some extent standardize the research process in support of responsible and reproducible practices.

The 🔗Open Results🔗 module introduced you to the definitions and nine stages of the research lifecycle and workflow. Let’s define these terms again.

  • Research framework
  • Research workflow
  • Research lifecycle

There is a good deal of theory behind the models for research frameworks, lifecycles, and workflows (REF), including linear, circular, multi-loop, and multi-step flows. To keep the mapping of Open Science tools onto the research lifecycle clear and practical, we will consider a concise 6-stage spiraling model of the research workflow, covering discovery, analysis, writing, publication, outreach, and assessment (see Fig.).

Reference: Bosman, J., & Kramer, B. (2016). Of Shapes and Style: visualising innovations in scholarly communication. figshare. doi: 10.6084/m9.figshare.3468641.v1

Most steps of the research workflow are supported by online applications (Kramer and Bosman, 2016). These digital (Open Science) tools have actually influenced the way in which we perform and share research, opening it up to a global audience.

Open Science tools can be used for:

  • Discovery: Tools for finding content to use in your research
  • Analysis: Tools to process your research output, e.g. tools for data analysis and visualization
  • Writing: Tools to produce content, such as Data Management Plans, presentations, and pre-prints
  • Publication: Tools to use for sharing and/or archiving research
  • Outreach: Tools to promote your research

The usage of such tools by researchers across different disciplines has been surveyed and reviewed in several efforts (Kramer and Bosman, 2016; Bezuidenhout and Havemann, 2021). Numerous digital tools have been mapped onto the “discovery, analysis, writing, publication, outreach, and assessment” stages of the research lifecycle (see Fig.). As we saw in the previous section, tools have varying degrees of openness. By purposefully choosing tools at each stage to increase transparency, findability, and reproducibility, you can construct and define your research workflow in alignment with responsible Open Science practices. As was discussed in Module 1, Ethos of Open Science, open should not be a thoughtless default or an afterthought, but built into the design and inception of the research project. Your choice of Open Science tools can be individual, but most often it will benefit from group discussions within your research team, institution, and communities of practice.

Note: the concepts of workflow and lifecycle are widely used and are also applied to parts of the research, e.g. data. The data workflow and data lifecycle are discussed in depth in 🔗Lesson X of the Module Open Data🔗.

How do Open Science tools address responsible practices?

The 🔗Open Data and Open Results🔗 Modules introduced the concept of FAIR principles and discussed how their application according to best practices can increase the visibility and uptake of our research.

Let’s refresh the terms:

  • FAIR Data Principles - Findable, Accessible, Interoperable, & Reusable. Wilkinson et al. (2016) provided FAIR Guiding Principles for scientific data management and stewardship; Chue Hong et al. (2022) established FAIR principles for research software.
  • CARE Principles - Collective Benefit, Authority to Control, Responsibility, & Ethics. Carroll et al. (2020) established the CARE Principles for Indigenous Data Governance, complementing the FAIR data principles.

Best practices to implement these principles include describing data using metadata standards and controlled vocabularies, assigning licenses, and uploading data to repositories that allow for creation of “📖persistent identifiers📖”. Examples of useful Open Science tools include:

  • Data Management Plan (DMP) tool, which allows you to create and share your data management plans to meet funder requirements and as a best practice for managing your data (link to website, to Lessons)
  • Data Repositories, which assign persistent identifiers to your data (example or link)
  • Tools for integrating research management with DMP tools and repositories (example or link)
  • Communities - national and international, discipline-specific, or open science-centered - can be of incredible value in curating resources and building communities of practice for researchers and other stakeholders in adopting FAIR principles. Examples include the FAIR Data Forum https://fairdataforum.org/ and the Research Data Alliance (RDA) https://www.rd-alliance.org/

Working within the ethos of the FAIR and CARE principles can help to ensure that research is accessible, inclusive, ethical, and responsible. More about FAIR principles and practical steps to make your data FAIR can be found here: https://www.go-fair.org/fair-principles/

Self-Assessment: Questions for reflection:

  1. Assessment of your (open science) tools and resources

Most probably you are already using some tools and resources, even if you are new to Open Science practices. Here we invite you to make a preliminary review of them:

  • Think of all the tools and resources you use in your study/research/work and rely on - resources (content with text/media), software and communities. Think of all stages of your research - discovery, analysis, writing, publication, outreach and assessment.
  • Tools have varying degrees of openness, dictated by various factors. Imagine (or draw) the scale from 0 to 10, where 0 stands for completely closed and 10 for completely open.
  • For each of the tools (from the categories of resources, software, and communities), place it on the scale at the number that reflects its degree of openness.
  • How many tools fall in the lower part of the scale (0 to 4)? Take a moment to reflect on whether these tools are in line with your actual preferences, goals, and needs in the long run.
  • Perform a quick search using a search engine or this open dataset of Open Science tools (https://kumu.io/access2perspectives/dost#dataset) for more open alternatives (e.g. free, open source) and jot them down “for your information”.
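If you prefer to keep your inventory in a script rather than on paper, the exercise above could be recorded as in the following Python sketch. The tool names and scores are invented examples, not assessments of real products, and the helper names `lower_part` and `mean_by_category` are hypothetical.

```python
# (tool, category, openness score from 0 = completely closed to 10 = completely open)
# All entries are made-up examples; replace them with your own tools.
inventory = [
    ("proprietary analysis package", "software", 2),
    ("open source analysis package", "software", 9),
    ("paywalled image collection", "resources", 1),
    ("CC0 species database", "resources", 10),
    ("invitation-only mailing list", "communities", 3),
]


def lower_part(items, threshold=4):
    """Return the tools that fall in the lower part of the scale (0 to threshold)."""
    return [name for name, _, score in items if 0 <= score <= threshold]


def mean_by_category(items):
    """Average openness score per category (resources, software, communities)."""
    scores = {}
    for _, category, score in items:
        scores.setdefault(category, []).append(score)
    return {category: sum(vals) / len(vals) for category, vals in scores.items()}


print("Candidates for a more open alternative:", lower_part(inventory))
print("Average openness by category:", mean_by_category(inventory))
```

The list returned by `lower_part` is a natural starting point for the search for more open alternatives suggested in the last step.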

In the next lessons we will introduce you to various tools, some of which you may not have heard of yet. Stay tuned!