GBC’s Executive Director considers how we assess the impact of the global biodata resource infrastructure

Apr 17, 2024

How do we assess both the impact of the biodata resource infrastructure on the delivery of life and biomedical science research programmes and the societal benefits that those programmes generate? Both questions are critical to policy development and decision making around future investment in the infrastructure. Accurately assessing impact would be straightforward if we were able to track who is accessing the resources that make up the infrastructure, what data they are using, and how they are using those data. However, the vast majority of biodata resources are, and should remain, completely open, with no requirement for users to identify themselves, so the “who” is always unknown. The tremendous quantities of data that are used, and the challenge of connecting access to specific datasets or information with research outcomes, also preclude knowing the “what” and the “how”.

Consequently, tracking value from biodata to knowledge and then to societal outcomes is challenging. Beyond assessing rates of access to websites and counting citations, it is typically impossible to put systematic monitoring in place. Rather, impact assessment must rely to a great extent on anecdotes, or impact stories, that describe particular outcomes. These can then be overlaid with data and tool provenance information, where this is available, to create illuminating narratives.

One such narrative concerns the antiviral drug Molnupiravir, which was rapidly repurposed during the COVID-19 pandemic for the treatment of SARS-CoV-2 infections. The narrative begins with some 30,000 coronavirus sequences that had already been deposited and made openly available in biodata resources before COVID-19. Over time, these records contributed to our understanding of the basic biology of these viruses and, in particular, of the role of RNA-dependent RNA polymerases in their replication. Protein and nucleoside structure records supported the identification of nucleoside analogues with the potential to disrupt viral replication. Complemented by open clinical trials data demonstrating the safety of Molnupiravir in patients, these records would have allowed existing drugs to be screened rapidly. Molnupiravir was identified as a potential treatment, acting as a mutagen that disrupts viral replication, and was then put rapidly to therapeutic use around the world.

Molnupiravir: an antiviral rapidly repurposed for COVID-19 with support from biodata resources. Structure source: wwPDB; image from the JSmol server, St Olaf College, USA.

Along this chain of events, at least four open biodata resources would have been critical in making data available: the Worldwide Protein Data Bank, the databases of the International Nucleotide Sequence Database Collaboration, PubChem and ClinicalTrials.gov. However, neither the biodata resources nor the data records within them capture these final outcomes, and there is no systematic reporting mechanism for tracking usage and the associated impact stories.

Impact assessment is essential but challenging for the organisations involved in the biodata infrastructure parts of these value chains. These organisations include developers of data standards (e.g. the Genomic Standards Consortium and the Global Alliance for Genomics and Health), aggregated biodata service providers (e.g. the Swiss Institute of Bioinformatics and EMBL’s European Bioinformatics Institute), developers of biodata resource sustainability models (the Global Biodata Coalition) and tool and workflow developers (e.g. the Galaxy and Nextflow communities). The ability to demonstrate impact is essential because it supports strategic planning and stakeholder engagement and opens access to the funding needed to sustain these activities. It is challenging for the reasons given above: there is no systematic mechanism to capture and evaluate impactful outcomes. However, while each organisation that provides these assets has its own niche, all contribute in some way to the same overall impact value chain. There is great potential for sharing impact stories among the entities represented along that chain, and the Global Biodata Coalition is prioritising the assessment of impact among its strategic objectives.