In this first blog, the Global Biodata Coalition’s Executive Director, Guy Cochrane, looks at the recent completion of the Global Core Biodata Resource selection process and considers why this is an important milestone, how the process was run, opportunities for future rounds of selection and what lies beyond the GCBRs.
Last Thursday saw the announcement of the results of the Global Biodata Coalition’s first Global Core Biodata Resource (GCBR) selection process. Identifying the 37 selected biodata resources (deposition databases, knowledgebases and their respective data services) as essential for life science and biomedical research represents a major milestone towards sustainability for essential research infrastructure.
Research in the life and biomedical sciences has essential dependencies on data; all research generates new and/or consumes existing data. Biodata resources are used by scientists to manage, share, analyse and access their own data and those of other scientists, in fields such as molecular biology or biodiversity.
However, with no central coordination of biodata resources, this highly distributed infrastructure lacks planning for sustainability. Many key biodata resources that are used by scientists around the world—some as frequently as every few seconds—have no safe future beyond short-term grant funding. This threatens their potential to retain key staff, to invest in software engineering, and to procure appropriate hardware or cloud capacity.
The Global Biodata Coalition (GBC), a coalition of research funding organisations, was established to develop and deploy models under which biodata resources could attain sustainability. GBC’s growing membership (currently 11 organisations, including national funding agencies and foundations, and a further 10 observer organisations exploring membership) actively works together to explore and drive sustainability.
So why is the GCBR selection process such a milestone?
Just as there is no overarching planning of the biodata resource landscape as a whole (resources are typically established within the research community as the need arises), there is also no comprehensive description of which resources exist and how each is used. The GCBR concept defines biodata resources that are used heavily, regularly and generally across biology and biomedicine, that are essential to providing content onwards to other biodata resources, and whose existence is essential to researchers across the globe.
The GCBRs—in particular the Forum of GCBR managers which the GBC will establish—will help to define the needs of biodata resources, such as for skills, scientific community, technical infrastructure and funding. The Forum will key into existing work from further elements of GBC, such as the GBC Board working group on sustainability, which brings together representatives of funding organisations (GBC Members) with experience of supporting biodata resources to work towards models that enable international cooperation to bring sustainability to biodata resources.
The Forum of GCBR managers will serve a second function: it will help build the evidence to describe how data resources are used, how sustainable their activities are at the moment, what is at risk, how they are staffed, what their activities cost and much more. This evidence will serve not only to guide the development of sustainability models, but will also help funding organisations currently external to GBC to engage and—we hope—become a part of GBC.
A further value of the GCBR selection process is the clarity that it brings to funding organisations, institutions and individual scientists as they plan their policy and practices around the management of their scientific data. Those biodata resources on the GCBR list offer, for example, professional levels of service, community-aligned support functions and appropriate governance, so they can be recommended and signposted in policy and data management planning, both for the deposition of newly generated data and the referencing of established data sets.
Of course, the Forum itself will help to define exactly how it works and where it focuses; we look forward to meeting early in 2023 to discuss the work ahead and lay out the Forum’s agenda.
What was the GCBR selection process?
GCBR selection was based on open applications by the managers of biodata resources around the world. The two-stage process ran through 2022 and a panel of 57 expert reviewers assessed a total of 62 applications against a set of defined qualitative and quantitative indicators. The indicators fall into 5 categories: Scientific focus and quality of science; Community served by the resource; Quality of service; Funding, governance and legal infrastructure; Impact stories. These indicators form a profile that characterises the resource and enables identification via expert review of those data resources that meet the requirements for GCBR status.
At GBC, we are very grateful both for the hard work and dedication of the expert reviewers and for the work of the applicants in putting together well-structured and detailed descriptions of their data resources.
There was, and remains, no predetermined notion of how many GCBRs should exist. The selection process was non-competitive and based on evaluation against the indicators; the more applicant biodata resources that performed sufficiently against the indicators, the more GCBRs there would be on the list.
The selected GCBRs are a high-quality set of essential biodata resources. However, there are many biodata resources that also provide important, high-quality services to specific user communities but do not qualify as GCBRs, as they do not meet (nor should they attempt to meet) some of the criteria. For example, not all biodata resources have within their remit the provision of global services to global communities; some biodata resources are established to store data from a specific time-limited project; all biodata resources work through an early development phase before they are mature; some biodata resources focus on a specific subject domain. While Quality of Service is essential for GCBR status, the selection process is an assessment against the very specific indicators over all 5 categories.
Will there be future GCBR selection rounds?
We recognise that the list of 37 GCBRs from the first round of selection is not comprehensive. The selection process was application-led and it is likely that not all potential GCBRs applied. A lack of awareness of the process and complexities around preparing application information are to be expected at this early stage. We may hold a supplementary round of selection within the first cycle to provide a further opportunity for those who were unable to apply in the first round.
There will be additional opportunities further into the future: we plan future cycles of GCBR selection that will likely have expanded scope, such as the inclusion of biodata resources that are open, but operate controlled access for the protection of human research subject privacy.
Finally, we will put in place a review process for GCBRs. While we expect that the indicators will remain unchanged, or will at least evolve only gradually, the life cycles of the selected biodata resources themselves may move them away from GCBR status as research technologies and scientific interest evolve.
And what about biodata resources that are not GCBRs?
The GBC is not solely about GCBRs. The GCBRs are, by definition, those biodata resources that are foundational for the many thousands of biodata resources around the world. They include deposition databases that feed data onwards to other biodata resources, and knowledgebases upon which more specialised biodata resources build. As such, it is a priority to identify, stabilise and build sustainability for GCBRs, such that the entirety of the biodata resource infrastructure will be protected. Beyond the GCBRs, GBC will also actively engage further with those involved in managing biodata resources that are not expected to become GCBRs. Look out in the coming weeks for publications describing the GBC’s Global Inventory of Biodata Resources.
We encourage engagement with GBC and welcome contact with funding organisations, those involved in building and operating biodata resources and the interested scientific community. More details of GBC are on the website (https://globalbiodata.org/); please do get in touch by email or Twitter (@globalbiodata).