In a Perspective article in the New England Journal of Medicine (NEJM) five years ago (9 April 2015), Bill Gates wrote:  “Perhaps the only good news from the tragic Ebola epidemic in Guinea, Sierra Leone, and Liberia is that it may serve as a wake-up call: we must prepare for future epidemics of diseases that may spread more effectively than Ebola.”  He elaborated that the lack of preparation for Ebola cost the world significant time, and he warned that such delays in response to a future epidemic could “result in a global disaster”.  We are now experiencing the unfortunate truth in these prescient words.

While healthcare data sharing continues to be a challenge, global initiatives centered around sharing of research data have been progressing for more than two decades. My repeated recommendation, during my tenure on the U.S. Health IT Standards Committee, was to look at research as an example. Although clinical research (including public health) has a smaller scope, it offers lessons that have already proven valuable and will no doubt continue to play out in a positive way during the current pandemic. Research holds an essential position as the efferent arm of a learning health cycle.

Global collaboration in the area of clinical research will be essential to identifying safe and effective cures for COVID-19 and a vaccine to protect against future outbreaks.   Seth Berkley, CEO of Gavi, recently commented in Science magazine (27 March 2020): “If ever there was a case for coordinated global vaccine development effort using a ‘big science’ approach, it is now.  There is a strong track record for publicly funded, large-scale scientific endeavors that bring together global expertise and resources toward a common goal. The Manhattan Project during World War II didn’t just bring about nuclear weapons quickly; it led to countless changes in how scientists from many countries work together.”

Having grown up in Los Alamos, New Mexico as the daughter of a radiochemist, I find that the Manhattan Project holds personal significance. It is an unprecedented example of leadership by the U.S. to halt the global consequences of continued war. Indeed, there was incredible collaboration, primarily among scientists on site at the national laboratories. However, the Manhattan Project was shrouded in the secrecy essential for its success, and information sharing was ‘paper-based’. We should indeed learn from the Manhattan Project, and many feel the U.S. should be a leader in finding a cure for COVID-19; however, this pandemic is a different threat, secrecy has no place in solving this crisis, and we have now moved well beyond paper, with supercomputers at our disposal. A ‘big science’ approach is essential.

Electronic health records (EHRs) were introduced in the 1990s, after which adoption was encouraged in the U.S. through ‘Meaningful Use’ and in other countries around the world through their respective initiatives. Sadly, EHRs to this day continue to store healthcare data in disparate, varied and often proprietary formats; thus, interoperability is sparse within the U.S. and is certainly not global. There have been glimmers of hope for common data exchange standards and methods through which EHRs could potentially support public health and research directly. One example is the IHE RFD profile (cited in the 2010 HITSP Interoperability Specification #158 for the use case to support public health and research using EHRs); RFD was actually showcased at a prior HIMSS event as a rapid method to report outbreak data from EHRs to the Centers for Disease Control. Regretfully, the HITSP work was sidelined for political reasons and the replacements have not yet caught up. The future of interoperability for EHRs is now deemed to be HL7 FHIR, but FHIR resources to support research are still in development.

In fact, our most reliable initial source of information in the U.S. for tracking COVID-19 cases and deaths has been Johns Hopkins University through its Center for Systems Science and Engineering. Their website states: “Our primary data source is DXY, an online platform run by members of the Chinese medical community, which aggregates local media and government reports to provide COVID-19 cumulative case totals in near real-time at the province level in China and country level otherwise… From January 22-31 the entire data collection and processing was managed manually… As the outbreak evolved, the manual reporting process became unsustainable, and on February 1, we adopted a semi-automated living data stream strategy.” Clearly, a global standard to facilitate this healthcare data exchange use case is warranted.

Concurrently, during the 1990s, clinical researchers began to adopt electronic data capture (EDC) to replace 3-part NCR paper forms. This generated significant discussion and technology development aimed at collecting and exchanging data electronically while adhering to regulations established by FDA and analogous regulators in Europe and Japan. Those regulations are designed to ensure that the information on which regulators base their decisions to approve safe and effective therapies is accurate and trustworthy. The provenance and integrity of the clinical research data are of utmost importance, especially since the health and lives of patients (i.e. each of us) are at stake. Efficiently collecting high quality data from each patient and submitting the aggregated and analyzed set of data (across all patients in a study) necessitated data content and exchange standards. The Clinical Data Interchange Standards Consortium (CDISC) was founded in 1997 around the vision of addressing these needs globally. A foundational set of global data standards to facilitate the collection, exchange and reporting of clinical research data was developed during the subsequent decade.

The foundational CDISC standards (which support data that are used in all research studies, such as demographics) were then augmented with standards that are specific to therapeutic areas (TA) such as diabetes and asthma. By 2017, CDISC had produced a vast body of knowledge and content standards to support diseases that affected approximately 2 billion individuals worldwide. CDISC standards are required by regulators in the U.S. and Japan (and endorsed by other regulators) to facilitate reviews of data in support of new therapies. CDISC standards are used by global companies, if not for data collection (which is where they add the most value), at least for data reporting when these data support an application for regulatory approval of a new therapy. The Global Academic Research Organization (ARO) Network, initiated in Japan, has committed to CDISC standards and harmonization.

When Ebola began to ravage Africa, I called leaders at Oxford University who were involved in trying to halt this outbreak (and were collaborating with WHO) to offer CDISC assistance in developing a standard that could facilitate the meaningful exchange of data for Ebola clinical research and facilitate comparisons across studies in search of cures and vaccines. CDISC leaders formed a team with the Oxford clinical trialists and many volunteers, and an Ebola therapeutic area standard was developed. “There are many different actors involved in tracking, diagnosing, treating and containing an outbreak. Sharing information across these disciplines is critical to understand and respond to a disease outbreak, and is particularly important in the case of Ebola which has such devastating consequences,” said Laura Merson, Associate Director of the Infectious Diseases Data Observatory (IDDO) and a clinical trial expert who coordinated the project on behalf of IDDO. “The CDISC Ebola data standard is a significant step forward that will enable a more rapid cross-disciplinary response which can reduce the impact of the next epidemic.” Laura Merson gave a compelling presentation at the subsequent European CDISC Interchange on the topic of “Raising the Standard for Global Collaboration in Emerging Infection”.

Three years later, I am very pleased to commend the CDISC Chief Standards Officer, Peter Van Reusel, and his stellar team of data standards development experts, for their commitment to develop a COVID-19 standard.  Leveraging the Ebola standard and other CDISC standards for vaccines and virology, this team has a head start and has announced a goal to develop an interim TA standard/user guide in a record 6 weeks.  Based on CDISC Business Case metrics, this standard will significantly reduce the time and effort involved in starting up and sharing/comparing data across clinical research studies to test potential therapies and vaccines.  It will also facilitate the regulatory reviews of such data when evaluating submissions for approvals of new products or approved products for a new indication.

In the meantime, case report forms have already been developed (based on the Ebola standard) and posted by WHO, Oxford’s IDDO and the International Severe Acute Respiratory and emerging Infection Consortium (ISARIC) so that studies can begin to collect data now in a format that will lend itself to aggregation of data across patients, analyses and comparisons among potential therapies. At least one EDC company is making its technology available: “Castor has joined the global fight against the Coronavirus by making our research data capture system available for free for all COVID-19 research projects. As of April 5, Castor is supporting more than 100 COVID-19 studies across 10 countries. We have developed ready for use eCRFs based on the WHO standard CRFs, to help researchers start their study or registry in less than an hour. The lack of quality data should not be the reason for a delay in developing vaccines and treatments.”

These clinical research best practices, standards and technologies are informing and facilitating a number of collaborations. A U.S. National COVID Cohort Consortium (N3C) was recently announced. It will leverage work that has been led by FDA and NIH/NCATS over the past 3 years to harmonize common data models (CDMs) across research networks (PCORnet, OHDSI, i2b2/ACT and Sentinel) to facilitate access to ‘real world data’ from EHRs to support data sharing and regulatory decisions. Elligo Health Research has participated since the inception of this project. Data provided through the N3C in one of the CDMs will be harmonized, with a goal to have results in CDISC format for submission to FDA.

A global coalition has also been formed to accelerate COVID-19 clinical research in resource-limited settings. This coalition includes representatives from approximately 20 countries. Participants from the U.S. include Baylor College of Medicine (which contributed to the CDISC data collection standard, CDASH), Harvard (Paul Farmer) and Partners in Health, along with the Bill and Melinda Gates Foundation (which supported development of TA standards such as the one for malaria).

When considering the elaborate technologies and robust computing power we now have available, including computer modeling and AI, it is important to remember that the models, AI and machine learning are only as good as the underlying data. As I wrote in a June 2014 NEJM Perspective with Innovative Medicines Initiative leader Dr. Michel Goldman: “The precise format of the data to be shared cannot be an afterthought. In an era of increased transparency and integrative analyses of data from multiple origins, data standards are essential to ensure accuracy, reproducibility, and scientific integrity. Their use will help in fostering innovation—and thereby in honoring the sacrifices of research participants everywhere.”

While we struggle to access and comprehend the vast amount of data on COVID-19 patients, it is time to take advantage of the lessons, knowledge and global standards that are poised to streamline the research that will be necessary to rapidly evaluate the safety and efficacy of therapies and a vaccine essential to curtailing this pandemic and avoiding a recurrence.  Healthy competition will encourage innovation to develop products, and global research data standards enable innovation.  While healthcare sorts out what it means to standardize, harmonize and share EHR data, research can lead by example. Global collaboration and data sharing are at the heart of the solution.

Rebecca D. Kush, Ph.D. – President, Catalysis; Chief Scientific Officer, Elligo Health Research; Fellow, Translational Research Center for Medical Innovation (Japan); Founder and President Emeritus, CDISC