Aggregating Data Sources - Enhancing Clinical Trial Results

By design, clinical trials generate large volumes of data, adding to our collective knowledge of medical conditions and potential treatments. While the data is significant at the individual trial level, its value increases exponentially when the information is proactively aggregated, shared, cleansed and analyzed. Although there are challenges associated with this undertaking, the benefits could be considerable.

Barriers to Data Collation

With more than 334,600 clinical trials currently underway, up from around 100,000 in 2010, the industry has evolved greatly in just one decade. During that same time, the rise of digitalization has allowed us to streamline processes and capture data more easily. As these two trends converge, they offer an opportunity to create a shared information repository. However, there are obstacles:

  • Most data sources exist in silos and are not easily shared
  • Data is stored in varying formats that don’t translate to or communicate with other formats
  • Cleansing, prepping and analyzing data is prohibitively time-consuming

I believe we can address these issues and establish a more productive and valuable resource for everyone.

Data Explosion

Clinical trials now incorporate data from many sources:

  • Genomic sequencing
  • Medical imaging
  • Laboratories
  • Wearables
  • Genetic data
  • Tissue banks
  • mHealth and patient-generated data

Disruptive technologies are paving the way to smaller, controlled clinical trials in which everything, from patient enrollment to drug administration and follow-up, is carried out through a phone given to trial participants.

Some of the data allows us to drill down to the neighborhood level — clinical laboratories that have a network of testing locations across the country, for instance, can share bloodwork results by ZIP code without compromising patient confidentiality. Using this information, clinical trial sponsors can determine where best to locate certain trials based on the conditions prevalent in the patient population. Trialtrove and Sitetrove assist sponsors in selecting the right sites associated with the disease population they are studying.
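As a rough illustration of the ZIP-code idea above, the sketch below summarizes de-identified lab results by ZIP code and drops areas with too few results to report safely. The record layout, the sample values and the minimum group size of 3 are all assumptions made for this example; small-group suppression by itself does not guarantee confidentiality, and a real program would follow the applicable privacy rules.

```python
from collections import defaultdict

# Assumed threshold for illustration only; real reporting rules may differ.
MIN_GROUP_SIZE = 3


def prevalence_by_zip(records, min_group_size=MIN_GROUP_SIZE):
    """Return the share of abnormal results per ZIP code.

    `records` is an iterable of (zip_code, is_abnormal) pairs with no
    patient identifiers. ZIP codes with fewer than `min_group_size`
    results are left out of the summary entirely.
    """
    counts = defaultdict(lambda: [0, 0])  # zip_code -> [abnormal, total]
    for zip_code, is_abnormal in records:
        counts[zip_code][1] += 1
        if is_abnormal:
            counts[zip_code][0] += 1

    summary = {}
    for zip_code, (abnormal, total) in counts.items():
        if total < min_group_size:
            continue  # too few results to report without risking re-identification
        summary[zip_code] = abnormal / total
    return summary


# Hypothetical sample data: four results in one ZIP code, one in another.
records = [
    ("19103", True),
    ("19103", False),
    ("19103", True),
    ("19103", False),
    ("60614", True),
]
summary = prevalence_by_zip(records)
print(summary)
```

With this sample, only the ZIP code with four results appears in the summary; the single result from the other ZIP code is suppressed.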

Other sources of information are the Food and Drug Administration (FDA), the Department of Health and Human Services (HHS) and TransCelerate, a non-profit pharmaceutical research and development organization.

Beyond clinical data, trial sponsors need to be able to incorporate operational data, financial data and real-world evidence (RWE) to get the most out of their research and development (R&D) investments.

Efficiently Sharing Information

All of this data, when aggregated, increases the statistical power of the R&D process and can potentially reduce both time to market and cost.
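To make the point about statistical power concrete, here is a minimal sketch using a textbook normal approximation for a two-arm comparison of means. The effect size of 0.3 and the sample sizes are hypothetical, and the calculation assumes the pooled patients are comparable, which real aggregated data may not be.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    `effect_size` is the standardized mean difference. The tiny
    contribution of the opposite tail is ignored.
    """
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)
    return normal.cdf(effect_size * sqrt(n_per_arm / 2) - z_crit)


# Hypothetical scenario: one trial with 50 patients per arm versus
# four comparable trials pooled to 200 patients per arm.
single_trial = approx_power(0.3, 50)
pooled_trials = approx_power(0.3, 200)
print(f"single trial: {single_trial:.2f}, pooled: {pooled_trials:.2f}")
```

Under these assumptions, the pooled sample detects the same effect far more reliably than the single trial.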

Drug companies, even competitors, can realize benefits from pooling data to better manage the drug development process. To accomplish this, they should invest in robust electronic systems via e-source vendors such as Clinical Ink, Comprehend Systems or goBalto to capture and integrate electronic data from sites, clinicians and patients at the source.

Through this process, we can access multiple data points and reduce knowledge gaps to identify trends and outliers not visible at the site level. We can achieve a clearer, 360-degree view into the lifecycle of a clinical study and even determine new paths of study into what else could be learned.
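One way to picture an outlier that is invisible at the site level: a single site sees only its own rate and has nothing to compare it with, while the pooled view can compare every site against the rest. The sketch below uses a modified z-score based on the median absolute deviation, with the commonly used 0.6745 scaling constant and 3.5 cutoff. The site names and rates are made up for illustration.

```python
from statistics import median


def flag_outlier_sites(site_rates, threshold=3.5):
    """Return the sites whose rate is unusual compared with the pool.

    `site_rates` maps a site name to a single rate, such as an
    adverse-event rate. Uses the modified z-score, which relies on the
    median so that one extreme site does not distort the baseline.
    """
    values = list(site_rates.values())
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return []  # no spread across sites, nothing to compare against
    return sorted(
        site
        for site, value in site_rates.items()
        if 0.6745 * abs(value - center) / mad > threshold
    )


# Hypothetical adverse-event rates reported by five sites.
site_rates = {
    "site_01": 0.10,
    "site_02": 0.12,
    "site_03": 0.11,
    "site_04": 0.09,
    "site_05": 0.31,
}
flagged = flag_outlier_sites(site_rates)
print(flagged)
```

A flagged site is a prompt for review, not a conclusion; the difference could reflect the patient population as much as a data or process problem.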

This shared resource can also help assess the level of risk and create strategies to mitigate it.

Accessing Different Sources and Formats

One potential roadblock is that we have systems that can’t communicate with each other to facilitate data transfer. The data we provide comes in various forms from the source. It is then up to sponsors to define the format and use standard data sharing formats. Consider open standards and formats that are easy to reuse.

If you are using a different format during the collection and analysis phases of your research, be sure to include information in your documentation about features that may be lost when the files are migrated to their preservation format, as well as any specific software that will be necessary to view or work with the data. We need systems that can pool data and talk to one another. Clinical trial sponsors should determine whether they can add these systems to their existing infrastructure or need to purchase new systems with this capability.
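As a small sketch of what agreeing on a common format can look like, the example below takes a CSV export from one site and a JSON export from another and maps both into the same record layout, converting glucose values to a single unit. The field names, file layouts and sample values are hypothetical and do not represent any vendor's or standard's actual format.

```python
import csv
import io
import json

# Glucose: 1 mmol/L is about 18.016 mg/dL (molar mass of about 180.16 g/mol).
MGDL_PER_MMOL_L = 18.016


def from_site_a(csv_text):
    """Site A (hypothetical) exports CSV with glucose in mg/dL."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "subject_id": row["subject"],
            "date": row["visit_date"],
            "glucose_mmol_l": round(float(row["glucose_mgdl"]) / MGDL_PER_MMOL_L, 2),
            "source": "site_a",
        }
        for row in rows
    ]


def from_site_b(json_text):
    """Site B (hypothetical) exports JSON with glucose already in mmol/L."""
    return [
        {
            "subject_id": item["patientId"],
            "date": item["date"],
            "glucose_mmol_l": round(float(item["glucose_mmol"]), 2),
            "source": "site_b",
        }
        for item in json.loads(json_text)
    ]


site_a_csv = "subject,visit_date,glucose_mgdl\nA-001,2020-03-01,90.08\n"
site_b_json = '[{"patientId": "B-007", "date": "2020-03-02", "glucose_mmol": 5.4}]'

# Both sources now share one layout and one unit, so they can be pooled.
pooled = from_site_a(site_a_csv) + from_site_b(site_b_json)
for record in pooled:
    print(record)
```

Keeping a "source" field on every record preserves where each value came from, which helps when documenting what may have been lost or changed during conversion.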

Cleansing, Prepping and Analyzing Data

Recently, pharma companies have increased spending on Big Data analytics and artificial intelligence (AI) analytics modeling. Advances in data analytics and visualization enable researchers to explore and interact with large-scale, often aggregate, bodies of data from many sources. This data, combined with machine learning and AI, can help with the design and implementation of clinical trials, achieving higher levels of success.

These rapidly maturing tools can also help reduce the amount of time spent cleansing and prepping data. Applying them to pooled data sets supports more robust analysis and makes the data pool more useful to everyone who draws on it.


By looking at data through a more holistic lens, clinical trial sponsors can be more proactive and identify challenges before they occur. By contributing to a larger body of knowledge, they gain access to a more nuanced data picture and can realize the full potential of their work. Completing the circle, clinical trials themselves can also benefit from tapping into multiple sources of data, strengthening their results, reducing risk and expense and even increasing speed to market.

Want to learn more? Contact Actalent now.
