
The Role of Business Intelligence in Clinical Trial Automation

Richard Phillips examines the automation of clinical trials and describes a solution that is mostly possible with today's technologies and standards, with one big "if."

This article originally appeared on the BeyeNETWORK.

Clinical trials are horrible things. Although Biopharma is led by scientific endeavor, and only has patient safety and therapeutic benefits in mind, the process of bringing drugs to market is long, complex, bureaucratic and, above all else, expensive.

(Note – we’ll use the word “drugs” within this article to fit in with our colloquial tone. Please substitute “biologics,” “devices,” or “superintelligent nanomachines” as appropriate for your interest.)

There are, of course, essential reasons for the rigor with which clinical trials are conducted, and we are not going to go into that history here. What is, however, an undeniable consequence of this rigor is that technology has not kept up. Looking at the trials process from outside reminds one of the fact that if one could find a living Neanderthal man, shave him and dress him in a suit, he would not look out of place on the proverbial Clapham omnibus. Clinical trials systems are like our cave-dwelling protagonist – relational databases and analytical software from the 1970s, hastily shorn and clothed in a GUI.


Possibly as bad – the processes around the execution and analysis of clinical trials are a slavish, box-ticking Sisyphean nightmare before we add in the problems caused by lead-footed tools and technologies.

Whether the chicken (of the processes) or the egg (of the tools) came first isn’t important any more. What is interesting is that the tools simply reproduce “the way things are done,” and the processes reflect “what the system needs us to do.” What results is Frankenstein’s monster, a “system” made up of incompatible parts – each designed for a fragment of the overall process.

What’s the Business Intelligence Angle?

Well, we’ll get to that. Through our work with certain technologies, we find ourselves supporting clinical trials operations, data management and analysis – each of which has distinct operational needs – but the linking factor is the tools in use. We’re not going to name any names, but the same tools commonly used by biostatisticians and data managers to do their jobs are also used in other industries as pure-play business intelligence (BI) tools.

Let’s look at one particularly troublesome piece of the process in slightly closer detail – the portion between “deciding to write a protocol” and “study lock.”


What’s at the center of this process is data – actually a flow of data, from patient / investigator, through capture, to management and analysis. But which data is collected is a question answered by the nature of the science being done, as defined in the study protocol. And how to analyze that data is described somewhat in the protocol and, in greater detail, in its derivative, the statistical analysis plan. What we have here is a set of business requirements (the protocol), and a technical specification (the analysis plan).

However, clinical trials are led by medics, not software architects – so the possibilities presented by having this richly descriptive information already written down are lost to them.

Another detail we can exploit is that (as you might expect from such a data-intensive set of processes) there exist certain standards for data representation. These structured document representations (predominantly XML variants) are at various stages of maturity, but they do give us a framework for improving the process.

One standard of particular interest is the structured protocol representation. This is a structured (as the name suggests) document that completely and unambiguously describes a clinical trial – including entry criteria, study design, endpoints – everything that is needed to carry out the science properly. It’s not a stretch to say that once you understand the experimental design and know the data being collected, you have everything you need to carry out your analysis and package up the submission.
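To make that claim concrete, here is a minimal sketch in Python of what "structured" buys us. The class and field names (Protocol, Endpoint, visit_weeks and so on) are our own invention for illustration and are not taken from any published protocol standard. The point is only that something like a data collection schedule can be derived from the protocol rather than written by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str            # human-readable endpoint description
    variable: str        # hypothetical name of the data item it is computed from
    visit_weeks: tuple   # study weeks at which that item is measured


@dataclass(frozen=True)
class Protocol:
    protocol_id: str
    design: str
    entry_criteria: tuple
    endpoints: tuple


def collection_schedule(protocol):
    """Derive, per study week, the sorted list of variables to capture."""
    schedule = {}
    for endpoint in protocol.endpoints:
        for week in endpoint.visit_weeks:
            schedule.setdefault(week, set()).add(endpoint.variable)
    return {week: sorted(names) for week, names in sorted(schedule.items())}


# Entirely made-up example protocol.
demo = Protocol(
    protocol_id="DEMO-001",
    design="randomized, double-blind, placebo-controlled",
    entry_criteria=("age >= 18", "confirmed diagnosis"),
    endpoints=(
        Endpoint("Change in systolic BP", "sbp", (0, 12)),
        Endpoint("Change in HbA1c", "hba1c", (0, 12, 24)),
    ),
)

print(collection_schedule(demo))
# {0: ['hba1c', 'sbp'], 12: ['hba1c', 'sbp'], 24: ['hba1c']}
```

Nothing in the schedule was typed in twice: change an endpoint in the protocol and the downstream artifact changes with it.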

The Flies in the Ointments

Before we go off on a techno-utopian rant and are never seen again, it’s worth acknowledging that there is a serious flaw with this vision. That flaw is, of course, the people involved with it.

One of the first clinical trial process problems we tried to solve with technology was data capture – the interaction between the investigator executing the trial and the systems. In the distant past, we employed rooms full of data entry clerks, who would double-enter from paper CRFs which had been hand-coded by data monitors. We technologists, however, thought it would be much more interesting to create software that would enable paper CRFs to be eliminated in favor of CRFs made of glass – to have the data entry burden shifted on to the investigator instead, while we polished our databases and puzzled over ways to apply ICD and MedDRA codes to the data that would surely surge through our systems in real time.

Eventually, and without going into detail about the problems this caused at the time, we have arrived at a place where the next best step for data capture in clinical trials is to integrate our systems with existing EMR and LIM systems at the investigative sites. Now, as the proverb goes, we have two problems – how to assure the quality of that data and how to carry out the integration. We need to be conversant with multiple standards and know how to map one representation on to another without losing information in the process.
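One way to think about "without losing information" is as a round-trip test: map a record into the target representation, map it back, and check that you get the original. The sketch below assumes a simple one-to-one renaming of fields. The field names are hypothetical and do not come from any real EMR or clinical data standard, and real mappings also have to deal with units, code lists and nesting.

```python
# Hypothetical source-to-target field mapping (names invented for illustration).
FIELD_MAP = {"pt_id": "subject_id", "dob": "birth_date", "sex": "sex"}


def map_record(record, field_map):
    """Rename fields; refuse to silently drop anything that has no mapping."""
    unmapped = sorted(set(record) - set(field_map))
    if unmapped:
        raise ValueError(f"unmapped source fields: {unmapped}")
    return {field_map[key]: value for key, value in record.items()}


def invert(field_map):
    """Build the reverse mapping; fail if two sources share one target."""
    inverse = {target: source for source, target in field_map.items()}
    if len(inverse) != len(field_map):
        raise ValueError("mapping is not one-to-one, so it cannot be reversed")
    return inverse


def round_trips(record, field_map):
    """True if mapping forward and back reproduces the original record."""
    forward = map_record(record, field_map)
    return map_record(forward, invert(field_map)) == record
```

The design choice worth noting is that an unmapped field raises an error instead of being discarded, since quiet loss is exactly the failure we are trying to avoid.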

Currently, then, we might as well acknowledge that there is an essential human component in qualifying and validating what is going on under the covers. With that said, let’s try to describe how the future of clinical trials might look.

The Brave New World

It’s 2012. In a poorly air-conditioned office just outside Philadelphia, approval is granted to begin a Phase III study for an existing compound – the objective is a label expansion for a rare but distinct new indication.

The main indication’s Phase III protocols are scanned using a semantic search engine by the protocol author. She has already entered some relevant search terms describing the compound, study design and population. Similar documents are identified by the smart search and presented for review – a subset are marked as “hits,” and our author now creates a work area in the protocol design tool, reserves a protocol reference and marks the project as underway in the resource management system. The work area contains metadata about the study – the compound (linked to existing information elsewhere in the system), the indication, the high level study design.

The new protocol is pre-populated using this study metadata. Our author uses a wizard-like interface to work through the template, supplying information that backfills the global study metadata where needed. While she works, software agents with specific missions carry out analyses on the patient population in the global patient registry, supplemented of course by public EMR data supplied via a contract with Google Health.

When our author comes to describe the patient population, the medical economics software agent has pre-populated the protocol – with a margin note to the author detailing some incompatibilities between the available patient population and the protocol duration. This problem, and many others, are addressed by our author – with each modification captured by the change control system, and all changes tagged along the way using electronic signatures.

Once the protocol exits the approval cycle, we find that investigators have already been invited to participate based on clusters identified through analysis of the registry and confirmed by blind interrogation of EMRs.

Now skip ahead – the protocol is well underway, and patient recruitment is proceeding more or less as planned. As patients enter the trial, demographic, clinical and medical history information is automatically fed into the CTMS via secure protocols. Lab information likewise enters the system without manual programming, and terms and medications are coded using automated processes and standard dictionaries. Some trial-specific information needs to be collected, but this is done by integrating with the site’s PHR system so as not to interrupt existing clinical processes.
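Automated coding of this kind might look something like the following sketch: normalize the verbatim term, try a dictionary lookup, and route anything unmatched to a human reviewer. The dictionary, the synonym table and the codes are placeholders we made up for the example; they are not real MedDRA or ICD content, and a production coder would be considerably smarter about matching.

```python
import re

# Placeholder dictionary: these codes are invented, NOT real MedDRA or ICD codes.
DICTIONARY = {
    "headache": "DEMO-0001",
    "nausea": "DEMO-0002",
}

# Hypothetical synonym table mapping common variants to dictionary terms.
SYNONYMS = {"head ache": "headache", "feeling sick": "nausea"}


def normalize(verbatim):
    """Lowercase, replace non-letters with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z ]", " ", verbatim.lower())
    return " ".join(text.split())


def code_terms(verbatims):
    """Return (coded, review): auto-coded terms and terms needing a human."""
    coded, review = {}, []
    for verbatim in verbatims:
        term = normalize(verbatim)
        term = SYNONYMS.get(term, term)
        if term in DICTIONARY:
            coded[verbatim] = DICTIONARY[term]
        else:
            review.append(verbatim)
    return coded, review
```

The review list is the essential human component again: the machinery handles the routine terms, and people handle what is left over.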

Throughout this period, regular safety analyses are available to medical reviewers on demand. Since data capture is automatic, snapshots of complete QA’d data are frequent. Case histories are readily available (it’s just data, presented using style sheets, after all), and the extensive business rules repository allows contextual links to be created from data being viewed to all other relevant data – from medications to AEs, from labs to history.

At some point, the system determines that study endpoints are met, according to the protocol. Responsible parties are notified (probably via a printer in a basement spitting out a greenbar report, which then goes into internal mail). Final validation checks are carried out, and the appropriate signoffs are obtained.

With data in a stable state, the study database is locked. The final version of the structured statistical analysis plan is used to generate the analysis code, as well as the standard tables, figures and listings to support the submission. The code and data are crosslinked with the results generated, and version controlled in the study outputs repository.
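As a toy illustration of an analysis plan driving the analysis, the sketch below treats one entry of a structured plan as plain data and executes it against the locked dataset. The spec keys ("table", "variable", "statistic", "by") and the sample rows are assumptions of ours, not part of any existing analysis plan standard, and interpreting a spec directly is a simpler stand-in for true code generation.

```python
from statistics import mean, median

# Statistics a plan entry is allowed to request (illustrative subset).
STATISTICS = {"mean": mean, "median": median, "n": len}


def run_analysis(spec, rows):
    """Compute the requested statistic of one variable, grouped by another."""
    statistic = STATISTICS[spec["statistic"]]
    groups = {}
    for row in rows:
        groups.setdefault(row[spec["by"]], []).append(row[spec["variable"]])
    return {group: statistic(values) for group, values in sorted(groups.items())}


# Made-up plan entry and made-up data, for illustration only.
spec = {"table": "T1", "variable": "age", "statistic": "mean", "by": "arm"}
rows = [
    {"arm": "active", "age": 50},
    {"arm": "active", "age": 60},
    {"arm": "placebo", "age": 40},
]

print(spec["table"], run_analysis(spec, rows))
```

Because the plan entry is data, it can be versioned and crosslinked with its results in the same repository as everything else.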


While it’s fun for us to exercise our SF-writing chops, this is the point: a solution like this is mostly possible using technologies and standards that are available today. This solution reduces practically everything to data integration – we can derive all downstream artifacts from the protocol alone, if the protocol is written properly and in such a way that we can exploit it using the machinery.

It turns out that this is the big “if.” The structured protocol representation standard is weak, and clearly not ready for prime time. As Winston Churchill said: “Give us the tools, and we will finish the job.”

What Next?

There is a certain amount of resistance to embarking on a program to develop a system like this. This resistance is born from the fact that the standards mentioned above are not fully baked – the perceived risk is that the standards might change, or end up not being standards at all. However, there are benefits to tearing the rug out from under the way things are currently done. These benefits exceed the risk associated with a small amount of reconfiguration to comply with different document standards. Think about this: add one day to the duration of a clinical trial, and it costs $1 million. Lose a day from your market exclusivity due to a delay in submission, and it could cost you $10 million. Then look at the duplication of effort and wastage that happens every day, on dozens of trials, on data management, programming and reprogramming. We think the case speaks for itself.
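For anyone who wants to plug in their own numbers, the back-of-envelope arithmetic is easy to write down. The sketch below uses only the two round figures quoted above; the function name, and the assumption that the two costs simply add, are ours.

```python
TRIAL_DAY_COST = 1_000_000         # cost of one extra day of trial duration (figure from the text)
EXCLUSIVITY_DAY_COST = 10_000_000  # cost of one lost day of market exclusivity (figure from the text)


def delay_cost(days_added_to_trial, exclusivity_days_lost):
    """Rough cost of a delay, assuming the two effects are simply additive."""
    return (days_added_to_trial * TRIAL_DAY_COST
            + exclusivity_days_lost * EXCLUSIVITY_DAY_COST)


# A hypothetical 30-day slip that also delays submission by 30 days.
print(delay_cost(30, 30))  # 330000000
```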
