The Clinical Data Pipeline

The Analytical Healthcare Repository (AHR) is a collaboration framework to centralize the management of patient information for analytical purposes.

This article originally appeared on the BeyeNETWORK.

Until the recent adoption of electronic patient information systems, hospitals were unable to automate the analysis of detailed clinical data. Now the demand from researchers and physicians for data and analytical applications is overwhelming hospital IT departments. Researchers seeking Clinical and Translational Science Awards (CTSA) grants cannot effectively compete without access to systems that supply information about patient cohorts. Physicians’ and hospital administrators’ compensation is increasingly dependent on compliance with external measures such as pay for performance (P4P), JCAHO, and the Leapfrog Group. Systems to support quality and patient safety initiatives in hospitals are too primitive to keep pace with increasing demands. The critical fuel for both groups is high-quality clinical data. Independent attempts to supply the data have failed because redundant efforts strain limited resources and result in fragmented data marts. The Analytical Healthcare Repository (AHR) is a collaboration framework to centralize the management of patient information for analytical purposes. The resulting reliable and efficient flow of data empowers researchers and clinicians to achieve higher performance goals. Researchers are able to publish leading results first, while clinicians deliver higher quality care, increase patient safety, and lower costs.

The demand for quality assessment and aggregate analytical tools in healthcare is growing faster than the resources and tools available to support it. Providers and researchers alike are encountering data access, analysis, and management barriers. The Analytical Healthcare Repository (AHR) is a new approach to analytical healthcare applications that addresses the full spectrum of analytical needs by delivering a data warehouse composed of electronic medical record (EMR) data, genomic information, and patient billing data, with the capacity to simultaneously support applications for healthcare operations and clinical research.

EMR application designs closely resemble electronic versions of preexisting paper forms. While EMR applications are useful for individual patient data management, their architecture and data storage methods have little utility for aggregate analysis. Currently, few institutions have repositories for research patient tracking, and most have only rudimentary reporting systems for quality management. Changes in the healthcare industry, such as the implementation of clear incentives and standards by hospitals, insurance pay for performance (P4P) programs, and public reporting such as JCAHO, have altered the roles of institutions, quality/safety standards, research goals, and general analytical reporting methods.

The challenge of developing comprehensive quality reporting has yet to be fully addressed because hospitals lack the necessary infrastructure and resources to support the diverse set of analytical applications needed for the management and distribution of clinical data. Analytical applications are becoming critical for hospitals to achieve their core missions, including implementing chronic disease management programs, improving the quality and safety of patient care, receiving grants for government funded research, improving financial outcomes from P4P contracts, and partnering with pharmaceutical companies for clinical trials. The AHR has the capacity to satisfy the data access requirements of providers and clinical researchers alike through a single development investment funded by both groups.

For example, translational medicine researchers lack efficient tools to find accurate research cohort information. Currently, individual patient data access is locked by HIPAA and institutional review board (IRB) standards and procedures. Time and money are wasted when researchers have to pass through a lengthy process to find compatible cohorts for their studies. When file access is granted, data is often missing, inaccurate, or reported in a manner that renders it useless for individual and aggregate research purposes. When electronic resources are available, multiple custom queries must be run directly against source systems to gather all the information needed, creating redundancy in research procedure. This can lead to patient safety issues including crashes in the mission-critical electronic medical record systems, leaving hospital staff to operate offline without records for several days at a time.

Today much of this work is done manually and inefficiently, through individual reviews of patient charts and correspondence with providers. The U.S. Government recognizes that this inefficient process is costing tax dollars and has begun to provide grants through the CTSA program with the goal of helping institutions build functional data repositories to increase the speed of innovation and reduce costs. Unfortunately, building a repository independently for research may both limit the value of the data for quality purposes and duplicate a repository built for clinical operations.

From research cohort identification to aggregate operations assessments and individual-level practitioner quality/performance analyses, the AHR is a design for centralizing data aggregation and cleansing for analytical applications to meet rapidly growing demands while consuming a minimum amount of resources. By consolidating hospitals’ clinical data across applications into a single multi-functional high quality data repository, the AHR resolves data access and aggregate analytics issues. An AHR warehouse is cost-efficient for hospitals and data warehouse developers.

Since the highest costs in analytics are the human resources required to cleanse and manage data interfaces, redundant efforts to accomplish this are very costly. Redundancy across teams is eliminated because data is integrated once into the AHR through data cleansing and normalization combined with extraction, transformation, and loading (ETL). The AHR warehouse is designed to support multiple analytical requirements unique to healthcare, including de-identification for research and assignment of patients to provider panels for quality projects. This eliminates the need to hire multiple vendors to construct individual custom-developed warehouses for research and operational purposes. The AHR is also stored on one hardware system, eliminating the costs associated with the purchase, storage, and maintenance of extensive redundant IT infrastructure.

Packaged adapters and custom interfaces are used to transmit data from clinical applications into the AHR. A multitiered, hierarchically based data security model allows report developers to meet regulatory requirements by limiting access to data pertinent to individual practitioners or clinics. A provider working for an operational purpose is permitted to search for his or her patients at the individual level or in panels, and to receive cumulative performance reports about his or her own patient care. Department heads, clinic administrators, and nurse practitioners can track all of the patients and providers within their clinic through security that assigns patients through provider and clinic relationships. As a result, Health Insurance Portability and Accountability Act (HIPAA) record access standards are satisfied and individual practitioners can be held accountable for personal quality performance.
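
The access rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the role names, the `User` and `PatientAssignment` records, and the `visible_patients` function are hypothetical and do not come from any actual AHR implementation.

```python
from dataclasses import dataclass

# Hypothetical records, invented for illustration only.
@dataclass(frozen=True)
class User:
    user_id: str
    role: str        # assumed roles: "provider" or "clinic_admin"
    clinic_id: str

@dataclass(frozen=True)
class PatientAssignment:
    patient_id: str
    provider_id: str
    clinic_id: str

def visible_patients(user, assignments):
    """Return the patient IDs a user may see, based on provider and clinic relationships."""
    if user.role == "provider":
        # A provider sees only the patients assigned to that provider.
        allowed = (a for a in assignments if a.provider_id == user.user_id)
    elif user.role == "clinic_admin":
        # A clinic administrator sees every patient assigned within the clinic.
        allowed = (a for a in assignments if a.clinic_id == user.clinic_id)
    else:
        # Unknown roles see nothing by default.
        allowed = ()
    return sorted({a.patient_id for a in allowed})

ASSIGNMENTS = [
    PatientAssignment("p1", "dr_a", "clinic_1"),
    PatientAssignment("p2", "dr_b", "clinic_1"),
    PatientAssignment("p3", "dr_c", "clinic_2"),
]

provider_view = visible_patients(User("dr_a", "provider", "clinic_1"), ASSIGNMENTS)
admin_view = visible_patients(User("adm_1", "clinic_admin", "clinic_1"), ASSIGNMENTS)
```

Denying access for any unrecognized role is a design choice in this sketch; a real deployment would presumably define more tiers and audit each lookup.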

The integration process is not a simple task: software must be developed to cleanse, normalize, extract, and transform the data from a variety of source systems. Given the amount of customization found in EMR deployments, this process requires a deep understanding of the source systems, including the data fields and values they use. Data analysts need repositories that are easy to query and that return meaningful results. Therefore, constructing repositories requires the development of software to handle missing and inaccurate data, and to cleanse it of errors and inconsistencies. For example, clinical applications may each have separate coding systems for the same medication. The AHR represents the medication consistently with a single identifier across data from multiple clinical applications and codes it using medical ontologies designed to standardize content within and across organizations. Data also needs to be standardized because current data repositories are misaligned; two repository systems might hold the same information, but in different formats. Initially, this might not appear to be a critical problem, but extraction and transformation become a daunting task if the cleansing and normalization processes must be repeated to fix data quality issues each time data is moved into a different data warehouse. In the simplest case, duplicate data can cause errors in analytical programs that count records. In more complex scenarios, patients may actually be put at risk when a medication is recalled or a lab test is missing because information is fragmented across systems that fail to identify the patient as a single individual. Also, when multiple data warehouses are used, the hospital encounters a data quality problem in which data at different stages of cleansing or normalization produces errors and mismatches. The warehouses provide different answers of varying quality, depending on the data mart queried or the method of requesting the data. A central data warehouse avoids conflicting data outputs and allows teams to identify the problems’ sources and fix them once for all users.
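
As a rough illustration of the medication example, the Python sketch below maps source-specific codes to one canonical identifier and drops the duplicates that would otherwise inflate record counts. The crosswalk, system names, codes, and identifiers are all invented for the example; a real repository would draw them from a medical ontology rather than a hand-written dictionary.

```python
# Hypothetical crosswalk from (source system, local code) to one canonical
# medication identifier. Every name and code here is invented.
CROSSWALK = {
    ("pharmacy", "MET500"): "MED-0001",
    ("emr", "metformin-500mg"): "MED-0001",
    ("emr", "lisinopril-10mg"): "MED-0002",
}

def normalize(records):
    """Map records to canonical medication IDs, drop duplicates, collect unmapped codes."""
    seen = set()
    clean, unmapped = [], []
    for rec in records:
        canonical = CROSSWALK.get((rec["source"], rec["code"]))
        if canonical is None:
            # Unknown codes are set aside for review instead of being guessed.
            unmapped.append(rec)
            continue
        row = (rec["patient_id"], canonical)
        if row not in seen:
            # The same patient and drug reported by two systems is kept once.
            seen.add(row)
            clean.append({"patient_id": rec["patient_id"], "medication_id": canonical})
    return clean, unmapped

RECORDS = [
    {"source": "pharmacy", "patient_id": "p1", "code": "MET500"},
    {"source": "emr", "patient_id": "p1", "code": "metformin-500mg"},
    {"source": "emr", "patient_id": "p2", "code": "lisinopril-10mg"},
    {"source": "lab", "patient_id": "p3", "code": "unknown-code"},
]

clean, unmapped = normalize(RECORDS)
```

Because the mapping is applied once on the way into the repository, every downstream report sees the same two medication rows instead of three, and the unmapped record is surfaced rather than silently lost.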

One reason that analytical applications are challenging in the healthcare industry is that transactional EMR applications have been designed to handle a complex set of data that catalogs human health issues across a wide variety of areas including inpatient, outpatient, specialty, and urgent care. Because each discipline is complex, the systems designed to handle them have focused on creating a record that manages the density of data for a single patient, modeled in systems like Caché, which generate hierarchical or object-oriented data sets for each patient. These data structures are more difficult to query directly in large volumes than a relational database. Additionally, they are difficult to combine with existing relational database content.

When large volumes of data are reported directly from transactional applications, those applications risk overload and crashes. The resulting system “down time” is costly to hospitals in terms of patient safety and lost productivity. Even healthcare applications with relational data models require care to avoid running analytical queries against production applications. The queries that extract data from production applications for use in analytical applications can themselves be strenuous for the existing systems, which points to another benefit of an AHR strategy. With the consolidation of requests into a single extract that serves a variety of analytical needs, the load placed on mission-critical EMR applications can be reduced and simplified. This makes it easier to run extracts frequently, which reduces the latency of data in applications that use the AHR as a primary data source and improves the potential impact on patient care from data that is closer to real time.

AHR for Clinical Operations

Because hospitals’ operational needs are multidimensional, the adoption of EMRs has been widely encouraged. National programs like Doctor's Office Quality – Information Technology (DOQIT) are encouraging the rapid adoption of EMRs and other technologies in large and small hospitals alike. The electronic data created through this transformation is available for analytical applications, but is not effectively leveraged into high-value applications. The high priority and effort invested in launching and adopting EMR applications in recent years have consumed all available IT resources during deployment. Therefore, the full potential of the data in these systems is often not realized.

The result is that small teams with limited resources have built data repositories and ad hoc data sets, such as Excel spreadsheets and Access databases, that are specific to their own needs and overlap in functionality. These separate data management systems are implemented with differing designs and may meet the needs of a specific group, but they create high operations costs and make the knowledge contained in them hard to reuse. Specialized databases address the needs of specific hospital departments, but without centralized aggregate assessment capabilities practitioners cannot assess their current aggregate performance, create comprehensive metrics dashboards, or develop proactive chronic disease solutions.

Chronic diseases account for seventy-five percent of the nation’s healthcare expenditures. While many issues contribute to the high cost of chronic disease treatment, it is often a problem of data mismanagement: EMRs have missing, old, or bad data. For example, a patient might be diabetic but not be labeled as such in a system. As a result, the patient is not placed in a chronic disease management (CDM) program and might have frequent emergency room (ER) visits. The cost incurred for ER visits is greater than that of proactive long-term management. While CDM programs are working to reduce costs, they currently rely on insurers' billing information, which lacks clinical details. The AHR improves CDM programs by providing precise, up-to-date clinical data in concordance with billing information, and eliminates wasteful chronic disease costs incurred through expensive reactionary treatment cycles.

Quality reports are developed not only as individual practitioner dashboards, but as aggregate hospital performance reports as well. Quality reports devised from AHR data are applicable for meeting Joint Commission (JCAHO) accreditation requirements, reaccreditation metrics, reimbursement for Medicare/Medicaid, developing a strong client base, and building a positive reputation. New institutions can implement quality reporting at inception so as to meet the JCAHO standards for accreditation. As on-site assessments are the foundation of the accreditation process, new institutions can use quality reports to devise plans of action to meet the types of benchmarks JCAHO sets for maximum achievable expectations and respond to inquiries during site visits with rapid proof of clinical documentation. Joint Commission surveys focus on individual programs; therefore, the institution can devise multidimensional reports to target specific areas.

New and previously accredited institutions must submit periodic performance reviews, plans of action, evidence of standards compliance, and other reports to JCAHO for reaccreditation. Performance areas vary and include patient-focused functions such as treatment rates, infection control, and patient safety. ORYX long-term care, ambulatory services, organization functionality including leadership and human resources, and structures in staffing such as medical staff and nursing are target areas as well. AHR quality reports have the capacity to address these areas and more because they use current data at all levels of analysis, from detailed patient charts to workflow processes, and highlight areas for improvement. Hospitals can address problems before reports become public as the AHR transforms data into a tangible proactive tool for change.

JCAHO has developed a dimension of forced visibility. Once surveyed, hospitals’ reports are available online for viewing by government agencies, the press, and the public. Bad scores tarnish public perception, leading to decreased utilization. Hospitals can be fined or stripped of accreditation by the U.S. government’s hospital oversight agencies. Positive and laudable reports yield benefits for institutions including reimbursement for Medicare/Medicaid, increased attendance, praise from the press, and new avenues to hire better staff and obtain modern equipment. By having visibility into possible reporting errors before they occur, such as incorrect patient risk adjustments affecting O/E (observed vs. expected) ratios, hospitals can correct data reporting systems before an irreparable public backlash comes from a negative report.
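
To make the O/E example concrete, the sketch below computes an observed-to-expected ratio as the number of observed events divided by the sum of each patient's risk-adjusted expected probability. The patient outcomes and probabilities are invented, and the function name `observed_to_expected` is an assumption for illustration. It shows how understating patient risk inflates the ratio even though the observed outcomes are unchanged.

```python
def observed_to_expected(outcomes, expected_probs):
    """O/E ratio: observed event count divided by the sum of expected probabilities."""
    if len(outcomes) != len(expected_probs):
        raise ValueError("one expected probability is needed per patient")
    expected = sum(expected_probs)
    if expected == 0:
        raise ValueError("expected event count is zero")
    return sum(outcomes) / expected

# Invented example: four patients, one observed event.
outcomes = [0, 1, 0, 0]

# Risk adjustment recorded correctly: expected events sum to 1.0.
correct_risk = [0.25, 0.5, 0.125, 0.125]

# Same patients with risk understated by half: expected events sum to 0.5.
understated_risk = [0.125, 0.25, 0.0625, 0.0625]

ratio_correct = observed_to_expected(outcomes, correct_risk)          # 1.0
ratio_understated = observed_to_expected(outcomes, understated_risk)  # 2.0
```

In this made-up case the hospital performs exactly as expected, yet the understated risk adjustment makes it appear to have twice the expected event rate, which is the kind of reporting error worth catching before a public report.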

AHRs with quality assessment reporting capabilities have many financial rewards for hospitals as well. As the emphasis on P4P grows, so do standards for services and rewards for compliance. Providers and associations have large amounts of money at risk and no visibility into whether they are achieving the goals that earn that money. Hospitals often have no visibility into their current status relative to a P4P measure until their insurance partners inform them that they are unlikely to meet the performance metric. This problem can be observed when hospitals hurry to meet a requirement, such as eye exams for diabetics, near the end of a time-constrained and measured pay period. With earlier visibility into their status, hospitals can act in time and therefore yield greater rewards from the P4P initiatives.
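
A minimal sketch of that kind of visibility follows, assuming a hypothetical "eye exam for diabetics" measure with an invented target rate and patient list. The field names and the `measure_status` function are illustrative only and are not taken from any P4P contract or AHR product.

```python
from datetime import date

def measure_status(patients, period_start, period_end, target_rate):
    """Summarize progress on a hypothetical eye-exam measure for diabetic patients."""
    eligible = [p for p in patients if p["diabetic"]]
    done_ids = {
        p["patient_id"]
        for p in eligible
        if p["last_eye_exam"] is not None
        and period_start <= p["last_eye_exam"] <= period_end
    }
    rate = len(done_ids) / len(eligible) if eligible else 0.0
    # Patients still needing an exam, so staff can act before the period closes.
    outstanding = sorted(
        p["patient_id"] for p in eligible if p["patient_id"] not in done_ids
    )
    return {"rate": rate, "meets_target": rate >= target_rate, "outstanding": outstanding}

PATIENTS = [
    {"patient_id": "p1", "diabetic": True, "last_eye_exam": date(2024, 3, 5)},
    {"patient_id": "p2", "diabetic": True, "last_eye_exam": date(2023, 6, 1)},
    {"patient_id": "p3", "diabetic": True, "last_eye_exam": None},
    {"patient_id": "p4", "diabetic": False, "last_eye_exam": None},
]

status = measure_status(
    PATIENTS,
    period_start=date(2024, 1, 1),
    period_end=date(2024, 12, 31),
    target_rate=0.5,
)
```

Run mid-period, a report like this shows both the current rate and the specific patients who still need an exam, rather than leaving the hospital to learn of a shortfall from its insurance partner.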

Data warehousing is also being used to improve hospital workflows. The data gathered is analyzed to find general administrative inefficiencies such as long wait times, resource capacity gaps, mishandling of data, improper methods of data access, and documentation errors. Such assessments lead to administrative stewardship, fluid workflows, and quality service for patients.

Additionally, individual clinics can improve patient services through the development of closed-loop EMR workflows. Using reports, clinic administrators can access all their patients’ records to focus on areas of risk or work for a specific time frame. Based on analytics rather than on patients scheduling their own visits, they can develop daily work schedules, proactively schedule visits for high-risk patients, and monitor correspondence including patient visit notices, reminders, and test results. When integrated with the AHR, these applications solve many of the problems associated with at-risk patient management by alerting providers to overdue labs. With direct access to lab orders through the EMR and distribution of performance reports including individual patient information to providers, these applications help providers isolate gaps in their care and take action directly from the report.
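
The overdue-lab alert mentioned above could look something like the following sketch. The test name, the 90-day window, and the data layout are assumptions chosen for the example, not clinical guidance or a description of a specific product.

```python
from datetime import date, timedelta

def overdue_labs(last_results, required_tests, max_age_days, today):
    """Return (patient_id, test) pairs with no result inside the allowed window."""
    cutoff = today - timedelta(days=max_age_days)
    overdue = []
    for patient_id, results in sorted(last_results.items()):
        for test in required_tests:
            last_date = results.get(test)
            # A missing result or one older than the cutoff counts as overdue.
            if last_date is None or last_date < cutoff:
                overdue.append((patient_id, test))
    return overdue

# Invented data: most recent result date per patient and test.
LAST_RESULTS = {
    "p1": {"hba1c": date(2024, 1, 10)},
    "p2": {"hba1c": date(2024, 5, 20)},
    "p3": {},
}

alerts = overdue_labs(
    LAST_RESULTS,
    required_tests=["hba1c"],
    max_age_days=90,
    today=date(2024, 6, 30),
)
```

Passing `today` explicitly keeps the function deterministic, which makes the alert logic easy to check against known dates before it is attached to a live report.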

AHR for Research

The AHR has multiple levels of utility for translational research. A visual query utility to generate aggregate figures from a hospital's AHR allows researchers to evaluate cohort participation potential and submit data requests faster to the IRB. The anonymous aggregate results eliminate the risk of litigation that results from the highly stringent patient confidentiality standards set by HIPAA and IRB. Access to this data lowers the cost of research cohort identification, recruitment, and tracking. In turn, researchers have more time to write grants, conduct experiments, and innovate faster.
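
One way to picture an aggregate cohort query is the sketch below, which returns only counts and suppresses small cells so that a result cannot single out a handful of patients. The threshold of 10, the field names, and the `cohort_count` function are assumptions for illustration; actual thresholds and rules are set by each institution and its IRB.

```python
def cohort_count(patients, criteria, min_cell_size=10):
    """Count patients matching all criteria, suppressing small nonzero counts."""
    n = sum(
        1 for p in patients
        if all(p.get(field) == value for field, value in criteria.items())
    )
    if 0 < n < min_cell_size:
        # Small cells are hidden because they could help identify individuals.
        return {"count": None, "suppressed": True}
    return {"count": n, "suppressed": False}

# Invented, already de-identified rows: only the attributes needed for counting.
PATIENTS = (
    [{"diagnosis": "diabetes", "sex": "F"}] * 12
    + [{"diagnosis": "diabetes", "sex": "M"}] * 3
    + [{"diagnosis": "asthma", "sex": "F"}] * 5
)

all_diabetes = cohort_count(PATIENTS, {"diagnosis": "diabetes"})
male_diabetes = cohort_count(PATIENTS, {"diagnosis": "diabetes", "sex": "M"})
no_match = cohort_count(PATIENTS, {"diagnosis": "copd"})
```

A researcher sees that 15 diabetic patients exist, learns only that the male subgroup is too small to report, and can decide whether a data request to the IRB is worthwhile without ever viewing an individual record.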

Upon receiving approval to conduct a study, researchers use the AHR for population-based research, genomic research, clinical trials, and the development of new interventions and technologies. The data stored in EMRs is ideal for population-based and chronic disease studies for illnesses such as diabetes and asthma. The bioinformatics research-oriented applications are used for large-n analysis of patient data, a revolutionary methodology, to find cohorts and correlations and to monitor aggregate change. Research costs are reduced as a result of access to better organized and standardized data; faster results lead to new developments, innovations, and solutions.

The growing field of genomic research is intrinsic to medicine. A central repository closes the gap between genomic and clinical data and makes it possible to combine genetic data with clinical phenotypes in an ethical manner. The potential role of the AHR in genotype profiling is enormous. For example, researchers can manage data for smaller clinical trials tailored to specific genes or profiles using cohort data gathered from the AHR. The data warehouse also opens the gate for advancements in pharmaco-genomics such as genotype-specific therapies where new patients are easier to locate for clinical trials and prescription/dosage regulation is matched to genotype.

Research applications leveraging the AHR infrastructure can also be used to investigate IT intervention tools, develop evidence-based medical procedures, and study provider behavior. A number of studies have been executed to determine the efficacy of reminder systems and advanced features in EMR applications. In order to determine whether patient outcomes were affected in control and intervention groups, the data must be processed by an analytical system. The AHR provides the needed data for this without development of new tools and data extractions. IT interventions can also be devised in which doctors receive feedback based on reports from AHR data. Given an evidence-based report on possible hypertension in a patient, a practitioner can take preventative steps to help the patient avoid future heart attacks or strokes. Medication recommendations can also be developed based on specific evidence-based parameters where risk group patient populations can benefit from specific medications that are not listed in their medical records. These patients can be flagged to have a visit or call scheduled to review the benefits of the medication. These types of studies can lead to non-pharmaceutical interventions as well, so that when given different information technology tools, doctors achieve better patient outcomes.

Road Map for Change

The AHR is oriented to solve problems in integration and analysis by closing the gap of data inaccessibility for the people who need it most. Hospitals and research facilities need to analyze similar data, but the applications at their disposal are purpose-specific. In order to reduce the growing costs associated with caring for patients, a consolidated solution to make data more accessible for clinical analysis and research purposes is needed at both small and large institutions. This process might appear daunting, especially to smaller hospitals that are only beginning to implement EMR and research applications.

The road to better data accessibility and management is an incremental progression. Many data management projects have failed because their scope was too broad to deliver initial value to stakeholders, leading to ultimate abandonment. To avoid this problem, institutions should approach an AHR as building a product one module at a time. The first step is to create a roadmap that allows teams to create strategic data marts and repositories, but on a small scale for specific high-value applications with a high probability of success. These differ from ad hoc departmental systems because they are built with a plan for integration as the final goal rather than as isolated projects that only need to succeed on their own. Initially, data sources can be loaded as purpose-specific data marts to cleanse and report against a specific dataset. These smaller systems will initiate development of reusable ETL processes, support the addition of new data sources, and allow for the incremental implementation of analytical applications. Once users have become accustomed to new user interfaces and administrators have developed data management protocols, institutions can move toward the larger AHR by increasing data refresh rates and integrating the data sources into the central repository. When source latency decreases, an overall schema can be devised to implement tried and tested analytical applications that function as part of the overall solution.

An AHR solves data management issues and eliminates redundancy. An exorbitant amount of time and money is consumed by lengthy query processes that could be redirected to improve work flow, provide better care for patients, and develop new solutions for diseases. Hospital operations teams need to meet the growing demand for quality health services, and assessments of care cannot be made without consolidated patient data and analytical applications. Innovation among researchers has been unnecessarily hindered because of the lack of effective cohort-gathering methods and impediments to data accessibility that are independent of regulatory guidelines. As these two areas are inextricably tied in developing health care solutions, data must be transformed into a usable bank and made accessible to both parties within regulated guidelines. The analytical reporting applications that interface to an Analytical Healthcare Repository can revolutionize the way hospitals manage care, develop novel patient treatment methods, and help researchers study diseases, genetics, medications, and behavior.

  • Aaron Abend 

    Aaron Abend is the CEO and Data Warehouse Architect at Recombinant Data Corporation. Aaron has been a leader in data warehousing technology for over 20 years beginning in 1984 when he helped to develop the first data warehouse solution at Metaphor with Ralph Kimball. He is currently focused on applications for using clinical data repositories that manage data from hospitals and delivery networks to support translational research (CTSA) initiatives.

  • Anna Bogdanova 

    Anna Bogdanova is a Data Analyst and Oracle developer at Recombinant Data Corporation. Her current work is focused on the development of a translational research data repository for the University of Massachusetts Medical School.

  • Dan Housman 

    Dan Housman is the Business Intelligence Manager at Recombinant Data. He is an MIT trained bio-chemist with a history of creating packaged analytical applications for high-tech, finance, and pharmaceutical industries. His current work at Recombinant is focused on helping healthcare providers use available data from clinical and administrative systems to execute quality improvement initiatives.

