MIT project reveals way forward for health data analysis

Health data is notoriously difficult to analyze due to privacy rules, but a new project is showing it can be done.

Healthcare research is expensive and time-consuming, largely because collecting data is so difficult. But a new initiative out of MIT aims to change that dynamic by making patient data more accessible to healthcare researchers.

The project, which is operating in partnership with technology vendor Philips, anonymizes data collected through Philips' eICU telehealth platform and makes it available to researchers via PhysioNet, a repository of open-source patient data built to support health data analysis.

It builds on a smaller initiative MIT runs out of its Institute for Medical Engineering and Science: the MIMIC database, which collects anonymized datasets from Boston's Beth Israel Deaconess Medical Center. But MIT researcher Leo Celi said MIMIC, while valuable, isn't as broad as researchers would like because it draws data from only one source. He said the new project will be more effective because it will collect data from across the country.

"The main limitation of our approach has been that the data has been coming from one center," Celi said. "The type of collaboration that's required is deeper."

There's a huge opportunity for making data analysis a bigger part of healthcare. Right now many patients are treated as completely unique individuals, even though they might have many things in common with other patients who have previously been treated for the same condition. But doctors can't see what works in specific situations because most health data is never analyzed in a systematic way.

This has a lot to do with health data privacy laws, including the Health Insurance Portability and Accountability Act, which places strict limitations on what doctors and hospitals can do with patient data. Some researchers have blamed these rules for the lack of analytics in healthcare.

At the same time, the data that currently exists could represent a treasure trove for researchers. Since patients are often treated differently, each patient visit essentially amounts to a clinical trial, with the results stored in the patient's record. Opening up these records for health data analysis could speed up the pace of medical research.

But health data privacy laws aren't going anywhere, so any project to open up clinical data has to address this dilemma. "Privacy becomes an important issue. It can be possible to identify a patient" when patient data is used in research, Celi said.

The MIT initiative relies on anonymizing the data to protect patient privacy. Every new patient record that comes into the database is run through an algorithm that determines whether any element of the record could identify the patient. Identifiers such as name, address and insurance account number are stripped from the record. The algorithm also removes any unstructured text. What's left is information on the treatments the patient received and how the patient responded.
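
As a rough illustration of the kind of filtering described above, the following Python sketch drops direct identifiers and free-text fields from a single record. It is not the MIT team's algorithm, which the article does not detail. The field names (`name`, `address`, `insurance_account_number`, `clinical_notes`, `treatment`, `response`) and the `deidentify` function are hypothetical, and real de-identification involves far more than removing a fixed list of fields.

```python
# A minimal sketch, assuming a patient record is a flat dictionary.
# All field names below are hypothetical; the article does not describe
# the actual schema or the actual algorithm.

# Fields that directly identify a patient.
DIRECT_IDENTIFIERS = {"name", "address", "insurance_account_number"}

# Unstructured text fields, removed wholesale because free text
# can contain identifying details.
FREE_TEXT_FIELDS = {"clinical_notes"}


def deidentify(record: dict) -> dict:
    """Return a copy of the record without identifying or free-text fields."""
    blocked = DIRECT_IDENTIFIERS | FREE_TEXT_FIELDS
    return {key: value for key, value in record.items() if key not in blocked}


record = {
    "name": "Jane Doe",
    "address": "1 Example Street",
    "insurance_account_number": "000-000",
    "clinical_notes": "Patient reports feeling better.",
    "treatment": "blood transfusion",
    "response": "stable",
}

cleaned = deidentify(record)
print(cleaned)
```

In this sketch, only the treatment and response fields survive, mirroring the article's description of what remains after anonymization.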

Celi said this is a major improvement in the quality of health data typically available for analysis. Generally, it's much easier for researchers to access insurance claims data than actual clinical treatment data because claims records contain less sensitive information. But claims records also lack the level of detail on treatments that clinical records have.

So far, researchers have used the data to develop improved guidelines for when patients should receive blood transfusions and recommendations on the effects of antidepressant drugs among ICU patients.

There are still risks. Two recent studies have shown that it is still possible to identify individuals from anonymized datasets. But Celi said the potential benefits from health data analysis are great, and data needs to be made more portable.

"We need more guidelines for taking care of specific patients in specific contexts," he said. "There should be these databases that are looking for those signals. If we can get that we'd be in a better position to recommend the best treatment for every patient."
