Healthcare research is expensive and time-consuming, largely because collecting data is so difficult. But a new initiative out of MIT is hoping to change that dynamic by making patient data more accessible to healthcare researchers.
The project, which is operating in partnership with technology vendor Philips, anonymizes data collected through Philips' eICU telehealth platform and makes it available to researchers via PhysioNet, a repository of open-source patient data built to support health data analysis.
It builds on a smaller initiative MIT runs out of its Institute for Medical Engineering and Science. That initiative, the MIMIC database, collects anonymized datasets retrieved from Boston's Beth Israel Deaconess Medical Center. But MIT researcher Leo Celi said MIMIC, while valuable, isn't as broad as researchers would like because it only draws data from one source. He said the new project will be more effective because it will collect data from all across the country.
"The main limitation of our approach has been that the data has been coming from one center," Celi said. "The type of collaboration that's required is deeper."
There's a huge opportunity for making data analysis a bigger part of healthcare. Right now many patients are treated as completely unique individuals, even though they might have many things in common with other patients who have previously been treated for the same condition. But doctors can't see what works in specific situations because most health data is never analyzed in a systematic way.
At the same time, the data that currently exists could represent a treasure trove for researchers. Since patients are often treated differently, each patient visit essentially amounts to a clinical trial, the results being stored in patients' records. Opening up these records for health data analysis could speed up the pace of medical research.
But health data privacy laws aren't going anywhere, so any project to open up clinical data has to address this dilemma. "Privacy becomes an important issue. It can be possible to identify a patient" when patient data is used in research, Celi said.
The MIT initiative relies on anonymizing the data to protect patient privacy. Every new patient record that comes into the database is run through an algorithm that determines whether any element of the record could identify the patient. Things like name, address and insurance account number are stripped from the record. The algorithm also removes any unstructured text. What's left is information on the treatments the patient received and how the patient responded.
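As a rough illustration of the stripping step described above, the Python sketch below drops identifying fields and free-text fields from a single record. It is not the project's actual algorithm, which the article does not detail. The field names, the `deidentify` function and the sample record are all hypothetical, and a real system would need far more than a fixed list of fields to catch identifying details.

```python
from typing import Any

# Hypothetical field names, assumed for illustration only.
# The article does not describe the real record schema.
IDENTIFIER_FIELDS = {"name", "address", "insurance_account_number"}
FREE_TEXT_FIELDS = {"clinical_notes"}


def deidentify(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the record without identifier or free-text fields."""
    blocked = IDENTIFIER_FIELDS | FREE_TEXT_FIELDS
    return {key: value for key, value in record.items() if key not in blocked}


# Made-up sample record, not real patient data.
sample = {
    "name": "Jane Doe",
    "address": "1 Example Street",
    "insurance_account_number": "000-000",
    "clinical_notes": "Free-text note that might mention the patient by name.",
    "treatment": "blood transfusion",
    "response": "stable",
}

cleaned = deidentify(sample)
print(cleaned)
```

Only the treatment and response fields survive in this sketch, which mirrors the article's description of what remains after anonymization.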
Celi said this is a major improvement on the quality of health data that is typically available for analysis. Generally it's much easier for researchers to access insurance claims data than actual clinical treatment data because claims records contain less sensitive information. But claims records also lack the level of detail on treatments that clinical records have.
So far, researchers have used the data to develop improved guidelines for when patients should receive blood transfusions and recommendations on the effects of antidepressant drugs among ICU patients.
There are still risks. Two recent studies have shown that it is still possible to identify individuals from anonymized datasets. But Celi said the potential benefits from health data analysis are great, and data needs to be made more portable.
"We need more guidelines for taking care of specific patients in specific contexts," he said. "There should be these databases that are looking for those signals. If we can get that we'd be in a better position to recommend the best treatment for every patient."