It's easy to get the impression that the killer applications for big data technology are marketing and advertising -- especially with companies like Google, Yahoo and Facebook having invented the tools used today for crunching ginormous data sets that don't fit neatly into relational databases. The big data pioneers, after all, are right in front of us almost every day. And they make a great deal of money serving up targeted advertisements and predicting what we're likely to purchase.
There is certainly value in doing a better job of marketing and advertising, and anyone with a business can attest to that. But it's nice to step back once in a while -- amid all of the pop-ups and banner ads -- and ponder some of the greater ramifications that big data technologies hold in store for mankind. Two areas in particular where humans can expect great things are health and science, and one of the many people helping doctors lead the way is Cloudera co-founder and chief scientist Jeff Hammerbacher, who was interviewed recently by PBS' no-frills talk show host Charlie Rose.
A former head of Facebook's data science operation, Hammerbacher told Rose that he helped start Cloudera, which offers a Hadoop distribution and related tools for big data management and analysis, because he wanted to "build a company that was going to be the engine of production for robust open source tools that can be used to do science faster." Among other things, Hammerbacher and his team at Cloudera are currently working to provide doctors at Mt. Sinai Hospital with a scalable infrastructure for data storage and analysis. The goal is to help Mt. Sinai scientists do their jobs faster and at a lower cost.
"I'd like [doctors and scientists] to be able to use that infrastructure in the short term to improve the quality of health care delivery, lower the cost of health care delivery and potentially discover new therapeutics or diagnostics," he said. "At the very long term, what really draws me to the medical domain is an interest in understanding how the brain works and in particular how the brain breaks."
But for Hammerbacher, who has himself been diagnosed with bipolar disorder and generalized anxiety disorder, the implications of big data reach far beyond the realm of mental health, which is another reason he decided to create Cloudera.
"I felt like there was a big bottleneck in a lot of different scientific labs, ranging from astronomy to high energy physics to oceanography," he said, "where they were generating tremendous volumes of data and they didn't have the software and hardware infrastructure to be able to capture and analyze that data effectively."
As a longtime tech reporter, I try to maintain objectivity and rarely go out of my way to sing the praises of founders of software companies. But I don't mind saying that I was genuinely impressed with Hammerbacher's commitment to scientific advancement. He reminded us that not everyone involved in big data initiatives is focused on advertising and customer analytics.
Inspired by the Hammerbacher interview, I did an Internet search to learn about more areas of health and science that stand to benefit greatly from big data analysis. Here are just a few examples of what I came up with:
- Stanford researchers used big data concepts to mine the text of doctors' notes in order to identify problematic reactions to prescription drugs. The researchers analyzed 18 years of notes on 1.8 million patients. Using information about symptoms and prescriptions, they were able to identify potentially adverse drug interactions.
- Astronomers working with the Kepler space telescope are collecting information about 200,000 stars every 30 minutes and then analyzing the data. The process has led to the discovery of numerous planets outside our solar system.
- And, as Hammerbacher noted, "agribusiness" companies like Monsanto and DuPont Pioneer are analyzing troves of data about climate, soil and other variables to determine the best geographical locations to plant specific seeds. That should lead to valuable new information about how to increase crop yields.
It's nice to know that big data is good for a great deal more than simply selling things.