Guide to big data analytics tools, trends and best practices
Like many other organizations that have embarked on big data programs, healthcare services provider UPMC sees the flood of information it's generating as a blessing and a curse. "We're both drowning in big data and starving for it," said Lisa Khorey, vice president of enterprise systems and data management at the Pittsburgh-based organization.
But UPMC is 20 months into a five-year plan to harness a wide variety of that data for analytics uses -- and to bring in the big data management and analytics technologies needed to support the effort. In fact, company officials made a conscious decision not to invest in all of the required tools up front, said Khorey, who took part in a panel discussion on big data trends at the Oracle Industry Connect conference in Boston last week.
"I don't think you have to buy it all on day one," Khorey said in an interview after the panel discussion. She added that UPMC's chief financial officer encouraged the phased approach by telling the big data project team "not to overload the buggy." The health system did select an initial set of hardware and software at the outset, including Hadoop and products from Oracle, IBM and Informatica. It plans to add predictive analytics tools this summer at the initiative's two-year mark; prescriptive analytics technologies will follow 18 to 24 months later.
Khorey noted, though, that project advocates got executive backing and organizational support for the full five-year program before beginning any deployments. That was crucial to making the three-step technology selection process work: She said the prudent approach wouldn't be feasible "if we had to re-justify everything each year, because that takes a lot of energy."
Sweating the big data technology details
UPMC, a sprawling organization that operates 22 hospitals and about 400 outpatient facilities, also developed the plan for its big data systems with clinical precision, according to Khorey. "We spent a lot of time designing this architecture and then picking the [technology] elements that would fulfill each job," she said. For example, a Hadoop cluster is being used to capture and stage data on its way to a data warehouse; in addition, data discovery tools can be run against the Hadoop data to find relevant information for planned analyses.
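The staging pattern Khorey describes -- land raw data in Hadoop first, run discovery against it, and load only the relevant projection into the warehouse -- can be sketched in miniature. The sketch below is hypothetical and not UPMC's actual pipeline: a plain list of JSON strings stands in for files in a Hadoop cluster, and an in-memory sqlite3 database stands in for the data warehouse.

```python
import json
import sqlite3

# Hypothetical stand-in for the pattern described above: raw records are
# "landed" as-is in a staging zone (schema-on-read), a discovery pass scans
# them for the fields a planned analysis needs, and only that curated
# projection is loaded into the warehouse.

# 1. Capture: raw events land in staging with no upfront schema.
staging_zone = [
    '{"patient_id": 1, "event": "admit", "cost": 1200.0, "notes": "..."}',
    '{"patient_id": 2, "event": "discharge", "cost": 300.0}',
    '{"patient_id": 1, "event": "lab", "cost": 85.5, "flag": "routine"}',
]

def discover(raw_records, wanted_fields):
    """Discovery pass: read the raw staged data and project out only the
    fields relevant to the planned analysis (schema applied on read)."""
    for line in raw_records:
        record = json.loads(line)
        yield tuple(record.get(f) for f in wanted_fields)

# 2. Load: only the curated projection reaches the warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE events (patient_id INTEGER, event TEXT, cost REAL)")
warehouse.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    discover(staging_zone, ["patient_id", "event", "cost"]),
)

# 3. Analyze: ordinary SQL over the curated table.
total_cost = warehouse.execute("SELECT SUM(cost) FROM events").fetchone()[0]
print(total_cost)  # 1585.5
```

The design choice the sketch illustrates is deferring schema decisions: the staging zone keeps every field (including ones like "notes" and "flag" that no current analysis uses), so later discovery work can mine them without re-ingesting the source data.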
The company didn't set up a formal committee to evaluate and select the big data technologies, but Khorey said a cross-functional group has been involved in developing the business requirements and technical specifications, as well as assessing the available options. IT is at the head of the table on that process, she said. But physicians and representatives from UPMC's life sciences operations also have a say on the technology plans and decisions.
The end goal is to enable collaborative analysis of genomics data and information on patient outcomes, physician performance, the cost and quality of care, and other metrics -- all in an effort to improve treatment and care delivery. "This is completely about outcomes, outcomes, outcomes," Khorey said. "We're seeking a scientific orientation so we practice [healthcare] based on measurements."
Thus far, UPMC has built the big data infrastructure, captured some initial data sets and run several proof-of-concept projects. Planned next steps include working to prove that the analytical processes can be repeated across different data sets and starting to deploy data and self-service analytics tools for use by business analysts, data scientists and other end users. Starter sets of clinical and cost data are due to be made available in June, and Khorey said there will be "constant data landings" over the next few years as the program proceeds.
Initial big data deposit likely to lead to more
Canadian Imperial Bank of Commerce (CIBC) is also in the early stages of a big data analytics program. The Toronto-based bank is testing marketing analytics, fraud detection and financial risk assessment applications. As part of the pilot projects, it's working with various vendors and "playing around with different technologies," said Sam Dotro, CIBC's executive director of enterprise architecture. That includes Oracle's Big Data Appliance, Cloudera's Hadoop distribution, and a mix of business intelligence tools, added Dotro, who works in the bank's New York offices.
Dotro also took part in the panel discussion at the Oracle conference. In a follow-up interview, he said the technology evaluation process is being driven by his group, but with "a lot of collaboration" and input from CIBC's business units. The company has set up a 20-member executive committee with representatives from IT, data security, corporate operations, and the business units to plan out the big data architecture. The process "is somewhat of a democracy," Dotro said. "But ultimately, it's the business that dictates this."
And demonstrating a business case for the proposed big data implementation is an important next step. In the coming months, CIBC executives will review the results of the pilot projects and decide what to move forward on. "But for sure, it's happening," said Dotro, who expects to get approvals for deployments and perhaps begin some of them this year.
For one thing, the bank's CEO has made the big data strategy one of his priorities, according to Dotro. In addition, competitive forces -- a common driver of big data programs -- are pushing the bank to step up its analytics game. CIBC's data analysts typically look at only "small chunks of data" now, he said. The big data applications will make more information available -- often in real or near real time -- in an effort to boost functions such as marketing and customer service. The result, Dotro said, will be a more data-driven -- and hopefully more successful -- company.