
Spark Streaming project looks to shed new light on medical claims

To give healthcare providers a real-time view of the claims processing operations its systems support, RelayHealth is augmenting its Hadoop cluster with Spark's stream processing module.

RelayHealth, a unit of McKesson Corp. that runs claims processing applications for healthcare providers, has been a Hadoop user since 2012. Its Hadoop cluster holds 150 TB of claims-related data from hospitals and health systems, which is used to meet regulatory compliance requirements, track the progress of claims, and do analytics aimed at improving the claims and billing process. But now the Alpharetta, Ga., company wants to do more with the Hadoop data for its customers -- and more quickly.

Its new goal is to provide more real-time analytics to clients so they can get immediate insight into their business operations, enabling them to make fast adjustments in an effort to improve efficiency and resolve problems before they take a big financial hit. And to make that happen, RelayHealth is expanding its Cloudera-based cluster and turning to Spark Streaming, the stream processing component of Apache Spark.

The expanded analytics initiative is being driven partly by demand from customers looking to upgrade from after-the-fact reports to real-time alerts about operational issues, according to Raheem Daya, director of product development and manager of the Hadoop platform at RelayHealth. "The expectation is that if there's information they need, they need to have it immediately available," Daya said. "They want actionable intelligence quickly, and that's the model we're moving toward."

For example, he pointed to accounts receivable. Ideally, medical claims are processed promptly by insurers so providers can send final bills to patients and get payments back in a timely manner. But Daya said claims frequently get held up at the approval stage because of eligibility questions and other issues. Predictive models can look for claims that are likely to be flagged; armed with that information, workers in a provider's billing department can take action to try to expedite the approval process.
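The kind of flagging Daya describes could be sketched as a simple scoring rule. The Python below is purely illustrative -- the field names, weights, and threshold are assumptions made for the sketch, not RelayHealth's actual model:

```python
# Toy sketch: flag claims whose estimated hold-up risk exceeds a threshold.
# Field names, weights, and cutoff are illustrative assumptions only.

def holdup_risk(claim):
    """Score a claim's risk of being held at the approval stage."""
    score = 0.0
    if not claim.get("eligibility_verified", False):
        score += 0.5  # eligibility questions are a common hold-up cause
    if claim.get("prior_denials", 0) > 0:
        score += 0.3
    if claim.get("missing_fields", 0) > 0:
        score += 0.2
    return score

def flag_for_review(claims, threshold=0.5):
    """Return claims a billing department should proactively work."""
    return [c for c in claims if holdup_risk(c) >= threshold]

claims = [
    {"id": "A1", "eligibility_verified": True, "prior_denials": 0, "missing_fields": 0},
    {"id": "A2", "eligibility_verified": False, "prior_denials": 1, "missing_fields": 0},
]
print([c["id"] for c in flag_for_review(claims)])  # → ['A2']
```

In a real deployment the score would come from a trained model rather than hand-set weights, but the operational idea is the same: surface likely-stuck claims before the insurer rejects or delays them.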

A partial first step toward better analytics

RelayHealth deployed Spark, an open source data processing engine that can work with Hadoop or on a standalone basis, in late 2013. But Daya's team initially used the software's primary batch-processing functionality to pull in transaction data from an HBase database tied to the Hadoop cluster for analysis through machine learning algorithms. That helped, he said -- but it didn't provide the real-time information the company and its customers were looking for. In batch mode, it can take two to three hours to use incoming data to score a predictive model for accuracy, Daya noted. And then the scoring process typically needs to be repeated, often multiple times, as the model is refined.


To try to accelerate things, the company is implementing the Spark Streaming module, with an expected go-live date this month. In tests, the stream processing technology has been able to score models in seconds, Daya said. He added that if Spark Streaming is set to pull in data every five to 10 minutes, "you're potentially going from waiting an entire day to get a result to waiting minutes."
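The micro-batch model behind Spark Streaming can be pictured in a few lines of plain Python: incoming records accumulate in a buffer, and a whole batch is scored at once when a trigger fires. This sketch triggers on record count rather than Spark's time interval, purely to keep the example deterministic; it is not the Spark API:

```python
# Plain-Python sketch of the micro-batch idea behind Spark Streaming.
# Spark triggers on a time interval; this sketch triggers on record
# count so the behavior is deterministic. Illustrative only.

from collections import deque

class MicroBatcher:
    def __init__(self, score_fn, batch_size=3):
        self.buffer = deque()     # records waiting for the next batch
        self.score_fn = score_fn  # model-scoring function applied per record
        self.batch_size = batch_size
        self.results = []

    def ingest(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Score everything buffered since the last trigger."""
        batch = list(self.buffer)
        self.buffer.clear()
        self.results.extend(self.score_fn(r) for r in batch)

batcher = MicroBatcher(score_fn=lambda r: r * 2, batch_size=3)
for record in [1, 2, 3, 4]:
    batcher.ingest(record)
print(batcher.results)  # → [2, 4, 6]  (record 4 still waits in the buffer)
```

The latency math Daya cites falls out directly: the worst case a record waits before being scored is one trigger interval, so shrinking the interval from a daily batch to a few minutes shrinks time-to-result accordingly.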

One reason for waiting to put the data streaming software to use was the need to expand the Hadoop cluster, partly to handle the increased data volumes that Spark Streaming will generate. RelayHealth pulls an average of 28 million transactions per hour into the cluster, and Daya said more processing power had become necessary to get data both into and out of the system. An increase from 10 to 45 compute nodes is due to be completed shortly before Spark Streaming is turned on.
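Using the article's own figures, a quick back-of-envelope calculation shows how the expansion spreads that ingest load across the cluster:

```python
# Back-of-envelope math from the figures cited in the article:
# 28 million transactions per hour, cluster growing from 10 to 45 nodes.
transactions_per_hour = 28_000_000

per_node_before = transactions_per_hour / 10  # load per node at 10 nodes
per_node_after = transactions_per_hour / 45   # load per node at 45 nodes

print(f"{per_node_before:,.0f} txns/hour/node")  # → 2,800,000 txns/hour/node
print(f"{per_node_after:,.0f} txns/hour/node")   # → 622,222 txns/hour/node
```

Roughly a 4.5x drop in per-node load, which leaves headroom for the additional read and write traffic Spark Streaming introduces on top of the existing ingest.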

Another prerequisite step was upgrading to a new version of Cloudera's Hadoop distribution that became available in December with support for Spark Release 1.2.0. Korin Reid, a data scientist at RelayHealth, said the machine learning library built into earlier releases of Spark was "very primitive," making it hard to build good algorithms. Reid added that she has started using the library in Spark 1.2.0 to expand the set of algorithms the company plans to employ for analyzing the claims data.

Be prepared for what's coming

Daya said creating an overall IT architecture that can take advantage of stream processing technology is a must for a successful deployment. In addition to expanding its cluster, RelayHealth is adding the Apache Kafka message queuing technology to take data from HBase and feed it into Spark. Downstream business systems also need to be able to handle the real-time analytics information coming their way from Spark, including automated updates and actions, he said.
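Kafka's role in that pipeline is to decouple the data source from the stream processor: producers publish to a topic without knowing who consumes, and the processor drains it at its own pace. The plain-Python sketch below stands in for that hand-off with an in-memory queue; it is illustrative only and does not use the Kafka client API:

```python
# Plain-Python sketch of the decoupling role Kafka plays between a data
# source (standing in here for HBase change records) and a stream
# processor. Illustrative only -- not the Kafka client API.

from queue import Queue

topic = Queue()  # stands in for a Kafka topic

def produce(records):
    """Upstream side: publish records to the topic."""
    for r in records:
        topic.put(r)

def consume(process_fn):
    """Downstream side: drain the topic and process each record."""
    out = []
    while not topic.empty():
        out.append(process_fn(topic.get()))
    return out

produce(["claim-001", "claim-002"])
print(consume(str.upper))  # → ['CLAIM-001', 'CLAIM-002']
```

Because neither side calls the other directly, the Hadoop/HBase side and the Spark side can be scaled, upgraded, or paused independently -- the property that makes a message queue a common prerequisite for this kind of architecture.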

And business processes at the operational end likely will have to be modified to some degree, which can create people issues for project teams to contend with. "Really, it's a universal shift from doing things the way we've always done them to being more data-driven," Daya said. "It requires buy-in from senior management that this is a direction you want to go in."

William McKnight, president of McKnight Consulting Group, agreed that data streaming and real-time analytics applications call for "new ways of thinking" in many organizations. "There are a lot of mind things that go along with it," he said, adding that business managers and workers may need to be convinced of the wisdom of changing internal processes to get the full benefit of the new capabilities.

Craig Stedman is executive editor of SearchBusinessAnalytics. Email him at cstedman@techtarget.com, and follow us on Twitter: @BizAnalyticsTT.

Next Steps

Mixing Hadoop, Spark and other tools punches up big data systems

Hadoop project management essentials

Analytics and healthcare: A hot topic, but users face some hurdles

This was last published in February 2015


Essential Guide

Guide to big data analytics tools, trends and best practices

Join the conversation

9 comments


Is your organization using stream processing technology to provide real-time analytics capabilities?
Currently, our organization has not begun to use stream processing technologies that assist and help with real-time analytics capabilities because we are not yet in a growth position to warrant such heavy tech. We anticipate growth through the third fiscal quarter of 2015 that will then make the need to investigate stream processing for real-time analytics a necessity. When that point comes we will begin to implement the software options for stream processing analytics.
Thanks for your comment, carol482. I think a lot of companies are in the same kind of place as your organization is. Hopefully, by the time yours is ready to take the plunge, some of the early adopters will have tangible results to help plan the way forward. What kind of applications do you see an eventual need for real-time analytics in?
Where real-time analytics are concerned there is a growing need for mobile apps that can provide frequent updates and notifications when something changes from established patterns within the system being monitored. In addition, an app that allows the user to make a custom user interface for the analytic functions is something that would be welcomed. 
I'd like results from the proposed machine learning before getting fully behind this. I've seen similar systems struggle with such large quantities of data before.
Thanks for your comment, Alex47. It's definitely still early in the game for Spark and its machine learning library. A good number of companies are deploying the streaming module -- it will be interesting to track their progress and see how it goes.
Both Relay Health and Medivance Billing are scaling quickly in the healthcare industry. Anxious to see how the rollout turns out.
We're looking to publish an update on the RelayHealth project in the near future, jwmann2, based on a presentation by Raheem Daya at the TDWI conference in Boston. Watch this space!

Is your organization using or looking to use Spark and Hadoop?
Here's our follow-up story on RelayHealth:

http://searchdatamanagement.techtarget.com/news/4500251353/Take-measured-steps-to-build-a-Hadoop-cluster-architecture
