Tyagi: 'Big data' technology a solution in search of right problems
Date: May 24, 2012
Just because you have a data problem doesn’t mean you have a “big data” problem, according to Vineet Tyagi, associate vice president of technology at software development services provider Impetus Technologies Inc. The key to knowing if big data technology can help your organization, Tyagi says, is to look for particular characteristics that may be affecting the performance of its data warehouse, business intelligence and analytics systems.
Companies often “come into big data technology usage imagining that they would get some tremendous benefit,” Tyagi says. “What we’ve seen is that there are certain types of problems that lend themselves very well to being solved [by] and getting benefits from big data technologies.”
The examples that Tyagi cited include backlogs in data analytics reporting caused by a lack of processing power and what he calls the “broken rule” -- a case in which a company’s data warehouse system is constantly breaking down because the organization is unable to manage the volume of the data being collected for analysis and the velocity at which the data is being generated and updated.
In this video interview recorded at SearchBusinessAnalytics.com’s “Delivering Deeper Insights with Big Data and Real-Time Business Intelligence” seminar, Tyagi spoke further with Editorial Director Hannah Smalltree about big data software options and their potential uses as well as best practices for managing big data analytics projects.
In the video, viewers will learn about:
- How to figure out if big data technology is the answer to the problems they’re trying to solve
- The different categories of big data technologies that are available today
- Steps companies can take to start building up their big data skills and gain experience in using big data technology
- Common pitfalls and misconceptions that organizations run into on big data implementations
- Key best practices that can help ensure the success of a big data analytics project
- How companies can take advantage of big data analytics tools to meet their business needs and find value in data
Read the full transcript from this video below:
Hannah: Hello and welcome. I'm Hannah Smalltree, Editorial
Director for SearchBusinessAnalytics.com and related sites. I'm here at our seminar on delivering
deeper insight with big data and real-time technologies. I'm talking with Vineet Tyagi. He is Head
of Innovation Labs for Impetus Technologies. Thank you for joining me, Vineet.
Vineet: Good morning, Hannah. Thank you for welcoming me and having me here. I'm excited to be a
part of this seminar.
Hannah: Now, your firm has already done several big data projects with clients and that's how you
came to the best practices you presented today. I found your first recommendation really
interesting, which was to determine whether the problem you're trying to solve is a big data
problem or not. What are some of the characteristics of a big data problem?
Vineet: That's a fairly interesting one because, after working for almost the last four years helping
customers, what we've realized is that a lot of times when people move into solving problems,
they come into big data technology usage imagining that they would get some
tremendous benefit and other things.
What we've seen is that there are certain kinds of problems that lend themselves very well to being
solved and getting benefits out of the big data technologies. The characteristics of those types of
problems are something like this: you have a compute issue where there's a lot of [bottlenecking]
happening, or you are seeing a lot of your data analytics backing up because you are not able to
process them in time. That typically is a sign for us that you can solve this
problem with big data technologies.
The other characteristic, I call it the broken rule, which is you're in the situation where
everything is broken and you don't seem to be able to fix it. Now, if it's your enterprise data
warehouse or your analytics engine or whatever, it's broken because you're not able to manage the
data, you're not able to store it, you're not able to just keep up with the volumes or the velocity
of acquisition of that data.
That also is a good characteristic that's able to tell us that this is a situation or a context
where perhaps big data technologies can make a difference.
Hannah: What kinds of technologies are we talking about when you say big data technologies? What
are some of the categories of technologies our viewers might encounter?
Vineet: Absolutely. Hannah, when we look at technology, the technology landscape of what we call
the big data technologies, I categorize them in three layers, so to say. The first layer is what we
call the data infrastructure layer technologies, which are solving the issues of how can you store
ginormous amounts of data. Ginormous amounts are hundreds of terabytes or even petabytes. People
are pushing the scale of storing data. Your typical storage does not work here.
So, there are specialized ways of storing this data. These technologies would use distributed file
systems and storage area networks, and there are concerns in these technologies. So, they try to solve
some of the challenges of reliability and redundancy, to make sure that the data remains accessible and
highly available. That's the data infrastructure storage layer. Once you have the data that you
can store, one part of the problem is solved. The next part is how do you analyze and compute on
this data?
The next category of technologies that exist are the parallel computing, or the high performance
computing elements like MapReduce. These are a category of technologies that take the processing.
What they then do is they break the large amount of work that has to be done on the ginormous
amount of data and they parallelize this.
What they end up doing is that they would break this down into smaller pieces of work and then use
a cluster of, say, commodity machines as worker nodes to go out and then do the processing for that
bit of the data. Once all of these workers have finished, then you can correlate the results back,
aggregate them and you can do that in a business-oriented way that doesn't take days. It can be
a matter of a few hours or minutes to process this large amount of data.
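[Editor's note: The split, process and aggregate pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any product mentioned in the interview. The function names (split_into_chunks, map_chunk, merge_counts, word_count) are hypothetical, word counting is chosen only as a familiar example task, and a local thread pool stands in for the cluster of commodity worker nodes. Threads in CPython do not speed up this kind of CPU-bound work, so the sketch shows the structure of the computation rather than its performance.]

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(records, n_chunks):
    """Break the full workload into roughly equal pieces, one per worker."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_chunk(chunk):
    """Map step: a worker counts words in its own piece of the data only."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right


def word_count(records, n_workers=4):
    """Split the records, process the pieces concurrently, then aggregate."""
    chunks = split_into_chunks(records, n_workers)
    # The thread pool is a stand-in (assumption) for a cluster of worker nodes.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    lines = ["big data big", "data small", "small data"]
    print(word_count(lines, n_workers=2))
```

[Because each piece is processed independently and the partial results are only merged at the end, the same structure could in principle be spread across many machines, which is the property the interview attributes to MapReduce-style tools.]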
The third layer of technologies which exists is in the area of visualization. See, what has
happened in the industry is that over the last 20, 30, 40 years that we've been
storing data and analyzing it, we have been used to seeing a lot of structure in the data, and with
structure, there is a lot of relation. You can visualize it from the perspective of how the data is
related to each other, but when you talk about big data, you are looking at unstructured elements
and you're looking at unrelated data.
So, when you try to find insights by correlating this data, there are new forms of visualizations
which help you get insights into that data, heat maps and other things. That's the visualization
layer. That's what I call the three layers. It goes from the data to the processing, then to the
visualization.
Hannah: How can companies get experience with these technologies? Do you recommend a proof of
concept project or maybe some kind of smaller project for people to experiment and get their feet
wet with these new technologies?
Vineet: Yes. That's fairly key. I think big data technologies are here to stay and everybody will
have a big data problem. Even if they don't realize that they have a big data problem, they'll very,
very soon have one. Companies have to start thinking about this and have
to get started on this journey.
I think the first step in this journey for them would be to unlearn. I know it's a little difficult
because, like I said, we've been used to seeing a lot of structure in the data. We've been used to
working with data in a typical fashion. So, to understand the parallelism in compute and these new
technologies, we've got to at least explore them with an open mind, which requires some amount of
unlearning of what we have learned.
Not to say those technologies are bad, it's just that this technology works this way. Number one,
unlearn, and I would say once you have unlearned that, then it's a process of exploration and that
process of exploration requires that you try to get data, because big data sometimes solves problems
of delivering insights that we don't even know exist.
Okay? So, solving the problems that you don't even know exist is taking a different mindset
altogether. So companies have to approach that with an open mind and create enough data within the
company for people to experiment with. I agree, the best step to do that would be a POC or a small
experiment where they can familiarize themselves with these technologies.
Hannah: Now, I know it's still early on, but you've seen a lot of these projects already. Are there
any common pitfalls for organizations to watch out for as they get into these big data projects?
Vineet: Definitely. We've been early adopters. We've been helping companies do this for the last
four years. What we've learned is that every mistake has a context and every failure happens in a
context, but if I have to answer your question, I broadly categorize them into two categories.
One category of problems, and the pitfalls that we've seen, is that organizations approach big data
technologies as a solution to the problem that they are having of maybe performance, or scaling, or
whatever. That's not necessarily the technology issue there. Right? It's not about using this
technology to solve a problem which you have today.
Big data is mostly a business problem. People tend to forget about the business context. That's the
important one. The second category of problems that we've seen is that big data is a buzzword
today, and everybody wants to do something about big data. That's another category of pitfalls
that we've seen: people rush into it because they want to be into this. I say it's a double-edged
sword. If you don't know what you're doing and you're rushing into this, it is bleeding [edge] and
so then you can cut yourself really badly.
Hannah: Now, could you offer us three best practices for people to succeed with these big data
projects, other than hiring a firm like yours?
Vineet: Yes. Definitely they can hire a firm like ours, but if I have to say and answer that
question, what I would say is that for me, the first one would be model requirements. What we've
seen is in this domain that people don't have tools or frameworks today of how you define your big
data requirements.
That's something that I spoke about in my presentation where I created a model that I'm
recommending people start using. It's a very simple model of how much you're storing, at what rate
you are acquiring that data, and what type of data you are storing. Those are each of the concerns
that we have to look at when modeling requirements. I would say critical success factor number one:
pay attention to, and focus on, gathering big data requirements.
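[Editor's note: The three questions in that model (how much you are storing, at what rate you acquire data, and what type of data it is) can be written down as a small record. The sketch below is an assumption about how one might capture them, not Tyagi's actual model. The class name BigDataRequirements, its field names, the units and the two helper methods are all hypothetical.]

```python
from dataclasses import dataclass


@dataclass
class BigDataRequirements:
    """Hypothetical record of the three concerns named in the interview."""
    stored_tb: float          # how much you are storing today, in terabytes
    ingest_tb_per_day: float  # the rate at which you acquire new data
    data_types: tuple         # what types of data you store, e.g. ("logs",)

    def projected_volume_tb(self, days):
        """Volume after `days`, assuming the acquisition rate stays constant."""
        return self.stored_tb + self.ingest_tb_per_day * days

    def days_until_full(self, capacity_tb):
        """Days until `capacity_tb` is reached, or None if data is not growing."""
        if self.ingest_tb_per_day <= 0:
            return None
        remaining = capacity_tb - self.stored_tb
        if remaining <= 0:
            return 0.0
        return remaining / self.ingest_tb_per_day


if __name__ == "__main__":
    # Illustrative values only; they do not come from the interview.
    req = BigDataRequirements(100.0, 2.0, ("logs", "clickstream"))
    print(req.projected_volume_tb(30))
    print(req.days_until_full(200.0))
```

[Writing the numbers down this way makes it easier to check whether existing storage and processing can keep up before committing to new technology, which is the point of gathering requirements first.]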
Second, I would say is plan for change in the organization. I know change has its own challenges,
but when you move into a big data mindset, there is experimentation which is required. It's
unlearning of certain mindsets that you acquired, a learning of technology, the way you manage your
infrastructure, the way you manage your data. All of that has to be unlearned and relearned in the
new context.
So, plan for that change, plan for training people. That is also a big change, so invest in the
training of people as well. That's the second critical success factor, I would say. Then, again,
the third one is start small. In all this big data, our recommendation is always start small. Do a
POC. Do an experiment, prove to yourself that it works before you go big and start building big on
the big data technologies.
Hannah: Let's talk briefly about use cases. You said earlier, everybody would have a big data
problem. For companies that maybe aren't a giant internet or social media company, the average
organization, where should they look for these kinds of issues? What kinds of big data problems
might they encounter?
Vineet: Hannah, what I would say is more than the problem, the biggest problem is it becomes
apparent to people. It's something that they would see they cannot ignore. I call it the big data
opportunity, and the problem is that people will miss the big data opportunity because they're
missing the insights in the data that they don't know exist.
That's a difficult challenge to solve. The challenge is people do not know that they can do so much
with the data. Every business today, if you look at it, we've evolved. 20 years back, who had
Facebook? Who had a social presence? We're leading digital lives today. Customers lead a digital
life. There's so much data that companies gather about customers. Every department in an
organization today is IT automated.
There's so much data on all these servers and at every point where the data flows through,
all the while, that we could capture. Now companies are throwing away this data today, because they
don't know what to do with it. The first step is that they start capturing this data and then
looking at this data and say, "Gee. Okay. What can I do with all this data now that I've stored
it?" Newer and newer insights would come up.
Then there are opportunities for a company to improve its efficiencies, number one, just by getting
insights from its operational transactional data. There are new business opportunities that can
come in from things like predictive analysis: how can you predict the behavior of your
customers? Social analytics: how can you know how people perceive the image of your organization,
or of your products?
All of these are opportunities which exist. I just wanted to mention that I was at a conference
and, in terms of the opportunity that can really exist, there was a startup where a
person had taken the Dow Jones industrial data and the census data and the data of, say
for example, music.
What kind of music is selling? What genre of music? What type of music? How it was marketed.
They correlated this data together to arrive at a model that can predict, given the economic
circumstances, what kind of music you should be producing and at what price point you should be
selling that? That's marketing insight. That's tremendous insight, these kinds of predictive
models. Now if you go looking for solving this problem, nobody has this problem. These are totally
unrelated sets of data. These are the kind of use cases I'm talking about: when you throw away
data, you are losing out on opportunities.
Hannah: Any final advice on being successful with big data?
Vineet: I would reiterate what I said earlier. I would say two very important things are critical
for companies to be successful in the big data journey that they're thinking of. The first one,
again, is that it requires unlearning, and the second is that it requires perseverance: start small.
Hannah: Thank you so much for talking with me today. Vineet Tyagi, Head of Innovation Labs for
Impetus Technologies.
Vineet: Thanks, Hannah. It's a delight to be here at the seminar.
Hannah: Thank you all for joining us here today. Remember, you can visit
SearchBusinessAnalytics.com for more articles, videos and other resources about big data, real-time
and lots of other related topics. Thank you again for tuning in. Have a great day.