Tyagi: 'Big data' technology a solution in search of right problems
Date: May 24, 2012
Just because you have a data problem doesn’t mean you have a “big data” problem, according to Vineet Tyagi, associate vice president of technology at software development services provider Impetus Technologies Inc. The key to knowing if big data technology can help your organization, Tyagi says, is to look for particular characteristics that may be affecting the performance of its data warehouse, business intelligence and analytics systems.
Companies often “come into big data technology usage imagining that they would get some tremendous benefit,” Tyagi says. “What we’ve seen is that there are certain types of problems that lend themselves very well to being solved [by] and getting benefits from big data technologies.”
The examples that Tyagi cited include backups in data analytics reporting because of a lack of processing power and what he calls the “broken rule” -- a case in which a company’s data warehouse system is constantly breaking down because the organization is unable to manage the volume of the data being collected for analysis and the velocity at which the data is being generated and updated.
In this video interview recorded at SearchBusinessAnalytics.com’s “Delivering Deeper Insights with Big Data and Real-Time Business Intelligence” seminar, Tyagi spoke further with Editorial Director Hannah Smalltree about big data software options and their potential uses as well as best practices for managing big data analytics projects.
In the video, viewers will learn about:
- How to figure out if big data technology is the answer to the problems they’re trying to solve
- The different categories of big data technologies that are available today
- Steps companies can take to start building up their big data skills and gain experience in using big data technology
- Common pitfalls and misconceptions that organizations run into on big data implementations
- Key best practices that can help ensure the success of a big data analytics project
- How companies can take advantage of big data analytics tools to meet their business needs and find value in data
Read the full transcript from this video below:
Hannah: Hello and welcome. I'm Hannah Smalltree, Editorial Director for SearchBusinessAnalytics.com and related sites. I'm here at our seminar on delivering deeper insight with big data and real-time technologies. I'm talking with Vineet Tyagi. He is Head of Innovation Labs for Impetus Technologies. Thank you for joining me, Vineet.
Vineet: Good morning, Hannah. Thank you for welcoming me and having me here. I'm excited to be a part of this seminar.
Hannah: Now, your firm has already done several big data projects with clients and that's how you came to the best practices you presented today. I found your first recommendation really interesting, which was to determine whether the problem you're trying to solve is a big data problem or not. What are some of the characteristics of a big data problem?
Vineet: That's a fairly interesting one. Having worked for almost the last four years helping customers, what we've realized is that a lot of times people come into big data technology usage imagining that they would get some tremendous benefit.
What we've seen is that there are certain kinds of problems that lend themselves very well to being solved by, and getting benefits out of, big data technologies. The characteristics of those types of problems are something like this: you have a compute issue where there's a lot of bottlenecking happening, or you are seeing a lot of your data analytics backing up because you are not able to process them in time. That is typically a sign for us that you can solve this problem with big data technologies.
The other characteristic I call the broken rule, which is that you're in a situation where everything is broken and you don't seem to be able to fix it. Whether it's your enterprise data warehouse or your analytics engine or whatever, it's broken because you're not able to manage the data, you're not able to store it, you're not able to keep up with the volumes or the velocity of acquisition of that data.
That is also a good characteristic that tells us this is a situation or a context where perhaps big data technologies can make a difference.
Hannah: What kinds of technologies are we talking about when you say big data technologies? What are some of the categories of technologies our viewers might encounter?
Vineet: Absolutely. Hannah, when we look at the technology landscape of what we call big data technologies, I categorize them in three layers, so to say. The first layer is what we call the data infrastructure layer: technologies that solve the issue of how you can store ginormous amounts of data. Ginormous amounts are hundreds of terabytes or even petabytes. People are pushing the scale of storing data. Your typical storage does not work here.
So, there are specialized ways of storing this data. Technologies would use distributed file systems and storage area networks; those are the concerns in these technologies. They try to solve some of the challenges of reliability and redundancy, to make sure that the data remains accessible and highly available. That's the data infrastructure storage layer. Once you can store the data, one part of the problem is solved. The next part is: how do you analyze and compute on this data?
The next category of technologies that exist is parallel computing, or high-performance computing elements like MapReduce. These are a category of technologies that take on the processing. What they do is break up the large amount of work that has to be done on the ginormous amount of data and parallelize it.
What they end up doing is break this down into smaller pieces of work and then use a cluster of, say, commodity machines as worker nodes to go out and do the processing for that bit of the data. Once all of these workers have finished, you can collate the results back and aggregate them, and you can do that in a business-oriented way that doesn't take days; it can be a matter of a few hours or minutes to process this large amount of data.
The third layer of technologies is in the area of visualization. See, what has happened in the industry is that over the last 20, 30, 40 years that we've been storing and analyzing data, we have been used to seeing a lot of structure in the data, and with structure there is a lot of relation. You can visualize it from the perspective of how the data elements are related to each other. But when you talk about big data, you are looking at unstructured elements and you're looking at unrelated data.
So, when you try to find insights by correlating this data, there are new forms of visualization which help you get insights into that data: heat maps and other things. That's the visualization layer. That's what I call the three layers: from the data, to the processing, then to the visualization.
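Editor's note: the split, process and aggregate pattern Tyagi describes for the parallel computing layer can be illustrated with a minimal sketch. This is not MapReduce itself or any product's API; it is a toy word count in which threads on one machine stand in for the worker nodes of a cluster, and the function names (`split_into_chunks`, `map_chunk`, `reduce_counts`, `word_count`) are hypothetical names chosen for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, n_chunks):
    """Break the full workload into roughly equal smaller pieces."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_chunk(chunk):
    """Work done by one worker: count the words in its piece of the data."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Aggregate the partial results once every worker has finished."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(records, n_workers=4):
    """Split the records, process the chunks in parallel, then aggregate."""
    chunks = split_into_chunks(records, n_workers)
    # Assumption: threads stand in for the commodity machines of a real cluster.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)


if __name__ == "__main__":
    sample = ["big data big", "data big", "small"]
    print(word_count(sample, n_workers=2))
```

In a real deployment the chunks would live on a distributed file system and the workers would be separate machines, but the shape of the computation is the same: independent work on each piece, followed by a single aggregation step.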
Hannah: How can companies get experience with these technologies? Do you recommend a proof of concept project or maybe some kind of smaller project for people to experiment and get their feet wet with these new technologies?
Vineet: Yes. That's fairly key. I think big data technologies are here to stay, and everybody will have a big data problem. Even if they don't realize it today, they'll very, very soon have a big data problem. Companies have to start thinking about this and get started on this journey.
I think the first step in this journey for them would be to unlearn. I know it's a little difficult because, like I said, we've been used to seeing a lot of structure in the data. We've been used to working with data in a typical fashion. So, to understand the parallelism in compute and these new technologies, we've got to at least explore them with an open mind, which requires some amount of unlearning of what we have.
That's not to say those technologies are bad; it's just that this technology works this way. So, number one, unlearn. Once you have unlearned, it's a process of exploration, and that process of exploration requires that you try to get data, because big data sometimes solves the problem of delivering data insights that we don't even know exist.
Okay? Solving problems that you don't even know exist takes a different mindset altogether. So companies have to approach that with an open mind and create enough data within the company for people to experiment with. I agree, the best step to do that would be a POC or a small experiment where they can familiarize themselves with these technologies.
Hannah: Now, I know it's still early on, but you've seen a lot of these projects already. Are there any common pitfalls for organizations to watch out for as they get into these big data projects?
Vineet: Definitely. We've been early adopters. We've been helping companies do this for the last four years. What we've learned is that every mistake has a context and every failure happens in a context, but if I have to answer your question, I broadly categorize them into two categories.
One category of problems, and the pitfalls that we've seen, is that organizations approach big data technologies as a solution to the problem that they are having of maybe performance, or scaling, or whatever. That's not necessarily the technology issue there. Right? It's not about using this technology to solve a problem which you have today.
Big data is mostly a business problem. People tend to forget about the business context. That's the important one. The second category of pitfalls that we've seen is that big data is a buzzword today, and everybody wants to do something about big data. People rush into it because they want to be in on this. I say it's a double-edged sword. It is bleeding edge, so if you're rushing into this and you don't know what you're doing, you can cut yourself really badly.
Hannah: Now, could you offer us three best practices for people to succeed with these big data projects, other than hiring a firm like yours?
Vineet: Yes. Definitely they can hire a firm like ours, but to answer that question, for me the first one would be to model requirements. What we've seen in this domain is that people don't have tools or frameworks today for defining big data requirements.
That's something that I spoke about in my presentation, where I created a model that I'm recommending people start using. It's a very simple model: how much are you storing, at what rate are you acquiring that data, and what type of data are you storing? Those are the concerns that we have to look at in modeling requirements. So critical success factor number one: pay attention to, and focus on, gathering big data requirements.
Second, I would say, is plan for change in the organization. I know change has its own challenges, but when you move into a big data mindset, experimentation is required. There's unlearning of certain mindsets that you have acquired, and learning of the technology, the way you manage your infrastructure, the way you manage your data. All of that has to be unlearned and relearned in the new context.
So, plan for that change and plan for training people. That is also a big change, so invest in the training of people as well. That's the second critical success factor, I would say. Then, again, the third one is start small. In all this big data work, our recommendation is always to start small. Do a POC. Do an experiment. Prove to yourself that it works before you go big and start building big on big data technologies.
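Editor's note: the simple requirements model mentioned above (how much you store, the rate at which you acquire data, and what type of data it is) could be written down as a small record. The sketch below is only one possible reading of that description, not the model from Tyagi's presentation; the class name `BigDataRequirements`, its field names, the choice of terabytes as the unit and the linear growth projection are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BigDataRequirements:
    """Hypothetical record of the three concerns named in the interview."""
    stored_tb: float          # how much data is stored today, in terabytes
    ingest_tb_per_day: float  # rate at which new data is acquired
    data_types: List[str]     # what type of data is being stored

    def projected_storage_tb(self, days: int) -> float:
        """Assumes a constant acquisition rate; real growth is rarely linear."""
        return self.stored_tb + self.ingest_tb_per_day * days


if __name__ == "__main__":
    req = BigDataRequirements(
        stored_tb=50.0,
        ingest_tb_per_day=0.5,
        data_types=["structured", "log files"],
    )
    print(req.projected_storage_tb(365))
```

Even a record this small makes the three questions explicit before any technology is chosen, which is the point of gathering requirements first.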
Hannah: Let's talk briefly about use cases. You said earlier, everybody would have a big data problem. For companies that maybe aren't a giant internet or social media company, the average organization, where should they look for these kinds of issues? What kinds of big data problems might they encounter?
Vineet: Hannah, what I would say is more than the problem, the biggest problem is it becomes apparent to people. It's something that they would see they cannot ignore. I call it the big data opportunity, and the problem is that people will miss the big data opportunity because they're missing the insights in the data that they don't know exist.
That's a difficult challenge to solve. The challenge is that people do not know that they can do so much with the data. Every business today, if you look at it, has evolved. Twenty years back, who had Facebook? Who had a social presence? We're leading digital lives today. Customers lead a digital life. There's so much data that companies gather about customers. Every department in an organization today is IT automated.
There's so much data on all these servers, captured at every point where the data flows through, and companies are throwing this data away today because they don't know what to do with it. The first step is that they start capturing this data and then look at it and say, "Gee. Okay. What can I do with all this data now that I've stored it?" Newer and newer insights would come up, right?
Then there are opportunities for a company to improve its efficiencies, number one, just by getting insights from its operational, transactional data. There are new business opportunities that can come from things like predictive analysis: how can you predict the behavior of your customers? Or social analytics: how can you know how people perceive the image of your organization or of your products?
All of these are opportunities which exist. I just wanted to mention, in terms of the opportunity that can really exist, that I was at a conference where there was a startup, and I met a person who had taken the Dow Jones industrial data, the census data and data about, say for example, music.
What kind of music is selling? What genre of music? What type of music? How it was marketed.
They correlated this data together to arrive at a model that can predict, given the economic circumstances, what kind of music you should be producing and at what price point you should be selling it. That's marketing insight. That's tremendous insight, these kinds of predictive models. Now if you go looking to solve this problem, nobody has this problem. These are totally unrelated sets of data. These are the kinds of use cases I'm talking about: when you throw away data, you are losing out on opportunities.
Hannah: Any final advice on being successful with big data?
Vineet: I would reiterate what I said earlier. I would say two very important things are critical for being successful in the big data journey that companies are thinking of. The first one, again, is that it requires unlearning and it requires perseverance. The second is to start small.
Hannah: Thank you so much for talking with me today. Vineet Tyagi, Head of Innovation Labs for Impetus Technologies.
Vineet: Thanks, Hannah. It's a delight to be here at the seminar.
Hannah: Thank you all for joining us here today. Remember, you can visit SearchBusinessAnalytics.com for more articles, videos and other resources about big data, real-time and lots of other related topics. Thank you again for tuning in. Have a great day.