Data management and analytics professionals have many tools that make organizing and analyzing data easy. But in his new book, Understanding Big Data Scalability, Cory Isaacson argues that the need for effective data governance models has never been greater. Even in a self-service and NoSQL world, organizations still need to pay attention to how applications access data, where data is stored and who has access. SearchBusinessAnalytics talked with Isaacson, CEO at database management company CodeFutures Corporation, to learn more.
In the book you talk about dissatisfaction with big data definitions. How do you define big data?
Isaacson: The basic definition I use is, as soon as you have a problem where your database is not performing the way you need, you have a big data problem. I don't think it has to be any more specialized than that.
A theme of the book is making sure organizations are adhering to good data governance practices. Many businesses today are interested in self-service applications that may make it hard to maintain good governance. How can a business move toward self-service while maintaining effective data governance?
Isaacson: I think it depends on the point of view and who's really doing it. If you have data scientists who are playing around, they can typically control the schema. But it's very easy to get lured into the idea that I don't need to worry about a schema or what my relationships are, and very soon you get caught up in not being able to support what you need. Data relationships don't disappear just because you have a NoSQL database. Understanding your data structure and designing it well is actually a higher burden in that case, not a lower one. People often make the mistake of not doing that up front and then a few months or weeks into the application, they can't get the data out the way they need it and they have to refactor the whole thing.
So does that mean they'd have to start over from scratch?
Isaacson: I've seen production applications in something like Mongo and people are excited because they can get things up and running quickly. In a couple of weeks they have an application going, and then three months down the road they have to completely refactor the data model. The point is, if you do that work up front, you're not going to run into those kinds of issues.
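Editor's illustration: Isaacson's point that data relationships persist in a schemaless store can be sketched with plain Python dictionaries. This is a minimal sketch and not code from the book or from any database client; the records, field names and helper functions (`emails_for`, `total_by_customer`) are all hypothetical. The first layout embeds a copy of the customer in every order, so the copies can drift apart. The second models the customer-to-order relationship up front.

```python
# Hypothetical layout 1: no up-front modeling, customer details copied into each order.
orders_embedded = [
    {"order_id": 1, "customer": {"name": "Ada", "email": "ada@example.com"}, "total": 40},
    {"order_id": 2, "customer": {"name": "Ada", "email": "ada@old.example.com"}, "total": 15},
    {"order_id": 3, "customer": {"name": "Grace", "email": "grace@example.com"}, "total": 25},
]


def emails_for(name, orders):
    """Collect every email stored for a customer name across the embedded copies."""
    return {o["customer"]["email"] for o in orders if o["customer"]["name"] == name}


# Hypothetical layout 2: the relationship is explicit; orders reference a customer id.
customers = {
    "c1": {"name": "Ada", "email": "ada@example.com"},
    "c2": {"name": "Grace", "email": "grace@example.com"},
}
orders_referenced = [
    {"order_id": 1, "customer_id": "c1", "total": 40},
    {"order_id": 2, "customer_id": "c1", "total": 15},
    {"order_id": 3, "customer_id": "c2", "total": 25},
]


def total_by_customer(orders, customers):
    """Sum order totals per customer name by following the customer_id reference."""
    totals = {}
    for o in orders:
        name = customers[o["customer_id"]]["name"]
        totals[name] = totals.get(name, 0) + o["total"]
    return totals


# The embedded copies disagree about Ada's email; nothing in the store prevents that.
conflicting = emails_for("Ada", orders_embedded)

# With the relationship modeled, a per-customer report is a simple lookup.
report = total_by_customer(orders_referenced, customers)
```

In the embedded layout, a question such as "what is Ada's current email?" has two competing answers, which is the kind of problem that tends to force a later refactor. Neither layout is always right; the sketch only shows that the relationship exists whether or not the store enforces it.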
Another chapter talks about scaling applications. Where are some of the challenges here and how can it be done effectively?
Isaacson: The main point is that scaling applications is generally very easy because in a typical infrastructure you have a load balancer on application servers and you can scale those as much as you want, and there are lots of tools and frameworks to do that. But as soon as you go past the threshold where the database can handle the load, you're into a database scalability problem, and that's where it gets really challenging. The point of the book is that you should think about these things up front.
That also seems like it goes back to the data governance model issue.
Isaacson: Any software problem boils down to process discipline. The idea that you can do everything in an ad hoc way and everything is going to be fine is not realistic. I think at some point in the future, when we have better software and compute mechanisms than we do today, that will be more possible, but even then, I think you'll have to think about how you're doing things.