Structuring a big data strategy
Tagged.com specializes in connecting people. But to power those connections, the social media site manages and analyzes data that easily crosses into big data territory.
The site's co-founder and chief technology officer, Johann Schleier-Smith, describes Tagged's data as fairly structured but arriving in massive quantities: more than 10 terabytes every month. As a digital company where the product is the data, Tagged focuses on processing and analyzing that data quickly, operating in real time to, for example, recommend people its users may want to meet.
SearchBusinessAnalytics.com news editor Nicole Laskowski talked with Schleier-Smith about the San Francisco-based company's successes and struggles with big data analytics. Read a partial transcript of the interview below, and listen to the full podcast to hear more of his thoughts on big data.
In your opinion, how hyped is big data?
Johann Schleier-Smith: Big data is something that we sort of stumbled into, realizing that we had a problem managing the volume way back in 2005, which was when Tagged was really starting to take off. And I think over the years the term has come to be applied more broadly as more and more companies are utilizing the data that's available to them. Is it hyped? Well, perhaps, but I think there really is a huge opportunity in big data, and the attention it's receiving is warranted.
We don't often hear about the other side of big data analytics, which is how hard this stuff is. How hard is big data analytics?
Schleier-Smith: For us, some of the challenges really come around making all of that data accessible, available and useful. The tools and platforms for storing the data [are commonly available] at this point. Five years ago, that would have been different. The challenge might have been around how do we keep it all and how do we make it cost-effective and so forth. A lot of that is under control. Now the challenge is that the company has, in our case, 170 people, and they need to get access to the data and make it useful. And some of them, of course, need more access and deeper access than others. But organizing that, making sure you have best practices, that things are done consistently and correctly, that's the big challenge for us.
For businesses looking to delve into this big data analytics space, what tips could you provide for them?
Schleier-Smith: Having the right tools really is key. So whether that's Hadoop or Greenplum or some other solution, you want to make sure you are set up to collect lots and lots of data -- that there's plenty of space, that you're not worrying about that, that you can log a lot. Another thing that's really useful in that context is to spend some time really thinking about what data you're collecting. At Tagged, we have a completely different data storage format for the operational system, that live database that serves the website, and the system that's used for analytics. They're entirely separate. We don't transfer the data, we don't do any sort of ETL, and that means we are a lot more effective and efficient in our analytics.
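The interview does not describe how Tagged implements this separation, so the following is only a minimal sketch of one way to read the idea: the live system keeps current state for serving, while every event is also written once, at the moment it happens, to an append-only log in its own analytics-friendly format, so no later extract-and-transform step is needed. The class `ProfileService`, the function `count_events`, the event name `city_updated` and the JSON-lines log format are all hypothetical, chosen for illustration.

```python
import json
import os
import tempfile
import time


class ProfileService:
    """Hypothetical service that keeps operational state and analytics logs apart."""

    def __init__(self, log_path):
        self.profiles = {}        # operational store: current state only
        self.log_path = log_path  # analytics store: append-only event log

    def _log_event(self, event_type, user_id, **fields):
        # Each event is written once, already in its analytics format.
        record = {"ts": time.time(), "type": event_type, "user": user_id, **fields}
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def update_city(self, user_id, city):
        # The operational store overwrites; it only needs the latest value.
        self.profiles[user_id] = {"city": city}
        # The analytics log appends; it keeps the full history.
        self._log_event("city_updated", user_id, city=city)


def count_events(log_path, event_type):
    """Analytics query that reads only the log, never the operational store."""
    with open(log_path, encoding="utf-8") as f:
        return sum(1 for line in f if json.loads(line)["type"] == event_type)


# Usage: three updates, two of them for the same user.
log_path = os.path.join(tempfile.mkdtemp(), "events.jsonl")
service = ProfileService(log_path)
service.update_city(1, "Oakland")
service.update_city(1, "San Francisco")
service.update_city(2, "Austin")
```

In this sketch the operational store ends up holding only the latest city for each user, while the log retains all three events for analysis. At real volumes the log would live in a platform such as those Schleier-Smith mentions rather than in a local file.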
Can you talk about where big data and big data analytics fit into your overall analytics strategy?
Schleier-Smith: The nature of the Tagged business -- with millions of people visiting the site every day -- means we end up with big data pretty quickly and almost across the board. That said, it's not the only thing we think about in terms of our analytics strategy. Certainly there's a layer of routine operational monitoring, reporting, alerts. There's also a great emphasis, for us, on predictive analytics, which oftentimes occurs in the context of big data.