Data and analytics are hot topics, with thousands of words a day published by journalists, academics, consultants and marketers. But one group is rarely heard from outside of technical forums -- the developers and architects who build the platforms and applications that help users analyze data.
In this monthly series, I'll talk with the engineers behind significant BI and analytics products to help us all better understand how they bring the technologies to life. First up is Tableau Hyper, an in-memory data engine that Tableau Software released in January 2018 to speed up data ingestion and analytical query processing in its self-service BI software.
Hyper, which debuted in Tableau 10.5, runs queries up to five times faster than the original Tableau Data Engine, according to the company. Tableau says it also provides a 3X performance boost on the creation of Tableau data extracts (TDEs), which are data snapshots that can be saved and then used in different dashboards and data visualizations.
I chatted recently with Tobias Muehlbauer and Allan Folting, who are both senior managers of engineering on the Tableau Hyper engine team; Folting leads the engineering work in the U.S., while Muehlbauer heads Tableau's Munich development office. As an engineer myself (one who built rival products to Tableau at Microsoft and Qlik), I was astonished by their audacity.
Not only did they integrate a new analytics engine into the core Tableau software in two years, but they also migrated every customer that upgraded to 10.5 to the new engine, both on premises and in the cloud. And they did that while Tableau hired a new CEO, changed its chief development officer and switched to subscription-based software licensing.
So, how did this demanding and daring project begin?
Perhaps surprisingly, Folting first praised the earlier engine. "It was amazing tech in its time and hard to beat in many ways," he said. But he also spoke frankly about its shortcomings, particularly on TDEs: "In 2015, we did a deep internal review of our extracts. For a growing number of customers, extracts were less than ideal, particularly with bigger data sets and more complex analysis."
Stumbling onto Hyper
In 2015, as Tableau's architects weighed the idea of investing more in Tableau Data Engine, some of them met Muehlbauer and other members of a German academic team at a conference in Melbourne, Australia.
The German team had created an initial version of what became Tableau Hyper. Already seven years old at that point, the technology grew out of their dissatisfaction with the available choices in the database market, which had splintered into distinct transactional, analytical, geospatial, NoSQL and big data systems. Muehlbauer and his colleagues committed to building a new engine from the ground up, maintaining core principles of transactional database design while challenging the need to specialize.
"A lot was due to modern hardware," Muehlbauer said. "CPUs had become more complex, with multiple cores, pipelines, deep caches and cache hierarchies. Memory sizes had grown, so the working [data] set was much larger than before. These features enabled new designs that brought the different workloads back together." But it was hard for existing database systems to adapt to those trends, he noted.
A prototype of the engine, originally called HyPer, became a commercial product that was spun out of the Technical University of Munich into a new company named Celerata, with Muehlbauer as CEO. Tableau acquired the technology in early 2016, eight months after the initial meeting in Melbourne.
New data engine opens up new horizons
According to Folting and Muehlbauer, the goals of the new engine were a perfect match for Tableau, which was interested in workloads akin to online transaction processing (OLTP) jobs. That intrigued me. Why should a self-service BI and data visualization product need OLTP-like functionality? The answer is ... still intriguing to me.
With Hyper, Tableau solved its problem of efficiently creating and updating TDEs; the engine also powers some smart optimizations and Tableau Prep, the company's new data wrangling tool. But the development team has yet to enable real-time data updates, although Tableau Hyper has that capability. What's coming in the future? No one at Tableau is saying, but you can be sure they're looking at many opportunities.
As I said above, I'm impressed by Tableau's boldness and expertise in swapping out the data engine in its software. In doing so, it held off on introducing new functionality in Hyper, which Folting described as not an easy decision but a necessary one. "We didn't want to maintain two code bases, or to give customers a weird, ugly choice," he said. "So, we decided to improve existing scenarios with better scale and performance, but not to add new user functionality so much."
Still, during the two years of development before the release of Tableau Hyper, the engineers found many complex dependencies between the existing engine and both Tableau Desktop and Tableau Server. Indeed, the team in the U.S. had to rewrite more of the client and server architectures than expected to make them work with Hyper, Folting said.
Meanwhile, Muehlbauer and the team in Germany faced new challenges, not least the tremendous range of workloads and queries run by real-world Tableau customers. It turns out, he said, that users love text fields and string functions, often performing complex calculations with strings of text and dates. In some cases, Hyper needed numerous micro-optimizations so it wouldn't break existing dashboards, he added.
Another challenge: integrating people
In my own engineering work, I've generally found that integrating development teams is harder than integrating technologies. From what I heard, Folting and Muehlbauer would likely agree.
The Tableau Hyper team now includes about 60 people: half in Munich developing the core engine, and half in the U.S. split between Tableau's headquarters in Seattle and an office in Palo Alto, Calif., working on technologies around Hyper such as extract management tools. Developers from very different backgrounds and separated across numerous time zones had to get to know each other. "For people who were less outgoing, it was difficult to make that connection," Folting said.
The new team members also had to learn Tableau's coding styles and practices. That alignment was gradual: Developers have strong opinions about small things. But the team leaders said they eventually learned how to guide conversations around to where they were productive -- more or less.
Looking back, what was the most difficult part of the whole process? Messaging to customers on what to expect from Tableau Hyper, Folting said without hesitation.
"It's not as if on day one every scenario would automatically see a 10x performance improvement," he noted. "After all, the original engine was itself an achievement, and we still needed optimizations for some scenarios. We're proud of what we achieved, but the messaging could have been better."