It’s been more than two years since Gannett tapped Qubole, a data-activation company, seeking to more efficiently optimize and analyze hundreds of terabytes of daily data across its more than 300 digital, mobile and print publications. Now, the publisher is doing more than storing data.
Qubole’s cloud-based Presto engine has allowed Gannett to consolidate log-level data from hundreds of customers into one scalable “data lake.” Gannett can now query an entire day’s worth of data — about 70 million records across 300 dimensions — in seconds.
“Our number one reason for choosing Qubole was to take advantage of cloud economics: only pay for what you use,” Gannett’s Senior Director of Data Solutions Oskar Austegard said in a statement. “With our data operationalized, we are now looking at how to use our first-party data and the insights derived from that data to improve the experience of both our B2C and B2B customers.”
Realizing the benefits
Gannett first transitioned to Qubole in early 2016 and has since drastically increased both the amount of data it processes and the processing power it draws on. In just two years, Gannett went from processing 100 terabytes to 700 terabytes, seven times as much data, without the need for additional administrative support.
In addition to increasing the value of its data, Gannett has grown from hosting four computing clusters to around 25, and has gone from having fewer than 10 data analysts to 40 users today.
The cloud-based platform has given Gannett the freedom to take on extra large-scale processing without impacting its normal operations. It has also enabled faster data discovery and integration — Gannett analysts can query tens of millions of records in seconds, allowing the publisher to introduce standardized reports across the organization and move to a self-service reporting environment.
Because Qubole helps build data models to do things like identify a subscriber’s lifetime value and predict which customers may be about to cancel their subscriptions, the publisher has also improved customer retention.
Improved data insights also mean the publisher can make better content recommendations and analyze which micro-segments of the population respond to advertisements from its B2B customers.
Gannett stores its data in a single, flexible lake residing on AWS S3, while its computing “platform” is composed of EC2 instances that can be elastically scaled as needed.