Advanced Computing in the Age of AI | Sunday, May 26, 2024

Unleashing Near Real-Time Insights with Starburst’s Icehouse Architecture 
Sponsored Content by Starburst

The data industry loves coming up with new solutions to old problems. First came the database, then the data warehouse, and then the data lake. Now, most of the conversation is about the data lakehouse. However, we should all take less interest in the term of the day and pay more attention to actual adoption patterns.

That’s why I sat up a little straighter in my chair when Justin Borgman, CEO of Starburst, published his Icehouse manifesto shortly after I joined the company. It noted the adoption of Trino and Apache Iceberg among data leaders like Netflix, Apple, Shopify, and Stripe. “Now, this is interesting,” I thought.

Over the past few months, I’ve talked with several Fortune 500 customers about their interest in the Icehouse architecture and translated what I learned into what we are building here at Starburst. Here is a summary of what I’ve learned so far.

Why “Icehouse”?

For over 40 years, data warehouse vendors have locked customers into proprietary data formats and SQL language implementations. High switching costs left those customers without a viable alternative, until “Icehouse”.

Icehouse at its core is an open architecture that provides warehouse-like capabilities on the open data lake. Historically, data lakes have been primarily seen as a low-cost storage solution, with limited value for interactive analytical use cases. The lack of DML (data manipulation language) and ACID (Atomicity, Consistency, Isolation, Durability) compliance made it hard for organizations to adopt data lakes over data warehouses for business and mission-critical use cases.

Icehouse changes all of that. Icehouse is made up of two key components: the open-source Trino query engine and the Apache Iceberg table format. The Trino query engine allows for fast, massively parallel, interactive analytics at petabyte scale. The Apache Iceberg table format provides a full warehouse experience on the data lake, including time travel, DML, and ACID compliance.
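To make the ideas of time travel and atomic commits a little more concrete, here is a minimal Python sketch of a table that keeps an append-only history of immutable snapshots. It is a toy model for illustration only, not how Iceberg or Trino is actually implemented; the names ToyTable, Snapshot, commit, and read are hypothetical and do not correspond to any real Iceberg or Trino API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """An immutable view of the table's rows at one point in its history."""
    snapshot_id: int
    rows: tuple


@dataclass
class ToyTable:
    """Toy model (hypothetical): a table is a list of immutable snapshots."""
    snapshots: list = field(default_factory=lambda: [Snapshot(0, ())])

    def current(self) -> Snapshot:
        return self.snapshots[-1]

    def commit(self, transform) -> int:
        """Apply `transform` to a copy of the current rows and publish the
        result as a new snapshot. If `transform` raises, nothing is
        published, so readers never see a half-applied change."""
        base = self.current()
        new_rows = tuple(transform(list(base.rows)))
        snap = Snapshot(base.snapshot_id + 1, new_rows)
        self.snapshots.append(snap)  # the single "publish" step
        return snap.snapshot_id

    def read(self, snapshot_id=None) -> list:
        """Read the latest snapshot, or an older one (time travel)."""
        if snapshot_id is None:
            return list(self.current().rows)
        for snap in self.snapshots:
            if snap.snapshot_id == snapshot_id:
                return list(snap.rows)
        raise KeyError(f"unknown snapshot {snapshot_id}")


if __name__ == "__main__":
    table = ToyTable()
    # An insert, then a delete, each published as its own snapshot.
    v1 = table.commit(lambda rows: rows + [(1, "new"), (2, "new")])
    v2 = table.commit(lambda rows: [r for r in rows if r[0] != 2])
    print("latest:", table.read())
    print("as of snapshot", v1, ":", table.read(v1))
```

In this sketch, every change produces a new snapshot rather than modifying the old one, which is why an earlier state of the table remains readable after rows are deleted.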

Why Starburst’s implementation of “Icehouse”?

At this point, you might be asking yourself why more teams haven’t adopted this open, high-performance, and scalable architecture. The answer is simple: most data teams don’t have the resources or expertise needed to deploy and operate an Icehouse at scale in production.

Building and operating an Icehouse at scale requires significant upfront and ongoing data engineering investment. Investment areas include ingesting the data, cleaning and normalizing raw data, preparing the data for consumption, optimizing file and table structures, and provisioning and maintaining infrastructure, not to mention evolving requirements for security, data privacy, governance, and regulatory compliance.

Starburst’s Icehouse implementation in Starburst Galaxy is built to take on this work. Our goal is to automate the lakehouse process end to end, from ingestion through querying and governance. This will allow data teams of all sizes to reap the benefits of the Trino and Iceberg architecture without the burden of building and maintaining a custom solution themselves.

Beyond what is possible with open-source Trino and Iceberg, Starburst Galaxy also adds unique capabilities that unlock greater value for users, like near-real-time analytics access, industry-leading price-performance, automated table optimization, automated data quality checks, AI-based automatic data tagging and classification, smart indexing and caching, and granular access controls for governance. (For more information, refer to our press release and launch blog.)

Final Thoughts

Today, more than ever before, data is at the heart of innovation: from medical research to autonomous driving, from generative AI to risk management, from oil and gas exploration to customer experience. At Starburst, we believe that Icehouse is the convergent design for data architecture on which the vast majority of these use cases will be built.

The existing paradigm built around traditional data warehouses has proven too rigid and too expensive for emerging needs and innovation, and specialized solutions such as streaming databases are often too complex or too specific for broad adoption. The Icehouse architecture is on its way to becoming the de facto solution, with the best combination of price and performance for both analytical and data-intensive applications. Starburst is proud to be on the front lines, supporting the open-source communities of Apache Iceberg and Trino, while heavily investing in new product capabilities to make our customers more productive and more efficient with their data.

You can sign up for early access to Starburst’s managed Icehouse here.