Oracle recently announced MySQL HeatWave Lakehouse with query performance that is touted to be 17x faster than Snowflake and 6x faster than Redshift based on a 400 TB workload.
MySQL HeatWave Lakehouse will enable customers to process and query hundreds of terabytes of data in object store in a variety of file formats, such as CSV and Parquet, as well as Aurora and Redshift backups. MySQL HeatWave Lakehouse is the newest addition to the MySQL HeatWave portfolio, the only cloud service that combines transaction processing, analytics, machine learning, and machine learning-based automation within a single MySQL database.
Powered by the massively parallel scale-out MySQL HeatWave architecture, MySQL HeatWave Lakehouse delivers significantly better performance than competitive cloud database services for running queries and loading data, as demonstrated by industry standard benchmarks. In addition, in a single query, customers can query transactional data in the MySQL database and combine it with data in the object store using standard MySQL syntax. Oracle also announced new MySQL Autopilot capabilities that improve performance and make MySQL HeatWave Lakehouse easy to use.
Benchmarks
As demonstrated by a fully transparent, publicly available 400 TB TPC-H* benchmark, the query performance of MySQL HeatWave Lakehouse is:
- 17X faster than Snowflake
- 6X faster than Amazon Redshift
Loading data from object store into MySQL HeatWave Lakehouse is also significantly faster. For a 400 TB TPC-H* workload, load performance of MySQL HeatWave Lakehouse is:
- 8X faster than Amazon Redshift
- 7X faster than Snowflake
The scripts for all of these benchmarks are available on GitHub so that customers can replicate the results.
“MySQL HeatWave is the result of years of research and advanced development, which we are turning into breakthrough innovations to address a bigger set of challenges for all MySQL customers. In fact, MySQL HeatWave Lakehouse is our third major MySQL HeatWave announcement this year,” said Edward Screven, chief corporate architect, Oracle.
Customers migrating from AWS, Google, and on-premises environments have been using MySQL HeatWave for a broad set of use cases, including marketing analytics, in particular real-time analysis of advertising campaign performance and customer data analytics for building effective campaigns. Migrating AWS customers include leaders in the automotive, telecommunications, retail, high-tech, and healthcare industries.
Innovative new capabilities for MySQL HeatWave Lakehouse
- Larger data size, standard MySQL syntax: Customers can query up to 400 TB of data with MySQL HeatWave Lakehouse, and the HeatWave cluster scales to 512 nodes. Customers use standard MySQL syntax for querying the data.
- Identical performance and compression: MySQL HeatWave offers the same query performance for data stored inside the MySQL database or in the object store, as demonstrated by both 10 TB and 30 TB TPC-H benchmarks. Furthermore, the amount of compression achieved and the amount of data that can be processed per node are the same in both cases.
- Support for multiple file formats: With MySQL HeatWave Lakehouse, customers can load and process data stored in a variety of file formats, such as CSV and Parquet, as well as Aurora and Redshift backups from AWS. This enables customers to leverage the benefits of MySQL HeatWave even when their data is not stored inside a MySQL database. The query performance is the same regardless of the file format in which the data is stored.
- Ability to query data in MySQL and combine it with data in object store: With MySQL HeatWave Lakehouse, customers can query their OLTP data stored inside the MySQL database and combine it with data stored in the object store. Any change made to the OLTP data is applied in real time and reflected in the query result.
New MySQL Autopilot capabilities for MySQL HeatWave Lakehouse
MySQL Autopilot provides machine learning-based automation for MySQL HeatWave. Existing MySQL Autopilot capabilities such as auto provisioning and auto query plan improvement have been enhanced for MySQL HeatWave Lakehouse, which further reduces database administration overhead and improves performance. In addition, a number of new MySQL Autopilot capabilities are now available for MySQL HeatWave Lakehouse.
- Auto schema inference: Autopilot automatically infers the mapping of the file data to datatypes in the database. As a result, customers don’t need to manually specify the mapping for each new file to be queried by MySQL HeatWave Lakehouse—thereby saving time and effort.
- Adaptive data sampling: Autopilot intelligently samples portions of files in object storage, collecting accurate statistics with minimal data access. MySQL HeatWave uses these statistics for several purposes, including generating and improving query plans and determining the optimal schema mapping.
- Auto load: Autopilot analyzes the data to predict the load time into MySQL HeatWave, determines the mapping of the datatypes, and automatically generates the loading scripts. Users don’t have to manually specify the mapping of files to database schemas and tables.
- Adaptive data flow: MySQL HeatWave Lakehouse dynamically adapts to the performance of the underlying object store. As a result, MySQL HeatWave can get the maximum available performance from the underlying cloud infrastructure, which improves overall performance, price performance, and availability.
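To make the ideas behind auto schema inference and data sampling more concrete, here is a minimal sketch in Python. It is an illustration only and does not reflect how Autopilot is implemented: the function names `infer_type` and `infer_schema`, the fixed `sample_rows` limit, and the three-type mapping (BIGINT, DOUBLE, TEXT) are all assumptions made for this example.

```python
import csv
import io
from itertools import islice


def infer_type(values):
    """Return the narrowest of BIGINT, DOUBLE, TEXT that fits every sampled value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "TEXT"  # nothing to go on, so fall back to the widest type
    # Try the narrowest type first and widen only when a value fails to parse.
    for type_name, caster in (("BIGINT", int), ("DOUBLE", float)):
        try:
            for v in non_empty:
                caster(v)
            return type_name
        except ValueError:
            continue
    return "TEXT"


def infer_schema(csv_text, sample_rows=100):
    """Infer column types from the header plus the first `sample_rows` data rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Reading only a bounded sample keeps data access small (hypothetical policy).
    sample = list(islice(reader, sample_rows))
    columns = list(zip(*sample)) if sample else [()] * len(header)
    return {name: infer_type(col) for name, col in zip(header, columns)}


data = "id,price,city\n1,9.5,Austin\n2,12,Boston\n"
schema = infer_schema(data)
print(schema)
```

A real system would sample across the whole file rather than only its first rows and would recognize many more types, such as dates and decimals. The sketch only shows the basic trade-off: a small sample is cheap to read but may miss values that require a wider type.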
MySQL HeatWave Lakehouse is now available in beta for customers to try and is slated for general availability in the first half of FY’23.