Where Teradata could go with its data lakehouse

Last week Teradata offered its long-awaited answer to the rise of the data lakehouse. As VentureBeat’s George Lawton reported, Teradata has always differentiated itself by pushing the limits of analytics, first with massively parallel processing on its own specialized machines, and more recently with software-defined appliances tailored to different workloads, from compute-intensive to IOPS (input/output operations per second)-intensive. And since acquiring Aster Data Systems more than a decade ago, Teradata has moved from solving large analytics problems to solving any analytics problem, with a diverse portfolio of analytic libraries that extend SQL into new areas such as path or graph analysis.

With the cloud, we have been waiting for Teradata to fully leverage cloud object storage, which is the de facto data lake. So last week’s dual announcements of VantageCloud Lake Edition and ClearScape Analytics were the logical next steps on Teradata’s journey to the data lakehouse. Teradata is finally making cloud storage a first-class citizen and opening it up to its broad analytics portfolio.

But unlike Teradata’s previous moves toward parallel and polyglot analytics, where it led the way, this time, with the lakehouse, the company is playing catch-up. The announcement might not have mentioned the word lakehouse, but that’s what it was all about. As we noted a few months ago, almost everyone in the data world, including Oracle, Teradata, Cloudera, Talend, Google, HPE, Fivetran, AWS, Dremio and even Snowflake, has felt compelled to respond to Databricks, which introduced the data lakehouse.

Teradata’s path to the data lakehouse

Nevertheless, Teradata approaches the data lakehouse with some unique twists, and it’s all about optimization. Teradata’s secret sauce has always been highly optimized compute, interconnects, storage and query engines, along with workload management designed to drive compute utilization as high as 95%. When standard hardware became good enough, Teradata introduced IntelliFlex, where performance and optimizations could be configured via software. The ability to optimize for hardware not invented here opened the door for Teradata to optimize for AWS, and later the other hyperscalers.


Teradata introduced VantageCloud a year ago, and at the end of last year ran a benchmark on more than 1,000 nodes, which no other cloud analytics provider has matched to date. But this was for a more conventional data warehouse with conventional block storage.

The challenge in realizing the lakehouse was developing a table format for data in cloud object storage that enables all the niceties of data warehouses, such as ACID transactions, which are essential for ensuring data consistency, along with more granular security and access controls, and raw performance. Databricks fired the first shot with Delta Lake, and more recently, providers from Snowflake to Cloudera have embraced Apache Iceberg, with the common thread that it’s all based on open source technology. For Lake Edition, Teradata went its own way with its own data lake table format, which the company claims has superior performance compared to Delta and Iceberg.

The other side of the lakehouse coin is software. Aside from the SQL engine, which is designed to handle large, complex queries that can join across hundreds of tables, Teradata has a large portfolio of analytic libraries that run in the database. This is one of Teradata’s best-kept secrets. Largely the legacy of the Aster Data acquisition over a decade ago, these analytics were specifically tuned to exploit the underlying parallelism, and they go well beyond SQL, encompassing features such as nPath, graph, time series analysis, and machine learning, all accessible through SQL extensions.

Teradata, which now formally brands the portfolio as ClearScape Analytics, is finally drawing attention to the fact that it is a holistic analytics platform and not just a data warehouse, data lake or lakehouse. As part of the announcement, Teradata beefed up its time series and MLOps capabilities. But when it comes to the data lake, data scientists are very particular about choosing their own languages or tools. And so VantageCloud will also support a bring-your-own-analytics option for those who want to write and run Python from Jupyter notebooks or their own workbenches, and it currently has integrations with Dataiku, KNIME, and Alteryx. ClearScape Analytics will be available for both VantageCloud Lake Edition and the standard Enterprise Edition.

Lake Edition and ClearScape Analytics are promising starts for Teradata as a data lakehouse. There’s no question that Teradata’s scale and support for polyglot analytics made a lakehouse a matter of when, not if. And branding the analytics portfolio is more than just a marketing exercise, as it finally shines a spotlight on what used to be a closely guarded secret: Teradata’s differentiation goes beyond its optimized SQL engine and infrastructure to include analytics optimized for that engine. VantageCloud brings the analytics portfolio full circle by unleashing it on cloud object storage and, with usage-based pricing, potentially opening it up to more discretionary workloads than in the days when customers operated on-premises with hard capacity limits.

A wish list for Teradata

That leaves our wish list for what Teradata should do next. In summary, we want Teradata to venture further out of its comfort zone to attract a new audience of users. Granted, with the lakehouse, the challenge isn’t unique to Teradata: Databricks, for example, is trying to attract business analysts, while Snowflake is courting data scientists.

To attract that new audience, Teradata needs to lower barriers to entry and put open source on a more level footing with its own environment. With Lake Edition, Teradata has drastically reduced the entry-level price to $5,000 per month. That’s a marked drop from the six- and seven-figure annual contracts that Teradata customers typically pay, but we’d like to see Teradata go further with a freemium offering that allows new users to kick the tires. Heck, even incumbents not known for discounted pricing, like Oracle, have embraced free tiers.

In terms of open source, there are a number of paths that we would like to see Teradata develop further. The first is to attract non-Teradata users to ClearScape Analytics through optimized APIs to open source Delta and/or Iceberg data lakes. While the performance may not be on par with Teradata’s proprietary data lake table format, it could be made “good enough”.

Conversely, we would like to see parallel efforts with so-called BYO analytics, where the Python crowd is drawn in through optimized APIs that use Teradata’s proprietary data lake table format. For example, we’d like to see Teradata partner with Anaconda to optimize performance of the Conda Python library portfolio, just as Anaconda is already doing with Snowflake. In the end, it’s all about the analytics.

VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact.