The modern data stack (MDS) is the basis for digital disruptors. Think Netflix. The company pioneered a new business model around video as a service, but much of their success relies on real-time streaming data.
They use analytics to send highly relevant recommendations to viewers. They monitor real-time data to maintain constant visibility into network performance. They sync their database of movies and shows with Elasticsearch, making it quick and easy for users to find what they’re looking for.
This must be in real time and 100% accurate. Old-school extract, transform, load (ETL) is just too slow. To meet this need, Netflix has built a change data capture (CDC) tool called DBLog that captures changes to MySQL, PostgreSQL, and other data sources and then streams those changes to target datastores for search and analysis.
Netflix required high availability and real-time sync. They also needed to minimize the impact on operational databases. Log-based CDC is key here: changes are read from the source database’s transaction log and replicated to target datastores in the order they occur, without locking records or otherwise burdening the source database.
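The ordered replication described above can be illustrated with a minimal sketch. It is not how DBLog works internally; the `ChangeEvent` type, the `lsn` field, and the `apply_changes` function are hypothetical names chosen for illustration, and a plain dictionary stands in for the target datastore. The point is only that each change carries its position in the source log, so the target can apply changes in commit order and skip anything it has already seen.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One entry read from a (hypothetical) database transaction log."""
    lsn: int             # log sequence number: position in the source log
    op: str              # "insert", "update", or "delete"
    key: int             # primary key of the affected row
    row: Optional[dict]  # new row image; None for deletes


def apply_changes(target: dict, events: list, last_lsn: int = 0) -> int:
    """Apply events to `target` in log order and return the new checkpoint.

    Events at or below `last_lsn` are skipped, so replaying a batch after a
    failure does not apply the same change twice.
    """
    for event in sorted(events, key=lambda e: e.lsn):
        if event.lsn <= last_lsn:
            continue
        if event.op == "delete":
            target.pop(event.key, None)
        else:  # insert and update both write the latest row image
            target[event.key] = event.row
        last_lsn = event.lsn
    return last_lsn


# Events may arrive out of order; the LSN restores the source's commit order.
events = [
    ChangeEvent(3, "update", 1, {"title": "Stranger Things", "seasons": 4}),
    ChangeEvent(1, "insert", 1, {"title": "Stranger Things", "seasons": 3}),
    ChangeEvent(2, "insert", 2, {"title": "Dark", "seasons": 3}),
    ChangeEvent(4, "delete", 2, None),
]
search_index: dict = {}
checkpoint = apply_changes(search_index, events)
```

Because the sketch only consumes events that have already been written to the log, nothing in it queries or locks the source tables, which is the property the paragraph above highlights.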
Data is at the heart of what Netflix does, but Netflix is not alone. Companies like Uber, Amazon, Airbnb, and Meta thrive because they truly understand how to make data work to their advantage. Data management and data analytics are strategic pillars for these organizations, and CDC technology plays a central role in their ability to run their core business.
The same can be said of just about any company that is at the forefront of today’s business environment. If you want your company to operate as an A-player, you need to modernize and control your data. Your competitors are certainly doing it already.
Sub-second integration is the new standard at Airbnb and Uber
In today’s world, a strong customer experience demands real-time data flows. Airbnb recognized the value of CDC technology in creating a great customer experience (CX) for its guests and hosts. It too built its own CDC platform, which it calls SpinalTap. Airbnb’s dynamic pricing, listing availability, and reservation status require impeccable accuracy and consistency across systems. When an Airbnb customer books a stay, they expect the workflow to be very fast and 100% accurate.
For Uber, immediacy is arguably even more important. Whether a customer is waiting for a ride to the airport or ordering a food delivery, timing is critical. Like Netflix and Airbnb, they developed their own CDC platform to synchronize data in multiple data stores in real time. Once again, a common set of requirements emerged. Uber wanted their solution to be extremely fast and fault-tolerant, with no data loss. They also needed a solution that wouldn’t degrade the performance of their source databases.
Change data capture for the rest of us
Again, CDC fits the bill. In the past, nightly batch ETL may have been enough to provide a daily executive update or operational reports. Today, real time is increasingly the norm. If information is power, immediate access to information is turbo power.
Therefore, CDC is quickly becoming a fundamental requirement for the modern data stack. It’s all well and good that big companies like Netflix, Airbnb, and Uber have the resources to build custom CDC platforms, but what about everyone else?
Out-of-the-box CDC solutions fill that gap, delivering the same low-latency, high-quality streaming pipelines without the need to build from scratch.
Unfortunately, not all are created equal. Most companies use a collection of systems that handle enterprise resource planning (ERP), customer relationship management (CRM), or specialized operational functions such as procurement or HR. These run on different database platforms, with incongruent data models. If a company uses mainframe systems, it is likely dealing with arcane data structures that don’t easily fit alongside modern relational data.
This makes heterogeneous integration particularly important. It requires connections to multiple data sources and targets, including transactional systems and databases such as SAP, Oracle, IBM Db2, and Salesforce. It means delivering real-time streaming data to platforms such as Databricks, Kafka, Snowflake, Amazon DocumentDB, and Azure Synapse Analytics.
Real-time CDC automation
To drive artificial intelligence (AI) and advanced analytics, enterprises need to push their data to a common MDS platform. That means taking information from multiple sources, transforming it into a unified model for analytics, and delivering it to a modern cloud-based data platform.
Change data capture technology serves as a critical link in the data-driven value chain – first by automating data ingestion from source systems, then transforming it on-the-fly and delivering it to a cloud data platform. Real-time CDC automation ensures that the right information gets to the right place, right away.
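As a rough illustration of transforming data on the fly into a unified model, the sketch below maps rows from two differently shaped sources onto one record type. Everything in it is an assumption made for the example: the `Customer` model, the `FIELD_MAPS` table, the `to_unified` function, and the column names attributed to a CRM and an ERP system are all hypothetical and do not describe any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Unified analytics model (hypothetical)."""
    source: str
    customer_id: str
    name: str
    country: str


# Per-source field mappings: unified field name -> source column name.
# The column names are invented for illustration.
FIELD_MAPS = {
    "crm": {"customer_id": "AccountId", "name": "AccountName", "country": "BillingCountry"},
    "erp": {"customer_id": "CUST_NO", "name": "CUST_NAME", "country": "CTRY_CD"},
}


def to_unified(source: str, row: dict) -> Customer:
    """Translate one changed source row into the unified model."""
    mapping = FIELD_MAPS[source]
    # Normalize every value to a trimmed string so both sources look alike.
    fields = {unified: str(row[column]).strip() for unified, column in mapping.items()}
    fields["country"] = fields["country"].upper()
    return Customer(source=source, **fields)


crm_row = {"AccountId": "001A", "AccountName": "Acme Corp", "BillingCountry": "us"}
erp_row = {"CUST_NO": 4711, "CUST_NAME": " Acme Corp ", "CTRY_CD": "US"}
unified = [to_unified("crm", crm_row), to_unified("erp", erp_row)]
```

Keeping the mappings in a table rather than in code means a new source could be added by describing its columns, which is one plausible way to handle the heterogeneous systems discussed earlier.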
Because they only focus on data that has changed, streaming CDC pipelines offer huge efficiencies over the batch mode operations of the past. The best CDC solutions can deliver more than 100 terabytes of data from source to destination in less than 30 minutes, without data loss.
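The efficiency argument comes down to simple arithmetic, sketched below with made-up numbers. The table size and per-cycle change counts are assumptions chosen for illustration, not measurements, and the function names are hypothetical.

```python
def batch_reload_rows(table_size: int, cycles: int) -> int:
    """Rows copied when every cycle reloads the whole table."""
    return table_size * cycles


def cdc_rows(changes_per_cycle: list) -> int:
    """Rows shipped when only changed rows are streamed."""
    return sum(changes_per_cycle)


table_size = 1_000_000         # assumed number of rows in the source table
changes = [1_200, 800, 1_500]  # assumed changed rows in three sync cycles

batch = batch_reload_rows(table_size, len(changes))  # rows moved by full reloads
streamed = cdc_rows(changes)                         # rows moved by CDC
ratio = batch / streamed                             # how much less data CDC moves
```

The smaller the share of rows that change between cycles, the larger this ratio becomes.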
The shift to cloud computing is in full swing. Cloud analytics, in particular, offers clear benefits to companies that truly understand the transformative role of data. Leading companies in every industry align their strategic views on data analytics. They digitize their interactions with customers and use algorithms to study data, gain insights and take action. AI and machine learning absorb massive amounts of information, discover correlations and identify anomalies.
Whether you’re at the forefront of digital disruption or just trying to keep up with the pack, CDC technology will play a vital role in realizing the modern data stack and opening the door to digital transformation.
Gary Hagmueller is CEO at Arcion.