How Vitess uses VReplication to Build Core Features

This post dives into what VReplication is and how it gets used in Vitess.

This is the final post in my Vitess series. We have covered a lot of ground so far. We started by taking about what motivated Vitess. Then we covered an overview of the Vitess feature set and an overview of the Vitess architecture. Most recently we covered the primitives that Vitess uses for sharding.

This final post will dive into Vitess VReplication. VReplication is at the core of how Vitess implements many of its coolest features. VReplication is used to implement resharding, materialized views, online schema migrations, change data capture (CDC) and more.

This post is going to take a bottom up approach. We will start by understanding the lowest level primitives involved. We will then work our way up to higher levels of abstractions and we will finish by illustrating how these abstractions can be used to implement a diversity of features. Lets get started…

Let’s take a look at this diagram to understand the data plane level components involved in VReplication.

Here we see that a VTTablet host runs a component called VStreamer. VStreamer fetches events from the MySQL bin log and makes them accessible to other components upstream of VTTablet.

So lets say a write arrives at a VTTablet host, the following will happen

  • The write will be recorded to MySQL
  • As part of persisting this write the write will also be recored in the MySQL bin log (more details on MySQL bin log here).
  • VStreamer will fetch updates from the bin log and expose those events for upstream components to consume.

Now that we understand the components involved at the data plane layer let’s add in the components that live in VTGate.

Notice that the data plane level components in this diagram look the same as previously shown. The only difference is we now show a new component at the VTGate layer called VStream. A VStream is responsible for aggregating the events from one or more VTTablet VStreamers. So VStreamer exposes bin log events for a single shard, and VStream exposes bin log events from a collection of shards.

Now that we understand the components involved to support streaming events from the MySQL bin logs we are ready to introduce what VReplication is.

VReplication is a low level feature in Vitess that enables creating one or more VStreams. Each VStream has a source and a target. The source is where data is being read from and the target is where data is being read into. The target could be another database, a new shard or a client that just wants to consume events.

VReplication will typically operate over more than a single VStream because in a sharded database typically more than a single VStream is needed to achieve some end objective.

Lets make a few more notes about VReplication before diving into its applications

  • Resumable: VReplication uses MySQL GTIDs in order to record progress and enable resuming from a given point. The ability to resume is critical to enabling replicating large amounts of data over long periods of times. To learn more about GTIDs you can checkout of this page.
  • Copy and Replicate Phase: VReplication runs in two phases. The first phase is a copy phase in which data is copied from source to target in chunks as quickly as possible. Once the copy phase has brought the target nearly up to date to the source then VReplication enters the replicate phase. The replication phase runs until the underlying VStream is explicitly torn down by the user. During the replication phase the target is continually kept up to date with the source.
  • Cutover: Vitess topology server can be updated to control traffic routing. While this is not technically a VReplication feature — they work hand in hand. Depending on the use case that VReplication is being used for there might be need to redirect read and/or write traffic to the VReplication target. Through updating the routing rules in topology server this routing can be controlled.

At this point we have all the background we need to understand how VReplication can be used to implement many of the key features in Vitess. Let’s look at some examples.

Resharding

Using Vitess you are able to merge or split shards. Let’s suppose you have a database that has two shards named -80 and 80- and suppose we want to split -80 into two new shards -40 and 40-80. In order to do this we will issue a reshard command to Vitess. This high level resharding feature is implemented using VReplication. Let’s take a look at the steps involved in this reshard.

  • Vitess will establish N VStreams where N is the number of source shards multiplied by the number of target shards. In this case we have a single source shard -80 and two target shards -40 and 40-80 — therefore there will be two VStreams. The first VStream will be responsible for copying data from the source of -80 to the target of -40 the second VStream will be responsible for copying data from the source of -80 to the target of 40-80.
  • These VStreams will copy data from all tables in source to all tables in target. But only rows will be selected which should exist in the target shards.
  • Once the VStreams have brought the target shards nearly up to date with the source shards the operator will trigger a switch traffic command. This switch traffic command will update topology server to cause reads and/or writes to be directed to the target shards.
  • Once reads and writes are happily happening on the new target shards the VStreams can be torn down and the old shard can be deleted from the database.

Online Schema Migrations

Let’s say you want to make a schema change which requires backfilling data. This could include a schema change like adding a new secondary index to your table. Or it could include adding a new column that has some default value other than the null value.

Applying these types of schema changes can get pretty hairy especially in a sharded database. Let’s just take a few issues

  • If the schema change is rolled out to some shards and then fails you can end up in a very bad state where your database does not have a consistent schema across all shards.
  • In order to do the backfill in a way that is consistent you need to lock the table it is being applied to. This means write downtime for your users.

Vitess provides support for online schema migration with nearly no write downtime. It supports this using VReplication. It is done using the following steps

  • New tables are created with the new schemas.
  • VStreams are created to replicate data from the source table to the new target table. As this replication is happening any backfills that need to happen can be done.
  • Once the target tables are sufficiently brought up to data there is a cutover event (very much like what was done in resharding). After this cutover event the new tables start to get live traffic and the old tables can be torn down.

Through understanding how resharding and schema changes are built on top of VReplication it should be clear that VReplication is a very powerful general purpose primitive. In fact it is used for many more features in Vitess including: CDC, Materialized Views and Data Rollups. You can read more about all the things VReplication gets used for here.

The reason I think VReplication is such a powerful primitive is copying data from source to target in a resumable and resilient fashion enables the underlying data to stay live for users while introducing a new version of the data that can be atomically switched over to. This new version of the data could be new in terms of schema or in terms of which nodes it lives on or in terms of what aggregation it uses. Each of these different modifications of the data during VReplication manifests themselves as a different high level Vitess features.

Well that is all I have to say about Vitess for now. Thank you very much for reading. I hope you learnt something.

Next up I am going to go back in time a bit and study up on Bigtable — so look forward to those posts coming out soon.

Until next time — cheers.

Senior software engineer with an interest in building large scale infrastructure systems.