Why Sharding Gets Hard, and Why You Need Vitess

Database Deployment Version 1
Database Deployment Version 2
Database Deployment Version 3
  1. Their total data volume was no longer limited to what could fit on a single host.
  2. They were able to get write throughput beyond what they could get on a single host.
  1. The applications at TubeYou had to start understanding which shard to route their queries to. Baking this logic into each application was a pain.
  2. It took TubeYou a long time and a lot of developers to figure out how to migrate their single-node database to a sharded database without taking downtime. This was a very expensive migration that they really did not want to do again.
  3. TubeYou had to start dealing with a whole cluster of database nodes rather than just one. This meant that failures started happening more frequently, and the on-call engineer would often have to wake up in the middle of the night to figure out how to repair nodes that had failed.
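
To make the first problem concrete, here is a minimal sketch of the kind of routing logic each application would have had to carry. It assumes hash-based sharding on a user ID; the shard host names and the `shard_for` function are hypothetical and only illustrate the idea, not how TubeYou or Vitess actually route queries.

```python
import hashlib

# Hypothetical shard hosts; a real deployment would load these from config.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for(user_id: str, shards=SHARDS) -> str:
    """Pick a shard by hashing the sharding key (assumed here to be a user ID)."""
    # Use a stable hash so every application instance agrees on the placement.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# Every query path in every application needs a lookup like this one
# before it can open a connection.
target = shard_for("user-12345")
```

With modulo placement like this, changing the number of shards remaps most keys, which is one reason a resharding migration is so expensive.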
Database Deployment Version 4
  1. There were so many database nodes that nodes were failing all the time. These failures had to be detected, and new nodes had to be brought into the cluster. These new nodes then had to catch up their state to match the existing nodes in the cluster.
  2. When the master failed, a failover had to be triggered.
  3. Schema rollouts across all shards in all regions were very complex.
  4. A topology service had to be introduced to keep track of leader location and route user queries to the correct locations.
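
The last point can be sketched as a small in-memory registry that records which node currently leads each shard and promotes a replica when the leader fails. This is only an illustration under simplifying assumptions: the `TopologyService` class, its method names, and the host names are hypothetical, and it is not Vitess's implementation. A real topology service would also need durable, consistent storage and failure detection.

```python
from dataclasses import dataclass, field


@dataclass
class ShardTopology:
    primary: str
    replicas: list = field(default_factory=list)


class TopologyService:
    """Hypothetical registry mapping each shard to its current leader."""

    def __init__(self):
        self._shards = {}

    def register(self, shard, primary, replicas):
        self._shards[shard] = ShardTopology(primary, list(replicas))

    def primary_for(self, shard):
        # Query routers ask this before sending a write to a shard.
        return self._shards[shard].primary

    def fail_over(self, shard):
        # Promote the first available replica to be the new leader.
        topo = self._shards[shard]
        if not topo.replicas:
            raise RuntimeError(f"no replica available for shard {shard!r}")
        topo.primary = topo.replicas.pop(0)
        return topo.primary


topo = TopologyService()
topo.register("shard-0", "host-a", ["host-b", "host-c"])
new_leader = topo.fail_over("shard-0")  # host-a failed; promote a replica
```

Keeping leader locations in one place means applications only need to ask where a shard's leader is, rather than each tracking failovers themselves.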
  • The features Vitess offers
  • Its high-level architecture
  • How query routing works
  • How reads and writes are serviced
  • How failures are handled
  • What we can learn from Vitess



Andrew Dawson

Senior software engineer with an interest in building large scale infrastructure systems.