Declarative vs Imperative Schema Management Tools

Andrew Dawson
4 min readSep 16, 2022

--

Schema management tools are used to version control your schema and bring your schema to a certain state as part of your release process. Most schema management tools work by applying an ordered set of scripts to a database. Each of these scripts is called a migration script. These scripts get included in source control. When some new code is written which requires a schema change a new migration script is added. Then when the application is released some either automated or manual process will run the scripts to update the database prior to the code being released.

Let’s call schema management tools of this form imperative schema management tools. They are imperative because the schema author defines how the database should be updated by defining a set of migration scripts.

Imperative schema management tools must include rollback scripts in addition to migration scripts. In an ideal world every schema migration would go off without a hitch, but in the real world there will be problems which require schema rollbacks. In order to perform schema rollbacks, a rollback script gets written along side its corresponding migration script. Then in order to rollback the schema the rollback scripts needed to be run in reverse order.

Imperative schema management tools must explicitly define a schema version. The schema version will get reflected both in the migration scripts and in the database itself. This version will get used to determine which set of migration scripts need to get run. For performance and safety reasons it’s not desirable to run all the scripts on every release, so this version gets used to control which set of scripts to run. The current version in the database acts a lower bound, and the current version in the migration scripts acts as an upper bound. All schema migrations within this window bound need to get run in order to move the database to the new desired state.

While imperative based schema management is very common and there are a lot of tools written this way, there might be a better way to do schema management. Instead of this imperative based approach, another approach is to declarative define what you want the schema to look and let a tool figure out the steps to get to that goal state. In a declarative based approach to schema management there are no migration scripts, there are no rollback scripts and there are no database versions. The only thing that gets checked into source control is the desired schema. The schema management tool is responsible for computing a diff between the currently applied schema and the goal state schema. The tool will generate a series of operations to run in order to move the schema from its current state to the declared desired state.

In a declarative based approach to schema management much of the complexity is moved from the schema author into the schema management tool. This transfer of complexity comes with several upsides to the schema author that we will consider now —

Improved Readability

In a declarative based approach there is a single place which defines the whole schema. Anyone who wants to understand the desired state of the schema just needs to go to that file or directory and read a bunch of CREATE TABLE / CREATE INDEX / etc… statements. In contrast, in an imperative based approach there are an arbitrary number of migration scripts that must be read through in order to infer the desired state of the database.

I have seen a “solution” to this problem with imperative schema management before in which a fully resolved schema was manually kept up to date right along side the migration scripts. Every time a new migration script was added, the author would also be responsible for editing the fully resolved schema definition. This was bad because it created two sources of truth which could easily diverge.

Implicit Rollbacks and Versioning

In an imperative based approach to schema management rollback scripts have to be explicitly defined and versions have to be explicitly managed. However neither of these complexities needs to be managed in a declarative based approach.

Versions do not need to be explicitly managed because the schema management tool figures out the difference between the goal state and the current schema. There is no window of operations that needs to be constructed. Note that in a declarative based approach there are still versions of the schema, but they are simply reflected as the git history of the schema directory.

Rollbacks in a declarative based approach are identical to upgrades. Regardless of what the goal state is or what the ground reality schema is, the tool takes care to diffing the two and figuring out how to move the ground reality to the goal state. So in order to rollback, the git sha of the desired schema will be checked out and applied using the schema management tool.

Improved Setup Time

A downside of imperative based schema management is there is no upper bound on the number of migration scripts that can accumulate. This means when a new environment needs to be setup it can take a long time to run through all the historical migration scripts. This can become a problem for environments that need to be frequently setup and torn down like development clusters.

In order to address this problem in an imperative style migration tool, the migration scripts would need to be periodically flattened into a base snapshot. This base snapshot would be applied first and then the new migration scripts would be run on top of this. This is not ideal because it makes the history of the schema hard to follow.

In a declarative approach to schema management the schema is always declared in a the maximally flattened way. This keeps the version control history easy to read and helps with improved setup time of new clusters.

The tool I have been using for declarative based schema management recently is skeema. This tool offers all the benefits described in this article and more. There is a great blog post from the creator of skeema which compares an imperative based approach to schema management to a declarative based approach that I recommend folks read it if they want to learn more.

Thanks for reading.

--

--

Andrew Dawson
Andrew Dawson

Written by Andrew Dawson

Senior software engineer with an interest in building large scale infrastructure systems.

No responses yet