Zero Downtime Migrations with ActiveRecord and PostgreSQL

Photo by bioxzers As our database and traffic continued to grow, we discovered that certain migrations which appeared harmless actually caused table locking and, in some cases, downtime! Instead of requiring all of our engineers to remember each one of these special cases, we’ve decided to automatically audit migrations at development and test time using the […]

Photo by bioxzers

As our database and traffic continued to grow, we discovered that certain migrations which appeared harmless actually caused table locking and, in some cases, downtime!

Instead of requiring all of our engineers to remember each one of these special cases, we’ve decided to automatically audit migrations at development and test time using the zero_downtime_migrations gem.

This gem detects problematic migrations and raises an exception with instructions on how to perform the same operation the “zero downtime way” before it ever has a chance to reach production. It catches things like:

Let’s take a look at what happens when we add a column with a default.

class AddPublishedToPosts < ActiveRecord::Migration
def change
add_column :posts, :published, :boolean, default: true
end
end

This migration looks pretty simple! Unfortunately, if the posts table is large or receives significant traffic, then this migration will end up locking the table and blocking database connections until the service goes down!

When we run rake db:migrate with this gem installed, we’ll receive an exception that displays instructions for the “zero downtime way” to perform the changes in this migration. Let’s take a closer look at this exception.

ZeroDowntimeMigrations::UnsafeMigrationError: Adding a column with a default is unsafe!
This action can lock your database table! Let's try doing this the "zero downtime way" instead.
First let’s add the column without a default. When we add
a column with a default it has to lock the table while it
performs an UPDATE for ALL rows to set this new default.
This can take a long time with significant size or traffic!
  class AddPublishedToPosts < ActiveRecord::Migration
def change
add_column :posts, :published, :boolean
end
end
Then we’ll set the new column default in a separate migration.
Note that this does not update any existing data! This only
sets the default for newly inserted rows going forward.
  class AddDefaultPublishedToPosts < ActiveRecord::Migration
def change
change_column_default :posts, :published, true
end
end
Finally we’ll backport the default value for existing data in
batches. This should be done in its own migration as well.
Updating in batches allows us to only lock 1000 rows at a time
(or whatever batch size we prefer).
  class BackportDefaultPublishedToPosts < ActiveRecord::Migration
def change
Post.select(:id).find_in_batches.with_index do |records, i|
puts "Processing batch #{i + 1}r"
Post.where(id: records).update_all(published: true)
end
end
end
If you're 100% positive that this migration is already safe, then wrap the call to `add_column` in a `safety_assured` block.
  class AddPublishedToPosts < ActiveRecord::Migration
def change
safety_assured do
add_column :post, :published, :boolean, default: true
end
end
end
# ./lib/zero_downtime_migrations/validation.rb:25:in `error!'
# ./lib/zero_downtime_migrations/validation/add_column.rb:6:in `validate!'
# ./lib/zero_downtime_migrations/validation.rb:8:in `validate!'

Check out the zero_downtime_migrations README for more info about the issues this gem addresses as well as a list of all the error messages and instructions it provides to help prevent downtime!

This gem doesn’t catch all of the special cases that we know of yet. Since we’re using Heroku Preboot for zero downtime deployments we also need to watch out for problematic migrations like:

  • Renaming columns or tables
  • Dropping columns or tables
  • Changing column types

We’ll continue to enhance and add validations to this gem over time as we encounter and learn from new problematic migrations. Try it out in your own projects and let us know what you think! Pull requests are welcome!

Our engineering team is still growing! We’re hiring engineers in our San Francisco and Columbus, OH offices. Check out our careers page to learn more. We look forward to hearing from you!


Zero Downtime Migrations with ActiveRecord and PostgreSQL was originally published in LendingHome Tech on Medium, where people are continuing the conversation by highlighting and responding to this story.

Source: LendingHome