Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migration Race Condition #16

Open
mrmemes-eth opened this issue Jun 19, 2015 · 1 comment
Open

Migration Race Condition #16

mrmemes-eth opened this issue Jun 19, 2015 · 1 comment

Comments

@mrmemes-eth
Copy link

Due to the way tern manages its schema_versions table, it's relatively easy to create a situation where a slightly older migration will never get run on a developer or CI machine in a distributed team setting.

The scenario:

If two developers make separate migrations at roughly the same time and the first developer to push is also the one with the more recent migration timestamp, the migration from the second developer will not be run on their machine. This problem can promote to deployed environments where there is continuous deployment (we use Circle CI to push all passing commits to our staging env).

Potential solution:

Rather than using the current mechanism for determining if a migration has run or not:

https://github.com/bugsbio/lein-tern/blob/master/src/tern/migrate.clj#L27-L31

schema_versions could be treated as a set and when a migration's timestamp is not present, it is run.

Any thoughts?

@rsslldnphy
Copy link
Contributor

Hi @voxdolo - thanks for raising this issue! This is actually a really interesting problem - not one that we've run into yet as our team and project sizes are small.

What we basically need is a distributed consensus algorithm for deciding on a partial ordering of migrations (some migrations can be run out of order, some can't). And distributed systems are hard, which makes me think this needs careful thought before we jump into a solution. But also, because the problem is generic enough, there will be plenty of existing solutions in other domains that we can weigh up and pick the best from. However I also think that the specific form of database migrations might mean we can make several assumptions that makes the problem a lot easier than the generic problem, too.

The only immediate possible issues with the idea of using a set for schema_versions I can think of are:

  1. It assumes that any migrations received out of order can be run out of order.
  2. It requires your entire history to be pristine and complete
  3. It would require a change in the way rollbacks are recorded

With regards to 1), maybe this is fine? In most cases if two developers are simultaneously working on things that both require migrations, you'd hope they'd be independent enough that the order in which they're run in doesn't matter. But I still feel like I need some convincing that there aren't any cases where this could cause problems, and that when problems do arise, it would fail in a sensible way.

To use a contrived example, if developer A created a migration to alter a NAME column to be VARCHAR(255) and developer B created a migration that altered the NAME column on the same table to be VARCHAR(5), the order in which they were merged could end up meaning you silently have a different schema on staging and production.

It's possible there's a reasonably easy way for us to work out which migrations are safe to run out of order - migrations that affect different tables should be safe, for example - but I think this could potentially become quite difficult when you start considering things like which tables + columns views depend on.

With regards to 2), my concern is that with the current mechanism, as long as the most recent entry in schema_versions matches the last migration that was run, tern should work as expected. However, if the check is done on the contents of schema_versions as a whole, everything needs to match up completely and for the entire history of your migrations. Which it should, if you correctly use tern for everything... but being pragmatic about things, sometimes the history of a project isn't always as pristine as you'd like it to be. Maybe you messed up a migration and then manually fixed it. Maybe you created a new DB by doing a schema dump and didn't bring along the contents of schema_versions... treating schema_versions as a set could cause tern to try to re-run lots of very old migrations. Maybe we could limit the check to only take into account the last n migrations to mitigate this problem? What do you think?

With regards to 3) - currently, if you run lein tern rollback, no rows are deleted from the schema_versions table - new rows are simply appended, so the most recent row actually now contains the older version number. So schema_versions contains not just a history of the migrations that have been applied, but also the rollbacks that have been performed. This means that taking the contents of the table as a set, with the current table structure, a rolled-back migration would still be present in the set, and so would not be re-run by the next tern migrate. To fix this we'd need to either change the way rollbacks work to either a) delete the corresponding row from schema_versions or b) alter the table to add a type column (i.e. does this row relate to migration or a rollback?) which enables us to correctly build the set. Either way upgrading to this new version would need to be carefully managed.

I'm interested in what your thoughts are on the above 3 things, and if there's anything else you can think of. I'm going to continue pondering it too.

Obviously the current scenario you describe, in which migrations could be missed, is far from ideal. If this is a problem you're facing, I wonder if it's worth getting a partial solution in place quickly while we try to come up with a better solution for the long run? What I'm thinking is a fairly simple approach: build a set from schema_versions and use it to check which migrations (or maybe only the last 10 migrations?) have been run. If any migrations newer than the oldest un-run migration have been run, throw an error. This at least means migrations won't be missed, and will let a human work out what the best thing is to do. If this check passes, run migrations based on the current mechanism, which will mean rollbacks will still work. Does this sound useful?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants