Altering Large Tables on Live Databases
You may know that we currently leave altering some key tables because we have production databases that are giant and an alter would bring down said sites. About a week or two ago I did some research and found a tool that can help us with this and I wanted to put this on everyone's radar.
The tool we use is called pt-online-schema-change:
The tool cleverly uses regular MySQL commands to slowly, but surely alter a table without affecting database access as much as possible. It's something we could do ourselves, but since the commands are tedious it's nice to have an automation script.
I've wrapped the tool into my internal jobber utility which is not open for use due to security, but we'll look into granting access to certain folks in ops maybe. Here is an example table alter output:
[2018-09-10 16:22:58] - Running AlterTableJob on 5019960 xamarin.vanillacommunities.com cl40011 db4 [2018-09-10 16:22:58] - Running pt-online-schema-change [2018-09-10 16:22:59] - No slaves found. See --recursion-method if host db4.cl40011.vanilladev.com has slaves. [2018-09-10 16:22:59] - Not checking slave lag because no slaves were found and --check-slave-lag was not specified. [2018-09-10 16:22:59] - Operation, tries, wait: [2018-09-10 16:22:59] - analyze_table, 10, 1 [2018-09-10 16:22:59] - copy_rows, 10, 0.25 [2018-09-10 16:22:59] - create_triggers, 10, 1 [2018-09-10 16:22:59] - drop_triggers, 10, 1 [2018-09-10 16:22:59] - swap_tables, 10, 1 [2018-09-10 16:22:59] - update_foreign_keys, 10, 1 [2018-09-10 16:22:59] - Altering `vf_xamarin_5JE2B`.`GDN_User`... [2018-09-10 16:23:00] - Creating new table... [2018-09-10 16:23:00] - Created new table vf_xamarin_5JE2B._GDN_User_new OK. [2018-09-10 16:23:00] - Altering new table... [2018-09-10 16:23:00] - Altered `vf_xamarin_5JE2B`.`_GDN_User_new` OK. [2018-09-10 16:23:00] - Creating triggers... [2018-09-10 16:23:00] - Created triggers OK. [2018-09-10 16:23:00] - Copying approximately 366861 rows... Copying `vf_xamarin_5JE2B`.`GDN_User`: 51% 00:27 remain Copying `vf_xamarin_5JE2B`.`GDN_User`: 93% 00:04 remain [2018-09-10 16:24:07] - Copied rows OK. [2018-09-10 16:24:07] - Analyzing new table... [2018-09-10 16:24:07] - Swapping tables... [2018-09-10 16:24:07] - Swapped original and new tables OK. [2018-09-10 16:24:07] - Dropping old table... [2018-09-10 16:24:07] - Dropped old table `vf_xamarin_5JE2B`.`_GDN_User_old` OK. [2018-09-10 16:24:07] - Dropping triggers... [2018-09-10 16:24:07] - Dropped triggers OK. [2018-09-10 16:24:07] - Successfully altered `vf_xamarin_5JE2B`.`GDN_User`. [2018-09-10 16:24:07] - 1.2m
For those that are interested in learning more about database management you may want to study the tool a bit. Basically what it does is the following:
- Create a new table that has the old structure.
- Alter the table to the new structure.
- Create triggers to synchronize any insert/updates from the old table.
- Insert old data from the old table.
- Rename the new table over top the new table.
This process definitely increases database load, but it keeps resource locking down to an absolute minimum. In the above example I would estimate a strait table alter would be much quicker, but would probably cause the site to lock up during its time. For even larger sites an alter would take down the entire cluster.