Migrating MySQL onto SSDs to help with performance. Let's see how this works out.

42% of the U7 hosts on the new cluster have had their MySQL migrated onto SSDs! 🎉 Will probably be done today.


Fun fact: each run contains progressively larger hosts. The first ones had ~3 GB of MySQL data, the current ones ~17 GB, and the largest one has >80 GB. That one will take a while.
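
One way to get "progressively larger hosts" per run is to sort hosts by datadir size and slice the sorted list into batches. A minimal Python sketch; the host names, sizes and run size are hypothetical, and real sizes would come from something like `du -s /var/lib/mysql` on each host.

```python
from typing import Dict, List

def plan_runs(sizes_gb: Dict[str, float], run_size: int) -> List[List[str]]:
    """Group hosts into runs of `run_size`, smallest MySQL datadir first."""
    # Sort host names by datadir size, ascending; ties are broken by name
    ordered = sorted(sizes_gb, key=lambda host: (sizes_gb[host], host))
    # Slice the sorted list into consecutive runs
    return [ordered[i:i + run_size] for i in range(0, len(ordered), run_size)]

if __name__ == "__main__":
    # Hypothetical hosts and sizes (GB), for illustration only
    example = {"host-a": 3.1, "host-b": 80.5, "host-c": 17.0, "host-d": 2.8}
    for n, run in enumerate(plan_runs(example, run_size=2), start=1):
        print(n, run)
```

Starting small means the first runs finish quickly and surface problems cheaply, while the >80 GB host comes last.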


Some explanation of what we're even up to. The process is fairly straightforward:

1. mount SSD storage
2. rsync w/ MariaDB running
3. stop MariaDB
4. rsync w/o MariaDB running
5. move /var/lib/mysql away
6. bind-mount SSD storage to there
7. start MariaDB
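
The seven steps map onto ordinary shell commands. A dry-run Python sketch that only builds and prints them; the mountpoint /mnt/ssd, the copy location /mnt/ssd/mysql, the `mariadb` systemd unit name and the UUID are assumptions, and mounting by UUID anticipates the lesson at the end of this thread.

```python
import shlex
from typing import List

DATADIR = "/var/lib/mysql"

def migration_commands(ssd_uuid: str, ssd_mount: str = "/mnt/ssd",
                       service: str = "mariadb") -> List[str]:
    """Return the shell commands for steps 1-7 as strings (dry run only)."""
    target = f"{ssd_mount}/mysql"  # hypothetical location of the copy on the SSD
    sync = ["rsync", "-a", "--delete", f"{DATADIR}/", f"{target}/"]
    steps = [
        ["mount", f"UUID={ssd_uuid}", ssd_mount],  # 1. mount SSD storage
        sync,                                      # 2. rsync w/ MariaDB running
        ["systemctl", "stop", service],            # 3. stop MariaDB
        sync,                                      # 4. rsync w/o MariaDB running
        ["mv", DATADIR, f"{DATADIR}.old"],         # 5. move /var/lib/mysql away
        ["mkdir", DATADIR],                        #    ...and recreate an empty mountpoint
        ["mount", "--bind", target, DATADIR],      # 6. bind-mount SSD storage there
        ["systemctl", "start", service],           # 7. start MariaDB
    ]
    # shlex.join quotes each argument so the strings are safe to paste into a shell
    return [shlex.join(cmd) for cmd in steps]

if __name__ == "__main__":
    for line in migration_commands("0000-hypothetical-uuid"):
        print(line)
```

Printing instead of executing keeps the sketch harmless; a real run would execute the argument lists one by one and stop at the first failure, for example with `subprocess.run(cmd, check=True)`.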

If we're lucky and no data changes between steps 2 and 3, step 4 is pretty much instant, which means almost no downtime. If we're unlucky, a lot of data changed in between, which leads to >20 minutes of downtime :(
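
Why step 4 is either near-instant or slow: by default rsync decides whether to copy a file with a "quick check" that compares size and modification time, and for local-to-local copies it defaults to `--whole-file`, so every file that changed after step 2 is copied in full. The Python sketch below only imitates that quick check; the snapshot dicts and file names are hypothetical.

```python
from typing import Dict, List, Tuple

# path -> (size in bytes, modification time); a stand-in for a directory listing
Snapshot = Dict[str, Tuple[int, float]]

def files_to_transfer(source: Snapshot, dest: Snapshot) -> List[str]:
    """Files the quick check would copy: missing on dest, or size/mtime differ."""
    return sorted(path for path, meta in source.items() if dest.get(path) != meta)

# State right after the first pass (step 2): destination matches the source.
after_step_2 = {"ibdata1": (12_582_912, 100.0), "shop/orders.ibd": (98_304, 90.0)}
# Writes between steps 2 and 3 touched ibdata1 only (hypothetical).
at_step_4 = {"ibdata1": (12_582_912, 130.0), "shop/orders.ibd": (98_304, 90.0)}

print(files_to_transfer(at_step_4, after_step_2))  # ['ibdata1']
```

With InnoDB, a few busy tables or the shared tablespace can be many gigabytes each, so even a short list here can translate into long downtime.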


Admittedly, you can build the whole thing way more efficiently and without any downtime using a primary/secondary setup. But any run-time we would have saved doing that, we'd probably have lost in dev-time. Dev-time is the scarcer resource right now, so here we are.

Admittedly-admittedly, it's not very smart to have MySQL on the application hosts in the first place. Changing that is a bigger (but planned!) project, though.


aaand about 7 hours ago, at 4:30 am, reality kicked in.

Linux has got these /dev/sdb things you can use to talk to your disks, right? Well, those names aren't stable: the disks can show up in a different order after a reboot. If you're handling multiple disks, that's a thing you should know, right? Despite knowing that, I still managed to use /dev/sdb and /dev/sdc directly. Two colleagues even signed off on the related Ansible playbook.
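
To see which /dev/sdX a filesystem has on the current boot, udev keeps stable symlinks under /dev/disk/by-uuid/ (the same filesystem UUIDs `blkid` prints). A small Python sketch that resolves them in both directions; the helper names are hypothetical.

```python
import os
from typing import Optional

BY_UUID = "/dev/disk/by-uuid"  # symlinks maintained by udev

def device_for_uuid(uuid: str, by_uuid_dir: str = BY_UUID) -> str:
    """Return the device node (e.g. /dev/sdc) this UUID points to right now."""
    link = os.path.join(by_uuid_dir, uuid)
    if not os.path.exists(link):  # follows the symlink; False if missing or dangling
        raise FileNotFoundError(link)
    return os.path.realpath(link)

def uuid_for_device(device: str, by_uuid_dir: str = BY_UUID) -> Optional[str]:
    """Reverse lookup: which filesystem UUID currently lives on `device`?"""
    target = os.path.realpath(device)
    for name in os.listdir(by_uuid_dir):
        if os.path.realpath(os.path.join(by_uuid_dir, name)) == target:
            return name
    return None  # no filesystem UUID points at this device

if __name__ == "__main__" and os.path.isdir(BY_UUID):
    # List the current UUID -> device mapping (Linux only)
    for name in sorted(os.listdir(BY_UUID)):
        print(name, "->", device_for_uuid(name))
```

The UUID stays with the filesystem; only the arrow's right-hand side may change between boots.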

So I got the whole thing deployed and went to sleep at 2am.

Guess what happened next.


After a reboot at 4:30, sdb and sdc switched places on machholz.uberspace.de. MySQL was subsequently very unhappy about suddenly not seeing any data anymore. That made our monitoring very unhappy, which in turn led to two of us trying to figure out what the heck had happened... at 4:45 am. 30 minutes of debugging and MySQL downtime later, we had fixed the problem on that host and all the others by referencing the disks by UUID instead of /dev/sdb and /dev/sdc. The obvious way to go in the first place.
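
Making both mounts survive a reboot means listing them in /etc/fstab by UUID rather than by /dev/sdX name. The actual Ansible playbook isn't shown in the thread; this Python sketch just renders the two fstab lines, and the mountpoint, the ext4 filesystem type and the UUID are assumptions.

```python
def fstab_entries(ssd_uuid: str, ssd_mount: str = "/mnt/ssd",
                  fstype: str = "ext4", datadir: str = "/var/lib/mysql") -> str:
    """Render fstab lines for the SSD mount and the MySQL bind mount."""
    lines = [
        # Identify the filesystem by UUID; sdb/sdc may swap on the next boot.
        f"UUID={ssd_uuid}  {ssd_mount}  {fstype}  defaults  0  2",
        # Bind mount, listed after the SSD mount whose directory it exposes.
        f"{ssd_mount}/mysql  {datadir}  none  bind  0  0",
    ]
    return "\n".join(lines) + "\n"

print(fstab_entries("0000-hypothetical-uuid"), end="")
```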

So, what did we learn?
