Description
During my time working with the Zooniverse, the team decided to switch its cloud service provider from AWS to Azure. I oversaw the migration of all data from AWS to Azure, which involved moving more than 73 million files from S3 to Azure Blob Storage and migrating 8 production databases from one provider to the other.
S3 to Blob Storage scope of work:
- Create a new Azure adapter so that the Zooniverse API can communicate with Azure’s media services. The existing media storage code was written using the adapter design pattern, which makes it easy to switch service providers without changing the code in every place where a media file is retrieved, modified, and so on. An in-depth overview of the work involved can be viewed here, and the code for this feature can be viewed here.
- Set up canary deployments for staging and production in Kubernetes, with the app configured to use the Azure adapter. API requests are routed to the canary deployment when a special header is set on the request, allowing dev users to test the new code and the migrated media.
- Create storage locations in Azure Blob Storage that map directly to existing AWS storage locations.
- Copy media files from AWS S3 to newly created Azure storage locations using azcopy (Azure CLI utility).
- Test to confirm that all files were copied successfully and that all media file operations work as expected with Azure (using both RSpec tests and manual testing), then deploy the changes to production.
- Update the Azure Front Door configuration to have Zooniverse’s user-friendly media URLs route to Azure Blob Storage containers instead of S3.
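The adapter pattern described in the first item above can be illustrated with a minimal sketch. This is not the Zooniverse code: it is written in Python purely for illustration, and every name in it (StorageAdapter, InMemoryAdapter, MediaStorage, build_storage, the example URLs) is hypothetical. The in-memory adapter stands in for a real provider SDK so that the example runs without credentials.

```python
from abc import ABC, abstractmethod


class StorageAdapter(ABC):
    """Hypothetical interface that every storage provider adapter implements."""

    @abstractmethod
    def put(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, path: str) -> bytes: ...

    @abstractmethod
    def url_for(self, path: str) -> str: ...


class InMemoryAdapter(StorageAdapter):
    """Stand-in for a real S3 or Blob Storage client (assumption for the sketch)."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")
        self._objects = {}  # maps object path -> stored bytes

    def put(self, path: str, data: bytes) -> None:
        self._objects[path] = data

    def get(self, path: str) -> bytes:
        return self._objects[path]

    def url_for(self, path: str) -> str:
        return f"{self.base_url}/{path}"


class MediaStorage:
    """Application-facing API; callers never talk to a provider directly."""

    def __init__(self, adapter: StorageAdapter) -> None:
        self.adapter = adapter

    def store(self, path: str, data: bytes) -> str:
        self.adapter.put(path, data)
        return self.adapter.url_for(path)

    def fetch(self, path: str) -> bytes:
        return self.adapter.get(path)


# Switching providers is a one-line configuration change, not a code change
# at every call site. Both base URLs are made-up examples.
ADAPTERS = {
    "aws": lambda: InMemoryAdapter("https://example-bucket.s3.amazonaws.com"),
    "azure": lambda: InMemoryAdapter(
        "https://exampleaccount.blob.core.windows.net/example-container"
    ),
}


def build_storage(provider: str) -> MediaStorage:
    return MediaStorage(ADAPTERS[provider]())
```

A canary deployment like the one described above would, under this sketch, simply call build_storage("azure") while the main deployment keeps calling build_storage("aws"); the code that stores and fetches media is identical in both.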
Postgres Database Migration scope of work:
- Create new servers in Azure to host the migrated databases. For each database in AWS, create an empty database with the same name in Azure, and import the schema using a schema dump file generated via the pg_dump CLI utility.
- Migrate the primary Zooniverse database using Azure’s Database Migration Service (DMS).
- Migrate all other databases (each smaller than 10 GB) by doing a dump and restore, using the pg_dump and pg_restore Postgres CLI tools on a VM optimized for the process.
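The dump-and-restore steps above can be sketched as a small helper that builds the pg_dump and pg_restore command lines. The exact flags used in the real migration are not stated here, so the flag choices below (custom archive format, --no-owner, parallel restore jobs) are assumptions, and the connection URLs and file names are made-up examples. Whether the restore should also pass --data-only depends on whether the schema was already imported into the target database.

```python
import shlex


def schema_dump_cmd(source_url: str, out_file: str) -> list:
    """pg_dump command that writes only the schema as plain SQL."""
    return [
        "pg_dump",
        "--schema-only",
        "--no-owner",
        f"--file={out_file}",
        f"--dbname={source_url}",
    ]


def full_dump_cmd(source_url: str, out_file: str) -> list:
    """pg_dump command that writes a custom-format archive for pg_restore."""
    return [
        "pg_dump",
        "--format=custom",
        "--no-owner",
        f"--file={out_file}",
        f"--dbname={source_url}",
    ]


def restore_cmd(target_url: str, dump_file: str, jobs: int = 4,
                data_only: bool = False) -> list:
    """pg_restore command; --jobs works with custom-format archives."""
    cmd = ["pg_restore", "--no-owner", f"--jobs={jobs}"]
    if data_only:
        # Assumption: use this when the schema already exists on the target.
        cmd.append("--data-only")
    cmd += [f"--dbname={target_url}", dump_file]
    return cmd


if __name__ == "__main__":
    # Hypothetical hosts and database name; the commands are printed, not run.
    source = "postgres://user@aws-host.example/talk"
    target = "postgres://user@azure-host.example/talk"
    for cmd in (
        schema_dump_cmd(source, "talk_schema.sql"),
        full_dump_cmd(source, "talk.dump"),
        restore_cmd(target, "talk.dump", jobs=2, data_only=True),
    ):
        print(shlex.join(cmd))
```

Building the commands as argument lists rather than shell strings keeps connection URLs from being reinterpreted by a shell; the lists can be passed directly to subprocess.run once the values are real.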
Details
- Tools: Azure Blob Storage, Azure Database Migration Service, Azure Front Door, azcopy (Azure CLI Tool), pg_dump & pg_restore (Postgres CLI Tools), Kubernetes