Today we performed a production migration, moving one of our legacy applications from a data center up north to one in South Florida.
The challenges:
1. The locations are disconnected and far from each other
2. We are jumping three releases, from SQL Server 2000 to 2008 R2 (8.0 to 10.50)
3. We are downgrading from Enterprise Edition to Standard Edition
4. We are moving from a 32-bit server to a 64-bit one (x86 to x64)
The solution: we split the work into two pieces, a pre-migration process and a production migration process.
Pre-Migration Process:
First, ensure that there are no Enterprise Edition features in use. We were lucky this time, as there were really none in the source database. But as an example, we have another server running 2005 EE, which will be migrated to 2008 R2 SE and has partitioned tables. That is something that has to be taken care of before migration.
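For reference, on SQL Server 2008 and later, sys.dm_db_persisted_sku_features lists the Enterprise-only features in use; on a 2005 instance like the one mentioned above, a look at sys.partitions will flag partitioned objects. These are just illustrative checks, run in each user database:

    -- SQL Server 2008 and later: Enterprise-only features in use in the current database
    SELECT feature_name
    FROM sys.dm_db_persisted_sku_features;

    -- SQL Server 2005: objects with more than one partition
    SELECT OBJECT_NAME(p.object_id) AS table_name, p.index_id, COUNT(*) AS partition_count
    FROM sys.partitions AS p
    GROUP BY p.object_id, p.index_id
    HAVING COUNT(*) > 1;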
Second, ensure that the database has no corruption. We ran DBCC CHECKDB on the source before backing it up.
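The check itself is a one-liner (the database name is a placeholder):

    DBCC CHECKDB ('YourDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS;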
Third, prepare for a test restore with the latest full backup. We FTP'd the last full backup over during a weekend and restored it onto the target server running 2008 R2. We gzipped the files before transferring, but it was still painfully slow.
Fourth, restore and ensure that the databases open with no issues whatsoever. Update statistics and rebuild indexes on any important table that is heavily fragmented. This is a challenge in Standard Edition, as indexes cannot be rebuilt online.
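As a rough sketch of what that looks like on the test target (database, file, path and table names are placeholders):

    -- Restore the copied full backup onto the 2008 R2 test server
    RESTORE DATABASE YourDatabase
    FROM DISK = N'D:\Staging\YourDatabase_full.bak'
    WITH MOVE N'YourDatabase_Data' TO N'E:\Data\YourDatabase.mdf',
         MOVE N'YourDatabase_Log'  TO N'F:\Logs\YourDatabase_log.ldf',
         STATS = 5;

    -- Refresh statistics after the upgrade-on-restore
    EXEC sp_updatestats;

    -- Rebuild indexes on a heavily fragmented table; ONLINE = ON is not an option in Standard Edition
    ALTER INDEX ALL ON dbo.SomeLargeTable REBUILD WITH (ONLINE = OFF);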
Fifth, test and retest. This requires a lot of help from the Application Developers and QA Team. Ensure that functionality is intact and, most importantly, that the database is always in a healthy condition (no excessive I/O, no high CPU utilization, etc.).
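A simple spot check along these lines can help while the tests are running (these are standard DMVs on the 2008 R2 target; adjust to taste):

    -- Current requests ordered by CPU; a quick way to catch runaway queries during testing
    SELECT r.session_id, r.status, r.cpu_time, r.reads, r.writes, r.wait_type, t.text
    FROM sys.dm_exec_requests AS r
    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
    WHERE r.session_id > 50
    ORDER BY r.cpu_time DESC;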
Sixth, let the QA team sign off.
Production Migration Process:
First. Plan the work. Create checklists with detailed steps to be performed before, during and after migration.
Second. Perform a full backup on the source server; we chose Saturday night. Disable the full backup job on the source server, but ensure that transaction log backups keep running there.
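In T-SQL terms, this step looks something like the following on the source (the backup path and job name are placeholders):

    -- Saturday night full backup on the SQL Server 2000 source
    BACKUP DATABASE YourDatabase
    TO DISK = N'D:\Backup\YourDatabase_full.bak'
    WITH INIT, STATS = 5;

    -- Disable the full backup job, leave the transaction log backup job alone
    EXEC msdb.dbo.sp_update_job @job_name = N'Full Backup - YourDatabase', @enabled = 0;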
Third. Compress the backups, FTP to Florida, decompress and start restoring onto the target server.
Fourth. Bring over the transaction log backups and continuously apply them to the target server.
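Steps three and four boil down to restoring everything WITH NORECOVERY so the target keeps accepting log backups (paths and file names are placeholders):

    -- Restore the full backup on the target without recovering it
    RESTORE DATABASE YourDatabase
    FROM DISK = N'D:\Staging\YourDatabase_full.bak'
    WITH NORECOVERY,
         MOVE N'YourDatabase_Data' TO N'E:\Data\YourDatabase.mdf',
         MOVE N'YourDatabase_Log'  TO N'F:\Logs\YourDatabase_log.ldf',
         STATS = 5;

    -- Apply each transaction log backup as it arrives, still without recovering
    RESTORE LOG YourDatabase
    FROM DISK = N'D:\Staging\YourDatabase_log_1.trn'
    WITH NORECOVERY;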
Fifth. In our environment we were able to disable outbound services and show a maintenance page on the web apps. This way the source database was accepting data but not processing it.
Sixth. Take a last transaction log backup, apply it to the target and open the databases.
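The cutover itself is a tail log backup on the source followed by a final restore WITH RECOVERY on the target (paths are placeholders):

    -- On the source: final transaction log backup
    BACKUP LOG YourDatabase
    TO DISK = N'D:\Backup\YourDatabase_tail.trn'
    WITH INIT;

    -- On the target: apply the last log and bring the database online
    RESTORE LOG YourDatabase
    FROM DISK = N'D:\Staging\YourDatabase_tail.trn'
    WITH RECOVERY;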
Seventh. Update statistics, rebuild indexes on the required objects and perform sanity-check tests.
Eighth. Let the Infrastructure team make the appropriate changes (like DNS) and enable services.
Ninth. Sync up any missing data. In our scenario, we were able to code an application that extracted the missing transactions (just a few thousand) from the source server and applied them to the target server. Let QA test and sign off.
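I won't post the actual application, but conceptually it did something along these lines over a linked server (the server, table and column names here are made up purely for illustration):

    -- Pull over any transactions committed on the source after the cutover point
    INSERT INTO dbo.Transactions (TransactionID, AccountID, Amount, CreatedAt)
    SELECT s.TransactionID, s.AccountID, s.Amount, s.CreatedAt
    FROM [SOURCESERVER].YourDatabase.dbo.Transactions AS s
    WHERE s.TransactionID > (SELECT MAX(TransactionID) FROM dbo.Transactions);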
Tenth. Perform a full backup and enable all maintenance jobs with appropriate schedules.
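Backup compression is finally available in Standard Edition as of 2008 R2, so the new full backup can look like this (path is a placeholder):

    BACKUP DATABASE YourDatabase
    TO DISK = N'E:\Backup\YourDatabase_full.bak'
    WITH INIT, COMPRESSION, STATS = 5;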
Notes:
Each migration will have its own steps, and some will be more difficult than others. I have listed only the very high-level bullet points of what needed to happen for this particular migration.
What if the unplanned happens:
First, every checklist should have a rollback plan. In our case, we had steps to follow in case we needed to stop the migration. On the DB side it was pretty simple: we just stop restoring onto the target server, go back to the source and re-enable the maintenance jobs. Of course, there are other steps that need to be followed by the development and infrastructure teams.
But today we had an interesting scenario. While recreating indexes on a 25 million row table, the DBA stopped the SQL Server Agent, which was part of the process outlined on the checklist. The problem: it was stopped through the Services panel instead of SQL Server Configuration Manager. So what happened? The cluster service freaked out and decided to perform a failover.
The instance started, but the database status was “in recovery”. As per the SQL log, it was going to take about 2 hours to roll back all the changes. We had two options: 1) simply execute the rollback plan, or 2) drop the database and restore it once more.
We decided to go with option 2, as the transaction logs were small. The bulk of the time was restoring the full backup, but fortunately that finished in under 30 minutes; not bad for a 120 GB database.
Now, how did we drop the “in recovery” database? We opened SQL Server Configuration Manager, took the instance offline, deleted the data files belonging to the “in recovery” database and restarted the instance. Of course the SQL log complained about the missing files, so we went ahead and dropped the database.
We are now live on a new 2008 R2 instance, and we are current with backups. Interesting to see that with backup compression, a 120 GB database backup uses just 25 GB of disk space.
This was our last SQL Server 2000 instance.