We recently migrated a database from SQL Server 2005 x86 to 2008 R2 x64. Since we were running an Enterprise Edition license, we were able to take advantage of partitioning. The database had several tables with similar structures, each with a row count north of 2 billion.
- The unique constraint covered TypeId and ContentData; we added InsertionDate to the key to comply with partitioning requirements (the partitioning column must be part of every unique index).
- The table had one partition per month.
- During the first week of each new month, we marked the previous month's partition as read-only.
- A one-time file backup was then performed for the previous month's partition.
- Deltas (new data) were always located on the last partition.
- The current month's partition was the only piece left in read-write mode.
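A minimal sketch of that monthly setup, assuming a RANGE RIGHT function on InsertionDate (the function, scheme, filegroup and database names here are hypothetical, not the ones we actually used):

```sql
-- Hypothetical monthly partition function and scheme
CREATE PARTITION FUNCTION pf_MonthlyDate (datetime)
AS RANGE RIGHT FOR VALUES ('2010-11-01', '2010-12-01', '2011-01-01');

CREATE PARTITION SCHEME ps_MonthlyDate
AS PARTITION pf_MonthlyDate
TO (fg_201010, fg_201011, fg_201012, fg_201101);

-- Once a month closes, flag its filegroup as read-only...
ALTER DATABASE SampleDB MODIFY FILEGROUP fg_201012 READ_ONLY;

-- ...and take the one-time file/filegroup backup
BACKUP DATABASE SampleDB FILEGROUP = 'fg_201012'
TO DISK = 'X:\Backups\SampleDB_fg_201012.bak';
```

Once a filegroup is read-only and backed up, it can be excluded from the regular backup rotation, which is exactly what made the maintenance side of this design attractive.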
Searching ContentData for a particular TypeId scanned once per partition (a scan count equal to the number of partitions). Bulk data comparison between new and old TypeId + ContentData values performed an index seek on every partition.
SELECT ContentData INTO #temp FROM sampletable WHERE TypeId = 41;
The table is mostly used for retrieving and comparing data, and the retrieval call always passes TypeId as a parameter. The system now performs an index scan on a single partition (scan count 1). For data comparison, an index seek is performed for a unique value under one partition only. Data compression reduced I/O reads by 50%.
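The compression that cut those reads can be estimated and then applied per table; a hedged sketch against the sample table above (the exact compression type we chose is not stated here, PAGE is assumed):

```sql
-- Estimate the savings before committing to a rebuild
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'sampletable',
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = 'PAGE';

-- Apply page compression across all partitions of the table
ALTER TABLE dbo.sampletable
REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE);
```

Note that data compression is an Enterprise Edition feature, which is one of the reasons this design was only possible after the move to 2008 R2 Enterprise.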
In the past, data extraction and comparison scanned through all the partitions, generating excessive I/O; on the other hand, we did not need to worry about continuously backing up older partitions, as they were already flagged as read-only and backed up.
With this exercise we have effectively increased performance at the expense of maintenance. Well worth the change.
Over a year ago we realized that the DBA team was spending a very long time digging through data scattered everywhere, counting billions and billions of rows across various systems, all of which then needed to be beautified in Excel with charts and the whole nine yards before being delivered to business owners.
That prompted us to make the decision to move to the next level. At the beginning we called it our future Data Warehouse, then a Data Mart (as we were going to start with just one subject area), then a Reporting Solution, and finally we decided to just call it "Business Intelligence".
To speed up the process, we hired two consultants: a Data Analyst and a BI Architect. Everything started out well; in the first week we decided what needed to be done first so we could start gathering requirements. Unfortunately, other high-priority tasks took over and the Data Analyst was switched to other duties. That left us with just one BI Architect.
Anthony Sammartino (Twitter | Blog), of TekPartners Business Intelligence Solutions, assumed the roles of Data Analyst, Project Manager and BI Architect. This was not easy, as he needed to spend time creating proper questionnaires and meeting with different business owners to start gathering requirements, and then come up with proper schedules and plan the work.
As not everything went according to plan (business owners needed to prioritize other work), we could not gather enough requirements from all of the different businesses. Plus, there was still a fuzzy understanding of what BI meant.
We switched to a different strategy and started pinpointing the types of reports that were generated mostly from transactional data. Anthony came up with the great idea of replicating that data to a new DB instance and letting our users connect and query it all day long. We set up dedicated Distribution and Subscriber servers under SQL Server 2008 Evaluation Edition and named the Subscriber our Operational Data Store (ODS).
As we continued moving data from different Publishers, we realized that the ODS had become a rather important system: more users started using it, and even the DBA team found it easier to join data from multiple businesses in one instance. We were still in a beta phase, as we needed to prove that this was a viable purchase.
Then SQL PASS Summit 2009 came along. I went to Seattle and learned a ton about performance tuning and BI. I came back to Florida and planned a migration of our Distribution and ODS servers to the SQL Server 2008 R2 November CTP. I had just two things in mind: PowerPivot and the new features in Reporting Services.
We executed the plan, rebuilt both servers with the latest CTP release, re-installed the SSL certificate for SSRS, and we were done. This was performed over a weekend and was completely transparent to the end users.
We continued working with this environment and faced some challenges: multi-million-row partitioned tables without primary keys could not be replicated (transactional replication requires one), we lacked redundancy, and of course we were running a CTP build. But again, this was still a beta phase.
Months went by; we continued deploying more Subscribers, enhancing transactional reports, and enabling filtered indexes and compression on our ODS to reduce the amount of I/O.
We decided that we wanted this system to go live on the RTM version of SQL Server 2008 R2. Licensing was purchased, and new hardware was queued up to be purchased later.
While this happened, we focused our efforts on BI again. We brought in Robin Haropulos from TekPartners. She eventually assumed the Project Manager role and concentrated on one of our largest business units: the one that brought in 100 million records a day.
The team sat down with the business owners, end users, analysts and pretty much everyone who accessed that data, which was crunched from legacy transactional systems.
I decided that we needed a good ETL resource who knew SSIS inside and out. Luis Figueroa (Twitter | Blog), also from TekPartners, came on board, and while he was not yet an expert, he proved to us that he could master it in a matter of months. I bet on that, and thankfully I was right.
I also wanted an additional resource on board with good knowledge of replication and a willingness to learn SSAS. Rich Bartho (Twitter | Blog) switched his main duties and started helping us out on this new project.
We came up with 28 dimensions and 4 facts for the first release. We made sure SSIS performed all of its transformations as an isolated entity and only touched the source to extract data. We also took advantage of Change Data Capture and Replication so the system would pull from a replicated site instead of the main source, minimizing production overhead.
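Enabling Change Data Capture so the ETL can pull only deltas is a two-step affair; a sketch with hypothetical database and table names:

```sql
-- Enable CDC at the database level first
USE SourceDB;
EXEC sys.sp_cdc_enable_db;

-- Then enable it per table that the ETL pulls deltas from
EXEC sys.sp_cdc_enable_table
     @source_schema = N'dbo',
     @source_name   = N'Transactions',  -- hypothetical source table
     @role_name     = NULL;             -- no gating role for readers
```

From there, SSIS packages can read the change table functions CDC generates instead of re-scanning the source, which is what kept the overhead off the production systems.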
After the BI architecture design, project plan, and first fact and dimension requirements were complete, Anthony trained the entire team in SSAS cube design and development, replication design, and BI project management to ensure we were on the correct path to complete the project successfully before he finished his assignment with us at Mediawhiz. I quickly moved another resource, James Yagielo (also from TekPartners), over to help us with the ETL packages. The project ran on track. The Infrastructure team finished building 4 new instances and a High Availability Active/Active cluster with plenty of RAM and I/O to spare; we were hooked up to 48 spindles and 3 GB/s teamed, dedicated network connections to the SAN. They also took care of the Reporting Services instances and created 4 VMs.
We went ahead and soft-launched over 10 days ago, firing up a new Distribution instance, an ODS/Historical instance, a Dimensional instance, and of course the dedicated SSIS/SSAS box. This solution can scale to multiple nodes if additional CPU cycles are required, one of the advantages of clustering. Reporting Services was launched as a scale-out deployment across multiple VMs, load-balanced by an F5.
We ensured that all the data was consistent, checked that the packages were efficient, performed health checks by looking at the wait stats and tuned the instances based on what we found.
Two days ago we presented the solution to the business owners, with multiple sample reports built with SSMS, regular pivot tables, and PowerPivot for Excel. We sliced and diced at the client level and server level, and the results came back fast. They were extremely happy, as they had just found a way to slice "everything by everything". I even heard the statement, "This will be the company's biggest asset."
We will continue adding more facts over the next 3 months and be done with the bulk of the work for this business unit. Then we will extend the current solution to include additional business units.
We will also finalize the migration of the remaining objects: reports running on the old instance, remaining ODS tables, new replication deployments, and of course a legacy SQL Server 2005 SSAS cube.
We had a cube already? Yes, but we never called it BI; we just called it enhanced reporting. The cube itself is fine, but the problem is the legacy ETL. That code will be wiped out and redeployed with best practices onto our SSIS instance.
All our solutions are running under SQL Server 2008 R2 RTM, Enterprise Edition.
Well, that’s it for now. I am lucky to be surrounded by a talented team that has allowed us to get to the next level. And this is just the beginning.
Sam Saffron commented on my blog post asking me if I could share my thoughts on an issue he was experiencing. This was on a newly migrated SQL Server 2008 R2 database solution running on Nehalem-based CPUs. I went ahead and shared what I had experienced in my environment.
It turned out that the issue was not Hyper-Threading related, but a BIOS misconfiguration instead. This is an excerpt from his blog post:
In the process of asking the question and researching it ourselves, we discovered the answer. These Dell servers were inexplicably shipped with BIOS settings that …
did not allow the host operating system to control the CPU power settings
did not set the machine to high performance mode
did not scale CPU speed under load properly
… kind of the worst of all worlds. But Kyle quickly flipped a few BIOS settings so that the machine was set to “hyperspeed mode”, and performance suddenly got a lot better. How much better?
The conclusion of this experience: check your BIOS configuration. Dell needs to tell its customers which settings are beneficial for certain applications, like SQL Server.
Today we had a production migration and brought over one of our legacy applications from a data center located up north to one located in South Florida. The challenges:
1. The locations are disconnected and far from each other
2. We are going up 3 releases. From 2000 to 2008 R2 (8.0 to 10.5)
3. We are Downgrading from Enterprise Edition to Standard Edition
4. We are going from a 32-bit architecture server to a 64-bit one (X86 to X64)
The solution: we separated this into two pieces, a pre-migration process and a production migration process.
Pre-Migration Process:
First, ensure that no Enterprise Edition features are in use. We were lucky this time, as there really were none in the source database. But as an example, we have another server running 2005 Enterprise Edition which will be migrated to 2008 R2 Standard Edition and has partitioned tables. That is something to take care of before migration.
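SQL Server 2008 and later expose a DMV that makes this check straightforward:

```sql
-- Run inside the database to be migrated; any rows returned
-- (e.g. Compression, Partitioning) name Enterprise-only features
-- that must be removed before moving to Standard Edition.
SELECT feature_name, feature_id
FROM sys.dm_db_persisted_sku_features;
```

For a 2000 or 2005 source like ours this DMV is not available, so the check has to be done manually (or after a test restore onto the 2008 R2 target).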
Second, ensure that the database has no corruption. A DBCC CHECKDB needs to be performed on the source before backing it up.
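For example (the database name is illustrative):

```sql
-- Full consistency check; suppress informational messages
-- but report every error found
DBCC CHECKDB ('LegacyDB') WITH NO_INFOMSGS, ALL_ERRORMSGS;
```

A clean run here matters because restoring a corrupt backup onto the target just moves the corruption along with it.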
Third, prepare for a test with the latest full backup. We FTP'd the last full backup over during a weekend and restored it onto the target server running 2008 R2. We gzipped the files before transferring, but it was still painfully slow.
Fourth, restore and ensure that the databases open with no issues whatsoever. Update statistics and rebuild the indexes of any important table that is heavily fragmented. This is a challenge in Standard Edition, as index rebuilds cannot be performed online.
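A sketch of that post-restore cleanup (the database, table and index names are hypothetical):

```sql
USE LegacyDB;

-- Refresh statistics across the database after the version upgrade
EXEC sp_updatestats;

-- Rebuild a heavily fragmented index; Standard Edition forces
-- ONLINE = OFF, so this locks the table and needs a window
ALTER INDEX IX_Orders_Date ON dbo.Orders
REBUILD WITH (ONLINE = OFF, SORT_IN_TEMPDB = ON);
```

Updating statistics after a major version jump is worth the time, since the upgraded optimizer can otherwise produce poor plans from stale statistics.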
Fifth, test and retest. This requires a lot of help from the Application Developers and QA Team. Ensure that functionality is intact and most importantly, that the database is always in a healthy condition (no excessive I/O reads, no high CPU utilization, etc).
Sixth, let the QA team sign-off.
Production Migration Process:
First. Plan the work. Create checklists with detailed steps to be performed before, during and after migration.
Second. Perform a full backup on the source server. We chose Saturday night. Disable the full backup job on the source server. Ensure that transaction log backups are still running at the source.
Third. Compress the backups, FTP to Florida, decompress and start restoring onto the target server.
Fourth. Bring over the transaction logs and continuously apply to the target server.
Fifth. In our environment we were able to disable outbound services and show a maintenance page on the web apps. This way the source database was accepting data but not processing it.
Sixth. Take a last transaction log backup, apply it to the target and open the databases.
Seventh. Update statistics, reindex for required objects and perform sanity check tests.
Eighth. Let the Infrastructure team make appropriate changes (like DNS) and enable services.
Ninth. Sync up any missing data. In our scenario, we were able to code an application that extracted missing transactions (just a few thousand) from the source server and apply to the target server. Let QA Test and Sign-Off.
Tenth. Perform a full backup and enable all maintenance jobs with appropriate schedules.
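Steps two through seven boil down to a manual log-shipping sequence; a sketch with hypothetical names and paths (the restore of a 2000 backup onto 2008 R2 upgrades the database in place):

```sql
-- On the source (SQL Server 2000): full backup, then ongoing log backups
BACKUP DATABASE LegacyDB TO DISK = 'D:\mig\LegacyDB_full.bak';
BACKUP LOG LegacyDB TO DISK = 'D:\mig\LegacyDB_log1.trn';

-- On the target (2008 R2): restore WITH NORECOVERY so that
-- subsequent log backups can still be applied
RESTORE DATABASE LegacyDB FROM DISK = 'E:\mig\LegacyDB_full.bak'
WITH NORECOVERY;
RESTORE LOG LegacyDB FROM DISK = 'E:\mig\LegacyDB_log1.trn'
WITH NORECOVERY;

-- Cutover: apply the final log backup and open the database
RESTORE LOG LegacyDB FROM DISK = 'E:\mig\LegacyDB_logN.trn'
WITH RECOVERY;
```

Keeping the target in NORECOVERY until the final log is applied is what allows the source to stay online for almost the entire migration window.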
Each migration will have its own steps, and some will be more difficult than others. I have outlined, at a very high level, what needed to happen for this particular migration.
What if the unplanned happens?
First, every checklist should have a rollback plan. In our case, we had steps to follow in case we needed to stop the migration. On the DB side it was pretty simple: just stop restoring onto the target server, go back to the source, and re-enable maintenance jobs. Of course, there were other steps that needed to be followed by the development and infrastructure teams.
But today we hit an interesting scenario. While recreating indexes on a 25-million-row table, the DBA stopped the SQL Agent, which was part of the process outlined in the checklist. The problem: it was stopped through the Services panel instead of SQL Server Configuration Manager. So what happened? The cluster service freaked out and decided to perform a failover.
The instance started, but the database status was "in recovery". Per the SQL error log, it was going to take about 2 hours to roll back all the changes. We had two options: (1) just execute the rollback plan, or (2) drop the database and restore it once more.
We decided on option 2, as the transaction logs were small. The bulk of the work was restoring the full backup, but fortunately that was done in under 30 minutes; not bad for a 120 GB database.
Now, how did we drop the "in recovery" database? We opened SQL Server Configuration Manager, took the database instance offline, deleted the data files belonging to the "in recovery" database, and restarted the instance. Of course the SQL error log complained about missing files, so we went ahead and dropped the database.
We are now live on a new 2008 R2 instance and are current with backups. It is interesting to see that with compression, a 120 GB database backup uses just 25 GB of disk space.
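Given the Standard Edition target (which rules out page/row data compression), that 25 GB figure most likely comes from backup compression, which Standard Edition supports starting with 2008 R2. A sketch, with an illustrative database name and path:

```sql
-- Compressed full backup; STATS = 10 reports progress every 10%
BACKUP DATABASE LegacyDB
TO DISK = 'E:\Backups\LegacyDB_full.bak'
WITH COMPRESSION, STATS = 10;
```

Compression can also be made the server-wide default via sp_configure 'backup compression default', so every maintenance job benefits without changing the jobs themselves.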
Over 3 weeks ago we solved the issue we were having with one particular cluster node running SQL Server 2008 R2. We have been going back and forth with Microsoft to obtain an explanation for the erratic behavior of the RDBMS engine.