Migrating SQL Server to Azure at Scale (Pass Summit 2020)

Featured

In November 2020, for the (until then) paid conference SQL Pass 2020, my colleague Avnish Rastogi and myself, presented the topic of how to migrate hundreds of SQL Server databases to Azure at scale.

We executed as Architects within of the Data Migration group at Microsoft, and worked in several engagements assessing multiple source databases and landing them into Azure. From the use cases, we compiled the steps, methodologies and services required to asses and migrate on-premises SQL Server instances to Azure Managed Instance. Leveraging PowerShell, we were able to automate and scale the migration of multiple databases.

Continue reading

VCenter Server 4.1 Running under SQL Server pushes tempdb storage allocation through the roof

And to alleviate the problem temporarily we needed to bounce the instance and clear up the temp database files.

But luckily there is a patch.

More information provided in this article.

We also needed to disable the Read Committed Snapshot Isolation Level in order to keep the growth under control. More information about it this article.

Offending query:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
SELECT HOST.NAME AS HOST, VOLUME.HOST_ID AS HOST_ENTITY_ID, CASE VOLUME_TYPE.TYPE
WHEN 'parallelScsi' THEN CAST(VOLUME_TYPE.HOST_ID AS VARCHAR(255)) + '-' + VOLUME.UUID
WHEN 'block' THEN CAST(VOLUME_TYPE.HOST_ID AS VARCHAR(255)) + '-' + VOLUME.UUID
ELSE VOLUME.UUID
END AS ENTITY_ID, VOLUME.CANONICAL_NAME, VPX_PSA_PATH.LUN_NUMBER, VPX_HOST_BUS_ADAPTER.DEVICE_VAL + ':C' + CAST(VPX_PSA_PATH.CHANNEL_NUMBER AS VARCHAR(3))+ ':T' + CAST(VPX_PSA_PATH.TARGET_NUMBER AS VARCHAR(3)) + ':L' +
CAST (VPX_PSA_PATH.LUN_NUMBER AS VARCHAR(3))
AS RUNTIME_NAME, VOLUME.DISPLAY_NAME AS NAME FROM VPX_SCSI_LUN VOLUME
INNER JOIN VPX_ENTITY HOST ON VOLUME.HOST_ID = HOST.ID
INNER JOIN VPX_PSA_DEVICE DEVICE ON VOLUME.KEY_VAL = DEVICE.LINK_KEY
INNER JOIN VPX_PSA_PATH ON VPX_PSA_PATH.SCSI_LUN_ID = DEVICE.ID
INNER JOIN (SELECT DISTINCT VPX_PSA_PATH.HOST_ID AS HOST_ID, SCSI_LUN.UUID AS UUID, CASE HOST_BUS_ADAPTER.VPX_TYPE
WHEN N'vim.host.ParallelScsiHba' THEN 'parallelScsi'
WHEN N'vim.host.FibreChannelHba' THEN 'fc'
WHEN N'vim.host.InternetScsiHba' THEN 'iscsi'
ELSE 'block'
END
AS TYPE FROM VPX_PSA_PATH, (SELECT VPX_SCSI_LUN.UUID AS UUID, VPX_PSA_DEVICE.KEY_VAL AS KEY_VAL FROM VPX_PSA_DEVICE, VPX_SCSI_LUN
WHERE VPX_PSA_DEVICE.LINK_KEY = VPX_SCSI_LUN.KEY_VAL) SCSI_LUN, (SELECT VPX_HOST_BUS_ADAPTER.VPX_TYPE AS VPX_TYPE, VPX_PSA_ADAPTER.KEY_VAL AS KEY_VAL, VPX_PSA_ADAPTER.HOST_ID AS HOST_ID
FROM VPX_PSA_ADAPTER, VPX_HOST_BUS_ADAPTER
WHERE VPX_PSA_ADAPTER.HOST_ID = VPX_HOST_BUS_ADAPTER.HOST_ID
AND VPX_PSA_ADAPTER.LINK_KEY = VPX_HOST_BUS_ADAPTER.KEY_VAL) HOST_BUS_ADAPTER
WHERE VPX_PSA_PATH.LUN_LINK_KEY = SCSI_LUN.KEY_VAL
AND VPX_PSA_PATH.ADAPTER_LINK_KEY = HOST_BUS_ADAPTER.KEY_VAL
AND VPX_PSA_PATH.HOST_ID = HOST_BUS_ADAPTER.HOST_ID) VOLUME_TYPE ON HOST.ID = VOLUME_TYPE.HOST_ID AND VOLUME.UUID = VOLUME_TYPE.UUID, VPX_HOST_BUS_ADAPTER, VPX_PSA_ADAPTER
WHERE (HOST.TYPE_ID = '1')
AND (VPX_PSA_PATH.ADAPTER_LINK_KEY = VPX_PSA_ADAPTER.KEY_VAL AND VPX_PSA_PATH.HOST_ID = VPX_PSA_ADAPTER.HOST_ID
AND VPX_PSA_ADAPTER.LINK_KEY = VPX_HOST_BUS_ADAPTER.KEY_VAL AND VPX_PSA_ADAPTER.HOST_ID = VPX_HOST_BUS_ADAPTER.HOST_ID)
AND VOLUME.LUN_TYPE='disk'
ORDER BY HOST_ENTITY_ID, ENTITY_ID, RUNTIME_NAME

Max Memory Configuration set too low [SQL Server]

We have been experiencing an issue with a clustered node where instances will run extremely slow and the only process eating CPU resources is “NT Kernel & System”. We have upgraded drivers and even wiped out and reinstalled Windows and SQL Server (both 2008 R2) and the issue reappears.

In the process of troubleshooting with Microsoft, we needed to fail over instances to collect log information. We have done this several times. Because our latest cluster implementations include Active/Active nodes, we need to adjust Max Memory to SQL Server whenever all instances are in one node.

Yesterday something did not go right and a human error caused the instances to failed. Even experienced DBAs will eventually break something, but they will do anything possible to correct. Memory was set to 27 instead of 27000 MB. No connections could be made to the instance.

There are 3 ways this situation can be resolved:

  1. Opening a dedicated connection to the instance (DAC) and changing the max memory configuration.
  2. Shutting down the instance and starting it via command prompt using the -f option. It will fire up the instance with minimal configuration. After that change the max memory configuration.
  3. Bouncing the instance and changing the memory configuration as soon as the instance starts. We are taking advantage of a fresh instance with no allocated cache yet.

Changing the memory configuration:

1
2
3
4
5
6
7
8
sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
sp_configure 'max server memory', 27000;
GO
RECONFIGURE;
GO

Coaching point, double or even triple check before issuing a configuration command.

SQL Server Memory Configuration via SSMS

SQL Server Memory Configuration via SSMS

Excel X64 driver and SQL Server 2008 R2 on cluster node [Fix]

As we continued with the execution of our migration plans from SQL Server 2005 to 2008 R2 X64, we found challenges than needed to be resolved pretty quickly. The latest issue we encountered was the Microsoft.ACE.OLEDB.12.0 provider in SQL Server 2008 R2 X64.

With that same provider, we were able to load Excel files directly into SQL Server 2005 X64 (SP3 and SP4), but had no luck in 2008 R2.

The prerequisites are to install the Microsoft Access Database Engine 2010 Redistributable, and to add the provider, with a couple of configuration changes:

Configuring:

1
2
3
4
5
6
7
8
9
10
11
12
13
exec sp_configure 'show advanced options', 1;
reconfigure;
GO;
 
exec sp_configure 'Ad Hoc Distributed Queries', 1;
reconfigure;
GO;

EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'AllowInProcess', 1;
GO;

EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'DynamicParameters', 1;
GO;

Testing:

1
2
3
4
SELECT *
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
'Excel 12.0;Database=M:\MSSQL10_50.MWTOOLS\MSSQL\DATA\dreambox.xlsx;HDR=Yes;IMEX=1',
'SELECT * FROM [Subsets & Ads$] WHERE Headline IS NOT NULL')

OLE DB provider "Microsoft.ACE.OLEDB.12.0" for linked server "(null)"

OLE DB provider "Microsoft.ACE.OLEDB.12.0" for linked server "(null)"

We had no luck with the provider until we pinpointed a workaround on a blog post using the MSDASQL provider instead

Workaround:

1
2
SELECT * FROM OPENROWSET('MSDASQL',
'DRIVER=Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb);FIL=excel 12.0;DriverId=1046;DefaultDir=M:\MSSQL10_50.MWTOOLS\MSSQL\DATA\dreambox.xlsx', 'SELECT * FROM [Subsets & Ads$]')

After the change our application worked flawlessly.

Windowed Functions empowering analytics [#TSQL2sday]

T-SQL Tuesday #16

T-SQL Tuesday #16

This blog entry is participating in the T-SQL Tuesday #16 for the month of February, hosted by Jes Schultz Borland (Blog|Twitter).

Back in 2001 I was involved in a project migrating an e-commerce site to a new platform. We were mapping and moving data from an Oracle 8.1.5 to an Oracle 8.1.7 instance. The framework was the beloved Dynamo running on Sun Solaris. The Dynamo engineers decided to use sequencers on each entity (e.g. product, category, etc). I was trying to figure out the best reusable PL/SQL block to move the data creating the correct sequence until I was pointed out that Oracle 8.1.6 supported Analytical Functions. After reading and practicing I discovered the power of those functions.

For the purpose of this blog I will refer the Analytical Functions as Windowed Functions instead. They are very useful as it allows the user to crunch measures of subsets of data maintaining the “raw” detail level data. For example, it allows us to get the average price of a product category while also displaying the unit price of each product. It can also help us rank a product by its price and “window” it by category.

Examples

Basic Aggregation:

1
2
3
4
5
-- Qty of products and average price
SELECT
    COUNT(productKey) AS [Quantity]
   ,AVG(StandardCost) AS [AverageCost]
FROM [dbo].[DimProduct]

Distribution of Data:

1
2
3
4
5
6
-- Tier by Product Cost (4 tiers)
SELECT
    NTILE(4) OVER(ORDER BY [StandardCost]) AS [Tier]
   ,productKey
   ,StandardCost
FROM [dbo].[DimProduct]

Average per Category with raw data:

1
2
3
4
5
6
7
-- Take the Average for each Product Subcategory and attach to the detail level data
SELECT
    AVG(StandardCost) OVER(PARTITION BY [ProductSubcategoryKey]) AS [AverageCostSubPerCat]
   ,productKey
   ,[ProductSubcategoryKey]
   ,StandardCost
FROM [dbo].[DimProduct]

Ranking:

1
2
3
4
5
6
7
8
-- Rank Products based on the minimum and maximum cost per Product Subcategory
SELECT [ProductSubcategoryKey]
   ,RANK() OVER(PARTITION BY [ProductSubcategoryKey] ORDER BY StandardCost, [ProductKey]) AS sequence_min
   ,RANK() OVER(PARTITION BY [ProductSubcategoryKey] ORDER BY StandardCost DESC, [ProductKey] DESC) AS sequence_max
   ,[ProductKey]
   ,StandardCost
FROM [dbo].[DimProduct]
WHERE StandardCost IS NOT NULL

Averages excluding certain ranks:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- Get the Average and Qty. of the products per category excluding the minimum and maximum cost per Product Subcategory (above)
WITH CTE_sequence AS (
  SELECT [ProductSubcategoryKey]
     ,RANK() OVER(PARTITION BY [ProductSubcategoryKey] ORDER BY StandardCost, [ProductKey]) AS sequence_min
     ,RANK() OVER(PARTITION BY [ProductSubcategoryKey] ORDER BY StandardCost DESC, [ProductKey] DESC) AS sequence_max
     ,[ProductKey]
     ,StandardCost
  FROM [dbo].[DimProduct]
  WHERE StandardCost IS NOT NULL
)
SELECT [ProductSubcategoryKey]
  ,COUNT([ProductKey]) AS Quantity
  ,AVG(StandardCost) AS Average_Cost
FROM CTE_sequence
WHERE sequence_min > 0
  AND sequence_max > 0
GROUP BY [ProductSubcategoryKey]

There are more functions that can be used and the user even has the option to dice the data further by using HAVING and slicing with GROUP BY CUBE | ROLLUP which plots the data in “pivotable” format.