Windowed Functions empowering analytics [#TSQL2sday]

T-SQL Tuesday #16

T-SQL Tuesday #16

This blog entry is participating in the T-SQL Tuesday #16 for the month of February, hosted by Jes Schultz Borland (Blog|Twitter).

Back in 2001 I was involved in a project migrating an e-commerce site to a new platform. We were mapping and moving data from an Oracle 8.1.5 to an Oracle 8.1.7 instance. The framework was the beloved Dynamo running on Sun Solaris. The Dynamo engineers decided to use sequencers on each entity (e.g. product, category, etc). I was trying to figure out the best reusable PL/SQL block to move the data creating the correct sequence until I was pointed out that Oracle 8.1.6 supported Analytical Functions. After reading and practicing I discovered the power of those functions.

For the purpose of this blog I will refer the Analytical Functions as Windowed Functions instead. They are very useful as it allows the user to crunch measures of subsets of data maintaining the “raw” detail level data. For example, it allows us to get the average price of a product category while also displaying the unit price of each product. It can also help us rank a product by its price and “window” it by category.

Examples

Basic Aggregation:

1
2
3
4
5
-- Qty of products and average price
SELECT
    COUNT(productKey) AS [Quantity]
   ,AVG(StandardCost) AS [AverageCost]
FROM [dbo].[DimProduct]

Distribution of Data:

1
2
3
4
5
6
-- Tier by Product Cost (4 tiers)
SELECT
    NTILE(4) OVER(ORDER BY [StandardCost]) AS [Tier]
   ,productKey
   ,StandardCost
FROM [dbo].[DimProduct]

Average per Category with raw data:

1
2
3
4
5
6
7
-- Take the Average for each Product Subcategory and attach to the detail level data
SELECT
    AVG(StandardCost) OVER(PARTITION BY [ProductSubcategoryKey]) AS [AverageCostSubPerCat]
   ,productKey
   ,[ProductSubcategoryKey]
   ,StandardCost
FROM [dbo].[DimProduct]

Ranking:

1
2
3
4
5
6
7
8
-- Rank Products based on the minimum and maximum cost per Product Subcategory
SELECT [ProductSubcategoryKey]
   ,RANK() OVER(PARTITION BY [ProductSubcategoryKey] ORDER BY StandardCost, [ProductKey]) AS sequence_min
   ,RANK() OVER(PARTITION BY [ProductSubcategoryKey] ORDER BY StandardCost DESC, [ProductKey] DESC) AS sequence_max
   ,[ProductKey]
   ,StandardCost
FROM [dbo].[DimProduct]
WHERE StandardCost IS NOT NULL

Averages excluding certain ranks:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- Get the Average and Qty. of the products per category excluding the minimum and maximum cost per Product Subcategory (above)
WITH CTE_sequence AS (
  SELECT [ProductSubcategoryKey]
     ,RANK() OVER(PARTITION BY [ProductSubcategoryKey] ORDER BY StandardCost, [ProductKey]) AS sequence_min
     ,RANK() OVER(PARTITION BY [ProductSubcategoryKey] ORDER BY StandardCost DESC, [ProductKey] DESC) AS sequence_max
     ,[ProductKey]
     ,StandardCost
  FROM [dbo].[DimProduct]
  WHERE StandardCost IS NOT NULL
)
SELECT [ProductSubcategoryKey]
  ,COUNT([ProductKey]) AS Quantity
  ,AVG(StandardCost) AS Average_Cost
FROM CTE_sequence
WHERE sequence_min > 0
  AND sequence_max > 0
GROUP BY [ProductSubcategoryKey]

There are more functions that can be used and the user even has the option to dice the data further by using HAVING and slicing with GROUP BY CUBE | ROLLUP which plots the data in “pivotable” format.

Who needs DBA Skills? [#TSQL2sDay]

T-SQL Tuesday for November 2010

T-SQL Tuesday for November 2010

Databases are doing just fine. If an issue is encountered add more hardware to the mix. If that does not work then add more servers and move databases to it. No DBAs on site, just developers and a business owner that wants to save money.

What? With that first paragraph I can pinpoint at least 3 main reasons why DBA skills are required:

  1. Adding more hardware is just a tactical approach, never strategical as the new one will encounter the same issue the old one did in a matter of months.
  2. Adding more servers can be complex as databases will reside on different boxes. Messing around with linked servers and executing joins across boxes are not the best approach.
  3. Licensing costs go through the roof! Yes. I have dealt with a solution where the Lead Developer decided to add 8 Servers running SQL Server 2005 Enterprise Edition each. The tab was running over $300K just in licensing costs.

Ok, so what else will a good DBA perform to add value to any company? These come to mind:

  1. To plan a long term solution properly. Architect a database structure in order to be flexible to extensions, following 3rd and 4th normalization forms, with proper relationships, keys and indexes.
  2. To understand the business requirements and design a database layout that will approach them. Which tables will be transactional? Which ones will be historical? Which ones will receive bulk insert operations?
  3. To implement proper security.
  4. If data will grow too large, to prepare the underlying tables to support sliding windows to swap out old data.
  5. To make the business owners aware that a separate entity would be beneficial if reporting is crucial. Nothing better than replicating data to another instance and allowing it be queried all day long.
  6. To make developers understand that business logic shall not run in the database tier. Data validation, cleansing, and any process that will require scalar UDFs are bad. If there is no way to add an app tier, then CLRs can help, but these kind of solutions need to have proper justifications.
  7. To see if filtered and/or include indexes can benefit a solution.
  8. To justify if data compression can be a good trade off with CPU cycles.
  9. To suggest when mirroring, log shipping, replication and/or clustering will benefit.
  10. To understand if I/O will require additional spindles and how to prove it.
  11. To understand if an edition downgrade, release upgrade, virtualization or X64 implementation will add value to any solution.
  12. To have an up to date checklist for production database implementations
  13. To follow a good process: Unit Test, QA, UAT, Production
  14. To learn from the best and to share his knowledge with their team.

DBA skills are not just a nice-to-have. They are a requirement. Do it right the first time and avoid the hassle of fixing in the future.

Once I was approached by a developer and he literally told me:

Every night you should have a job that truncates the logs. Also, you need to shrink the databases to make them faster.

I am not making that up! It is a true story. Imagine if that developer was taking care of the databases. I believe that most developers have no business in the DB world.