Business Requirements. What are those? [#TSQL2sday]

This quick post is for T-SQL Tuesday for December 2010, hosted by Steve Jones. The subject is “What issues have you had in interacting with the business to get your job done?”

T-SQL Tuesday, December 2010

Have you ever been in a situation where you were told to develop and deliver a certain application because the business folks had already sold it to a particular buyer, even with a preset delivery date? You are not alone. I have experienced similar situations without even knowing “what” we were supposed to deliver, much less “how” to do it.

Business and technology need to create a partnership, and communication needs to flow back and forth. This is where a very strong business analyst with technical knowledge comes into play. If the business would like to have a portfolio of products or services it can sell, that needs to be communicated to the business analyst, who in turn will ask the technical team questions to get an idea of the feasibility and viability of the product. If there are technical questions, they can be directed to the business analyst or, if required, even to the business owners directly.

Technical teams cannot develop without a solid business requirement, or at least a good understanding of what the business folks want. I agree that most of the time, especially with new products or services, complete business requirements are unfeasible, but the business analyst needs to capture what the business folks want conceptually in a technical form, pointing out relevance and priority.

Iterative development and deployment is a trend more companies are following nowadays. As releases get to production much more quickly, business folks get the opportunity to review the original requirements/concepts and recommend changes. It also builds confidence in the technical team, as results are delivered in small but progressive releases. This is key to continuously building a partnership.

My opinion is that technical teams need to understand what they are going to develop and for what purpose: analyze and suggest the technology to be used, and work with the project manager and/or business analyst to come up with a timeline. Interact with the business folks to clarify any doubts, and prototype the solution before engaging in a full-blown project. Approach short releases if viable and build confidence with the business team. Never be afraid to ask questions; if you do not know “what” needs to be developed, rest assured that “how” you do it will be wrong.

Table Structure, Trading Maintenance with Performance [SQL Server]

We recently migrated a database from SQL Server 2005 x86 to 2008 R2 x64. Running an Enterprise Edition license, we were able to take advantage of partitioning. The database had several tables with similar structures and row counts north of 2 billion.

Old Structure:

  1. 3 fields
    1. TypeId INT
    2. ContentData VARCHAR(100)
    3. insertionDate DATETIME
  2. Unique constraint: TypeId, ContentData. InsertionDate was added to comply with partitioning requirements (the partitioning column must be part of the unique index).
  3. Table had a partition per month (a hedged DDL sketch follows this list).
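
A minimal sketch of what that layout might have looked like in T-SQL; the partition function, scheme, and filegroup names are made up, and only sampletable and its three columns come from the post:

-- Old layout: one partition per month on insertionDate.
-- pf_MonthRange / ps_MonthRange are hypothetical names; the real deployment
-- mapped each monthly partition to its own filegroup.
CREATE PARTITION FUNCTION pf_MonthRange (DATETIME)
AS RANGE RIGHT FOR VALUES ('2010-01-01', '2010-02-01', '2010-03-01'); -- one boundary per month

CREATE PARTITION SCHEME ps_MonthRange
AS PARTITION pf_MonthRange ALL TO ([PRIMARY]); -- in practice, one filegroup per month

CREATE TABLE dbo.sampletable
(
    TypeId        INT          NOT NULL,
    ContentData   VARCHAR(100) NOT NULL,
    insertionDate DATETIME     NOT NULL,
    -- insertionDate is in the unique key only to satisfy partition alignment
    CONSTRAINT UQ_sampletable UNIQUE (TypeId, ContentData, insertionDate)
) ON ps_MonthRange (insertionDate);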

Pros

  1. On the first week of each new month, we marked the previous month's partition as read-only.
  2. A one-time file backup was performed for the previous month's partition (see the sketch after this list).
  3. Deltas (new data) would always be located in the last partition.
  4. The current month's partition would be the only piece left in read-write mode.
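
A minimal sketch of that monthly maintenance step, assuming one filegroup per monthly partition; the database and filegroup names (SampleDB, FG_2010_11) are hypothetical:

-- Mark the previous month's filegroup read-only, then take its one-time file backup.
ALTER DATABASE SampleDB MODIFY FILEGROUP FG_2010_11 READ_ONLY;

BACKUP DATABASE SampleDB
    FILEGROUP = 'FG_2010_11'
TO DISK = 'X:\Backups\SampleDB_FG_2010_11.bak';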

Cons

  1. Searching ContentData for a particular TypeId scanned as many partitions as the table had, as the query and I/O statistics below illustrate.
  2. Bulk data comparison between new and old TypeId + ContentData performed an index seek on each partition.
SELECT ContentData
INTO #temp
FROM sampletable
WHERE TypeId  = 41;
Table 'sampletable'. Scan count 92, logical reads 7613, physical reads 223, read-ahead reads 7808, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Original Plan (execution plan screenshot)

New Structure:

  1. 3 fields
    1. TypeId INT
    2. ContentData VARCHAR(100)
    3. insertionDate DATETIME
  2. Unique constraint: TypeId, ContentData.
  3. Table has a partition per TypeId range (50 values each, e.g. 1-50, 51-100, …).
  4. Index on insertionDate with TypeId as an included column.
  5. Page compression enabled (a hedged DDL sketch of this layout follows the list).
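
A hedged T-SQL sketch of the new layout; pf_TypeRange, ps_TypeRange, and the index name are made up, and whether the unique constraint is the clustered index is not stated in the post:

-- New layout: one partition per range of 50 TypeId values, page compression,
-- and a date index with TypeId included for delta retrieval.
CREATE PARTITION FUNCTION pf_TypeRange (INT)
AS RANGE RIGHT FOR VALUES (51, 101, 151); -- one boundary per 50 TypeId values

CREATE PARTITION SCHEME ps_TypeRange
AS PARTITION pf_TypeRange ALL TO ([PRIMARY]);

CREATE TABLE dbo.sampletable
(
    TypeId        INT          NOT NULL,
    ContentData   VARCHAR(100) NOT NULL,
    insertionDate DATETIME     NOT NULL,
    -- TypeId is now the partitioning column, so insertionDate is no longer needed in the key
    CONSTRAINT UQ_sampletable UNIQUE (TypeId, ContentData)
) ON ps_TypeRange (TypeId)
WITH (DATA_COMPRESSION = PAGE);

CREATE NONCLUSTERED INDEX IX_sampletable_insertionDate
    ON dbo.sampletable (insertionDate)
    INCLUDE (TypeId)
    WITH (DATA_COMPRESSION = PAGE)
    ON ps_TypeRange (TypeId);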

Pros

  1. Searching for a particular TypeId + ContentData generates a single index seek plan, 1 scan count (1 partition).
  2. Scanning ContentData for a particular TypeId generates an index scan for only 1 partition (the query and I/O statistics below show a single scan).
  3. Bulk data comparison between new and old TypeId + ContentData performs an index seek in one partition only.
  4. Page compression reduced I/O reads by 50%.

Cons

  1. All data files are in read-write mode, so every file needs to be considered in the backup maintenance plan.
  2. Any partition can still receive updates even if it holds data from 4 years ago (it cannot be tagged as historical).
  3. Retrieving deltas performs an index seek on every partition.
SELECT ContentData
INTO #temp
FROM sampletable
WHERE TypeId  = 41;
Table 'sampletable'. Scan count 1, logical reads 5449, physical reads 66, read-ahead reads 5445, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
New Plan (execution plan screenshot)

Conclusion:

The table is mostly used for retrieving and comparing data, and the retrieval call always passes TypeId as a parameter. The system now performs an index scan against a single partition (1 scan count). For data comparison, an index seek is performed for a unique value within 1 partition only. Page compression reduced I/O reads by 50%.

In the past, data extraction and comparison scanned through all the partitions generating excessive I/O calls, but we did not need to worry about continuously backing up older partitions as they were already flagged as read-only and backed up.

With this exercise we have effectively increased performance at the expense of maintenance. Well worth the change.

I/O Issues on a newly deployed SQL Server cluster attached to a SAN [Windows 2008 R2]

We executed the migration of stand-alone database servers to an active/active cluster. This was for our enterprise-wide reporting solution, which includes a scaled-out deployment of Reporting Services with F5 BIG-IP as a load balancer.

I/O challenges:

We migrated the Distributor instance over and needed to re-snapshot several tables; we planned for 36 hours of downtime. Small tables were copied over quickly, but we struggled with large tables, with IO_COMPLETION and ASYNC_NETWORK_IO as the top waits during the bulk copy. In this scenario, data was brought over to the distributor instance and then bulk copied to the subscriber. Both instances reside on the same node, each one with its own properly assigned storage on a Dell EqualLogic SAN.
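
For reference, a generic way to confirm the top waits (a sketch against sys.dm_os_wait_stats, not the exact query we ran):

-- Top waits accumulated since the last restart (or since the counters were cleared).
SELECT TOP (10)
       wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP', 'BROKER_TASK_STOP') -- trim a few idle waits
ORDER BY wait_time_ms DESC;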

The first thing we attempted was to change the NIC link negotiation from auto to 1000/full. The change made no difference, and all instances bounced immediately after it, a surprise, as we never thought that changing the configuration of one NIC would affect the database instances; the affected NIC was neither the heartbeat nor one of the iSCSI NICs.

We continued with the bulk copy until we found that a 110 GB table was taking extremely long to complete. I decided to exclude the largest tables from replication and take another approach for them (backup, restore, batch insert and partition switch-in; I might write about this in another blog post).

We continued with the remaining objects and declared success on Sunday at 10 PM. Everything looked good except for four of the very large tables, which were not included due to the time required to copy them over.

I/O Pending Requests (ms) - chart

Re-Snapshot and subsequent Rollback:

Something interesting happened: a bulk copy was rolling back on Monday morning, and it took around 8 hours to complete. The log file grew to 125 GB and hit the storage limit. We were not sure why, as the table had been bulk copied successfully the night before. We removed the table from replication and decided to continue troubleshooting the I/O problems.
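
Two quick checks that help in this situation (generic commands, not specific to this environment):

-- Log size and percentage used for every database on the instance.
DBCC SQLPERF (LOGSPACE);

-- Why a log cannot be truncated (e.g. ACTIVE_TRANSACTION, REPLICATION).
SELECT name, log_reuse_wait_desc
FROM sys.databases;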

I/O resolution:

I was able to reproduce the I/O symptoms at the OS level by copying files larger than 4.4 GB: I would see rates of 15 MB/s instead of the 500 MB/s I was getting with smaller files. We failed over expecting to isolate the issue to one node, but the issue reoccurred. We upgraded the NIC drivers to the latest version, published a few weeks earlier, but saw no improvement. The SAN admin then found an article about disabling Windows Scaling heuristics and decided to give it a try.

Key:   HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\Tcpip\Parameters
Value: EnableWsd = 0

The registry was updated, the nodes were bounced (one at a time) and the tests were re-run. No more I/O contention, and replication was flowing with a few ms of I/O pending requests instead of the 200~400 we were hitting before.

SELECT
    t1.database_id,
    t1.file_id,
    t1.io_stall,
    t2.io_pending_ms_ticks,
    t2.scheduler_address
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS t1
INNER JOIN sys.dm_io_pending_io_requests AS t2
    ON t1.file_handle = t2.io_handle;

Re-Snapshot and subsequent Rollback, again:

With I/O fixed, we now needed to re-snapshot the missing table that had rolled back. We did so on Tuesday night; everything went well, and then on Wednesday morning a rollback was pending. We again hit the 125 GB storage limit for the log. I asked the SAN admin to extend it to 200 GB, asked one of the DBAs to exclude the table from replication, and let the undelivered commands flow to the subscriber.

Undelivered Transactions:

The rollback took another 6 hours to complete, and at that point I was just waiting for replication to catch up. 30 minutes later, nothing had happened. We waited 30 more minutes and still nothing. Jobs were bounced and SQL Server Agent was restarted on the Distributor and the Subscriber, with no effect. I asked a question on Twitter and was advised to call Microsoft. Before I did, I called a friend of mine, Anthony Sammartino (Twitter | Blog), and left a message. While I was calling Microsoft, Anthony called me back and explained that there might be errors with DML statements being attempted on the subscriber. He discussed it with one of the DBAs, and we were able to get replication going by changing the distribution agent profile. We had over 5 million undelivered commands pending.
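
For reference, the pending command count can be pulled from the distribution database with sp_replmonitorsubscriptionpendingcmds; a minimal sketch where every name below is a placeholder for the real publisher, publication and subscriber:

-- Executed at the Distributor, in the distribution database.
USE distribution;

EXEC sp_replmonitorsubscriptionpendingcmds
    @publisher         = N'PUBLISHER_SERVER',
    @publisher_db      = N'PublishedDB',
    @publication       = N'SamplePublication',
    @subscriber        = N'SUBSCRIBER_SERVER',
    @subscriber_db     = N'SubscriberDB',
    @subscription_type = 0; -- 0 = push, 1 = pull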

Stabilization of the Environment:

By 8 PM we were up to date with all tables except the bulk copy that had rolled back. I decided to stop making changes to the environment for the long Thanksgiving weekend and made sure the environment was stable enough to continue working on it the following Monday.

Re-Snapshot and subsequent Rollback resolution:

We figured out that a filter was later raising an exception, which caused the publisher to reinitialize replication for the object and issue a re-snapshot. It had been performing bulk copies and then rolling them back several times during the night. The filter was defined as column NOT LIKE ‘%value%’.
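
For context, this is roughly how such a horizontal (row) filter is attached to a transactional article; the publication, article, filter name and filter value below are placeholders, not our actual objects:

-- Hypothetical example of the kind of row filter involved.
EXEC sp_articlefilter
    @publication   = N'SamplePublication',
    @article       = N'sampletable',
    @filter_name   = N'FLT_sampletable',
    @filter_clause = N'ContentData NOT LIKE ''%value%''';

-- Regenerate the synchronization view so the filter takes effect.
EXEC sp_articleview
    @publication   = N'SamplePublication',
    @article       = N'sampletable',
    @filter_clause = N'ContentData NOT LIKE ''%value%''';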

We decided to remove the filter and re-snapshot the table today. Fifteen minutes later, 60 million rows had been inserted into the subscriber. No issues so far.

Things are now looking much better.

I am my own brand [Un-SQL]

I am done with all my meetings, prepared for tomorrow's migration and deleted all the fluff from my inbox. Now I have time to add a post for Un-SQL Friday, an idea by @MidnightDBA; yes, the one that is rough on vendors that do not know anything about databases and on DBA wanna-bes that are just incompetent (won't argue with that). The original post is located here.

Anyway, I am my own brand. Does it matter? Maybe. That is why I created ozamora.com. That is my brand. But do I want to be known? Not necessarily, but I would like to be seen as someone that has helped build solutions or solve problems, because that is pretty much what I do on a daily basis. I also like sharing what I have learned from others.

DJ Zamo

Now I wish I had a chance to post the most interesting stuff I do so I could go back to it and say “oh, this is how I did it” instead of trying to remember. It has happened to me several times, and no, searching emails does not always answer my questions.

But why ozamora? Just to keep it simple. I have used ZamoTrance, Zamo, OZ, DJ Zamo, and some have even called me Zamorita, Zamito, and the Wizard of OZ. I have not registered those yet, as I am waiting for a GoDaddy deal, ha!

Now, Twitter. Yes, I am late to the game. I started with @Oscar_Zamora, then changed to @__ZT and finally settled on @ZamoraO. Why not @ozamora? Well, you guessed it: someone registered it a long time ago and has never tweeted. Twitter needs to phase out all the stale usernames…

I did register zamorao.com, so I guess I am covering my brand.

Has it helped? Certainly. I met many people at the PASS Summit this year, and they recognized me just because of my Twitter name.

I will continue with ozamora.com and pack it with information that interests me and might be useful to the community. My goal is to excel at what I do and transfer knowledge to my team. If I do not learn something new today, I feel like I just wasted the day, so I make sure that does not happen.

That’s it. Now follow me.

Denali, the next release [SQL Server]

After a week at the PASS Summit 2010 and after catching up on several tasks at work, I am now able to update my blog with some exciting information. SQL Server codename “Denali” CTP1 was formally released on 11/09/2010 (version 11.0). I was able to obtain a DVD copy while at the summit, and it can now be downloaded too.

The key takeaways:

  1. AlwaysOn. This is a full-blown HADR solution that allows multiple asynchronous and synchronous mirrors of a database, with a transparent failover option to a local or remote datacenter. This is the feature I am really looking forward to.
  2. VertiPaq embedded in the relational engine. The SQL Server engineers ported the same code used for PowerPivot into the RDBMS. The feature will allow column-level, ultra-high compression of repetitive data, which should dramatically increase query efficiency, by at least ten times.
  3. BISM with DAX support. The Business Intelligence Semantic Model in Analysis Services will allow retrieval of data using DAX queries, the same ones used in PowerPivot.
  4. Sequence generator. It is never too late for a feature like this (a quick syntax sketch follows this list).
  5. Enhanced windowed functions.
  6. FileTable support. I have been waiting for this feature since the Oracle 9i days. It will allow us to manage files directly within SQL Server.
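
A quick, hedged taste of two of these based on what has been announced for Denali (not verified against CTP1; the object names are made up):

-- Sequence generator: a schema-bound number generator independent of any table.
CREATE SEQUENCE dbo.OrderNumbers
    AS INT
    START WITH 1
    INCREMENT BY 1;

SELECT NEXT VALUE FOR dbo.OrderNumbers AS NextOrderNumber;

-- Enhanced windowed functions: e.g. LAG to compare a row with the previous one.
SELECT TypeId,
       insertionDate,
       LAG(insertionDate) OVER (PARTITION BY TypeId ORDER BY insertionDate) AS PreviousInsert
FROM dbo.sampletable;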

There are certainly more enhancements, which will continue to pop up as the product matures.

Now, it is a challenge for us to learn and prepare for the new features right after upgrading to 2008 R2. Well, Microsoft is doing a pretty good job of keeping us busy, and I am OK with that. They were kind enough to publish Books Online for Denali, though.