Tips and Tricks for SQL-BI: May 2010

Friday, May 28, 2010

T-SQL Puzzle 2 - CASE Statement

What will be the output of below Query:

DECLARE @f as int

SET @f = 1
SELECT CASE 1
    WHEN @f THEN 'First'
    WHEN @f THEN 'Second'
    ELSE 'Nothing' END
AS CaseStmt

Below are the options:
1. First
2. Second
3. Nothing
4. No records
5. syntax Error

SSIS Part 1 - Introduction

SQL Server Integration Services (SSIS) is a component of the Microsoft SQL Server database software which can be used to perform a broad range of data migration tasks.

SSIS is a platform for data integration and workflow applications. It features a fast and flexible data warehousing tool used for data extraction, transformation, and loading (ETL). The tool may also be used to automate maintenance of SQL Server databases and updates to multidimensional cube data. (ref:wikipedia)

While ETL processing is common in data warehousing (DW) applications, SSIS is by no means limited to just DW. At a high level, SSIS provides the ability to:

Retrieve data from heterogeneous data sources (e.g. SQL Server, Flat Files, Excel sheets, CSV files, Raw files, XML files, SharePoint Lists etc.)
Perform various transformations on the data; e.g. Aggregation, Copy/Map, Data Conversion, Derived Column, Union, Merge, Join, Multicast, Sort, Pivot & UnPivot, or any other calculations using .net scripting languages (C# or VB.NET).
Load data into any source (as mentioned above).
Define a workflow

History: The first version of SSIS was released with SQL Server 2005. SSIS is a replacement for Data Transformation Services (DTS) which was available with SQL Server 7.0 and SQL Server 2000. SSIS builds on the capabilities introduced with DTS.

In the following parts of this thread, I will be covering a number of topics that you need to understand in order to successfully build an SSIS package. High level outlines are undermentioned:

Creating SSIS packages with SQL Server Management Studio (SSMS)
Creating a simple SSIS package in BIDS
Control Flow Components
Data Flow Components
Package Configurations
Error Handling and Logging
Deploying SSIS packages
Different ways of Executing SSIS packages and many more...

Wednesday, May 26, 2010

T-SQL Puzzle 1: WHERE & GROUP BY

What will be the output of query given below. There is no FROM clause and no GROUP BY clause.

Does it throw an error?
Do you get any resultset?
Do you get a single row containing NULL?
Do you get a single row with 'Something' data?
Do you get multiple rows?
Cheers!

SELECT 'Something'
WHERE 1=2
HAVING 1=1

Note: You can easily check your answer by running the query but please check your skills before you do so.

Thursday, May 20, 2010

Using Analysis Services Execute DDL Task in SSIS

SQL Server Integration Services (SSIS) is a Business Intelligence tool used to perform Extract, Transform & Load (ETL) operations. There are few tasks available in SSIS control flow to create, drop, update, and process CUBE objects. There are different types of Analysis Services tasks available in SSIS. For example:

Analysis Services Execute DDL Task
Analysis Services Processing Task
Data Mining Query Tasks

This post will explain you how to use Analysis Services Execute DDL Task.

Analysis Services Execute DDL Task

SQL Server Analysis Services Execute DDL Task can be used to create, modify, and delete Analysis Services objects. The Analysis Services Execute DDL Task is similar to the Execute SQL Task, the difference is that using the Analysis Services Execute DDL Task we can issue DDL statements against an Analysis Services. The DDL statements can be used to create cubes, dimensions, KPI’s, Calculation, Cube Partitions Roles or any other OLAP objects.

The Analysis Services Processing Task can be used to process analysis services objects such as cubes, and dimensions.

How to use Analysis Services Execute DDL Tasks
In this example, I will be using Analysis Services Execute DDL Task to create a database in Analysis Services. To begin, suppose you have "Sales" database in Analysis Services in Development environment and you want to create the same database in new environment (e.g. Test, Production). Below are the steps to achieve this goal:
Step 1: Connect to Analysis Services in development environment and generate XMLA script of "Sales: database. Save this file with Sales.xmla name to physical location e.g. D:\Test\Sales.xmla

Stpe 2: Create new SSIS package. Add new Analysis Services connection manager. Give required Server name and leave Initial Catalog blank. Click on Test Connection to make sure it is connected and finally click OK to close Connection Manager wizard.

Stpe 3: You can add a package variable "ServerName" to assign SSAS server name. Map this variable with ServerName in Expression of connection manager properties as shown below. Make sure this variable is configured in config file.

Step 4: Create new file connection manager for xmla file. This Connection Manager will be renamed with "Sales.xmla".

Step 5: Drag and Drop Analysis Services Execute DDL Task. Rename this with "Analysis Services Execute DDL Task - Create Sales Cube". Now Double click on the task to open Analysis Services Execute DDL task Editor. Set Connection as Name "localhost.SSAS", Source Type as "File Connection", and Source as "Sales.xmla". Click OK to complete.

Step 6: Save this package. Now you can move this package along with sales.xmla file and config file to any environment and execute. It will create Sales cube in that box. Make sure that the path for xmla file is same as development environment othewise you need to add the path in config file to make it independent of environment.

Note: Before executing this package, change the value of ServerName variable with current environment in config file. Click here to see how to create config file in SSIS.

Saturday, May 15, 2010

Time Dimension

After posting script for Date Dimension, I thought of sharing script for Time Dimension as well. Some times we need to drill down the Cube data by time hierarchy e.g. AM/PM-->Hrs-->Min. Below script will create Time dimension and solve the purpose.

USE [DatabaseName]
GO

IF OBJECT_ID('DimTime') IS NOT NULL
DROP TABLE DimTime
GO

CREATE TABLE [dbo].[DimTime]
(
   [DimTimeSK] [int] NOT NULL,
   [Time] [varchar](11) NOT NULL,
   [Time24] [varchar](8) NOT NULL,
   [HourName] [varchar](5),
   [MinuteName] [varchar](8),
   [Hour] [tinyint],
   [Hour24] [tinyint],
   [Minute] [tinyint],
   [Second] [int],
   [AM] [char](2)
) ON [PRIMARY]

GO

DECLARE
   @DimTimeSK int,@Date datetime, @AM char(2),
   @hour24 tinyint, @hour tinyint,
   @minute tinyint, @second int

SET @DimTimeSK = 0

WHILE @DimTimeSK < (60*60*24)
BEGIN
   SET @DimTimeSK = @DimTimeSK + 1
   SET @Date = DATEADD(second,@DimTimeSK,convert(datetime, '1/1/2007'))
   SET @AM = right(convert(varchar,@Date,109),2)
   SET @hour24 = DATEPART(hour, @Date)
   SET @hour = CASE WHEN @AM = 'PM' THEN @hour24 - 12 ELSE @hour24 END
   SET @minute = DATEPART(minute, @Date)
   SET @second = DATEPART(second, @Date)

   INSERT INTO dbo.DimTime
   (
      [DimTimeSK]
      ,[Time]
      ,[Time24]
      ,[HourName]
      ,[MinuteName]
      ,[Hour]
      ,[Hour24]
      ,[Minute]
      ,[Second]
      ,[AM]
   )

   SELECT
   @DimTimeSK AS [DimTimeSK]
   ,right('0'+ convert(varchar,@hour),2) + ':' +
   right('0'+ convert(varchar,@minute),2) + ':' +
   right('0'+ convert(varchar,@second),2) + ' ' + @AM AS [Time]
   ,convert(varchar,@Date,108) [Time24]
   ,right('0' + convert(varchar,@hour),2) + ' ' + @AM AS [HourName]
   ,right('0' + convert(varchar,@hour),2) + ':' +
   right('0' + convert(varchar,@minute),2)+ ' ' + @AM AS [MinuteName]
   ,@hour AS [Hour]
   ,@hour24 AS [Hour24]
   ,@minute AS [Minute]
   ,@second AS [Second]
   ,@AM AS [AM]
END
GO

Friday, May 14, 2010

Date Dimension

Below is the script to create Date Dimension.

USE [DatabaseName]
GO

IF OBJECT_ID('Date','U') IS NOT NULL
DROP TABLE Date
GO

CREATE TABLE [dbo].[Date](
   [DateSK] [int] NOT NULL,
   [FullDate] [datetime] NOT NULL,
   [DateName] [char](11) NOT NULL,
   [DayOfWeek] [tinyint] NOT NULL,
   [DayNameOfWeek] [char](10) NOT NULL,
   [DayOfMonth] [tinyint] NOT NULL,
   [DayOfYear] [smallint] NOT NULL,
   [WeekdayWeekend] [char](7) NOT NULL,
   [WeekOfYear] [tinyint] NOT NULL,
   [MonthName] [char](10) NOT NULL,
   [MonthOfYear] [tinyint] NOT NULL,
   [CalendarQuarter] [tinyint] NOT NULL,
   [CalendarYear] [smallint] NOT NULL,
   [CalendarYearMonth] [char](7) NOT NULL,
   [CalendarYearQtr] [char](15) NOT NULL,
CONSTRAINT PK_Date_DateID PRIMARY KEY (DateSK)
) ON [PRIMARY]
GO
RAISERROR('Table Date created successfully!',0,1)

DECLARE @StartDate datetime, @EndDate datetime
-- Set StartDate and EndDate as per your requirement
SELECT @StartDate = '2009-01-01', @EndDate = '2010-12-31'

WHILE (@StartDate <= @EndDate )
BEGIN
INSERT INTO Date
SELECT
   CAST(CONVERT(varchar(8),@StartDate,112) AS int) DateSK
   ,@StartDate AS [Date]
   ,CONVERT(varchar(20),@StartDate,106) AS DateName
   ,DATEPART(DW,@StartDate)   [DayOfWeek]
   ,DATENAME(DW,@StartDate) [DayNameOfWeek]
   ,DATENAME(DD,@StartDate) [DayOfMonth]
   ,DATENAME(DY,@StartDate) [DayOfYear]
   ,CASE WHEN DATEPART(DW,@StartDate) IN (1,7) THEN 'WeekEnd'
             ELSE 'WeekDay' END     [WeekdayWeekend]
   ,DATEPART(WW,@StartDate) [WeekOfYear]
   ,DATENAME(MM ,@StartDate) [MonthName]
   ,DATEPART(MM ,@StartDate)   [MonthOfYear]
   ,DATEPART(QQ,@StartDate)     [CalendarQuarter]
   ,DATEPART(YY ,@StartDate)    [CalendarYear]
   ,DATENAME(YY,@StartDate)+'-'+RIGHT('0'+CAST(Month(@StartDate) as varchar),2) [CalendarYearMonth]
   ,DATENAME(YY,@StartDate)+'-Q'+DATENAME(QQ,@StartDate) [CalendarYearQtr]

   SET @StartDate = @StartDate +1
END
GO

Date Dimension is ready to use as soon as you execute this script in required database.
You can create following Calendar Hierarchy in your dimension to provide drill down featrues for Growth, Trends, and ToDate calculations (e.g. YTD, QTD, MTD).
Year-->Quarter-->Month-->Week-->Date

Thursday, May 13, 2010

SSIS Package Configuration in SQL Server 2008

After developing SQL Server Integration Service (SSIS) packages, it is highly unlikely that you can deploy the package and execute in the production environment directly. You may have to work in the development environment following Test and UAT environments and later deploying to the Production environment. For all these environments you probably have different servers and folders to work with.

Apart from different environments, sometimes, there are others changes. Sometimes the client may change the drive from C to D or D to E, or change the Database name. If you are using SMTP mail servers, you may have to change the server IP and authentication when the environment changes. So whenever the environment changes, you may have to change all the configuration settings associated with SSIS packages.

You can avoid all the headache of changing these settings by using package configuration.

Here are the steps to setup Package Configuration in SQL Server 2008.

1. Once the SSIS package is developed, Right Click on surface area and select Package Configurations...

2. It will open Package Configuration Organizer. Check Enable package configurations and click Add... button.

3. It will open Package Configuration Wizard. Click Next to proceed.

4. There are few configuration types available with package configurations. They are: XML configuration file, Environment variable, Registry entry, Parent package variable and SQL Server. Select Configuration Type from dropdown. Write configuration file name with full path in Configuration file name textbox.

5. Suppose we need to select database settings for OLEDB Connection Manager Local.MyDatabase which is the connection manager for the SQL Server database. Then you will need to select the properties you require to save as a package configuration from the following screen.

For connection manager, you can either select entire ConnectionString property or you can select ServerName, UserName, Password, and InitialCatalog to construct the connection string. I prefer latter one as it gives more options when there is a change.

6. Click on Next button followed by Finish button to complete the wizard. Now you can see config.DtsConfig file at the location you mentioned in step 3. Below is snapshot of config file (I did some formatting for better visulation)

7. That is all you have to do for this example. The next time you load the package, your package will read the configurations from an XML file. You can verify this by changing the XML file contents and reloading the package. After reloading the package, view the connection manager properties and you can see that the connection manager properties are updated from the values in the XML files are updated for the properties.

I personally prefer using Package Variables in config file instead of connection managers because that gives me facility to use in hundreds of packages where Server and Database name are same. however, you need to map the variables in connection manager properties.

Tuesday, May 11, 2010

T-SQL script for BULK INSERT

Here is the T-SQL code to BULK insert data from text file to SQL Table:

BULK INSERT [Database]..[TableName]

FROM 'D:\test.txt' -- Path of text file
WITH
(
   FIRSTROW = 1
   ,BATCHSIZE = 10000
   ,FIELDTERMINATOR = ','
   ,ROWTERMINATOR = '\n'
   ,LASTROW = 20
)
GO

Here is the description of Keywords used:

FIRSTROW
Specifies the number of the first row to load. The default is the first row in the specified data file.

BATCHSIZE
Specifies the number of rows in a batch. Each batch is copied to the server as one transaction. If this fails, SQL Server commits or rolls back the transaction for every batch. By default, all data in the specified data file is one batch.

FIELDTERMINATOR
Specifies the field terminator to be used for char and widechar data files. The default field terminator is \t.

ROWTERMINATOR
Specifies the row terminator to be used for char and widechar data files. The default row terminator is \r\n (newline character).

LASTROW
Specifies the number of the last row to load. The default is 0, which indicates the last row in the specified data file.

Attach and Detach a database using T-SQL

Attach & Detach a database -

Sunday, May 9, 2010

Count Number of Rows for Tables of a Database

Easiest way to get an exact value of Number of Rows for all the tables in a SQL Server database

1. Use DBCC UPDATEUSAGE - this updates the values of rows for each partition in a table.
2. Use undocumented stored procedure sp_msForEachTable and store the result set in a table.

Below is the query to get required output:

USE [DatabaseName]
GO

DECLARE @DynSQL NVARCHAR(255)
SET @DynSQL = 'DBCC UPDATEUSAGE (' + DB_NAME() + ')'
EXEC(@DynSQL)

IF OBJECT_ID('tempdb..#T','U') IS NOT NULL
DROP TABLE #T
GO

CREATE TABLE #T (TableName nvarchar(500),NumberOfRows int)
GO

INSERT INTO #T
EXEC sp_msForEachTable 'SELECT PARSENAME(''?'', 1) as TableName,COUNT(*) as NumberOfRows FROM ?'
GO
SELECT * FROM #T ORDER BY NumberOfRows DESC

Friday, May 7, 2010

T-SQL Query to Calculate Month End Date

Here is the easiest way to calculate Month End Date for any given date:

SELECT DATEADD(MM, DATEDIFF(MM, 0, GETDATE()) + 1, 0)-1 AS MonthEndDate

Example:
If you replace GetDate() with any date, above query will return the Month End Date for that particular month.
If GetDate() value is '2010-01-25' then Output will be '2010-01-31'
If GetDate() value is '2010-02-20' then Output will be '2010-02-28'