SQL Server Integration Services (SSIS) is a component of Microsoft SQL Server that can be used to perform a broad range of data migration tasks. SSIS is a platform for data integration and workflow applications. It features a fast and flexible data warehousing tool used for data extraction, transformation, and loading (ETL). The tool may also be used to automate maintenance of SQL Server databases and updates to multidimensional cube data. (source: Wikipedia) While ETL processing is common in data warehousing (DW) applications, SSIS is by no means limited to DW. At a high level, SSIS provides the ability to:
Retrieve data from heterogeneous data sources (e.g. SQL Server, Flat Files, Excel sheets, CSV files, Raw files, XML files, SharePoint Lists etc.)
Perform various transformations on the data, e.g. Aggregation, Copy/Map, Data Conversion, Derived Column, Union, Merge, Join, Multicast, Sort, Pivot & UnPivot, or any other calculations using the .NET languages (C# or VB.NET).
Load data into any destination (such as those mentioned above).
Define a workflow
History: The first version of SSIS was released with SQL Server 2005. SSIS is a replacement for Data Transformation Services (DTS) which was available with SQL Server 7.0 and SQL Server 2000. SSIS builds on the capabilities introduced with DTS.
In the following parts of this series, I will be covering a number of topics that you need to understand in order to successfully build an SSIS package. A high-level outline follows:
Creating SSIS packages with Business Intelligence Development Studio (BIDS)
SQL Server Integration Services (SSIS) is a Business Intelligence tool used to perform Extract, Transform & Load (ETL) operations. There are a few tasks available in the SSIS control flow to create, drop, update, and process cube objects. There are different types of Analysis Services tasks available in SSIS. For example:
Analysis Services Execute DDL Task
Analysis Services Processing Task
Data Mining Query Task
This post will explain how to use the Analysis Services Execute DDL Task.
Analysis Services Execute DDL Task
The SQL Server Analysis Services Execute DDL Task can be used to create, modify, and delete Analysis Services objects. The Analysis Services Execute DDL Task is similar to the Execute SQL Task; the difference is that the Analysis Services Execute DDL Task issues DDL statements against an Analysis Services instance. These DDL statements can be used to create cubes, dimensions, KPIs, calculations, cube partitions, roles, or any other OLAP objects.
The Analysis Services Processing Task can be used to process analysis services objects such as cubes, and dimensions.
How to use Analysis Services Execute DDL Tasks
In this example, I will be using the Analysis Services Execute DDL Task to create a database in Analysis Services. To begin, suppose you have a "Sales" database in Analysis Services in the Development environment and you want to create the same database in a new environment (e.g. Test, Production). Below are the steps to achieve this goal:
Step 1: Connect to Analysis Services in the development environment and generate an XMLA script of the "Sales" database. Save this file as Sales.xmla to a physical location, e.g. D:\Test\Sales.xmla.
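For reference, the generated XMLA is simply an XML document wrapped in a Create command. A heavily trimmed sketch is shown below; the actual script generated from your database will contain the full definitions of dimensions, cubes, data sources, roles, and so on:

```xml
<Create xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <ObjectDefinition>
    <Database>
      <ID>Sales</ID>
      <Name>Sales</Name>
      <!-- dimensions, cubes, data sources, roles, etc. follow here -->
    </Database>
  </ObjectDefinition>
</Create>
```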
Step 2: Create a new SSIS package. Add a new Analysis Services connection manager. Enter the required Server name and leave Initial Catalog blank. Click Test Connection to make sure it connects, and finally click OK to close the Connection Manager wizard.
Step 3: You can add a package variable "ServerName" to hold the SSAS server name. Map this variable to ServerName in the Expressions of the connection manager properties as shown below. Make sure this variable is configured in the config file.
Step 4: Create a new File connection manager for the XMLA file. Rename this connection manager to "Sales.xmla".
Step 5: Drag and drop an Analysis Services Execute DDL Task. Rename it to "Analysis Services Execute DDL Task - Create Sales Cube". Now double-click the task to open the Analysis Services Execute DDL Task Editor. Set Connection to "localhost.SSAS", SourceType to "File Connection", and Source to "Sales.xmla". Click OK to complete.
Step 6: Save this package. Now you can move this package, along with the Sales.xmla file and the config file, to any environment and execute it. It will create the Sales cube on that box. Make sure that the path for the XMLA file is the same as in the development environment; otherwise you need to add the path to the config file to make it independent of the environment.
Note: Before executing this package, change the value of the ServerName variable to the current environment in the config file.
After posting the script for the Date dimension, I thought of sharing a script for a Time dimension as well. Sometimes we need to drill down the cube data by a time hierarchy, e.g. AM/PM-->Hrs-->Min. The script below creates the Time dimension and serves that purpose.
USE [DatabaseName]
GO

IF OBJECT_ID('DimTime') IS NOT NULL
    DROP TABLE DimTime
GO

CREATE TABLE [dbo].[DimTime](
    [DimTimeSK] [int] NOT NULL,
    [Time] [varchar](11) NOT NULL,    -- 12-hour time, e.g. '01:30:45 PM'
    [Time24] [varchar](8) NOT NULL    -- 24-hour time, e.g. '13:30:45'
) ON [PRIMARY]
GO

DECLARE @DimTimeSK int, @Date datetime, @AM char(2),
        @hour int, @hour24 int, @minute int, @second int
SET @DimTimeSK = 0
WHILE @DimTimeSK < (60*60*24)
BEGIN
    SET @DimTimeSK = @DimTimeSK + 1
    SET @Date   = DATEADD(second, @DimTimeSK, CONVERT(datetime, '1/1/2007'))
    SET @AM     = RIGHT(CONVERT(varchar, @Date, 109), 2)   -- 'AM' or 'PM'
    SET @hour24 = DATEPART(hour, @Date)
    SET @hour   = CASE WHEN @hour24 = 0  THEN 12
                       WHEN @hour24 > 12 THEN @hour24 - 12
                       ELSE @hour24 END
    SET @minute = DATEPART(minute, @Date)
    SET @second = DATEPART(second, @Date)

    INSERT INTO DimTime (DimTimeSK, [Time], Time24)
    VALUES (@DimTimeSK
           ,RIGHT('0' + CAST(@hour AS varchar), 2) + ':'
            + RIGHT('0' + CAST(@minute AS varchar), 2) + ':'
            + RIGHT('0' + CAST(@second AS varchar), 2) + ' ' + @AM
           ,CONVERT(varchar(8), @Date, 108))
END
GO
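Once the load finishes, you can verify it and see the AM/PM-->Hrs-->Min drill-down in action. The query below is a sketch that derives the hierarchy levels from the Time24 column ('hh:mm:ss') alone:

```sql
-- Sanity check: the dimension should contain one row per second of the day (86,400 rows)
SELECT COUNT(*) AS TotalRows FROM dbo.DimTime

-- Example AM/PM --> Hour --> Minute rollup derived from Time24
SELECT
     CASE WHEN LEFT(Time24, 2) < '12' THEN 'AM' ELSE 'PM' END AS AMPM
    ,LEFT(Time24, 2)         AS [Hour]
    ,SUBSTRING(Time24, 4, 2) AS [Minute]
    ,COUNT(*)                AS SecondsInMinute   -- 60 per hour/minute bucket
FROM dbo.DimTime
GROUP BY CASE WHEN LEFT(Time24, 2) < '12' THEN 'AM' ELSE 'PM' END
        ,LEFT(Time24, 2)
        ,SUBSTRING(Time24, 4, 2)
ORDER BY 2, 3
```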
IF OBJECT_ID('Date','U') IS NOT NULL
    DROP TABLE [Date]
GO
CREATE TABLE [dbo].[Date](
[DateSK] [int] NOT NULL,
[FullDate] [datetime] NOT NULL,
[DateName] [char](11) NOT NULL,
[DayOfWeek] [tinyint] NOT NULL,
[DayNameOfWeek] [char](10) NOT NULL,
[DayOfMonth] [tinyint] NOT NULL,
[DayOfYear] [smallint] NOT NULL,
[WeekdayWeekend] [char](7) NOT NULL,
[WeekOfYear] [tinyint] NOT NULL,
[MonthName] [char](10) NOT NULL,
[MonthOfYear] [tinyint] NOT NULL,
[CalendarQuarter] [tinyint] NOT NULL,
[CalendarYear] [smallint] NOT NULL,
[CalendarYearMonth] [char](7) NOT NULL,
[CalendarYearQtr] [char](15) NOT NULL,
CONSTRAINT PK_Date_DateID PRIMARY KEY (DateSK)
) ON [PRIMARY]
GO
RAISERROR('Table Date created successfully!',0,1)
DECLARE @StartDate datetime, @EndDate datetime
-- Set StartDate and EndDate as per your requirement
SELECT @StartDate = '2009-01-01', @EndDate = '2010-12-31'

WHILE (@StartDate <= @EndDate)
BEGIN
    INSERT INTO [Date]
    SELECT CAST(CONVERT(varchar(8),@StartDate,112) AS int) DateSK
    ,@StartDate AS [FullDate]
    ,CONVERT(char(11),@StartDate,106) AS [DateName]
    ,DATEPART(DW,@StartDate) AS [DayOfWeek]
    ,DATENAME(DW,@StartDate) AS [DayNameOfWeek]
    ,DATEPART(DD,@StartDate) AS [DayOfMonth]
    ,DATEPART(DY,@StartDate) AS [DayOfYear]
    ,CASE WHEN DATEPART(DW,@StartDate) IN (1,7) THEN 'WeekEnd' ELSE 'WeekDay' END AS [WeekdayWeekend]
    ,DATEPART(WW,@StartDate) AS [WeekOfYear]
    ,DATENAME(MM,@StartDate) AS [MonthName]
    ,DATEPART(MM,@StartDate) AS [MonthOfYear]
    ,DATEPART(QQ,@StartDate) AS [CalendarQuarter]
    ,DATEPART(YY,@StartDate) AS [CalendarYear]
    ,DATENAME(YY,@StartDate)+'-'+RIGHT('0'+CAST(MONTH(@StartDate) AS varchar),2) AS [CalendarYearMonth]
    ,DATENAME(YY,@StartDate)+' Qtr '+DATENAME(QQ,@StartDate) AS [CalendarYearQtr]

    SET @StartDate = @StartDate + 1
END
GO
The Date dimension is ready to use as soon as you execute this script in the required database.
You can create the following Calendar hierarchy in your dimension to provide drill-down features for Growth, Trends, and ToDate calculations (e.g. YTD, QTD, MTD): Year-->Quarter-->Month-->Week-->Date
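As a quick illustration of that hierarchy, the following query (a sketch against the Date table created above) rolls the dimension up to Year-->Quarter-->Month:

```sql
-- Count the days in each month, rolled up through the calendar hierarchy
SELECT CalendarYear, CalendarQuarter, [MonthName], COUNT(*) AS DaysInMonth
FROM [Date]
GROUP BY CalendarYear, CalendarQuarter, [MonthName], MonthOfYear
ORDER BY CalendarYear, CalendarQuarter, MonthOfYear
```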
After developing SQL Server Integration Services (SSIS) packages, it is highly unlikely that you can deploy a package and execute it in the production environment directly. You typically work in the development environment, move through the Test and UAT environments, and only later deploy to the Production environment. For all these environments you probably have different servers and folders to work with.
Apart from different environments, sometimes there are other changes. The client may change the drive from C to D or D to E, or change the database name. If you are using SMTP mail servers, you may have to change the server IP and authentication when the environment changes. So whenever the environment changes, you may have to change all the configuration settings associated with the SSIS packages.
You can avoid all the headache of changing these settings by using package configuration.
Here are the steps to setup Package Configuration in SQL Server 2008.
1. Once the SSIS package is developed, right-click on the design surface and select Package Configurations...
2. It will open Package Configuration Organizer. Check Enable package configurations and click Add... button.
3. It will open Package Configuration Wizard. Click Next to proceed.
4. There are a few configuration types available with package configurations: XML configuration file, Environment variable, Registry entry, Parent package variable, and SQL Server. Select the Configuration Type from the dropdown, then enter the configuration file name with its full path in the Configuration file name textbox.
5. Suppose we need to store the database settings for the OLEDB connection manager Local.MyDatabase, which is the connection manager for the SQL Server database. You will then need to select the properties you want saved in the package configuration on the following screen.
For a connection manager, you can either select the entire ConnectionString property, or you can select ServerName, UserName, Password, and InitialCatalog to construct the connection string. I prefer the latter, as it gives more options when there is a change.
6. Click the Next button followed by the Finish button to complete the wizard. You can now see the config.DtsConfig file at the location you specified in step 4. Below is a snapshot of the config file (I did some formatting for better visualization).
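For reference, a .DtsConfig file is plain XML with one Configuration element per saved property. The sketch below shows the general shape; the server and property values here are illustrative, not taken from an actual package:

```xml
<?xml version="1.0"?>
<DTSConfiguration>
  <DTSConfigurationHeading>
    <DTSConfigurationFileInfo GeneratedFromPackageName="MyPackage" />
  </DTSConfigurationHeading>
  <Configuration ConfiguredType="Property"
                 Path="\Package.Connections[Local.MyDatabase].Properties[ServerName]"
                 ValueType="String">
    <ConfiguredValue>DEVSERVER01</ConfiguredValue>
  </Configuration>
  <Configuration ConfiguredType="Property"
                 Path="\Package.Connections[Local.MyDatabase].Properties[InitialCatalog]"
                 ValueType="String">
    <ConfiguredValue>MyDatabase</ConfiguredValue>
  </Configuration>
</DTSConfiguration>
```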
7. That is all you have to do for this example. The next time you load the package, it will read the configurations from the XML file. You can verify this by changing the XML file contents and reloading the package. After reloading, view the connection manager properties and you will see that they have been updated with the values from the XML file.
I personally prefer putting package variables in the config file instead of connection managers, because that lets me reuse the same file across hundreds of packages where the server and database name are the same. However, you then need to map the variables to the connection manager properties.
Here is the T-SQL code to BULK insert data from text file to SQL Table:
BULK INSERT [Database]..[TableName]
FROM 'D:\test.txt'   -- path of the text file
WITH
(
     FIRSTROW = 1
    ,BATCHSIZE = 10000
    ,FIELDTERMINATOR = ','
    ,ROWTERMINATOR = '\n'
    ,LASTROW = 20
)
Here is a description of the keywords used:
FIRSTROW: Specifies the number of the first row to load. The default is the first row in the specified data file.
BATCHSIZE: Specifies the number of rows in a batch. Each batch is copied to the server as one transaction. If this fails, SQL Server commits or rolls back the transaction for every batch. By default, all data in the specified data file is one batch.
FIELDTERMINATOR: Specifies the field terminator to be used for char and widechar data files. The default field terminator is \t (tab).
ROWTERMINATOR: Specifies the row terminator to be used for char and widechar data files. The default row terminator is \r\n (newline character).
LASTROW: Specifies the number of the last row to load. The default is 0, which indicates the last row in the specified data file.
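For a complete, self-contained example, here is a hypothetical staging table and a load that skips a header row. The table name, columns, file path, and error file are assumptions for illustration only:

```sql
-- Hypothetical staging table matching a comma-separated file D:\test.txt
CREATE TABLE dbo.StagingCustomer
(
     CustomerID   int          NOT NULL
    ,CustomerName varchar(100) NOT NULL
    ,City         varchar(50)  NULL
)
GO

BULK INSERT dbo.StagingCustomer
FROM 'D:\test.txt'
WITH
(
     FIRSTROW        = 2                      -- skip a header row
    ,FIELDTERMINATOR = ','
    ,ROWTERMINATOR   = '\n'
    ,ERRORFILE       = 'D:\test_errors.txt'   -- rejected rows are written here
)
```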
Here is the easiest way to calculate Month End Date for any given date:
SELECT DATEADD(MM, DATEDIFF(MM, 0, GETDATE()) + 1, 0) - 1 AS MonthEndDate
If you replace GETDATE() with any date, the above query will return the month end date for that particular month.
If GetDate() value is '2010-01-25' then Output will be '2010-01-31'
If GetDate() value is '2010-02-20' then Output will be '2010-02-28'
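The same DATEADD/DATEDIFF pattern gives the other month boundary as well. For example, using a variable in place of GETDATE():

```sql
DECLARE @d datetime
SET @d = '2010-02-20'

-- First day of the month: returns 2010-02-01
SELECT DATEADD(MM, DATEDIFF(MM, 0, @d), 0) AS MonthStartDate

-- Last day of the month: returns 2010-02-28
SELECT DATEADD(MM, DATEDIFF(MM, 0, @d) + 1, 0) - 1 AS MonthEndDate
```

The trick is that DATEDIFF(MM, 0, @d) counts whole months since the base date 1900-01-01, so adding that many months back to the base date snaps @d to the first of its month.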