Saturday, 10 May 2008

DDD Scotland: Third Session

Integration Service in Sql 2008

Allan Mitchell www.SQLDTS.com\ www.SQLIS.com

 

Behind the scenes: Threading

Multicast Transform generates n copies of the input. This is the same in 2008 but all output are on the same execution Path instead of the same thread.

A raw file means that IIS doesn't have to translate information - this means that its ideal for staging data.

 

To work around issues with SSIS 2005 you use the union all component. This works because the union all works asynchronously. This causes another execution tree (Another thread). This is not needed in 2008.

 

Pipeline Limiter

This doesnt sound good because we are moving data! Limiter exists in 2005 so its not going to be slower in 2008 - just you didnt know about it. In 2008 they tell you about it so you now get performance counters.

Why would I want to restrict the pipeline?

  • Date = buffer = memory.
  • Memory is reused when the buffer terminates
  • push back from a component = no reuse of memory = run out of memory!

Dont use select * because it will need to allocate memory in the buffer for something we wont use.

 

To capture these event in the Feb CTP you need to check the log messages for an attribute starting with "The Component%" Expect this to change tho

You can now use C# in the script component to write/manipulate data.

Lookup Component

Caching Options

  • Full: Before its used everything gets cached - Slow startup
  • Partial: Lookup component will try for a match in the cache first, then check the file.
  • None - The last query is cached but that it. Try to order your data. This is useful for volatile data.

Full cache is fasted but might take longest to cache, and a large amount of memory.

In 2005 these cannot be passed around - so if you want to loop through a directory then it will be cached for every file! This is also not transferable.

In 2008 there is now something called a Cache Transform. This can be used as a destination.

You can cache to:

  • A file - Its a Raw File
  • In Memory

This allows the reference to be passed around.

Because you can doesnt mean you should - by default the 2005 way is used.

Row Redirection

If a lookup component gets no match then by default it fails. You have to configure a redirect down the error output to ignore it if this should not be an error.

Now in 2008 we have a no match output option.

Points to note:

Lookup transform is case sensitive when done in cache. Obviously this is not the case in SQL

Data Profiling Task

Profiles data in your database (can do quick setup or verbose). only SQl2000 or above can be profiled.

Side Note: If you are going to Sql server then use the OLE Connection manager or the sql connection manager

This is fantastic for seeing how rubbish the data is.

Useful for identifying distribution of values.

Microsoft have provided a date profile viewer to read the xml produced by the above component. Which is really powerful - you can drill into it quite easily.

This is useful for checking data sent from a provider.

Change Data Capture (CDC)

Billed as an ETL function.

Gives increased efficiency - incremental extractions.

You can identify changed rows and columns.

Essentially transaction replication. It uses the log reader.

A lot of people uses triggers - this doesnt!

Use a sp called sp_cdc_enable_table with parameters to define the table you want to do the change data capture on.

Supports net changes/all changes so you can decide how you need your data (and potentially save space)

No comments: