Thursday 6 December 2007

Regular expressions are your friend

Well apparently there are people reading this - 23 so far if Google Analytics is to be believed. Anyway on with the post...

We're importing some frankly bizarre DB2 flat files using SSIS, one of the features of which is a date format yyyy-mm-dd-hh.mm.ss.mmmmm which SQL obviously won't accept as a datetime format (they're also fixed width and delimited - work that one out). For this and other reasons the easiest thng to do is to parse the file in a Script Task. The RegularExpressions .NET class is dead handy for this (Useful resource here ).

Add a script task do the necessary getting of the file using System.IO class and split the line using the Split method (note that the seperator is a Char data type not string), we need to reference the right RegularExpression class:

Imports System.Text.RegularExpressions

Then create an immutable regular expression:

Dim dateRegEx As New Regex("[0-9][0-9][0-9][0-9][-][0-9][0-9][-][0-9][0-9][-][0-9][0-9][.][0-9][0-9][.][0-9][0-9][.][0-9][0-9][0-9][0-9][0-9][0-9]")

Then match the string you want to fix with the regular expression and do your reformatting work:

If dateRegEx.IsMatch(StringToFix) Then
revString = StrReverse(StringToFix)
outString = Replace(Replace(revString, "-", " ", , 1), ".", ":")
outString = StrReverse(Replace(outString, ":", ".", , 1)).Substring(0, 23)
End If

Not the cleverest regular expression for date-matching but sufficient for our purposes. Incidentally I found that the SSIS version of .NET didn't seem to like using the /d notation instead of [0-9]. Anyway doing this without RegEx would be a real pain.

There is actually a regular expression task at www.sqlis.com but we wanted to parse the file prior to loading into SSIS. Script Task and RegEx to the rescue...

No comments: