Monday 8 August 2011

Splitting dates over monthly periods using a 'numbers table'

One of the things you may find yourself having to do is to take a single record with a start and end date and subdivide it into monthly or weekly periods. Typically this is so that you can create a record for each month or week in a data warehouse, and have a data structure thats easy to report from. There are many ways of doing this of course, but an elegant set-based solution involves using a 'numbers table'.

Numbers tables are well documented elsewhere, however they can now be created in a trivial fashion in Denali using a Sequence, with the following script.


USE adventureworks2008r2
SET nocount ON

IF EXISTS (SELECT 1
           FROM   information_schema.tables
           WHERE  table_name = 'Numbers')
  DROP TABLE numbers

GO

CREATE TABLE numbers
  (
     NUMBER INT NOT NULL,
     CONSTRAINT pk_numbers PRIMARY KEY CLUSTERED (NUMBER) WITH FILLFACTOR = 100
  )

IF EXISTS (SELECT 1
           FROM   sys.objects
           WHERE  name = 'SeqNumbers'
AND type_desc = 'SEQUENCE_OBJECT')
  DROP sequence dbo.seqnumbers

GO

CREATE SEQUENCE SeqNumbers
AS INT
MINVALUE 0
GO

INSERT Numbers(Number) VALUES(NEXT VALUE FOR SeqNumbers)
GO 10000

This will give you a table of numbers from 0-10000.

Then a simple query will split a table into monthly records - or any other time period - by joining to the numbers table using the difference between the start and end dates of the rows you want to split. An example will probably make it clearer - we'll use the Production.BillOfMaterials in the AdventureWorks2008R2 database.

Firstly lets make the dates a little less uniform:

UPDATE  [Production].[BillOfMaterials]
SET    EndDate = '20040405'
WHERE  BillOfMaterialsID=271

UPDATE  [Production].[BillOfMaterials]
SET    StartDate='20040731', EndDate = '20040801'
WHERE  BillOfMaterialsID=1950


UPDATE  [Production].[BillOfMaterials]
SET    StartDate='20041031', EndDate = '20051130'
WHERE  BillOfMaterialsID=2899

Querying the Production.BillOfMaterials for an Id and start and end dates and quantity looks something like this:


Taking the difference between the start and end dates and joining to the numbers table gives one row per month that exists between the start and end dates. A little jiggery-pokery with the date functions gives you the start and end of each month over which the bill of materials spanned, like so;

SELECT [BillOfMaterialsID],
       [StartDate],
       [EndDate],
     DATEDIFF(m, StartDate, EndDate) NumberOfMonths,
     CAST(CONVERT(CHAR(6), DATEADD(m, Number, StartDate), 112)+'01' AS DATETIME) MonthStartDate,
     DATEADD(dd, -1, (CAST(CONVERT(CHAR(6), DATEADD(m, Number+1, StartDate), 112)+'01' AS DATETIME))) MonthEndDate,
     CAST(PerAssemblyQty/(DATEDIFF(m, StartDate,  EndDate)+1) AS DECIMAL(5,2))PerAssemblyQtyMonthly,
     PerAssemblyQty
  FROM [Production].[BillOfMaterials] b
  INNER JOIN dbo.Numbers n ON DATEDIFF(m, StartDate, EndDate)>=n.Number
  WHERE [EndDate] IS NOT NULL
You'll note that I also divided the PerAssemblyQty by the number of months (adding one because the DATEDIFF on two dates in the same month returns 0). Finally lets see the result set