Thursday, 8 August 2013

An Excel macro to simplistically seasonally adjust time series data

Introduction

There are many ways to deseasonalise seasonal time series data.

I'm going to write about one of the easier methods. I can't remember what the official name for the method is but all we're doing is dividing the data point by the average for each seasonal period.

That definition probably needs explaining. So, say, for example we have quarterly data for 3 years. We take the average for each quarter over the years i.e. average Q1 for years 1, 2 and 3. Then do the same for Q2, Q3 and Q4. After this you divide the Q1 points by the Q1 average, the Q2 points by the Q2 average, etc.

There are some things to be careful of when using this method:

  1. The more data you have the better. You'll probably need more than 3 years worth of data for this to work well.
  2. You need to be careful of outliers affecting the seasonal averages by too much when there is very little data.
  3. The data series needs to be stationary. Or at least, mostly stationary. If it's not, you tend to get step changes in the resulting seasonally adjusted data. By stationary I mean that there is no long term trend in the data.

The Macro

Here is the listing of the macro.

--------------------------------------------------------------------------
Function SeasonAverage(labelRange As range, dataRange As range) As Variant

    Dim labelSum As Dictionary
    Dim labelCount As Dictionary
    Dim labelAvg As Dictionary
    Dim totalSum As Double
    Dim totalCount As Integer
    Dim OutputRows As Long
    Dim OutputCols As Long
    Dim output() As Double
    Dim i As Integer
    
    totalSum = 0
    totalCount = 0
    
    Set labelSum = New Dictionary
    Set labelCount = New Dictionary
    Set labelAvg = New Dictionary
    
    With Application.Caller
        OutputRows = .Rows.Count
        OutputCols = .Columns.Count
    End With
    ReDim output(1 To OutputRows, 1 To OutputCols)
    
    'Is dataRange, labelRange and outputRange the same size and shape
    If dataRange.Rows.Count = labelRange.Rows.Count & dataRange.Columns.Count = labelRange.Columns.Count & dataRange.Rows.Count = OutputRows & dataRange.Columns.Count = OutputCols Then
        Exit Function
    End If
    
    If labelRange.Rows.Count < labelRange.Columns.Count Then
        For i = 1 To labelRange.Columns.Count
            If labelSum.Exists(labelRange(1, i).Value) Then
                labelSum.Item(labelRange(1, i).Value) = labelSum.Item(labelRange(1, i).Value) + dataRange(1, i).Value
            Else
                labelSum.Add Key:=labelRange(1, i).Value, Item:=dataRange(1, i).Value
            End If
            If labelCount.Exists(labelRange(1, i).Value) Then
                labelCount.Item(labelRange(1, i).Value) = labelCount.Item(labelRange(1, i).Value) + 1
            Else
                labelCount.Add Key:=labelRange(1, i).Value, Item:=1
            End If
            totalSum = totalSum + dataRange(1, i).Value
            totalCount = totalCount + 1
        Next
    Else
        For i = 1 To labelRange.Rows.Count
            If labelSum.Exists(labelRange(i, 1).Value) Then
                labelSum.Item(labelRange(i, 1).Value) = labelSum.Item(labelRange(i, 1).Value) + dataRange(i, 1).Value
            Else
                labelSum.Add Key:=labelRange(i, 1).Value, Item:=dataRange(i, 1).Value
            End If
            If labelCount.Exists(labelRange(i, 1).Value) Then
                labelCount.Item(labelRange(i, 1).Value) = labelCount.Item(labelRange(i, 1).Value) + 1
            Else
                labelCount.Add Key:=labelRange(i, 1).Value, Item:=1
            End If
            totalSum = totalSum + dataRange(i, 1).Value
            totalCount = totalCount + 1
        Next
    End If
    
    'Calculate average
    'Dim ts As String
    Dim rKey
    For Each rKey In labelSum
        labelAvg.Add Key:=rKey, Item:=((labelSum.Item(rKey) * totalCount) / (labelCount.Item(rKey) * totalSum))
    Next
    
    'Output data
    If dataRange.Rows.Count < dataRange.Columns.Count Then
        For i = 1 To dataRange.Columns.Count
            output(1, i) = dataRange(1, i).Value / labelAvg.Item(labelRange(1, i).Value)
        Next
    Else
        For i = 1 To dataRange.Rows.Count
            output(i, 1) = dataRange(i, 1).Value / labelAvg.Item(labelRange(i, 1).Value)
        Next
    End If
    
    Set labelSum = Nothing
    Set labelCount = Nothing
    Set labelAvg = Nothing
    
    SeasonAverage = output
    
End Function
--------------------------------------------------------------------------


The function takes two arguments. The first is the list of labels. These need to be the list of the seasons (Q1, Q2, etc. or January, February, etc.). The second argument is the data. The output is a variant and hence the function needs to be used as an array function.

The labels are placed in to dictionary objects and averages calculated for each label. Then the data is divided by these averages.

Example

I'm going to use an artificial set of data for my example. The data has been generated by adding a constant line of 100 to a seasonal trend (ranging from -10 to +40) and then some noise (normal distribution with a mean of zero and standard deviation of 4). This is basically the definition of a seasonal data series.

The first chart shows the raw generated data:


The next chart shows the data once the seasonal component has been removed.

And the third chart compares the generated noise with the derived noise from applying the macro.

As expected the noises compare quite well. As they should, given the mathematics involved.

Conclusion

This is a very simplistic way of seasonally adjusting time series data. It does not take into account trend breaks, outliers (although if there is enough data, this method can be used to spot them), holiday movements, trading days, variable days in the month etc.

It is quite a good first stab at adjusting data and seeing long term trends though. Also, because of the simplicity of the method you can be fairly sure that not too many artifacts are being introduced into the data.

As always, comments, potential improvements etc are welcome.

No comments:

Post a Comment