Introduction
There are many ways to deseasonalise seasonal time series data.
I'm going to write about one of the easier methods. I can't remember what the official name for the method is but all we're doing is dividing the data point by the average for each seasonal period.
That definition probably needs explaining. So, say, for example we have quarterly data for 3 years. We take the average for each quarter over the years i.e. average Q1 for years 1, 2 and 3. Then do the same for Q2, Q3 and Q4. After this you divide the Q1 points by the Q1 average, the Q2 points by the Q2 average, etc.
There are some things to be careful of when using this method:
- The more data you have the better. You'll probably need more than 3 years worth of data for this to work well.
- You need to be careful of outliers affecting the seasonal averages by too much when there is very little data.
- The data series needs to be stationary. Or at least, mostly stationary. If it's not, you tend to get step changes in the resulting seasonally adjusted data. By stationary I mean that there is no long term trend in the data.
The Macro
Here is the listing of the macro.
--------------------------------------------------------------------------
Function SeasonAverage(labelRange As range, dataRange As range) As Variant
Dim labelSum As Dictionary
Dim labelCount As Dictionary
Dim labelAvg As Dictionary
Dim totalSum As Double
Dim totalCount As Integer
Dim OutputRows As Long
Dim OutputCols As Long
Dim output() As Double
Dim i As Integer
totalSum = 0
totalCount = 0
Set labelSum = New Dictionary
Set labelCount = New Dictionary
Set labelAvg = New Dictionary
With Application.Caller
OutputRows = .Rows.Count
OutputCols = .Columns.Count
End With
ReDim output(1 To OutputRows, 1 To OutputCols)
'Is dataRange, labelRange and outputRange the same size and shape
If dataRange.Rows.Count = labelRange.Rows.Count & dataRange.Columns.Count = labelRange.Columns.Count & dataRange.Rows.Count = OutputRows & dataRange.Columns.Count = OutputCols Then
Exit Function
End If
If labelRange.Rows.Count < labelRange.Columns.Count Then
For i = 1 To labelRange.Columns.Count
If labelSum.Exists(labelRange(1, i).Value) Then
labelSum.Item(labelRange(1, i).Value) = labelSum.Item(labelRange(1, i).Value) + dataRange(1, i).Value
Else
labelSum.Add Key:=labelRange(1, i).Value, Item:=dataRange(1, i).Value
End If
If labelCount.Exists(labelRange(1, i).Value) Then
labelCount.Item(labelRange(1, i).Value) = labelCount.Item(labelRange(1, i).Value) + 1
Else
labelCount.Add Key:=labelRange(1, i).Value, Item:=1
End If
totalSum = totalSum + dataRange(1, i).Value
totalCount = totalCount + 1
Next
Else
For i = 1 To labelRange.Rows.Count
If labelSum.Exists(labelRange(i, 1).Value) Then
labelSum.Item(labelRange(i, 1).Value) = labelSum.Item(labelRange(i, 1).Value) + dataRange(i, 1).Value
Else
labelSum.Add Key:=labelRange(i, 1).Value, Item:=dataRange(i, 1).Value
End If
If labelCount.Exists(labelRange(i, 1).Value) Then
labelCount.Item(labelRange(i, 1).Value) = labelCount.Item(labelRange(i, 1).Value) + 1
Else
labelCount.Add Key:=labelRange(i, 1).Value, Item:=1
End If
totalSum = totalSum + dataRange(i, 1).Value
totalCount = totalCount + 1
Next
End If
'Calculate average
'Dim ts As String
Dim rKey
For Each rKey In labelSum
labelAvg.Add Key:=rKey, Item:=((labelSum.Item(rKey) * totalCount) / (labelCount.Item(rKey) * totalSum))
Next
'Output data
If dataRange.Rows.Count < dataRange.Columns.Count Then
For i = 1 To dataRange.Columns.Count
output(1, i) = dataRange(1, i).Value / labelAvg.Item(labelRange(1, i).Value)
Next
Else
For i = 1 To dataRange.Rows.Count
output(i, 1) = dataRange(i, 1).Value / labelAvg.Item(labelRange(i, 1).Value)
Next
End If
Set labelSum = Nothing
Set labelCount = Nothing
Set labelAvg = Nothing
SeasonAverage = output
End Function
--------------------------------------------------------------------------
The function takes two arguments. The first is the list of labels. These need to be the list of the seasons (Q1, Q2, etc. or January, February, etc.). The second argument is the data. The output is a variant and hence the function needs to be used as an array function.
The labels are placed in to dictionary objects and averages calculated for each label. Then the data is divided by these averages.
Example
I'm going to use an artificial set of data for my example. The data has been generated by adding a constant line of 100 to a seasonal trend (ranging from -10 to +40) and then some noise (normal distribution with a mean of zero and standard deviation of 4). This is basically the definition of a seasonal data series.
The first chart shows the raw generated data:
The next chart shows the data once the seasonal component has been removed.
The next chart shows the data once the seasonal component has been removed.
As expected the noises compare quite well. As they should, given the mathematics involved.
Conclusion
This is a very simplistic way of seasonally adjusting time series data. It does not take into account trend breaks, outliers (although if there is enough data, this method can be used to spot them), holiday movements, trading days, variable days in the month etc.
It is quite a good first stab at adjusting data and seeing long term trends though. Also, because of the simplicity of the method you can be fairly sure that not too many artifacts are being introduced into the data.
As always, comments, potential improvements etc are welcome.
No comments:
Post a Comment