I won't go into any more detail about auto-correlation as there are many good descriptions online:
- Wikipedia
- Mathworld
- NIST Engineering Statistics Handbook - this handbook is well worth reading for definitions of other time series related functions
There's no point repeating what others have already written. Especially when they've written it better than I ever could.
The only things I will define relate directly to the function. These are:
- Lag - this is the number of points n from point t to calculate the auto-correlation for. So for lag=1, the correlation is calculated for points (t, t+1, t+2, ...) with points (t+1, t+2, t+3, ...).
- Difference - when calculating the auto-correlation one often needs to calculate it on the series (t+1-t) rather than (t). This is to ensure some degree of stationarity and a clearer picture of the auto-correlation.
Function
=AutoCor($C$3:$C$304,$H$3:$H$23,I$2)
where, in this case cells $C$3:$C$304 hold the data, $H$3:$H$23 hold the lags and I$2 holds the number of times to difference the data.
-----------------------------------------------------------------------------
Function AutoCor(dataRange As range, lag As range, diff As Integer) As Variant
Dim x() As Double 'Array to hold data values
Dim sx As Double
Dim sy As Double
Dim s1 As Double
Dim s2 As Double
Dim s3 As Double
Dim i, j, k As Integer 'Loop variables
Dim y() As Integer 'Lag array
Dim OutputRows As Long
Dim OutputCols As Long
Dim output() As Double
'Set-up data variable
ReDim x(dataRange.Rows.Count - 1)
If diff > 0 Then
For i = 0 To dataRange.Rows.Count - 2
x(i) = dataRange(i + 1, 1).Value - dataRange(i + 2, 1).Value
Next
Else
For i = 0 To dataRange.Rows.Count - 1
x(i) = dataRange(i + 1, 1).Value
Next
End If
If diff > 1 Then
For k = 1 To diff
For i = 0 To dataRange.Rows.Count - 2 - k
x(i) = x(i) - x(i + 1)
Next
Next
End If
'Set-up lag variable
ReDim y(lag.Rows.Count - 1)
For i = 0 To lag.Rows.Count - 1
y(i) = lag(i + 1, 1).Value
Next
'Set-up output variable
With Application.Caller
OutputRows = .Rows.Count
OutputCols = .Columns.Count
End With
If OutputRows <> dataRange.Rows.Count And OutputCols <> dataRange.Columns.Count Then Exit Function
ReDim output(1 To OutputRows, 1 To OutputCols)
For j = 0 To lag.Rows.Count - 1
sx = 0
sy = 0
s1 = 0
s2 = 0
s3 = 0
For k = 0 To dataRange.Rows.Count - y(j) - 1 - diff
sx = x(k) + sx
sy = x(k + y(j)) + sy
Next
sx = sx / (dataRange.Rows.Count - y(j) - diff)
sy = sy / (dataRange.Rows.Count - y(j) - diff)
For k = 0 To dataRange.Rows.Count - y(j) - 1 - diff
s1 = s1 + (x(k) - sx) * (x(k + y(j)) - sy)
s2 = s2 + (x(k) - sx) ^ 2
s3 = s3 + (x(k + y(j)) - sy) ^ 2
Next
output(j + 1, 1) = s1 / Sqr(s2 * s3)
Next
AutoCor = output
End Function
-----------------------------------------------------------------------------
Example
For the example, I'm going to use the clothing sales indices for the UK from 1988 to 2013 from the ONS:
This is a lot of data (in times series analysis anyway) and should give a nice clear auto-correlation. I'll plot lags of 0 to 20 for differences of 0 to 4 (because I can).
Lag | 0 | 1 | 2 | 3 | 4 |
0 | 1 | 1 | 1 | 1 | 1 |
1 | 0.723 | -0.298 | -0.687 | -0.758 | -0.801 |
2 | 0.609 | -0.165 | 0.192 | 0.308 | 0.392 |
3 | 0.590 | -0.084 | 0.020 | -0.017 | -0.066 |
4 | 0.612 | -0.047 | -0.080 | -0.097 | -0.096 |
5 | 0.660 | 0.118 | 0.142 | 0.157 | 0.168 |
6 | 0.642 | -0.053 | -0.166 | -0.180 | -0.190 |
7 | 0.654 | 0.112 | 0.134 | 0.150 | 0.162 |
8 | 0.602 | -0.040 | -0.070 | -0.087 | -0.087 |
9 | 0.572 | -0.086 | 0.017 | -0.022 | -0.075 |
10 | 0.592 | -0.167 | 0.177 | 0.301 | 0.395 |
11 | 0.710 | -0.282 | -0.666 | -0.744 | -0.791 |
12 | 0.994 | 0.984 | 0.980 | 0.979 | 0.976 |
13 | 0.709 | -0.289 | -0.674 | -0.743 | -0.785 |
14 | 0.588 | -0.165 | 0.189 | 0.303 | 0.385 |
15 | 0.568 | -0.090 | 0.016 | -0.019 | -0.067 |
16 | 0.592 | -0.045 | -0.076 | -0.094 | -0.094 |
17 | 0.643 | 0.119 | 0.143 | 0.158 | 0.168 |
18 | 0.624 | -0.055 | -0.170 | -0.183 | -0.192 |
19 | 0.637 | 0.116 | 0.137 | 0.152 | 0.164 |
20 | 0.582 | -0.039 | -0.068 | -0.085 | -0.087 |
and:
The original time series has a strong month on month growth and this is reflected in the auto-correlation chart with no differencing. There are very few lags with a value less than 0.6.
However, when the data is differenced once all correlations for lags other than 1 and 12 are close to zero. This is expected in this data series as the sales of clothing are highly seasonal with Christmas being the peak period every year.
Significance
How can we tell whether any one value of auto-correlation is significant though? It is generally accepted that significance is shown by a value that is greater than the square root of 2/number of points.
In this case, this is 0.081.
Due to the number of points, several lags are deemed significant, although only lag 12 has a very high significance.
hi!can you elaborate a bit on the "difference" term? I am trying to find the correlation of a controller's output signal in order to determine its oscillation..i ve tried with the box-jenkins autocorrelation function (rk) but the results dindnt make any sense..i have thousands of sample data and all are equi-spaced..if you have any ideas that may help me please reply!
ReplyDeleteTHANK YOU!
The difference term is just the number of times the data has been differenced. By differencing we're calculating the auto-correlation on the difference between one data point and the next i.e. x(i+1)-x(i). Generally this removes trend from a data series. This would normally have to be done as most of the analysis techniques rely on this to be true (among other things). There are plenty of sites that have a lot more detail on this - national statistics sites are normally quite good. The Australian Bureau of Statistics has a good introduction to TS analysis (google 'abs time series introductory course')
DeleteBox-jenkins generally refers to ARIMA/ARMA modelling with auto-correlation being used to work out which model you should be using. I'm not sure what you're doing. If you just want to know the frequencies of the oscillation, a fourier transform may be useful. It depends on the type of data you're getting.