Stats, MR and Data: An Excel Auto-Correlation Macro

Wikipedia defines the auto-correlation as the cross-correlation of a signal with itself. For a series of data points measured through time it is the correlation between points measured a specific time apart.

I won't go into any more detail about auto-correlation as there are many good descriptions online:

Wikipedia
Mathworld
NIST Engineering Statistics Handbook - this handbook is well worth reading for definitions of other time series related functions

There's no point repeating what others have already written. Especially when they've written it better than I ever could.

The only things I will define relate directly to the function. These are:

Lag - this is the number of points n from point t to calculate the auto-correlation for. So for lag=1, the correlation is calculated for points (t, t+1, t+2, ...) with points (t+1, t+2, t+3, ...).
Difference - when calculating the auto-correlation one often needs to calculate it on the series (t+1-t) rather than (t). This is to ensure some degree of stationarity and a clearer picture of the auto-correlation.

Function

So, on to the function. It's another array function (see here for details on how to use them). I've

called the function 'AutoCor' and is used as per the example below:

=AutoCor($C$3:$C$304,$H$3:$H$23,I$2)

where, in this case cells $C$3:$C$304 hold the data, $H$3:$H$23 hold the lags and I$2 holds the number of times to difference the data.

-----------------------------------------------------------------------------

Function AutoCor(dataRange As range, lag As range, diff As Integer) As Variant

Dim x() As Double 'Array to hold data values

Dim sx As Double

Dim sy As Double

Dim s1 As Double

Dim s2 As Double

Dim s3 As Double

Dim i, j, k As Integer 'Loop variables

Dim y() As Integer 'Lag array

Dim OutputRows As Long

Dim OutputCols As Long

Dim output() As Double

'Set-up data variable

ReDim x(dataRange.Rows.Count - 1)

If diff > 0 Then

For i = 0 To dataRange.Rows.Count - 2

x(i) = dataRange(i + 1, 1).Value - dataRange(i + 2, 1).Value

Else

For i = 0 To dataRange.Rows.Count - 1

x(i) = dataRange(i + 1, 1).Value

End If

If diff > 1 Then

For k = 1 To diff

For i = 0 To dataRange.Rows.Count - 2 - k

x(i) = x(i) - x(i + 1)

End If

'Set-up lag variable

ReDim y(lag.Rows.Count - 1)

For i = 0 To lag.Rows.Count - 1

y(i) = lag(i + 1, 1).Value

'Set-up output variable

With Application.Caller

OutputRows = .Rows.Count

OutputCols = .Columns.Count

End With

If OutputRows <> dataRange.Rows.Count And OutputCols <> dataRange.Columns.Count Then Exit Function

ReDim output(1 To OutputRows, 1 To OutputCols)

For j = 0 To lag.Rows.Count - 1

sx = 0

sy = 0

s1 = 0

s2 = 0

s3 = 0

For k = 0 To dataRange.Rows.Count - y(j) - 1 - diff

sx = x(k) + sx

sy = x(k + y(j)) + sy

sx = sx / (dataRange.Rows.Count - y(j) - diff)

sy = sy / (dataRange.Rows.Count - y(j) - diff)

For k = 0 To dataRange.Rows.Count - y(j) - 1 - diff

s1 = s1 + (x(k) - sx) * (x(k + y(j)) - sy)

s2 = s2 + (x(k) - sx) ^ 2

s3 = s3 + (x(k + y(j)) - sy) ^ 2

output(j + 1, 1) = s1 / Sqr(s2 * s3)

AutoCor = output

End Function

-----------------------------------------------------------------------------

Example

For the example, I'm going to use the clothing sales indices for the UK from 1988 to 2013 from the ONS:

This is a lot of data (in times series analysis anyway) and should give a nice clear auto-correlation. I'll plot lags of 0 to 20 for differences of 0 to 4 (because I can).

Lag	0	1	2	3	4
0	1	1	1	1	1
1	0.723	-0.298	-0.687	-0.758	-0.801
2	0.609	-0.165	0.192	0.308	0.392
3	0.590	-0.084	0.020	-0.017	-0.066
4	0.612	-0.047	-0.080	-0.097	-0.096
5	0.660	0.118	0.142	0.157	0.168
6	0.642	-0.053	-0.166	-0.180	-0.190
7	0.654	0.112	0.134	0.150	0.162
8	0.602	-0.040	-0.070	-0.087	-0.087
9	0.572	-0.086	0.017	-0.022	-0.075
10	0.592	-0.167	0.177	0.301	0.395
11	0.710	-0.282	-0.666	-0.744	-0.791
12	0.994	0.984	0.980	0.979	0.976
13	0.709	-0.289	-0.674	-0.743	-0.785
14	0.588	-0.165	0.189	0.303	0.385
15	0.568	-0.090	0.016	-0.019	-0.067
16	0.592	-0.045	-0.076	-0.094	-0.094
17	0.643	0.119	0.143	0.158	0.168
18	0.624	-0.055	-0.170	-0.183	-0.192
19	0.637	0.116	0.137	0.152	0.164
20	0.582	-0.039	-0.068	-0.085	-0.087

Plotting differences 0 and 1 gives:

and:

The original time series has a strong month on month growth and this is reflected in the auto-correlation chart with no differencing. There are very few lags with a value less than 0.6.

However, when the data is differenced once all correlations for lags other than 1 and 12 are close to zero. This is expected in this data series as the sales of clothing are highly seasonal with Christmas being the peak period every year.

Significance

How can we tell whether any one value of auto-correlation is significant though? It is generally accepted that significance is shown by a value that is greater than the square root of 2/number of points.

In this case, this is 0.081.

Due to the number of points, several lags are deemed significant, although only lag 12 has a very high significance.

2 comments:

Anonymous4 July 2014 at 07:10
hi!can you elaborate a bit on the "difference" term? I am trying to find the correlation of a controller's output signal in order to determine its oscillation..i ve tried with the box-jenkins autocorrelation function (rk) but the results dindnt make any sense..i have thousands of sample data and all are equi-spaced..if you have any ideas that may help me please reply!
THANK YOU!

Wednesday, 29 May 2013

An Excel Auto-Correlation Macro

Function

Example

Significance

2 comments: