Introduction
Partial auto-correlation (PACF) is useful in time series analysis. It is generally used with the auto-correlation coefficient for determining the order of the ARIMA processes to be fitted to whatever data set you may have.
According to Dr. Roland Fuss 'the partial auto-correlation describes the supplementary information provided by the additional lag'. Lag in this sense being the time difference between one point and another.
Essentially you use the ACF and PACF to determine what equation to fit to a set of time series data.
The equations that describe the PACF are as follows:
The value in the first equation is what we're interested in.
The Macro
The heart of the macro is contained within the following few lines of code:
1 t(0, 0) = 1
2 t(1, 1) = p(1)
3 For k = 2 To maxLag
4 'Calculate factors to take away from p(i)'
5 totalt = 0
6 For j = 1 To k - 1
7 If k - 1 <> j And k - 2 > 0 Then
8 t(k - 1, j) = t(k - 2, j) - t(k - 1, k - 1) * t(k - 2, k - 1 - j)
9 End If
10 totalt = totalt + t(k - 1, j) * p(k - j)
11 Next
12 t(k, k) = (p(k) - totalt) / (1 - totalt)
13
14 Next
All this macro is doing is taking the auto-correlations, held in p(), and calculating the partial auto-correlations, the t() values or pi in the equation. It's very simple but does assume that all values of t(k,j) for smaller values of k and j have already been calculated.
Line 8 holds equation 2 and calculates the value of t() when k and j are not the same. Otherwise it is assumed that the values have already been calculated.
Lines 6 to 12 hold equation 1.
The bulk of the macro loads the data set, calculates the auto-correlations and only then calculates the partial auto-correlations. It's detailed next.
The Complete Code
And here is the complete PACF function for Excel:
Function PAutoCor(dataRange As range, Optional lag As Variant, Optional diff As Variant) As Variant
Dim x() As Double 'Array to hold data values'
Dim lags() As Integer 'Array to hold lag values'
Dim sx As Double
Dim sy As Double
Dim s1 As Double
Dim s2 As Double
Dim s3 As Double
Dim i, k As Integer 'Loop variables'
Dim outputRows As Long
Dim outputCols As Long
Dim output() As Variant 'Temporary output array'
Dim vertOutput As Boolean
Dim vertInput As Boolean
Dim vertLag As Boolean
Dim numLags As Integer
Dim maxLag As Integer
Dim a As Integer
Dim b As Integer
Dim dataSize As Integer
Dim p() As Double
Dim t() As Double
Dim j As Integer
Dim totalt As Double
'How many lags are there to calculate for?'
With Application.Caller
outputRows = .Rows.Count
outputCols = .Columns.Count
End With
ReDim output(1 To outputRows, 1 To outputCols)
'Vertical or horizontal output?'
If outputRows > outputCols Then
vertOutput = True
numLags = outputRows
'Set all output() to "#N/A"'
For i = 1 To outputRows
output(i, 1) = CVErr(xlErrNA)
Next
Else
vertOutput = False
numLags = outputCols
'Set all output() to "#N/A"'
For i = 1 To outputCols
output(1, i) = CVErr(xlErrNA)
Next
End If
'Vertical or horizontal input?'
If dataRange.Rows.Count > dataRange.Columns.Count Then
vertInput = True
a = 1
b = 0
dataSize = dataRange.Rows.Count - 1
Else
vertInput = False
a = 0
b = 1
dataSize = dataRange.Columns.Count - 1
End If
ReDim x(dataSize)
'Check that lag is there'
If IsMissing(lag) Then
ReDim lags(numLags)
'Populate lags'
For i = 1 To numLags
lags(i) = i
Next
ElseIf TypeName(lag) = "Range" Then
'Horizontal or vertical lag'
If lag.Rows.Count > lag.Columns.Count Then
vertLag = True
'Check that lag range matches output range in size'
'Need to change this for maxLag'
If lag.Rows.Count <> numLags Then numLags = lag.Rows.Count
ReDim lags(numLags)
maxLag = 0
'Populate lags'
For i = 1 To numLags
lags(i) = lag(i, 1).Value
If lags(i) > maxLag Then maxLag = lags(i)
Next
Else
vertLag = False
'Check that lag range matches output range in size'
If lag.Columns.Count <> numLags Then numLags = lag.Columns.Count
ReDim lags(numLags)
maxLag = 0
'Populate lags'
For i = 1 To numLags
lags(i) = lag(1, i).Value
If lags(i) > maxLag Then maxLag = lags(i)
Next
End If
Else
'Should I check for array/single values?'
ReDim lags(1)
'Need to check if integer'
lags(1) = lag
'Set numLags to 1 just in case'
numLags = 1
End If
'Check that diff is there'
If IsMissing(diff) Then
diff = 0
End If
'Fill data array x()'
If diff > 0 Then
For i = 0 To dataSize - 1
If vertInput Then
x(i) = dataRange(i + 1, 1).Value - dataRange(i + 2, 1).Value
Else
x(i) = dataRange(1, i + 1).Value - dataRange(1, i + 2).Value
End If
Next
Else
For i = 0 To dataSize
If vertInput Then
x(i) = dataRange(i + 1, 1).Value
Else
x(i) = dataRange(1, i + 1).Value
End If
Next
End If
If diff > 1 Then
For k = 1 To diff
For i = 0 To dataSize - 1 - k
x(i) = x(i) - x(i + 1)
Next
Next
End If
'Redim for maxlag'
ReDim p(maxLag)
ReDim t(maxLag, maxLag)
'Calculate autocorrelation'
For i = 0 To maxLag
sx = 0
sy = 0
s1 = 0
s2 = 0
s3 = 0
For k = 0 To dataSize - i - diff
sx = x(k) + sx
sy = x(k + i) + sy
Next
sx = sx / (dataSize + 1 - i - diff)
sy = sy / (dataSize + 1 - i - diff)
For k = 0 To dataSize - i - diff
s1 = s1 + (x(k) - sx) * (x(k + i) - sy)
s2 = s2 + (x(k) - sx) ^ 2
s3 = s3 + (x(k + i) - sy) ^ 2
Next
p(i) = s1 / Sqr(s2 * s3)
Next
'Set all ts to zero'
For k = 1 To maxLag
For j = 1 To maxLag
t(j, k) = 0
Next
Next
t(0, 0) = 1
t(1, 1) = p(1)
For k = 2 To maxLag
'Calculate factors to take away from p(i)'
totalt = 0
For j = 1 To k - 1
If k - 1 <> j And k - 2 > 0 Then
t(k - 1, j) = t(k - 2, j) - t(k - 1, k - 1) * t(k - 2, k - 1 - j)
End If
totalt = totalt + t(k - 1, j) * p(k - j)
Next
t(k, k) = (p(k) - totalt) / (1 - totalt)
Next
For k = 1 To numLags
If vertOutput Then
output(k, 1) = t(lags(k), lags(k))
Else
output(1, k) = t(lags(k), lags(k))
End If
Next
PAutoCor = output
End Function
Conclusion
I've added this function to my add-in which can be found on Sourceforge. Version 0.6.12 contains this macro.
As ever, all comments are welcome.