Introduction
Partial auto-correlation (PACF) is useful in time series analysis. It is generally used with the auto-correlation coefficient for determining the order of the ARIMA processes to be fitted to whatever data set you may have.
According to Dr. Roland Fuss 'the partial auto-correlation describes the supplementary information provided by the additional lag'. Lag in this sense being the time difference between one point and another.
Essentially you use the ACF and PACF to determine what equation to fit to a set of time series data.
The equations that describe the PACF are as follows:
The value in the first equation is what we're interested in.
The Macro
The heart of the macro is contained within the following few lines of code:
1 t(0, 0) = 1 2 t(1, 1) = p(1) 3 For k = 2 To maxLag 4 'Calculate factors to take away from p(i)' 5 totalt = 0 6 For j = 1 To k - 1 7 If k - 1 <> j And k - 2 > 0 Then 8 t(k - 1, j) = t(k - 2, j) - t(k - 1, k - 1) * t(k - 2, k - 1 - j) 9 End If 10 totalt = totalt + t(k - 1, j) * p(k - j) 11 Next 12 t(k, k) = (p(k) - totalt) / (1 - totalt) 13 14 Next
All this macro is doing is taking the auto-correlations, held in p(), and calculating the partial auto-correlations, the t() values or pi in the equation. It's very simple but does assume that all values of t(k,j) for smaller values of k and j have already been calculated.
Line 8 holds equation 2 and calculates the value of t() when k and j are not the same. Otherwise it is assumed that the values have already been calculated.
Lines 6 to 12 hold equation 1.
The bulk of the macro loads the data set, calculates the auto-correlations and only then calculates the partial auto-correlations. It's detailed next.
The Complete Code
And here is the complete PACF function for Excel:
Function PAutoCor(dataRange As range, Optional lag As Variant, Optional diff As Variant) As Variant Dim x() As Double 'Array to hold data values' Dim lags() As Integer 'Array to hold lag values' Dim sx As Double Dim sy As Double Dim s1 As Double Dim s2 As Double Dim s3 As Double Dim i, k As Integer 'Loop variables' Dim outputRows As Long Dim outputCols As Long Dim output() As Variant 'Temporary output array' Dim vertOutput As Boolean Dim vertInput As Boolean Dim vertLag As Boolean Dim numLags As Integer Dim maxLag As Integer Dim a As Integer Dim b As Integer Dim dataSize As Integer Dim p() As Double Dim t() As Double Dim j As Integer Dim totalt As Double 'How many lags are there to calculate for?' With Application.Caller outputRows = .Rows.Count outputCols = .Columns.Count End With ReDim output(1 To outputRows, 1 To outputCols) 'Vertical or horizontal output?' If outputRows > outputCols Then vertOutput = True numLags = outputRows 'Set all output() to "#N/A"' For i = 1 To outputRows output(i, 1) = CVErr(xlErrNA) Next Else vertOutput = False numLags = outputCols 'Set all output() to "#N/A"' For i = 1 To outputCols output(1, i) = CVErr(xlErrNA) Next End If 'Vertical or horizontal input?' If dataRange.Rows.Count > dataRange.Columns.Count Then vertInput = True a = 1 b = 0 dataSize = dataRange.Rows.Count - 1 Else vertInput = False a = 0 b = 1 dataSize = dataRange.Columns.Count - 1 End If ReDim x(dataSize) 'Check that lag is there' If IsMissing(lag) Then ReDim lags(numLags) 'Populate lags' For i = 1 To numLags lags(i) = i Next ElseIf TypeName(lag) = "Range" Then 'Horizontal or vertical lag' If lag.Rows.Count > lag.Columns.Count Then vertLag = True 'Check that lag range matches output range in size' 'Need to change this for maxLag' If lag.Rows.Count <> numLags Then numLags = lag.Rows.Count ReDim lags(numLags) maxLag = 0 'Populate lags' For i = 1 To numLags lags(i) = lag(i, 1).Value If lags(i) > maxLag Then maxLag = lags(i) Next Else vertLag = False 'Check that lag range matches output range in size' If lag.Columns.Count <> numLags Then numLags = lag.Columns.Count ReDim lags(numLags) maxLag = 0 'Populate lags' For i = 1 To numLags lags(i) = lag(1, i).Value If lags(i) > maxLag Then maxLag = lags(i) Next End If Else 'Should I check for array/single values?' ReDim lags(1) 'Need to check if integer' lags(1) = lag 'Set numLags to 1 just in case' numLags = 1 End If 'Check that diff is there' If IsMissing(diff) Then diff = 0 End If 'Fill data array x()' If diff > 0 Then For i = 0 To dataSize - 1 If vertInput Then x(i) = dataRange(i + 1, 1).Value - dataRange(i + 2, 1).Value Else x(i) = dataRange(1, i + 1).Value - dataRange(1, i + 2).Value End If Next Else For i = 0 To dataSize If vertInput Then x(i) = dataRange(i + 1, 1).Value Else x(i) = dataRange(1, i + 1).Value End If Next End If If diff > 1 Then For k = 1 To diff For i = 0 To dataSize - 1 - k x(i) = x(i) - x(i + 1) Next Next End If 'Redim for maxlag' ReDim p(maxLag) ReDim t(maxLag, maxLag) 'Calculate autocorrelation' For i = 0 To maxLag sx = 0 sy = 0 s1 = 0 s2 = 0 s3 = 0 For k = 0 To dataSize - i - diff sx = x(k) + sx sy = x(k + i) + sy Next sx = sx / (dataSize + 1 - i - diff) sy = sy / (dataSize + 1 - i - diff) For k = 0 To dataSize - i - diff s1 = s1 + (x(k) - sx) * (x(k + i) - sy) s2 = s2 + (x(k) - sx) ^ 2 s3 = s3 + (x(k + i) - sy) ^ 2 Next p(i) = s1 / Sqr(s2 * s3) Next 'Set all ts to zero' For k = 1 To maxLag For j = 1 To maxLag t(j, k) = 0 Next Next t(0, 0) = 1 t(1, 1) = p(1) For k = 2 To maxLag 'Calculate factors to take away from p(i)' totalt = 0 For j = 1 To k - 1 If k - 1 <> j And k - 2 > 0 Then t(k - 1, j) = t(k - 2, j) - t(k - 1, k - 1) * t(k - 2, k - 1 - j) End If totalt = totalt + t(k - 1, j) * p(k - j) Next t(k, k) = (p(k) - totalt) / (1 - totalt) Next For k = 1 To numLags If vertOutput Then output(k, 1) = t(lags(k), lags(k)) Else output(1, k) = t(lags(k), lags(k)) End If Next PAutoCor = output End Function
Conclusion
I've added this function to my add-in which can be found on Sourceforge. Version 0.6.12 contains this macro.As ever, all comments are welcome.
No comments:
Post a Comment