Sunday, 8 June 2014

UK TV Listings in Excel

UK TV Listings in Excel

This article describes a simple way of getting TV programme listings from the Internet and into Excel using VBA.

For this I'm going to use the XMLTV feed provided by the Radio Times. This feed is usable only for personal and non-commercial use and is copyrighted. It is useful for analysing the TV guide though.

Method

The first thing to do is create a new module to hold the code. Let's call this TVGuide. To hold all the data we need to create a new workbook and worksheet. Then we can set up an external data query from a web page to get the channel data. I used the macro recorder to do this part which is why it's a bit verbose. I'm not sure how much of the code I need but it doesn't do any harm leaving it there.


Sub GetChannels()
'
' Get Channels from Radio Times
'

    Dim wb As Workbook
    Dim wk As Worksheet
    
    Set wb = Application.Workbooks.Add
    Set wk = ActiveWorkbook.Worksheets.Add
    wk.Name = "Channels"
    
    With ActiveSheet.QueryTables.Add(Connection:="URL;http://xmltv.radiotimes.com/xmltv/channels.dat", Destination:=Range("$A$1"))
        .Name = "channels"
        .FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = True
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .WebSelectionType = xlEntirePage
        .WebFormatting = xlWebFormattingNone
        .WebPreFormattedTextToColumns = True
        .WebConsecutiveDelimitersAsOne = True
        .WebSingleBlockTextImport = False
        .WebDisableDateRecognition = False
        .WebDisableRedirections = False
        .Refresh BackgroundQuery:=False
    End With
    
End Sub
     

You'll note that the address of the feed is http://xmltv.radiotimes.com/xmltv/channels.dat. This page contains a list of the channel numbers with their descriptions separated by a pipe character. So the next job is to separate these into two columns.


'Split data by pipe
Columns("A:A").Select
Selection.TextToColumns Destination:=Range("A1"), DataType:=xlDelimited, _
    TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=False, _
    Semicolon:=False, Comma:=False, Space:=False, Other:=True, OtherChar _
    :="|", FieldInfo:=Array(Array(1, 1), Array(2, 1)), TrailingMinusNumbers:=True

'Delete row 2
Rows("2:2").Select
Selection.Delete Shift:=xlUp

'Label columns
Range("A1").FormulaR1C1 = "Channel Number"
Range("B1").FormulaR1C1 = "Channel"

'Change widths of columns
Columns("A:A").EntireColumn.AutoFit
Columns("B:B").EntireColumn.AutoFit

'Sort data
wk.Sort.SortFields.Clear
wk.Sort.SortFields.Add Key:=Range("B:B"), SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
With wk.Sort
    .SetRange Range("A1").CurrentRegion
    .Header = xlYes
    .MatchCase = False
    .Orientation = xlTopToBottom
    .SortMethod = xlPinYin
    .Apply
End With
     

The above code needs to be inserted before the End Sub of the previous code example. I've also labelled the columns, deleted a row, changed the widths of the columns and sorted the data by channel description. This makes it easier to find the channel you're interested in.

So, that's nice. You have a list of channels sorted by their description. It's not very useful though. Fortunately, it's also easy to get the actual TV guide data for each channel. The address is almost exactly the same as before but instead of channel.dat at the end of the address we have {channel number}.dat where {channel number} is the number from the previous list.

Now all we need is some way to use the first list to get the actual time data. The easiest way (well, I thought it was easy) is to create a class module and enable events on the workbook that you've created.

So let's create that new class module and call it ChannelSelector. We want to allow events within the workbook we've created. Therefore we need to create a variable for the workbook in question and use the WithEvents keyword. We also need to attach this newly defined variable to the workbook that was created earlier.


Option Explicit
Private WithEvents App As Workbook

Private Sub Class_Initialize()
    Set App = ActiveWorkbook
End Sub
     

The above five lines are almost all that's needed to do this. We also need to create a variable within the TVGuide module for this class i.e. instantiate an object of this class. So:


Option Explicit
Public ChannelHandler As ChannelSelector
     

defines the variable in the module. We then 'set' this variable to the class defined above. This is done at the end of the GetChannels() subroutine.


'Set events for channel numbers
Set ChannelHandler = New ChannelSelector
     

Now that we've allowed user-defined events for this workbook, we need to create a subroutine to handle them. In this case we want to handle the double click event when the user double clicks on the channel number. Therefore we use the SheetBeforeDoubleClick event. Below is the code for the event handler.


Private Sub App_SheetBeforeDoubleClick(ByVal Sh As Object, ByVal Target As Range, Cancel As Boolean)
    Dim channel As String
    Dim wk As Worksheet
    
    'Cancel event
    Cancel = True
    
    channel = Target.Text
    
    Set wk = ActiveWorkbook.Worksheets.Add
    wk.Name = channel
    With ActiveSheet.QueryTables.Add(Connection:="URL;http://xmltv.radiotimes.com/xmltv/" & channel & ".dat", Destination:=Range("$A$1"))
        .Name = "chan"
        .FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = True
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .WebSelectionType = xlEntirePage
        .WebFormatting = xlWebFormattingNone
        .WebPreFormattedTextToColumns = True
        .WebConsecutiveDelimitersAsOne = True
        .WebSingleBlockTextImport = False
        .WebDisableDateRecognition = False
        .WebDisableRedirections = False
        .Refresh BackgroundQuery:=False
    End With
    
    'Split data by tilde
    Columns("A:A").Select
    Selection.TextToColumns Destination:=Range("A1"), DataType:=xlDelimited, _
        TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=False, _
        Semicolon:=False, Comma:=False, Space:=False, Other:=True, OtherChar _
        :="~", FieldInfo:=Array(Array(1, 1), Array(2, 1)), TrailingMinusNumbers:=True
        
End Sub
     

This uses almost exactly the same code as before to extract data from the XMLTV feed. This time however the data needs to be split by the tilde (~) character.

Conclusion

And there you have it, the times for the next two weeks for all the programs for the channel that you just double clicked.

Obviously this is just a simple example and there is no error checking whatsoever. If you double click on something other than a relevant channel number then the program will fail with a runtime error.

There's a lot that can be done to expand on this and produce a nice TV Guide. I just wanted to give a simple introduction though.

Thursday, 17 April 2014

Clearing the Excel Text To Columns Pasting Feature using VBA

One issue with using the 'Text to Columns' feature within Excel is that any text that you subsequently paste also gets split into columns - whether you wanted it done or not.

There doesn't seem to be any way to turn this feature off. However, there are three methods that I know of that can stop it happening. These are:

  1. Close Excel - this seems to me to be a bit of a nuclear option. You have to save all your work, close the application and then reopen all the spreadsheets again. Boring.
  2. Use the 'Text to Columns' feature but with different options. This sets the TTC procedure to a different delimiter and so you can now paste without data wandering off into subsequent columns. Annoying.
  3. Use a macro to reset the TTC options. It's described below. Better.
The macro is very simple and just repeats the steps that you'd take in option 2 above.

The Code


1 Dim v As Variant
2
3 v = ActiveCell
4 ActiveCell.Value = "Rem"
5 ActiveCell.TextToColumns Tab:=False, Semicolon:=False, Comma:=False, Space:=False, Other:=False, DataType:=xlDelimited
6 ActiveCell = v
Line 1 declares a variable to hold whatever is in the current active cell. Line 3 then populates this variable with the ActiveCell object. Line 4 puts a temporary value in this cell. Then in line 5 we run a TTC with no delimiters. This resets the paste options. Line 6 resets the active cell.

There are a few problems with writing the macro like this.
  • For a start, you have to select a cell.
  • This cell can have no protection enabled.
  • A worksheet/book has to be open.
  • The active cell can't be part of a pivot table.

Survey Science Add-in

A better way would be to include this procedure as part of an add-in that could be run via a button in a custom ribbon and use a worksheet in the add-in. This way it can be run with a single click and you know what cell it was over-writing.

And this is how I've implemented it in my add-in.

As usual comments are welcome.

Wednesday, 9 April 2014

Market Research and Statistics Excel add-in Part 4

Another update to the Excel add-in. It's now on version 0.7.

I felt that it was time to up the minor version number as I'd added more functionality and a lot more bug fixes. I've even managed to write some documentation for rim weighting.

The most notable addition is a subroutine to select a stratified sample from a sampling frame.

The full list of changes for version 0.7 and the add-in itself are on:
http://sourceforge.net/projects/surveyscience

Comments on the add-in are always welcome.

Tuesday, 25 March 2014

Market Research and Statistics Excel add-in Part 3

I've updated the Excel add-in. It's now on version 0.6.13.

Most of the changes are bug fixes but I've added some functionality. Most notably these are:

  • Partial auto-correlation function
  • Generate a set of frequency tables from a data set
  • List file details in a folder
The full list of changes since version 0.6 and the add-in itself are on:
  • sciolist.weebly.com
  • http://sourceforge.net/projects/surveyscience
Comments on the add-in are very welcome.

Thursday, 20 March 2014

Excel macro for partial auto correlation


Introduction

Partial auto-correlation (PACF) is useful in time series analysis. It is generally used with the auto-correlation coefficient for determining the order of the ARIMA processes to be fitted to whatever data set you may have.

According to Dr. Roland Fuss 'the partial auto-correlation describes the supplementary information provided by the additional lag'. Lag in this sense being the time difference between one point and another.

Essentially you use the ACF and PACF to determine what equation to fit to a set of time series data.

The equations that describe the PACF are as follows:
The value in the first equation is what we're interested in.

The Macro

The heart of the macro is contained within the following few lines of code:

1    t(0, 0) = 1
2    t(1, 1) = p(1)
3    For k = 2 To maxLag
4        'Calculate factors to take away from p(i)'
5        totalt = 0
6        For j = 1 To k - 1
7            If k - 1 <> j And k - 2 > 0 Then
8                t(k - 1, j) = t(k - 2, j) - t(k - 1, k - 1) * t(k - 2, k - 1 - j)
9            End If
10           totalt = totalt + t(k - 1, j) * p(k - j)
11       Next
12       t(k, k) = (p(k) - totalt) / (1 - totalt)
13        
14   Next


All this macro is doing is taking the auto-correlations, held in p(), and calculating the partial auto-correlations, the t() values or pi in the equation. It's very simple but does assume that all values of t(k,j) for smaller values of k and j have already been calculated.

Line 8 holds equation 2 and calculates the value of t() when k and j are not the same. Otherwise it is assumed that the values have already been calculated.

Lines 6 to 12 hold equation 1.

The bulk of the macro loads the data set, calculates the auto-correlations and only then calculates the partial auto-correlations. It's detailed next.


The Complete Code

And here is the complete PACF function for Excel:

Function PAutoCor(dataRange As range, Optional lag As Variant, Optional diff As Variant) As Variant
    Dim x() As Double           'Array to hold data values'
    Dim lags() As Integer       'Array to hold lag values'
    Dim sx As Double
    Dim sy As Double
    Dim s1 As Double
    Dim s2 As Double
    Dim s3 As Double
    Dim i, k As Integer         'Loop variables'
    Dim outputRows As Long
    Dim outputCols As Long
    Dim output() As Variant     'Temporary output array'
    Dim vertOutput As Boolean
    Dim vertInput As Boolean
    Dim vertLag As Boolean
    Dim numLags As Integer
    Dim maxLag As Integer
    Dim a As Integer
    Dim b As Integer
    Dim dataSize As Integer
    Dim p() As Double
    Dim t() As Double
    Dim j As Integer
    Dim totalt As Double
    
    'How many lags are there to calculate for?'
    With Application.Caller
        outputRows = .Rows.Count
        outputCols = .Columns.Count
    End With
    ReDim output(1 To outputRows, 1 To outputCols)
    
    'Vertical or horizontal output?'
    If outputRows > outputCols Then
        vertOutput = True
        numLags = outputRows
        'Set all output() to "#N/A"'
        For i = 1 To outputRows
            output(i, 1) = CVErr(xlErrNA)
        Next
    Else
        vertOutput = False
        numLags = outputCols
        'Set all output() to "#N/A"'
        For i = 1 To outputCols
            output(1, i) = CVErr(xlErrNA)
        Next
    End If
    
    'Vertical or horizontal input?'
    If dataRange.Rows.Count > dataRange.Columns.Count Then
        vertInput = True
        a = 1
        b = 0
        dataSize = dataRange.Rows.Count - 1
    Else
        vertInput = False
        a = 0
        b = 1
        dataSize = dataRange.Columns.Count - 1
    End If
    ReDim x(dataSize)
    
    'Check that lag is there'
    If IsMissing(lag) Then
        ReDim lags(numLags)
        'Populate lags'
        For i = 1 To numLags
            lags(i) = i
        Next
    ElseIf TypeName(lag) = "Range" Then
        'Horizontal or vertical lag'
        If lag.Rows.Count > lag.Columns.Count Then
            vertLag = True
            'Check that lag range matches output range in size'
            'Need to change this for maxLag'
            If lag.Rows.Count <> numLags Then numLags = lag.Rows.Count
            ReDim lags(numLags)
            maxLag = 0
            'Populate lags'
            For i = 1 To numLags
                lags(i) = lag(i, 1).Value
                If lags(i) > maxLag Then maxLag = lags(i)
            Next
        Else
            vertLag = False
            'Check that lag range matches output range in size'
            If lag.Columns.Count <> numLags Then numLags = lag.Columns.Count
            ReDim lags(numLags)
            maxLag = 0
            'Populate lags'
            For i = 1 To numLags
                lags(i) = lag(1, i).Value
                If lags(i) > maxLag Then maxLag = lags(i)
            Next
        End If
    Else
        'Should I check for array/single values?'
        ReDim lags(1)
        'Need to check if integer'
        lags(1) = lag
        'Set numLags to 1 just in case'
        numLags = 1
    End If
    
    'Check that diff is there'
    If IsMissing(diff) Then
        diff = 0
    End If
    
    'Fill data array x()'
    If diff > 0 Then
        For i = 0 To dataSize - 1
            If vertInput Then
                x(i) = dataRange(i + 1, 1).Value - dataRange(i + 2, 1).Value
            Else
                x(i) = dataRange(1, i + 1).Value - dataRange(1, i + 2).Value
            End If
        Next
    Else
        For i = 0 To dataSize
            If vertInput Then
                x(i) = dataRange(i + 1, 1).Value
            Else
                x(i) = dataRange(1, i + 1).Value
            End If
        Next
    End If
    
    If diff > 1 Then
        For k = 1 To diff
            For i = 0 To dataSize - 1 - k
                x(i) = x(i) - x(i + 1)
            Next
        Next
    End If
    
    'Redim for maxlag'
    ReDim p(maxLag)
    ReDim t(maxLag, maxLag)
    
    'Calculate autocorrelation'
    For i = 0 To maxLag
        sx = 0
        sy = 0
        s1 = 0
        s2 = 0
        s3 = 0
        For k = 0 To dataSize - i - diff
            sx = x(k) + sx
            sy = x(k + i) + sy
        Next
        sx = sx / (dataSize + 1 - i - diff)
        sy = sy / (dataSize + 1 - i - diff)
        
        For k = 0 To dataSize - i - diff
            s1 = s1 + (x(k) - sx) * (x(k + i) - sy)
            s2 = s2 + (x(k) - sx) ^ 2
            s3 = s3 + (x(k + i) - sy) ^ 2
        Next
        
        p(i) = s1 / Sqr(s2 * s3)
    Next
    
    'Set all ts to zero'
    For k = 1 To maxLag
        For j = 1 To maxLag
            t(j, k) = 0
        Next
    Next

    t(0, 0) = 1
    t(1, 1) = p(1)
    For k = 2 To maxLag
        'Calculate factors to take away from p(i)'
        totalt = 0
        For j = 1 To k - 1
            If k - 1 <> j And k - 2 > 0 Then
                t(k - 1, j) = t(k - 2, j) - t(k - 1, k - 1) * t(k - 2, k - 1 - j)
            End If
            totalt = totalt + t(k - 1, j) * p(k - j)
        Next
        t(k, k) = (p(k) - totalt) / (1 - totalt)
        
    Next
    
    For k = 1 To numLags
        If vertOutput Then
            output(k, 1) = t(lags(k), lags(k))
        Else
            output(1, k) = t(lags(k), lags(k))
        End If
    Next
    
    PAutoCor = output
    
End Function


Conclusion

I've added this function to my add-in which can be found on Sourceforge. Version 0.6.12 contains this macro.

As ever, all comments are welcome.

Wednesday, 29 January 2014

ONS UK Visitor Numbers

This is just a quick note to say that I've written an article about the UK visitor numbers released by the ONS last December.

I haven't put it here as I can't seem to get the D3 library to work nicely with blogger. I'm not sure yet what I'm doing wrong.

Anyway, it's on my new website at http://www.surveyscience.co.uk/articles_ONS.html.

Sunday, 22 December 2013

D3 - Simple Line Chart

There seem to be a lot of tutorials on the web about D3 (www.d3js.org). However I've yet to find one that explains,simply, how to create a line chart. They all seem to be overly complicated, don't explain exactly what is happening, or call functions, classes and variables by the same name.

So here's my attempt. It took me a good few hours to get this working, which is pretty poor given that what I achieved is extremely simple.