Saturday 22 June 2013

Rim Weighting Excel Macro

Update - 09/12/2015

I've updated the macro and put it into one Excel add-in. It's now available here.

Introduction

In a previous post I wrote about two methods to demographically weight a market research sample to make the sample represent a population. For example, if you had a sample of 1000 people to measure the average height of a certain country, you might want to weight the sample by age and gender so that the sample represented the population in those two variables (dimensions, fields) exactly.

In this post I'm going to detail a macro that can be used to rim weight a sample.

The algorithm for rim weighting is very simple. Graham Kalton and Ismael Flores-Cervantes give a very good explanation of rim weighting (and other demographic weighting methods) in an article for the Journal of Official Statistics. This website is well worth a look for other articles on running panels, weighting schemes and, it seems, everything else to do with statistics. Anyway, I give a short explanation below.

To give a concrete example and help with the explanation I'll use the example given in the previous blog entry.
Gender
Age Male Female Total
Young 30% 20% 50%
Old 25% 25% 50%
Total 55% 45% 100%

The above table shows the population proportions for age and gender. Let's assume the sample proportions are as below:
Gender
Age Male Female Total
Young 20% 35% 55%
Old 20% 25% 45%
Total 40% 60% 100%


The totals are called the marginal distributions or the rims.

Rim weighting is an iterative process. The first dimension's (in this case age) totals are used to calculate a weight such that the sample totals (40% and 60%) equal the population totals (55% and 45%). This weight is then adjusted so that the next dimension's (gender) sample totals equal the population totals for that dimension. This is one iteration.

The process is repeated until either all the sample totals equal the population totals or the number of iterations exceeds a predefined maximum.

Macro

Below is the (fairly complicated) macro. It assumes that you want to match population proportions and that the input data is of the form rim1, targets1, rim2, targets2.

---------------------------------------------------------------------------------
Sub rimOrdered()
    'Rim weight with respect to ordering of rim

    Dim totalRange As range
    Dim rimRange() As range
    Dim tarRange() As range
    Dim weights() As Double
    Dim factors() As Double
    Dim iterations As Integer
    Dim m As Integer
    Dim weightCap As Double         'Upper limit for weight
    Dim weightLow As Double         'Lower limit for weight
    Dim convCrit As Double          'Convergence criterion
    Dim testBoo As Boolean          'Boolean for convergence criteria
    Dim totalRim As Double
    Dim numberOfRims As Integer
    Dim names() As String           'Name of rim
    Dim colNumber() As Integer      'Column number of rim
    Dim weff As Double              'Weighting effect
    Dim sqWeights As Double
    Dim currWkName As String
    Dim rimIter As Integer
    Dim tempCheck As Double
    Dim startTime
    Dim i, k, n As Integer
    
    Dim rimw() As Dictionary
    Dim tarw() As Dictionary
    Dim actuals() As Dictionary

    RimForm2.Show
    
    If RimForm2.TextBox1.Enabled = False Then
        RimForm2.Hide
        Unload RimForm2
        Exit Sub
    End If
    
    Set totalRange = range(RimForm2.RefEdit1.Value)
    
    Application.ScreenUpdating = False
    Application.Calculation = xlManual
    startTime = Time
    
    numberOfRims = RimForm2.ListBox2.ListCount
    
    iterations = RimForm2.TextBox1.Value
    weightCap = RimForm2.TextBox2.Value
    weightLow = RimForm2.TextBox5.Value
    convCrit = RimForm2.TextBox3.Value
    currWkName = ActiveSheet.Name
    
    'Calculate number of rows
    totalRim = totalRange.Rows.Count - 1
    
    'Redimension arrays
    ReDim rimw(numberOfRims)
    ReDim tarw(numberOfRims)
    ReDim actuals(numberOfRims)
    For n = 0 To numberOfRims - 1
        Set rimw(n) = New Dictionary
        Set tarw(n) = New Dictionary
        Set actuals(n) = New Dictionary
    Next
    ReDim rimRange(numberOfRims)
    ReDim tarRange(numberOfRims)
    ReDim factors(numberOfRims, totalRange.Rows.Count - 1)
    ReDim weights(totalRange.Rows.Count - 1)
    ReDim names(numberOfRims)
    ReDim colNumber(numberOfRims)
    
    'Define area of rim ranges
    For i = 0 To RimForm2.ListBox2.ListCount - 1
        names(i) = RimForm2.ListBox2.List(i)
        For k = 1 To totalRange.Columns.Count
            If totalRange.Cells(1, k).Value = names(i) Then
                colNumber(i) = k
            End If
        Next
        Set rimRange(i) = range(Cells(2, colNumber(i)), Cells(totalRange.Rows.Count, colNumber(i)))
        Set tarRange(i) = range(Cells(2, colNumber(i) + 1), Cells(totalRange.Rows.Count, colNumber(i) + 1))
    Next
    
    'Calculate factors for each of the rims
    'Set up hashes
    For n = 0 To numberOfRims - 1
        For Each i In rimRange(n).Rows
            If rimw(n).Exists(i.Value) Then
                rimw(n).Item(i.Value) = rimw(n).Item(i.Value) + 1
                actuals(n).Item(i.Value) = actuals(n).Item(i.Value) + 1
            Else
                rimw(n).Add i.Value, 1
                actuals(n).Add i.Value, 1
            End If
        Next
    Next
    For n = 0 To numberOfRims - 1
        For i = 1 To tarRange(n).Rows.Count
            If Not tarw(n).Exists(rimRange(n)(i).Value) Then
                tarw(n).Add rimRange(n)(i).Value, tarRange(n)(i).Value
            End If
            'Test that value already in target is the same for this cell
            'i.e. that targets are consistent
        Next
        'Test that tarw(n) adds up to 1
        tempCheck = 0
        For Each i In tarw(n).Keys
            'Test for numeric values
            If IsNumeric(tarw(n).Item(i)) Then
                'If a non-numeric value is found then abort and display error message
                tempCheck = tempCheck + (10 ^ CInt(RimForm2.TextBox4.Value)) * tarw(n).Item(i)
            Else
                MsgBox ("Rim " & n + 1 & ", Key " & i & " is not numeric")
                Application.ScreenUpdating = True
                Application.Calculation = xlAutomatic
                Unload RimForm2
                'Application.StatusBar = "Rim weighting macro aborted"
                Exit Sub
            End If
        Next
        'If they don't display sum of targets and rim affected
        If Int(tempCheck) <> (10 ^ CInt(RimForm2.TextBox4.Value)) And RimForm2.CheckBox1.Value = True Then
            contvar = MsgBox("Rim " & n + 1 & " targets do not add to 100%" & vbCrLf & "They add to: " & tempCheck / (10 ^ (CInt(RimForm2.TextBox4.Value)) / 100) & "%" & vbCrLf & vbCrLf & "Do you wish to continue?", vbYesNo)
            If contvar = vbNo Then
                Application.ScreenUpdating = True
                Application.Calculation = xlAutomatic
                Unload RimForm2
                'Application.StatusBar = "Rim weighting macro aborted"
                Exit Sub
            End If
        End If
    Next
    
    'Set up initial weights
    For i = 0 To totalRange.Rows.Count - 1
        weights(i) = 1
    Next
    
    'Calculate factor and iterate round the loop
    'Factors
    m = 0
    Do
        m = m + 1
        For n = 0 To numberOfRims - 1
            For i = 0 To totalRange.Rows.Count - 2
                weights(i) = weights(i) * totalRim * tarw(n)(rimRange(n)(i + 1).Value) / rimw(n)(rimRange(n)(i + 1).Value)
                If weights(i) > weightCap Then weights(i) = weightCap
                If weights(i) < weightLow Then weights(i) = weightLow
            Next
            
            'Reset hashes
            rimIter = n + 1
            If rimIter = numberOfRims Then rimIter = 0
            For Each i In rimw(rimIter).Keys
                rimw(rimIter).Item(i) = 0
            Next
            totalRim = 0
            
            'Recalculate values in hashes
            j = 0
            
            For Each v In rimRange(rimIter).Rows
                rimw(rimIter).Item(v.Value) = rimw(rimIter).Item(v.Value) + weights(j)
                'Recalculate total sum of weights
                totalRim = totalRim + weights(j)
                j = j + 1
            Next
        Next
        
        'Test convergence criterion
        testBoo = True
        For n = 0 To numberOfRims - 1
            For Each i In rimw(n).Keys
                If rimw(n).Item(i) = 0 Then
                    rimw(n).Item(i) = 0.0001
                End If
                If Abs(totalRim * tarw(n).Item(i) / rimw(n).Item(i) - 1) > convCrit Then
                    testBoo = False
                End If
            Next
        Next

    Loop Until m >= iterations Or testBoo
    
    sqWeights = 0
    'Display weights
    Cells(1, totalRange.Columns.Count + 1).Value = "Weights"
    For i = 2 To totalRange.Rows.Count
        Cells(i, totalRange.Columns.Count + 1).Value = weights(i - 2)
        'Calculate WEFF
        sqWeights = sqWeights + (weights(i - 2) * weights(i - 2))
    Next
    
    'Produce report
    Worksheets.Add
    Set sname = ActiveSheet
    'Need to check that this doesn't already exist in the spreadsheet
    Dim wkshtName As String
    Dim wkshtThere As Boolean
    Dim wkshtCount As Integer
    wkshtName = currWkName & " Report"
    wkshtCount = 1
    Do
        wkshtThere = False
        For wksht = 1 To ActiveWorkbook.Sheets.Count
            If ActiveWorkbook.Sheets(wksht).Name = wkshtName Then
                wkshtThere = True
            End If
        Next
        If wkshtThere = True Then
            wkshtCount = wkshtCount + 1
            wkshtName = currWkName & " Report" & wkshtCount
        End If
    Loop Until wkshtThere = False
    sname.Name = wkshtName
    
    If totalRim = 0 Then totalRim = 0.00001
    
    Cells(1, 1).Value = "Number of iterations"
    Cells(1, 2).Value = m
    Cells(2, 1).Value = "WEFF"
    Cells(2, 2).Value = (totalRange.Rows.Count - 1) * sqWeights / totalRim ^ 2
    Cells(3, 1).Value = "Time taken"
    stopTime = Time
    Cells(3, 2).Value = Format((stopTime - startTime) * 24 * 60 * 60, "00")
    Cells(1, 4).Value = "Ordered weighting"

    For n = 0 To numberOfRims - 1
        Cells(4, (n + 1) * 6 - 5).Value = names(n)
        p = 6
        Cells(5, (n + 1) * 6 - 4).Value = "Actual"
        Cells(5, (n + 1) * 6 - 3).Value = "Weighted"
        Cells(5, (n + 1) * 6 - 2).Value = "Targets"
        Cells(5, (n + 1) * 6 - 1).Value = "Difference"
        For Each i In rimw(n).Keys
            Cells(p, (n + 1) * 6 - 5).NumberFormat = "@"
            Cells(p, (n + 1) * 6 - 5).Value = i
            Cells(p, (n + 1) * 6 - 4).Value = actuals(n).Item(i) / (totalRange.Rows.Count - 1)
            Cells(p, (n + 1) * 6 - 4).NumberFormat = "0.00%"
            Cells(p, (n + 1) * 6 - 3).Value = rimw(n).Item(i) / totalRim
            Cells(p, (n + 1) * 6 - 3).NumberFormat = "0.00%"
            Cells(p, (n + 1) * 6 - 2).Value = tarw(n).Item(i)
            Cells(p, (n + 1) * 6 - 2).NumberFormat = "0.00%"
            Cells(p, (n + 1) * 6 - 1).Value = (totalRim * tarw(n).Item(i) / rimw(n).Item(i)) - 1
            Cells(p, (n + 1) * 6 - 1).NumberFormat = "0.00%"
            p = p + 1
        Next
        'Sort bits
        range(Cells(6, (n + 1) * 6 - 5), Cells(p - 1, (n + 1) * 6 - 1)).Sort Key1:=range(Cells(6, (n + 1) * 6 - 5), Cells(p - 1, (n + 1) * 6 - 1)).Columns(1), Header:=xlNo
    Next
    
    Dim freqDist() As Integer
    ReDim freqDist((weightCap - weightLow) * 10)
    For i = 2 To totalRange.Rows.Count
        freqDist(WorksheetFunction.Floor(weights(i - 2), 0.1) * 10) = freqDist(WorksheetFunction.Floor(weights(i - 2), 0.1) * 10) + 1
    Next
    
    For i = weightLow To weightCap Step 0.1
        Cells(i * 10 + 3, numberOfRims * 6 + 1).Value = i
        Cells(i * 10 + 3, numberOfRims * 6 + 2).Value = freqDist(i * 10)
    Next
    
    Application.ScreenUpdating = True
    Application.Calculation = xlAutomatic
    Unload RimForm2

End Sub
---------------------------------------------------------------------------------

What's missing here is the form required to specify the target area, the dimensions and the criteria for running the procedure. I'll add a link to the form when I think of somewhere to put it. The image below shows the listboxes defining the rims and targets:

From these the dictionary objects rimw and tarw are populated.

The next image shows the variables tab:

This tab allows you to define the maximum number of iterations and the weight caps.

The meat of the macro is shaded in grey. This is where the calculation of the weights actually happens.

Also the macro produces a report of the weighting giving a breakdown of how successful it was by each of the rims.

For example, the following input:
ID Rim1 Target1 Rim2 Target2
1 A 0.75 C 0.5
2 A 0.75 C 0.5
3 A 0.75 D 0.5
4 B 0.25 D 0.5
5 B 0.25 D 0.5

produces:
Number of iterations 1
WEFF 1.09375
Time taken 0
Rim1 Rim2
Actual Weighted Targets Difference Actual Weighted Targets Difference
A 60.00% 75.00% 75.00% 0.00% C 40.00% 50.00% 50.00% 0.00%
B 40.00% 25.00% 25.00% 0.00% D 60.00% 50.00% 50.00% 0.00%
in the report.

Weights are placed in the column after the area specified in the first tab.

Conclusion

The macro above will calculate rim weights fairly rapidly for any number of dimensions/fields. You can specify convergence criteria, weight caps and the number of iterations.

Hopefully this will help somebody and all comments welcome - especially if they are suggestions for improvements.

In my next entry I'll explain what the WEFF is and how it affects the accuracy of the estimation.

8 comments:

  1. Firstly nice blog. I am playing around with the macro and just wanted to know if you have had a chance to add a link to the form?

    ReplyDelete
    Replies
    1. Ah, yes. Forgot about that.

      I've been planning to add all the macros into one add-in and place that on a website. It's taken quite a long time to get them all updated and more robust, but I think that it'll only be a couple of weeks until it's ready. I'll then update the post with a link to the website I'm using. Sorry for the delay.

      Delete
    2. I've put the add-in on www.sciolist.weebly.com. I know it's not the form but I will add that later.

      Delete
  2. I've developed a macro as well that also allows for trimming weights, reports, and weighting multiple subsets at once. I'm busy converting it to a web application at www.spinnakerresearch.nl. We're still in beta so you can still get a free account.

    ReplyDelete
  3. First of all thank you for sharing this macro. It's very useful. Unfortunately I have to switch to English region settings in order to use it, since the macro breaks when decimals are indicated with commas instead of dots. Is it possible to fix this by altering the code? Thx.

    ReplyDelete
    Replies
    1. Good question. You would think that it wouldn't matter to Excel how a number was specified so I'm not sure what's causing it to break. I'll have a look into it and get back to you.

      Delete
  4. Hi thanks for sending this. I have been spending lots of time on this with formulas. Please can you repost the macro with all the fixes? I went to the websites above and could not access the updated macro. Thanks in advance

    ReplyDelete
    Replies
    1. Hi, sorry about that. I've put the macro on http://www.surveyscience.co.uk/rim_weighting.html

      This is the updated version and a post with instructions is here: http://statsciolist.blogspot.co.uk/2014/10/new-rim-weighting-excel-macro.html

      I've changed the macro slightly but the change is a fairly minor one to the form. Hopefully it should all be ok. I'll write some updated instructions and put them on the site soon.

      Delete