Update - 09/12/2015
I've updated the macro and put it into one Excel add-in. It's now available here.
Introduction
In a previous post I wrote about two methods to demographically weight a market research sample to make the sample represent a population. For example, if you had a sample of 1000 people to measure the average height of a certain country, you might want to weight the sample by age and gender so that the sample represented the population in those two variables (dimensions, fields) exactly.In this post I'm going to detail a macro that can be used to rim weight a sample.
The algorithm for rim weighting is very simple. Graham Kalton and Ismael Flores-Cervantes give a very good explanation of rim weighting (and other demographic weighting methods) in an article for the Journal of Official Statistics. This website is well worth a look for other articles on running panels, weighting schemes and, it seems, everything else to do with statistics. Anyway, I give a short explanation below.
To give a concrete example and help with the explanation I'll use the example given in the previous blog entry.
Gender | |||
Age | Male | Female | Total |
Young | 30% | 20% | 50% |
Old | 25% | 25% | 50% |
Total | 55% | 45% | 100% |
The above table shows the population proportions for age and gender. Let's assume the sample proportions are as below:
Gender | |||
Age | Male | Female | Total |
Young | 20% | 35% | 55% |
Old | 20% | 25% | 45% |
Total | 40% | 60% | 100% |
The totals are called the marginal distributions or the rims.
Rim weighting is an iterative process. The first dimension's (in this case age) totals are used to calculate a weight such that the sample totals (40% and 60%) equal the population totals (55% and 45%). This weight is then adjusted so that the next dimension's (gender) sample totals equal the population totals for that dimension. This is one iteration.
The process is repeated until either all the sample totals equal the population totals or the number of iterations exceeds a predefined maximum.
Macro
Below is the (fairly complicated) macro. It assumes that you want to match population proportions and that the input data is of the form rim1, targets1, rim2, targets2.
---------------------------------------------------------------------------------
Sub rimOrdered()
'Rim weight with respect to ordering of rim
Dim totalRange As range
Dim rimRange() As range
Dim tarRange() As range
Dim weights() As Double
Dim factors() As Double
Dim iterations As Integer
Dim m As Integer
Dim weightCap As Double 'Upper limit for weight
Dim weightLow As Double 'Lower limit for weight
Dim convCrit As Double 'Convergence criterion
Dim testBoo As Boolean 'Boolean for convergence criteria
Dim totalRim As Double
Dim numberOfRims As Integer
Dim names() As String 'Name of rim
Dim colNumber() As Integer 'Column number of rim
Dim weff As Double 'Weighting effect
Dim sqWeights As Double
Dim currWkName As String
Dim rimIter As Integer
Dim tempCheck As Double
Dim startTime
Dim i, k, n As Integer
Dim rimw() As Dictionary
Dim tarw() As Dictionary
Dim actuals() As Dictionary
RimForm2.Show
If RimForm2.TextBox1.Enabled = False Then
RimForm2.Hide
Unload RimForm2
Exit Sub
End If
Set totalRange = range(RimForm2.RefEdit1.Value)
Application.ScreenUpdating = False
Application.Calculation = xlManual
startTime = Time
numberOfRims = RimForm2.ListBox2.ListCount
iterations = RimForm2.TextBox1.Value
weightCap = RimForm2.TextBox2.Value
weightLow = RimForm2.TextBox5.Value
convCrit = RimForm2.TextBox3.Value
currWkName = ActiveSheet.Name
'Calculate number of rows
totalRim = totalRange.Rows.Count - 1
'Redimension arrays
ReDim rimw(numberOfRims)
ReDim tarw(numberOfRims)
ReDim actuals(numberOfRims)
For n = 0 To numberOfRims - 1
Set rimw(n) = New Dictionary
Set tarw(n) = New Dictionary
Set actuals(n) = New Dictionary
Next
ReDim rimRange(numberOfRims)
ReDim tarRange(numberOfRims)
ReDim factors(numberOfRims, totalRange.Rows.Count - 1)
ReDim weights(totalRange.Rows.Count - 1)
ReDim names(numberOfRims)
ReDim colNumber(numberOfRims)
'Define area of rim ranges
For i = 0 To RimForm2.ListBox2.ListCount - 1
names(i) = RimForm2.ListBox2.List(i)
For k = 1 To totalRange.Columns.Count
If totalRange.Cells(1, k).Value = names(i) Then
colNumber(i) = k
End If
Next
Set rimRange(i) = range(Cells(2, colNumber(i)), Cells(totalRange.Rows.Count, colNumber(i)))
Set tarRange(i) = range(Cells(2, colNumber(i) + 1), Cells(totalRange.Rows.Count, colNumber(i) + 1))
Next
'Calculate factors for each of the rims
'Set up hashes
For n = 0 To numberOfRims - 1
For Each i In rimRange(n).Rows
If rimw(n).Exists(i.Value) Then
rimw(n).Item(i.Value) = rimw(n).Item(i.Value) + 1
actuals(n).Item(i.Value) = actuals(n).Item(i.Value) + 1
Else
rimw(n).Add i.Value, 1
actuals(n).Add i.Value, 1
End If
Next
Next
For n = 0 To numberOfRims - 1
For i = 1 To tarRange(n).Rows.Count
If Not tarw(n).Exists(rimRange(n)(i).Value) Then
tarw(n).Add rimRange(n)(i).Value, tarRange(n)(i).Value
End If
'Test that value already in target is the same for this cell
'i.e. that targets are consistent
Next
'Test that tarw(n) adds up to 1
tempCheck = 0
For Each i In tarw(n).Keys
'Test for numeric values
If IsNumeric(tarw(n).Item(i)) Then
'If a non-numeric value is found then abort and display error message
tempCheck = tempCheck + (10 ^ CInt(RimForm2.TextBox4.Value)) * tarw(n).Item(i)
Else
MsgBox ("Rim " & n + 1 & ", Key " & i & " is not numeric")
Application.ScreenUpdating = True
Application.Calculation = xlAutomatic
Unload RimForm2
'Application.StatusBar = "Rim weighting macro aborted"
Exit Sub
End If
Next
'If they don't display sum of targets and rim affected
If Int(tempCheck) <> (10 ^ CInt(RimForm2.TextBox4.Value)) And RimForm2.CheckBox1.Value = True Then
contvar = MsgBox("Rim " & n + 1 & " targets do not add to 100%" & vbCrLf & "They add to: " & tempCheck / (10 ^ (CInt(RimForm2.TextBox4.Value)) / 100) & "%" & vbCrLf & vbCrLf & "Do you wish to continue?", vbYesNo)
If contvar = vbNo Then
Application.ScreenUpdating = True
Application.Calculation = xlAutomatic
Unload RimForm2
'Application.StatusBar = "Rim weighting macro aborted"
Exit Sub
End If
End If
Next
'Set up initial weights
For i = 0 To totalRange.Rows.Count - 1
weights(i) = 1
Next
'Calculate factor and iterate round the loop
'Factors
m = 0
Do
m = m + 1
For n = 0 To numberOfRims - 1
For i = 0 To totalRange.Rows.Count - 2
weights(i) = weights(i) * totalRim * tarw(n)(rimRange(n)(i + 1).Value) / rimw(n)(rimRange(n)(i + 1).Value)
If weights(i) > weightCap Then weights(i) = weightCap
If weights(i) < weightLow Then weights(i) = weightLow
Next
'Reset hashes
rimIter = n + 1
If rimIter = numberOfRims Then rimIter = 0
For Each i In rimw(rimIter).Keys
rimw(rimIter).Item(i) = 0
Next
totalRim = 0
'Recalculate values in hashes
j = 0
For Each v In rimRange(rimIter).Rows
rimw(rimIter).Item(v.Value) = rimw(rimIter).Item(v.Value) + weights(j)
'Recalculate total sum of weights
totalRim = totalRim + weights(j)
j = j + 1
Next
Next
'Test convergence criterion
testBoo = True
For n = 0 To numberOfRims - 1
For Each i In rimw(n).Keys
If rimw(n).Item(i) = 0 Then
rimw(n).Item(i) = 0.0001
End If
If Abs(totalRim * tarw(n).Item(i) / rimw(n).Item(i) - 1) > convCrit Then
testBoo = False
End If
Next
Next
Loop Until m >= iterations Or testBoo
sqWeights = 0
'Display weights
Cells(1, totalRange.Columns.Count + 1).Value = "Weights"
For i = 2 To totalRange.Rows.Count
Cells(i, totalRange.Columns.Count + 1).Value = weights(i - 2)
'Calculate WEFF
sqWeights = sqWeights + (weights(i - 2) * weights(i - 2))
Next
'Produce report
Worksheets.Add
Set sname = ActiveSheet
'Need to check that this doesn't already exist in the spreadsheet
Dim wkshtName As String
Dim wkshtThere As Boolean
Dim wkshtCount As Integer
wkshtName = currWkName & " Report"
wkshtCount = 1
Do
wkshtThere = False
For wksht = 1 To ActiveWorkbook.Sheets.Count
If ActiveWorkbook.Sheets(wksht).Name = wkshtName Then
wkshtThere = True
End If
Next
If wkshtThere = True Then
wkshtCount = wkshtCount + 1
wkshtName = currWkName & " Report" & wkshtCount
End If
Loop Until wkshtThere = False
sname.Name = wkshtName
If totalRim = 0 Then totalRim = 0.00001
Cells(1, 1).Value = "Number of iterations"
Cells(1, 2).Value = m
Cells(2, 1).Value = "WEFF"
Cells(2, 2).Value = (totalRange.Rows.Count - 1) * sqWeights / totalRim ^ 2
Cells(3, 1).Value = "Time taken"
stopTime = Time
Cells(3, 2).Value = Format((stopTime - startTime) * 24 * 60 * 60, "00")
Cells(1, 4).Value = "Ordered weighting"
For n = 0 To numberOfRims - 1
Cells(4, (n + 1) * 6 - 5).Value = names(n)
p = 6
Cells(5, (n + 1) * 6 - 4).Value = "Actual"
Cells(5, (n + 1) * 6 - 3).Value = "Weighted"
Cells(5, (n + 1) * 6 - 2).Value = "Targets"
Cells(5, (n + 1) * 6 - 1).Value = "Difference"
For Each i In rimw(n).Keys
Cells(p, (n + 1) * 6 - 5).NumberFormat = "@"
Cells(p, (n + 1) * 6 - 5).Value = i
Cells(p, (n + 1) * 6 - 4).Value = actuals(n).Item(i) / (totalRange.Rows.Count - 1)
Cells(p, (n + 1) * 6 - 4).NumberFormat = "0.00%"
Cells(p, (n + 1) * 6 - 3).Value = rimw(n).Item(i) / totalRim
Cells(p, (n + 1) * 6 - 3).NumberFormat = "0.00%"
Cells(p, (n + 1) * 6 - 2).Value = tarw(n).Item(i)
Cells(p, (n + 1) * 6 - 2).NumberFormat = "0.00%"
Cells(p, (n + 1) * 6 - 1).Value = (totalRim * tarw(n).Item(i) / rimw(n).Item(i)) - 1
Cells(p, (n + 1) * 6 - 1).NumberFormat = "0.00%"
p = p + 1
Next
'Sort bits
range(Cells(6, (n + 1) * 6 - 5), Cells(p - 1, (n + 1) * 6 - 1)).Sort Key1:=range(Cells(6, (n + 1) * 6 - 5), Cells(p - 1, (n + 1) * 6 - 1)).Columns(1), Header:=xlNo
Next
Dim freqDist() As Integer
ReDim freqDist((weightCap - weightLow) * 10)
For i = 2 To totalRange.Rows.Count
freqDist(WorksheetFunction.Floor(weights(i - 2), 0.1) * 10) = freqDist(WorksheetFunction.Floor(weights(i - 2), 0.1) * 10) + 1
Next
For i = weightLow To weightCap Step 0.1
Cells(i * 10 + 3, numberOfRims * 6 + 1).Value = i
Cells(i * 10 + 3, numberOfRims * 6 + 2).Value = freqDist(i * 10)
Next
Application.ScreenUpdating = True
Application.Calculation = xlAutomatic
Unload RimForm2
End Sub
---------------------------------------------------------------------------------
What's missing here is the form required to specify the target area, the dimensions and the criteria for running the procedure. I'll add a link to the form when I think of somewhere to put it. The image below shows the listboxes defining the rims and targets:
From these the dictionary objects rimw and tarw are populated.
The next image shows the variables tab:
This tab allows you to define the maximum number of iterations and the weight caps.
The meat of the macro is shaded in grey. This is where the calculation of the weights actually happens.
Also the macro produces a report of the weighting giving a breakdown of how successful it was by each of the rims.
For example, the following input:
ID | Rim1 | Target1 | Rim2 | Target2 |
1 | A | 0.75 | C | 0.5 |
2 | A | 0.75 | C | 0.5 |
3 | A | 0.75 | D | 0.5 |
4 | B | 0.25 | D | 0.5 |
5 | B | 0.25 | D | 0.5 |
produces:
Number of iterations | 1 | |||||||||
WEFF | 1.09375 | |||||||||
Time taken | 0 | |||||||||
Rim1 | Rim2 | |||||||||
Actual | Weighted | Targets | Difference | Actual | Weighted | Targets | Difference | |||
A | 60.00% | 75.00% | 75.00% | 0.00% | C | 40.00% | 50.00% | 50.00% | 0.00% | |
B | 40.00% | 25.00% | 25.00% | 0.00% | D | 60.00% | 50.00% | 50.00% | 0.00% |
in the report.
Weights are placed in the column after the area specified in the first tab.
Conclusion
The macro above will calculate rim weights fairly rapidly for any number of dimensions/fields. You can specify convergence criteria, weight caps and the number of iterations.
Hopefully this will help somebody and all comments welcome - especially if they are suggestions for improvements.
In my next entry I'll explain what the WEFF is and how it affects the accuracy of the estimation.
Firstly nice blog. I am playing around with the macro and just wanted to know if you have had a chance to add a link to the form?
ReplyDeleteAh, yes. Forgot about that.
DeleteI've been planning to add all the macros into one add-in and place that on a website. It's taken quite a long time to get them all updated and more robust, but I think that it'll only be a couple of weeks until it's ready. I'll then update the post with a link to the website I'm using. Sorry for the delay.
I've put the add-in on www.sciolist.weebly.com. I know it's not the form but I will add that later.
DeleteI've developed a macro as well that also allows for trimming weights, reports, and weighting multiple subsets at once. I'm busy converting it to a web application at www.spinnakerresearch.nl. We're still in beta so you can still get a free account.
ReplyDeleteFirst of all thank you for sharing this macro. It's very useful. Unfortunately I have to switch to English region settings in order to use it, since the macro breaks when decimals are indicated with commas instead of dots. Is it possible to fix this by altering the code? Thx.
ReplyDeleteGood question. You would think that it wouldn't matter to Excel how a number was specified so I'm not sure what's causing it to break. I'll have a look into it and get back to you.
DeleteHi thanks for sending this. I have been spending lots of time on this with formulas. Please can you repost the macro with all the fixes? I went to the websites above and could not access the updated macro. Thanks in advance
ReplyDeleteHi, sorry about that. I've put the macro on http://www.surveyscience.co.uk/rim_weighting.html
DeleteThis is the updated version and a post with instructions is here: http://statsciolist.blogspot.co.uk/2014/10/new-rim-weighting-excel-macro.html
I've changed the macro slightly but the change is a fairly minor one to the form. Hopefully it should all be ok. I'll write some updated instructions and put them on the site soon.