Stats, MR and Data: WEFF - determining the loss of precision after demographically weighting a sample

Introduction

To improve the estimate of a measurement from a sample, the sample is usually demographically weighted to the population being measured. One example would be using a sample to measure the sales of cars in a country. You may wish the sample to represent the population of the country in question, but which variables do you want to weight to? What effect does increasing the number of variables you weight by have on the data? Sometimes you may be better off not weighting to a variable if doing so has little to no effect on the accuracy of the estimate but a large effect on its precision.

The WEFF

One way of measuring the effect of the weights and the weighting structure on the precision of the estimate is to use what is called the WEFF (sometimes just called F - see the Journal of Official Statistics, volume 19, No. 2, 2003 pp 81-97 for more details).

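The WEFF is defined as:

$$\mathrm{WEFF} = 1 + \left(\frac{\sigma}{\bar{x}}\right)^2$$

which is just one plus the square of the population standard deviation of the weights divided by the mean of the weights. An alternative formulation is:

$$\mathrm{WEFF} = \frac{n \sum_{i=1}^{n} x_i^2}{\left(\sum_{i=1}^{n} x_i\right)^2}$$

where n is the number of weights, x is the weight variable, \bar{x} is the mean of the weights and \sigma is their population standard deviation.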
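To see why the two formulations agree, here is a short sketch. It reads the first form as one plus the squared ratio of the population standard deviation to the mean, and writes \bar{x} for the mean weight and \sigma for the population standard deviation (notation assumed here for the derivation):

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2$$

$$1 + \frac{\sigma^2}{\bar{x}^2} = \frac{\frac{1}{n}\sum_{i=1}^{n} x_i^2}{\bar{x}^2} = \frac{n \sum_{i=1}^{n} x_i^2}{\left(\sum_{i=1}^{n} x_i\right)^2}$$

The sum-of-squares form needs only one pass over the weights, which is presumably why it is the convenient one to code.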

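The loss in precision is determined by dividing the sample size by the WEFF, which gives the effective sample size after weighting. A small value of the WEFF is therefore desirable; the minimum is 1, which occurs when all the weights are equal.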
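As a small worked example, take four made-up weights (purely illustrative, not from any real survey): 0.5, 0.5, 1.5 and 1.5. Using the form n times the sum of squared weights divided by the square of the sum of weights, and writing n_eff for the sample size divided by the WEFF (a label assumed here):

$$\sum x_i = 4, \qquad \sum x_i^2 = 0.25 + 0.25 + 2.25 + 2.25 = 5, \qquad \mathrm{WEFF} = \frac{4 \times 5}{4^2} = 1.25$$

$$n_{\mathrm{eff}} = \frac{4}{1.25} = 3.2$$

So in this toy case the weighted sample of four carries roughly the precision of an unweighted sample of 3.2.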

Caveat

Bear in mind that having a low WEFF does not mean you have a good, representative sample. It merely means that there is a small loss of precision given the variables that you are weighting to. These variables should be chosen with care.

For example, it's relatively hard to recruit 16 to 17 year olds to a sample. Does this mean that you should adjust the weighting structure so that you weight to 16 to 25 year olds rather than splitting the cells into 16 to 17 and 18 to 25 year olds? Does the lower WEFF that ensues make the estimate any better? Of course not. Certainly in the UK, due to legal restrictions as much as anything else, the behaviour of the 16 to 17 year old demographic will be hugely different to that of the older group. The estimate will not be more accurate.

A VBA function

There follows a function to calculate the WEFF in Excel.

---------------------------------------------------------------------------------

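Public Function weff(wgts As Range) As Double
    ' Returns n * sum(w^2) / (sum(w))^2 over every cell in wgts.
    ' Long rather than Integer so ranges beyond 32,767 rows do not overflow.
    Dim x As Long            ' number of columns in the range
    Dim y As Long            ' number of rows in the range
    Dim i As Long, k As Long ' row and column loop counters
    Dim w As Double          ' current weight
    Dim sumwgt As Double     ' running sum of weights
    Dim sumwgtsq As Double   ' running sum of squared weights
    Dim cntwgt As Long       ' running count of weights

    x = wgts.Columns.Count
    y = wgts.Rows.Count
    cntwgt = 0
    sumwgt = 0
    sumwgtsq = 0

    For i = 1 To y
        For k = 1 To x
            w = wgts.Cells(i, k).Value
            cntwgt = cntwgt + 1
            sumwgt = sumwgt + w
            sumwgtsq = sumwgtsq + w * w
        Next k
    Next i

    weff = cntwgt * sumwgtsq / (sumwgt ^ 2)
End Function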

---------------------------------------------------------------------------------
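Since the quantity of interest is usually the sample size divided by the WEFF, a companion function can be handy. The following is only a sketch: the name effn is a hypothetical helper that does not appear above, and it assumes the weff function is in the same VBA module and that every cell in the range holds a weight.

```vb
Public Function effn(wgts As Range) As Double
    ' Hypothetical helper: sample size divided by the WEFF.
    ' Counts every cell in the range, as weff does.
    effn = wgts.Cells.Count / weff(wgts)
End Function
```

With weights in, say, cells A2:A101 (an assumed layout), entering =weff(A2:A101) and =effn(A2:A101) in worksheet cells would return the WEFF and the corresponding sample size after the loss of precision.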

Conclusion

The WEFF can be a useful indicator of how much precision a sample loses given the variables that you weight to. Care should be taken not to read too much into it, though. It only tells you the loss in precision for the variables that you have included; it does not tell you whether the sample is representative.

Monday, 24 June 2013
