Monday 14 March 2016

Asymmetric filters on time series data in JavaScript


I recently made a change to the ssci.smooth.filter() function in the ssci.js JavaScript library. Here is a lengthier explanation of the change.


One of the changes I’ve made is to add the ability to use asymmetric filters with this function. This is achieved via a setter function that defines the start and end indexes of the points the filter is applied to, relative to the point being adjusted.

To give a concrete example, if we call the point being adjusted ‘n’, and the start is two points before it and the end two points after, then we would set this via:

var example = ssci.smooth.filter()
                         .data(data)
                         .filter([0.2,0.2,0.2,0.2,0.2])
                         .limits([-2,2]);



This is still a symmetric filter and, given the filter used in the example, it is also the default you would get if you didn’t use the limits() setter function.

However, if you have quarterly data and want to take a moving average over the year, you can now do this via the method used above. This time you will need:

var example = ssci.smooth.filter()
                         .data(data)
                         .filter([0.25,0.25,0.25,0.25])
                         .limits([-3,0]);



You can also difference the data using this method:

var example = ssci.smooth.filter()
                         .data(data)
                         .filter([-1,1])
                         .limits([-1,0]);


This just takes the current point, multiplies it by 1 and subtracts the point before from it, as the sketch below shows.
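As an illustration (independent of the library), the arithmetic this filter performs at each point is:

function difference(y) {
    var out = [];
    for (var n = 1; n < y.length; n++) {
        out.push(y[n] - y[n - 1]);   // -1 * previous point + 1 * current point
    }
    return out;
}

difference([3, 5, 4, 8]);   // returns [2, -1, 4]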

An explanation of the function can be found here. The source code is here.

Henderson Filters in JavaScript


I wrote in a recent post that I’d added a function to calculate Henderson filters to the ssci.js library. This post will expand on that (slightly).

To quote the Australian Bureau of Statistics:

Henderson filters were derived by Robert Henderson in 1916 for use in actuarial applications. They are trend filters, commonly used in time series analysis to smooth seasonally adjusted estimates in order to generate a trend estimate. They are used in preference to simpler moving averages because they can reproduce polynomials of up to degree 3, thereby capturing trend turning points.

The filters themselves have an odd number of terms and can be generated for sequences of three or more terms. To use the function contained within the JavaScript library you can just use:

ssci.smooth.henderson(term)

where term is the number of terms you want returned. The weights are returned as an array, so if term was 5 you would get:
[
-0.07342657342657342,
0.2937062937062937,
0.5594405594405595,
0.2937062937062937,
-0.07342657342657342
]
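A quick sanity check: because Henderson filters reproduce polynomials up to degree 3, and therefore constants, the weights should sum to 1:

var weights = ssci.smooth.henderson(5);
var total = weights.reduce(function (sum, w) { return sum + w; }, 0);
// total is 1, up to floating point error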
The equation to generate the filters was taken from here (pdf).

To actually filter a set of data, this would be combined with the ssci.smooth.filter() function, e.g.:

var ft = ssci.smooth.filter()
                .filter(ssci.smooth.henderson(23))
                .data(data);
An example is on my WordPress blog.

Rim Weighting Question


I recently had a question about rim weighting and how to set the values for the maximum iterations and the upper and lower weight caps.

I’ve reproduced my answer, though I’ve adjusted it slightly:

Maximum Iterations

  • The value to set here will depend largely on how many rims you have, how small the cells are and how close the actuals are to the targets. The only way to tell for sure is to see what difference it makes to the weights when you run the program again with one more iteration. If it makes no difference to the weights then you’re ok to leave it as it is. Non-convergence in this case will be down to either the rims having conflicting targets (i.e. one rim causes the weights to go up and another causes them to go down) or the weight cap bringing the weights back down (or up).
  • In terms of an actual value, 25 is generally ok for a small number of rims (of the order of 5-20). However, I’ve seen weighting schemes that required more than 200 iterations to converge. These had hundreds of interlaced rims.
  • Potentially I could add a metric to the program to check for a minimum weight change, so that the program ends if all weights change by less than this figure. It would affect performance though, and is not a trivial change. A sketch of such a check is shown below.
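As a sketch only (this check is hypothetical and not part of the program), it might look like:

// Hypothetical convergence check: stop iterating when no weight has
// moved by more than a given tolerance since the previous iteration.
function hasConverged(oldWeights, newWeights, tolerance) {
    for (var i = 0; i < newWeights.length; i++) {
        if (Math.abs(newWeights[i] - oldWeights[i]) > tolerance) {
            return false;
        }
    }
    return true;
}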

Upper Weight Cap

  • A good starting point for this figure is to divide the targets by the actual proportions (or base sizes) for each cell and look at the largest ratio. So if, for example, you had 20 percent males in the sample but the target was 45 percent, and this was the biggest difference, then the biggest initial weight would be 0.45/0.2 = 2.25. Given the way the algorithm works, it will not stay at that, but it should be of that order. It will depend on the other rims.
  • One consequence of lowering the upper weight cap is that it will reduce the WEFF (the weighting efficiency). A higher WEFF means that you will have lower precision in your estimates, i.e. it increases the standard error. However, lowering the weight cap can also increase the number of iterations and potentially lead to non-convergence. A sketch of one way to compute the WEFF is given after this list.
  • I’d set a value that allows the procedure to converge and gives a reasonable WEFF. Generally a value of 5 or 6 is fine for proportional targets; for base size targets, use 5 or 6 times the total base size divided by the total number of panellists (e.g. if there are 1000 panellists and a total base size of 4500, then set a value of 4.5 * 5 = 22.5).
  • A WEFF above 1.5-1.6 is high and is an indication of poor representation within the panel.
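For reference, here is a minimal sketch of one common way to compute the WEFF, using Kish’s design effect due to weighting; this definition is an assumption on my part, and the function name is just illustrative:

// WEFF = n * sum(w^2) / (sum(w))^2. A value of 1 means the weights
// cost no precision; higher values inflate the standard error.
function weff(weights) {
    var sum = 0, sumSq = 0;
    for (var i = 0; i < weights.length; i++) {
        sum += weights[i];
        sumSq += weights[i] * weights[i];
    }
    return weights.length * sumSq / (sum * sum);
}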

Lower Weight Cap

  • I’d leave this at 0 unless the WEFF needs to be lowered. A good indication of problems with the targets or with the panel is whether all the weights drop to near zero.

So, the basic answer to how to set them is that it depends on any lack of convergence and how high the WEFF goes. The above should give some indication of where to set them though.

Thanks to Bryan for the question.

Friday 26 February 2016

Excel add-ins

I've finally got back to updating these and putting them on the website. I'm currently working on the time series one and hope to have that done soon.

In the meantime, I have updated the page for the text add-in to give a better idea of what is in it.

ssci.js version 1.2.2

I've updated to version 1.2.2.

The changes are:

  • change the ssci.smooth.filter() function to allow asymmetric filters;
  • add the ability to create symmetric Henderson filters.

Wednesday 24 February 2016

ssci.js is on GitHub


I put the ssci.js library on GitHub a while ago but forgot to say anything.

Anyway it’s here should you wish to take a look.

I’ll be using it in more entries (on my WordPress blog) to highlight what it can do.

ssci.js version 1.2.1

I’ve updated the JavaScript library to version 1.2.1. The only changes from v1.2.0 are the addition of a gain function and a phase shift function for the filter smoothing function.
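For anyone curious, the gain of a linear filter is the factor by which it scales a sine wave of a given frequency. A minimal sketch using the standard definition (the modulus of the sum of the coefficients times complex exponentials at their lags), which may differ from the library’s actual implementation:

// Gain of a filter with the given coefficients applied at the given
// lags, evaluated at angular frequency omega.
function filterGain(coeffs, lags, omega) {
    var re = 0, im = 0;
    for (var i = 0; i < coeffs.length; i++) {
        re += coeffs[i] * Math.cos(omega * lags[i]);
        im -= coeffs[i] * Math.sin(omega * lags[i]);
    }
    return Math.sqrt(re * re + im * im);
}

// e.g. the five-term average from the asymmetric filters post:
filterGain([0.2, 0.2, 0.2, 0.2, 0.2], [-2, -1, 0, 1, 2], Math.PI / 2);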

Change to Kernel Smoother

Introduction

I recently changed a whole load of the functions in the ssci JavaScript library. One of these changes was the way that the kernel smoothing function worked.

The previous function was fairly slow and didn’t scale particularly well. In fact, the main loop of the function suggests that it scales as O(n^2).

I therefore decided to make a change to the function. Instead of looping over every point and then calculating the weight at every other point, I’ve changed it so that:
  • It loops through every point similarly to the previous function
  • Then for point n it calculates the weight at the central point (i.e. point n)
  • It then loops through every point lower (in the array) than n and calculates the weight. If the weight at that point, relative to the weight at the central point, falls below a certain threshold, the loop ends.
  • It then loops through every point higher (in the array) than n and calculates the weight. Again, if the relative weight falls below the threshold, the loop ends.

The default setting of the threshold is 0.001 (i.e. 0.1%). The way the function operates does, however, mean that two assumptions have been made:
  • The data has already been ordered by the x-coordinate within the array.
  • The kernel function must decrease monotonically as you move away from the central point, so that once a weight falls below the threshold, every weight beyond it is below the threshold too.
A sketch of the scheme is given below.
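Here is a minimal sketch of the scheme, assuming a Gaussian kernel and Nadaraya-Watson weighting; the names are illustrative and not the library’s actual internals:

function gaussian(u) {
    return Math.exp(-0.5 * u * u);
}

function kernelSmooth(x, y, bandwidth, threshold) {
    var out = [];
    for (var n = 0; n < x.length; n++) {
        var w0 = gaussian(0);                  // weight at the central point
        var sumW = w0, sumWY = w0 * y[n];
        // walk down the array until the relative weight drops below the threshold
        for (var j = n - 1; j >= 0; j--) {
            var w = gaussian((x[j] - x[n]) / bandwidth);
            if (w / w0 < threshold) break;
            sumW += w;
            sumWY += w * y[j];
        }
        // walk up the array until the relative weight drops below the threshold
        for (var k = n + 1; k < x.length; k++) {
            var w = gaussian((x[k] - x[n]) / bandwidth);
            if (w / w0 < threshold) break;
            sumW += w;
            sumWY += w * y[k];
        }
        out.push(sumWY / sumW);
    }
    return out;
}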
The rest of this entry can be read on my new blog...

Tuesday 23 February 2016

SurveyScience JavaScript Library


I’ve been converting some of the functions from the Excel add-in into JavaScript functions within a library. It’s finally ready for release and can be found here.

It contains functions to:
  • Smooth data
  • Deseasonalise data
  • Perform least squares regression
  • Perform exponential smoothing
  • Create auto-correlation and partial auto-correlation plots
It also contains market research functions and utility functions to modify arrays.

There’s still some work to do on it but it can be used as is – go to the above link for more details.

I originally started it to create some smoothed lines for some D3 charts.

Hopefully it will prove useful.

Website refreshed

In the process of reprogramming the Excel add-in, I’ve also refreshed my website.
Everything bar the articles has been changed, I think.
I’ve also released a JavaScript version of some of the functions in the add-in. I’ll post about it fully later on but if you want to take a look go here.