Monday, 3 August 2015

London Land Use 2005

I've put another article on my website about London land use statistics.

I've used R, R Markdown, ggplot2, d3heatmap and maptools to plot the data. Unfortunately, this means it can't easily be posted here and that it's a fairly sizeable page (about 1.4 MB when created in RStudio).

Here are some pictures from the page:



Anyway, see what you think.

Wednesday, 3 June 2015

Rim weighting web tool

Another rim weighting tool. This time I've stuck it on my site and programmed it in JavaScript.

It uses the same conventions as the Excel tool and you will need:

  1. A tab delimited file holding the demographics. It will need to have a header labelling each of the columns of data.
  2. A tab delimited file holding the targets. This will need a header and three columns: the rim name, the cell name and the target. The rim name must correspond to one of the headers in the demographics file.

Drag these files to the relevant boxes on the page, set the parameters and then click on the rim weight button.

The weights will then be placed in the weights tab below the button.

The tool is still in beta, so improvements will follow, but it does work. I still need to test its speed, though.

As it's JavaScript, it all happens in the browser. This means that nothing has to be uploaded to a server, but it also means that nothing is saved.
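Rim weighting is iterative proportional fitting: the weights are repeatedly rescaled so that each rim's weighted cell totals match its targets, until nothing changes any more. A minimal sketch of that loop (the function and variable names are mine, not the tool's):

```javascript
// Rim weighting (iterative proportional fitting).
// rows:    array of respondents, e.g. { sex: "M", age: "18-34" }
// targets: { rimName: { cellName: target, ... }, ... }
function rimWeight(rows, targets, maxIter, tol) {
    var weights = rows.map(function () { return 1; });
    for (var iter = 0; iter < maxIter; iter++) {
        var maxChange = 0;
        Object.keys(targets).forEach(function (rim) {
            // Current weighted total for each cell of this rim
            var totals = {};
            rows.forEach(function (row, i) {
                var cell = row[rim];
                totals[cell] = (totals[cell] || 0) + weights[i];
            });
            // Scale every weight so the cell totals hit the targets
            rows.forEach(function (row, i) {
                var factor = targets[rim][row[rim]] / totals[row[rim]];
                weights[i] *= factor;
                maxChange = Math.max(maxChange, Math.abs(factor - 1));
            });
        });
        if (maxChange < tol) break; // converged
    }
    return weights;
}
```

With a single rim this converges in one pass; with several rims each pass disturbs the previous rim slightly, which is why the loop and tolerance are needed.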



Comments always welcome.

Tuesday, 5 May 2015

Sample size estimator

I've decided to put some of the tools from my Excel add-in on my website. The first one I've done is the sample size estimator for surveys found here.
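For reference, the standard calculation behind this kind of tool is the sample size for estimating a proportion, with an optional finite population correction. I'm not claiming this is exactly what the add-in does; it's the textbook version as a sketch:

```javascript
// Sample size needed to estimate a proportion.
// z: z-score for the confidence level (1.96 for 95%)
// p: expected proportion (0.5 is the conservative choice)
// e: margin of error (e.g. 0.05 for +/-5 points)
// N: population size (optional finite population correction)
function sampleSize(z, p, e, N) {
    var n = (z * z * p * (1 - p)) / (e * e);
    if (N) {
        n = n / (1 + (n - 1) / N); // finite population correction
    }
    return Math.ceil(n);
}
```

For example, sampleSize(1.96, 0.5, 0.05) gives 385, the familiar figure for a 95% confidence level and a 5% margin of error.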

Give it a go and see what you think. As ever, comments are welcome.

Friday, 17 April 2015

D3 Tooltips for a line chart

I wrote a blog entry on my attempts to write a line chart in D3 here. One thing it didn't have was tooltips to tell you what the data was at the point you hovered over.

I've added these for an article I recently wrote for my website. Their behaviour is limited but it's a start.

Composition

The tips themselves are constructed of one SVG 'rect' element and two SVG 'text' elements. They are positioned via the mouseover event.

Code

The first thing to do is to create the SVG element to hang all the child elements off:
    var chk = d3.select("#cht1")
        .append("svg")
        .attr("class", "chk")
        .attr("width", 960)
        .attr("height", 600);

Then we need to create the tooltip elements:
    chk.append("rect")
        .attr("width", 70)
        .attr("height", 50)
        .attr("x", "-2000")
        .attr("y", "-2000")
        .attr("rx", "2")
        .attr("ry", "2")
        .attr("class", "tooltip_box")
        .attr("id", "tooltip1")
        .attr("opacity", "0.0");
    chk.append("text")
        .attr("class", "bbd_tooltip_text")
        .attr("id", "bbd_tt_txt1")
        .attr("x", "-2000")
        .attr("y", "-2000")
        .attr("dy", ".35em")
        .attr("dx", ".35em")
        .text(" ");
    chk.append("text")
        .attr("class", "bbd_tooltip_text")
        .attr("id", "bbd_tt_txt2")
        .attr("x", "-2000")
        .attr("y", "-2000")
        .attr("dy", ".35em")
        .attr("dx", ".35em")
        .text(" ");

I've given the elements a starting position of (-2000, -2000). That's not strictly necessary, as setting their opacity to zero would have been enough on its own. I've also given the elements ids and classes.

Now we need to make them move with the mouse. I've added <rect> elements over the data points and it is to these that we add the event function:
    .on("mouseover", function(d, i) {
        // Current mouse y position within the chart
        var ym = d3.mouse(this)[1];
        d3.select("#tooltip1")
            .attr("x", x_scale(resp_data[i].x) + 10)
            .attr("y", ym)
            .attr("opacity", "0.5");
        d3.select("#bbd_tt_txt1")
            .attr("x", x_scale(resp_data[i].x) + 10)
            .attr("y", ym + 12)
            .text("x=" + resp_data[i].x);
        d3.select("#bbd_tt_txt2")
            .attr("x", x_scale(resp_data[i].x) + 10)
            .attr("y", ym + 32)
            .text("y=" + resp_data[i].y);
    })

In the code above resp_data is an array holding all the data, x_scale is a D3 scale object and ym holds the y position of the mouse.

For the three elements of the tooltip I've changed the opacity, the position and the text.

We also need to clear up the tooltip when we exit the <rect> element:
    .on("mouseout", function() {
        d3.select("#tooltip1")
            .attr("x", "-2000")
            .attr("y", "-2000")
            .attr("opacity", "0.0");
        d3.select("#bbd_tt_txt1")
            .attr("x", "-2000")
            .attr("y", "-2000")
            .text(" ");
        d3.select("#bbd_tt_txt2")
            .attr("x", "-2000")
            .attr("y", "-2000")
            .text(" ");
    })

and finally we need to change the tooltip when the mouse moves:
    .on("mousemove", function(d, i) {
        var ym = d3.mouse(this)[1];
        d3.select("#tooltip1")
            .attr("x", x_scale(resp_data[i].x) + 10)
            .attr("y", ym)
            .attr("opacity", "0.5");
        d3.select("#bbd_tt_txt1")
            .attr("x", x_scale(resp_data[i].x) + 10)
            .attr("y", ym + 12)
            .text("x=" + resp_data[i].x);
        d3.select("#bbd_tt_txt2")
            .attr("x", x_scale(resp_data[i].x) + 10)
            .attr("y", ym + 32)
            .text("y=" + resp_data[i].y);
    })

And that's it, apart from CSS to style the elements. I'll leave that up to you though.

Thursday, 16 April 2015

Ofcom Broadband Report 2013 Analysis

I still can't get D3 to work with blogger so I've written this as an article on my website.

It basically just looks at the data contained within the Ofcom 2013 broadband report with charts and maps created using D3.

I'll create some posts to explain how I did it later. Hope you enjoy!

Monday, 9 March 2015

Petrol Price Exploration

Introduction

With the petrol price coming down so rapidly recently I decided to have a look and see what correlation there is between the crude oil price and the price of a litre of petrol sold at a garage.

I'm sure this sort of analysis has come up quite a lot recently. It's certainly nothing new but I thought it would be interesting nonetheless.

The first thing to do is to get hold of the necessary data:
  1. Petrol prices - ONS
  2. Brent crude oil prices - US Energy Information Administration
  3. Dollar to Pound conversion rates - Bank of England
The petrol prices are an average of pump prices collected from four oil companies and two supermarkets on the Monday of each week. The data goes back to 2003.

The Brent crude spot prices are a daily series although there are gaps in the series for holidays etc. The data goes back to 1987.

The dollar to pound conversion rate is again a daily series and again there are gaps on the holidays. The gaps are similar to the gaps found in the Brent crude prices but are not always the same - different country, different holidays.



It's interesting to note that the drop in price in 2008 was much larger than the recent drop. The two series show similar trends but they're not yet directly comparable. Let's see if we can improve on that.


Manipulation

First we need to convert the Brent crude prices to pounds per barrel. For this we use the Bank of England conversion rate data.

Next we take only the Brent crude prices from the Monday of each week so that we are comparing prices on the same day of each week.

For both of these data series I had to interpolate some data points for holes in the series. This was done using the SplineInterpolate macro from the SurveyScience Excel add-in. There's no particular reason to use this over a straight-line interpolation; I just thought I would.
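A straight-line version of that gap filling is simple enough to sketch (this is my own illustration, not the SplineInterpolate macro; it assumes the first and last values of the series are known):

```javascript
// Fill null gaps in a series by straight-line interpolation
// between the nearest known neighbours on either side.
function fillGaps(series) {
    var out = series.slice();
    for (var i = 0; i < out.length; i++) {
        if (out[i] !== null) continue;
        var lo = i - 1; // last known point before the gap
        var hi = i;
        while (hi < out.length && out[hi] === null) hi++; // next known point
        for (var j = lo + 1; j < hi; j++) {
            var t = (j - lo) / (hi - lo);
            out[j] = out[lo] + t * (out[hi] - out[lo]);
        }
        i = hi;
    }
    return out;
}
```

So fillGaps([1, null, null, 4]) fills the two holes with 2 and 3.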

The fourth adjustment to the data was to the pump prices of petrol: I removed the taxes. Two taxes are added, fuel duty and VAT, and I've assumed that duty is added first, with VAT then charged on top.
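Under that assumption the pump price is (ex-tax price + duty) × (1 + VAT rate), so stripping the taxes back out is just:

```javascript
// Strip taxes from a pump price (pence per litre), assuming duty
// is added first and VAT is then charged on (cost + duty):
//   pump = (exTax + duty) * (1 + vat)
function exTaxPrice(pump, duty, vat) {
    return pump / (1 + vat) - duty;
}
```

For example, with duty at 57.95p per litre and VAT at 20% (the rates at the time, as I understand them), a 110p pump price leaves about 33.7p before tax.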

The result is below:


As you can see, the correlation is remarkably good. Some of this apparent closeness is down to the scales Excel has chosen, but it still looks good.

I did also try a smoothing algorithm on the data but I think it hides too much of the detail for the analysis.


Correlation

So, taking the data from the beginning of 2011, there appears to be a fairly stable set of prices. Let's see how much time lag there is between crude oil prices and pump prices.

To do this, I've taken the prices and calculated the correlation for the two original series and then calculated the correlation for the two series when the pump prices are shifted back by a number of weeks.

We get the following two charts:


The first chart shows the correlation for shifts of -14 to +14 weeks. The second chart shows the peak in more detail with interpolated (spline) points.

This shows that pump prices follow crude oil prices most closely 2 weeks and 5 days later, i.e. there is a lag of 2 weeks and 5 days.
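The whole-week points on those charts are just Pearson correlations of shifted series; something like this (function names are mine, and only whole-week shifts are computed, with the day-level figure coming from interpolating between them):

```javascript
// Pearson correlation between two equal-length series.
function correlation(a, b) {
    var n = a.length;
    var meanA = a.reduce(function (s, v) { return s + v; }, 0) / n;
    var meanB = b.reduce(function (s, v) { return s + v; }, 0) / n;
    var num = 0, varA = 0, varB = 0;
    for (var i = 0; i < n; i++) {
        num += (a[i] - meanA) * (b[i] - meanB);
        varA += (a[i] - meanA) * (a[i] - meanA);
        varB += (b[i] - meanB) * (b[i] - meanB);
    }
    return num / Math.sqrt(varA * varB);
}

// Correlation when the pump series is shifted back by `lag` weeks,
// i.e. pump[t + lag] is paired with crude[t].
function laggedCorrelation(crude, pump, lag) {
    return correlation(crude.slice(0, crude.length - lag), pump.slice(lag));
}
```

Sweeping lag over a range and taking the maximum gives the peak shown in the charts.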

Accounting for this lag, let's plot the two series against each other:

The plot looks indicative of a strong relationship between the two prices. However note that there are very few points towards the bottom left. This is the recent drop in prices. Taking these out we get:


Still a strong relationship, but the R squared value has dropped from 0.9 to 0.64. So you can predict the pump price from the crude oil price, but you will often be off by quite a bit. Much of this is due to the volatility of crude oil prices compared to pump prices, as the plot below of the two series, accounting for the lag, shows:


Further Analysis

One of the questions often raised is whether the price at the pump rises quickly on a rise in crude oil but drops slowly when crude oil prices decline.

There are two related ways to look at this:
  • Are the peaks in the two series closer together than the troughs?
  • Are the positive gradients of the pump prices steeper than the negative gradients?
Unfortunately I'm going to have to leave that for another time. Until then.

Wednesday, 4 February 2015

Extracting data from text files using Java

I use Java a fair amount to look at data contained within text files. It can be data from surveys, logs of various processes and sometimes extracts from databases.

This post links to four articles on how to read data line by line from a simple text file. The four articles are all very similar but detail different methods of accessing the data. They all consist of five steps:
  • Defining the file
  • Opening the file
  • Reading in the data
  • Doing something with the data
  • Outputting the result
The articles are:
  1. Extract data from a simple text file
  2. Extract data from gzipped text files - useful if the individual files are large, as the biggest bottleneck is disc access time, especially on HDDs.
  3. Extract data from zipped text files (zip archives)
  4. Extract data from a directory of text files - it's often the case that you want to analyse data from a whole raft of files.
Hopefully they will be of use.