Monday 3 June 2013

Using an online panel to show Windows OS share

Deciphering http user agent data from an online panel

The http user agent string is the data that is sent by the browser or software to identify that software to the website.

Looking at the http user agent data generated by your panel members can be very instructive. You can find out which operating system they are using, which browser and which browser version. Some user agents will also tell you what device the software is being run on. The downside is that user agent data can be spoofed. See here for a fuller explanation of spoofing.

One benefit online panels have over the usual statistics from providers such as StatCounter is that you know exactly who has logged on, so you will not over- or under-count them in the statistics. As ever, Wikipedia has a list of reasons for over- and under-counting. There's no point repeating them here.

So, how have I decoded the strings? Badly, it turns out. There are better ways to do what I've done and there are websites around that have decoders available. See WURFL and MobileESP for examples.

The user agent is made up of around 5 parts. Websites such as www.useragentstring.com show exactly how these are made up.
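To make the structure concrete, here is a minimal sketch in Python that splits a user agent into its parts. The sample string is a typical Chrome-on-Windows-7 user agent chosen for illustration, and the function name `split_user_agent` is made up for this example. It treats each bracketed comment as one part and every other space-separated token as one part, which is an assumption that holds for common desktop strings but not for every user agent.

```python
import re

# Illustrative sample only: a typical Chrome on Windows 7 user agent.
SAMPLE_UA = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36")


def split_user_agent(ua):
    """Split a user agent into bracketed comments and product tokens."""
    # A part is either a "(...)" comment or a run of non-space characters.
    return re.findall(r"\([^)]*\)|\S+", ua)


for part in split_user_agent(SAMPLE_UA):
    print(part)
```

The second part, the bracketed comment, is where the operating system information usually lives.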

I was initially only interested in 4 items from the string and these are relatively easy to extract. Therefore I did it myself. Some of the aforementioned websites are subject to restrictive licences for commercial use and so I couldn't use them for work. The 4 items are:

  1. Operating system
  2. OS version
  3. Browser
  4. Browser version

Part 1 is a simple matter of looking for the relevant string, although you do need to be careful with the order in which you search as Android phones will have Linux in the string as well (assuming that you're interested in splitting out Android) and iPads etc. have OS X as part of their string.

So I searched for:
a) Windows
b) Android
c) Intel Mac OS X
d) Linux
e) Mac OS X - iPad and iPhones
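The ordered search can be sketched in Python as below. This is an illustration rather than the code used for the panel: the function name `detect_os` and the labels returned ("iOS", "Other") are assumptions. The search strings and their order are the ones listed above.

```python
# (search string, label) pairs, checked in this order. The order matters:
# Android strings also contain "Linux", and iPad/iPhone strings contain
# "Mac OS X" without the "Intel" prefix.
OS_SEARCH_ORDER = [
    ("Windows", "Windows"),
    ("Android", "Android"),
    ("Intel Mac OS X", "Mac OS X"),
    ("Linux", "Linux"),
    ("Mac OS X", "iOS"),  # label is an assumption for iPad and iPhone
]


def detect_os(ua):
    """Return the label of the first search string found in the user agent."""
    for needle, label in OS_SEARCH_ORDER:
        if needle in ua:
            return label
    return "Other"
```

If "Linux" were checked before "Android", every Android phone would be counted as Linux.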

This won't find BlackBerry or Symbian systems, although it's trivial to add these.

Part 2 is more complicated, but the OS version generally sits next to the OS string. Therefore it's just a case of creating a regular expression to extract the relevant information. For all systems we can use the pattern:

 ((\d+)(\.\d+)*)

The pattern matches one or more digits, followed by zero or more repetitions of a dot and one or more digits. So it will match 6, 6.1 or 10.8.3.

You will need to replace underscores with dots in the user agent string and Windows strings pre-NT need to be catered for differently (Win98 is one example). The first group from the match is captured.
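Here is a hedged sketch of that step in Python. The helper name `extract_os_version` and the choice to search only the text after the OS string are assumptions made for this example. It returns nothing useful for pre-NT Windows when asked for "Windows NT", and on iPads and iPhones the version sits before "Mac OS X", so both cases would need their own handling.

```python
import re

# The pattern from the text: digits, then zero or more ".digits" groups.
VERSION_PATTERN = r"((\d+)(\.\d+)*)"


def extract_os_version(ua, os_string):
    """Return the first version number found after os_string, or None."""
    # Mac user agents write versions as 10_8_3, so normalise to dots first.
    ua = ua.replace("_", ".")
    start = ua.find(os_string)
    if start == -1:
        return None
    # Assumption: the version follows the OS string rather than preceding it.
    match = re.search(VERSION_PATTERN, ua[start + len(os_string):])
    # Group 1 is the whole version; the inner groups are only pieces of it.
    return match.group(1) if match else None
```

For example, calling it with a Windows 7 user agent and the OS string "Windows NT" gives "6.1".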

Part 3 is once again simple. We look for the strings:
a) MSIE
b) Firefox
c) Chrome
d) Safari
e) Opera

Chrome strings always include Safari, so Chrome needs to be searched for first. Again, BlackBerry and feature phones won't be found if only searching with these strings.
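As a sketch, the browser search looks much like the OS search. The function name `detect_browser` and the "Other" fallback are assumptions for this example. The search strings and their order come from the list above.

```python
# Checked in this order. Chrome comes before Safari because Chrome
# user agents also contain the string "Safari".
BROWSER_SEARCH_ORDER = ["MSIE", "Firefox", "Chrome", "Safari", "Opera"]


def detect_browser(ua):
    """Return the first browser name found in the user agent."""
    for name in BROWSER_SEARCH_ORDER:
        if name in ua:
            return name
    return "Other"
```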

Part 4 is not so simple and requires a bit of faff to find the version. The above pattern can be reused with the addition of \/ at the beginning to extract the version number. The version number usually sits right after the browser string and so should be relatively easy to find, but inconsistencies within the user agents often cause problems.
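A minimal sketch of the version extraction, assuming Python and a made-up helper name `extract_browser_version`. One deviation from the description above is labelled in the code: the separator is allowed to be a space as well as a slash, because Internet Explorer writes "MSIE 9.0" rather than "MSIE/9.0".

```python
import re

# Same pattern as for the OS version.
VERSION_PATTERN = r"((\d+)(\.\d+)*)"


def extract_browser_version(ua, browser):
    """Return the version that directly follows the browser name, or None."""
    # Assumption: allow "/" or " " as the separator, since MSIE uses a space.
    pattern = re.escape(browser) + r"[\/ ]" + VERSION_PATTERN
    match = re.search(pattern, ua)
    return match.group(1) if match else None
```

Safari is an example of the inconsistencies: the number after "Safari/" is a build number, and the version most people would recognise is in a separate "Version/" token, so a real implementation would need a special case there.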


Results

Here's an example of the data that can be extracted:



This chart shows the smoothed, raw (so not weighted to be representative) percentage share of the various Windows operating systems in Great Britain from early 2007 to early 2013. I've also put in the release dates of the operating systems (according to Wikipedia).
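To get from extracted version numbers to a chart like this, the Windows NT version has to be mapped to a product name and turned into a percentage. The sketch below shows one way to do that in Python. The function name `windows_share` and the input list are made up for illustration, and it does not attempt the smoothing, since the method used for the chart isn't described here.

```python
from collections import Counter

# Windows NT version numbers and the product names they correspond to.
WINDOWS_NAMES = {
    "5.0": "Windows 2000",
    "5.1": "Windows XP",
    "5.2": "Windows XP x64 / Server 2003",
    "6.0": "Windows Vista",
    "6.1": "Windows 7",
    "6.2": "Windows 8",
}


def windows_share(nt_versions):
    """Return the unweighted percentage share of each Windows product.

    nt_versions holds one NT version string per panel member for the period.
    """
    names = [WINDOWS_NAMES.get(v, "Other Windows") for v in nt_versions]
    counts = Counter(names)
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


# Made-up input, purely to show the shape of the result.
print(windows_share(["6.1", "6.1", "5.1", "6.2"]))
```

Running this once per month of panel data would give the raw series that is then smoothed and plotted.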

More work would need to be done on this (weighting, testing the smoothing) but it does show fairly well the tail-off in the rise of Windows 7 since Windows 8 was released. Also, although it's early days, the take-up of Windows 8 is only just behind that of the previous 2 systems.

The gradient of the Windows 7 share line is clearly steeper than that of either Vista or Windows 8, but not by much. The next few months will be interesting.

I also find it interesting that Vista never had more share than XP.

Because the data is from a panel we could look at switching and demographics. Another time maybe.
