Stats, MR and Data: October 2015

Wednesday, 21 October 2015

American Football Statistics part 2

In the last post, I went through an interesting (for me anyway) observation from a book on the proportion of wins for American football teams. I only gave a chart for the example of the Buffalo Bills team.

Well, I've now put the data together for the rest of the teams and placed it on my site. You can now look at the stats for the rest of the teams:

Or at least those teams that have data from 1960 to 2014. I still find it hard to find much periodicity in quite a few of the teams. And some teams have some quite lengthy periods of high win percentages (San Francisco 49ers):

It's only to be expected that there would be some exceptions. What I should do if I wanted to 'prove' it one way or the other is to calculate the periodicity for each team and then see what correlation we get. Next time maybe.
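One way the periodicity for a single team could be estimated is with a simple periodogram over its season-by-season win proportions. This is only a minimal sketch: the function name `dominant_period` and the synthetic series are invented for illustration, and nothing here uses the real team data.

```python
import math

def dominant_period(values):
    """Return the period (in seasons) with the most power in a plain DFT periodogram."""
    n = len(values)
    mean = sum(values) / n
    centred = [v - mean for v in values]  # remove the constant (zero-frequency) part
    best_k, best_power = None, -1.0
    for k in range(1, n // 2 + 1):  # k = number of full cycles across the whole series
        re = sum(c * math.cos(2 * math.pi * k * t / n) for t, c in enumerate(centred))
        im = sum(c * math.sin(2 * math.pi * k * t / n) for t, c in enumerate(centred))
        power = re * re + im * im
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k

# Made-up series: 56 seasons oscillating around 0.5 with an 8-season cycle.
synthetic = [0.5 + 0.2 * math.sin(2 * math.pi * t / 8) for t in range(56)]
print(dominant_period(synthetic))
```

With only 55 or so noisy seasons per team the strongest peak may not mean much, so the estimated periods would need to be treated with caution before comparing them across teams.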

Data courtesy of www.pro-football-reference.com
Charts courtesy of D3 (www.d3js.org)

Monday, 19 October 2015

American Football Statistics

Recently I've been reading a book about how mathematical our everyday lives are (Towing Icebergs, Falling Dominoes, and Other Adventures in Applied Mathematics by Robert B. Banks). In this book the author has an interesting chapter dedicated to the statistics of American football.

In this chapter he uses the performance data of the NFL teams from 1960 to 1992 to suggest that a first-order linear discrete delay differential equation can be used to model a team's winning record for each season, and that the performance is periodic, with a specific time between successive peaks in the percentage of wins.

The rationale is that (basically) the performance of the team from the previous season dictates the order (in reverse) of selection of new talent for the upcoming season.

The equation he derives is:

dU(t)/dt = a[U_m - U(t - τ)]

where U(t) is the proportion of wins in each season, U_m is the league-wide average value of U, a is the growth coefficient and τ is the time delay.

Irritatingly he says that there are many ways to solve the equation but then uses an approximate, 'risky' method to produce a solution (Taylor series expansion in case you're wondering). He does however provide references so that you can follow up on the details.
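For anyone who would rather see the behaviour than solve the equation, it can be stepped forward numerically. The sketch below is not the method used in the book; it is a plain Euler integration that takes τ to be the delay, and the function name, the constant starting history and all the parameter values are assumptions made purely for illustration.

```python
def simulate_win_fraction(a, tau, u_m, u_start, t_end, dt=0.01):
    """Euler-integrate dU/dt = a * (u_m - U(t - tau)), holding U at u_start for t <= 0."""
    delay_steps = round(tau / dt)  # how many time steps make up the delay
    n_steps = round(t_end / dt)
    u = [u_start]
    for n in range(n_steps):
        lagged_index = n - delay_steps
        # Before enough history exists, fall back on the assumed constant start value.
        lagged = u[lagged_index] if lagged_index >= 0 else u_start
        u.append(u[n] + dt * a * (u_m - lagged))
    return u

# Illustrative values only: a = 0.6 per season, a delay of 2 seasons,
# a league average of 0.5 and a team that starts at 0.8.
series = simulate_win_fraction(a=0.6, tau=2.0, u_m=0.5, u_start=0.8, t_end=40.0)
print(min(series), max(series), series[-1])
```

For this linear form the deviation from U_m oscillates once a·τ exceeds 1/e, and a·τ = π/2 gives a steady oscillation with a period of 4τ. The values above sit between the two, so the simulated team overshoots the league average and then settles back towards it.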

The example he goes through in most detail is that of the Buffalo Bills. The graph of their performance for the years in question is below (thanks to this site for the data). It does indeed seem to be periodic between the years he mentioned.

This seemed odd to me. Obviously you wouldn't be able to determine the exact rank of each team in each year but you would know which teams were likely to be going down or up the rankings. You'd be able to tell, for example, that if your team did well one season, then they were likely to do less well the next.

However, when you look at the data for subsequent years the pattern is hard to determine. It seems that the Bills were on a consistent downward trend after this, although given the volatility of the data it's possible that a lot of frequencies would fit this chart.

Has something changed in the way selection is now carried out? I don't know. Is the pattern that he spotted (or at least went through) the same for other teams throughout this time period? Does it also change after 1992?

I'll follow it up by looking at all the teams that were around between 1960 and 1992 next time.

Monday, 12 October 2015

Analysis of the numbers of views of my blog part 2

Last time I looked at the number of views per month I was getting on my blog. This time I will go into more detail on the numbers behind this.

First up is the number of views per day for each entry by time. This should give me an indication of which entries are more popular. I've divided the number of views by the number of days since the entry was written so that individual entries can be compared on an equal basis.

The picture is possibly a little hard to see when plotted like this but it does show that there are some entries that have a much higher 'popularity rating' than others. To put this in a better perspective I've plotted a frequency distribution of them:

There are roughly four entries/posts that have more than 1 view per day. These are my 'popular' entries (and yes, this is popular in a very relative sense).
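The views-per-day figure and the frequency distribution described above can be sketched as follows. The function names, the bin width and the sample numbers are all made up for illustration; they are not the blog's actual figures.

```python
import math
from collections import Counter
from datetime import date

def views_per_day(total_views, published, as_of):
    """Total views divided by the number of days the post has been live."""
    days_live = (as_of - published).days
    if days_live <= 0:
        raise ValueError("post must be at least one day old")
    return total_views / days_live

def frequency_distribution(rates, bin_width=0.5):
    """Count how many rates fall in each bin, keyed by the bin's lower edge."""
    return Counter(math.floor(rate / bin_width) * bin_width for rate in rates)

# Invented example: a post with 300 views, published on 1 January 2015.
print(views_per_day(300, date(2015, 1, 1), date(2015, 10, 12)))

# Invented rates for four posts.
print(frequency_distribution([0.1, 0.3, 0.6, 1.2]))
```

Dividing by the age of the post assumes that views arrive at a steady rate, which will flatter recent posts if most views come in the first few days.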

Let's look at the titles for the top 20 posts. Is there something in common? If I want to make this blog more popular is there something I should focus on? Something other than statistics probably. Anyway here is the chart:

Well, 2 of the popular posts are Java related, 2 are Excel related. I only have 4 Java related posts so this is fairly good for Java. However I have over 20 Excel related posts - not so good for Excel. Especially as one of those is to do with rim weighting.

My next chart shows the number of times that a post label appears. For those that don't know, the labels are just descriptive words or phrases that describe the post.

You can see that I've been concentrating on Excel quite a bit over the years.

The last chart shows the average views per day for these labels ranked by the average.

This just reiterates the points above - Java is popular. Having said that, Java also has a very high standard deviation. Only one Java post has ever had a relatively high number of views per day; it was one of the first that I did and was about JavaFX rather than plain old Java.
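The per-label averages and standard deviations can be worked out with a few lines like these. This is a sketch with invented numbers; `label_stats` is a hypothetical helper and the population standard deviation is an arbitrary choice, not necessarily the one behind the chart.

```python
from collections import defaultdict
from statistics import mean, pstdev

def label_stats(posts):
    """posts: iterable of (labels, views_per_day). Returns {label: (count, mean, std dev)}."""
    by_label = defaultdict(list)
    for labels, rate in posts:
        for label in labels:  # a post counts once towards each of its labels
            by_label[label].append(rate)
    return {label: (len(r), mean(r), pstdev(r)) for label, r in by_label.items()}

# Invented figures: one high-scoring Java post drags the Java average up.
posts = [
    (["Java"], 3.0),
    (["Java"], 0.2),
    (["Excel"], 0.4),
    (["Excel", "Statistics"], 0.6),
]
stats = label_stats(posts)
ranked = sorted(stats.items(), key=lambda item: item[1][1], reverse=True)
for label, (count, average, spread) in ranked:
    print(label, count, round(average, 2), round(spread, 2))
```

A label with a high average and a spread of similar size, like the made-up Java figures here, is being carried by one or two posts rather than being consistently popular.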

As for whether I can make the blog more popular by picking what to write about based on these figures, I'm not sure. I think that there is far too much noise in the data. It's hard to pick a common theme. I'm sure I could do more analysis on this data, but that is one for the future I think.

Wednesday, 7 October 2015

Analysis of the numbers from my blog

I've decided to write shorter entries for my blog. The entries end up being very long and cover a lot of aspects so I'm splitting them up to cover only one to a few aspects at a time.

So for my first shorty, I'm going to look at the numbers of views to this blog. Initially it will just be a look at the number of views per month. I'll then go into the views per post, what looks most interesting for people etc.

So, without further ado, here are the numbers of views per month plotted with the number of posts per month:

The left hand y-axis shows the number of views per month. The right hand y-axis shows the number of blog posts per month.

So, I'm getting on average about 800 views per month since the beginning of 2014. I've no idea if this is good or bad. Probably really bad as I don't connect with other people or mention the blog to anyone.

You can also see that there has been a major slowdown in blog entries recently. This corresponds to a drop in the number of views.

The next chart shows the difference in views from one month to the next plotted against the number of blog entries per month.

By simply calculating the correlation between the two series in the chart above we get a value of 0.2. A correlation that weak (it corresponds to only about 4% of the variance) suggests that there is little or no relationship between the number of blog posts and the monthly change in the number of views. It's all very random.
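For reference, the calculation is just a Pearson correlation between the month-on-month change in views and the number of posts in the same month. A minimal sketch, with invented monthly figures and hypothetical function names:

```python
import math

def month_on_month_change(series):
    """Difference between each month's value and the previous month's."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Invented data: monthly views and posts per month.
views = [700, 820, 760, 900, 880]
posts = [3, 1, 4, 0, 2]

changes = month_on_month_change(views)
# The first month has no previous month, so drop it from the posts series too.
print(pearson(changes, posts[1:]))
```

Squaring the coefficient gives the share of variance that a straight-line relationship would account for, which is a quick way to judge how weak a given value really is.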

I will leave you with one final chart:

This shows the cumulative number of views and the cumulative number of posts. As you can see the number of views rises linearly with time. So much so, in fact, that you can fit a linear trendline through the data from January 2014 onwards to get the gradient of ~869 views per month.

From this it would seem that nothing I have, or have not, done has affected the upward trend.
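The gradient of a linear trendline like the one mentioned above is an ordinary least-squares slope. The sketch below uses a made-up cumulative series that was built with a slope of 869 so the result is easy to check; it is not the blog's real data, and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up cumulative views: month index 0 stands for January 2014.
months = list(range(12))
cumulative_views = [1000 + 869 * m for m in months]

slope, intercept = fit_line(months, cumulative_views)
print(slope, intercept)
```

The slope of the cumulative series is the same thing as the average number of views per month over the fitted period, which is why it lands close to the monthly average.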

Next time, I'll look at the data from the individual blog entries.