Wednesday, 23 October 2013

Excel Box Plot Macro

Introduction

A boxplot is a way of plotting the median of a set of data points alongside the quartile ranges and any outliers. Basically it's a good way of showing the range and distribution of a univariate data set.

The boxplot consists of a line showing the median, a box encompassing the first and third quartiles, and whiskers showing lines at 1.5 times the inter-quartile range above the third and below the first quartile. The inter-quartile range is just the third quartile minus the first quartile.

The image below is an example of what the macro produces:


It was produced from a random set of data generated from the normal distribution with a mean of 0 and a standard deviation of 1.

The Macro

The macro takes the raw data, a univariate set of points, calculates the quartiles and plots them on an open-high-low-close stock chart. Lines are added for the median and the inter-quartile boundaries. The outliers are added as a XY scatter plot series.

A form is used to get the range holding the data but you can adjust the macro to used any old range. The form also has a box to select whether you want the outliers plotted. Again, you can change the macro for this.

There are two parts to the macro. The first takes the data, calculates the various values and plots them. The second adjusts the chart to update it when it is resized or recalculated.

Here's the first part:

-------------------------------------------------------------------------------
'Define global variables
Dim bpd As New Dictionary
Dim boxPlotCount As Integer

Sub Boxplot()
    Dim q1 As Double
    Dim q2 As Double
    Dim q3 As Double
    Dim iqr As Double
    Dim Data As range
    Dim counter As Long
    Dim i As Long
    Dim outliers As range
    Dim outliersArray() As Integer
    Dim a1 As Shape
    Dim a2 As Shape
    Dim a3 As Shape
    Dim m As Double
    Dim c As Double
    Dim bpt As New BoxPlotChart
    
    'Show form to get location of data
    BoxPlotForm.Show
    
    If BoxPlotForm.cancelB = True Then
        Unload BoxPlotForm
        Exit Sub
    End If
    
    Set Data = range(BoxPlotForm.RefEdit1.Text)
    
    'Calculate median, first quartile, third quartile and inter-quartile range
    q1 = WorksheetFunction.Quartile_Inc(Data, 1)
    q2 = WorksheetFunction.Quartile_Inc(Data, 2)
    q3 = WorksheetFunction.Quartile_Inc(Data, 3)
    iqr = 1.5 * (q3 - q1)
    
    'Create new sheet
    Worksheets.Add
    Cells(1, 2).Value = "Data"
    Cells(1, 3).Value = "Chart Data"
    
    Cells(2, 1).Value = "Inter quartile range"
    Cells(3, 1).Value = "Quartile 1"
    Cells(4, 1).Value = "Median"
    Cells(5, 1).Value = "Quartile 3"
    Cells(6, 1).Value = "Inter quartile range"
    
    Cells(2, 2).Value = iqr
    Cells(3, 2).Value = q1
    Cells(4, 2).Value = q2
    Cells(5, 2).Value = q3
    Cells(6, 2).Value = iqr
    
    'Filter list of points to those outside IQR
    Cells(1, 5).Value = "Extremes"
    counter = 2
    For i = 1 To Data.Rows.Count
        If Data(i, 1).Value > (q3 + iqr) Or Data(i, 1).Value < (q1 - iqr) Then
            Cells(counter, 5).Value = Data(i, 1).Value
            counter = counter + 1
        End If
    Next
    
    Set outliers = range(Cells(2, 5), Cells(counter - 1, 5))
    ReDim outliersArray(counter - 2)
    For i = 0 To counter - 2
        outliersArray(i) = 1
    Next
        
    Cells(2, 3).Value = iqr
    Cells(3, 3).Value = q1
    Cells(4, 3).Value = q3 + iqr
    Cells(5, 3).Value = q1 - iqr
    Cells(6, 3).Value = q3
        
    For i = 9 To 12
        Cells(3, i).Value = q1
        Cells(4, i).Value = q3 + iqr
        Cells(5, i).Value = q1 - iqr
        Cells(6, i).Value = q3
    Next
    
    'Plot a ohlc chart chart
    ActiveSheet.Shapes.AddChart.Select
    ActiveChart.SetSourceData Source:=range(Cells(3, 9), Cells(6, 12))
    ActiveChart.ChartType = xlStockOHLC
    ActiveChart.PlotBy = xlRows
    ActiveChart.SetSourceData Source:=range(Cells(3, 3), Cells(6, 3))
    Set bpt.BoxChart = ActiveChart
    
    'Delete temporary data
    range(Cells(3, 9), Cells(6, 12)).ClearContents
    
    'Plot outliers
    If BoxPlotForm.PlotCheckBox.Value = True And counter > 2 Then
        'Plot these as a scatterplot with x of 1
        For i = 1 To counter - 2
            With ActiveChart.SeriesCollection.NewSeries
                .ChartType = xlXYScatter
                .Values = outliers(i, 1)
                .Name = "Outliers" & i
                .XValues = outliersArray(i - 1)
                .MarkerStyle = -4168
                .MarkerSize = 3
                .MarkerForegroundColor = RGB(0, 0, 0)
            End With
        Next
        
        'Need to make sure that both axes line up
        ActiveChart.Axes(xlValue, xlSecondary).MinimumScale = ActiveChart.Axes(xlValue).MinimumScale
        ActiveChart.Axes(xlValue, xlSecondary).MaximumScale = ActiveChart.Axes(xlValue).MaximumScale
        ActiveChart.Axes(xlValue).MinimumScale = ActiveChart.Axes(xlValue, xlSecondary).MinimumScale
        ActiveChart.Axes(xlValue).MaximumScale = ActiveChart.Axes(xlValue, xlSecondary).MaximumScale
    End If
    
    'Get rid of horizontal grid lines, x axis labels and ticks
    With ActiveChart.Axes(xlCategory)
        .MajorTickMark = xlNone
        .TickLabelPosition = xlNone
    End With
    ActiveChart.Legend.Delete
    ActiveChart.Axes(xlValue).MajorGridlines.Delete
    
    'Plot line on chart
    m = (ActiveChart.PlotArea.InsideTop - (ActiveChart.PlotArea.InsideTop + ActiveChart.PlotArea.InsideHeight)) / (ActiveChart.Axes(xlValue).MaximumScale - ActiveChart.Axes(xlValue).MinimumScale)
    c = ActiveChart.PlotArea.InsideTop - m * ActiveChart.Axes(xlValue).MaximumScale
    Set bpt.s1 = ActiveChart.Shapes.AddLine((ActiveChart.PlotArea.InsideLeft + 0.3 * ActiveChart.PlotArea.InsideWidth), m * q2 + c, (ActiveChart.PlotArea.InsideLeft + 0.7 * ActiveChart.PlotArea.InsideWidth), m * q2 + c)
    bpt.s1.Line.ForeColor.ObjectThemeColor = msoThemeColorAccent2
    
    'Plot lines at high and low markers
    Set bpt.s2 = ActiveChart.Shapes.AddLine((ActiveChart.PlotArea.InsideLeft + 0.33 * ActiveChart.PlotArea.InsideWidth), m * (q3 + iqr) + c, (ActiveChart.PlotArea.InsideLeft + 0.66 * ActiveChart.PlotArea.InsideWidth), m * (q3 + iqr) + c)
    bpt.s2.Line.ForeColor.ObjectThemeColor = msoThemeColorText1
    Set bpt.s3 = ActiveChart.Shapes.AddLine((ActiveChart.PlotArea.InsideLeft + 0.33 * ActiveChart.PlotArea.InsideWidth), m * (q1 - iqr) + c, (ActiveChart.PlotArea.InsideLeft + 0.66 * ActiveChart.PlotArea.InsideWidth), m * (q1 - iqr) + c)
    bpt.s3.Line.ForeColor.ObjectThemeColor = msoThemeColorText1
    
    bpt.tq = q3 + iqr
    bpt.lq = q1 - iqr
    bpt.q2 = q2
    
    boxPlotCount = boxPlotCount + 1
    'Add chart to dictionary
    bpd.Add boxPlotCount, bpt
    
End Sub
-------------------------------------------------------------------------------

And here is the boxplot object that replots the lines when it is adjusted. You'll need to create a class module called 'BoxPlotChart' to hold the code:

-------------------------------------------------------------------------------
Option Explicit

Public WithEvents BoxChart As Chart
Public posY As Integer
Public s1 As Shape
Public s2 As Shape
Public s3 As Shape
Public tq As Double
Public lq As Double
Public q2 As Double

Private Sub BoxChart_Calculate()
    'Redraw the median line
    
    'Need to know the position of the top and bottom in both
    'screen coordinates and scale
    
    Dim m As Double
    Dim c As Double
    
    m = (ActiveChart.PlotArea.InsideTop - (ActiveChart.PlotArea.InsideTop + ActiveChart.PlotArea.InsideHeight)) / (ActiveChart.Axes(xlValue).MaximumScale - ActiveChart.Axes(xlValue).MinimumScale)
    c = ActiveChart.PlotArea.InsideTop - m * ActiveChart.Axes(xlValue).MaximumScale
    
    If Not (s1 Is Nothing) Then
        s1.Left = (ActiveChart.PlotArea.InsideLeft + 0.3 * ActiveChart.PlotArea.InsideWidth)
        s1.Top = m * q2 + c
        s1.width = (0.4 * ActiveChart.PlotArea.InsideWidth)
    End If
    
    If Not (s2 Is Nothing) Then
        s2.Left = (ActiveChart.PlotArea.InsideLeft + 0.33 * ActiveChart.PlotArea.InsideWidth)
        s2.Top = m * tq + c
        s2.width = (0.33 * ActiveChart.PlotArea.InsideWidth)
    End If
    
    If Not (s3 Is Nothing) Then
        s3.Left = (ActiveChart.PlotArea.InsideLeft + 0.33 * ActiveChart.PlotArea.InsideWidth)
        s3.Top = m * lq + c
        s3.width = (0.33 * ActiveChart.PlotArea.InsideWidth)
    End If
    
End Sub

Private Sub BoxChart_Resize()
    'Redraw the median line
    
    'Need to know the position of the top and bottom in both
    'screen coordinates and scale
    
    Dim m As Double
    Dim c As Double
    
    m = (ActiveChart.PlotArea.InsideTop - (ActiveChart.PlotArea.InsideTop + ActiveChart.PlotArea.InsideHeight)) / (ActiveChart.Axes(xlValue).MaximumScale - ActiveChart.Axes(xlValue).MinimumScale)
    c = ActiveChart.PlotArea.InsideTop - m * ActiveChart.Axes(xlValue).MaximumScale
    
    s1.Left = (ActiveChart.PlotArea.InsideLeft + 0.3 * ActiveChart.PlotArea.InsideWidth)
    s1.Top = m * q2 + c
    s1.width = (0.4 * ActiveChart.PlotArea.InsideWidth)
    
    s2.Left = (ActiveChart.PlotArea.InsideLeft + 0.33 * ActiveChart.PlotArea.InsideWidth)
    s2.Top = m * tq + c
    s2.width = (0.33 * ActiveChart.PlotArea.InsideWidth)
    
    s3.Left = (ActiveChart.PlotArea.InsideLeft + 0.33 * ActiveChart.PlotArea.InsideWidth)
    s3.Top = m * lq + c
    s3.width = (0.33 * ActiveChart.PlotArea.InsideWidth)
    
End Sub
-------------------------------------------------------------------------------

Limitations

Although the macro works fine when you run it, if you save the workbook, close it and then reopen it, the charts are just charts. They no longer resize properly.

Presumably you'd need to cycle through the charts in the workbook and re-assign any that were boxplots to BoxPlotCharts.

Limitation number two is that it only creates a boxplot for one set of data. It would be fairly trivial to extend this to more than one set of data.

Limitation number three is that the function used to calculate the quartiles only appeared in Excel 2010. I believe that there is a 'quartile' function in previous versions though so it can be changed for this.

Conclusion

Hopefully this macro is useful. As ever, if you have any comments, improvements or find any bugs, etc. then please comment.

No comments:

Post a Comment