Introduction
A boxplot is a way of plotting the median of a set of data points alongside the quartile ranges and any outliers. Basically it's a good way of showing the range and distribution of a univariate data set.
The boxplot consists of a line showing the median, a box encompassing the first and third quartiles, and whiskers showing lines at 1.5 times the inter-quartile range above the third and below the first quartile. The inter-quartile range is just the third quartile minus the first quartile.
The image below is an example of what the macro produces:
The Macro
The macro takes the raw data, a univariate set of points, calculates the quartiles and plots them on an open-high-low-close stock chart. Lines are added for the median and the inter-quartile boundaries. The outliers are added as a XY scatter plot series.
A form is used to get the range holding the data but you can adjust the macro to used any old range. The form also has a box to select whether you want the outliers plotted. Again, you can change the macro for this.
There are two parts to the macro. The first takes the data, calculates the various values and plots them. The second adjusts the chart to update it when it is resized or recalculated.
Here's the first part:
-------------------------------------------------------------------------------
'Define global variables
Dim bpd As New Dictionary
Dim boxPlotCount As Integer
Sub Boxplot()
Dim q1 As Double
Dim q2 As Double
Dim q3 As Double
Dim iqr As Double
Dim Data As range
Dim counter As Long
Dim i As Long
Dim outliers As range
Dim outliersArray() As Integer
Dim a1 As Shape
Dim a2 As Shape
Dim a3 As Shape
Dim m As Double
Dim c As Double
Dim bpt As New BoxPlotChart
'Show form to get location of data
BoxPlotForm.Show
If BoxPlotForm.cancelB = True Then
Unload BoxPlotForm
Exit Sub
End If
Set Data = range(BoxPlotForm.RefEdit1.Text)
'Calculate median, first quartile, third quartile and inter-quartile range
q1 = WorksheetFunction.Quartile_Inc(Data, 1)
q2 = WorksheetFunction.Quartile_Inc(Data, 2)
q3 = WorksheetFunction.Quartile_Inc(Data, 3)
iqr = 1.5 * (q3 - q1)
'Create new sheet
Worksheets.Add
Cells(1, 2).Value = "Data"
Cells(1, 3).Value = "Chart Data"
Cells(2, 1).Value = "Inter quartile range"
Cells(3, 1).Value = "Quartile 1"
Cells(4, 1).Value = "Median"
Cells(5, 1).Value = "Quartile 3"
Cells(6, 1).Value = "Inter quartile range"
Cells(2, 2).Value = iqr
Cells(3, 2).Value = q1
Cells(4, 2).Value = q2
Cells(5, 2).Value = q3
Cells(6, 2).Value = iqr
'Filter list of points to those outside IQR
Cells(1, 5).Value = "Extremes"
counter = 2
For i = 1 To Data.Rows.Count
If Data(i, 1).Value > (q3 + iqr) Or Data(i, 1).Value < (q1 - iqr) Then
Cells(counter, 5).Value = Data(i, 1).Value
counter = counter + 1
End If
Next
Set outliers = range(Cells(2, 5), Cells(counter - 1, 5))
ReDim outliersArray(counter - 2)
For i = 0 To counter - 2
outliersArray(i) = 1
Next
Cells(2, 3).Value = iqr
Cells(3, 3).Value = q1
Cells(4, 3).Value = q3 + iqr
Cells(5, 3).Value = q1 - iqr
Cells(6, 3).Value = q3
For i = 9 To 12
Cells(3, i).Value = q1
Cells(4, i).Value = q3 + iqr
Cells(5, i).Value = q1 - iqr
Cells(6, i).Value = q3
Next
'Plot a ohlc chart chart
ActiveSheet.Shapes.AddChart.Select
ActiveChart.SetSourceData Source:=range(Cells(3, 9), Cells(6, 12))
ActiveChart.ChartType = xlStockOHLC
ActiveChart.PlotBy = xlRows
ActiveChart.SetSourceData Source:=range(Cells(3, 3), Cells(6, 3))
Set bpt.BoxChart = ActiveChart
'Delete temporary data
range(Cells(3, 9), Cells(6, 12)).ClearContents
'Plot outliers
If BoxPlotForm.PlotCheckBox.Value = True And counter > 2 Then
'Plot these as a scatterplot with x of 1
For i = 1 To counter - 2
With ActiveChart.SeriesCollection.NewSeries
.ChartType = xlXYScatter
.Values = outliers(i, 1)
.Name = "Outliers" & i
.XValues = outliersArray(i - 1)
.MarkerStyle = -4168
.MarkerSize = 3
.MarkerForegroundColor = RGB(0, 0, 0)
End With
Next
'Need to make sure that both axes line up
ActiveChart.Axes(xlValue, xlSecondary).MinimumScale = ActiveChart.Axes(xlValue).MinimumScale
ActiveChart.Axes(xlValue, xlSecondary).MaximumScale = ActiveChart.Axes(xlValue).MaximumScale
ActiveChart.Axes(xlValue).MinimumScale = ActiveChart.Axes(xlValue, xlSecondary).MinimumScale
ActiveChart.Axes(xlValue).MaximumScale = ActiveChart.Axes(xlValue, xlSecondary).MaximumScale
End If
'Get rid of horizontal grid lines, x axis labels and ticks
With ActiveChart.Axes(xlCategory)
.MajorTickMark = xlNone
.TickLabelPosition = xlNone
End With
ActiveChart.Legend.Delete
ActiveChart.Axes(xlValue).MajorGridlines.Delete
'Plot line on chart
m = (ActiveChart.PlotArea.InsideTop - (ActiveChart.PlotArea.InsideTop + ActiveChart.PlotArea.InsideHeight)) / (ActiveChart.Axes(xlValue).MaximumScale - ActiveChart.Axes(xlValue).MinimumScale)
c = ActiveChart.PlotArea.InsideTop - m * ActiveChart.Axes(xlValue).MaximumScale
Set bpt.s1 = ActiveChart.Shapes.AddLine((ActiveChart.PlotArea.InsideLeft + 0.3 * ActiveChart.PlotArea.InsideWidth), m * q2 + c, (ActiveChart.PlotArea.InsideLeft + 0.7 * ActiveChart.PlotArea.InsideWidth), m * q2 + c)
bpt.s1.Line.ForeColor.ObjectThemeColor = msoThemeColorAccent2
'Plot lines at high and low markers
Set bpt.s2 = ActiveChart.Shapes.AddLine((ActiveChart.PlotArea.InsideLeft + 0.33 * ActiveChart.PlotArea.InsideWidth), m * (q3 + iqr) + c, (ActiveChart.PlotArea.InsideLeft + 0.66 * ActiveChart.PlotArea.InsideWidth), m * (q3 + iqr) + c)
bpt.s2.Line.ForeColor.ObjectThemeColor = msoThemeColorText1
Set bpt.s3 = ActiveChart.Shapes.AddLine((ActiveChart.PlotArea.InsideLeft + 0.33 * ActiveChart.PlotArea.InsideWidth), m * (q1 - iqr) + c, (ActiveChart.PlotArea.InsideLeft + 0.66 * ActiveChart.PlotArea.InsideWidth), m * (q1 - iqr) + c)
bpt.s3.Line.ForeColor.ObjectThemeColor = msoThemeColorText1
bpt.tq = q3 + iqr
bpt.lq = q1 - iqr
bpt.q2 = q2
boxPlotCount = boxPlotCount + 1
'Add chart to dictionary
bpd.Add boxPlotCount, bpt
End Sub
-------------------------------------------------------------------------------
And here is the boxplot object that replots the lines when it is adjusted. You'll need to create a class module called 'BoxPlotChart' to hold the code:
-------------------------------------------------------------------------------
Option Explicit
Public WithEvents BoxChart As Chart
Public posY As Integer
Public s1 As Shape
Public s2 As Shape
Public s3 As Shape
Public tq As Double
Public lq As Double
Public q2 As Double
Private Sub BoxChart_Calculate()
'Redraw the median line
'Need to know the position of the top and bottom in both
'screen coordinates and scale
Dim m As Double
Dim c As Double
m = (ActiveChart.PlotArea.InsideTop - (ActiveChart.PlotArea.InsideTop + ActiveChart.PlotArea.InsideHeight)) / (ActiveChart.Axes(xlValue).MaximumScale - ActiveChart.Axes(xlValue).MinimumScale)
c = ActiveChart.PlotArea.InsideTop - m * ActiveChart.Axes(xlValue).MaximumScale
If Not (s1 Is Nothing) Then
s1.Left = (ActiveChart.PlotArea.InsideLeft + 0.3 * ActiveChart.PlotArea.InsideWidth)
s1.Top = m * q2 + c
s1.width = (0.4 * ActiveChart.PlotArea.InsideWidth)
End If
If Not (s2 Is Nothing) Then
s2.Left = (ActiveChart.PlotArea.InsideLeft + 0.33 * ActiveChart.PlotArea.InsideWidth)
s2.Top = m * tq + c
s2.width = (0.33 * ActiveChart.PlotArea.InsideWidth)
End If
If Not (s3 Is Nothing) Then
s3.Left = (ActiveChart.PlotArea.InsideLeft + 0.33 * ActiveChart.PlotArea.InsideWidth)
s3.Top = m * lq + c
s3.width = (0.33 * ActiveChart.PlotArea.InsideWidth)
End If
End Sub
Private Sub BoxChart_Resize()
'Redraw the median line
'Need to know the position of the top and bottom in both
'screen coordinates and scale
Dim m As Double
Dim c As Double
m = (ActiveChart.PlotArea.InsideTop - (ActiveChart.PlotArea.InsideTop + ActiveChart.PlotArea.InsideHeight)) / (ActiveChart.Axes(xlValue).MaximumScale - ActiveChart.Axes(xlValue).MinimumScale)
c = ActiveChart.PlotArea.InsideTop - m * ActiveChart.Axes(xlValue).MaximumScale
s1.Left = (ActiveChart.PlotArea.InsideLeft + 0.3 * ActiveChart.PlotArea.InsideWidth)
s1.Top = m * q2 + c
s1.width = (0.4 * ActiveChart.PlotArea.InsideWidth)
s2.Left = (ActiveChart.PlotArea.InsideLeft + 0.33 * ActiveChart.PlotArea.InsideWidth)
s2.Top = m * tq + c
s2.width = (0.33 * ActiveChart.PlotArea.InsideWidth)
s3.Left = (ActiveChart.PlotArea.InsideLeft + 0.33 * ActiveChart.PlotArea.InsideWidth)
s3.Top = m * lq + c
s3.width = (0.33 * ActiveChart.PlotArea.InsideWidth)
End Sub
-------------------------------------------------------------------------------
Limitations
Although the macro works fine when you run it, if you save the workbook, close it and then reopen it, the charts are just charts. They no longer resize properly.
Presumably you'd need to cycle through the charts in the workbook and re-assign any that were boxplots to BoxPlotCharts.
Limitation number two is that it only creates a boxplot for one set of data. It would be fairly trivial to extend this to more than one set of data.
Limitation number three is that the function used to calculate the quartiles only appeared in Excel 2010. I believe that there is a 'quartile' function in previous versions though so it can be changed for this.
Conclusion
Hopefully this macro is useful. As ever, if you have any comments, improvements or find any bugs, etc. then please comment.
No comments:
Post a Comment