Plotting Statistical Visualizations with AlgebraOfGraphics.jl
Authors
Jose Storopoli
Juan Oneto
In this tutorial, we’ll explore different statistical transformations that we can apply to our visualizations with AoG.jl.
These can be added to any AoG.jl layer with the * operator.
We will also cover two geometries that are commonly used with the statistical visualization functions: Contour and Heatmap.
Statistical transformations are paired with defaults for visual(), so you don’t need to specify a visual() yourself. You just add the data() layer along with a mapping() layer and finalize with a statistical transformation layer.
If you want to customize other aspects of the histogram’s underlying bar plot, you can add a visual() layer with no plot type inside, passing only the desired customizations as keyword arguments:
data(df) * mapping(:AGE; layout = :SEX) * histogram() * visual(; color = :blue) |> draw
The number of bins is determined automatically, but you can set it yourself with the bins keyword argument of histogram().
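A sketch of the bins option, assuming AlgebraOfGraphics.jl, CairoMakie.jl and DataFrames.jl are installed. The tutorial’s own df is not reproduced here, so the DataFrame below is a hypothetical stand-in filled with random values, and 15 bins is an arbitrary choice.

```julia
using AlgebraOfGraphics, CairoMakie, DataFrames, Random

# Hypothetical stand-in for the tutorial's `df` (random values, not real data)
Random.seed!(123)
df = DataFrame(AGE = rand(20:80, 100), SEX = rand(["Female", "Male"], 100))

# Override the automatic bin count with the `bins` keyword of histogram()
plt = data(df) * mapping(:AGE; layout = :SEX) * histogram(; bins = 15)
fg = draw(plt)
```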
histogram() also has a normalization keyword argument that selects a normalization scheme. There are four possible schemes, each specified with the corresponding Symbol:
:none: the default; no normalization is applied, i.e. raw counts.
:pdf: normalize by the sum of weights and the bin sizes. The resulting histogram behaves as a probability density function (PDF): the total area of the bars (bar height × bin width, summed over all bins) equals 1.
:density: normalize by the bin sizes only. The resulting histogram represents the count density of the input; its total area equals the total count, not 1.
:probability: normalize by the sum of weights only. Each bar is the fraction of the probability mass in that bin, so the bar heights sum to 1, but the total area equals 1 only when every bin has width 1.
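To make the difference between the schemes concrete, here is a hand-computed sketch for unweighted data in plain Julia. The counts and the (deliberately unequal) bin edges are hypothetical, and the code mirrors the definitions above rather than AoG.jl’s internal implementation.

```julia
# Hypothetical histogram: bin edges with unequal widths and raw counts per bin
edges = [0.0, 1.0, 3.0, 6.0]
counts = [2.0, 4.0, 6.0]                 # :none -> raw counts

widths = diff(edges)                     # bin sizes: [1.0, 2.0, 3.0]
total = sum(counts)                      # sum of weights (all weights equal 1)

norm_pdf = counts ./ (total .* widths)   # :pdf -> divide by total weight and bin size
norm_density = counts ./ widths          # :density -> divide by bin size only
norm_probability = counts ./ total       # :probability -> divide by total weight only

# Total area under the bars: sum of height × width
area(h) = sum(h .* widths)
```

With equal-width bins, :pdf and :probability differ only by the constant bin width, which is why the two are easy to confuse.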
Our advice is to use either :none (the default) for a raw histogram or :pdf for a relative histogram.
We can also change the colormap keyword argument inside visual(). Let’s make a black-and-white, printer-friendly visualization with colormap = Reverse(:greys).
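A sketch of the printer-friendly version, assuming the plot being recolored is the 2-D histogram of :AGE against :eGFR (the original plot code was not preserved here). The df is a hypothetical random stand-in and the bin count is arbitrary.

```julia
using AlgebraOfGraphics, CairoMakie, DataFrames, Random

# Hypothetical stand-in for the tutorial's `df`: eGFR loosely decreasing with AGE
Random.seed!(123)
age = rand(20:80, 300)
df = DataFrame(AGE = age, eGFR = 130 .- 0.8 .* age .+ 10 .* randn(300))

# Two positional mappings give a 2-D histogram, drawn as a heatmap;
# Reverse(:greys) flips the order of Makie's grey colormap
plt = data(df) * mapping(:AGE, :eGFR) * histogram(; bins = 15) *
      visual(; colormap = Reverse(:greys))
fg = draw(plt)
```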
As before, we can customize our plot with keyword arguments using a visual() layer with no plot type inside:
data(df) * mapping(:AGE) * AlgebraOfGraphics.density() * visual(; color = (:blue, 0.75)) |> draw
Tip
Makie.jl lets us specify colors either as a Symbol, e.g. :blue, or as a 2-tuple whose first element is the color (e.g. a Symbol) and whose second element is a Float representing the desired opacity (i.e. alpha, from 0 for fully transparent to 1 for fully opaque).
Since AoG.jl uses Makie.jl as its backend, any color specification that Makie.jl accepts can be passed to the color argument.
Makie.jl resolves named colors through Colors.jl, so any name in Colors.jl’s list of named colors can be used.
Also as before, we can facet by passing layout, row, or col as a keyword argument inside mapping():
data(df) * mapping(:AGE; col = :SEX) * AlgebraOfGraphics.density() |> draw
As with histogram(), you can pass two positional arguments to mapping() to get a 2-D density() plot.
Let’s use the same example as before with :AGE and :eGFR to look at the relationship between age and kidney function.
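A sketch of the 2-D density, under the same assumptions (packages installed, hypothetical random df). The first layer keeps density()’s default visual; the second multiplies by visual(Contour), one of the two geometries mentioned in the introduction, to draw the same estimate as contour lines.

```julia
using AlgebraOfGraphics, CairoMakie, DataFrames, Random

# Hypothetical stand-in for the tutorial's `df` (random values, not real data)
Random.seed!(123)
age = rand(20:80, 300)
df = DataFrame(AGE = age, eGFR = 130 .- 0.8 .* age .+ 10 .* randn(300))

# 2-D kernel density estimate with density()'s default visual
plt_default = data(df) * mapping(:AGE, :eGFR) * AlgebraOfGraphics.density()

# The same estimate drawn as contour lines
plt_contour = data(df) * mapping(:AGE, :eGFR) * AlgebraOfGraphics.density() * visual(Contour)

fg_default = draw(plt_default)
fg_contour = draw(plt_contour)
```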
density() has some nice extra features. It can also produce 3-D visualizations: pass Axis3 as the type entry of the axis NamedTuple given to draw(), and use the Surface plot type inside visual().
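A sketch of the 3-D surface version, again assuming the packages are installed and using a hypothetical random df in place of the tutorial’s data.

```julia
using AlgebraOfGraphics, CairoMakie, DataFrames, Random

# Hypothetical stand-in for the tutorial's `df` (random values, not real data)
Random.seed!(123)
age = rand(20:80, 300)
df = DataFrame(AGE = age, eGFR = 130 .- 0.8 .* age .+ 10 .* randn(300))

# Draw the estimated density as a surface...
plt = data(df) * mapping(:AGE, :eGFR) * AlgebraOfGraphics.density() * visual(Surface)
# ...on a 3-D axis, requested through the `axis` NamedTuple of draw()
fg = draw(plt; axis = (type = Axis3,))
```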
The third statistical function we will cover is frequency(), which computes a raw frequency table (counts) of the columns passed to mapping().
Tip
frequency() does not take any arguments.
The simplest example computes the frequency of the column :SEX:
data(df) * mapping(:SEX) * frequency() |> draw
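Conceptually, a frequency table is just the number of rows in each level. A plain-Julia sketch of that count, with a hypothetical vector standing in for df.SEX (this illustrates the idea, not how AoG.jl implements frequency()):

```julia
# Hypothetical stand-in for df.SEX
sex = ["Male", "Female", "Female", "Male", "Female"]

# Tally how many rows fall in each level; each tally is one bar height
counts = Dict{String,Int}()
for s in sex
    counts[s] = get(counts, s, 0) + 1
end
```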
As before, we can add faceting by passing layout, row, or col as a keyword argument inside mapping(), and customize the plot with keyword arguments inside an empty visual() layer:
data(df) * mapping(:SEX; layout = :WEIGHT_cat) * frequency() * visual(; color = :blue) |> draw
A nice plot to have up your sleeve is a frequency() plot with stacked bars.
This is done by mapping the keyword arguments color and stack to the same column inside mapping():
data(df) * mapping(:SEX; color = :WEIGHT_cat, stack = :WEIGHT_cat) * frequency() |> draw
frequency() can also be paired with a 2-D visualization to produce a heatmap of raw counts.
For example, the previous stacked-bar frequency plot can be redrawn as a 2-D heatmap frequency plot.
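A sketch of that heatmap, assuming the packages are installed and that the 2-D frequency() output is drawn with visual(Heatmap). The df is a hypothetical stand-in with random categories.

```julia
using AlgebraOfGraphics, CairoMakie, DataFrames, Random

# Hypothetical stand-in for the tutorial's `df` (random categories, not real data)
Random.seed!(123)
df = DataFrame(
    SEX = rand(["Female", "Male"], 200),
    WEIGHT_cat = rand(["<70 kg", "70-90 kg", ">90 kg"], 200),
)

# Count every (SEX, WEIGHT_cat) combination and color each cell by its count
plt = data(df) * mapping(:SEX, :WEIGHT_cat) * frequency() * visual(Heatmap)
fg = draw(plt)
```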
Our fourth statistical visualization function is expectation(), named after the mathematical term for the mean (expected value) of a random variable, which is used extensively in probability and statistics.
expectation() computes the mean of the second positional argument of mapping() conditioned on the values of the first. In other words, it computes the mean of the y column within each value of the x column.
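A plain-Julia sketch of that conditional mean, using small hypothetical vectors in place of df.SEX and df.AGE; it shows what each bar represents, not AoG.jl’s internal code.

```julia
using Statistics

# Hypothetical stand-ins for df.SEX (x) and df.AGE (y)
sex = ["Male", "Female", "Female", "Male", "Female"]
age = [40.0, 30.0, 50.0, 60.0, 40.0]

# Mean of y within each level of x: one bar height per level
cond_mean = Dict(level => mean(age[sex .== level]) for level in unique(sex))
```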
The simplest example uses only the columns :SEX and :AGE, giving the mean :AGE for each level of :SEX.
As before, we can add faceting by passing layout, row, or col as a keyword argument inside mapping(), and customize the plot with keyword arguments inside an empty visual() layer.
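Sketches of both expectation() examples described above, assuming the packages are installed. The df is a hypothetical random stand-in, and faceting by :WEIGHT_cat mirrors the earlier frequency() example rather than the tutorial’s lost plot.

```julia
using AlgebraOfGraphics, CairoMakie, DataFrames, Random

# Hypothetical stand-in for the tutorial's `df` (random values, not real data)
Random.seed!(123)
df = DataFrame(
    SEX = rand(["Female", "Male"], 200),
    AGE = rand(20:80, 200),
    WEIGHT_cat = rand(["<70 kg", "70-90 kg", ">90 kg"], 200),
)

# Mean AGE for each level of SEX
plt_basic = data(df) * mapping(:SEX, :AGE) * expectation()

# Same summary, faceted by weight category and recolored via an empty visual() layer
plt_facet = data(df) * mapping(:SEX, :AGE; layout = :WEIGHT_cat) * expectation() *
            visual(; color = :blue)

fg_basic = draw(plt_basic)
fg_facet = draw(plt_facet)
```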
Another nice plot to have up your sleeve is a grouped (dodged) expected-value bar plot.
This is done by mapping the keyword arguments color and dodge to the same column inside mapping():
data(df) * mapping(:SEX, :AGE; color = :WEIGHT_cat, dodge = :WEIGHT_cat) * expectation() |> draw
Tip
expectation() does not take any arguments.
2.5 linear()
Our fifth statistical visualization function is linear() which draws a linear trend line between two variables. This is similar to geom_smooth(method = "lm") in ggplot2. It computes a linear fit using the following formula: y ~ 1 + x.
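The formula y ~ 1 + x means an intercept plus a slope on x, fitted by ordinary least squares. A plain-Julia sketch of that fit, with small hypothetical vectors chosen to lie exactly on y = 130 - x; AoG.jl’s own implementation may compute it differently.

```julia
using LinearAlgebra

# Hypothetical stand-ins for df.AGE (x) and df.eGFR (y)
x = [20.0, 30.0, 40.0, 50.0]
y = [110.0, 100.0, 90.0, 80.0]

X = [ones(length(x)) x]   # design matrix for y ~ 1 + x: intercept column, then x
β = X \ y                 # least-squares solution: [intercept, slope]
ŷ = X * β                 # fitted values along the trend line
```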
Let’s see an example using :AGE vs :eGFR:
data(df) * mapping(:AGE, :eGFR) * linear() |> draw
Caution
If we also multiply by visual(Scatter) to see the data points along with our linear trend line, i.e. data(df) * mapping(:AGE, :eGFR) * linear() * visual(Scatter), we get something other than what we intended: the Scatter plot type is applied to the output of linear() instead of adding a separate layer of raw data points. This is because AoG.jl has two algebraic operations: addition with + and multiplication with *. To superimpose layers you need to use + rather than *. We’ll discuss AoG.jl’s algebraic operations in more depth in Tutorial 5 - Grammar of Graphics with AlgebraOfGraphics.jl. Don’t forget to check it out.
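A sketch of the layered version, assuming the packages are installed and using a hypothetical random df. The shared data() and mapping() are multiplied by the sum of two layers, so the raw points and the trend line are drawn together.

```julia
using AlgebraOfGraphics, CairoMakie, DataFrames, Random

# Hypothetical stand-in for the tutorial's `df` (random values, not real data)
Random.seed!(123)
age = rand(20:80, 300)
df = DataFrame(AGE = age, eGFR = 130 .- 0.8 .* age .+ 10 .* randn(300))

# Not what we want: Scatter would be applied to linear()'s output
# data(df) * mapping(:AGE, :eGFR) * linear() * visual(Scatter)

# Superimpose two layers that share data and mapping: raw points + trend line
plt = data(df) * mapping(:AGE, :eGFR) * (visual(Scatter) + linear())
fg = draw(plt)
```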