Plotting Different Geometries with AlgebraOfGraphics.jl

Authors

Jose Storopoli

Juan Oneto

In this tutorial, we will explore how to make different kinds of plots (also called geometries or geoms in ggplot2) with AoG.jl. First, we’ll discuss how to navigate AoG.jl and Makie.jl documentation. Then, we’ll proceed to show the most common plotting functions in AoG.jl:

  1. BarPlot
  2. Lines
  3. Errorbars
  4. Scatter
  5. BoxPlot
  6. Violin
  7. Contour
  8. Heatmap

Note

Some important visualizations are missing from this tutorial, namely the statistical visualizations. They are covered in Plotting Statistical Visualizations with AlgebraOfGraphics.jl. Don’t forget to check it out.

1 📋 geom_*() - AoG.jl Table

The following table is a mapping of ggplot2’s geom_*() to AoG.jl’s plotting functions:

ggplot2                      AoG.jl
geom_col()                   visual(BarPlot)
geom_point()                 visual(Scatter)
geom_line()                  visual(Lines)
geom_errorbar()              visual(Errorbars)
geom_boxplot()               visual(BoxPlot)
geom_violin()                visual(Violin)
geom_label()                 visual(Annotations)
geom_text()                  visual(Annotations)
geom_contour()               visual(Contour)
geom_tile()                  visual(Heatmap)
geom_bar()                   frequency()
geom_histogram()             histogram()
geom_density()               density()
geom_smooth()                smooth()
geom_smooth(method = "lm")   linear()
geom_area()                  linesfill()

2 🆘 How to Find Available Plotting Functions?

As you probably know, AoG.jl uses Makie.jl as the plotting engine for all visualizations. This has the consequence that all possible plotting objects (geometries) in AoG.jl are actually plotting types in Makie.jl.

So, for example, to draw a bar plot in AoG.jl you would use the BarPlot type from Makie.jl.

In AoG.jl you use the visual() function and then pass the desired Makie.jl plotting type along with all desired keyword arguments. So, the bar plot would be the following call for visual():

plt = data(...) * mapping(...) * visual(BarPlot; ...)

draw(plt)

There are two main ways to browse and obtain information regarding plotting types and custom arguments:

  1. Makie.jl Documentation: this is very useful, and even AoG.jl’s documentation redirects to it.
  2. help_attributes() function and Docstrings: you can also get this information from the Julia REPL (terminal), either with the help_attributes() function or by reading the help text of Makie.jl’s plotting functions.

2.1 Makie Documentation

Makie.jl’s documentation has a rich description of the plotting functions. We encourage you to browse it and use the examples to learn the options available for every plotting type.

Caution

Note that Makie.jl’s plotting functions are all lowercase, since they follow the naming convention for functions. AoG.jl, in contrast, uses plotting types, which are all TitleCase, following the naming convention for types.

If you try to use visual() on the plotting functions, you’ll get an error. Instead, you need to use visual() on the plotting types.

For instance, this will error:

visual(barplot)

The correct way is:

visual(BarPlot)

Just remember that you need to convert the plotting functions to plotting types when you pass them to the visual() function.
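The distinction can be checked directly in the REPL. The following is a minimal sketch, assuming CairoMakie.jl is installed; the variable names is_function and is_type are only illustrative:

```julia
using CairoMakie

# Lowercase `barplot` is the plotting function from Makie.jl ...
is_function = barplot isa Function

# ... while TitleCase `BarPlot` is the plotting type that visual() expects.
is_type = BarPlot isa Type

println("barplot is a function: ", is_function)
println("BarPlot is a type: ", is_type)
```

Both checks should report true, which is a quick way to confirm which of the two names you are holding before passing it to visual().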

Note

This is the online documentation for the barplot() function from Makie.jl:


barplot(x, y; kwargs...)

Plots a barplot; y defines the height. x and y should be 1 dimensional. Bar width is determined by the attribute width, shrunk by gap in the following way: width -> width * (1 - gap).

Attributes

Available attributes and their defaults for MakieCore.Combined{Makie.barplot} are:

  bar_labels             "nothing"
  color                  RGBA{Float32}(0.0f0,0.0f0,0.0f0,0.6f0)
  color_over_background  MakieCore.Automatic()
  color_over_bar         MakieCore.Automatic()
  colormap               :viridis
  colorrange             MakieCore.Automatic()
  cycle                  [:color => :patchcolor]
  direction              :y
  dodge                  MakieCore.Automatic()
  dodge_gap              0.03
  fillto                 MakieCore.Automatic()
  flip_labels_at         Inf
  gap                    0.2
  highclip               MakieCore.Automatic()
  inspectable            true
  label_color            :black
  label_font             :regular
  label_formatter        Makie.bar_label_formatter
  label_offset           5
  label_rotation         0.0
  label_size             20
  lowclip                MakieCore.Automatic()
  marker                 GeometryBasics.HyperRectangle
  n_dodge                MakieCore.Automatic()
  nan_color              :transparent
  offset                 0.0
  stack                  MakieCore.Automatic()
  strokecolor            :black
  strokewidth            0
  transparency           false
  visible                true
  width                  MakieCore.Automatic()

2.2 help_attributes() and Docstrings

If you do not want to browse Makie.jl’s documentation, a handy alternative is the help_attributes() function, which is available from any Makie.jl backend.

Let us show how it works, but first let’s load the default backend that we are using in these tutorials: CairoMakie.jl.

using CairoMakie

Here is an example with the barplot() plotting function:

help_attributes(barplot)
Available attributes for `MakieCore.Combined{Makie.barplot}` are: 

bar_labels color color_over_background color_over_bar colormap colorrange cycle direction dodge dodge_gap fillto flip_labels_at gap highclip inspectable label_color label_font label_formatter label_offset label_rotation label_size lowclip marker n_dodge nan_color offset stack strokecolor strokewidth transparency visible width

We can see that BarPlot, when used inside visual(), has a lot of keyword arguments for us to customize our bar plots.

2.2.1 Docstrings from help or ?

We can also check the docstrings from a specific plotting function by calling either the help() function on it or by using the help mode of the Julia REPL:

julia> ?

help?> barplot

Note

Also don’t forget to check AoG.jl’s Documentation. The tutorial and gallery are nice sections that showcase several use cases and possible customizations.

3 🎨 visual() function

The visual() function from AoG.jl is how we assign plotting objects to the mapping()s of our data().

The most important argument to visual() is the first positional argument: the plotting type. The keyword arguments that follow are the same ones accepted by the analogous Makie.jl plotting function.

For example, the barplot() plotting function from Makie.jl supports the width keyword argument. That translates to the following visual() call in AoG.jl:

visual(BarPlot; width = ...)
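To make the correspondence concrete, here is a small side-by-side sketch. It assumes AlgebraOfGraphics.jl and CairoMakie.jl are installed, and it uses a hypothetical toy table (toy) rather than the tutorial’s dataset:

```julia
using AlgebraOfGraphics, CairoMakie

# Hypothetical toy data, used only for illustration.
# A NamedTuple of vectors is a valid table for data().
toy = (x = [1, 2, 3], y = [1.0, 2.5, 1.8])

# Plain Makie.jl: the keyword argument goes to the plotting function barplot()
fig_makie = barplot(toy.x, toy.y; width = 0.5)

# AoG.jl: the same keyword argument goes to visual(), next to the plotting type
fig_aog = data(toy) * mapping(:x, :y) * visual(BarPlot; width = 0.5) |> draw
```

The two figures should show bars of the same width; only the place where width is supplied differs.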

Let’s show some of the available plotting types (geometries) to the visual() function. But first, we begin by loading AoG.jl, data wrangling libraries and the DataFrame we’ve used previously:

using PharmaDatasets
using DataFramesMeta
using AlgebraOfGraphics

df = dataset("demographics_1")
first(df, 5)
5×6 DataFrame
Row ID AGE WEIGHT SCR ISMALE eGFR
Int64 Float64 Float64 Float64 Int64 Float64
1 1 34.823 38.212 1.1129 0 42.635
2 2 32.765 74.838 0.8846 1 126.0
3 3 35.974 37.303 1.1004 1 48.981
4 4 38.206 32.969 1.1972 1 38.934
5 5 33.559 47.139 1.5924 0 37.198

We will also transform some columns to CategoricalArrays:

Note

Don’t forget to check our Data Wrangling in Julia tutorial, Handling Factors and Categorical Data with CategoricalArrays.jl.

using CategoricalArrays
@transform! df :SEX = categorical(:ISMALE);
@transform! df :SEX = recode(:SEX, 0 => "female", 1 => "male");
@transform! df :WEIGHT_cat = cut(:WEIGHT, 2; labels = ["light", "heavy"])

3.1 BarPlot

Let’s begin with the bar plot. Here the plotting type is BarPlot and the plotting function is barplot. So, we just call visual() and pass BarPlot as the first argument followed by any desired keyword arguments supported by the plotting function barplot().

Here is an example with our dataset df. Notice that we need to first group the data with the @by macro and then apply the mean() function from Julia’s standard library Statistics module:

Note

There is an easier way to automatically perform grouping and summarizing in AoG.jl with statistical transformation functions. We will cover this in Plotting Statistical Visualizations with AlgebraOfGraphics.jl. Make sure to check it out.

using Statistics
data(@by df :SEX :AGE_MEAN = mean(:AGE)) * mapping(:SEX, :AGE_MEAN) * visual(BarPlot) |>
draw
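If it is unclear what the @by step hands over to data(), the following sketch runs the same grouping on a hypothetical four-row table (toy) and stores the summary in age_by_sex. Both names are made up for this illustration, and the sketch assumes DataFramesMeta.jl is installed:

```julia
using DataFramesMeta  # re-exports DataFrames.jl
using Statistics

# Hypothetical toy data with the same column names as the tutorial's df
toy = DataFrame(
    SEX = ["female", "male", "female", "male"],
    AGE = [30.0, 40.0, 34.0, 44.0],
)

# One row per level of :SEX, with the mean of :AGE within each group
age_by_sex = @by toy :SEX :AGE_MEAN = mean(:AGE)

# Look up the group means by label
means = Dict(age_by_sex.SEX .=> age_by_sex.AGE_MEAN)
println(means["female"])  # 32.0
println(means["male"])    # 42.0
```

The result has one row per group, which is exactly the shape BarPlot needs: one x value (the group) and one y value (the bar height).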

We can customize the specified plotting object in visual() by adding supported keyword arguments.

If we would like to make our bars blue and a little bit less wide we can use the color and width arguments:

data(@by df :SEX :AGE_MEAN = mean(:AGE)) *
mapping(:SEX, :AGE_MEAN) *
visual(BarPlot; color = :blue, width = 0.5) |> draw

Here is a more complex example using color and dodge for the column :WEIGHT_cat inside mapping():

data(@by df [:WEIGHT_cat, :SEX] :AGE_MEAN = mean(:AGE)) *
mapping(:SEX, :AGE_MEAN; color = :WEIGHT_cat, dodge = :WEIGHT_cat) *
visual(BarPlot) |> draw

Tip

Note that the color mapping will override the color keyword argument inside a visual() call. For custom colors, which we will cover in Customization of AlgebraOfGraphics.jl Plots, it is better to use the palette argument inside the draw[!]() function.

3.2 Lines

Lines creates a line plot with the specified data() and mapping()s.

It is analogous to ggplot2’s geom_line().

For the line plot, we will use some concentration-time pharmacokinetic data after oral administration. This kind of plot is known as a spaghetti plot.

Tip

Line plots implicitly indicate a dependence of each observation on the previous ones. This dependence makes line plots perfect for time series data and other time-dependent visualizations. But for data that do not have a time dependency, or any other x-axis dependency, line plots might convey an intuition that is not the objective of the visualization.

pk = dataset("pumas_tutorials/po_sd_1")
first(pk, 5)
5×9 DataFrame
Row id time cp dv amt evid cmt rate dosegrp
Int64 Float64 Float64? Float64? Float64? Int64 Int64 Float64 Int64
1 1 0.0 missing missing 10.0 1 1 0.0 10
2 1 0.25 20.2592 22.6353 missing 0 2 0.0 10
3 1 0.5 36.8068 16.5712 missing 0 2 0.0 10
4 1 0.75 50.2838 60.8928 missing 0 2 0.0 10
5 1 1.0 61.2211 46.8858 missing 0 2 0.0 10

Here is a simple plot of the PK data using the positional x-axis and y-axis arguments and the color keyword argument from mapping().

Note

We are removing the rows with missing values in the cp column of the pk dataset and keeping only the first 10 subjects (ids) so that the plot’s legend does not overflow.

dropmissing!(pk, :cp);
pk_ids = @rsubset(pk, :id <= 10);

Note

We are using nonnumeric() inside mapping() to tell AoG.jl that the column :id, despite being an integer column, should be treated as discrete/categorical, i.e. non-numeric.

This will be covered in Customization of AlgebraOfGraphics.jl Plots.

data(pk_ids) *
mapping(:time, :cp; color = :id => nonnumeric) *
visual(Lines; alpha = 0.5) |> draw

Tip

To draw this visualization without the legend, you can call the mutating function draw!(). Whereas draw() automatically adds colorbars and legends, draw!() does not. Colorbar and legend, should they be necessary, can be added separately to the visualization with the colorbar!() and legend!() helper functions.

We’ll cover customizations in Customization of AlgebraOfGraphics.jl Plots. Don’t forget to check it out.

Here’s what the code would look like without the legend:

fig = Figure()
plt = data(pk) * mapping(:time, :cp; color = :id => nonnumeric) * visual(Lines)
draw!(fig, plt)
fig
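For completeness, here is a sketch of adding the legend back by hand after draw!(). It assumes AlgebraOfGraphics.jl and CairoMakie.jl are installed, uses a hypothetical toy table (toy) instead of pk, and places the legend in a second grid column purely as an example layout:

```julia
using AlgebraOfGraphics, CairoMakie

# Hypothetical toy data: two subjects, three time points each
toy = (
    time = [0, 1, 2, 0, 1, 2],
    conc = [0.0, 5.0, 3.0, 0.0, 8.0, 4.0],
    id = [1, 1, 1, 2, 2, 2],
)

plt = data(toy) * mapping(:time, :conc; color = :id => nonnumeric) * visual(Lines)

fig = Figure()
grid = draw!(fig[1, 1], plt)  # draws the plot only, without a legend
legend!(fig[1, 2], grid)      # adds the legend at a position of our choosing
fig
```

Keeping the value returned by draw!() is what makes the later legend!() call possible, since the legend is built from the plot that was just drawn.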

3.3 Errorbars

Errorbars creates vertical interval lines, commonly used to represent data variability or uncertainty.

It is analogous to ggplot2’s geom_errorbar().

Let’s use the same example as before, but this time we are interested in finding the mean concentration-time profile for each dose level. We will also use the standard deviation as our measure of variability.

Note

We are using the @by macro again to group the data by dosegrp and time. Also, we will use the mean() function to calculate the mean concentration and the std() function to determine the standard deviation. These functions are available in the Statistics module in Julia’s standard library.

Don’t forget to check our Data Wrangling in Julia tutorial Manipulating Tables with DataFramesMeta.jl for a more in-depth explanation of DataFramesMeta.jl’s macros.

pk_error = @by pk [:dosegrp, :time] begin
    :Cmean = mean(:cp)
    :Cstd = std(:cp)
end
first(pk_error, 5)
5×4 DataFrame
Row dosegrp time Cmean Cstd
Int64 Float64 Float64 Float64
1 10 0.25 31.3022 13.3897
2 10 0.5 54.484 21.9751
3 10 0.75 71.6505 27.4859
4 10 1.0 84.3257 30.9806
5 10 2.0 108.478 34.991

Now we can plot the error bars. In this case, the mapping function will take three positional arguments: x position (time), y position (mean concentration), and the error bar length (standard deviation):

data(pk_error) *
mapping(:time, :Cmean, :Cstd, color = :dosegrp => nonnumeric) *
visual(Errorbars) |> draw

Tip

Notice that Errorbars only generates the vertical interval lines. It is a common practice to show error bars together with other data visualization methods, such as bar and line plots. You can achieve this by adding two layers with the + operator, which would look like this for a line plot:

data(pk_error) *
mapping(:time, :Cmean, :Cstd, color = :dosegrp => nonnumeric) *
(visual(Errorbars) + visual(Lines)) |> draw

You can learn more about combining layers with the + operator by checking our tutorial on Grammar of Graphics with AlgebraOfGraphics.jl.
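A variation on this pattern gives each layer only the positional arguments it needs. The sketch below assumes AlgebraOfGraphics.jl and CairoMakie.jl are installed and uses a hypothetical toy table (toy) with made-up column names. It relies on the fact that multiplying two mapping()s appends their positional arguments, so only the Errorbars layer receives the third one:

```julia
using AlgebraOfGraphics, CairoMakie

# Hypothetical toy summary: mean concentration and standard deviation over time
toy = (
    time = [1.0, 2.0, 3.0, 4.0],
    cmean = [30.0, 55.0, 70.0, 85.0],
    cstd = [10.0, 15.0, 20.0, 25.0],
)

# Shared x and y positions for both layers
xy = data(toy) * mapping(:time, :cmean)

# Lines uses (x, y); Errorbars additionally gets the bar length as a third positional
plt = xy * (visual(Lines) + mapping(:cstd) * visual(Errorbars))
fig = draw(plt)
```

This keeps the line layer strictly two-dimensional while the error bar layer still knows how long each bar should be.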

3.4 Scatter

Scatter is the plotting type for scatter plots and is analogous to ggplot2’s geom_point():

data(pk_ids) * mapping(:time, :cp; color = :id => nonnumeric) * visual(Scatter) |> draw

Scatter has some interesting keyword arguments, which you can list by typing help_attributes(scatter) or ?scatter in a Julia REPL. For example, you can choose a marker type and a markersize:

data(pk_ids) *
mapping(:time, :cp; color = :id => nonnumeric) *
visual(Scatter; marker = '+', markersize = 25, alpha = 0.5) |> draw

3.5 ScatterLines

ScatterLines is the fusion of Scatter and Lines, so the keyword arguments of both are available.

It is similar to combining the following geometries in ggplot2: geom_line() + geom_point().

Let’s plot our previous Lines example using ScatterLines:

data(pk_ids) *
mapping(:time, :cp; color = :id => nonnumeric) *
visual(ScatterLines; alpha = 0.5) |> draw
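To illustrate the ggplot2 analogy, the sketch below draws the same hypothetical toy data twice: once with two separate layers combined by +, and once with the single ScatterLines type. It assumes AlgebraOfGraphics.jl and CairoMakie.jl are installed; toy and layer are illustrative names only:

```julia
using AlgebraOfGraphics, CairoMakie

# Hypothetical toy data: two subjects, three time points each
toy = (
    time = [0, 1, 2, 0, 1, 2],
    conc = [0.0, 5.0, 3.0, 0.0, 8.0, 4.0],
    id = [1, 1, 1, 2, 2, 2],
)

layer = data(toy) * mapping(:time, :conc; color = :id => nonnumeric)

# Two layers, in the spirit of geom_line() + geom_point()
fig_layers = draw(layer * (visual(Lines) + visual(Scatter)))

# One plotting type that draws both lines and markers
fig_single = draw(layer * visual(ScatterLines))
```

The layered form lets you give each layer its own keyword arguments, whereas ScatterLines is the shorter spelling when both should share the same settings.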

3.6 BoxPlot

Box plots are the statistician’s favorite plots. Here is a simple box plot using BoxPlot inside visual():

data(df) * mapping(:SEX, :AGE) * visual(BoxPlot) |> draw

You can use some keyword arguments for BoxPlot, such as:

  • show_notch: whether or not to draw a notch around the median.
  • range: the multiple of the inter-quartile range (IQR) that sets the maximum whisker length; points beyond the whiskers count as outliers. Default 1.5.
  • whiskerwidth: the width of the horizontal caps at the ends of the whiskers, relative to the box width.
  • show_outliers: whether or not to show outliers as points, default true.

data(df) *
mapping(:SEX, :AGE) *
visual(BoxPlot; show_notch = true, range = 1, whiskerwidth = 0.25) |> draw
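To get a feeling for what changing range does, here is a small numeric sketch that uses only the Statistics standard library. It applies the conventional rule of flagging points that lie more than range × IQR beyond the quartiles. The helper flagged_outliers and the ages vector are hypothetical, and the sketch illustrates the rule rather than reproducing Makie.jl’s internal code:

```julia
using Statistics

# Hypothetical ages, used only for illustration
ages = [21.0, 25.0, 27.0, 30.0, 31.0, 33.0, 36.0, 46.0, 58.0]

# Flag values lying more than `range_multiplier` IQRs outside the quartiles
function flagged_outliers(values, range_multiplier)
    q1, q3 = quantile(values, [0.25, 0.75])
    iqr = q3 - q1
    lower = q1 - range_multiplier * iqr
    upper = q3 + range_multiplier * iqr
    return filter(v -> v < lower || v > upper, values)
end

println(flagged_outliers(ages, 1.5))  # [58.0]
println(flagged_outliers(ages, 1.0))  # [46.0, 58.0]
```

Lowering range from 1.5 to 1 shortens the whiskers, so more points end up beyond them and are shown as outliers.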

3.7 Violin

Violin plots are also popular and a good alternative to box plots. Instead of being based on the median, quartiles and IQR, violin plots display an estimate of the probability density of the underlying values (using a kernel density estimator).

Here is the same box plot example, but now using a Violin inside visual(). It shows much more visual information than the box plot:

data(df) * mapping(:SEX, :AGE) * visual(Violin) |> draw

As with BoxPlot, Violin has some interesting keyword arguments. The most important is show_median, which tells AoG.jl whether or not to show the median inside the violins:

data(df) * mapping(:SEX, :AGE) * visual(Violin; show_median = true) |> draw

Violin also accepts side inside mapping(), which splits each violin into two sides: left and right.

If you pair side with color you can convey more information in your violin plots:

data(df) *
mapping(:SEX, :AGE; side = :WEIGHT_cat, color = :WEIGHT_cat) *
visual(Violin; show_median = true) |> draw

Note

In Plotting Statistical Visualizations with AlgebraOfGraphics.jl, we will cover more geometries and plotting types, which are frequently paired with statistical visualizations.