Introduction to AlgebraOfGraphics.jl

Authors

Jose Storopoli

Juan Oneto

AlgebraOfGraphics.jl is a powerful package for plotting and data visualization. Its expressive syntax is based on principles similar to the grammar of graphics of the R package ggplot2.

In this set of tutorials we will learn how to do complex and custom plots easily with AlgebraOfGraphics.jl. In this tutorial, we will introduce the fundamentals of AlgebraOfGraphics.jl and give an overview of the Makie.jl plotting ecosystem, which is the base on which AlgebraOfGraphics.jl is built.

Tip

AlgebraOfGraphics.jl is a mouthful, so we will use the shorter alias AoG.jl throughout.

1 📇 Comparison ggplot2 vs AoG.jl

| Action | ggplot2 | AoG.jl |
|---|---|---|
| Input data | ggplot(df) | data(df) |
| Map aesthetics | aes(...) | mapping(...) |
| Add geometries | geom_*(...) | visual(...) |
| Combine layers | + | * |
| Facetting | facet_[wrap\|grid](~ column) | mapping(...; [row\|col\|layout]=:column) |
| Customize scales | scale_*_manual() | renamer(...) |
| Themes | theme_*(...) | set_theme!(theme_*()); draw(plt) |
| Customize axes labels | [x\|y]lab("...") | draw(plt, axis=(; [x\|y]label="...")) |
| Customize color | scale_[fill\|color]_*(...) | draw(plt, palettes=(; color=...)) or visual(..., colormap=...) |
| Save plot | ggsave("file.[png\|svg]") | save("file.[png\|svg]", draw(plt)) |
| Frequency | geom_bar() or stat_count() | frequency() |
| Histogram | geom_histogram or stat_bin() | histogram() |
| Density | geom_density or stat_density() | density() |
| Expectation/Mean | stat_summary(fun = "mean") | expectation() |
| Smooth trend | stat_smooth or geom_smooth() | (visual(...) + smooth()) |
| Linear trend | stat_smooth(method = "lm") or geom_smooth(method = "lm") | (visual(...) + linear()) |
| Log scale | scale_[x\|y]_log10() | draw(plt; axis=(; [x\|y]scale=log10)) |

2 💾 Interface with Data: data() function

The first step with AoG.jl is to specify your data source. In Julia, there is a unifying data API provided by the Tables.jl package. A lot of packages and types in the Julia ecosystem are compatible with the Tables.jl data API. For instance, the DataFrame that we have been using so far is compatible with the Tables.jl data API.

AoG.jl can use any Tables.jl compatible type as input and you specify them with the data() function.

First, let’s import the PharmaDatasets.jl and DataFramesMeta.jl packages with using statements and load our data with the dataset function:

using PharmaDatasets
using DataFramesMeta
df = dataset("demographics_1")
first(df, 5)
5×6 DataFrame
 Row │ ID     AGE      WEIGHT   SCR      ISMALE  eGFR
     │ Int64  Float64  Float64  Float64  Int64   Float64
─────┼───────────────────────────────────────────────────
   1 │     1   34.823   38.212   1.1129       0   42.635
   2 │     2   32.765   74.838   0.8846       1  126.0
   3 │     3   35.974   37.303   1.1004       1   48.981
   4 │     4   38.206   32.969   1.1972       1   38.934
   5 │     5   33.559   47.139   1.5924       0   37.198

Now, we import AoG.jl and CairoMakie.jl as our Makie backend.

using AlgebraOfGraphics # big name, AoG
using CairoMakie
Note

CairoMakie.jl is a Makie.jl backend built on the Cairo open source graphics library. It is the default backend that we will use in our tutorials on data visualization. If you want to know more about the different Makie.jl backends that are available, their advantages, and which one you should use; check the end of this tutorial.

Now, if we call data() on our DataFrame named df we will have back an AoG.jl object of type Layer.

This object is where AoG.jl stores all the specifications of our intended visualization.

Of course, now it will only hold the visualization’s data and nothing more:

data(df)

This is similar to the following in R:

df %>% ggplot()
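Because data() accepts any Tables.jl-compatible source, a DataFrame is not strictly required. The following is a minimal sketch, assuming a small made-up table stored as a NamedTuple of vectors (the column names and values are illustrative and are not taken from the demographics_1 dataset):

```julia
using AlgebraOfGraphics
using CairoMakie

# A NamedTuple of equal-length vectors is a Tables.jl column table,
# so it can be handed to data() directly. Values are made up.
tbl = (; age = [34.8, 32.8, 36.0, 38.2], weight = [38.2, 74.8, 37.3, 33.0])

# The layer only holds the data source at this point.
layer = data(tbl)

# Adding a mapping is enough to draw the default scatter plot.
fg = draw(layer * mapping(:age, :weight))
```

The same layer could be built from a DataFrame, a CSV.File, or any other table type that implements the Tables.jl interface.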

3 🗺️ Specify Mappings: mapping function

The second step is to specify our mappings, also known as aesthetics or aes() from ggplot2.

This is done with AoG.jl’s mapping() function. It accepts 3 positional arguments and several keyword arguments which we will cover briefly. Let’s first focus on the 3 positional arguments. They represent the x, y and z axes of the plot:

  1. First Positional Argument: x-axis
  2. Second Positional Argument: y-axis
  3. Third Positional Argument: z-axis
Note

If you want to know more about functions, please check Tutorial Data Wrangling in Julia - Functions.

So for example, if we specify first the column :AGE followed by the column :WEIGHT, we would be asking AoG.jl to map :AGE to the x-axis and :WEIGHT to the y-axis.

Note that if we do not specify any “geometry”, AoG.jl will, by default, draw a scatter plot.

data(df) * mapping(:AGE, :WEIGHT) |> draw

Tip

I am using the draw() function but we have not yet covered it. Don’t worry for now. Just think that draw() renders our plot specifications into a backend (in our case CairoMakie.jl).

Notice that we are using the * multiplication operator. This is the primary operator to combine partially defined layers into a full visualization.

The * operator is associative, so layers can be grouped freely, and in this case the order of the operands does not matter either. If we specify mapping() first and then multiply it by data(), we get back the same plot:

mapping(:AGE, :WEIGHT) * data(df) |> draw
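Partial layers are ordinary Julia values, so they can be stored in variables and combined later. Here is a short sketch of that idea, assuming a small made-up table and the hypothetical variable names source, xy and geom:

```julia
using AlgebraOfGraphics
using CairoMakie

# Made-up data, only for illustration.
tbl = (; age = [34.8, 32.8, 36.0, 38.2], weight = [38.2, 74.8, 37.3, 33.0])

source = data(tbl)                          # where the data comes from
xy     = mapping(:age, :weight)             # which columns go on which axis
geom   = visual(Scatter; markersize = 12)   # how to draw them

# Both products describe the same visualization.
plt_a = source * xy * geom
plt_b = xy * geom * source

fg_a = draw(plt_a)
fg_b = draw(plt_b)
```

Keeping the pieces separate makes it easy to reuse the same data and mapping with a different geometry.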

Note

We will cover AoG.jl’s grammar in Grammar of Graphics with AlgebraOfGraphics.jl. Be sure to check it out.

3.1 mapping keyword arguments

Besides the 3 positional arguments, mapping() has several keyword arguments:

  1. color
  2. marker
  3. dodge
  4. stack
  5. col
  6. row
  7. layout

We’ll cover all of them below:

3.1.1 color

The first mapping() keyword argument we will cover is color which maps a column to a color to be displayed in the visualization.

For example, if we pass the column :ISMALE to the color argument, we get the same scatter plot as before, but now color is mapped to the :ISMALE column:

data(df) * mapping(:AGE, :WEIGHT; color = :ISMALE) |> draw

Since :ISMALE is an Int64 column, AoG.jl will, by default, map it to a continuous color gradient. That is obviously not what we intend to display.

Let’s add a :SEX column as a CategoricalArray built from the :ISMALE column and relabel its levels with recode():

using CategoricalArrays
@chain df begin
    @transform! :SEX = categorical(:ISMALE)
    @transform! :SEX = recode(:SEX, 0 => "female", 1 => "male")
end
first(df, 5)
5×7 DataFrame
 Row │ ID     AGE      WEIGHT   SCR      ISMALE  eGFR     SEX
     │ Int64  Float64  Float64  Float64  Int64   Float64  Cat…
─────┼──────────────────────────────────────────────────────────
   1 │     1   34.823   38.212   1.1129       0   42.635  female
   2 │     2   32.765   74.838   0.8846       1  126.0    male
   3 │     3   35.974   37.303   1.1004       1   48.981  male
   4 │     4   38.206   32.969   1.1972       1   38.934  male
   5 │     5   33.559   47.139   1.5924       0   37.198  female

data(df) * mapping(:AGE, :WEIGHT; color = :SEX) |> draw

Note

To make AoG.jl treat a numeric column as categorical/factor/discrete without creating a new column, we can wrap the mapped column with the nonnumeric() helper, for example color = :ISMALE => nonnumeric. The renamer() function does the same and also lets us relabel the values.

We will cover more “helper” functions and advanced plot tweaks and customizations in Customization of AlgebraOfGraphics.jl Plots. Be sure to check it out.
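As a rough illustration of the two helpers mentioned in this note, the sketch below maps an integer 0/1 column to discrete colors, first with nonnumeric and then with renamer. The table is made up, and the legend title "Sex" is just a label chosen for the example:

```julia
using AlgebraOfGraphics
using CairoMakie

# Made-up data with an integer indicator column.
tbl = (;
    age    = [34.8, 32.8, 36.0, 38.2],
    weight = [38.2, 74.8, 37.3, 33.0],
    ismale = [0, 1, 1, 0],
)

# nonnumeric: treat the integer column as discrete instead of continuous.
plt_discrete = data(tbl) * mapping(:age, :weight; color = :ismale => nonnumeric)

# renamer: treat it as discrete and relabel each value;
# the trailing "Sex" sets the legend title.
relabel = renamer([0 => "female", 1 => "male"])
plt_labeled = data(tbl) * mapping(:age, :weight; color = :ismale => relabel => "Sex")

fg_discrete = draw(plt_discrete)
fg_labeled = draw(plt_labeled)
```

Both variants leave the underlying table untouched, which is handy when the numeric coding is still needed elsewhere.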

3.1.2 marker

The marker keyword argument from mapping() will map the desired values to the marker geometry. It only accepts categorical/factor/discrete values:

data(df) * mapping(:AGE, :WEIGHT; marker = :SEX) |> draw

Caution

If you pass a column that is numeric, i.e. a column that AoG.jl will parse as “continuous” or “non-discrete”, you’ll get a nasty error.

# Error if you pass non-discrete values to marker argument
data(df) * mapping(:AGE, :WEIGHT; marker = :eGFR) |> draw

3.1.3 dodge

To show the dodge keyword argument, we will do a statistical visualization with the frequency() function. dodge can be used with the following geometries:

  • BoxPlot
  • BarPlot
  • Violin

Let’s use the :WEIGHT column to create a CategoricalArray with 3 levels using the cut() function and assign it to the :WEIGHT_CAT column:

@transform! df :WEIGHT_CAT = cut(:WEIGHT, 3; labels = ["light", "medium", "heavy"])
first(df, 5)
5×8 DataFrame
 Row │ ID     AGE      WEIGHT   SCR      ISMALE  eGFR     SEX     WEIGHT_CAT
     │ Int64  Float64  Float64  Float64  Int64   Float64  Cat…    Cat…
─────┼───────────────────────────────────────────────────────────────────────
   1 │     1   34.823   38.212   1.1129       0   42.635  female  light
   2 │     2   32.765   74.838   0.8846       1  126.0    male    heavy
   3 │     3   35.974   37.303   1.1004       1   48.981  male    light
   4 │     4   38.206   32.969   1.1972       1   38.934  male    light
   5 │     5   33.559   47.139   1.5924       0   37.198  female  medium

To show how dodge works let’s first create a frequency bar plot only with the color mapping as :SEX:

data(df) * mapping(:WEIGHT_CAT; color = :SEX) * frequency() |> draw

We get overlapping opaque bars. To arrange them side-by-side instead, we can map dodge to the column :SEX:

data(df) * mapping(:WEIGHT_CAT; color = :SEX, dodge = :SEX) * frequency() |> draw

Note

We will cover all geometries in Plotting Different Geometries with AlgebraOfGraphics.jl.

In-depth statistical visualizations will be covered in Plotting Statistical Visualizations with AlgebraOfGraphics.jl.

Be sure to check both of them out.

3.1.4 stack

stack mapping is only available for bar plots. So let’s revisit the last example but instead of “dodging” the bars, we will stack them:

data(df) * mapping(:WEIGHT_CAT; color = :SEX, stack = :SEX) * frequency() |> draw

3.1.5 col

Have you ever done “facetting” in ggplot2? If you have, the next 3 keyword arguments, col, row, and layout, represent 3 different ways to do facetting on a plot.

First let’s facet our visualization using different columns. This is done with the col keyword argument inside mapping():

data(df) * mapping(:AGE, :eGFR; color = :SEX, col = :WEIGHT_CAT) |> draw

3.1.6 row

We can do the same facet as before but now using different rows with row:

data(df) * mapping(:AGE, :eGFR; color = :SEX, row = :WEIGHT_CAT) |> draw

3.1.7 layout

layout tells AoG.jl to facet with an automatic setting that best uses the available space. It is analogous to ggplot2’s facet_wrap() function. Here, we can have the previous plot, but now with a facetting that is optimized for a neutral aspect ratio:

data(df) * mapping(:AGE, :eGFR; color = :SEX, layout = :WEIGHT_CAT) |> draw

Tip

Notice that the axes are linked while facetting either with row, col or layout. We’ll explore ways to customize the axes behavior in Advanced Layouts with AlgebraOfGraphics.jl.

Be sure to check it out.
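The examples above use one facetting keyword at a time. As a sketch under the assumption that we want a full grid, row and col can also be given together in the same mapping(), which is roughly what facet_grid() does in ggplot2. The table below is made up for illustration:

```julia
using AlgebraOfGraphics
using CairoMakie

# Made-up data with two categorical columns to facet on.
tbl = (;
    x     = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    y     = [2.1, 3.9, 6.2, 7.8, 1.0, 2.2, 2.9, 4.1],
    sex   = ["female", "male", "female", "male", "female", "male", "female", "male"],
    group = ["light", "light", "heavy", "heavy", "light", "light", "heavy", "heavy"],
)

# One panel per (sex, group) combination: rows by sex, columns by group.
plt = data(tbl) * mapping(:x, :y; row = :sex, col = :group)
fg = draw(plt)
```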

4 🖼️ draw[!]() function

The draw[!]() function in AoG.jl is responsible for passing all the plot specifications and customizations to the desired Makie.jl backend. In this notebook, the chosen backend was CairoMakie.jl.

Also, note that so far we have always called draw() by “piping” AoG.jl layers into it with Julia’s |> pipe operator. This is fine if you do not need to pass any arguments to the draw() function.

The draw() function takes 3 keyword arguments, axis, figure and palettes, which customize the axis, the figure and the palettes, respectively.

For example, here is an AoG.jl plot with custom axis, figure and palette specifications inside the draw() function:

plt = data(df) * mapping(:AGE, :WEIGHT; color = :SEX);
draw(
    plt;
    axis = (;
        title = "My Fancy Title",
        aspect = 4 / 3,
        ylabel = "KG",
        xticklabelrotation = π / 8,
    ),
    figure = (;
        resolution = (600, 600),
        figure_padding = 6,
        backgroundcolor = :pink,
        fontsize = 16,
    ),
    palettes = (; color = [:purple, :green]),
)

Note

There is also a draw!() function that alters an AoG.jl plot in-place. It is covered in Advanced Layouts with AlgebraOfGraphics.jl.

Don’t worry about the arguments inside draw(), we will be covering those extensively in Customization of AlgebraOfGraphics.jl Plots.
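Two rows of the comparison table, custom axis labels and a log scale, can be sketched with the axis keyword alone. The example below assumes a small made-up table; the label texts are placeholders:

```julia
using AlgebraOfGraphics
using CairoMakie

# Made-up, strictly positive values so that a log10 scale is valid.
tbl = (; age = [34.8, 32.8, 36.0, 38.2], egfr = [42.6, 126.0, 49.0, 38.9])

plt = data(tbl) * mapping(:age, :egfr)

# axis takes a NamedTuple of Makie axis attributes.
fg = draw(
    plt;
    axis = (; xlabel = "Age", ylabel = "eGFR (log scale)", yscale = log10),
)
```

Calling draw() directly, instead of piping into it, is what makes it possible to pass these keyword arguments.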

5 💾 Saving Plots

In order to save plots, AoG.jl defines a new method for FileIO.jl’s save() function. The first argument is the filename with the desired extension/format, e.g. my_plot.png. The second argument is an AoG.jl plot (the one returned from draw()).

The resolution of a Makie Figure is in principle unitless until it is exported to a file. At that point, the output depends on additional backend settings.

When you save a bitmap (.png), the resolution is converted to pixels using the px_per_unit setting of the CairoMakie backend. This is set to 1 by default, so a figure with resolution (800, 600) will be 800px wide and 600px high. If you set it to 2, you double the resolution without having to adjust font sizes, line widths, etc. This is similar to changing the dpi in other plotting packages, although it is technically different because Cairo does not actually adjust the dpi of the output image and without the dpi metadata, an image does not have a well-defined physical size. Therefore the “per inch” part of dpi settings is usually misleading.

Vector graphics, however, have a physical size by definition because the pt unit that they are specified in can be directly converted to inch or cm. The value pt_per_unit governs how the figure size is converted to pt when saving vector graphics. Its default is 0.75 (this causes png and svg files to be displayed with the same size in most browsers when saved with default settings).

For example, to save our plt image from above as a my_image.png file with 3 times the resolution of the underlying figure, we would call the following save() function:

save("my_image.png", draw(plt); px_per_unit = 3)
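To make the arithmetic behind px_per_unit and pt_per_unit concrete, here is a small sketch in plain Julia. The helper names pixel_size and inch_size are hypothetical and are not part of Makie.jl or AoG.jl; they only reproduce the conversions described above, using the fact that 1 pt is 1/72 inch:

```julia
# Hypothetical helpers, not part of Makie.jl: they only do the unit arithmetic.

# Bitmap output: figure units are multiplied by px_per_unit to get pixels.
pixel_size(resolution; px_per_unit = 1) = round.(Int, resolution .* px_per_unit)

# Vector output: figure units are multiplied by pt_per_unit to get points,
# and 72 pt make one inch.
inch_size(resolution; pt_per_unit = 0.75) = resolution .* pt_per_unit ./ 72

pixel_size((600, 600); px_per_unit = 3)   # → (1800, 1800)
inch_size((720, 360))                     # → (7.5, 3.75)
```

So the 600 by 600 figure from the draw() example would be written as an 1800 by 1800 pixel PNG by the save() call above.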

5.1 Supported Extensions

Different Makie.jl backends support different filetypes and extensions. Here is a complete list:

  • CairoMakie.jl: .svg, .pdf and .png
  • GLMakie.jl: .png
  • WGLMakie.jl: .png
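With CairoMakie.jl loaded, the file extension alone selects the output format. The following sketch writes the same made-up plot to the three formats listed for CairoMakie.jl; the temporary directory and the file name my_plot are assumptions for the example:

```julia
using AlgebraOfGraphics
using CairoMakie

# Made-up data, only for illustration.
tbl = (; x = [1.0, 2.0, 3.0, 4.0], y = [2.0, 4.1, 5.9, 8.2])
fg = draw(data(tbl) * mapping(:x, :y))

# Write into a temporary directory so the example leaves no files behind.
outdir = mktempdir()
paths = [joinpath(outdir, "my_plot.$ext") for ext in ("png", "svg", "pdf")]

for path in paths
    save(path, fg)   # the extension picks the format
end
```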

6 🌎 Overview of the Plotting Makie.jl Ecosystem

Under the hood, AoG.jl uses a pure-Julia visualization backend named Makie.jl. We believe that Makie.jl is the present and future of plotting and visualizations in Julia.

Makie.jl itself uses different visualization backends under the hood. These backends are the barebones interfaces for rendering graphics. Currently (January 2022), Makie.jl supports 3 interfaces:

  1. OpenGL
  2. Cairo
  3. WebGL

Let’s talk about each one of them.

6.1 OpenGL with GLMakie.jl

The first interface is OpenGL, which stands for Open Graphics Library and was created in 1991. OpenGL can use the GPU and is managed by a non-profit technology consortium, much like the majority of open source platforms, standards and technologies that a lot of other packages depend on.

Makie.jl has an interface to OpenGL with the package GLMakie.jl. GLMakie.jl will render your visualizations and plots in a standalone screen and allows for click, drag and zoom interactivity with the mouse. Notice that Visual Studio Code will not render the image if you use GLMakie.jl and that you won’t be able to take advantage of the interactivity in static rendered versions of Quarto documents, such as HTML or PDFs.

Note

If you have played PC video games, you have most likely already used software built on OpenGL.

To use OpenGL with GLMakie.jl you’ll need to load it with the using statement:

using GLMakie

6.2 Cairo with CairoMakie.jl

The second interface, CairoMakie.jl, is the Makie.jl interface to the Cairo open source graphics library.

Cairo was created in 2003 and is written in C. It is primarily used to render static, high-quality vector graphics visualizations.

If you use CairoMakie.jl, most of your visualizations will be rendered as static SVGs or PNGs. For example, Quarto will render the images in the output cell. Visual Studio Code will also render the images in a preview pane. However, if you use CairoMakie.jl in a Julia terminal you will not have the image rendered, but instead you’ll see an object printed in the terminal that represents the image building blocks, such as points, lines and shapes.

Note

If you have used Gnuplot, for instance together with \(\LaTeX\), you have already used Cairo: Gnuplot uses it under the hood to render PDF and PNG files. R also uses Cairo to render output plots as PDF and SVG files. Finally, if you have seen some of the YouTube videos from the famous math communicator and entertainer 3Blue1Brown, his software uses Cairo under the hood as well.

To use Cairo with CairoMakie.jl you’ll need to load it with the using statement:

using CairoMakie

6.3 WebGL with WGLMakie.jl

The third interface, WGLMakie.jl, uses WebGL. WebGL is the “cousin” of OpenGL. It is a JavaScript GPU-accelerated API that renders graphics and is compatible with almost all web browsers available. It was initially released in 2011 and is managed by the same non-profit technology consortium that manages OpenGL.

WGLMakie.jl is still experimental, so beware that it might not work as intended. You can use WGLMakie.jl to get some of the interactivity that you would get in a standalone GLMakie.jl window, but it will not work in static rendered versions of Quarto documents, such as HTML or PDF versions.

Note

If you have ever played games in your browser, you have very likely benefited from WebGL. Several well-known game engines can target WebGL, such as Unreal Engine 4 and Unity.

To use WebGL with WGLMakie.jl you’ll need to load it with the using statement:

using WGLMakie

6.4 Which One to Use?

Now the question arises: which one shall I use?

To make it simple, our recommendations are:

  1. Use CairoMakie.jl (Cairo backend) by default. Most data communication and visualizations are still static, so prefer the Cairo backend for outstanding high-quality static images and plots.
  2. If you need interactivity, use GLMakie.jl, but beware that it creates stand-alone plot windows that can’t be displayed inline in a Visual Studio Code session or Quarto document. GLMakie.jl might also be worth using if you like to code in terminal environments.
  3. Avoid WGLMakie.jl unless you need to do something really fancy.
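In practice, following the first recommendation comes down to loading CairoMakie.jl and, optionally, choosing the inline output type. The sketch below assumes the activate! function that each Makie.jl backend provides; the data is made up:

```julia
using AlgebraOfGraphics
using CairoMakie

# Make CairoMakie the active backend and ask for SVG inline output.
# Use type = "png" instead for bitmap output.
CairoMakie.activate!(type = "svg")

# Made-up data, only for illustration.
tbl = (; x = [1.0, 2.0, 3.0, 4.0], y = [2.0, 4.1, 5.9, 8.2])
fg = draw(data(tbl) * mapping(:x, :y))

# With GLMakie.jl installed, switching to an interactive window would be:
#   using GLMakie
#   GLMakie.activate!()
```

If several backends are loaded in the same session, the one activated last is the one used for display.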

7 ⏳ A note about Time To First Plot (TTFP)

Julia is a just-in-time (JIT) compiled language, which means that it generates binary code as it needs it. This is great for a lot of things, but can be a challenge for others. One such challenge is the notorious Time To First Plot (TTFP).

Since AoG.jl runs on Makie.jl, which in turn is a pure-Julia implementation, the first plot of a session will take a while to render. This is because Makie.jl has to JIT-compile everything it needs in order to generate that first plot. After this, the following plots will be much faster to show.

This is somewhat disliked by users coming from R’s ggplot2. ggplot2 does not have this startup cost: it is written in R, which is interpreted, and the compiled code underneath it is ahead-of-time (AOT) compiled rather than JIT-compiled.

Fortunately, the Julia community is focusing on reducing TTFP. You can expect it to shrink with new versions of Julia and Makie.jl, with the goal of eventually making it negligible.
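A rough way to see the compilation cost described in this section is to time the same draw() call twice in a fresh Julia session. This is only a sketch: the data is made up, the absolute numbers depend on your machine and package versions, and the time spent in the using statements themselves is not measured here.

```julia
using AlgebraOfGraphics
using CairoMakie

# Made-up data, only for illustration.
tbl = (; x = [1.0, 2.0, 3.0, 4.0], y = [2.0, 4.1, 5.9, 8.2])
plt = data(tbl) * mapping(:x, :y)

# The first call includes JIT compilation of the code paths it touches.
first_call = @elapsed draw(plt)

# The second call reuses the compiled code.
second_call = @elapsed draw(plt)

println("first draw:  ", round(first_call; digits = 2), " s")
println("second draw: ", round(second_call; digits = 2), " s")
```

In a fresh session the first number is typically much larger than the second, which is the TTFP effect in miniature.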