Wide Data with AlgebraOfGraphics.jl

Authors

Jose Storopoli

Juan Oneto

In this tutorial we will focus on dealing with wide data directly in AlgebraOfGraphics.jl.

AoG.jl has some nice functionality for handling wide data with very little effort.

To start, let’s load PharmaDatasets.jl and DataFramesMeta.jl and read a wide dataset into a DataFrame:

using PharmaDatasets
using DataFramesMeta
wide_df = dataset("pumas_tutorials/wide_data")
15×6 DataFrame
 Row │ IDS         0.0      30 min   1 hrs    2 hrs    4 hrs
     │ String15    Float64  Float64  Float64  Float64  Float64
─────┼─────────────────────────────────────────────────────────
   1 │ ID001_S001     17.9     16.5     15.5     10.2     17.4
   2 │ ID001_S002     11.9     13.7     11.7     16.0     16.9
   3 │ ID001_S003     14.5     11.0     17.4     10.4     14.1
   4 │ ID002_S001     18.9     18.3     13.6     16.8     18.2
   5 │ ID002_S002     12.1     11.8     11.5     12.6     17.8
   6 │ ID002_S003     10.5     17.4     18.3     19.7     18.3
   7 │ ID003_S001     18.4     18.4     14.9     13.4     18.0
   8 │ ID003_S002     13.1     17.2     10.6     10.2     17.5
   9 │ ID003_S003     18.5     10.1     19.1     10.1     15.2
  10 │ ID004_S001     15.3     18.9     18.3     15.5     17.2
  11 │ ID004_S002     10.5     18.9     10.3     15.6     15.9
  12 │ ID004_S003     16.4     19.2     15.5     19.1     17.6
  13 │ ID005_S001     11.1     16.1     18.0     19.2     17.7
  14 │ ID005_S002     12.4     16.7     11.9     13.0     14.4
  15 │ ID005_S003     10.2     11.4     15.3     19.9     13.8

As you can see, wide_df has an :IDS column that combines an ID and a subject separated by an underscore (_), e.g. ID001_S002 is ID 1 and Subject 2. The remaining 5 columns hold the measurements and would normally be pivoted into a tidier long format.

Note

We cover pivoting data in Reshaping DataFrames in our Data Wrangling in Julia tutorials. Don’t forget to check it out.

Don’t worry about those columns: AoG.jl can handle them just fine. Speaking of AoG.jl, let’s load it with CairoMakie.jl as the backend:

using CairoMakie
using AlgebraOfGraphics

1 🌌 The dims() argument inside the mapping() function

AoG.jl deals with wide data through the dims() helper, which you pass to keyword arguments inside the mapping() function. The integer inside dims() selects a dimension of the column arrays given as positional arguments, and AoG.jl pivots along that dimension. For example, when the first positional argument is a vector of columns, dims(1) pivots over the entries of that vector.

Let’s showcase dims() usage. As a start, we’ll create labels to hold our not so tidy columns:

labels = ["0.0", "30 min", "1 hrs", "2 hrs", "4 hrs"]
5-element Vector{String}:
 "0.0"
 "30 min"
 "1 hrs"
 "2 hrs"
 "4 hrs"

We’ll pass a range of columns 2:6 which represents all the columns between the 2nd column (0.0) and the 6th column (4 hrs) as the first positional argument inside mapping().

Tip

Note that we need to vectorize the => operator since we are inputting a range/vector of columns. Thus we use the .=> vectorized pair syntax inside mapping().
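The difference between => and .=> can be checked in plain Julia, without AoG.jl. The following is a minimal sketch; the variable names column_pairs and single_pair are only illustrative:

```julia
# Broadcasting `=>` over a range builds one Pair per column index.
column_pairs = 2:6 .=> "Dosage"

# Without the dot, the whole range becomes the first element of a single Pair.
single_pair = 2:6 => "Dosage"

println(length(column_pairs))   # number of pairs, one per column
println(first(column_pairs))    # the pair built for column 2
println(first(single_pair))     # the untouched range
```

With the broadcasted form, each column index carries its own label pair, which is the shape mapping() needs when several columns are passed at once.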

Next, we pass dims(1) to the color keyword argument. This tells AoG.jl to use the columns in the first positional argument as the color mapping. We also continue the pair syntax inside mapping() with a renamer(), reusing our labels vector of column names.

Note

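Note that dims(n) refers to the nth dimension of the arrays of columns passed as positional arguments to mapping(). For example:

  1. dims(1) follows the entries of a vector of columns, such as 2:6 or ["0.0", "30 min"], whichever positional argument holds it.
  2. dims(2) follows the entries of a row vector (a 1×n matrix) of columns, such as ["2 hrs" "1 hrs"].

When the first positional argument is the only vector of columns, as in the plot below, dims(1) effectively points at that first argument.

data(wide_df) *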
mapping(2:6 .=> "Dosage") *
mapping(; color = dims(1) => renamer(labels) => "Time") *
AlgebraOfGraphics.density() |> draw

As you can see, AoG.jl used the 5 columns as both the first positional argument and, thanks to dims(1), as the color keyword argument. No need to pivot!

Now, let’s show a more complex example: a bar plot that uses the same dims(1) for the dodge keyword argument as well, faceted by :IDS:

data(wide_df) *
mapping(labels .=> "Dosage") *
mapping(;
    color = dims(1) => renamer(labels) => "Time",
    dodge = dims(1) => renamer(labels) => "Time",
    layout = :IDS,
) *
visual(BarPlot) |> draw

This is still not ideal. We can split the IDXXXs and SXXXs inside the column :IDS into two columns: one for :IDS and another for :SUBJS.

This is accomplished with a @rtransform macro combined with an @astable macro:

split_df = @rtransform wide_df @astable begin
    split_ids = split(:IDS, '_')
    :IDS = first(split_ids)
    :SUBJS = last(split_ids)
end
15×7 DataFrame
 Row │ IDS        0.0      30 min   1 hrs    2 hrs    4 hrs    SUBJS
     │ SubStrin…  Float64  Float64  Float64  Float64  Float64  SubStrin…
─────┼───────────────────────────────────────────────────────────────────
   1 │ ID001         17.9     16.5     15.5     10.2     17.4  S001
   2 │ ID001         11.9     13.7     11.7     16.0     16.9  S002
   3 │ ID001         14.5     11.0     17.4     10.4     14.1  S003
   4 │ ID002         18.9     18.3     13.6     16.8     18.2  S001
   5 │ ID002         12.1     11.8     11.5     12.6     17.8  S002
   6 │ ID002         10.5     17.4     18.3     19.7     18.3  S003
   7 │ ID003         18.4     18.4     14.9     13.4     18.0  S001
   8 │ ID003         13.1     17.2     10.6     10.2     17.5  S002
   9 │ ID003         18.5     10.1     19.1     10.1     15.2  S003
  10 │ ID004         15.3     18.9     18.3     15.5     17.2  S001
  11 │ ID004         10.5     18.9     10.3     15.6     15.9  S002
  12 │ ID004         16.4     19.2     15.5     19.1     17.6  S003
  13 │ ID005         11.1     16.1     18.0     19.2     17.7  S001
  14 │ ID005         12.4     16.7     11.9     13.0     14.4  S002
  15 │ ID005         10.2     11.4     15.3     19.9     13.8  S003

Note

We cover transformations and the @astable macro in Manipulating Tables with DataFramesMeta.jl in our Data Wrangling in Julia tutorials. Don’t forget to check it out.
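To see what the row-wise transformation does to a single value, here is a minimal sketch in plain Julia. It uses an ordinary String literal rather than the String15 values stored in the dataset, so the exact element types are an assumption and may differ slightly from the table above:

```julia
# Split one ID string on the underscore, as the transformation does for each row.
parts = split("ID001_S002", '_')

id = first(parts)       # the part before the underscore
subject = last(parts)   # the part after the underscore

println(typeof(parts))  # split returns a vector of SubString views
println(id, " ", subject)

# If a plain String is needed downstream, convert explicitly.
id_string = String(id)
println(typeof(id_string))
```

Because split() returns views into the original string rather than copies, the new columns show up with a SubString element type in the output.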

Now we can repeat the same code as before, but faceting our rows by :IDS and our columns by :SUBJS. This is a much better visualization and showcases the full power of AoG.jl’s functionality:

data(split_df) *
mapping(labels .=> "Dosage") *
mapping(;
    color = dims(1) => renamer(labels) => "Time",
    dodge = dims(1) => renamer(labels) => "Time",
    row = :IDS,
    col = :SUBJS,
) *
visual(BarPlot) |> draw

We can also combine a mapping() layer that uses dims() with any other layer. For example, if we use the linear() transformation from Tutorial 3 - Plotting Statistical Visualizations with AlgebraOfGraphics.jl, it just works.

This next plot compares column 2 (0.0, the initial dosage) on the x-axis (the first positional argument inside mapping()) with the remaining columns 3:6 (the subsequent dosage measurements) on the y-axis (the second positional argument inside mapping()), using a Scatter plot type inside visual() along with a linear() transformation:

data(wide_df) *
mapping(
    2 => "Initial Dosage",
    3:6 .=> "After Dosage";
    color = dims(1) => renamer(labels[2:end]),
) *
(linear() + visual(Scatter)) |> draw

There’s nothing special about dims(1): it refers to the first dimension of the column arrays inside mapping(), which is why it followed the 3:6 vector in the previous plot. We can also use both dims(1) and dims(2), here as the col and row keyword arguments of mapping(), respectively.

The first positional argument holds the first two measurement columns of wide_df: 0.0 and 30 min. The second positional argument holds the next two columns: 1 hrs and 2 hrs.

data(wide_df) *
mapping(["0.0", "30 min"], ["2 hrs" "1 hrs"]; col = dims(1), row = dims(2)) *
(linear() + visual(Scatter)) |> draw

Tip

Note that we are using an ordinary 1-D array (a vector) as the first positional argument, but a row vector (written without the ,) as the second positional argument.

This is because AoG.jl will combine them, and since we want the outer product of these vectors, one of them has to be a row vector. Also, the elements inside the row vector are listed in a different order because of this operation.
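The outer-product behaviour can be illustrated with ordinary Julia broadcasting. This is only an analogy under the assumption that AoG.jl combines the two arrays in the same broadcast-like fashion; it is not a description of AoG.jl’s internals, and the names x_cols, y_cols and combos are illustrative:

```julia
x_cols = ["0.0", "30 min"]    # vector: varies along dimension 1
y_cols = ["2 hrs" "1 hrs"]    # row vector (1×2 matrix): varies along dimension 2

# Broadcasting pairs every x column with every y column.
combos = tuple.(x_cols, y_cols)

println(size(combos))
for idx in CartesianIndices(combos)
    # idx[1] plays the role of dims(1), idx[2] the role of dims(2)
    println("dims(1) index = ", idx[1], ", dims(2) index = ", idx[2], " -> ", combos[idx])
end
```

Two plain vectors would instead be matched element by element, giving only two combinations rather than the four facets in the grid.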

2 🌔 facet arguments: link[x|y]axes

We can control how x- and y-axes behave while faceting. This is done with the facet keyword inside the draw() function.

To pass keyword arguments to customize facet’s attributes, you need to pass a NamedTuple of the desired keyword arguments to draw() via:

draw(...; facet = (; keyword_1 = value_1, keyword_2 = value_2))
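The (; ...) syntax is plain Julia for building a NamedTuple, so the facet options can be prepared and inspected on their own before being handed to draw(). A minimal sketch, with facet_options and single_option as illustrative names:

```julia
# A NamedTuple holding two facet options.
facet_options = (; linkxaxes = :none, linkyaxes = :none)

println(facet_options isa NamedTuple)
println(keys(facet_options))
println(facet_options.linkxaxes)

# The leading semicolon matters for a single entry: (; key = value) is a
# one-element NamedTuple, whereas (key = value) is just a parenthesized assignment.
single_option = (; linkxaxes = :minimal)
println(single_option isa NamedTuple)
```

Storing the options in a variable like this makes it easy to reuse the same facet settings across several draw() calls.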

To begin, let’s create a Layer/Layers object for us to call inside draw(). We use the same visualization specifications as above:

plt =
    data(wide_df) *
    mapping(["0.0", "30 min"], ["2 hrs" "1 hrs"]; col = dims(1), row = dims(2)) *
    (linear() + visual(Scatter));

link[x|y]axes takes three options:

  1. :all: links all axes.
  2. :none: unlinks all axes.
  3. :minimal: links x-axes within each column and y-axes within each row.

AoG.jl’s default is :all for both x and y:

draw(plt; facet = (; linkxaxes = :all, linkyaxes = :all))

Now, the second option: fully unlinked axes, specified with :none. Notice that all 4 facets now have their own individual x- and y-axes:

draw(plt; facet = (; linkxaxes = :none, linkyaxes = :none))

Finally, the :minimal option. You can see how rows 1 and 2 have different y-axes, and likewise columns 1 and 2 have different x-axes:

draw(plt; facet = (; linkxaxes = :minimal, linkyaxes = :minimal))