Wide Data with `AlgebraOfGraphics.jl`

Authors

Jose Storopoli

Juan Oneto

In this tutorial we will focus on dealing with wide data directly in `AlgebraOfGraphics.jl`.

`AoG.jl` has some nice functionalities to handle wide data very easily.

To start, let’s load `PharmaDatasets.jl` and `DataFramesMeta.jl` and read a wide dataset into a `DataFrame`:

``````using PharmaDatasets
using DataFramesMeta
wide_df = dataset("pumas_tutorials/wide_data")``````
15×6 DataFrame

| Row | IDS | 0.0 | 30 min | 1 hrs | 2 hrs | 4 hrs |
| --- | --- | --- | --- | --- | --- | --- |
| | String15 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | ID001_S001 | 17.9 | 16.5 | 15.5 | 10.2 | 17.4 |
| 2 | ID001_S002 | 11.9 | 13.7 | 11.7 | 16.0 | 16.9 |
| 3 | ID001_S003 | 14.5 | 11.0 | 17.4 | 10.4 | 14.1 |
| 4 | ID002_S001 | 18.9 | 18.3 | 13.6 | 16.8 | 18.2 |
| 5 | ID002_S002 | 12.1 | 11.8 | 11.5 | 12.6 | 17.8 |
| 6 | ID002_S003 | 10.5 | 17.4 | 18.3 | 19.7 | 18.3 |
| 7 | ID003_S001 | 18.4 | 18.4 | 14.9 | 13.4 | 18.0 |
| 8 | ID003_S002 | 13.1 | 17.2 | 10.6 | 10.2 | 17.5 |
| 9 | ID003_S003 | 18.5 | 10.1 | 19.1 | 10.1 | 15.2 |
| 10 | ID004_S001 | 15.3 | 18.9 | 18.3 | 15.5 | 17.2 |
| 11 | ID004_S002 | 10.5 | 18.9 | 10.3 | 15.6 | 15.9 |
| 12 | ID004_S003 | 16.4 | 19.2 | 15.5 | 19.1 | 17.6 |
| 13 | ID005_S001 | 11.1 | 16.1 | 18.0 | 19.2 | 17.7 |
| 14 | ID005_S002 | 12.4 | 16.7 | 11.9 | 13.0 | 14.4 |
| 15 | ID005_S003 | 10.2 | 11.4 | 15.3 | 19.9 | 13.8 |

As you can see, `wide_df` has an `:IDS` column that holds IDs and subjects separated by an underscore (`_`), e.g. `ID001_S002` is ID 1 and Subject 2. We also have 5 columns that would normally be pivoted into a tidier format:

• `0.0`: Dosage at initial time.
• `30 min`: Dosage at 30 minutes.
• `1 hrs`: Dosage at 1 hour.
• `2 hrs`: Dosage at 2 hours.
• `4 hrs`: Dosage at 4 hours.
Note

We cover pivoting data in Reshaping `DataFrame`s in our Data Wrangling in Julia tutorials. Don’t forget to check it out.
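For comparison, here is a minimal sketch of what the same information looks like once pivoted into a tidy (long) layout. It uses plain Julia tuples instead of a `DataFrame`, and only the first two rows and first two time columns of `wide_df`. The names `time_columns`, `wide_rows`, and `long_rows` are illustrative assumptions, not part of any package.

```julia
# Illustrative only: a hand-rolled pivot of two rows and two time columns of `wide_df`.
time_columns = ["0.0", "30 min"]

# Each entry holds an ID and its measurements, in the same order as `time_columns`.
wide_rows = [
    ("ID001_S001", [17.9, 16.5]),
    ("ID001_S002", [11.9, 13.7]),
]

# Build one output row per (ID, time) pair: this is the long layout that pivoting produces.
long_rows = NamedTuple[]
for (id, doses) in wide_rows, (j, time) in enumerate(time_columns)
    push!(long_rows, (IDS = id, Time = time, Dosage = doses[j]))
end

foreach(println, long_rows)
```

Each wide row becomes one long row per time column. This is the reshaping step that `dims()` lets us skip when plotting.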

Don’t worry about those columns: `AoG.jl` can handle them just fine. Speaking of `AoG.jl`, let’s load it with `CairoMakie.jl` as the backend:

``````using CairoMakie
using AlgebraOfGraphics``````

1 🌌 The `dims()` argument inside the `mapping()` function

`AoG.jl` deals with wide data through the `dims()` function, which is passed to keyword arguments inside `mapping()`. Inside `dims()` you give an integer that indicates which of `mapping()`’s positional arguments you want to pivot. For example, `dims(1)` will pivot the first positional argument inside `mapping()`.

Let’s showcase `dims()` usage. As a start, we’ll create `labels` to hold our not-so-tidy column names:

``labels = ["0.0", "30 min", "1 hrs", "2 hrs", "4 hrs"]``
``````5-element Vector{String}:
"0.0"
"30 min"
"1 hrs"
"2 hrs"
"4 hrs"``````

We’ll pass the range of columns `2:6`, which covers every column from the 2nd (`0.0`) to the 6th (`4 hrs`), as the first positional argument inside `mapping()`.

Tip

Note that we need to vectorize the `=>` operator since we are inputting a range/vector of columns. Thus we use the `.=>` vectorized pair syntax inside `mapping()`.
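To make the difference concrete, here is a small sketch in plain Julia (no plotting packages involved) of what the dotted and undotted forms produce. The variable names `column_pairs` and `single_pair` are only for illustration.

```julia
# Broadcasting `=>` over a range pairs every column index with the same label.
column_pairs = 2:6 .=> "Dosage"

# A string is treated as a scalar in broadcasting, so it is repeated for each index.
foreach(println, column_pairs)

# Without the dot, the whole range becomes the key of one single pair.
single_pair = 2:6 => "Dosage"
println(typeof(single_pair))
```

The dotted form yields a vector with one `Pair` per column, while the undotted form yields a single `Pair` whose key is the range itself.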

Next, we pass `dims(1)` to the `color` keyword argument. This tells `AoG.jl` to use the first positional argument as the `color` mapping. We also continue the pair syntax inside `mapping()` with a `renamer()`, reusing our `labels` vector of column names.

Note

Note that `dims()` will use the `n`th positional argument inside `mapping()`. For example:

1. `dims(1)` will use the first argument of `mapping(:x, :y, :z)`, that is `:x`.
2. `dims(2)` will use the second argument of `mapping(:x, :y, :z)`, that is `:y`.
3. `dims(3)` will use the third argument of `mapping(:x, :y, :z)`, that is `:z`.
``````data(wide_df) *
mapping(2:6 .=> "Dosage") *
mapping(; color = dims(1) => renamer(labels) => "Time") *
AlgebraOfGraphics.density() |> draw``````

As you can see, `AoG.jl` used the 5 columns both as the first positional argument and, thanks to `dims(1)`, as the `color` keyword argument. No need to pivot!

Now, let’s show a more complex example: a bar plot that uses the same `dims(1)` for the `dodge` keyword argument as well, faceted by `:IDS`:

``````data(wide_df) *
mapping(labels .=> "Dosage") *
mapping(;
color = dims(1) => renamer(labels) => "Time",
dodge = dims(1) => renamer(labels) => "Time",
layout = :IDS,
) *
visual(BarPlot) |> draw``````

This is still not ideal. We can split the `IDXXX`s and `SXXX`s inside the `:IDS` column into two columns: one for `:IDS` and another for `:SUBJS`.

This is accomplished with a `@rtransform` macro combined with an `@astable` macro:

``````split_df = @rtransform wide_df @astable begin
split_ids = split(:IDS, '_')
:IDS = first(split_ids)
:SUBJS = last(split_ids)
end``````
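15×7 DataFrame

| Row | IDS | 0.0 | 30 min | 1 hrs | 2 hrs | 4 hrs | SUBJS |
| --- | --- | --- | --- | --- | --- | --- | --- |
| | SubStrin… | Float64 | Float64 | Float64 | Float64 | Float64 | SubStrin… |
| 1 | ID001 | 17.9 | 16.5 | 15.5 | 10.2 | 17.4 | S001 |
| 2 | ID001 | 11.9 | 13.7 | 11.7 | 16.0 | 16.9 | S002 |
| 3 | ID001 | 14.5 | 11.0 | 17.4 | 10.4 | 14.1 | S003 |
| 4 | ID002 | 18.9 | 18.3 | 13.6 | 16.8 | 18.2 | S001 |
| 5 | ID002 | 12.1 | 11.8 | 11.5 | 12.6 | 17.8 | S002 |
| 6 | ID002 | 10.5 | 17.4 | 18.3 | 19.7 | 18.3 | S003 |
| 7 | ID003 | 18.4 | 18.4 | 14.9 | 13.4 | 18.0 | S001 |
| 8 | ID003 | 13.1 | 17.2 | 10.6 | 10.2 | 17.5 | S002 |
| 9 | ID003 | 18.5 | 10.1 | 19.1 | 10.1 | 15.2 | S003 |
| 10 | ID004 | 15.3 | 18.9 | 18.3 | 15.5 | 17.2 | S001 |
| 11 | ID004 | 10.5 | 18.9 | 10.3 | 15.6 | 15.9 | S002 |
| 12 | ID004 | 16.4 | 19.2 | 15.5 | 19.1 | 17.6 | S003 |
| 13 | ID005 | 11.1 | 16.1 | 18.0 | 19.2 | 17.7 | S001 |
| 14 | ID005 | 12.4 | 16.7 | 11.9 | 13.0 | 14.4 | S002 |
| 15 | ID005 | 10.2 | 11.4 | 15.3 | 19.9 | 13.8 | S003 |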
Note

We cover transformations and the `@astable` macro in Manipulating Tables with `DataFramesMeta.jl` in our Data Wrangling in Julia tutorials. Don’t forget to check it out.
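As a rough illustration of what the row-wise block does, the sketch below applies the same `split`, `first`, and `last` calls to a single value in plain Julia. The variable names `combined`, `parts`, `id`, and `subject` are only for illustration.

```julia
# What the row-wise block computes for one value of `:IDS`.
combined = "ID001_S002"
parts = split(combined, '_')

id = first(parts)      # the part before the underscore
subject = last(parts)  # the part after the underscore

println(id, " / ", subject)

# `split` returns substrings of the original string rather than new `String`s.
println(eltype(parts))
```

Because `split` returns `SubString`s, the new `:IDS` and `:SUBJS` columns show up with a `SubStrin…` element type in the output above.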

Now we can repeat the same code as before, but faceting our `row`s by `:IDS` and our `col`s by `:SUBJS`. This is a much better visualization and showcases the full power of `AoG.jl`’s functionality:

``````data(split_df) *
mapping(labels .=> "Dosage") *
mapping(;
color = dims(1) => renamer(labels) => "Time",
dodge = dims(1) => renamer(labels) => "Time",
row = :IDS,
col = :SUBJS,
) *
visual(BarPlot) |> draw``````

We can also combine a `mapping()` layer that has any `dims()` in it with any other layer. For example, if we use the `linear()` transformation from Tutorial 3 - Plotting Statistical Visualizations with `AlgebraOfGraphics.jl`, it just works.

This next plot compares column `2` (`0.0`, the initial dosage) on the `x`-axis (the first positional argument inside `mapping()`) with the remaining columns `3:6` (the subsequent dosage measurements) on the `y`-axis (the second positional argument inside `mapping()`), using a `Scatter` plot inside `visual()` along with a `linear()` transformation. Since the single `x` column is paired with each of the four `y` columns, `color` only needs the last four entries of `labels`, hence `labels[2:end]`:

``````data(wide_df) *
mapping(
2 => "Initial Dosage",
3:6 .=> "After Dosage";
color = dims(1) => renamer(labels[2:end]),
) *
(linear() + visual(Scatter)) |> draw``````
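As a quick sanity check on the indexing, this plain-Julia sketch lines up the column positions `3:6` with the entries of `labels[2:end]`. The names `y_columns`, `y_labels`, and `column_to_label` are illustrative only.

```julia
labels = ["0.0", "30 min", "1 hrs", "2 hrs", "4 hrs"]

# Column 2 of `wide_df` is "0.0", so columns 3:6 correspond to labels[2:end].
y_columns = 3:6
y_labels = labels[2:end]

# Pair each column position with the label that `renamer` will display for it.
column_to_label = collect(zip(y_columns, y_labels))
foreach(println, column_to_label)
```

The two collections have the same length, so each `y` column gets a label of its own.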

There’s nothing special about `dims(1)`: it just tells `AoG.jl` to use the first positional argument in `mapping()`. We can also build an example that uses both `dims(1)` and `dims(2)`, in `mapping()`’s `col` and `row` keyword arguments, respectively.

The first positional argument holds the first two measurement columns of `wide_df`: `0.0` and `30 min`. The second positional argument holds the next two: `1 hrs` and `2 hrs`.

``````data(wide_df) *
mapping(["0.0", "30 min"], ["2 hrs" "1 hrs"]; col = dims(1), row = dims(2)) *
(linear() + visual(Scatter)) |> draw``````
Tip

Note that we are using an ordinary 1-D array (a vector) as the first positional argument, but a row vector (elements separated by spaces, without the `,`) as the second positional argument.

This is because `AoG.jl` will combine them, and since we want the outer product of these vectors, one of them has to be a row vector. The elements inside the row vector are also ordered differently because of this operation.
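The outer-product behaviour can be sketched with ordinary Julia broadcasting. The snippet below does not call `AoG.jl`; it only shows, as an analogy, how a vector and a row vector combine into a 2×2 grid, whereas two plain vectors would be paired element by element. The names `first_arg`, `second_arg`, `grid`, and `paired` are illustrative.

```julia
first_arg = ["0.0", "30 min"]    # 2-element Vector (a column)
second_arg = ["2 hrs" "1 hrs"]   # 1×2 Matrix (a row vector)

# Broadcasting a column against a row yields every combination: a 2×2 grid.
grid = tuple.(first_arg, second_arg)
println(size(grid))

# Two plain vectors are instead matched element by element: only 2 combinations.
paired = tuple.(first_arg, ["2 hrs", "1 hrs"])
println(size(paired))
```

With the row vector, the first argument varies down the rows of `grid` and the second across its columns, which gives four panels rather than two.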

2 🌔 `facet` arguments: `link[x|y]axes`

We can control how x- and y-axes behave while faceting. This is done with the `facet` keyword inside the `draw()` function.

To customize `facet`’s attributes, pass a `NamedTuple` of the desired keyword arguments to `draw()` via:

``draw(...; facet = (; keyword_1 = value_1, keyword_2 = value_2))``
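For readers unfamiliar with the `(; ...)` syntax, here is a brief plain-Julia sketch of how such a `NamedTuple` is built and inspected. The variable names `facet_options` and `single_option` are illustrative only.

```julia
# The leading `;` builds a NamedTuple from `key = value` entries.
facet_options = (; linkxaxes = :none, linkyaxes = :none)

println(typeof(facet_options))
println(keys(facet_options))

# The leading `;` is most useful with a single entry: `(linkxaxes = :minimal)`
# without it would be parsed as a parenthesized assignment, not a NamedTuple.
single_option = (; linkxaxes = :minimal)
println(single_option.linkxaxes)
```

Always writing the leading `;` keeps one-option and multi-option calls consistent.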

To begin, let’s create a `Layer`/`Layers` object that we can pass to `draw()`. We use the same visualization specification as above:

``````plt =
data(wide_df) *
mapping(["0.0", "30 min"], ["2 hrs" "1 hrs"]; col = dims(1), row = dims(2)) *
(linear() + visual(Scatter));``````

`link[x|y]axes` takes three options:

1. `:all`: links all axes.
2. `:none`: unlinks all axes.
3. `:minimal`: links x-axes in each column / y-axes in each row.

`AoG.jl`’s default is `:all` for both x and y:

``draw(plt; facet = (; linkxaxes = :all, linkyaxes = :all))``

Now, the second option: fully unlinked axes can be specified with `:none`. Notice that now all 4 facets have their own individual x- and y-axis:

``draw(plt; facet = (; linkxaxes = :none, linkyaxes = :none))``

Now the `:minimal` option. You can see how rows 1 and 2 have different y-axes, and likewise columns 1 and 2 have different x-axes:

``draw(plt; facet = (; linkxaxes = :minimal, linkyaxes = :minimal))``