Creating tables with SummaryTables.jl
Pumas’ SummaryTables.jl is a Julia package that generates publication-ready summary tables for data analysis and reporting. In this tutorial, we will go over some of the fundamental features of the package and explore how to use it to create informative tables with summary statistics, grouped data, and much more.
1 Libraries
Of course, we need to begin by importing SummaryTables.jl:
using SummaryTables
As for our data, we will be using PharmaDatasets.jl as our source and DataFramesMeta.jl for some data wrangling:
using PharmaDatasets
using DataFrames
using DataFramesMeta
Lastly, we need to import StatsBase.jl to have access to the summary statistics functions that we will be using later on:
using StatsBase
2 Load data
We will load our dataset with PharmaDatasets.jl’s dataset() function:
df = dataset("nca/dapa_IV_ORAL")
first(df, 5)
Row | ID | TIME | TAD | COBS | AMT | OCC | AGE | WEIGHT | GENDER | FORMULATION | DOSE |
---|---|---|---|---|---|---|---|---|---|---|---|
| Int64 | Float64 | Float64 | Float64 | Int64? | Int64 | Int64 | Float64 | Int64 | String7 | Int64? |
1 | 1 | 0.0 | 0.0 | 157.021 | 5000 | 1 | 44 | 70.5 | 0 | IV | 5000 |
2 | 1 | 0.05 | 0.05 | 141.892 | missing | 1 | 44 | 70.5 | 0 | IV | missing |
3 | 1 | 0.35 | 0.35 | 116.228 | missing | 1 | 44 | 70.5 | 0 | IV | missing |
4 | 1 | 0.5 | 0.5 | 109.353 | missing | 1 | 44 | 70.5 | 0 | IV | missing |
5 | 1 | 0.75 | 0.75 | 66.4814 | missing | 1 | 44 | 70.5 | 0 | IV | missing |
You probably noticed that this dataset contains both IV and oral formulations. Here we will only consider the IV formulation.
Also, the GENDER column has been encoded as 0 for males and 1 for females, so we would like to transform this column to use the more meaningful labels "Male" and "Female" instead of 0 and 1:
df_iv = @chain df begin
    @rsubset :FORMULATION == "IV"
    @rtransform :GENDER = :GENDER == 0 ? "Male" : "Female"
end;
Make sure to check our tutorial on Manipulating Tables with DataFramesMeta.jl for a more detailed look into DataFramesMeta.jl’s macros for data wrangling.
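For readers who prefer plain DataFrames.jl functions, the same two steps (keep the IV rows, recode the sex column) can be sketched without macros. This is only an illustration on a made-up toy table; the name `toy` and all of its values are assumptions and not part of the tutorial's dataset:

```julia
using DataFrames

# Made-up stand-in for the tutorial's data: two subjects, two formulations each
toy = DataFrame(
    ID = [1, 1, 2, 2],
    FORMULATION = ["IV", "ORAL", "IV", "ORAL"],
    GENDER = [0, 0, 1, 1],
)

# Keep only the IV rows (row-wise predicate on a single column)
toy_iv = filter(:FORMULATION => ==("IV"), toy)

# Recode 0/1 into readable labels, mirroring the ternary used with @rtransform
toy_iv.GENDER = ifelse.(toy_iv.GENDER .== 0, "Male", "Female")
```

Because `filter` returns a new `DataFrame`, the original `toy` table keeps its numeric coding.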
3 📋 Listing tables
As their name suggests, listing tables are used to list raw values from a dataset, which can be a useful starting point for reporting and further analyses.
In order to create a listing table in SummaryTables.jl, we use the listingtable function, which takes the following positional arguments:
- A table: in this case df_iv, which is a DataFrame.
- A variable: the variable whose raw values will be listed.
In addition to that, listingtable supports keyword arguments for grouping columns and rows (cols and rows, respectively).
Our documentation on SummaryTables.jl contains a comprehensive list of all keyword arguments available for listingtable and all the other functions used in this tutorial.
As an example, let’s try to generate a listing table containing the concentration measurements for each of our subjects, which are in the COBS column of our DataFrame:
listingtable(df_iv, :COBS; cols = :TIME, rows = :ID)
| TIME | | | | | | | | | | | | | | | |
| 0 | 0.05 | 0.35 | 0.5 | 0.75 | 1 | 2 | 3 | 4 | 6 | 8 | 10 | 12 | 16 | 20 | 24 |
ID | COBS | | | | | | | | | | | | | | | |
1 | 157 | 142 | 116 | 109 | 66.5 | 74.8 | 39.2 | 25.4 | 13 | 3.81 | 1.47 | 1.11 | 0.911 | 0.83 | 0.624 | 0.654 |
2 | 59.8 | 66.4 | 55.5 | 59 | 55.8 | 53.7 | 38.9 | 31 | 24.2 | 15.9 | 10.7 | 7.33 | 5.83 | 3.3 | 2.32 | 1.72 |
3 | 166 | 130 | 127 | 97.8 | 86.6 | 81.9 | 35.8 | 22.3 | 12.8 | 6.47 | 4.98 | 3.38 | 3.33 | 2.69 | 2.22 | 2.04 |
4 | 134 | 124 | 112 | 122 | 83.4 | 78.4 | 48 | 31.6 | 18.7 | 11 | 6.01 | 5.02 | 3.31 | 3.32 | 2.45 | 2.3 |
5 | 94.9 | 94.4 | 80.7 | 60.4 | 54.4 | 46.1 | 17.8 | 5.78 | 3.51 | 1.06 | 0.556 | 0.537 | 0.464 | 0.359 | 0.295 | 0.204 |
6 | 57.6 | 56.7 | 61.8 | 56.7 | 50.5 | 52.6 | 34.7 | 28.9 | 25.9 | 16.7 | 11.4 | 6.88 | 4.66 | 2.98 | 2.01 | 1.75 |
7 | 81.5 | 70.8 | 81 | 78.7 | 69.3 | 59.8 | 39.5 | 35.1 | 25.7 | 12.6 | 9.81 | 6.23 | 6.54 | 5.14 | 3.52 | 3.06 |
8 | 113 | 96.5 | 92.6 | 75.8 | 67.9 | 68.9 | 41.2 | 23.2 | 16.8 | 8.01 | 4.59 | 3.56 | 2.27 | 1.87 | 1.41 | 1.13 |
9 | 111 | 123 | 102 | 90.3 | 79.2 | 63.1 | 38.1 | 21.8 | 9.74 | 5.21 | 3.34 | 2.23 | 1.51 | 1.11 | 0.701 | 0.694 |
10 | 68.4 | 59.2 | 54.7 | 55.3 | 56.9 | 47.3 | 31 | 24 | 21.5 | 11.3 | 6.54 | 4.51 | 3.45 | 2.63 | 1.99 | 1.38 |
11 | 111 | 106 | 95.6 | 62.1 | 60.9 | 42.6 | 17.5 | 7.57 | 4.11 | 1.68 | 0.929 | 0.851 | 0.7 | 0.51 | 0.357 | 0.36 |
12 | 106 | 119 | 84.9 | 87.3 | 69.6 | 63.1 | 43.3 | 28.3 | 19.7 | 11.4 | 7.59 | 5.52 | 4.38 | 3.71 | 2.66 | 2.63 |
13 | 158 | 130 | 97.5 | 89.1 | 85.1 | 79.3 | 60.3 | 41.6 | 26 | 17.8 | 11.5 | 9.59 | 7.18 | 5.67 | 4.58 | 4.26 |
14 | 103 | 133 | 116 | 87.9 | 82.8 | 67 | 38.7 | 17.3 | 12 | 4.17 | 3.15 | 2.54 | 2.14 | 1.76 | 1.19 | 1.11 |
15 | 95.1 | 84.9 | 79.8 | 67.1 | 74.8 | 75.9 | 45.9 | 31.9 | 23.4 | 11.5 | 6.57 | 4.23 | 4.72 | 2.74 | 2.38 | 2.13 |
16 | 88.4 | 108 | 88 | 80.6 | 68.8 | 49.3 | 27.5 | 15.3 | 9.15 | 4.47 | 2.7 | 2.18 | 1.68 | 1.48 | 0.987 | 0.776 |
17 | 194 | 134 | 96.7 | 108 | 92.7 | 87.1 | 44.7 | 24.4 | 16.4 | 8.45 | 5.65 | 5.74 | 4.21 | 2.97 | 2.14 | 1.7 |
18 | 114 | 110 | 82.7 | 79.8 | 80.6 | 53.9 | 29.1 | 17.9 | 7.71 | 3.04 | 1.04 | 0.706 | 0.652 | 0.54 | 0.443 | 0.418 |
19 | 318 | 274 | 189 | 177 | 132 | 93 | 33.3 | 22.1 | 14.5 | 9.95 | 8 | 6.73 | 5.32 | 4.68 | 3.06 | 2.35 |
20 | 57.2 | 53.4 | 46.2 | 55 | 38 | 36.2 | 35.7 | 24.7 | 16.8 | 9.46 | 6.19 | 3.59 | 2.03 | 0.954 | 0.794 | 0.622 |
21 | 47.2 | 46.8 | 42.4 | 37.8 | 40.3 | 40.5 | 25.9 | 26 | 18.4 | 11 | 7.51 | 5.4 | 4.38 | 2.82 | 1.72 | 1.11 |
22 | 86.8 | 102 | 86.9 | 82.5 | 68.8 | 64.5 | 44.9 | 37.7 | 25.1 | 17.5 | 10.8 | 7.87 | 6.97 | 4.2 | 3.56 | 3.29 |
23 | 77.3 | 66.5 | 72.1 | 62.9 | 57 | 50.3 | 40 | 29.8 | 18.9 | 11.7 | 7.18 | 5.16 | 3.26 | 2.58 | 2.01 | 1.6 |
24 | 66.6 | 74.4 | 54.4 | 48.8 | 47.4 | 39.1 | 26.3 | 17.4 | 11.2 | 3.86 | 2.12 | 1.19 | 0.833 | 0.755 | 0.629 | 0.546 |
As you can see, Quarto will automatically display the table that we just generated. If you are interested in taking your results elsewhere, make sure to check our section on generating output later on in this tutorial to learn about the available formats to export your table results.
Let’s stop here for a second to understand what we just did.
Since we wanted to list our concentration values, we passed :COBS as our second positional argument. However, if we had tried to run listingtable(df_iv, :COBS) we would have gotten an error saying that there are too many rows:
listingtable(df_iv, :COBS)
SummaryTables.TooManyRowsError: TooManyRowsError: Found a group which has more than one value. This is not allowed, only one value of "COBS" per table cell may exist.
384×1 DataFrame
Row │ COBS
│ Float64
─────┼────────────
1 │ 157.021
2 │ 141.892
3 │ 116.228
4 │ 109.353
5 │ 66.4814
6 │ 74.7532
7 │ 39.1933
8 │ 25.4495
⋮ │ ⋮
378 │ 3.86121
379 │ 2.11964
380 │ 1.19236
381 │ 0.832999
382 │ 0.755442
383 │ 0.628669
384 │ 0.546453
369 rows omitted
Filter your dataset or use additional row or column grouping factors.
The following columns in the dataset are not uniform in this group and could potentially be used: ["ID", "TIME", "TAD", "AMT", "AGE", "WEIGHT", "GENDER", "DOSE"].
Stacktrace:
[1] _listingtable(df::DataFrame, variable::SummaryTables.Variable, rowgroups::Vector{SummaryTables.Group}, colgroups::Vector{SummaryTables.Group}, rowsummary::SummaryTables.Summary, colsummary::SummaryTables.Summary; variable_header::Bool, sort::Bool, celltable_kws::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ SummaryTables ~/_work/PumasTutorials.jl/PumasTutorials.jl/custom_julia_depot/packages/SummaryTables/vabQ8/src/table.jl:201
[2] _listingtable
@ ~/_work/PumasTutorials.jl/PumasTutorials.jl/custom_julia_depot/packages/SummaryTables/vabQ8/src/table.jl:168 [inlined]
[3] listingtable(table::DataFrame, variable::Symbol; rows::Vector{Any}, cols::Vector{Any}, summarize_rows::Vector{Any}, summarize_cols::Vector{Any}, variable_header::Bool, celltable_kws::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ SummaryTables ~/_work/PumasTutorials.jl/PumasTutorials.jl/custom_julia_depot/packages/SummaryTables/vabQ8/src/table.jl:149
[4] listingtable(table::DataFrame, variable::Symbol)
@ SummaryTables ~/_work/PumasTutorials.jl/PumasTutorials.jl/custom_julia_depot/packages/SummaryTables/vabQ8/src/table.jl:126
[5] top-level scope
@ ~/_work/PumasTutorials.jl/PumasTutorials.jl/tutorials/reporting/01-summary_tables.qmd:129
As suggested by the error message, we need to add grouping factors on our rows or columns to ensure that none of the groups will have more than one value.
In this case, we wanted to use the time values and subject IDs, so that’s how we arrived at our final solution:
listingtable(df_iv, :COBS; cols = :TIME, rows = :ID)
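Before choosing grouping factors, it can help to check whether they identify each row uniquely, which is the condition the error message describes. The following is a minimal sketch on made-up data; `toy` and the helper `identifies_rows` are hypothetical names introduced here and are not part of SummaryTables.jl:

```julia
using DataFrames

# Made-up long-format data: one concentration per subject and time point
toy = DataFrame(
    ID = [1, 1, 2, 2],
    TIME = [0.0, 1.0, 0.0, 1.0],
    COBS = [10.0, 5.0, 12.0, 6.0],
)

# Hypothetical helper: true when each combination of `cols` occurs at most once,
# i.e. every table cell defined by those grouping columns would hold a single value
identifies_rows(df, cols) = nrow(unique(df, cols)) == nrow(df)

ok_id_time = identifies_rows(toy, [:ID, :TIME])  # true: one COBS value per (ID, TIME) pair
ok_id_only = identifies_rows(toy, [:ID])         # false: each ID has several COBS values
```

If the check returns false for a candidate set of grouping columns, some cell would contain more than one value and more grouping factors (or a filter) are needed.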
Notice that by specifying :TIME in cols and :ID in rows, we have produced a wide-format table from our original long-format table. This is probably a more readable format for presentations and reports, but if you wish to stick to the long format, you can put all grouping factors in rows:
listingtable(df_iv, :COBS; rows = [:ID, :TIME])
Note that in this case, we passed a Vector to the rows keyword argument because we wanted to use multiple columns. This also works for cols.
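To make the long versus wide distinction concrete, here is a small sketch that reshapes a made-up long table with DataFrames.jl's `unstack`. It only illustrates the idea of the two layouts; it is not a claim about how listingtable works internally, and `toy_long` and `toy_wide` are hypothetical names:

```julia
using DataFrames

# Made-up long-format data: one row per (ID, TIME) pair
toy_long = DataFrame(
    ID = [1, 1, 2, 2],
    TIME = [0.0, 1.0, 0.0, 1.0],
    COBS = [10.0, 5.0, 12.0, 6.0],
)

# Wide format: one row per ID, one column per TIME value, cells filled with COBS
toy_wide = unstack(toy_long, :ID, :TIME, :COBS)
```

The wide table has one row per subject, which is the layout produced above with cols = :TIME and rows = :ID.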
Another thing that we might want to do is customize the table’s headers, since the current COBS, TIME, and ID headers are not very readable.
To do this, we use a Pair indicating the variable and the name that we want to use in the table (e.g. :ID => "Subject ID").
Let’s take advantage of this and generate a table with more readable headers and units:
listingtable(
    df_iv,
    :COBS => "Concentration (μg/L)";
    cols = :TIME => "Time (hours)",
    rows = :ID => "Subject ID",
)
| Time (hours) | | | | | | | | | | | | | | | |
| 0 | 0.05 | 0.35 | 0.5 | 0.75 | 1 | 2 | 3 | 4 | 6 | 8 | 10 | 12 | 16 | 20 | 24 |
Subject ID | Concentration (μg/L) | | | | | | | | | | | | | | | |
1 | 157 | 142 | 116 | 109 | 66.5 | 74.8 | 39.2 | 25.4 | 13 | 3.81 | 1.47 | 1.11 | 0.911 | 0.83 | 0.624 | 0.654 |
2 | 59.8 | 66.4 | 55.5 | 59 | 55.8 | 53.7 | 38.9 | 31 | 24.2 | 15.9 | 10.7 | 7.33 | 5.83 | 3.3 | 2.32 | 1.72 |
3 | 166 | 130 | 127 | 97.8 | 86.6 | 81.9 | 35.8 | 22.3 | 12.8 | 6.47 | 4.98 | 3.38 | 3.33 | 2.69 | 2.22 | 2.04 |
4 | 134 | 124 | 112 | 122 | 83.4 | 78.4 | 48 | 31.6 | 18.7 | 11 | 6.01 | 5.02 | 3.31 | 3.32 | 2.45 | 2.3 |
5 | 94.9 | 94.4 | 80.7 | 60.4 | 54.4 | 46.1 | 17.8 | 5.78 | 3.51 | 1.06 | 0.556 | 0.537 | 0.464 | 0.359 | 0.295 | 0.204 |
6 | 57.6 | 56.7 | 61.8 | 56.7 | 50.5 | 52.6 | 34.7 | 28.9 | 25.9 | 16.7 | 11.4 | 6.88 | 4.66 | 2.98 | 2.01 | 1.75 |
7 | 81.5 | 70.8 | 81 | 78.7 | 69.3 | 59.8 | 39.5 | 35.1 | 25.7 | 12.6 | 9.81 | 6.23 | 6.54 | 5.14 | 3.52 | 3.06 |
8 | 113 | 96.5 | 92.6 | 75.8 | 67.9 | 68.9 | 41.2 | 23.2 | 16.8 | 8.01 | 4.59 | 3.56 | 2.27 | 1.87 | 1.41 | 1.13 |
9 | 111 | 123 | 102 | 90.3 | 79.2 | 63.1 | 38.1 | 21.8 | 9.74 | 5.21 | 3.34 | 2.23 | 1.51 | 1.11 | 0.701 | 0.694 |
10 | 68.4 | 59.2 | 54.7 | 55.3 | 56.9 | 47.3 | 31 | 24 | 21.5 | 11.3 | 6.54 | 4.51 | 3.45 | 2.63 | 1.99 | 1.38 |
11 | 111 | 106 | 95.6 | 62.1 | 60.9 | 42.6 | 17.5 | 7.57 | 4.11 | 1.68 | 0.929 | 0.851 | 0.7 | 0.51 | 0.357 | 0.36 |
12 | 106 | 119 | 84.9 | 87.3 | 69.6 | 63.1 | 43.3 | 28.3 | 19.7 | 11.4 | 7.59 | 5.52 | 4.38 | 3.71 | 2.66 | 2.63 |
13 | 158 | 130 | 97.5 | 89.1 | 85.1 | 79.3 | 60.3 | 41.6 | 26 | 17.8 | 11.5 | 9.59 | 7.18 | 5.67 | 4.58 | 4.26 |
14 | 103 | 133 | 116 | 87.9 | 82.8 | 67 | 38.7 | 17.3 | 12 | 4.17 | 3.15 | 2.54 | 2.14 | 1.76 | 1.19 | 1.11 |
15 | 95.1 | 84.9 | 79.8 | 67.1 | 74.8 | 75.9 | 45.9 | 31.9 | 23.4 | 11.5 | 6.57 | 4.23 | 4.72 | 2.74 | 2.38 | 2.13 |
16 | 88.4 | 108 | 88 | 80.6 | 68.8 | 49.3 | 27.5 | 15.3 | 9.15 | 4.47 | 2.7 | 2.18 | 1.68 | 1.48 | 0.987 | 0.776 |
17 | 194 | 134 | 96.7 | 108 | 92.7 | 87.1 | 44.7 | 24.4 | 16.4 | 8.45 | 5.65 | 5.74 | 4.21 | 2.97 | 2.14 | 1.7 |
18 | 114 | 110 | 82.7 | 79.8 | 80.6 | 53.9 | 29.1 | 17.9 | 7.71 | 3.04 | 1.04 | 0.706 | 0.652 | 0.54 | 0.443 | 0.418 |
19 | 318 | 274 | 189 | 177 | 132 | 93 | 33.3 | 22.1 | 14.5 | 9.95 | 8 | 6.73 | 5.32 | 4.68 | 3.06 | 2.35 |
20 | 57.2 | 53.4 | 46.2 | 55 | 38 | 36.2 | 35.7 | 24.7 | 16.8 | 9.46 | 6.19 | 3.59 | 2.03 | 0.954 | 0.794 | 0.622 |
21 | 47.2 | 46.8 | 42.4 | 37.8 | 40.3 | 40.5 | 25.9 | 26 | 18.4 | 11 | 7.51 | 5.4 | 4.38 | 2.82 | 1.72 | 1.11 |
22 | 86.8 | 102 | 86.9 | 82.5 | 68.8 | 64.5 | 44.9 | 37.7 | 25.1 | 17.5 | 10.8 | 7.87 | 6.97 | 4.2 | 3.56 | 3.29 |
23 | 77.3 | 66.5 | 72.1 | 62.9 | 57 | 50.3 | 40 | 29.8 | 18.9 | 11.7 | 7.18 | 5.16 | 3.26 | 2.58 | 2.01 | 1.6 |
24 | 66.6 | 74.4 | 54.4 | 48.8 | 47.4 | 39.1 | 26.3 | 17.4 | 11.2 | 3.86 | 2.12 | 1.19 | 0.833 | 0.755 | 0.629 | 0.546 |
Finally, we might also be interested in a version of this table that summarizes the concentration values across all subjects for each time point.
Luckily for us, listingtable supports adding summary columns or rows through the keyword arguments summarize_rows and summarize_cols, which take a Vector containing the summary functions that we want to use.
Let’s compute the geometric mean and the coefficient of variation as an example to see how that works. In this case, we will use summarize_rows:
listingtable(
    df_iv,
    :COBS => "Concentration (μg/L)";
    cols = :TIME => "Time (hours)",
    rows = :ID => "Subject ID",
    summarize_rows = [
        geomean => "Geometric mean (μg/L)",
        (i -> 100 * variation(i)) => "CV (%)",
    ],
)
| Time (hours) | | | | | | | | | | | | | | | |
| 0 | 0.05 | 0.35 | 0.5 | 0.75 | 1 | 2 | 3 | 4 | 6 | 8 | 10 | 12 | 16 | 20 | 24 |
Subject ID | Concentration (μg/L) | | | | | | | | | | | | | | | |
1 | 157 | 142 | 116 | 109 | 66.5 | 74.8 | 39.2 | 25.4 | 13 | 3.81 | 1.47 | 1.11 | 0.911 | 0.83 | 0.624 | 0.654 |
2 | 59.8 | 66.4 | 55.5 | 59 | 55.8 | 53.7 | 38.9 | 31 | 24.2 | 15.9 | 10.7 | 7.33 | 5.83 | 3.3 | 2.32 | 1.72 |
3 | 166 | 130 | 127 | 97.8 | 86.6 | 81.9 | 35.8 | 22.3 | 12.8 | 6.47 | 4.98 | 3.38 | 3.33 | 2.69 | 2.22 | 2.04 |
4 | 134 | 124 | 112 | 122 | 83.4 | 78.4 | 48 | 31.6 | 18.7 | 11 | 6.01 | 5.02 | 3.31 | 3.32 | 2.45 | 2.3 |
5 | 94.9 | 94.4 | 80.7 | 60.4 | 54.4 | 46.1 | 17.8 | 5.78 | 3.51 | 1.06 | 0.556 | 0.537 | 0.464 | 0.359 | 0.295 | 0.204 |
6 | 57.6 | 56.7 | 61.8 | 56.7 | 50.5 | 52.6 | 34.7 | 28.9 | 25.9 | 16.7 | 11.4 | 6.88 | 4.66 | 2.98 | 2.01 | 1.75 |
7 | 81.5 | 70.8 | 81 | 78.7 | 69.3 | 59.8 | 39.5 | 35.1 | 25.7 | 12.6 | 9.81 | 6.23 | 6.54 | 5.14 | 3.52 | 3.06 |
8 | 113 | 96.5 | 92.6 | 75.8 | 67.9 | 68.9 | 41.2 | 23.2 | 16.8 | 8.01 | 4.59 | 3.56 | 2.27 | 1.87 | 1.41 | 1.13 |
9 | 111 | 123 | 102 | 90.3 | 79.2 | 63.1 | 38.1 | 21.8 | 9.74 | 5.21 | 3.34 | 2.23 | 1.51 | 1.11 | 0.701 | 0.694 |
10 | 68.4 | 59.2 | 54.7 | 55.3 | 56.9 | 47.3 | 31 | 24 | 21.5 | 11.3 | 6.54 | 4.51 | 3.45 | 2.63 | 1.99 | 1.38 |
11 | 111 | 106 | 95.6 | 62.1 | 60.9 | 42.6 | 17.5 | 7.57 | 4.11 | 1.68 | 0.929 | 0.851 | 0.7 | 0.51 | 0.357 | 0.36 |
12 | 106 | 119 | 84.9 | 87.3 | 69.6 | 63.1 | 43.3 | 28.3 | 19.7 | 11.4 | 7.59 | 5.52 | 4.38 | 3.71 | 2.66 | 2.63 |
13 | 158 | 130 | 97.5 | 89.1 | 85.1 | 79.3 | 60.3 | 41.6 | 26 | 17.8 | 11.5 | 9.59 | 7.18 | 5.67 | 4.58 | 4.26 |
14 | 103 | 133 | 116 | 87.9 | 82.8 | 67 | 38.7 | 17.3 | 12 | 4.17 | 3.15 | 2.54 | 2.14 | 1.76 | 1.19 | 1.11 |
15 | 95.1 | 84.9 | 79.8 | 67.1 | 74.8 | 75.9 | 45.9 | 31.9 | 23.4 | 11.5 | 6.57 | 4.23 | 4.72 | 2.74 | 2.38 | 2.13 |
16 | 88.4 | 108 | 88 | 80.6 | 68.8 | 49.3 | 27.5 | 15.3 | 9.15 | 4.47 | 2.7 | 2.18 | 1.68 | 1.48 | 0.987 | 0.776 |
17 | 194 | 134 | 96.7 | 108 | 92.7 | 87.1 | 44.7 | 24.4 | 16.4 | 8.45 | 5.65 | 5.74 | 4.21 | 2.97 | 2.14 | 1.7 |
18 | 114 | 110 | 82.7 | 79.8 | 80.6 | 53.9 | 29.1 | 17.9 | 7.71 | 3.04 | 1.04 | 0.706 | 0.652 | 0.54 | 0.443 | 0.418 |
19 | 318 | 274 | 189 | 177 | 132 | 93 | 33.3 | 22.1 | 14.5 | 9.95 | 8 | 6.73 | 5.32 | 4.68 | 3.06 | 2.35 |
20 | 57.2 | 53.4 | 46.2 | 55 | 38 | 36.2 | 35.7 | 24.7 | 16.8 | 9.46 | 6.19 | 3.59 | 2.03 | 0.954 | 0.794 | 0.622 |
21 | 47.2 | 46.8 | 42.4 | 37.8 | 40.3 | 40.5 | 25.9 | 26 | 18.4 | 11 | 7.51 | 5.4 | 4.38 | 2.82 | 1.72 | 1.11 |
22 | 86.8 | 102 | 86.9 | 82.5 | 68.8 | 64.5 | 44.9 | 37.7 | 25.1 | 17.5 | 10.8 | 7.87 | 6.97 | 4.2 | 3.56 | 3.29 |
23 | 77.3 | 66.5 | 72.1 | 62.9 | 57 | 50.3 | 40 | 29.8 | 18.9 | 11.7 | 7.18 | 5.16 | 3.26 | 2.58 | 2.01 | 1.6 |
24 | 66.6 | 74.4 | 54.4 | 48.8 | 47.4 | 39.1 | 26.3 | 17.4 | 11.2 | 3.86 | 2.12 | 1.19 | 0.833 | 0.755 | 0.629 | 0.546 |
Geometric mean (μg/L) | 100 | 96.4 | 83.3 | 76.2 | 67 | 59.1 | 35.2 | 22.7 | 14.7 | 7.39 | 4.5 | 3.31 | 2.6 | 1.95 | 1.45 | 1.23 |
CV (%) | 52.3 | 44.6 | 35.6 | 36.4 | 28.6 | 26.4 | 26.4 | 34.7 | 41.5 | 55 | 59 | 59 | 62 | 60.9 | 62.1 | 65.9 |
We can see that now we have our summary statistics below the observations. Also, notice that we just provided the functions that we wanted to use as summary statistics, and listingtable did all the heavy lifting for us.
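To see what the two summary rows compute, the statistics can be written out by hand using only the Statistics standard library. This sketch assumes the usual definitions (geometric mean as the exponential of the mean log, coefficient of variation as the sample standard deviation divided by the mean), which to our understanding is what StatsBase.jl's geomean and variation follow; `geomean_manual` and `cv_percent` are hypothetical names:

```julia
using Statistics

# Geometric mean: exponential of the mean of the logs (requires positive values)
geomean_manual(x) = exp(mean(log, x))

# Coefficient of variation in percent: sample standard deviation over the mean
cv_percent(x) = 100 * std(x) / mean(x)

g = geomean_manual([1.0, 4.0])  # ≈ 2.0, since sqrt(1 * 4) = 2
cv = cv_percent([1.0, 3.0])     # ≈ 70.71, since std = sqrt(2) and mean = 2
```

The geometric mean is a common choice for concentration data because it is less sensitive to a few very large values than the arithmetic mean.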
4 📝 Summary tables
Summary tables are used to present summary statistics for a variable of interest in the dataset.
In SummaryTables.jl, we generate a summary table with the summarytable function, which takes the same positional arguments as listingtable:
- A table: our DataFrame.
- A variable: the variable from which the summary statistics will be computed.
In addition to that, we need to specify the summary keyword argument, which should be a Vector containing the summary functions that we want to use. Let’s create an example to see how that works.
For this case, we can turn our attention to the covariates included in our dataset. Since each subject should be counted only once, we need to filter our dataset to keep only one row per subject:
df_cov = unique(df_iv, :ID);
Now let’s create a table for the weight values. We will use the mean and standard deviation for our summary statistics, and we will include the number of subjects used in the calculation:
summarytable(
    df_cov,
    :WEIGHT => "Weight (kg)";
    summary = [mean => "Mean", std => "σ", length => "n"],
)
Weight (kg) | |
Mean | 66 |
σ | 8.17 |
n | 24 |
Notice that we used the pair syntax again, this time to specify the names used to refer to the summary functions.
That last table was useful, but it is somewhat simple. Let’s add a grouping variable using cols:
summarytable(
    df_cov,
    :WEIGHT => "Weight (kg)",
    summary = [mean => "Mean", std => "σ", length => "n"],
    cols = :GENDER => "Sex",
)
| Sex | |
| Female | Male |
Weight (kg) | | |
Mean | 60.3 | 69.4 |
σ | 7.27 | 6.75 |
n | 9 | 15 |
Now we get our summary statistics for weight, but separated by sex. We can also add a grouping factor related to age, so let’s compute a categorical variable from it:
age_median = median(df_cov.AGE);
@rtransform! df_cov :AGE_cat = :AGE < age_median ? "Younger" : "Older";
Here we divided subjects into two groups: “Younger” and “Older”. A subject is considered “Younger” if their age is less than the median. Otherwise, they fall into the “Older” category.
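One consequence of this rule is that subjects whose age equals the median are assigned to “Older”, so the two groups need not be the same size. A minimal sketch with made-up ages (the vector `ages` and the other names are illustrative assumptions):

```julia
using Statistics

ages = [41, 42, 44, 44, 46, 51]  # made-up ages in years
cutoff = median(ages)            # 44.0, the average of the two middle values

# Same rule as above: strictly below the median is "Younger", everything else "Older"
groups = [age < cutoff ? "Younger" : "Older" for age in ages]

n_younger = count(==("Younger"), groups)  # 2
n_older = count(==("Older"), groups)      # 4, including both subjects aged 44
```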
Now we can add this variable to the table. In this case, we will add it as a row:
summarytable(
    df_cov,
    :WEIGHT => "Weight (kg)",
    summary = [mean => "Mean", std => "σ", length => "n"],
    cols = :GENDER => "Sex",
    rows = :AGE_cat => "Age group",
)
| | Sex | |
| | Female | Male |
Age group | Weight (kg) | | |
Older | Mean | 58.2 | 72.2 |
| σ | 6.95 | 6.13 |
| n | 7 | 7 |
Younger | Mean | 67.4 | 67 |
| σ | 1.91 | 6.68 |
| n | 2 | 8 |
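Conceptually, each cell of the grouped table above is a statistic computed on one subgroup of rows. The same numbers can be sketched in long form with DataFrames.jl's `groupby` and `combine`. The table `toy` and its values are made up for illustration, and this is not a description of how summarytable is implemented:

```julia
using DataFrames
using Statistics

# Made-up covariate table: one row per subject
toy = DataFrame(
    GENDER = ["Female", "Female", "Male", "Male"],
    AGE_cat = ["Older", "Younger", "Older", "Older"],
    WEIGHT = [60.0, 64.0, 70.0, 74.0],
)

# One row per (AGE_cat, GENDER) combination that is present in the data
by_group = combine(
    groupby(toy, [:AGE_cat, :GENDER]),
    :WEIGHT => mean => :mean_weight,
    :WEIGHT => length => :n,
)
```

Here the "Older"/"Male" group has a mean weight of 72.0 with n = 2, while combinations that do not occur in the data (such as "Younger"/"Male") simply produce no row.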
One last interesting thing you might want to know is that you don’t necessarily have to use the functions provided by StatsBase.jl. In fact, summary accepts user-defined functions.
As an example, let’s create a function that counts how many subjects have a weight greater than 70 kg:
more_than_70(x) = count(>(70), x);
summarytable(
    df_cov,
    :WEIGHT => "Weight (kg)",
    summary = [mean => "Mean", std => "σ", length => "n", more_than_70 => " > 70 kg"],
    cols = :GENDER => "Sex",
    rows = :AGE_cat => "Age group",
)
| | Sex | |
| | Female | Male |
Age group | Weight (kg) | | |
Older | Mean | 58.2 | 72.2 |
| σ | 6.95 | 6.13 |
| n | 7 | 7 |
| > 70 kg | 0 | 5 |
Younger | Mean | 67.4 | 67 |
| σ | 1.91 | 6.68 |
| n | 2 | 8 |
| > 70 kg | 0 | 2 |
We encourage you to check our tutorial on functions to learn more about defining functions in Julia.
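If several thresholds are of interest, one option is a small function that builds the counting function for a given cutoff. The sketch below uses only Base Julia; `count_above`, `above_70`, and `above_80` are hypothetical names, and the weights are made up. Since each generated function maps a vector to a single number, it should be usable in summary in the same way as more_than_70 (for example `above_80 => "> 80 kg"`), though that particular call is not shown in this tutorial:

```julia
# Returns a function that counts how many values exceed `threshold`
count_above(threshold) = x -> count(>(threshold), x)

above_70 = count_above(70)
above_80 = count_above(80)

weights = [55.5, 68.7, 70.5, 84.1]  # made-up weights in kg

n70 = above_70(weights)  # 2 (70.5 and 84.1)
n80 = above_80(weights)  # 1 (84.1)
```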
5 🥇 Table one
SummaryTables.jl’s table_one function allows you to create tables following the commonly used “Table 1” format to summarize patient baseline characteristics.
table_one also takes two positional arguments:
- A table: in this case df_cov, which is a DataFrame.
- A Vector of variables.
Again, we’ll focus on describing our subjects using their covariates, so let’s generate a table with our continuous covariates:
table_one(df_cov, [:WEIGHT => "Weight (kg)", :AGE => "Age (years)"])
| Overall |
Weight (kg) | |
Mean (SD) | 66 (8.17) |
Median [Min, Max] | 67.8 [49.1, 84.1] |
Age (years) | |
Mean (SD) | 44.6 (3.11) |
Median [Min, Max] | 44 [41, 51] |
As you can see, table_one shows some interesting summary statistics such as the mean and the standard deviation for the variables that we specified. Let’s now try to generate a table for our discrete covariates:
table_one(df_cov, [:GENDER => "Sex", :AGE_cat => "Age group"])
| Overall |
Sex | |
Female | 9 (37.5%) |
Male | 15 (62.5%) |
Age group | |
Older | 14 (58.3%) |
Younger | 10 (41.7%) |
Notice that when we pass discrete variables we get back the number of occurrences and their corresponding percentages instead of the summary statistics from before.
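The counts and percentages for a discrete variable are straightforward to reproduce by hand. The sketch below rebuilds a vector with the 9 "Female" and 15 "Male" entries reported in the table above and tallies it with Base Julia only; the variable names are illustrative:

```julia
# 9 "Female" and 15 "Male" entries, matching the counts reported in the table above
sexes = vcat(fill("Female", 9), fill("Male", 15))

# Tally the occurrences of each level
counts = Dict{String,Int}()
for s in sexes
    counts[s] = get(counts, s, 0) + 1
end

# Percentage of the total for each level
percentages = Dict(level => 100 * n / length(sexes) for (level, n) in counts)
```

This gives 37.5 for "Female" and 62.5 for "Male", in agreement with the table.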
Lastly, we don’t have to separate continuous and discrete variables. You can include all of them in the same call and table_one will adjust its output accordingly:
table_one(df_cov, [
:AGE => "Age (years)", # A continuous covariate
:WEIGHT => "Weight (kg)",
:GENDER => "Sex", # A discrete covariate
])
| Overall |
Age (years) | |
Mean (SD) | 44.6 (3.11) |
Median [Min, Max] | 44 [41, 51] |
Weight (kg) | |
Mean (SD) | 66 (8.17) |
Median [Min, Max] | 67.8 [49.1, 84.1] |
Sex | |
Female | 9 (37.5%) |
Male | 15 (62.5%) |
Similarly to what we have been doing before, we can use grouping variables, which in this case are added through the groupby keyword argument:
table_one(
    df_cov,
    [:AGE => "Age (years)", :WEIGHT => "Weight (kg)"];
    groupby = [:AGE_cat => "Age group", :GENDER => "Sex"],
    show_n = true,
)
| | Age group | | | |
| | Older (n=14) | Older (n=14) | Younger (n=10) | Younger (n=10) |
| | Sex | | Sex | |
| Overall (n=24) | Female (n=7) | Male (n=7) | Female (n=2) | Male (n=8) |
Age (years) | |||||
Mean (SD) | 44.6 (3.11) | 46.6 (2.57) | 46.9 (2.04) | 41.5 (0.707) | 41.6 (0.744) |
Median [Min, Max] | 44 [41, 51] | 46 [44, 51] | 47 [44, 50] | 41.5 [41, 42] | 41.5 [41, 43] |
Weight (kg) | |||||
Mean (SD) | 66 (8.17) | 58.2 (6.95) | 72.2 (6.13) | 67.4 (1.91) | 67 (6.68) |
Median [Min, Max] | 67.8 [49.1, 84.1] | 55.5 [49.1, 68.7] | 70.5 [64.8, 84.1] | 67.4 [66, 68.7] | 67.8 [54.9, 76.4] |
We also set the show_n keyword argument to true in order to get the number of rows associated with the grouping factors.
table_one supports many other useful keyword arguments, including some related to hypothesis testing. We encourage you to check our documentation on table_one to learn more about the use of this function.
6 📤 Generating output
SummaryTables.jl currently supports exporting your results to HTML and LaTeX formats.
You can save your table into a file using show in the following way:
table = table_one(...) # Using table_one as an example
# Save in HTML format
open("table.html", "w") do io
show(io, MIME"text/html"(), table)
end
# Save as LaTeX code
open("table.tex", "w") do io
show(io, MIME"text/latex"(), table)
end
This will surely come in handy when trying to include the tables you generate with SummaryTables.jl in your publications and reports.
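The pattern above relies on Julia's general `show(io, mime, x)` mechanism, so it can be wrapped in a small helper. The sketch below is self-contained and uses a hypothetical `ToyTable` type with its own HTML `show` method as a stand-in for a real table; `save_as` is likewise a hypothetical helper and not part of SummaryTables.jl:

```julia
# Hypothetical stand-in for a table object that knows how to render itself as HTML
struct ToyTable
    label::String
end

Base.show(io::IO, ::MIME"text/html", t::ToyTable) =
    print(io, "<table><tr><td>", t.label, "</td></tr></table>")

# Hypothetical helper: write any object that supports the given MIME type to a file
function save_as(path::AbstractString, mime::MIME, obj)
    open(path, "w") do io
        show(io, mime, obj)
    end
    return path
end

# Write to a temporary directory and read the result back
path = save_as(joinpath(mktempdir(), "table.html"), MIME"text/html"(), ToyTable("demo"))
html = read(path, String)
```

With a real table, the same helper would be called with the table object and either MIME"text/html"() or MIME"text/latex"(), as in the calls shown above.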
7 Conclusions
We hope this has been helpful in introducing you to the different types of tables that you can generate with SummaryTables.jl. We covered listing tables, which are useful for displaying raw data, as well as summary tables, which are perfect for presenting summarized data. Lastly, we discussed how to create tables in the commonly used “Table 1” format.
Make sure to check our documentation on SummaryTables.jl for more information on the available keyword arguments for each of the functions covered here and other examples on the use of this package.