Generated quarto documents with QuartoTools

Author

Julius Krumbiegel

1 ::: blocks

Quarto has many features, such as figure blocks, complex layouts and interactive widgets like tabsets, that are controlled via Pandoc’s div markdown syntax. A Pandoc div is a block of markdown enclosed by an opening and a closing fence, each made of at least three colons (:::). It can be thought of as an abstract “region” in your document that quarto may treat specially depending on its annotations and content.

Let’s look at tabsets as an example. A tabset is an interactive HTML element that only shows one section at a time and can be used to visually declutter your document.

To define a tabset, you wrap a region of your .qmd file with a ::: block that has the class annotation {.panel-tabset}. Inside that block, you place sections that are separated by level 2 headings. These sections are converted into tabs in the final HTML output.

::: {.panel-tabset}

## For-loop in Julia

```julia
for i in 1:10
    println(i)
end
```

## For-loop in Python

```python
for i in range(1, 11):
    print(i)
```

:::

2 Limitations of ::: blocks

There are a few problems with ::: blocks that limit their usefulness in practice.

The more :::s you have, the harder it is to tell where one block ends and the next one begins, especially if they are nested. Your .qmd files become more difficult to read and it’s easy to introduce accidental syntax errors which can be painful to debug.

In addition, the structures you create using ::: are hardcoded into the document. That means they cannot react to your data. For example, it could be useful to show one tab with a diagnostic plot for each subject in a population. But it gets really tedious to copy-paste twenty almost identical code blocks for your twenty subjects. And when you then switch datasets and your number of subjects changes, you have to redo everything.

3 Generating output

So we need a way to generate parts of our quarto document programmatically. A simple way is by printing things in a cell with the output: asis option.

The following Julia code generates a markdown heading and paragraph.

```{julia}
#| echo: false
#| output: asis
println("### A generated subsection")
println("And some generated section content.")
```

3.1 A generated subsection

And some generated section content.

But this gets much more difficult if the sections you want to generate consist not only of text but also of plots, tables and other assets. You would have to store these assets manually in some temporary location first and splice them into the document from there. That is much more work than the normal inline display mechanism, where you simply return values and they are displayed appropriately.
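For text-only content, the printing approach does extend to data-dependent structures. The following is a minimal sketch, using only Base Julia, of a helper that builds tabset markdown for a number of subjects known only at runtime. The name tabset_markdown and the placeholder tab bodies are assumptions made for illustration; they are not part of quarto or QuartoTools. In a real document, the final print call would sit in a cell with output: asis.

```julia
# Hypothetical helper (not part of quarto or QuartoTools): builds the markdown
# source of a tabset from a vector of `title => body` pairs.
function tabset_markdown(tabs)
    io = IOBuffer()
    println(io, "::: {.panel-tabset}")
    println(io)
    for (title, body) in tabs
        println(io, "## ", title)   # each level 2 heading starts a new tab
        println(io)
        println(io, body)
        println(io)
    end
    println(io, ":::")
    return String(take!(io))
end

# The number of tabs is only known at runtime.
n_subjects = rand(5:10)
md = tabset_markdown(["Subject $i" => "Text for subject $i." for i in 1:n_subjects])

# In an `output: asis` cell, printing this string would emit the tabset.
print(md)
```

This works because every tab body is a plain string. As soon as a tab should contain a plot or a table, the string-building approach runs into the asset-handling problem described above.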

4 Cell expansion

We wanted a composable, value-based mechanism instead, one that ties into Julia’s normal display system and gives users full control over their .qmd outputs while being easy to use. The result is QuartoNotebookRunner.jl’s cell expansion mechanism, plus some convenience tools built on top of it which we’ve added to QuartoTools.

Let’s look at a high-level example first and then explore the mechanism in more detail.

Returning to our tabset example, here’s how you could realize a tabset showing plots for a dynamic number of subjects that is only known at runtime. To emphasize the runtime aspect, we simply pick a random number between 5 and 10. The QuartoTools.Tabset object does all the heavy lifting for us:

```julia
using PharmaDatasets
using NCAUtilities
using NCA
using QuartoTools: Tabset

dapa_IV = let
    df = dataset("nca/dapa_IV")
    df[!, :TIME] .= float.(df[!, :TIME])
    df[!, :CObs] .= float.(df[!, :CObs])
    df[!, :route] .= "iv"
    read_nca(
        df;
        id = :ID,
        time = :TIME,
        observations = :CObs,
        amt = :AMT_IV,
        route = :route,
        llq = 0,
    )
end

n_subjects = rand(5:10)
subpopulation = dapa_IV[1:n_subjects]

Tabset([
    "Subject $(subject.id)" => observations_vs_time(subject) for subject in subpopulation
])
```

Returning a single Tabset object with a vector of elements generates all of the ::: blocks we need under the hood. But how does this work, and how can we use this mechanism to solve other custom document generation problems?

5 Expansion mechanism

When QuartoNotebookRunner executes the code blocks of a .qmd document, it checks if each block’s return value is expandable. A value that is expandable has a special method that can turn it into a vector of “cells”, where each cell is an object that describes a quarto code block with some source code, cell options and a return value.
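To make the idea concrete, here is a toy model of such an expansion loop in plain Julia. All names in it (ToyCell, ToyTabset, expand, render) are hypothetical and chosen for illustration; this is not the actual QuartoNotebookRunner or QuartoTools implementation, and it leaves out the source code part of a cell. The sketch only shows the shape of the mechanism: an expandable value turns into a vector of cells, and any cell value that is itself expandable is expanded again.

```julia
# Toy model only: none of these names belong to QuartoNotebookRunner or QuartoTools.

# A cell carries a return value and some cell options.
struct ToyCell
    value::Any
    options::Dict{String,String}
end
ToyCell(value; options = Dict{String,String}()) = ToyCell(value, options)

# An expandable container of `title => content` pairs.
struct ToyTabset
    tabs::Vector
end

# Fallback: ordinary values are not expandable.
expand(::Any) = nothing

# A tabset expands into fence lines, headings and one cell per tab content.
function expand(t::ToyTabset)
    asis = Dict("output" => "asis")
    cells = [ToyCell("::: {.panel-tabset}"; options = asis)]
    for (title, content) in t.tabs
        push!(cells, ToyCell("## $title"; options = asis))
        push!(cells, ToyCell(content))
    end
    push!(cells, ToyCell(":::"; options = asis))
    return cells
end

# Toy "runner": keep expanding until only plain values remain.
function render(value)
    cells = expand(value)
    cells === nothing && return Any[value]
    out = Any[]
    for cell in cells
        append!(out, render(cell.value))
    end
    return out
end

nested = ToyTabset(["A" => 1, "B" => ToyTabset(["B1" => "x"])])
result = render(nested)
foreach(println, result)
```

Here the nested tabset ends up as a flat sequence of fence lines, headings and plain values, which mirrors how nested ::: blocks would appear in a hand-written document.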

QuartoTools has the Cell object which conforms exactly to that interface. Here’s what happens when we return a Cell in a code block:

```julia
using QuartoTools: Cell

plot = observations_vs_time(dapa_IV[1])

plot_cell = Cell(plot, code = """
    # This code here did not actually run. That could be useful
    # if you wanted to show less or other code than actually ran
    # for didactic reasons. The displayed plot was directly
    # passed as a value to `Cell`.

    observations_vs_time(dapa_IV[1])""")
```

The purpose of Cell is to create an output that acts as if we had executed a normal code block there, which is why the code parameter exists. But that code doesn’t actually get run, because we don’t want to do code generation here. We just execute normal Julia functions, return some values and display those. If we don’t need code to be displayed, we can simply omit that keyword:

```julia
Cell(plot)
```

The third part of the Cell interface is the cell options. These are spliced into the intermediate markdown output from which quarto renders the final output. For example, we could create a Cell that acts like a code block in which output: asis was set and that prints some text, which quarto will therefore treat as markdown.

```julia
markdown_cell = Cell(
    () -> println("This is _markdown_ printed from an `output: asis` cell."),
    options = Dict("output" => "asis"),
)
```

This is markdown printed from an output: asis cell.

The above examples might seem contrived because we don’t really gain anything by using Cell compared to just returning a plot object directly, or writing the printing code into the original code block itself. The value comes from composing these building blocks with each other. We have stored two cells in variables so let’s now make a Tabset with them. A cell that results from expansion may also return an expandable value, so we can use this recursive behavior for nesting:

```julia
tabset = Tabset(["Plot Cell" => plot_cell, "Markdown cell" => markdown_cell])
```

Whenever quarto allows us to nest ::: blocks in the markdown code, we can achieve the same using nested expandable objects. For example, we could make a function that creates a tabset for a single subject, and then make a parent tabset for the subpopulation:

```julia
function subtabset(subject)
    Tabset([
        "obs vs. time" => observations_vs_time(subject),
        "subject fit" => subject_fits(subject),
    ])
end

Tabset(["Subject $(subj.id)" => subtabset(subj) for subj in subpopulation])
```

The Div object simply wraps its content objects in ::: fences. It takes three optional keyword arguments, id, class and attributes, which correspond to the annotation options described in the quarto docs. A block with an id such as ::: {#fig-somefigure} corresponds to Div(..., id = "fig-somefigure"), a block with a class such as ::: {.lightbox} corresponds to Div(..., class = "lightbox"), and a block with an attribute like ::: {layout-ncol=2} corresponds to Div(..., attributes = Dict("layout-ncol" => "2")). You can also pass multiple ids or classes as vectors.

Note how in the following example, we access one of quarto’s column layout options with the layout-ncol attribute. The MarkdownCell is another convenience object that acts like a snippet of raw markdown code in the source document.

```julia
using QuartoTools: Div, MarkdownCell

Div(
    [
        MarkdownCell("""
        To the right, you can see a `summary_observations_vs_time` plot
        of the whole `dapa_IV` population with log axis and `show_subjects` enabled.
        """),
        summary_observations_vs_time(
            dapa_IV,
            show_subjects = true,
            axis = (; yscale = log10),
            figure = (; size = (300, 300)),
        ),
    ],
    attributes = Dict("layout-ncol" => "2"),
)
```

To the right, you can see a summary_observations_vs_time plot of the whole dapa_IV population with log axis and show_subjects enabled.
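The correspondence between the three keyword arguments and the fence annotation can be illustrated with a small string-building function. This is a sketch in plain Julia under the assumption that only simple, space-free values are used; div_annotation and as_vector are hypothetical helpers and do not reflect how Div is implemented in QuartoTools.

```julia
# Hypothetical helpers for illustration only, not the QuartoTools implementation.

# Accept either a single string or a vector of strings.
as_vector(x::AbstractString) = [x]
as_vector(x::AbstractVector) = x

# Build the opening fence of a Pandoc div from ids, classes and attributes.
function div_annotation(; id = nothing, class = nothing, attributes = Dict{String,String}())
    parts = String[]
    id === nothing || append!(parts, "#" .* as_vector(id))        # ids get a `#` prefix
    class === nothing || append!(parts, "." .* as_vector(class))  # classes get a `.` prefix
    for key in sort(collect(keys(attributes)))                    # sorted for stable output
        push!(parts, string(key, "=", attributes[key]))
    end
    return "::: {" * join(parts, " ") * "}"
end

println(div_annotation(id = "fig-somefigure"))
println(div_annotation(class = "lightbox"))
println(div_annotation(attributes = Dict("layout-ncol" => "2")))
println(div_annotation(id = "fig-a", class = ["lightbox", "column-page"]))
```

The first three calls reproduce the fence lines quoted in the text; the last one shows how several annotations would be combined on one fence.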

The Expand object takes a vector of objects and acts as if those were added one after another to the source document without an enclosing div. In this example, we insert a sequence of three divs that contain plots as quarto figures (by giving them an id that starts with fig-).

```julia
using QuartoTools: Expand

function figure_div(subj)
    Div(
        [subject_fits(subj), MarkdownCell("This figure shows subject $(subj.id)")],
        id = "fig-subject-$(subj.id)",
    )
end

Expand([figure_div(subj) for subj in dapa_IV[1:3]])
```

Figure 1: This figure shows subject 1

Figure 2: This figure shows subject 2

Figure 3: This figure shows subject 3

6 Lazy vs eager cell values

All of QuartoTools’ expandable objects take either normal values or functions as inputs. Passing functions allows you to lazily instantiate their return values at the moment of rendering the cell, which means you do not have to store all result values in memory. This might allow you to save RAM for very large documents with many assets:

```julia
function func_making_a_plot()
    # this will only get called when rendering the cell, so the resulting
    # Makie Figure doesn't have to stay in memory for the rest of the run
    summary_observations_vs_time(dapa_IV)
end

Cell(func_making_a_plot)
```
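The difference between the two styles can be seen without any plotting package. The following sketch uses a call counter to show when the work happens. The names make_asset and resolve are hypothetical stand-ins: resolve only mimics the idea that a function input is called at render time and is not a QuartoTools function.

```julia
# Counts how often the expensive function has run.
calls = Ref(0)

# Stand-in for an expensive asset such as a plot.
function make_asset()
    calls[] += 1
    return zeros(1000)
end

# Hypothetical stand-in for the render step: plain values pass through,
# functions are called to produce their value.
resolve(x) = x
resolve(f::Function) = f()

eager = make_asset()   # eager: computed now and kept alive by `eager`
lazy = make_asset      # lazy: only a reference to the function, nothing computed
println("calls after setup: ", calls[])

value = resolve(lazy)  # the work happens here, at "render" time
println("calls after resolving the lazy input: ", calls[])
```

With the lazy form, the asset only exists from the moment it is resolved, so a document with many such inputs does not need to hold all of them in memory at once.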