SciML using DeepPumas

Authors

Niklas Korsbo

Mohamed Tarek

1 Learning objectives

  • Explore the use of machine learning models, such as multi-layer perceptrons (MLPs), to model time-series data.
  • Understand the concept of neural ordinary differential equations (NODEs) and their application in modeling dynamic systems with doses.
  • Learn how to encode domain knowledge into models using universal differential equations (UDEs).
  • Understand the limitations of models without random effects in capturing individual variability.

2 Setup

First, we need to load the necessary packages. We also set the backend for the MLPDomain to use :staticflux. DeepPumas supports multiple backends for the MLPDomain. :staticflux is a good choice for small neural networks which we will use in this tutorial.

using DeepPumas
using CairoMakie
using StableRNGs
set_mlp_backend(:staticflux)

3 Synthetic data and true model

The following Pumas model is designed to simulate pharmacokinetic (PK) and pharmacodynamic (PD) data. It assumes a linear one-compartment PK model, oral dosing, and a non-linear elimination PD model, serving as the basis for generating synthetic data.

data_model = @model begin
    @param begin
        Ka  RealDomain(; lower = 0) # absorption rate constant
        CL  RealDomain(; lower = 0) # clearance
        Vc  RealDomain(; lower = 0) # volume of distribution
        Smax  RealDomain(; lower = 0) # maximum effect
        n  RealDomain(; lower = 0) # Hill coefficient
        SC50  RealDomain(; lower = 0) # concentration at half-maximal effect
        Kout  RealDomain(; lower = 0) # elimination rate constant
        Kin  RealDomain(; lower = 0) # input rate constant
        σ  RealDomain(; lower = 0) # proportional error standard deviation
    end
    @init begin
        Response = Kin / Kout # initial response
    end
    @vars begin
        cp := max(Central / Vc, 0.0) # plasma concentration
        EFF := Smax * cp^n / (SC50^n + cp^n) # effect function
    end
    @dynamics begin
        Depot' = -Ka * Depot
        Central' = Ka * Depot - (CL / Vc) * Central
        Response' = Kin * (1 + EFF) - Kout * Response # response dynamics
    end
    @derived begin
        Outcome ~ @. Normal(Response, abs(Response) * σ)
    end
end

true_parameters = (;
    Ka = 0.5,
    CL = 1.0,
    Vc = 1.0,
    Smax = 2.9,
    n = 1.5,
    SC50 = 0.05,
    Kout = 2.2,
    Kin = 0.8,
    σ = 0.02, ## <-- tune the observational noise of the data here
)

We will use this model to simulate data for two subjects, A and B, with different dosage regimens. Subject A will receive a dose of 1.0 with an additional dose after 5 hours, while Subject B will receive a lower dose of 0.1 with the same regimen.

We use the simobs function to simulate observations for each subject based on the defined model and true parameters. The obstimes argument specifies the time points at which observations are made, and we use a stable random number generator (StableRNG) to ensure reproducibility.

dr_a = DosageRegimen(1.0, addl = 1, ii = 5)
subj_a = Subject(; events = dr_a, id = "Subject A")
sim_a = simobs(data_model, subj_a, true_parameters; obstimes = 0:0.5:15, rng = StableRNG(1))
data_a = [Subject(sim_a)]

dr_b = DosageRegimen(0.1, addl = 1, ii = 5)
subj_b = Subject(; events = dr_b, id = "Subject B")
sim_b = simobs(data_model, subj_b, true_parameters; obstimes = 0:0.5:15, rng = StableRNG(2))
data_b = [Subject(sim_b)]

We can visualize the simulated data for both subjects using plotgrid, which provides a convenient way to plot multiple subjects’ data in a grid layout.

plotgrid(data_a; data = (; label = "Data (subject A)"))
plotgrid!(data_b; data = (; label = "Data (subject B)"), color = :gray)

We overlay the true model predictions on the data for both subjects. This is the ground truth underlying function that we will try to recover using scientific machine learning. To get the predictions, we use the predict function with the data_model and the true_parameters. The obstimes argument specifies the time points at which we want to evaluate the predictions. Note that we are predicting at more time points than the original data to get a smoother curve.

pred_datamodel_a = predict(data_model, data_a, true_parameters; obstimes = 0:0.01:15)
pred_datamodel_b = predict(data_model, data_b, true_parameters; obstimes = 0:0.01:15)
plotgrid(pred_datamodel_a; ipred = false)
plotgrid!(pred_datamodel_b; data = true, ipred = false)

4 Identification of dynamics using machine learning

One way to identify the dynamics of a system is to use a neural network to model the relationship between time and the observed outcomes. In this section, we will explore how to use a simple neural network function of time to model the dynamics of our system.

The following code defines a model that uses a multi-layer perceptron (MLP) mlp that takes time as input and outputs a predicted outcome, which we will then fit to the observed data. The MLP is defined with 1 input, 1 output, and 2 hidden layers of 6 nodes each. We also include a regularization term to lower the chances of overfitting but avoid regularizing the output layer, as we want the output to be on the same scale as the observed data. This model is not a SciML model, but rather a machine learning model that maps time to a noisy outcome.

time_model = @model begin
    @param begin
        mlp  MLPDomain(1, 6, 6, (1, identity); reg = L2(1.0; output = false))
        σ  RealDomain(; lower = 0.0)
    end
    @derived begin
        nn_output := first.(mlp.(t))
        Outcome ~ @. Normal(nn_output, σ)
    end
end

The @derived block specifies how the outcome is derived from the MLP output, using the first element of the MLP output for each time point. In the @derived block:

  • t is a vector of all time points for which a subject had observations.
  • mlp.(t) applies the mlp function to each element of t. (the . “broadcasts” the function over all elements instead of using the vector directly)
  • first gets the first element of the mlp output (the output is a 1-element vector)

To fit this model to the data, we first need to remove the dose information from the subjects, as this simple model does not account for dosing. We create new subjects without events (doses) for both Subject A and Subject B.

data_a_no_dose = Subject.(data_a; events = nothing)
data_b_no_dose = Subject.(data_b; events = nothing)

We then call the fit function to fit the model to the data for Subject A, using the init_params function to initialize the parameters of the model. We use the MAP(NaivePooled()) method to fit the model. MAP stands for Maximum A Posteriori estimation, and NaivePooled is used because the model does not have random effects. The reason MAP is required is that under the hood, the regularization is implemented as a prior distribution on the parameters. Without MAP, the regularization would not be applied.

fpm_time = fit(time_model, data_a_no_dose, init_params(time_model), MAP(NaivePooled()))

Following the fit, we can visualize the predictions of the model for Subject A. The predict function is used to generate predictions based on the fitted model, and we specify the observation times at which we want to evaluate the predictions.

First, we plot the predictions for Subject A, overlaying them on the observed data.

pred_a = predict(fpm_time; obstimes = 0:0.1:15);
plotgrid(
    pred_a;
    pred = (; label = "Pred (subject A)"),
    data = (; label = "Data (subject A)", color = :gray),
    ipred = false,
)

Next, we can also visualize the predictions for Subject B using the same fitted model.

pred_b = predict(fpm_time, data_b_no_dose; obstimes = 0:0.1:15);
plotgrid!(
    pred_b,
    pred = (; label = "Pred (subject B)", color = :red),
    data = (; label = "Data (subject B)", color = :gray),
    ipred = false,
)

Note that they have the exact same prediction! This is expected since the model does not account for dosing and only uses time as input. The model has learned a function of time that fits the data, but it does not capture the underlying pharmacokinetic and pharmacodynamic processes.

5 Neural ordinary differential equations (NODEs)

For simple dosing regimens, we could have input the dose amount as an additional input to the neural network above. However, this approach does not scale well to more complex dosing regimens, or when we have multiple subjects with different dosing patterns.

To account for the dosing and the underlying dynamics of the system, we can use a neural ordinary differential equation (NODE). This approach allows us to model the dynamics of the system using dynamic compartments and their interactions, while still leveraging the power of neural networks to learn complex relationships. This approach is generally more suitable for PKPD modeling, because it allows us to assign one of the compartments to be the Depot compartment, which is where the drug is administered, and one to be the Central compartment, which is where the drug is distributed and metabolized, and one to be the Response compartment, which is where the effect of the drug is observed.

neural_ode_model = @model begin
    @param begin
        mlp  MLPDomain(3, 6, 6, (3, identity); reg = L2(1.0)) # neural network with 3 inputs and 1 output
        Response₀  RealDomain(; lower = 0) # initial value of response
        σ  RealDomain(; lower = 0) # proportional error standard deviation
    end
    @init Response = Response₀
    @dynamics begin
        Depot' = mlp(Depot, Central, Response)[1]
        Central' = mlp(Depot, Central, Response)[2]
        Response' = mlp(Depot, Central, Response)[3]
    end
    @derived begin
        Outcome ~ @. Normal(Response, abs(Response) * σ)
    end
end

Note that we still utilize a neural network to model the rates of change of the compartments, but any dosing regimen can now be administered to the Depot compartment. Additionally, the same neural network is used to model the rates of change of all three compartments. The first output of the neural network is used for the Depot compartment, the second output for the Central compartment, and the third output for the Response compartment.

For simplicity, no PK data is used in this model which means that the second compartment’s dynamics is not identifiable and its meaning as the Central compartment will generally not be respected! This is an important point to keep in mind when using NODEs. The meaning you assign to a compartment is not enforced by the model unless you tie a compartment to a specific observed variable that is used for fitting. This is also true to an extent for the Depot compartment despite the fact that it is the only compartment that receives a dose directly. But because the rate of change is a general neural network, there is no guarantee that the Depot compartment will be a decreasing function of time between doses, or that the amount there is only that of the actual dose given.

Next, we fit the model to the data for Subject A. We use the fit function with the init_params function to initialize the parameters of the model. init_params will initialize mlp to use random parameter values that are sampled once when a model is defined. This means that subsequent calls to init_params will return the same parameters. This is useful for reproducibility. sample_params can be used instead to sample new random parameters each time it is called, which is useful for exploring the parameter space and finding good initial values for the optimization. We use MAP(NaivePooled()) again to fit the model.

fpm_node = fit(neural_ode_model, data_a, init_params(neural_ode_model), MAP(NaivePooled()))

After fitting the model, we can visualize the predictions for subjects A and B.

pred_a = predict(fpm_node; obstimes = 0:0.01:15)
plotgrid(
    pred_a;
    pred = (; label = "Pred (subject A)"),
    ipred = false,
    data = (; label = "Data (subject A)", color = :gray),
)

pred_b = predict(fpm_node, data_b; obstimes = 0:0.01:15)
plotgrid!(
    pred_b,
    pred = (; label = "Pred (subject B)", color = :red),
    data = (; label = "Data (subject B)", color = :gray),
    ipred = false,
)

We can see that the predictions are different unlike the previous model, which only used time as input. However, the model is clearly not able to predict the data of Subject B very well, indicating that it was unable to learn a generalizable model that captures the underlying dynamics of the system and how these dynamics depend on the dose level.

Perhaps, this is not surprising since the model is only fitted on a single dose instance. Machine learning needs more data to learn generalizable patterns. So despite the model’s ability to learn the dynamics of Subject A, it does not generalize well to Subject B.

6 Universal differential equation (UDE)

The NODE model used previously did not encode any domain knowledge about the system, beside administering drug to a specific compartment. To improve the model’s ability to generalize across subjects, we can encode more domain knowledge into the model. This is where the concept of Universal Differential Equations (UDEs) comes into play.

Below is a model that uses a traditional one compartment PK model which is known to be the correct data-generating PK model. But beside the specific functional form, this PK model encodes the knowledge that the drug is conserved between the Depot and Central compartments. This means that the drug that enters the Central compartment is equal to the drug that leaves the Depot compartment, minus the drug that is cleared from the Central compartment. This mass conservation property ensures a monotonic decreasing behavior of Depot between doses, and ensures a behavior of Central that is consistent with our understanding and meaning assigned to this compartment. Note that the specific dynamics of the compartments may still not be identifiable from the PD data only. However, we have constrained the set of plausible models to ones that are more biologically reasonable.

ude_model = @model begin
    @param begin
        mlp  MLPDomain(2, 6, 6, (1, identity); reg = L2(0.1)) # neural network with 2 inputs and 1 output
        Ka  RealDomain(; lower = 0)
        CL  RealDomain(; lower = 0)
        Vc  RealDomain(; lower = 0)
        Response₀  RealDomain(; lower = 0)
        σ  RealDomain(; lower = 0) # proportional error standard deviation
    end
    @init Response = Response₀
    @dynamics begin
        Depot' = -Ka * Depot # known
        Central' = Ka * Depot - (CL / Vc) * Central # known
        Response' = mlp(Central / Vc / 100, Response / 10)[1]
    end
    @derived begin
        Outcome ~ @. Normal(Response, abs(Response) * σ)
    end
end

In this model, the Response compartment is the only part that is still modelled using a neural network. The neural network takes 2 inputs:

  1. The Central compartment concentration divided by the volume of distribution (Vc) and divided further by 100, and
  2. The current value of the Response compartment (R) divided by 10.

The division by 100 and 10 ensures that the scales of the inputs are small at the beginning of the fit. At the beginning of a fit, the dynamics will not be well behaved and may lead to unreasonably high values for Central or Response. This can cause fitting issues by saturating the neural network’s activation function (tanh), leading to non-informative low gradients and poor convergence. By scaling the inputs, we can avoid this issue and allow the model to learn more effectively. The default activation function of the MLP is tanh, which saturates for large inputs. Alternatively, we could have used softplus as the activation function, which does not saturate for large inputs. However, for consistency with the previous models and for the smoothness of the tanh function, we will stick with tanh for this example.

Next, we fit the model to the data for Subject A and make the same predictions for both subjects A and B as before.

fpm_ude = fit(ude_model, data_a, init_params(ude_model), MAP(NaivePooled()))
pred_a = predict(fpm_ude; obstimes = 0:0.1:15);
plotgrid(
    pred_a;
    pred = (; label = "Pred (subject A)"),
    ipred = false,
    data = (; label = "Data (subject A)", color = :gray),
)

pred_b = predict(ude_model, data_b, coef(fpm_ude); obstimes = 0:0.1:10);
plotgrid!(
    pred_b,
    pred = (; label = "Pred (subject B)", color = :red),
    data = (; label = "Data (subject B)", color = :gray),
    ipred = false,
)

Note that the model is able to generalize better to Subject B compared to the previous NODE model, despite just seeing a single dose level in the training data. In general, encoding more domain knowledge into a model can help improve its ability to generalize and data efficiency. Data efficiency refers to the model’s ability to learn from a limited amount of data, which is particularly important in pharmacometrics where data can be scarce or expensive to obtain.

7 Encoding more domain knowledge

We can go further by encoding knowledge about the functional form of the rate of change of the Response compartment. For example, an additional constraint we can add is that when there is no drug, the Response compartment should follow the classical turnover model:

Response' = Kin - Kout * Response

We also initialize the Response compartment to be the ratio of the input and output rate constants which is the steady state value of the Response compartment when there is no drug. This is a common practice in pharmacometrics.

We achieve this using the following model.

ude_model_knowledge = @model begin
    @param begin
        mlp  MLPDomain(1, 6, 6, (1, identity); reg = L2(0.1), bias = false) # neural network with 1 input and 1 output and no bias/intercept
        Ka  RealDomain(; lower = 0)
        CL  RealDomain(; lower = 0)
        Vc  RealDomain(; lower = 0)
        Kout  RealDomain(; lower = 0)
        Kin  RealDomain(; lower = 0)
        σ  RealDomain(; lower = 0) # proportional error standard deviation
    end
    @init Response = Kin / Kout
    @dynamics begin
        Depot' = -Ka * Depot
        Central' = Ka * Depot - (CL / Vc) * Central
        Response' = Kin * (1 + mlp(Central / Vc)[1]) - Kout * Response
    end
    @derived begin
        Outcome ~ @. Normal(Response, abs(Response) * σ)
    end
end

Note that to ensure the output of the neural network is 0 when its input is 0, we avoid using bias terms in the neural network, hence the bias = false in the MLPDomain definition. Also because this model is much more constrained and well-behaved, we drop the scaling of the input to the neural network which was unnecessary.

Next, we fit and predict using the new model with the same data for Subject A and B.

fpm_knowledge =
    fit(ude_model_knowledge, data_a, init_params(ude_model_knowledge), MAP(NaivePooled());)
pred_a = predict(fpm_knowledge; obstimes = 0:0.1:15);
plotgrid(
    pred_a;
    ipred = false,
    data = (; label = "Data (subject A)", color = (:black, 0.5)),
    pred = (; label = "Pred (subject A)"),
    legend = (; orientation = :horizontal, nbanks = 2),
)

pred_b = predict(ude_model_knowledge, data_b, coef(fpm_knowledge); obstimes = 0:0.1:10);
plotgrid!(
    pred_b;
    ipred = false,
    data = (; label = "Data (subject B)", color = :black),
    pred = (; label = "Pred (subject B)", color = :red),
)

Note how the prediction now takes the correct function shape but it is still not able to predict the scale of the response of Subject B very well. To further appreciate this, we overlay the true model predictions on the same plot.

plotgrid!(
    pred_datamodel_a;
    pred = (; color = (:black, 0.4), label = "Datamodel"),
    ipred = false,
)
plotgrid!(
    pred_datamodel_b;
    pred = (; color = (:black, 0.4), label = "Datamodel"),
    ipred = false,
)

Notice that the steady-state behavior of the model prediction and the true underlying function is the same. Compare the prediction here from that of the first black-box NODE model. While the UDE model prediction is not perfect for Subject B, it is still impressive given that only a single subject was used in the training data. So let’s see what happens when we have more subjects in the training data.

8 Two training subjects

To demonstrate the effect of adding more data, let’s add one more subject with different dose level to the training data. The new subject, Subject C, is given a dose amount of 0.5. Given that Subject B has a dose level of 0.1, by using subjects A and C to fit the model and predicting for Subject B, we are still extrapolating the model to a dose level that is lower than the training data.

First, let’s define the new subject and simulate its data using the same true_parameters as before.

dr_c = DosageRegimen(0.5, addl = 1, ii = 5)
subj_c = Subject(; events = dr_c, id = "Subject C")
sim_c = simobs(data_model, subj_c, true_parameters; obstimes = 0:0.5:15, rng = StableRNG(3))

Next, we fit the UDE model to the data for Subjects A and C, using the same fitting procedure as before, and we predict for Subject B.

data_ac = [Subject(sim_a), Subject(sim_c)]
fpm_knowledge =
    fit(ude_model_knowledge, data_ac, init_params(ude_model_knowledge), MAP(NaivePooled());)
pred_b = predict(ude_model_knowledge, data_b, coef(fpm_knowledge); obstimes = 0:0.1:10);
plotgrid(
    pred_b;
    ipred = false,
    data = (; label = "Data (subject B)", color = :black),
    pred = (; label = "Pred (subject B)", color = :red),
)

Much better!

9 Sparse data

Let’s repeat the same analysis above but with sparsely sampled data for the training data. To compensate for the data sparsity per subject, 4 subjects will be used for training the model. This is a more realistic scenario where we often have sparse data for each subject, which can make it difficult to fit the model.

We sample 4 subjects, each with a different dose level, and simulate their data using the same true_parameters as before. The doses are chosen to be 1.0, 0.8, 0.6, and 0.4. The same Subject B from before will then be used for prediction.

doses = [1.0, 0.8, 0.6, 0.4]
sparse_training_data = map(1:4) do i
    dr = DosageRegimen(doses[i], addl = 1, ii = 5)
    subj = Subject(; events = dr, id = "Sparse subject $i")
    step_size = rand(StableRNG(2i), Uniform(4, 6)) # random step size between 4 and 6 hours
    sim = simobs(
        data_model,
        subj,
        true_parameters;
        obstimes = 0:step_size:15,
        rng = StableRNG(i),
    )
    return Subject(sim)
end

With the above sampling procedure, each subject has 3 to 4 observations. Next, we fit the model to the sparse data and predict for Subject B as before. Note that Subject B has a dose level that is not in the training data.

fpm_knowledge = fit(
    ude_model_knowledge,
    sparse_training_data,
    init_params(ude_model_knowledge),
    MAP(NaivePooled());
)
pred_b = predict(ude_model_knowledge, data_b, coef(fpm_knowledge); obstimes = 0:0.1:10);
plotgrid(
    pred_b;
    ipred = false,
    data = (; label = "Data (subject B)", color = :black),
    pred = (; label = "Pred (subject B)", color = :red),
)

The model is still able to predict the data for Subject B reasonably well, despite the sparse data and the fact that Subject B has a dose level that is not in the training data. This demonstrates the power of UDEs in modeling complex systems with limited data.

10 Multiple subjects pooled

The previous analysis demonstrated the ability of a UDE model to extrapolate to new dose levels with very limited training data. However, a key assumption was made in the data generating process so far, which is that all subjects were assumed to have the same true_parameters. So the only sources of variability between subjects are the covariates and residual noise. This assumption is not realistic in practice. In reality, subjects have different individual parameters, e.g. Ka and CL. In the next analysis, we will try to understand how our UDE model behaves once we relax this assumption in the data generating process without changing the fitting model.

To simulate multiple subjects that have different individual PK and PD parameters, we use a nonlinear mixed effects model (NLME) with a @random block to define random effects for the PK and PD parameters. The model is otherwise identical to the previous data generating model.

data_model_heterogeneous = @model begin
    @param begin
        Ka  RealDomain(; lower = 0)
        CL  RealDomain(; lower = 0)
        Vc  RealDomain(; lower = 0)
        Smax  RealDomain(; lower = 0)
        n  RealDomain(; lower = 0)
        SC50  RealDomain(; lower = 0)
        Kout  RealDomain(; lower = 0)
        Kin  RealDomain(; lower = 0)
        σ  RealDomain(; lower = 0)
    end
    @random begin
        η ~ MvNormal(0.04 * I(5))
    end
    @pre begin
        Smaxᵢ = Smax * exp(η[1])
        SC50ᵢ = SC50 * exp(η[2])
        Kaᵢ = Ka * exp(η[3])
        Vcᵢ = Vc * exp(η[4])
        Koutᵢ = Kout * exp(η[5])
    end
    @init begin
        R = Kin / Koutᵢ
    end
    @vars begin
        cp := max(Central / Vcᵢ, 0.0)
        EFF := Smaxᵢ * cp^n / (SC50ᵢ^n + cp^n)
    end
    @dynamics begin
        Depot' = -Kaᵢ * Depot
        Central' = Kaᵢ * Depot - (CL / Vcᵢ) * Central
        Response' = Kin * (1 + EFF) - Koutᵢ * Response
    end
    @derived begin
        Outcome ~ @. Normal(Response, abs(Response) * σ)
    end
end

To investigate the effect of the existence of random effects in the data generating process, we fix the dose level to 1.0 for all subjects and assume a single dose at baseline, for simplicity.

Using the following code, we can simulate sparse data for 30 subjects. Each subject has 2 observations at random time points within the first 10 hours after the dose. The same true_parameters are used to simulate the data. However, these parameters now refer to the typical values of the individual parameters due to the existence of random effects in the model.

sims_sparse = [
    simobs(
        data_model_heterogeneous,
        Subject(; events = DosageRegimen(1.0), id = i),
        true_parameters;
        obstimes = 10 .* sort!(rand(StableRNG(i), 2)),
    ) for i = 1:30
]
population_sparse = Subject.(sims_sparse)
plotgrid(population_sparse)

We then fit the same UDE model to the sparse data using the same fitting procedure as before. Note that no random effects are assumed in the fitting model, which is fundamentally incorrect given the data-generating process.

fpm_sparse = fit(
    ude_model_knowledge,
    population_sparse,
    init_params(ude_model_knowledge),
    MAP(NaivePooled()),
)

We then plot the predictions of the fitted model for all subjects in the training population and all the observations. The predictions are made at a finer time resolution to visualize the dynamics better.

pred = predict(fpm_sparse; obstimes = 0:0.01:15);
plotgrid(pred)

# plot them all stacked ontop of oneanother
fig = Figure();
ax = Axis(fig[1, 1]; xlabel = "Time", ylabel = "Outcome", title = "Stacked predictions")
for i in eachindex(pred)
    plotgrid!([ax], pred[i:i]; data = (; color = Cycled(i)))
end
fig

Since the dose level is the same across all subjects, the only source of variability in the fitting model is assumed to be the residual noise. Therefore, the model is not able to capture the individual differences in the PK and PD parameters. The prediction made by the model is therefore a single curve that represents the average behavior of the population, but it does not capture the systematic individual differences in the PK and PD responses. This is a fundamental limitation of the UDE model because it lacks random effects.

11 Conclusion

In this tutorial, we explored how to use DeepPumas to model pharmacokinetic and pharmacodynamic data using 3 approaches:

  1. A simple multi-layer perceptron (MLP) that maps time to a noisy outcome.
  2. A neural ordinary differential equation (NODE) that models the dynamics of the system using dynamic compartments.
  3. A universal differential equation (UDE) that encodes domain knowledge about the system and can generalize better across subjects.

We demonstrated how to fit these models to synthetic data and how to visualize the predictions. We also explored how to improve the generalization of the models by encoding more domain knowledge, demonstrating the generalization in both the dense and sparse data cases. Finally, we discussed the limitations of the UDE model when random effects are present in the data-generating process but not in the model.