Multivariate Covariate Simulations in Pumas with Copulas.jl


Vijay Ivaturi

Andreas Noack

Patrick Kofod Mogensen

1 Introduction

When performing simulations as part of a pharmacometric workflow it is always useful to be able to construct flexible covariate distributions.

One method is to sample subjects from a subject bank. This is a database of subjects with covariate values associated to them. These can be simulated by someone else or they can be real subejcts. To generate a new dataset we can then sample subjects until we have enough, and we will have new subjects with preserved correlation structure between covariates.

Another method is to parametrically simulate subjects. If all covariates can fit into a multivariate normal (Gaussian) framework this is trivial. However, once we have distributions that are only positive, have heavy tails, are truncated at some values, it becomes much more complicated to simulate. We will not go into the theory in this tutorial, but one alternative that is still parametric such that we can simulate new unique subjects is Copulas. Copulas work by connecting continuous marginal distributions using Sklar’s theorem (wikipedia link) into on connected Copula that is able to provide correlation structure between the variables we want to sample.

2 Practical example

Here, we will use Copulas as implemented in the Copulas.jl package as described in Laverny and Jimenez (2024). This package implements Copulas that are consistent with the Distributions package that we also use in Pumas for randoms effects, errors distributions on observed quantities, and so on. First, we load the required packages.

using Pumas
using Copulas
using Distributions

Then, we need to define distributions. One nice distribution that is defined on the positive real line is the Gamma distribution (wikipedia link). Here, we parameterize two marginal distributions for height and weight in males where the mean height is 175 cm with a standard deviation of 10, and the mean weight is 70 Kg with a standard deviation of 15.

height_dist_male = Gamma((175 / 10)^2, 10^2 / 175)
weight_dist_male = Gamma((70 / 15)^2, 15^2 / 70)
Distributions.Gamma{Float64}(α=21.777777777777782, θ=3.2142857142857144)

and for females

height_dist_female = Gamma((165 / 8)^2, 8^2 / 165)
weight_dist_female = Gamma((65 / 12)^2, 12^2 / 65)
Distributions.Gamma{Float64}(α=29.340277777777782, θ=2.2153846153846155)

Then, we can construct a GaussianCopula type to represent the correlation structure.

copula_male = GaussianCopula([1 0.7; 0.7 1])
copula_female = GaussianCopula([1 0.6; 0.6 1])
Copulas.GaussianCopula{2, Matrix{Float64}}(
Σ: [1.0 0.6; 0.6 1.0]

The final step in specifying the Copula is to use Sklar’s Theorem to combine the previous distributions into one connected object that we can use to simulate covariates from.

joint_dist_male = SklarDist(copula_male, (height_dist_male, weight_dist_male))
joint_dist_female = SklarDist(copula_female, (height_dist_female, weight_dist_female))
Copulas.SklarDist{Copulas.GaussianCopula{2, Matrix{Float64}}, Tuple{Distributions.Gamma{Float64}, Distributions.Gamma{Float64}}}(
C: Copulas.GaussianCopula{2, Matrix{Float64}}(
Σ: [1.0 0.6; 0.6 1.0]

m: (Distributions.Gamma{Float64}(α=425.390625, θ=0.3878787878787879), Distributions.Gamma{Float64}(α=29.340277777777782, θ=2.2153846153846155))

Then, we can sample from each Copula using the usual rand function:

height, weight = rand(joint_dist_female)
2-element Vector{Float64}:

It will return the sampled height and weight for a sampled female subject.

3 Simulating a Population

Finally, we can do the actual simulation. We defined one Copula for males and one for females, so the only remaining component is to define the probability of being either one of the two categories in the SEX covariates. This is controlled by ismale_prob below.

# Define the population size and sex distribution
population_size = 100
ismale_prob = 0.5  # Equal probability of male and female

# Create an array of virtual subjects
pop = map(1:population_size) do i
    # Sample sex
    ismale = rand(Bernoulli(ismale_prob))

    # Sample height and weight based on sex, using the copula distributions
    height, weight = rand(ismale ? joint_dist_male : joint_dist_female)

    # Create a virtual subject with the sampled values
    Subject(id = i, covariates = (; height, weight, ismale))
  Subjects: 100
  Covariates: height, weight, ismale

Lastly, we can check the covariates sampled as part of the population creation in terms of their means but also their correlations. The whole point here was that our subjects should have correlated covariates not just useful levels and dispersion as measured independently per covariate.

    groupby(DataFrame(pop), "ismale"),
    "height" .=> [mean, std],
    "weight" .=> [mean, std],
    ["weight", "height"] => cor,
2×6 DataFrame
Row ismale height_mean height_std weight_mean weight_std weight_height_cor
Bool? Float64 Float64 Float64 Float64 Float64
1 false 165.128 7.86433 63.7466 12.5318 0.407383
2 true 174.2 10.2845 69.641 15.7639 0.61145

We conclude that we got the means, standard deviations and correlations that we expected.

4 Conclusion

In this small tutorial, we showed how to simulate multivariate, parametric covariate distributions. This can be a great alternative to generating covariate samples from resampling when there are not enough candidates in the “subject bank” or if the method is not useful for some other reason. We saw how to specify the Copulay through the marginal distributions, how to specify correlations, how to sample and how to put it all together into a final Population. In future tutorials, we will explain the underlying concepts of Copulas and provide workflows on the use of copulas for covariate sampling and population creation for simple to complex trial designs.


Laverny, Oskar, and Santiago Jimenez. 2024. “Copulas.jl: A Fully Distributions.jl-Compliant Copula Package.” Journal of Open Source Software 9 (94): Santiago Jimenez.