# Simulations, Random Number Generator, and Reproducibility

Author

Jose Storopoli

Several Pumas functions use random number generators. Unless specifically controlled, the results of these functions will change from run to run. Additionally, when functions that call random number generators run in parallel, the results can be even more unpredictable. This behavior is problematic when we want to reproduce the results of a simulation.

In this tutorial, we will learn how to set the seed of the random number generator and feed it to a Pumas function to ensure reproducibility.

First, let's recap the simulation from Generating and Simulating Populations:

``````using Pumas

model = @model begin
    @param begin
        tvcl ∈ RealDomain(; lower = 0)
        tvvc ∈ RealDomain(; lower = 0)
        tvka ∈ RealDomain(; lower = 0)
        Ω ∈ PSDDomain(3)
        σ_prop ∈ RealDomain(; lower = 0)
    end

    @random begin
        η ~ MvNormal(Ω)
    end

    @covariates Wt

    @pre begin
        CL = tvcl * (Wt / 70)^0.75 * exp(η[1])
        Vc = tvvc * (Wt / 70) * exp(η[2])
        Ka = tvka * exp(η[3])
    end

    @dynamics begin
        Depot' = -Ka * Depot
        Central' = Ka * Depot - Central * CL / Vc
    end

    @derived begin
        conc = @. Central / Vc
        dv ~ @. Normal(conc, conc * σ_prop)
    end
end``````
``````PumasModel
Parameters: tvcl, tvvc, tvka, Ω, σ_prop
Random effects: η
Covariates: Wt
Dynamical variables: Depot, Central
Derived: conc, dv
Observed: conc, dv``````

As before, we’ll set the values of the parameter estimates:

``````fixeffs =
    (; tvcl = 0.4, tvvc = 20, tvka = 1.1, Ω = Diagonal([0.04, 0.04, 0.04]), σ_prop = 0.2)``````
``````(tvcl = 0.4,
tvvc = 20,
tvka = 1.1,
Ω = [0.04 0.0 0.0; 0.0 0.04 0.0; 0.0 0.0 0.04],
σ_prop = 0.2,)``````

And a `DosageRegimen`:

``ev = DosageRegimen(100; time = 0)``
``````1×10 DataFrame
 Row │ time     cmt    amt      evid  ii       addl   rate     duration  ss    route
     │ Float64  Int64  Float64  Int8  Float64  Int64  Float64  Float64   Int8  NCA.Route
─────┼───────────────────────────────────────────────────────────────────────────────────
   1 │     0.0      1    100.0     1      0.0      0      0.0       0.0     0  NullRoute``````

Finally, we’ll build a `Population`:

``````population = map(
    i -> Subject(;
        id = i,
        events = ev,
        covariates = (; Wt = 70),
        observations = (; dv = nothing),
    ),
    1:5,
)``````
``````Population
Subjects: 5
Covariates: Wt
Observations: dv``````
`map` function

The `map` function applies a function to each element of an iterable. In this case, we are applying the `Subject` constructor to each element of the range `1:5`. This is the same as writing:

``````population = [
    Subject(id = 1, events = ev, covariates = (; Wt = 70), observations = (; dv = nothing)),
    Subject(id = 2, events = ev, covariates = (; Wt = 70), observations = (; dv = nothing)),
    Subject(id = 3, events = ev, covariates = (; Wt = 70), observations = (; dv = nothing)),
    Subject(id = 4, events = ev, covariates = (; Wt = 70), observations = (; dv = nothing)),
    Subject(id = 5, events = ev, covariates = (; Wt = 70), observations = (; dv = nothing)),
]``````

If you want to know more about Julia syntax and the `map` function, check our Julia Syntax tutorial.

Now we are ready to simulate things with `simobs`. However, before we do that, let’s explain what a random number generator is.

## 1 Random Number Generator

A random number generator (RNG) is a function that generates a sequence of numbers whose values cannot be predicted. These numbers are called random numbers.

In Pumas, a number of functions use random number generators under the hood, most notably the `simobs` function.

Pumas functions that use random number generators

This is a non-exhaustive list of Pumas functions that use random number generators:

• `simobs`
• `inspect` when NPDEs are calculated with the `nsim` keyword argument
• `fit` when using any Bayesian or SAEM estimation method
• `infer` when using `Bootstrap` or `SIR` methods
• any Pumas Bayesian cross-validation method
• `vpc`

At first glance, it may appear that computers can generate random numbers spontaneously. However, it’s crucial to understand that computers, by their very nature, are deterministic machines. This means that their operations can be predicted and replicated given the same initial conditions and inputs.

In the context of generating random numbers, computers employ what are known as Pseudo-Random Number Generators (PRNGs). PRNGs are algorithms that use mathematical formulas or pre-calculated tables to produce sequences of numbers that mimic the properties of truly random numbers. The key characteristic of a PRNG is that it starts with an initial value, known as a “seed”. Given the same seed, a PRNG will always generate the same sequence of numbers.

This predictability, rather than being a limitation, is extremely useful for applications requiring reproducibility, such as simulations in scientific research. By using the same seed, researchers can generate the same sequence of ‘random’ numbers, ensuring that experiments can be replicated and verified by others. Thus, while the numbers generated by a PRNG may not be truly random in a mathematical sense, they provide a valuable tool for scientific and other applications where reproducibility is essential.

Pumas’ default random number generator

Under the hood, Pumas uses `Pumas.default_rng()` to generate random numbers, which is the same as `Random.default_rng()` in Julia:

``````using Random
Random.default_rng === Pumas.default_rng``````
``true``

All functions that expose an `rng` keyword argument use `Pumas.default_rng()` as the default value. Under the hood, `Pumas.default_rng()` is a `Random.TaskLocalRNG` object:

``typeof(Pumas.default_rng())``
``TaskLocalRNG``

`Random.TaskLocalRNG` wraps a random number generator and makes its state local to the current task. Each task's generator state is deterministically derived from the parent task's state at the time the task is created. This means that this random number generator can be used safely in parallel computations.
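
Because the state is task-local and is derived deterministically at spawn time, seeding the parent task and then spawning the same tasks reproduces their draws regardless of how the scheduler interleaves them. A minimal sketch in plain Julia (no Pumas required):

```julia
using Random

# Seed the current task's RNG, then spawn two tasks that each draw a number.
# Each child task's RNG state is forked deterministically from the parent's,
# so repeating the whole sequence reproduces the results.
Random.seed!(42)
results1 = fetch.([Threads.@spawn rand() for _ in 1:2])

Random.seed!(42)
results2 = fetch.([Threads.@spawn rand() for _ in 1:2])

results1 == results2  # true — the spawned tasks reproduce their draws
```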

There are several types of random number generators available in Julia:

• `Random.Xoshiro`: default in Julia and in Pumas.
• `Random.MersenneTwister`: default in R and Python.
• `Random.RandomDevice`: uses the system’s random number generator, e.g. `/dev/urandom` in UNIX/Linux-based systems.

In the past, Julia used `Random.MersenneTwister` as the default random number generator. However, `Random.Xoshiro` uses less memory and is faster than `Random.MersenneTwister`. As a side note, `Random.RandomDevice` uses the system's random number generator and is useful for cryptographic applications.
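
To make the distinction concrete, the generator type picks the algorithm while the seed picks the stream; any seedable generator reproduces its stream when re-seeded the same way (the seed `2024` below is an arbitrary example value):

```julia
using Random

xo = Xoshiro(2024)          # Julia's (and Pumas') current default algorithm
mt = MersenneTwister(2024)  # the pre-1.7 Julia default, still available
rd = RandomDevice()         # OS entropy source — takes no seed, not reproducible

# Re-seeding with the same value reproduces the stream for either algorithm:
rand(Xoshiro(2024), 3) == rand(Xoshiro(2024), 3)              # true
rand(MersenneTwister(2024), 3) == rand(MersenneTwister(2024), 3)  # true
```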

The way we deterministically define the sequence produced by a pseudo-random number generator is by setting a seed value.

## 2 Setting a Seed

A seed is a number that initializes the random number generator. By setting the seed value, we ensure that the sequence of random numbers generated will be reproducible.

In Julia, the `rand` function draws random numbers. Called with no arguments, it generates a uniformly distributed random number between 0 and 1.

``rand()``
``0.740288761967004``

Have a look at this example: let's generate a sequence of 3 random numbers with the seed `123`.

``````Random.seed!(123)
rand(3)``````
``````3-element Vector{Float64}:
0.521213795535383
0.5868067574533484
0.8908786980927811``````

Now let’s generate the same sequence of random numbers with the same seed:

``````Random.seed!(123)
rand(3)``````
``````3-element Vector{Float64}:
0.521213795535383
0.5868067574533484
0.8908786980927811``````

As you can see, the sequence of random numbers is the same.

Note that if we initially set a seed value but do not reset the random number generator between calls to `rand`, each call returns a different sequence of random numbers, because each call samples further down the stream of pseudo-random numbers:

``````Random.seed!(123)
rand(3)``````
``````3-element Vector{Float64}:
0.521213795535383
0.5868067574533484
0.8908786980927811``````
``rand(3)``
``````3-element Vector{Float64}:
0.48719485750855107
0.6665212532492512
0.4797705026351681``````

However, this pair of calls to `rand` behaves the same way every time we run the code from the top. This is because every call to a PRNG in Julia advances its internal state, and setting a seed resets that internal state to a known value.
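
The same state-advancing and state-resetting behavior can be made explicit with a dedicated generator object instead of the global one (a minimal sketch; the seed `123` matches the examples above):

```julia
using Random

rng = Xoshiro(123)
a = rand(rng, 3)        # advances the internal state
b = rand(rng, 3)        # continues further down the stream, so b differs from a
Random.seed!(rng, 123)  # rewind the state back to the seed
c = rand(rng, 3)        # identical to a — same state, same draws
a == c  # true
```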

## 3 Reproducibility

Knowing that setting a seed ensures that the random number generator will always generate the same sequence of numbers, there are two main ways to ensure reproducibility in Pumas. They both involve setting the seed of the random number generator:

1. Local reproducibility: Set the seed of the random number generator inside a function call.
2. Global reproducibility: Set the seed of the random number generator globally.

Best Practices for Reproducibility

It is best practice to only use global reproducibility in the finalized analysis script.

Local reproducibility is useful for some interactive work, and debugging purposes. However, it can introduce spurious correlations in the pseudorandom numbers which can bias analyses.

When you set the seed of the random number generator inside a function call, you ensure that the function call will always generate the same sequence of numbers. This is useful when you want to ensure reproducibility in a specific function call.

When you set the seed of the random number generator globally, you ensure that all function calls that use the random number generator will generate the same sequence of numbers.

### 3.1 Local Reproducibility: Setting the Seed Inside a Function Call

To set the seed of the random number generator inside a function call, you need to pass the `rng` keyword argument to the function call. Note that the `rng` value should be a PRNG that has the seed set.

PRNGs in Julia

As we’ve seen, Julia has several types of PRNGs. To pass a seed to a PRNG, you need to instantiate the PRNG with the seed value. For example, to set the seed of the `Random.Xoshiro` PRNG to 123, you would do:

``Random.Xoshiro(123)``

Analogously, to set the seed of the `Random.MersenneTwister` PRNG to 123, you would do:

``Random.MersenneTwister(123)``

Here’s an example of how to set the seed of the random number generator inside a `simobs` function call:

``using DataFramesMeta``
``````sims = simobs(model, population, fixeffs; rng = Random.Xoshiro(123), obstimes = 1:5)
df_sims = DataFrame(sims)
@chain df_sims begin
    groupby(:time)
    @combine :mean_conc = mean(:conc)
end``````
``````6×2 DataFrame
 Row │ time     mean_conc
     │ Float64  Float64?
─────┼────────────────────
   1 │     0.0  missing
   2 │     1.0  3.4357
   3 │     2.0  4.52999
   4 │     3.0  4.84612
   5 │     4.0  4.89696
   6 │     5.0  4.85515``````

If we run this code several times, we will always get the same sequence of simulated observations because the random sequence of values from `rng` is reset at each function call by explicitly specifying the seed value:

``````sims = simobs(model, population, fixeffs; rng = Random.Xoshiro(123), obstimes = 1:5)
df_sims = DataFrame(sims)
@chain df_sims begin
    groupby(:time)
    @combine :mean_conc = mean(:conc)
end``````
``````6×2 DataFrame
 Row │ time     mean_conc
     │ Float64  Float64?
─────┼────────────────────
   1 │     0.0  missing
   2 │     1.0  3.4357
   3 │     2.0  4.52999
   4 │     3.0  4.84612
   5 │     4.0  4.89696
   6 │     5.0  4.85515``````
`@chain` macro

The `@chain` macro is a part of the `DataFramesMeta` package. Similar to R’s pipe operator `%>%`, it allows you to chain operations on a DataFrame.

If you want to learn more about the `@chain` macro, check our `DataFramesMeta` tutorial.

### 3.2 Global Reproducibility: Setting the Seed Globally

To set the seed of the random number generator globally, you need to instantiate a PRNG with the seed value outside of function calls. You will need to pass this PRNG to the `rng` keyword argument of all Pumas’ functions that accept it. Calls to the sequence of random values in `rng` will proceed down the list of random values without resetting because there is no reapplication of the seed value.

Here’s an example of how to set the seed of the random number generator globally:

``````rng = Random.Xoshiro(123)

sims = simobs(model, population, fixeffs; rng = rng, obstimes = 1:5)
df_sims = DataFrame(sims)
@chain df_sims begin
    groupby(:time)
    @combine :mean_conc = mean(:conc)
end``````
``````6×2 DataFrame
 Row │ time     mean_conc
     │ Float64  Float64?
─────┼────────────────────
   1 │     0.0  missing
   2 │     1.0  3.4357
   3 │     2.0  4.52999
   4 │     3.0  4.84612
   5 │     4.0  4.89696
   6 │     5.0  4.85515``````

If we run the same `simobs` function call that uses the random number generator again, we won’t get the same sequence of simulated observations:

``````sims = simobs(model, population, fixeffs; rng = rng, obstimes = 1:5)
df_sims = DataFrame(sims)
@chain df_sims begin
    groupby(:time)
    @combine :mean_conc = mean(:conc)
end``````
``````6×2 DataFrame
 Row │ time     mean_conc
     │ Float64  Float64?
─────┼────────────────────
   1 │     0.0  missing
   2 │     1.0  3.94973
   3 │     2.0  4.7593
   4 │     3.0  4.86983
   5 │     4.0  4.81396
   6 │     5.0  4.71799``````

As you can see, the results are different. However, a script that sets the seed globally will always generate the same results when run from start to finish. So it is up to you which kind of reproducibility you want to achieve: global or local.

Functions that don’t expose an `rng` keyword argument

You might come across functions outside of Pumas that, despite using random number generators under the hood, don’t expose an `rng` keyword argument. In this case, you can use the `Random.seed!` function to set the seed of the random number generator globally. This will ensure that all function calls that use the random number generator will generate the same sequence of numbers.

At the top of your script, after the `using` statement, you can set the seed of the random number generator globally:

``````using Random
Random.seed!(123) # or any other number``````

## 4 Random Covariates

In our population creation above, we set the covariate `Wt` to 70. This was a simple example, but in practice we would often like to draw the covariate `Wt` from some distribution of possible values. This can also be done in a reproducible way. For a uniform distribution we can use the `rand` function with a seeded generator; other distributions are possible, such as normal draws from the `randn` function.
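
As a quick sketch of both options, a body weight could be drawn uniformly or normally from a seeded generator (the 50–100 kg range matches the example below; the mean of 70 kg and SD of 10 kg for the normal draw are illustrative values, not from the tutorial):

```julia
using Random

rng = Xoshiro(123)
Wt_uniform = rand(rng, 50:100)    # uniform integer draw over 50–100 kg
Wt_normal = 70 + 10 * randn(rng)  # normal draw: assumed mean 70 kg, SD 10 kg
```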

Every function that uses a random number generator, in Julia as well as in Pumas, accepts `rng` either as a positional or keyword argument. For example, if you want to set the covariate `Wt` to a random value, first create a `rand_covariates` function that uses the random number generator:

``````function rand_covariates(rng)
    Wt = rand(rng, 50:100)
    return (; Wt)
end``````
``rand_covariates (generic function with 1 method)``

The `rand_covariates` function above takes a random number generator `rng` as an argument. It generates a random integer value between 50 and 100 for the covariate `Wt` using `rng` and returns a `NamedTuple` containing the covariate `Wt`.

Now, we just need to incorporate the `rand_covariates` function into the `Subject` constructor:

``````rng = Random.Xoshiro(123)
population = map(
    i -> Subject(;
        id = i,
        events = ev,
        covariates = rand_covariates(rng),
        observations = (; dv = nothing),
    ),
    1:5,
)``````
``````Population
Subjects: 5
Covariates: Wt
Observations: dv``````

Note that we are not setting the seed of the random number generator inside the `rand_covariates` function. If we did, every call would reset the generator to the same state and return the same value, so all subjects would share the same value for the covariate `Wt`. Here we don't need local reproducibility; we need global reproducibility.
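
To see why seeding inside the function would be a problem, here is a sketch of the anti-pattern (`bad_rand_wt` is a hypothetical helper, not part of the tutorial):

```julia
using Random

# Anti-pattern: re-seeding on every call resets the stream each time,
# so every "random" draw returns the identical value.
function bad_rand_wt()
    Random.seed!(123)
    return rand(50:100)
end

allequal(bad_rand_wt() for _ in 1:5)  # true — all five subjects share one Wt
```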

Reproducibility across Pumas versions

It is important to note that the reproducibility of simulations across different versions of Pumas is not guaranteed.

## 5 Conclusion

In this tutorial, we learned how to set the seed of the random number generator, and feed it to a Pumas function, to ensure reproducibility. We also learned that there are two main ways to ensure reproducibility in Pumas:

1. Set the seed of the random number generator inside a function call.
2. Set the seed of the random number generator globally.

We also learned that every function that uses a random number generator, in Julia as well as in Pumas, accepts `rng` either as a positional or keyword argument. This is useful when we want to set random covariates in a reproducible way.

As a final note, it is best practice to only use global reproducibility in the finalized analysis script. Local reproducibility can be especially helpful when you are doing interactive work and you want to ensure that a code snippet will always return the same thing even if you add code before it, but should typically be removed from the final result.