Simulations, Random Number Generator, and Reproducibility
Author
Jose Storopoli
Several Pumas’ functions use random number generators. Unless specifically controlled the results of these functions will change from run to run. Additionally, when functions calling random number generators are run in parallel, the results can be even more unpredictable. This behavior can be problematic when we want to reproduce the results of a simulation.
In this tutorial, we will learn how to set the seed of the random number generator, and feed it to a Pumas function, to ensure reproducibility.
PumasModel
Parameters: tvcl, tvvc, tvka, Ω, σ_prop
Random effects: η
Covariates: Wt
Dynamical system variables: Depot, Central
Dynamical system type: Matrix exponential
Derived: conc, dv
Observed: conc, dv
As before, we’ll set the values of the parameter estimates:
population =map( i ->Subject(; id = i, events = ev, covariates = (; Wt =70), observations = (; dv =nothing), ),1:5,)
Population
Subjects: 5
Covariates: Wt
Observations: dv
map function
The map function applies a function to each element of an iterable. In this case, we are applying the Subject constructor to each element of the range 1:5. This is the same as writing:
If you want to know more about Julia syntax and the map function, check our Julia Syntax tutorial.
Now we are ready to simulate things with simobs. However, before we do that, let’s explain what a random number generator is.
1 Random Number Generator
A random number generator (RNG) is a function that generates a sequence of numbers that is not predictable. These numbers are called random numbers.
In Pumas, a number of functions use random number generators under the hood. Most notably the simobs function.
Pumas functions that use random number generators
This a non-exhaustive list of Pumas functions that use random number generators:
simobs
inspect when NPDEs are calculated with the nsim keyword argument
fit when using any Bayesian or SAEM estimation method
infer when using Bootstrap or SIR methods
any Pumas Bayesian cross-validation method
vpc
At first glance, it may appear that computers can generate random numbers spontaneously. However, it’s crucial to understand that computers, by their very nature, are deterministic machines. This means that their operations can be predicted and replicated given the same initial conditions and inputs.
In the context of generating random numbers, computers employ what are known as Pseudo-Random Number Generators (PRNGs). PRNGs are algorithms that use mathematical formulas or pre-calculated tables to produce sequences of numbers that mimic the properties of truly random numbers. The key characteristic of a PRNG is that it starts with an initial value, known as a “seed”. Given the same seed, a PRNG will always generate the same sequence of numbers.
This predictability, rather than being a limitation, is extremely useful for applications requiring reproducibility, such as simulations in scientific research. By using the same seed, researchers can generate the same sequence of ‘random’ numbers, ensuring that experiments can be replicated and verified by others. Thus, while the numbers generated by a PRNG may not be truly random in a mathematical sense, they provide a valuable tool for scientific and other applications where reproducibility is essential.
Pumas’ default random number generator
Pumas under the hood uses the Pumas.default_rng() for generating random numbers. Which is the same as Random.default_rng() in Julia:
All functions that expose an rng keyword argument, use the Pumas.default_rng() as the default value. Under the hood Pumas.default_rng() is a Random.TaskLocalRNG object:
typeof(Pumas.default_rng())
TaskLocalRNG
Under the hood Random.TaskLocalRNG wraps a random number generator and makes it task-local. Random.TaskLocalRNG is a random number generator that is deterministically defined by the parent task at the time of its creation. This means that this random number generator can be used safely in parallel computations.
There are several types of random number generators available in Julia:
Random.Xoshiro: default in Julia and in Pumas.
Random.MersenneTwister: default in R and Python.
Random.RandomDevice: uses the system’s random number generator, e.g. /dev/urandom in UNIX/Linux-based systems.
In the past, Julia used the Random.MersenneTwister as the default random number generator. However, the Random.Xoshiro uses less memory and is faster than the Random.MersenneTwister. As a side note, the Random.RandomDevice uses the system’s random number generator and is useful for cryptographic applications.
The way we deterministically define the sequence from a pseudo random number generator is by setting a seed value.
2 Setting a Seed
A seed is a number that initializes the random number generator. By setting the seed value, we ensure that the sequence of random numbers generated will be reproducible.
In Julia, the rand function is a random number generator. It generates a uniformly distributed random number between 0 and 1.
rand()
0.6502013236660924
Have a look at this example. Let’s generate a sequence of 3 random numbers with the seed 123.
As you can see, the sequence of random numbers is the same.
Note that if we initially set a seed value, but do not reset the random number generator between calls to rand, we will get different sequences of random numbers between calls because we are sampling further down the sequence of pseudo-random numbers:
However, we get the same behavior of this sequence of two calls to rand every time we run the code. This is due to the fact that every time that we call a PRNG in Julia, it advances its internal state. Setting a seed resets the internal state of the PRNG to a certain value.
3 Reproducibility
Knowing that setting a seed ensures that the random number generator will always generate the same sequence of numbers, there are two main ways to ensure reproducibility in Pumas. They both involve setting the seed of the random number generator:
Local reproducibility: Set the seed of the random number generator inside a function call.
Global reproducibility: Set the seed of the random number generator globally.
Best Practices for Reproducibility
It is best practice to only use global reproducibility in the finalized analysis script.
Local reproducibility is useful for some interactive work, and debugging purposes. However, it can introduce spurious correlations in the pseudorandom numbers which can bias analyses.
When you set the seed of the random number generator inside a function call, you ensure that the function call will always generate the same sequence of numbers. This is useful when you want to ensure reproducibility in a specific function call.
When you set the seed of the random number generator globally, you ensure that all function calls that use the random number generator will generate the same sequence of numbers.
3.1 Local Reproducibility: Setting the Seed Inside a Function Call
To set the seed of the random number generator inside a function call, you need to pass the rng keyword argument to the function call. Note that the rng value should be a PRNG that has the seed set.
PRNGs in Julia
As we’ve seen, Julia has several types of PRNGs. To pass a seed to a PRNG, you need to instantiate the PRNG with the seed value. For example, to set the seed of the Random.Xoshiro PRNG to 123, you would do:
Random.Xoshiro(123)
Analogously, to set the seed of the Random.MersenneTwister PRNG to 123, you would do:
Random.MersenneTwister(123)
Here’s an example of how to set the seed of the random number generator inside a simobs function call:
If we run this code several times, we will always get the same sequence of simulated observations because the random sequence of values from rng is reset at each function call by explicitly specifying the seed value:
3.2 Global Reproducibility: Setting the Seed Globally
To set the seed of the random number generator globally, you need to instantiate a PRNG with the seed value outside of function calls. You will need to pass this PRNG to the rng keyword argument of all Pumas’ functions that accept it. Calls to the sequence of random values in rng will proceed down the list of random values without resetting because there is no reapplication of the seed value.
Here’s an example of how to set the seed of the random number generator globally:
As you can see we have different results. However, the script that sets the seed globally will always generate the same results. So it is up to you what kind of reproducibility you want to achieve: global or local.
Functions that don’t expose an rng keyword argument
You might come across functions outside of Pumas that, despite using random number generators under the hood, don’t expose an rng keyword argument. In this case, you can use the Random.seed! function to set the seed of the random number generator globally. This will ensure that all function calls that use the random number generator will generate the same sequence of numbers.
At the top of your script, after the using statement, you can set the seed of the random number generator globally:
usingRandomRandom.seed!(123) # or any other number
4 Random Covariates
In our population creation above, we set the covariate Wt to 70. This was a simple example, but in practice we would often like to set the covariate Wt to a random value from some distribution of possible values. This can also be done in a reproducible way. For a uniform random distribution we can do this using the rand function and setting a seed value. Other random distributions are possible, such as a normal distribution from the randn function.
Every function that uses a random number generator in Julia, as in Pumas, accepts rng either as a positional or keyword argument. For example, if you want to set the covariate Wt to a random value, first create a rand_covariates function that uses the random number generator:
The rand_covariates function above takes a random number generator rng as an argument. It then generates a random integer value for the covariate Wt between 50 and 100 using the random number generator rng and outputs a NamedTuple with the covariate Wt.
Now, we just need to incorporate the rand_covariates function into the Subject constructor:
rng =Random.Xoshiro(123)population =map( i ->Subject(; id = i, events = ev, covariates =rand_covariates(rng), observations = (; dv =nothing), ),1:5,)
Population
Subjects: 5
Covariates: Wt
Observations: dv
Note that we are not setting the seed of the random number generator inside the rand_covariates function. Otherwise, we would always get the same sequence of random values for the covariate Wt. And all subjects would have the same value for the covariate Wt. Here we don’t need local reproducibility. Instead we need global reproducibility.
Reproducibility across Pumas versions
It is important to note that the reproducibility of simulations across different versions of Pumas is not guaranteed.
5 Conclusion
In this tutorial, we learned how to set the seed of the random number generator, and feed it to a Pumas function, to ensure reproducibility. We also learned that there are two main ways to ensure reproducibility in Pumas:
Set the seed of the random number generator inside a function call.
Set the seed of the random number generator globally.
We also learned that every function that uses a random number generator in Julia, as in Pumas, accepts rng either as a positional or keyword argument. This is useful when we want to set random covariates in a reproducible way.
As a final note, it is best practice to only use global reproducibility in the finalized analysis script. Local reproducibility can be especially helpful when you are doing interactive work and you want to ensure that a code snippet will always return the same thing even if you add code before it, but should typically be removed from the final result.