Simulations, Random Number Generator, and Reproducibility
Author
Jose Storopoli
Several Pumas’ functions use random number generators. Unless specifically controlled the results of these functions will change from run to run. Additionally, when functions calling random number generators are run in parallel, the results can be even more unpredictable. This behavior can be problematic when we want to reproduce the results of a simulation.
In this tutorial, we will learn how to set the seed of the random number generator, and feed it to a Pumas function, to ensure reproducibility.
PumasModel
Parameters: tvcl, tvvc, tvka, Ω, σ_prop
Random effects: η
Covariates: Wt
Dynamical system variables: Depot, Central
Dynamical system type: Matrix exponential
Derived: conc, dv
Observed: conc, dv
As before, we’ll set the values of the parameter estimates:
population =map( i ->Subject(; id = i, events = ev, covariates = (; Wt =70), observations = (; dv =nothing), ),1:5,)
Population
Subjects: 5
Covariates: Wt
Observations: dv
map function
The map function applies a function to each element of an iterable. In this case, we are applying the Subject constructor to each element of the range 1:5. This is the same as writing
population = [Subject(id = i, events = ev, covariates = (; Wt =70), observations = (; dv =nothing)) for i =1:5]
If you want to know more about Julia syntax and the map function, check our Julia Syntax tutorial.
Now we are ready to simulate things with simobs. However, before we do that, let’s explain what a random number generator is.
1 Random Number Generator
A random number generator (RNG) is a function that generates a sequence of numbers that is not predictable. These numbers are called random numbers.
In Pumas, a number of functions use random number generators under the hood. Most notably the simobs function.
Pumas functions that use random number generators
This a non-exhaustive list of Pumas functions that use random number generators:
simobs
inspect when NPDEs are calculated with the nsim keyword argument
fit when using any Bayesian or SAEM estimation method
infer when using Bootstrap or SIR methods
any Pumas Bayesian cross-validation method
vpc
At first glance, it may appear that computers can generate random numbers spontaneously. However, it’s crucial to understand that computers, by their very nature, are deterministic machines. This means that their operations can be predicted and replicated given the same initial conditions and inputs.
In the context of generating random numbers, computers employ what are known as Pseudo-Random Number Generators (PRNGs). PRNGs are algorithms that use mathematical formulas or pre-calculated tables to produce sequences of numbers that mimic the properties of truly random numbers. The key characteristic of a PRNG is that it starts with an initial value, known as a “seed”. Given the same seed, a PRNG will always generate the same sequence of numbers.
This predictability, rather than being a limitation, is extremely useful for applications requiring reproducibility, such as simulations in scientific research. By using the same seed, researchers can generate the same sequence of ‘random’ numbers, ensuring that experiments can be replicated and verified by others. Thus, while the numbers generated by a PRNG may not be truly random in a mathematical sense, they provide a valuable tool for scientific and other applications where reproducibility is essential.
The default random number generator in Julia is Random.default_rng().
Pumas’ default random number generator
By default, in Pumas random numbers are generated with Random.default_rng(). Additionally, all functions in Pumas that expose an rng keyword argument, use Random.default_rng() as the default value.
Under the hood, Random.default_rng() is a Random.TaskLocalRNG object:
usingRandomtypeof(Random.default_rng())
TaskLocalRNG
As suggested by its name, Random.TaskLocalRNG is a task-local random number generator that is deterministically defined by the parent task at the time of its creation. This means that this random number generator can be used safely used by multiple tasks concurrently.
There are several types of random number generators available in Julia:
Random.Xoshiro: default in Julia and in Pumas.
Random.MersenneTwister: default in R and Python.
Random.RandomDevice: uses the system’s random number generator, e.g. /dev/urandom in UNIX/Linux-based systems.
In the past, Julia used the Random.MersenneTwister as the default random number generator. However, the Random.Xoshiro uses less memory and is faster than the Random.MersenneTwister. As a side note, the Random.RandomDevice uses the system’s random number generator and is useful for cryptographic applications.
The way we deterministically define the sequence from a pseudo random number generator is by setting a seed value.
2 Setting a Seed
A seed is a number that initializes the random number generator. By setting the seed value, we ensure that the sequence of random numbers generated will be reproducible.
In Julia, the rand function is a random number generator. It generates a uniformly distributed random number between 0 and 1.
rand()
0.9963212260159797
Have a look at this example. Let’s generate a sequence of 3 random numbers with the seed 123.
As you can see, the sequence of random numbers is the same.
Note that if we initially set a seed value, but do not reset the random number generator between calls to rand, we will get different sequences of random numbers between calls because we are sampling further down the sequence of pseudo-random numbers:
However, we get the same behavior of this sequence of two calls to rand every time we run the code. This is due to the fact that every time that we call a PRNG in Julia, it advances its internal state. Setting a seed resets the internal state of the PRNG to a certain value.
3 Reproducibility
Knowing that setting a seed ensures that the random number generator will always generate the same sequence of numbers, there are two main ways to ensure reproducibility in Pumas. They both involve setting the seed of the random number generator:
Local reproducibility: Set the seed of the random number generator inside a function call.
Global reproducibility: Set the seed of the random number generator globally.
Best Practices for Reproducibility
It is best practice to only use global reproducibility in the finalized analysis script.
Local reproducibility is useful for some interactive work, and debugging purposes. However, it can introduce spurious correlations in the pseudorandom numbers which can bias analyses.
When you set the seed of the random number generator inside a function call, you ensure that the function call will always generate the same sequence of numbers. This is useful when you want to ensure reproducibility in a specific function call.
When you set the seed of the random number generator globally, you ensure that all function calls that use the random number generator will generate the same sequence of numbers.
3.1 Local Reproducibility: Setting the Seed Inside a Function Call
To set the seed of the random number generator inside a function call, you need to pass the rng keyword argument to the function call. Note that the rng value should be a PRNG that has the seed set.
PRNGs in Julia
As we’ve seen, Julia has several types of PRNGs. To pass a seed to a PRNG, you need to instantiate the PRNG with the seed value. For example, to set the seed of the Random.Xoshiro PRNG to 123, you would do:
Random.Xoshiro(123)
Analogously, to set the seed of the Random.MersenneTwister PRNG to 123, you would do:
Random.MersenneTwister(123)
Here’s an example of how to set the seed of the random number generator inside a simobs function call:
If we run this code several times, we will always get the same sequence of simulated observations because the random sequence of values from rng is reset at each function call by explicitly specifying the seed value:
As you can see we have different results. However, a script that sets the seed globally will always generate the same results. So it is up to you what kind of reproducibility you want to achieve: global or local.
If you want to use a different PRNG, you have to pass it as the rng keyword argument to all Pumas’ function that accept it.
Functions that don’t expose an rng keyword argument
You might come across functions outside of Pumas that, despite using random number generators under the hood, don’t expose an rng keyword argument. If these functions internally use the default PRNG, you can use Random.seed! to make their computations reproducible.
Here’s an example of how to set the seed globally with a non-default PRNG:
As with the default PRNG, if we run the same simobs function call with the same rng object again, we won’t get the same sequence of simulated observations:
In our population creation above, we set the covariate Wt to 70. This was a simple example, but in practice we would often like to set the covariate Wt to a random value from some distribution of possible values. This can also be done in a reproducible way. For a uniform random distribution we can do this using the rand function and setting a seed value. Other random distributions are possible, such as a normal distribution from the randn function.
For example, if you want to set the covariate Wt to a random value, first create a rand_covariates function:
The rand_covariates function generates a random integer value for the covariate Wt between 50 and 100 using the default random number generator and outputs a NamedTuple with the covariate Wt.
Now, we just need to incorporate the rand_covariates function into the Subject constructor:
Random.seed!(123)population =map( i ->Subject(; id = i, events = ev, covariates =rand_covariates(), observations = (; dv =nothing), ),1:5,)
Population
Subjects: 5
Covariates: Wt
Observations: dv
Note that we are not setting the seed of the random number generator inside the rand_covariates function. Otherwise, we would always get the same sequence of random values for the covariate Wt. And all subjects would have the same value for the covariate Wt. Here we don’t need local reproducibility. Instead we need global reproducibility.
Reproducibility across Pumas versions
It is important to note that the reproducibility of simulations across different versions of Pumas is not guaranteed.
5 Conclusion
In this tutorial, we learned how to set the seed of the random number generator, and feed it to a Pumas function, to ensure reproducibility. We also learned that there are two main ways to ensure reproducibility:
Set the seed of the random number generator inside a function call.
Set the seed of the random number generator globally.
As a final note, it is best practice to only use global reproducibility in the finalized analysis script. Local reproducibility can be especially helpful when you are doing interactive work and you want to ensure that a code snippet will always return the same thing even if you add code before it, but should typically be removed from the final result.