Negative Binomial Regression

Author

Jose Storopoli

Caution

This lesson is really comprehensive and has a wide audience in mind. We recommend the reader to briefly navigate the Table of Contents and choose which topics to dive in. If you are in doubt feel free to read it from top to bottom.

In this lesson we’ll dive into negative binomial regression. We’ll start by defining what it is and, more importantly, when to use it.

using Pumas

1 What is Negative Binomial Regression?

Negative binomial regression is almost the same as Poisson regression. First, it is also a generalized linear model (GLM). GLMs uses a linear regression under the hood for inference but combines that with a non-linear transformation of the dependent variable (also called response).

This allows us to extend the interface of linear regression to more use cases that it would not be possible to apply it directly. This happens because of the nature of some dependent variables, being not linear and, subsequently, not continuous. Such instances occurs in binary data, discrete data, count data, or ordinal data.

Second, like Poisson regression, Negative binomial regression is a GLM that can handle count data. That is, when our response variable takes positive discrete values only, without an upper limit and with a natural lower limit of 0. It handles that by using the Poisson distrisbution to condition the values of our dependent variable.

But there is one important difference between negative binomial and Poisson regression which we will address now.

1.1 Poisson Distribution versus Negative Binomial Distribution

The Poisson distribution is used to model a random variable that takes positive discrete values. It accepts only a single parameter named \(\lambda\) which represents the constant mean rate of events in a single unit of time. In other words, \(\lambda\) represents the expected value of the distribution.

The parameter \(\lambda\) from Poisson distribution controls both the mean (sometimes referred as location parameter) and variance (sometimes referred as scale parameter). This makes the Poisson distribution not as flexible to model overdispersion in count data, since it has only one parameter to model both mean and variance.

We can deal with overdispersion by introducing an extra parameter and, thus, using a different distribution. This distribution is named negative binomial and has an extra parameter generally named \(\phi\) to control for overdispersion. High values of \(\phi\) allows for less overdispersion while lower values of \(\phi\) allows for more overdispersion. If we set \(\phi = \infty\), i.e. no overdispersion, we get exactly back the Poisson distribution.

Tip

Overdispersion is characterized by excess expected variation in data. This occurs due to the Poisson distribution making an assumption that the dependent variable \(y\) has the same mean and variance, while in the negative binomial distribution, the mean and variance do not need to be equal.

Note

One last thing before we get into the details of the negative binomial distribution is to consider an alternative parameterization.

Julia’s Distributions.jl and, consequently, Pumas’ parameterization of the negative binomial distribution follows the following the Wolfram reference:

\[\text{Negative~Binomial}(k \mid r, p) \sim \frac{\Gamma(k+r)}{k! \Gamma(r)} p^r (1 - p)^k, \quad \text{for } k = 0, 1, 2, \ldots\]

where:

\(k\) – number of failures before the \(r\)th success in a sequence of independent Bernoulli trials
\(r\) – number of successes
\(p\) – probability of success in an individual Bernoulli trial

This is not ideal for most of the modeling situations that we would employ the negative binomial distribution. In particular, we want to have a parameterization that is more appropriate for count data.

What we need is the familiar mean (or location) and variance (or scale) parameterization. If we look in alternative parameterizations, we have the following two equations: \[\begin{aligned} \mu &= \frac{r (1 - p)}{p}\\ \mu + \frac{\mu^2}{\phi} &= \frac{r (1 - p)}{p^2} \end{aligned}\]

With a little bit of algebra, we can substitute the first equation above into the right hand side of the second equation and get the following:

\[\begin{aligned} \mu + \frac{\mu^2}{\phi} &= \frac{μ}{p} \\ 1 + \frac{\mu}{\phi} &= \frac{1}{p} \\ p &= \frac{1}{\frac{1 + \mu}{\phi}} \end{aligned}\]

Then in the equation of the negative binomial distribution we have:

\[\begin{aligned} \mu &= r \left(1 - \left( \frac{1}{\frac{1 + \mu}{\phi}} \right) \right) \cdot \left(1 + \frac{\mu}{\phi} \right) \\ \mu &= r \left( \left(1 + \frac{\mu}{\phi} \right) - 1 \right) \\ r &= \phi \end{aligned}\]

Hence, the resulting map is:

\[\text{Negative-Binomial}(\mu, \phi) \equiv \text{Negative~Binomial} \left( r = \phi, p = \frac{1}{\frac{1 + \mu}{\phi}} \right)\]

So we can use this parameterization in our negative binomial regression model. But first, we need to define an alternative negative binomial distribution function:

function NegativeBinomial2(μ, ϕ)
    p = 1 / (1 + μ / ϕ)
    p = p > 0 ? p : 1e-4 # numerical stability
    r = ϕ

    return NegativeBinomial(r, p)
end

NegativeBinomial2 (generic function with 1 method)

Take a look at some negative binomial distribution plots for different values of \(\phi\) while we set \(\lambda = 3\):

using CairoMakie
using AlgebraOfGraphics

x = 0:0.1:10.0
ϕs = [1, 3, 5]
λ = 3
dists = [NegativeBinomial2(λ, ϕ) for ϕ in ϕs]
pdfs = mapreduce(d -> pdf.(d, x), vcat, dists)
plt =
    data((; ϕs = repeat(ϕs; inner = length(x)), x = repeat(x; outer = length(ϕs)), pdfs)) *
    mapping(
        :x => "outcome",
        :pdfs => "PMF";
        color = :ϕs => nonnumeric => L"\mathrm{parameter}~\phi",
        col = :ϕs => nonnumeric,
    ) *
    visual(BarPlot)
draw(plt; axis = (; xticks = 0:10), legend = (; position = :bottom, titleposition = :left))

As you can see, the higher the value of \(\phi\), the mean is unchanged but we have lower variance of the outcome value.

mean.([NegativeBinomial2(3, ϕ) for ϕ in [1, 3, 5]])

3-element Vector{Float64}:
 3.0
 3.0
 3.0

var.([NegativeBinomial2(3, ϕ) for ϕ in [1, 3, 5]])

3-element Vector{Float64}:
 12.0
  6.0
  4.8

1.2 Link Function \(e^x\)

The negative binomial distribution has the same caveat as the Poisson distribution: \(\lambda\) has to be positive real number higher than 0; i.e. \(\lambda > 0\). Not only that but also \(\phi\) has the same restrition: \(\phi > 0\).

This means that all parameters (or combinations of parameters) that we condition our negative binomial distribution distribution have to be higher than 0. We can accomplish this restriction with a non-linear transformation.

The exponential function is an ideal candidate to accomplish that. This is the exponential function:

\[\mathrm{exp}(x) = e^x\]

Let’s take a look at the graph of our exponential function:

data((; x = 0:0.1:10, y = exp.(0:0.1:10))) *
mapping(:x, :y => L"e^x") *
visual(Lines; color = :steelblue, linewidth = 3) |> draw

Note

The parameterization of the mean (\(\lambda\)) of a negative binomial follows the same procedure of the Poisson distribution.

Let’s define our dependent variable as \(y\) which follows a negative binomial distribution. We can use the exponential function as a link function for the mean parameter (\(\lambda\)) of the model, such that \(y\) is parameterized by the exponential transformation of these parameter values:

\[y \sim \mathrm{Negative~Binomial}(e^\alpha, \phi)\]

where \(\lambda = e^\alpha\).

That allows us to parametrize \(y\) with a real-valued parameters, while restraining \(y\) to being positive.

In this case, \(\alpha\) takes the role of an intercept, i.e. it is the basal rate of the values of \(y\).

1.3 Covariate \(x\)

Now with the dependent variable and intercept \(\alpha\) explained, let’s focus on covariate(s) \(x\).

Generally in regression problems, we attribute coefficients to act as weights to our covariate \(x\). These coefficients measure the strength of association between \(x\) and \(y\), where high positive coefficients means high positive association while high negative coefficients indicates high negative association.

Coefficients are usually denominated as the greek letter \(\beta\). Thus, we can parametrize \(y\) as:

\[y \sim \mathrm{Negative~Binomial}\left( e^\left(\alpha + \beta \cdot x \right), \phi \right)\]

Tip

Almost everything that we want to infer or guess in statistics has greek letter such as \(\beta\) and \(\alpha\), while what is observed has roman letters such as \(x\) and \(y\).

We can express it in matrix form:

\[\mathbf{y} = \mathrm{Negative~Binomial}\left( e^\left(\boldsymbol{\alpha} + \mathbf{X} \cdot \boldsymbol{\beta} \right), \phi \right)\]

where \(\mathbf{y}\), \(\boldsymbol{\alpha}\) and \(\boldsymbol{\beta}\) are vectors and \(\mathbf{X}\) is the data matrix where each row is an observation and each column a covariate.

Since \(\mathbf{y}\) and \(\mathbf{X}\) are known, all we have to do is to find the best value of our coefficient \(\beta\). That process is called model fit and we will deal with it in details further in this lesson.

1.3.1 How to Interpret Coefficient \(\beta\)?

Now, suppose we have found our ideal value for \(\beta\). How we would interpret our \(\beta\) estimated value?

First, to recap, \(\beta\) measures the strength of association between the covariate \(x\) and dependent variable \(y\). But, \(\beta\) is nested inside a non-linear transformation called exponential function:

\[\mathrm{linear~predictor} = \mathrm{exponential}(\alpha + \beta \cdot x) = e^{\left( \alpha + \beta \cdot x \right)}\]

where the linear predictor is the rate parameter (\(\lambda\)) of the mean of the negative binomial distribution that we condition our dependent variable \(y\).

You can see that if \(\beta\) takes negative values, the association between covariate \(x\) with the linear predictor is negative, i.e. higher values of \(x\) means lower values of \(y\). Also, the converse is true. If \(\beta\) takes positive values, the association between covariate \(x\) with the linear predictor is positive, i.e. higher values of \(x\) means higher values of \(y\).

Thus, if you want just to analyze the raw directional effect of covariate \(x\) you can leave \(\beta\) in its original scale. However, if you want to have a deeper interpretation of the effect of covariate \(x\) you would need to transform \(\beta\).

First, we need to make \(\beta\) comparable to our dependent variable \(y\). Since \(y\) is modeled as the exponential of a linear combination of parameters and covariates, we need to also apply the exponential function to \(\beta\).

Second, notice that the exponential function has the following property:

\[e^{(a + b)} = e^a \cdot e^b\]

Hence, we can substitute \(a\) for our intercept \(\alpha\) and \(b\) for our covariate \(x\) multiplied by our coefficient \(\beta\):

\[e^{(\alpha + \beta \cdot x)} = e^\alpha \cdot e^{\beta \cdot x}\]

which means that the \(\beta\) has a compounding effect into the basal value (intercept \(\alpha\)), instead of a linear effect as we saw in other discrete models.

Now, in order to interpret \(\beta\) we need to first exponentiate all other parameters in the linear predictor term, such as the intercept \(\alpha\). \(\alpha\) is the basal value take \(y\) takes if all of the covariates \(x\) are set to zero. Then, for each increase of 1 unit of covariate \(x\) we expect to see a compounding effect of \(\beta\) into the basal value of \(y\), i.e. the intercept \(\alpha\).

Here’s an example. Suppose that \(\alpha = 1\) and \(\beta = 0.05\). This makes the basal value of \(y\):

\[e^\alpha = e^1 = e \approx 2.7182\]

The effect of the covariate \(x\) on \(y\) is:

\[e^{\beta \cdot x} = e^{0.05x} \approx 1.05x\]

Thus every increase of 1 unit in \(x\) increases \(y\) by 1.05, i.e. 5% increase.

2 When to use Negative Binomial Regression?

Now that we know what negative binomial regression is, let’s cover when to use it. The simple answer is whenever your dependent variable is discrete positive, i.e. have discrete positive outcome as values; and has overdispersion characterized by excess expected variation in data which cannot be fully captured by a Poisson regression. Then you can use a negative binomial likelihood inside your regression model.

3 `seizurecount` Dataset

Let’s revisit the seizurecount dataset from PharmaDatasets.jl that we’ve used to run our Poisson regression example earlier to run our binomial regression in this lesson:

using DataFrames
using DataFramesMeta
using PharmaDatasets

seizurecount = dataset("seizurecount")
first(seizurecount, 5)

5×6 DataFrame

Row	ID	Age	Gender	isF	TRT	NSeizure
	Int64	Int64	String7	Int64	String7	Int64
1	1	43	Female	1	Placebo	6
2	2	55	Male	0	Placebo	2
3	3	76	Female	1	Placebo	5
4	4	93	Male	0	Placebo	0
5	5	95	Female	1	Placebo	2

We can see that we have 200 observations and also some key variables:

:NSeizure (dependent variable): discrete positive outcome values are in the range (0, 16).
:Age: Age in years.
:Gender: patient gender with 2 levels – Female, Male.
:isF: discrete binary if patient is Female – 1, 0.
:TRT: treatment variable with 2 levels – Placebo, DrugA.

3.1 Exploratory Data Analysis

Let’s see some summary statistics with describe():

describe(seizurecount)

6×7 DataFrame

Row	variable	mean	min	median	max	nmissing	eltype
	Symbol	Union…	Any	Union…	Any	Int64	DataType
1	ID	75.5	1	75.5	150	0	Int64
2	Age	48.4733	18	45.0	112	0	Int64
3	Gender		Female		Male	0	String7
4	isF	0.54	0	1.0	1	0	Int64
5	TRT		DrugA		Placebo	0	String7
6	NSeizure	3.39333	0	2.0	16	0	Int64

As we can see :Age has mean 48.47 and median 45.0 which signals for a bell-shaped (normally-distributed) distribution. We have weight values ranging from 18 and 112.

To conclude our summary statistics, :NSeizure, our dependent variable, has mean 3.39 and median 2.0 which implies a long-tailed distribution (or assimetric shifted to the right).

It is also important to check the categorical columns to see what are the number of unique values (:nunique) along with first and last values:

describe(seizurecount, :first, :last, :nunique; cols = [:Gender, :TRT])

2×4 DataFrame

Row	variable	first	last	nunique
	Symbol	String7	String7	Int64
1	Gender	Female	Female	2
2	TRT	Placebo	DrugA	2

Ok, no surprises in our categorical covariates.

@by seizurecount [:TRT, :isF] :count = length(:isF)

4×3 DataFrame

Row	TRT	isF	count
	String7	Int64	Int64
1	Placebo	0	24
2	Placebo	1	26
3	DrugA	0	45
4	DrugA	1	55

3.2 Visualizations

Now we turn to some visualizations to see what is going on beyond the summary statistics in our seizurecount dataset.

First, we’ll explore our dependent variable: :NSeizure.

data(seizurecount) *
mapping(:Gender, :NSeizure; color = :TRT, dodge = :TRT) *
AlgebraOfGraphics.expectation() |> draw

Female patients have more seizures than male patients. And male patients seems to not benefit from out treatment "DrugA".

data(seizurecount) *
mapping(:NSeizure; color = :TRT, layout = :Gender) *
AlgebraOfGraphics.density() |> draw

We can definitely see a difference between the densities of our dependent variable :NSeizures if we stratify by :TRT and :Gender.

Now let’s turn our attention to our covariates :Age and :TRT.

We can see clearily a negative relationship between :Age and our dependent variable :NSeizure:

data(seizurecount) *
mapping(:Age, :NSeizure; color = :TRT) *
(visual(Scatter; alpha = 0.7) + AlgebraOfGraphics.linear() * visual(; linewidth = 4)) |>
draw

Also, this negative relationship does not appear to have any interaction effect with :TRT since the slopes of the different :TRT arms in the scatter plot above do not differ.

However, if we stratify by :Gender, we can see that the interaction exists in males. The slopes have different values for different :TRT arms in this specific stratum:

data(seizurecount) *
mapping(:Age, :NSeizure; color = :TRT, layout = :Gender) *
(visual(Scatter; alpha = 0.7) + AlgebraOfGraphics.linear() * visual(; linewidth = 4)) |>
draw

4 Pumas modeling

Finally, we’ll create a Pumas model for negative binomial regression.

4.1 Pumas’ Workflow

Pumas’ workflow

Define a model. It can be a PumasModel or a PumasEMModel.
Define a Subject and Population.
Fit your model.
Perform inference on the model’s population-level parameters (also known as fixed effects).
Predict from a fitted model using the empirical Bayes estimate or Simulate random observations from a fitted model using the sampling distribution (also known as likelihood).
Model diagnostics and visual predictive checks.

Note

Pumas modeling is a highly iterative process. Eventually you’ll probably go back and forth in the stages of workflow. This is natural and expected in Pumas’ modeling. Just remember to have a good code organization and version control for all of the workflow iterations.

Before we define the model, we need to convert the :TRT column into a dummy variable. Currently :TRT is a string:

unique(seizurecount.TRT)

2-element Vector{InlineStrings.String7}:
 "Placebo"
 "DrugA"

@rtransform! seizurecount :DrugA = ifelse(:TRT == "DrugA", 1, 0)

4.1.1 Defining a model in Pumas

To define a model in Pumas, you can use either the macros or the function interfaces. We will cover the macros interface in this lesson.

To define a model using the macro interface, you start with a begin .. end block of code with @model:

my_model = @model begin end

PumasModel
  Parameters:
  Random effects:
  Covariates:
  Dynamical system variables:
  Dynamical system type: No dynamical model
  Derived:
  Observed:

This creates an empty model. Now we need to populate it with model components. These are additional macros that we can include inside the @model definition. We won’t be covering all the possible macros that we can use in this lesson, but here is the full list:

@param, fixed effects specifications.
@random, random effects specifications.
@covariates, covariate names.
@pre, pre-processing variables for the dynamic system and statistical specification.
@dosecontrol, specification of any dose control parameters present in the model.
@vars, shorthand notation.
@init, initial conditions for the dynamic system.
@dynamics, dynamics of the model.
@derived, statistical modeling of dependent variables.
@observed, model information to be stored in the model solution.

4.1.1.1 Model parameter with `@param`

Ok, let’s start with our negbin_model using the macro @model interface.

First, we’ll define our parameters with the @parameters macro:

@param begin
    ParameterName ∈ Domain(...)
    ...
end

Tip

To use the “in” (∈) operator in Julia, you can either replace ∈ for in or type \in <TAB> for the Unicode character.

By using the begin ... end block we can specify one parameter per line. We will just name the parameters the same name as it’s respective covariate, while also adding the β prefix to make clear that this is a coefficient.

Regarding the Domain(...), Pumas has several types of domains for you to specify in your @param block. Here is a list:

RealDomain for scalar parameters
VectorDomain for vectors
PDiagDomain for positive definite matrices with diagonal structure
PSDDomain for general positive semi-definite matrices

We will only use scalar parameters in our negative binomial regression model, so we just need the RealDomain.

Tip

If you don’t specify any arguments inside the domain constructor it will either error (for some domains that have required arguments) or will use the defaults. In the case of the RealDomain() without arguments it just uses the following arguments:

RealDomain(; lower = -∞, upper = ∞, init = 0)

@param begin
    α ∈ RealDomain()
    βAge ∈ RealDomain()
    βisF ∈ RealDomain()
    βDrugA ∈ RealDomain()
    ϕ ∈ RealDomain(; lower = 0.0001)
end

Note that, comparing to Poisson regression, we are adding one extra parameter \(\phi\) which has a lower bound set to 0.0001.

4.1.1.2 Model covariates with @covariates

We also have to specify our model covariates using the macro @covariates. This can be done using a one-liner syntax, without begin ... end block:

@covariates Age isF DrugA... # and so on

Or using the begin ... end block and specifying one covariate per line. This is what we will do, since our model has several covariates, which makes easy to visualize them and also to comment out a covariate to perform iterative model exploration:

@covariates begin
    Age
    isF
    DrugA
end

4.1.1.3 Pre-processing variables with `@pre`

We can specify all the necessary variable and statistical pre-processing with the @pre macro.

Note

The @pre block is traditionally used to specify the inputs of the Ordinary D ifferential Equations (ODE) system used for non-linear mixed-effects PK/PD models (NLME).

But we will use @pre to specify our variable transformations instead.

Here, we are defining all the coefficients-covariates multiplication operation here. Note that we are adding the prefix _ to the resulting operation. Finally, we are also defining the linear_pred as the linear predictors that should be added to the intercept and passed to the non-linear exponential function transformation:

@pre begin
    _Age = Age * βAge
    _isF = isF * βisF
    _DrugA = DrugA * βDrugA
    linear_pred = _Age + _isF + _DrugA
end

4.1.1.4 Statistical modeling of dependent variables with `@derived`

Our final block, @derived, is used to specify all the assumed distributions of observed variables that are derived from the blocks above. This is where we include our dependent variable: NSeizures.

Note that NSeizures is being declared as following a NegativeBinomial2 distribution (alternative parameterization). This is why we use the tilde notation ~. It means (much like the mathematical model notation) that NSeizures follows a NegativeBinomial2 distribution. Since NSeizures is a vector of values, we need to broadcast, i.e. vectorize, the operation with the dot . operator:

NSeizures ~ NegativeBinomial2.(exp.(λ), ϕ)

where λ is the rate parameter and ϕ is the overdispersion parameter: both ones that parametrizes the negative binomial distribution. In some cases we would need to do a lot of broadcasting in the righthand side of the ~. For example, since NSeizures depends on the exponential function of all the linear predictors, i.e. the exponential function of α + linear_pred, we would need to vectorize everything:

NSeizures ~ NegativeBinomial2.(exp.(α .+ linear_pred), ϕ)

This is cumbersome, so we can use the @. macro which tells Julia to apply the . in every operator and function call after it:

NSeizures ~ @. NegativeBinomial2(exp(α + linear_pred), ϕ)

Note

We are using the @. macro which tells Julia to vectorize (add the “dot syntax”) to all operations and function calls to the right of it.

Much better!

Tip

In this block we can use all variables defined in the previous blocks, in our example the @param, @covariates and @pre blocks.

This is our @derived block:

@derived begin
    NSeizure ~ @. NegativeBinomial2(exp(α + linear_pred), ϕ)
end

Here is our full model, once we combined all the macro blocks we just covered:

negbin_model = @model begin
    @param begin
        α ∈ RealDomain()
        βAge ∈ RealDomain()
        βisF ∈ RealDomain()
        βDrugA ∈ RealDomain()
        ϕ ∈ RealDomain(; lower = 0.0001)
    end

    @covariates begin
        Age
        isF
        DrugA
    end

    @pre begin
        _Age = Age * βAge
        _isF = isF * βisF
        _DrugA = DrugA * βDrugA
        linear_pred = _Age + _isF + _DrugA
    end

    @derived begin
        NSeizure ~ @. NegativeBinomial2(exp(α + linear_pred), ϕ)
    end
end

PumasModel
  Parameters: α, βAge, βisF, βDrugA, ϕ
  Random effects:
  Covariates: Age, isF, DrugA
  Dynamical system variables:
  Dynamical system type: No dynamical model
  Derived: NSeizure
  Observed: NSeizure

4.1.2 Define a `Subject` and `Population`

Once we have our model defined we have to specify a Subject and a Population.

In Pumas, subjects (i.e. observations) are represented by the Subject type and collections of subjects are represented as vectors of Subjects are defined as Population.

Subjects can be constructed with the Subject() constructor, for example:

Subject()

Subject
  ID: 1

We just constructed a Subject that has ID equal to 1 and no extra information. In order to provide extra information, you would need to pass keyword arguments to the Subject() constructor.

You can use the following keyword arguments inside Subject():

id: identifier.
observations: a named tuple of the dependent variables.
covariates: a named tuple containing the covariates, or nothing.
events: a vector of Events.
time: a vector of time stamps for the observations.

For this lesson, we will just use the covariates keyword argument.

This is an example of a Subject with ID equal to 42 and with some covariates specified as a named tuple:

Subject(id = 42, covariates = (; Age = 35, isF = 1, DrugA = 1))

Subject
  ID: 42
  Covariates: Age, isF, DrugA

Since a Population is just a vector of Subjects, we can use a simple map() function over the IDs present at our seizurecount dataset:

pop = map(
    i -> Subject(
        id = i,
        covariates = (;
            Age = seizurecount.Age,
            isF = seizurecount.isF,
            DrugA = seizurecount.DrugA,
        ),
    ),
    1:maximum(seizurecount.ID),
)

Population
  Subjects: 150
  Covariates: Age, isF, DrugA
  Observations:

4.1.2.1 Reading `Subject`s directly from a `DataFrame`

Another option is to use the read_pumas() function to read Subjects (and Population) directly from a DataFrame. This is more convenient than the map() with the Subject() constructor inside.

The read_pumas() function accepts as first argument a DataFrame followed by the following keyword options (which are very similar to the Subject() constructor):

observations: dependent variables specified by a vector of column names, i.e. [:NSeizure].
covariates: covariates specified by a vector of column names, i.e. [:Age, :isF, :DrugA].
id: specifies the ID column of the DataFrame.
time: specifies the time column of the DataFrame.
event_data: toggles assertions applicable to event data, either true or `false.

Our seizurecount dataset do not have any event data, so we should set event_data=false. Also, we don’t have a :time column, but Pumas need us to specify a column for this keyword argument. So let’s create a dummy column :time where all values are 0.0:

@rtransform! seizurecount :time = 0.0;

Now, we can create our population with read_pumas():

seizurecount_pop = read_pumas(
    seizurecount;
    observations = [:NSeizure],
    covariates = [:Age, :isF, :DrugA],
    id = :ID,
    time = :time,
    event_data = false,
)

Population
  Subjects: 150
  Covariates: Age, isF, DrugA
  Observations: NSeizure

4.1.3 Fit your model

Now we are ready to fit our model! We already have a model specified, negbin_model, along with a Population: seizurecount_pop. We can proceed with model fitting.

Model fiting in Pumas has the purpose of estimate parameters and is done by calling the fit() function with the following positional arguments:

Pumas model.
a Population.
a named tuple of the initial parameter’s values.
an inference algorithm.

4.1.3.1 Initial Parameter’s Values

We already covered model and Population, now let’s talk about initial parameter’s values. It is the 3rd positional argument inside fit().

You can specify you initial parameters as a named tuple. For instance, if you want to have a certain parameter, β1, as having an initial value as 3.14, you can do so by passing it inside a named tuple in the 3rd positional argument of fit():

fit(model, population, (; β1 = 3.14))

You can also use the helper function init_params() which will recover all the initial parameters we specified inside the model’s @param block.

Tip

Since we used the RealDomain() defaults, all of our example’s initial parameters are set to 0.0.

4.1.3.2 Inference Algorithms

Finally, our last (4th) positional argument is the choice of inference algorithm.

Pumas has the following available inference algorithms:

Marginal Likelihood Estimation:
- NaivePooled(): first order approximation without random-effects.
- FO(): first-order approximation.
- FOCE(): first-order conditional estimation with automatic interaction detection.
- LaplaceI(): second-order Laplace approximation.
Bayesian with Markov Chain Monte Carlo (MCMC):
- BayesMCMC(): MCMC using No-U-Turn Sampler (NUTS).

Note

We can also use a Maximum A Posteriori (MAP) estimation procedure for any marginal likelihood estimation algorithm. You just need to call the MAP() constructor with the desired marginal likelihood algorithm inside, for instance:

fit(model, population, init_params(model), MAP(FOCE()))

Ok, we are ready to fit our model. Let’s use the NaivePooled() since we are not using random-effects in this lesson:

negbin_fit =
    fit(negbin_model, seizurecount_pop, init_params(negbin_model), Pumas.NaivePooled())

[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
Iter     Function value   Gradient norm 
     0     4.567898e+02     7.600880e+03
 * time: 0.05167102813720703
     1     3.750301e+02     8.339642e+02
 * time: 1.6399948596954346
     2     3.734758e+02     2.070619e+02
 * time: 1.6404058933258057
     3     3.731815e+02     6.907366e+01
 * time: 1.6406559944152832
     4     3.729203e+02     1.922906e+02
 * time: 1.640887975692749
     5     3.715251e+02     6.063135e+02
 * time: 1.6411468982696533
     6     3.691703e+02     9.710812e+02
 * time: 1.6413979530334473
     7     3.650122e+02     1.148729e+03
 * time: 1.6416568756103516
     8     3.613103e+02     8.359061e+02
 * time: 1.6418938636779785
     9     3.596104e+02     3.354352e+02
 * time: 1.6421329975128174
    10     3.592915e+02     5.241530e+01
 * time: 1.642359972000122
    11     3.592308e+02     3.592072e+01
 * time: 1.6425859928131104
    12     3.591336e+02     1.301364e+02
 * time: 1.6428160667419434
    13     3.588955e+02     2.726936e+02
 * time: 1.6430649757385254
    14     3.583325e+02     4.800599e+02
 * time: 1.64329195022583
    15     3.570482e+02     7.658863e+02
 * time: 1.6435348987579346
    16     3.544024e+02     1.049090e+03
 * time: 1.643765926361084
    17     3.506601e+02     7.739625e+02
 * time: 1.6440088748931885
    18     3.498701e+02     1.001261e+01
 * time: 1.6442458629608154
    19     3.497652e+02     2.715154e+01
 * time: 1.6444859504699707
    20     3.497070e+02     9.406335e+00
 * time: 1.6447010040283203
    21     3.496259e+02     5.061301e+01
 * time: 1.6449120044708252
    22     3.494290e+02     1.233639e+02
 * time: 1.645155906677246
    23     3.489678e+02     2.258965e+02
 * time: 1.6453828811645508
    24     3.480060e+02     3.440584e+02
 * time: 1.645608901977539
    25     3.465074e+02     4.160614e+02
 * time: 1.6458230018615723
    26     3.453306e+02     3.067634e+02
 * time: 1.6460399627685547
    27     3.450179e+02     8.213581e+01
 * time: 1.6462910175323486
    28     3.449829e+02     1.113012e+00
 * time: 1.6465339660644531
    29     3.449796e+02     3.670739e+00
 * time: 1.6467900276184082
    30     3.449793e+02     1.031119e+00
 * time: 1.6470050811767578
    31     3.449789e+02     2.132371e+00
 * time: 1.6472458839416504
    32     3.449777e+02     7.109844e+00
 * time: 1.6474769115447998
    33     3.449750e+02     1.324740e+01
 * time: 1.6477088928222656
    34     3.449701e+02     1.853514e+01
 * time: 1.6479439735412598
    35     3.449642e+02     1.687681e+01
 * time: 1.6481950283050537
    36     3.449609e+02     7.813233e+00
 * time: 1.6484448909759521
    37     3.449602e+02     1.353327e+00
 * time: 1.6486890316009521
    38     3.449602e+02     3.052972e-02
 * time: 1.6489169597625732
    39     3.449602e+02     9.491811e-03
 * time: 1.6491758823394775
    40     3.449602e+02     6.147533e-04
 * time: 1.649420976638794

FittedPumasModel

Dynamical system type:          No dynamical model

Number of subjects:                            150

Observation records:         Active        Missing
    NSeizure:                   150              0
    Total:                      150              0

Number of parameters:      Constant      Optimized
                                  0              5

Likelihood approximation:              NaivePooled
Likelihood optimizer:                         BFGS

Termination Reason:                   GradientNorm
Log-likelihood value:                   -344.96018

-------------------
         Estimate
-------------------
α         2.2244
βAge     -0.019448
βisF      0.21872
βDrugA   -0.36471
ϕ         1.3232
-------------------

4.1.4 Perform inference on the model’s population-level parameters

Once the model is fitted, we can analyze our inference and estimates. But first, let’s discuss what the is the meaning of the estimated coefficients in a negative binomial regression model.

4.1.4.1 Interpreting the Coefficients

We can see that after the model is fitted, it prints a result with a summary and a table of the parameter estimates.

We can also recover the estimates as a named tuple with coef():

coef(negbin_fit)

(α = 2.2243605406194322,
 βAge = -0.019448205433894604,
 βisF = 0.2187223642020725,
 βDrugA = -0.36470539306165195,
 ϕ = 1.3232106736550775,)

Or as a DataFrame with coeftable():

coeftable(negbin_fit)

5×3 DataFrame

Row	parameter	constant	estimate
	String	Bool	Float64
1	α	false	2.22436
2	βAge	false	-0.0194482
3	βisF	false	0.218722
4	βDrugA	false	-0.364705
5	ϕ	false	1.32321

Remember that, similar to Poisson regression, negative binomial regression’s coefficients are the log scale, i.e. the natural logarithm of the effect size.

Log scales provide the facility that anything that is negative pulls down the base values of our dependent value \(y\). Anything positive pulls it up; and zero is a neutral effect. But since it is in a log scale it is not easily interpretable. This is why, in order for interpret it, often we undo the log transformation by exponentiating the effects, i.e. exponentiating the coefficients.

We can do that easily with the DataFrame returned by coeftable():

@rtransform coeftable(negbin_fit) :estimate_exp = exp(:estimate)

5×4 DataFrame

Row	parameter	constant	estimate	estimate_exp
	String	Bool	Float64	Float64
1	α	false	2.22436	9.24757
2	βAge	false	-0.0194482	0.98074
3	βisF	false	0.218722	1.24449
4	βDrugA	false	-0.364705	0.694401
5	ϕ	false	1.32321	3.75546

Now let us interpret the :estimate_exp column.

Our basal value α is the estimated number of seizures when all the other covariates are zero.

Let us now interpret the βAge, the covariate Age’s coefficient. It’s value is approximate 0.98, which means that for every increase in a single unit of Age we can expect a decrease in 2% in the number of seizures. Numerically this is: \(\alpha \cdot 0.98 \cdot \mathrm{Age}\). For instance, a 30 increase in Age would reduce the basal value \(9.24\) to \((9.24 - ( 9.24 \cdot 0.98)) \cdot 30 = 5.54\). Note that this association has the assumption that all other covariates are fixed/controlled.

The βisF, which represents the effect for the dummy variable isF. The exponentiated coefficient value is 1.24, which means that for every unit of increase in the covariate isF we can expected on average an increase of 24% in the number of seizures. Since, isF is a dummy variable taking either 0 or 1 as values, this means that female observations have, on average, 24% more seizures events than male observations.

Our DrugA covariate, which is a binary representation of the observation being either on the placebo or treatment for drug A arm, has an exponentiated coefficient of 0.69. This means that observations treated with drug A, when all other covariates are controlled, have 31% less seizures than when treated with placebo.

We can also inspect our estimated coefficients’ standard errors (SE) along with the 95% confidence intervals with the infer() function:

infer(negbin_fit)

[ Info: Calculating: variance-covariance matrix.
[ Info: Done.

Asymptotic inference results using sandwich estimator

Dynamical system type:          No dynamical model

Number of subjects:                            150

Observation records:         Active        Missing
    NSeizure:                   150              0
    Total:                      150              0

Number of parameters:      Constant      Optimized
                                  0              5

Likelihood approximation:              NaivePooled
Likelihood optimizer:                         BFGS

Termination Reason:                   GradientNorm
Log-likelihood value:                   -344.96018

----------------------------------------------------------
         Estimate    SE          95.0% C.I.
----------------------------------------------------------
α         2.2244     0.2746      [  1.6862  ;  2.7626   ]
βAge     -0.019448   0.0053832   [ -0.029999; -0.0088973]
βisF      0.21872    0.16064     [ -0.096133;  0.53358  ]
βDrugA   -0.36471    0.15809     [ -0.67455 ; -0.054864 ]
ϕ         1.3232     0.25433     [  0.82474 ;  1.8217   ]
----------------------------------------------------------

Also if you prefer other confidence interval band, you can choose with the keyword argument level inside infer().

For instance, one common band for the confidence intervals are 89%:

infer(negbin_fit; level = 0.89)

[ Info: Calculating: variance-covariance matrix.
[ Info: Done.

Asymptotic inference results using sandwich estimator

Dynamical system type:          No dynamical model

Number of subjects:                            150

Observation records:         Active        Missing
    NSeizure:                   150              0
    Total:                      150              0

Number of parameters:      Constant      Optimized
                                  0              5

Likelihood approximation:              NaivePooled
Likelihood optimizer:                         BFGS

Termination Reason:                   GradientNorm
Log-likelihood value:                   -344.96018

---------------------------------------------------------
         Estimate    SE          89.0% C.I.
---------------------------------------------------------
α         2.2244     0.2746      [  1.7855  ;  2.6632  ]
βAge     -0.019448   0.0053832   [ -0.028052; -0.010845]
βisF      0.21872    0.16064     [ -0.038017;  0.47546 ]
βDrugA   -0.36471    0.15809     [ -0.61736 ; -0.11205 ]
ϕ         1.3232     0.25433     [  0.91675 ;  1.7297  ]
---------------------------------------------------------

4.1.5 Predict from a fitted model or Simulate random observations from a fitted model

Now that we understand our model estimates, we can use it to make predictions or simulate random observations.

4.1.5.1 Predictions with `predict()`

In order to calculate prediction from a fitted model, we need to use the predict() function and passing two positional arguments:

a fitted Pumas model.
either a single Subject or a collection of them with a Population.

For example to make predictions onto a single Subject, we can:

my_subject = Subject(id = 42, covariates = (; Age = 35, isF = 1, DrugA = 1))

Subject
  ID: 42
  Covariates: Age, isF, DrugA

Caution

predict() also needs a keyword argument obstimes which represents the observation times that we want to predict the dependent variable. The value must be a vector. Since we used a dummy 0.0 value as :time variable in our read_pumas function, we can pass the same 0.0 but as a vector to the predict() function.

predict(negbin_fit, my_subject; obstimes = [0.0])

SubjectPrediction
  ID: 42
  Predictions: NSeizure: (n=1)
  Covariates: Age, isF, DrugA
  Time: [0.0]

We can also generate prediction for a Population:

predict(negbin_fit, seizurecount_pop; obstimes = [0.0])

Prediction
  Subjects: 150
  Predictions: NSeizure
  Covariates: Age, isF, DrugA

4.1.5.2 Simulations with `simobs()`

We can generate random observations from our fitted Pumas model with the simobs() function. You need to supply at least three positional arguments:

a non-fitted Pumas model, in our case negbin_model.
either a single Subject or a collection of them with a Population.
a named tuple of parameter values.

The 1st and 2nd positional arguments we already covered in our previous examples. Let’s focus on the 3rd positional argument: param. It can be a named tuple with the parameter values. For instance, we can use our estimated parameters from the fitted model with coef(negbin_fit):

coef(negbin_fit)

(α = 2.2243605406194322,
 βAge = -0.019448205433894604,
 βisF = 0.2187223642020725,
 βDrugA = -0.36470539306165195,
 ϕ = 1.3232106736550775,)

simobs(negbin_model, seizurecount_pop, coef(negbin_fit));

Note

You can also pass a single parameter as a Pumas fitted model to simobs().

seizurecount_sim = simobs(negbin_fit);

4.1.6 Model diagnostics

Finally, our last step is to assess model diagnostics.

4.1.6.1 Assessing model diagnostics

To assess model diagnostics we can use the inspect() function in our fitted Pumas model:

inspect_df = DataFrame(inspect(negbin_fit))
first(inspect_df, 5)

[ Info: Calculating predictions.
[ Info: Calculating empirical bayes.
[ Info: Evaluating dose control parameters.
[ Info: Evaluating individual parameters.
[ Info: Done.

5×13 DataFrame

Row	id	time	evid	NSeizure	Age	isF	DrugA	NSeizure_pred	NSeizure_ipred	_Age	_isF	_DrugA	linear_pred
	String	Float64	Int8	Float64?	Int64?	Int64?	Int64?	Float64?	Float64?	Float64?	Float64?	Float64?	Float64?
1	1	0.0	0	6.0	43	1	0	4.98688	4.98688	-0.836273	0.218722	-0.0	-0.61755
2	10	0.0	0	5.0	50	1	0	4.35216	4.35216	-0.97241	0.218722	-0.0	-0.753688
3	100	0.0	0	6.0	42	1	1	3.5309	3.5309	-0.816825	0.218722	-0.364705	-0.962808
4	101	0.0	0	2.0	43	0	1	2.78259	2.78259	-0.836273	0.0	-0.364705	-1.20098
5	102	0.0	0	0.0	55	0	1	2.2034	2.2034	-1.06965	0.0	-0.364705	-1.43436

4.1.6.1.1 AIC

We can also check for Akaike Information Criterion (AIC):

aic(negbin_fit)

699.9203691169835

4.1.6.1.2 BIC

Besides AIC, there is also the Bayesian Information Criterion (BIC):

bic(negbin_fit)

714.9735455874647

5 Conclusion

I hope you enjoyed this tutorial on negative binomial regression. Whenever you have count data and you experience problems fitting your data to a Poisson regression, you can suspect that your data has inherently overdispersion. To overcome the limitations of Poisson regression to overdispersion in count data you can use a more robust approach called negative binomial regression. As before, keep it on you arsenal of discrete data modeling whenever you are dealing with count data.

Reuse

CC BY-SA 4.0

1 What is Negative Binomial Regression?

1.1 Poisson Distribution versus Negative Binomial Distribution

1.2 Link Function \(e^x\)

1.3 Covariate \(x\)

1.3.1 How to Interpret Coefficient \(\beta\)?

2 When to use Negative Binomial Regression?

3 seizurecount Dataset

3.1 Exploratory Data Analysis

3.2 Visualizations

4 Pumas modeling

4.1 Pumas’ Workflow

4.1.1 Defining a model in Pumas

4.1.1.1 Model parameter with @param

4.1.1.2 Model covariates with @covariates

4.1.1.3 Pre-processing variables with @pre

4.1.1.4 Statistical modeling of dependent variables with @derived

4.1.2 Define a Subject and Population

4.1.2.1 Reading Subjects directly from a DataFrame

4.1.3 Fit your model

4.1.3.1 Initial Parameter’s Values

4.1.3.2 Inference Algorithms

4.1.4 Perform inference on the model’s population-level parameters

4.1.4.1 Interpreting the Coefficients

4.1.5 Predict from a fitted model or Simulate random observations from a fitted model

4.1.5.1 Predictions with predict()

4.1.5.2 Simulations with simobs()

4.1.6 Model diagnostics

4.1.6.1 Assessing model diagnostics

4.1.6.1.1 AIC

4.1.6.1.2 BIC

5 Conclusion

Reuse

3 `seizurecount` Dataset

4.1.1.1 Model parameter with `@param`

4.1.1.3 Pre-processing variables with `@pre`

4.1.1.4 Statistical modeling of dependent variables with `@derived`

4.1.2 Define a `Subject` and `Population`

4.1.2.1 Reading `Subject`s directly from a `DataFrame`

4.1.5.1 Predictions with `predict()`

4.1.5.2 Simulations with `simobs()`