Probability of the response \(y\) according to the model given specific values of \(\theta\), \(\eta\) and \(x\).
\[ p_c(y \mid \theta, \eta, x) \]
Fit model by simply finding the values of \(\theta\) and \(\eta\) that maximizes the conditional probability?
This can horribly overfit the data, too many parameters!
Integrates out the effect of the random effects
\[ p_m (y \mid \theta, x) = \int \overbrace{p_c (y \mid \theta, \eta, x) \cdot p_{\text{prior}} (\eta \mid \theta)}^{\text{joint probability}} \, d\eta \]
Average conditional probability weighted by a prior.
Which model has the largest area under the curve?
A good model should have:
Intuitively, even if a model is able to fit the data very well, it is not a good model if it is very sensitive to the value of the random effect.
Over-fitted models will generally have a very high peak, but a very narrow base.
Consider a single subject \(i\). The marginal likelihood is:
\[ p(y_i \mid \theta) = \int p(y_i \mid \eta_i, \theta) \cdot p(\eta_i \mid \theta) \, d\eta_i \]
However, we can write the marginal likelihood in another way:
\[ p(y_i \mid \theta) = \prod_{j=1}^{m_i} p(y_{i,j} \mid y_{i,1:j-1}, \theta) \]
where
\[ p(y_{i,1} \mid y_{i,1:0}, \theta) = p(y_{i,1} \mid \theta) = \int p(y_{i,1} \mid \eta_i, \theta) \cdot p(\eta_i \mid \theta) \, d\eta_i \]
\[ p(y_{i,j} \mid y_{i,1:j-1}, \theta) = \int p(y_{i,j} \mid \eta_i, \theta) \cdot p(\eta_i \mid y_{i,1:j-1}) \, d\eta_i \]
where \(p(\eta_i \mid y_{i,1:j-1}, \theta)\) is the posterior distribution of \(\eta_i\) given partial data \(y_{i,1:j-1}\) and \(\theta\).
Marginal likelihood | Conditional likelihood | |
---|---|---|
Prior / regularization on \(\theta\) |
|
JointMAP() |
No prior / regularization on \(\theta\) |
|
JointMAP() without priors in the @param block |
FO()
, FOCE()
, LaplaceI()
and SAEM()
\[ \begin{aligned} \theta^* & = \text{arg max}_{\theta} \prod_{i=1}^N p(y_i \mid \theta, x) \\ \eta_i^* & = \text{arg max}_{\eta_i} p(y_i \mid \theta = \theta^*, \eta_i, x_i) \cdot p(\eta_i \mid \theta = \theta^*) \end{aligned} \]
MAP(FO())
, MAP(FOCE())
and MAP(LaplaceI())
\[ \begin{aligned} \theta^* & = \text{arg max}_{\theta} p(\theta) \cdot \prod_{i=1}^N p(y_i \mid \theta, x) \\ \eta_i^* & = \text{arg max}_{\eta_i} p(y_i \mid \theta = \theta^*, \eta_i, x_i) \cdot p(\eta_i \mid \theta = \theta^*) \end{aligned} \]
JointMAP()
\[ \begin{aligned} \theta^*, \eta^* & = \text{arg max}_{\theta, \eta} p(\theta, \eta \mid x, y) \\ & = \text{arg max}_{\theta, \eta} p(\theta) \cdot \prod_{i=1}^N p(\eta_i \mid \theta) \cdot p(y_i \mid \theta, \eta_i, x_i) \end{aligned} \]
BayesMCMC()