Warning
Using a trained model to make predictions for a new data point is commonly referred to as “inference” in the machine learning community! This is different from the statistical definition of inference, which refers to drawing conclusions about relationships between variables from data.
The following are common data collection biases that can lead to biased parameter estimates and predictions downstream.
The following are common modelling biases that can lead to biased parameter estimates and poor generalization, e.g. poor extrapolation or poor performance in a new environment.
When designing empirical models, we often have to make assumptions about the data and the underlying relationships between variables.
These assumptions introduce bias into the model, but they also reduce its variance, which can improve the model’s performance and generalization ability. This tension is known as the bias-variance trade-off.
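To make the trade-off concrete, here is a minimal Monte Carlo sketch (the sine target, noise level, polynomial degrees, and query point are illustrative assumptions, not from the text): refitting on many resampled training sets separates systematic error (bias) from sensitivity to the particular sample (variance), and a rigid degree-1 fit shows high bias with low variance while a flexible degree-10 fit shows the reverse.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def true_f(x):
    return np.sin(2 * np.pi * x)

x_train = np.linspace(0, 1, 20)
x_query = 0.25           # point at which bias and variance are measured
n_repeats = 2000

preds = {1: [], 10: []}  # degree -> predictions at x_query across refits
for _ in range(n_repeats):
    y_train = true_f(x_train) + rng.normal(0, 0.3, size=x_train.shape)
    for degree in preds:
        fit = Polynomial.fit(x_train, y_train, deg=degree)
        preds[degree].append(fit(x_query))

for degree, p in preds.items():
    p = np.array(p)
    bias2 = (p.mean() - true_f(x_query)) ** 2  # squared bias at x_query
    var = p.var()                              # variance across refits
    print(f"degree {degree:2d}: bias^2 = {bias2:.4f}, variance = {var:.4f}")
```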
Inductive bias is the set of assumptions that a learning algorithm uses to predict outputs for inputs it has not encountered during training. It arises from the assumed empirical model structure and from the training algorithm, e.g. regularization.
Different empirical models make different assumptions about the data, e.g. spatial locality or sparsity, which can lead to different inductive biases.
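As a small illustration (the data and models below are assumptions for the sketch, not from the text), a linear model and a k-nearest-neighbour model fit the same data yet extrapolate very differently, because their inductive biases differ: the linear model assumes one global trend, while k-NN assumes local smoothness and cannot predict beyond the range of the training targets.

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(0, 5, 30)
y_train = 2.0 * x_train + rng.normal(0, 0.5, size=x_train.shape)

x_new = 10.0  # well outside the training range

# Linear model: assumes one global trend, so it extrapolates it.
slope, intercept = np.polyfit(x_train, y_train, deg=1)
linear_pred = slope * x_new + intercept

# k-NN (k = 3): assumes local smoothness; it averages the nearest
# training targets and cannot predict beyond their range.
k = 3
nearest = np.argsort(np.abs(x_train - x_new))[:k]
knn_pred = y_train[nearest].mean()

print(f"linear model at x = {x_new}: {linear_pred:.2f}")  # near 20
print(f"{k}-NN model at x = {x_new}: {knn_pred:.2f}")     # near 10
```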
Assuming the model is correct, parameter estimation bias is defined as:
\[ \text{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta \]
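A quick Monte Carlo sketch of this definition (the uniform model and sample size are illustrative choices, not from the text): the sample maximum of \(n\) draws from \(\text{Uniform}(0, \theta)\) has expectation \(n\theta/(n+1)\), so its bias is \(-\theta/(n+1)\), and the simulated value should match.

```python
import numpy as np

rng = np.random.default_rng(2)
theta = 4.0          # true parameter
n = 10               # sample size per estimate
n_repeats = 100_000  # Monte Carlo replications

# Each row is one sample of size n; the row maximum is one estimate.
estimates = rng.uniform(0, theta, size=(n_repeats, n)).max(axis=1)

print(f"empirical bias:   {estimates.mean() - theta:.4f}")
print(f"theoretical bias: {-theta / (n + 1):.4f}")  # -4/11 ~ -0.364
```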
The following are common estimation/fitting/analysis biases that can lead to biased parameter estimates and poor generalization.
Note
Consistency of Estimators: An estimator is consistent if it converges in probability to the true value of the parameter being estimated as the sample size increases. Formally, for an estimator \(\hat{\theta}_n\) of a parameter \(\theta\), consistency means: \[ \lim_{n \to \infty} P(|\hat{\theta}_n - \theta| > \epsilon) = 0 \quad \text{for all } \epsilon > 0. \] Consistency ensures that with enough data, the estimator will produce values arbitrarily close to the true parameter value.
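An empirical check of this definition (a minimal sketch with an assumed exponential model): estimate \(P(|\hat{\theta}_n - \theta| > \epsilon)\) for the sample mean by simulation and watch it shrink toward zero as \(n\) grows.

```python
import numpy as np

rng = np.random.default_rng(3)
theta, eps = 1.0, 0.1   # true mean and tolerance
n_repeats = 5000

for n in [10, 100, 1000]:
    # n_repeats independent samples of size n from Exponential(mean = theta)
    samples = rng.exponential(scale=theta, size=(n_repeats, n))
    theta_hat = samples.mean(axis=1)          # sample mean as estimator
    prob = np.mean(np.abs(theta_hat - theta) > eps)
    print(f"n = {n:5d}: P(|theta_hat - theta| > {eps}) ~ {prob:.4f}")
```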
Note
In practice, we only have finite data, so consistency does not guarantee low bias or low variance in finite samples.
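For example (an illustrative sketch, not from the text), the maximum-likelihood variance estimator, which divides by \(n\), is consistent yet biased in every finite sample: its expectation is \((n-1)\sigma^2/n\), so the bias \(-\sigma^2/n\) vanishes only as \(n \to \infty\).

```python
import numpy as np

rng = np.random.default_rng(4)
sigma2 = 2.0        # true variance
n_repeats = 20_000

for n in [5, 20, 100, 500]:
    samples = rng.normal(0.0, np.sqrt(sigma2), size=(n_repeats, n))
    var_mle = samples.var(axis=1, ddof=0)  # divide by n: the biased MLE
    bias = var_mle.mean() - sigma2
    print(f"n = {n:4d}: empirical bias ~ {bias:+.4f}, "
          f"theoretical bias = {-sigma2 / n:+.4f}")
```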