Philosophy of Science and Explaining Machine Learning Models

Authors:
Mohamed Tarek, Abdelwahed Khamis

Philosophy of Science

Explanation

explain (verb)

  1. To make known
  2. To make plain or understandable
  3. To give the reason for or cause of
  4. To show the logical development or relationships of

Source: Merriam-Webster

Understanding

Understanding is an unending activity by which, in constant change and variation, we come to terms with and reconcile ourselves to reality, that is, try to be at home in the world.

Source: Hannah Arendt, Understanding and Politics (1954)

  • Understanding is about finding familiar patterns and relationships in a complex world.
  • Understanding can improve over time.
  • Understanding is subjective and context-dependent.
  • Good explanations often involve analogies to similar concepts and contrasts with different (often opposing) concepts.
  • Both the similar and different concepts should be familiar to the audience.

Understanding

Software view of science

  • A scientific theory is like a computer program that predicts our observations, the experimental data.
  • A useful theory is a compression of the data; comprehension is compression.
  • You compress things into computer programs, into concise algorithmic descriptions.
  • The simpler the theory, the better you understand something.

Source: Gregory Chaitin, The Limits of Reason (2006)

Occam’s Razor

Among explanations that account for the data equally well, the simpler one is preferred.

Is Science Understandable?

  • Science aims to explain complex phenomena using models as theories.
  • Some scientific models are simple and intuitive (e.g., Newton’s laws).
  • Others are complex and counterintuitive (e.g., quantum mechanics).

No one really understands quantum mechanics.

Source: Richard Feynman

Experimental Science vs Mathematics

Experimental Science

  • Experimental science describes the world using guesses, known as hypotheses.
  • For a hypothesis to be scientific, it must be testable / falsifiable.
  • If we can design an experiment that could show the hypothesis to be false, it is scientific.
    • A hypothesis is falsified if the experimental data contradict it directly or any of its mathematically derivable consequences (simulated data).
  • If we run a repeatable experiment and the hypothesis is not falsified, we say it is verified.
    • A hypothesis is verified if it is consistent with the repeated observations or if the observations can be deduced from it.
  • Widely accepted and verified hypotheses are often called scientific theories or scientific laws.
  • Science does not claim these laws are true, only that they have not yet been falsified.

Experimental Science vs Mathematics

Experimental Science

  • If there is a finite number of hypotheses, we can eventually falsify all wrong hypotheses and find the correct one.
  • In practice, hypotheses can be infinite in number and complex, making it impossible to find the correct one.
  • Scientific theories evolve over time as new data emerges that falsifies old theories.
  • There can be multiple competing scientific theories for the same phenomena.
  • Scientific laws are compressions of experimental observations.

We never are right. We can only be sure we are wrong.

Source: Richard Feynman in The Feynman Lectures on Physics

Experimental Science vs Mathematics

Frequentist vs Bayesian Experimental Science

  • In the frequentist approach to science, we reject the (null) hypothesis if the observed data would be highly unlikely under that hypothesis.
  • In the Bayesian approach, hypotheses are assigned prior probabilities based on existing belief.
  • Experimental data is used to update these probabilities using Bayes’ theorem.
  • Hypotheses with high posterior probabilities and good posterior predictive performance are considered more likely to be true (soft verification).
  • Hypotheses with low posterior probabilities and poor posterior predictive performance are considered less likely to be true (soft falsification).
  • The more informative the experimental data we collect, the better we can distinguish between competing hypotheses.
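
Concretely, writing \(H_1, \dots, H_K\) for a set of mutually exclusive hypotheses and \(D\) for the observed data, Bayes' theorem updates the prior probabilities into posterior probabilities:

\[ P(H_k \mid D) = \frac{P(D \mid H_k)\, P(H_k)}{\sum_{i=1}^{K} P(D \mid H_i)\, P(H_i)} \]

where \(P(D \mid H_k)\) is the likelihood of the data under hypothesis \(H_k\).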

Experimental Science vs Mathematics

Frequentist vs Bayesian Experimental Science

  • The hypothesis with the highest posterior probability can be selected as the best explanation of the data, but it is not guaranteed to be true.
  • Bayesian methods allow us to quantify the uncertainty in our hypotheses, which is useful for decision-making with complex models and/or limited data.
  • Bayesian methods are more appropriate when data is limited, models are complex, or prior knowledge is available.
  • Frequentist methods are often simpler and computationally less intensive.
  • Both frequentist and Bayesian methods are appropriate when informative data is abundant and models are simple.

Experimental Science vs Mathematics

Mathematics

  • Mathematicians compress their computational experiments into mathematical axioms.
  • Axioms are irreducible (usually “obvious”) statements that cannot be proven true, but are accepted as true.
  • From these axioms, mathematicians deduce theorems using logic alone. Logic itself has its own axioms which we need to accept.
  • Theorems in mathematics (in contrast with scientific theories) are proven rather than verified, assuming the axioms are true.
  • No experimental observations are needed.
  • Mathematics is also the language by which scientists describe their hypotheses and deduce the hypotheses’ inevitable consequences to be compared with experimental data.

Experimental Science vs Mathematics

Mathematics

An example of mathematical axioms is the set of Euclid’s Axioms / Postulates in Euclidean geometry. These describe how points, lines, and shapes behave.

  • Through any two distinct points, there is exactly one straight line.
  • A straight line segment can be extended indefinitely.
  • A circle can be drawn with any center and any radius.
  • All right angles are equal.
  • (Parallel postulate) Given a line and a point not on it, there is exactly one line through the point parallel to the given line.

Experimental vs Observational Science

  • Experimental science involves designing and conducting experiments to test hypotheses about the natural world.
  • Observational science involves collecting and analyzing data from natural settings without manipulating variables.
  • Both approaches aim to understand complex phenomena, but experimental science allows for more controlled testing of causal relationships.
  • Experimental science is generally more rigorous and reliable for establishing cause-and-effect relationships.
  • Observational science is valuable for studying phenomena that cannot be ethically or practically manipulated, such as in ecology, astronomy, epidemiology, evolutionary biology, economics and social sciences.

Decision-making and Science

  • Science is often a means to an end: making decisions based on an incomplete understanding of complex phenomena.
  • Decision-making requires some level of belief or trust in the scientific models / theories used.
  • But science does not directly prove hypotheses right!
  • Decision-makers take the best available scientific theories that have not yet been falsified and assume/believe they are correct.
  • If multiple competing theories exist, either select one of them or make decisions that are robust to the competing theories.

Biology and Complexity

  • In biological systems, hypotheses are often complex and difficult to fully experimentally verify or falsify.
  • Many models can fit the limited data available.
  • How to make decisions responsibly in such situations?
  • Scientists often rely on understandable approximations of biology in the form of mechanistic models.
  • Mechanistic models can vary in complexity from simple ordinary differential equations (ODEs) to complex multi-scale simulations.
  • These models are not perfect representations of reality, but they are useful tools for understanding (compressing the experimental data) and decision-making, e.g. dose selection.
  • In pharmacometrics, individual mechanistic models are often combined with data-driven statistical models (random effects and residual error) to capture variability and uncertainty.

Understanding \(\to\) Trust

  • When experimental data is limited, domain experts trust models they can understand.
  • In quantum mechanics, understanding and intuition are limited, but trust is high due to extensive experimental verification.
  • Understandable models use familiar concepts and mechanisms, plausible assumptions, and relationships between variables that make sense to experts.
  • Trust is essential for adoption of models in high-stakes decision-making (e.g., healthcare).
  • Complex models (e.g., deep learning) can outperform simpler models but are often less understandable.
  • How to explain complex models to increase trust?
  • Should we fully trust complex models even if we can explain some of their behavior?
  • Why should we trust the explanations themselves?

Explaining Machine Learning (ML) Models

Explainability in ML

  • There is no single agreed-upon definition of explainability / interpretability in ML.
  • Explainability is often context-dependent and varies based on the audience and application.

Models can be loosely categorized as:

  1. Intrinsically interpretable models
    • Simple models whose structure and parameters can be directly understood.
    • Examples: linear regression models with a few variables, decision trees with a few nodes, k-nearest neighbors with a small k.
  2. “Black-box” models with post-hoc explainability, aka explainable models
    • Complex models that require additional techniques to explain their predictions.
    • Examples: deep neural networks, ensemble methods (random forests, gradient boosting).

Types of Explanations

One may want to explain:

  • The model for a specific set of inputs, aka local explainability, or
  • The model’s overall behavior across all possible inputs, aka global explainability.

The explanation may focus on:

  • The internal mechanics of the model, or
  • Which input features are most influential in the model’s predictions.

The techniques used may be:

  • Model-agnostic: applicable to any model type, or
  • Model-specific: designed for a particular model architecture.

The field of explaining complex ML models is often called Explainable AI (XAI).

Benefits of Explaining ML Models

When explanation techniques work (they don’t always!), they can provide several benefits:

  • Trust and Adoption: Users are more likely to trust and adopt models they can understand.
  • Debugging and Improvement: Understanding model behavior helps identify errors and areas for improvement.
  • Regulatory Compliance: Some industries require explanations for automated decisions (e.g., finance, healthcare).
  • Ethical Considerations: Helps ensure models do not reinforce human biases or unfair practices present in the data.
  • Knowledge Discovery: Insights from models can lead to new scientific knowledge.

Spurious Correlations

  • Complex models can learn spurious correlations present in the training data.
  • Spurious correlations are relationships that appear in the data but lack a causal basis.
  • Machine learning models may exploit these correlations to make accurate predictions.
  • This is not the model’s fault! The model will use all possible associations it can find to make good predictions.
  • However, relying on spurious correlations is undesirable and can lead to poor generalization to new data.
  • Explanations can help identify when models rely on such spurious correlations.
  • Once identified, steps can be taken to mitigate their impact (e.g., collecting more representative data, or modifying the model).

Spurious Correlations

Figure 1: Heatmaps of spurious correlations for the classes (a) Ship, (b) Train, and (c) Horse. Source: Bach, S., Binder, et al. (2015). Analyzing Classifiers: Fisher Vectors and Deep Neural Networks.

Limitations and Dangers of Explaining ML Models

Warning

Explanation does not imply causation!

  • Explanations often reveal associations in the model, not necessarily causal relationships.
  • These can form the basis for causal hypotheses to be investigated but should not be taken at face value.
  • Just because you can explain the inputs that influenced a prediction does not mean those inputs cause the outcome to change in reality.
  • Changing the input will change the model’s prediction, but not necessarily the real-world outcome.
  • Relying on explanations for decision-making without understanding the underlying data and context can lead to incorrect conclusions and decisions.

Limitations and Dangers of Explaining ML Models

  • Explanations can be used to check that a model is not doing something unreasonable, e.g., relying on spurious correlations.
  • However, just because a model did not do anything unreasonable for many inputs does not mean it never will for any input, e.g., under distribution shift or for adversarial examples.

Limitations and Dangers of Explaining ML Models

  • Adversarial examples are inputs designed to fool ML models into making incorrect predictions.
  • Explanations of adversarial examples may highlight features that are imperceptible to humans.

Limitations and Dangers of Explaining ML Models

  • High dimensional data and models with complex interactions make it inherently difficult to provide faithful explanations.
  • A faithful explanation is one that accurately reflects the true reasoning of the model.
  • A post-hoc explanation may not be faithful, i.e., it may not accurately represent how the model actually makes decisions, despite appearing plausible and convincing.
  • Explanations can be manipulated to hide undesirable model behavior or biases.
  • Explanations can give a false sense of security and trust in the model.
  • All current XAI methods rely on extreme simplifications of complex models and only help us gain a limited understanding of them.
  • Used carefully, XAI techniques can provide useful insights into complex models, but they should be treated with caution and skepticism.

Criticism of Explainable AI

“Everyone who is serious in the field knows that most of today’s explainable A.I. is nonsense,”

Source: Zachary Lipton in “What’s Wrong with Explainable A.I.?” (2022)

  • Zachary Lipton is an associate professor of computer science at Carnegie Mellon University with a focus on ML and healthcare.

SHAP (SHapley Additive exPlanations)

SHAP (SHapley Additive exPlanations)

  • SHAP is a machine learning explainability method based on Shapley values from cooperative game theory.
  • It is an input feature attribution method that explains the output of any ML model by assigning each feature an importance value for a particular prediction.
  • The sum of the feature contributions equals the difference between the model’s prediction for that instance and the average prediction across all instances.
  • SHAP provides local explanations for individual predictions.
  • A global explanation can be obtained by averaging local explanations across many instances.
  • SHAP is model-agnostic: can be applied to any ML model (tree-based, neural networks, etc.).
  • SHAP is widely used due to its solid theoretical foundation and intuitive explanations.
  • Reference: Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. Advances in neural information processing systems.

Shapley Values Example

  • Assume a team with 3 players: A, B, C
  • The team wins a total reward of 60 points.
  • How much did player \(j\) contribute to the total reward?
  • We can measure each player’s contribution in the team by looking at the team’s performance with and without that player, for all possible sub-team permutations (order matters).
  • One way to do this is to consider all possible permutations of players joining the team one by one.
  • The value function \(v(S)\) is the reward the team gets for each combination (order doesn’t matter) of players \(S\).

Shapley Values Example

Value/reward function \(v(S)\):

| Combination \(S\) | Value \(v(S)\) |
|-------------------|----------------|
| \(\emptyset\)     | 0  |
| \(\{A\}\)         | 10 |
| \(\{B\}\)         | 0  |
| \(\{C\}\)         | 0  |
| \(\{A,B\}\)       | 10 |
| \(\{A,C\}\)       | 10 |
| \(\{B,C\}\)       | 30 |
| \(\{A,B,C\}\)     | 60 |

Total value to distribute:

\[ v(\{A,B,C\}) = 60 \]

Shapley Values Example

There are \(3! = 6\) permutations, each representing a different order in which players join the team:

  1. \(A \to B \to C\)
  2. \(A \to C \to B\)
  3. \(B \to A \to C\)
  4. \(B \to C \to A\)
  5. \(C \to A \to B\)
  6. \(C \to B \to A\)

We compute contributions along each ordering.

Shapley Values Example

Contributions: \(A \to B \to C\)

  • \(A\): \(v(\{A\}) - v(\emptyset) = 10\)
  • \(B\): \(v(\{A,B\}) - v(\{A\}) = 0\)
  • \(C\): \(v(\{A,B,C\}) - v(\{A,B\}) = 50\)

Sum: \(10 + 0 + 50 = 60\)

Contributions: \(B \to A \to C\)

  • \(B\): \(0\)
  • \(A\): \(v(\{A,B\}) - v(\{B\}) = 10\)
  • \(C\): \(v(\{A,B,C\}) - v(\{A,B\}) = 50\)

Sum: \(60\)

Contributions: \(A \to C \to B\)

  • \(A\): \(10\)
  • \(C\): \(v(\{A,C\}) - v(\{A\}) = 0\)
  • \(B\): \(v(\{A,B,C\}) - v(\{A,C\}) = 50\)

Sum: \(60\)

Contributions: \(B \to C \to A\)

  • \(B\): \(0\)
  • \(C\): \(v(\{B,C\}) - v(\{B\}) = 30\)
  • \(A\): \(v(\{A,B,C\}) - v(\{B,C\}) = 30\)

Sum: \(60\)

Shapley Values Example

Contributions: \(C \to A \to B\)

  • \(C\): \(0\)
  • \(A\): \(v(\{A,C\}) - v(\{C\}) = 10\)
  • \(B\): \(v(\{A,B,C\}) - v(\{A,C\}) = 50\)

Sum: \(60\)

Contributions: \(C \to B \to A\)

  • \(C\): \(0\)
  • \(B\): \(v(\{B,C\}) - v(\{C\}) = 30\)
  • \(A\): \(v(\{A,B,C\}) - v(\{B,C\}) = 30\)

Sum: \(60\)

Shapley Values Example

Average each player’s contributions across permutations:

  • Player A \[ (10 + 10 + 10 + 30 + 10 + 30) / 6 = 16 + \frac{2}{3} \]

  • Player B \[ (0 + 50 + 0 + 0 + 50 + 30) / 6 = 21 + \frac{2}{3} \]

  • Player C \[ (50 + 0 + 50 + 30 + 0 + 0) / 6 = 21 + \frac{2}{3} \]

  • Sum of contributions = \(16 + \frac{2}{3} + 21 + \frac{2}{3} + 21 + \frac{2}{3} = 60\).
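
The same averaging can be reproduced with a short Python sketch (a minimal illustration; the dictionary `v` below is the value function table from the earlier slide):

```python
from itertools import permutations

# Value function from the table above; keys are sets (combinations) of players.
v = {
    frozenset(): 0,
    frozenset("A"): 10, frozenset("B"): 0, frozenset("C"): 0,
    frozenset("AB"): 10, frozenset("AC"): 10, frozenset("BC"): 30,
    frozenset("ABC"): 60,
}

players = ["A", "B", "C"]
totals = {p: 0.0 for p in players}

# For each of the 3! = 6 joining orders, accumulate each player's marginal contribution.
for order in permutations(players):
    coalition = frozenset()
    for p in order:
        totals[p] += v[coalition | {p}] - v[coalition]
        coalition = coalition | {p}

shapley = {p: total / 6 for p, total in totals.items()}
print(shapley)  # A: 16.67, B: 21.67, C: 21.67 (i.e., 16 2/3, 21 2/3, 21 2/3)
```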

Shapley Values Example

Key Insight

  • Each permutation conserves the total value \(v(\{A,B,C\}) - v(\emptyset) = 60\).
  • Shapley values average over permutations.
  • Averaging redistributes credit, but never creates or destroys value.
  • Players that contribute more in permutations on average get higher Shapley values.
  • The calculation can more succinctly be expressed using subsets rather than permutations.

Formal Definition of Shapley Values

  • Let
    • \(N = \{A, B, C\}\) be the set of all players,
    • \(j\) be a specific player, and
    • \(S\) be a subset of players not including \(j\).

\[ \phi_j = \sum_{S \subseteq N \setminus \{j\}} \frac{|S|!(|N|-|S|-1)!}{|N|!} [v(S \cup \{j\}) - v(S)] \]

  • This formula weights each subset \(S\) by the fraction of permutations in which exactly the players in \(S\) precede player \(j\).
  • For a given combination \(S\), there are \(|S|\) players before \(j\) and \(|N|-|S|-1\) players after \(j\).
  • The number of permutations corresponding to this combination \(S\) is therefore \(|S|!\,(|N|-|S|-1)!\), out of \(|N|!\) permutations in total.
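
As a check, here is a minimal Python sketch of the subset-weighted formula; it reuses the toy value function from the permutation sketch above and reproduces the same Shapley values:

```python
from itertools import combinations
from math import factorial

# Toy value function from the worked example.
v = {frozenset(s): val for s, val in
     [("", 0), ("A", 10), ("B", 0), ("C", 0),
      ("AB", 10), ("AC", 10), ("BC", 30), ("ABC", 60)]}

def shapley_value(j, players, v):
    """phi_j = sum over subsets S (not containing j) of
    |S|! (|N| - |S| - 1)! / |N|! * (v(S u {j}) - v(S))."""
    n = len(players)
    others = [p for p in players if p != j]
    phi = 0.0
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            S = frozenset(subset)
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            phi += weight * (v[S | {j}] - v[S])
    return phi

for j in ["A", "B", "C"]:
    print(j, round(shapley_value(j, ["A", "B", "C"], v), 2))
# A 16.67, B 21.67, C 21.67 -- the same values as the permutation average
```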

Shapley Values in ML

  • In ML, the “players” are the input features, and the “payout” is the model’s prediction.
  • We want to explain the scalar output of \(\hat{f}(x)\) for a specific instance \(x\).
    • In regression, \(\hat{f}(x)\) is a real number. In classification, \(\hat{f}(x)\) can be the predicted probability or log-odds of a class (avoid discrete class labels).
  • However, we cannot simply remove features from the model.
  • Instead, we consider the expected model output when only a subset of features \(S\) is known (taken from \(x\)), and the rest are randomly sampled from the data distribution.
  • Let \(x_S\) be the values of the features in \(S\), taken from \(x\), and \(X_{-S}\) be the random vector of the remaining (removed) features, to be sampled from the data distribution.
  • The value function is defined as: \[ v(S) = E_{X_{-S}}\!\left[\hat f(x_S, X_{-S})\right] \]
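
A minimal sketch of how this value function can be estimated by Monte Carlo with marginal sampling; `model_predict`, `x`, and `X_background` are placeholders for whatever model and data are at hand, and `x` / `X_background` are assumed to be NumPy arrays:

```python
import numpy as np

def value_function(S, x, model_predict, X_background):
    """Estimate v(S) = E[ f(x_S, X_{-S}) ]: features in S are fixed to the
    instance x, the remaining features come from rows of background data."""
    X_mixed = X_background.copy()          # background rows supply X_{-S}
    idx = list(S)
    X_mixed[:, idx] = x[idx]               # overwrite the features in S with x's values
    return model_predict(X_mixed).mean()   # average prediction approximates the expectation

# Hypothetical usage: v_S = value_function({0, 2}, x, model.predict, X_train)
```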

Shapley Values in ML

  • There are different distributions we can use to sample \(X_{-S}\) from.
  • We can sample each feature in \(X_{-S}\) independently from its marginal distribution in the data.
  • In some cases, the independence assumption may be too strong and lead to unrealistic samples and distorted attributions.
  • Alternatively, we can sample \(X_{-S}\) from the conditional distribution given \(x_S\). This is known as conditional SHAP.
  • Calculating Shapley values exactly for \(p\) features requires evaluating the value function for all \(2^p\) subsets of features, where each evaluation involves marginalizing over the unknown features.
  • This has exponential complexity and is computationally intractable for a large \(p\), so approximations are often used in practice.
  • There are specialized methods to estimate Shapley values efficiently for some model types, e.g., TreeSHAP for tree-based models.

Shapley Values in ML

  • TreeSHAP uses the tree structure to compute exact Shapley values in polynomial time.
  • The sum of Shapley values across all features equals the difference between the model’s prediction for the instance and the expected prediction across all instances: \[ \begin{aligned} \sum_j \phi_j &= v(N) - v(\emptyset) \\ &= E_{X_{-N}}[\hat f(x_N, X_{-N})] - E_{X_{-\emptyset}}[\hat f(x_{\emptyset}, X_{-\emptyset})] \\ &= \hat f(x) - E_{X_{-\emptyset}}[\hat f(X_{-\emptyset})] \end{aligned} \]
  • \(E_{X_{-\emptyset}}[\hat f(X_{-\emptyset})]\) is the average model prediction across the entire data distribution.
  • \(\hat f(x)\) is the model prediction for the specific instance \(x\) being explained.
  • One can compute Shapley values for each instance \(x\) in the dataset to get local explanations.
  • A global explanation can be obtained by averaging Shapley values across many instances or by analyzing the distribution of Shapley values for each feature.
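
In practice, the shap Python package implements these ideas; a minimal sketch for a tree-based model is shown below (assuming shap and scikit-learn are installed, with `X`, `y` as placeholder training data; exact API details can vary between shap versions):

```python
import shap
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(random_state=0).fit(X, y)  # X, y: your data (placeholders)

explainer = shap.TreeExplainer(model)      # TreeSHAP: exact Shapley values for trees
shap_values = explainer.shap_values(X)     # one row of feature attributions per instance

# Local explanation: shap_values[i] distributes f(x_i) - E[f(X)] across the features.
# Global explanation: aggregate across instances, e.g. shap.summary_plot(shap_values, X).
```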

Local Interpretable Model-agnostic Explanations (LIME)

Local Interpretable Model-agnostic Explanations (LIME)

LIME is a local explainability method that approximates a complex model \(\hat f\) locally around a specific instance \(x\) with a simple, interpretable surrogate model \(g\).

  • Pick the instance \(x\) whose prediction \(\hat{f}(x)\) you want to explain.
  • Generate many perturbed versions of \(x\) (small random changes).
  • Get the model’s predictions for those samples.
  • Weight each sample by how close it is to \(x\) — closer points matter more.
  • Optionally, normalize the weights, treat them as probabilities and resample from the perturbed points to focus on more realistic data points.
  • Fit a simple, intrinsically interpretable surrogate model \(g\) (e.g., linear regression, decision tree) to these (weighted) samples.

LIME Algorithm

  • LIME uses a proximity measure \(\pi_x\) to weight perturbed samples based on their distance to the original instance \(x\).
  • A common choice is an exponential kernel: \[ \pi_x(x_i) = \exp\left(-\frac{D(x, x_i)^2}{\sigma^2}\right) \] where \(D(x, x_i)\) is a distance metric (e.g., Euclidean distance) and \(\sigma\) is a kernel width parameter controlling how quickly the weight decreases with distance.
  • Once fitted, the surrogate model \(g\) provides a local explanation of the complex model \(\hat f\) around the instance \(x\).
  • The limit as the perturbations become infinitesimally small gives a local linear approximation of the model \(\hat f\) at \(x\), similar to a first-order Taylor expansion.
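
A minimal from-scratch sketch of this procedure for tabular data is shown below; `model_predict`, the Gaussian perturbation scale, and the kernel width `sigma` are illustrative choices, and the lime package implements a more careful version of these steps:

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_explain(x, model_predict, n_samples=5000, scale=0.5, sigma=1.0, seed=0):
    """Fit a weighted linear surrogate around the instance x (a 1D NumPy array)."""
    rng = np.random.default_rng(seed)
    # 1. Perturb the instance with small random changes.
    X_pert = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    # 2. Query the black-box model on the perturbed samples.
    y_pert = model_predict(X_pert)
    # 3. Weight samples by proximity to x using the exponential kernel pi_x.
    d2 = ((X_pert - x) ** 2).sum(axis=1)    # squared Euclidean distance
    weights = np.exp(-d2 / sigma**2)
    # 4. Fit a simple, interpretable surrogate on the weighted samples.
    surrogate = Ridge(alpha=1.0).fit(X_pert, y_pert, sample_weight=weights)
    return surrogate.coef_                  # local feature effects around x

# Hypothetical usage: coefs = lime_explain(x, model.predict)
```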

Limitations of LIME

  • The neighborhood definition is tricky: different kernel widths or distance metrics can yield different explanations.
  • Perturbations may ignore feature correlations, producing unrealistic samples that the real model would never see.
  • The surrogate model’s complexity (e.g., number of features) must be chosen manually, creating a trade-off between interpretability and fidelity.
  • Explanations can be unstable: small changes in the data or random seed may alter the results noticeably.
  • Explanations are strictly local: they describe the model’s behavior only near the instance, not globally.

Limitations of LIME

  • Because of the sampling and weighting process, explanations can be sensitive to hyper-parameters and even manipulable.
  • Many perturbations are needed for large or high-dimensional data to obtain reliable explanations, increasing the computational cost.

References

References