Unsupervised Learning

Authors:
Abdelwahed Khamis, Mohamed Tarek

What is Unsupervised Learning?

  • Learning from unlabeled data
  • Goal: Find hidden patterns or structures in data
  • No “teacher” providing correct answers

Example Tasks

  1. Dimensionality Reduction
    • Reduce data complexity
    • Preserve important information
    • Methods: PCA, t-SNE
  2. Clustering
    • Group similar data points together
    • Example: Patient stratification
    • Methods: k-means, hierarchical clustering

Common Applications in Drug Development

  • Patient subgroup identification
  • Drug response patterns
  • Drug-drug similarity analysis

Dimension Reduction

Principal Component Analysis (PCA)

  • One of the most popular and powerful tools in data science
  • Helps us make sense of complex data by simplifying it while keeping the most important information
  • Think of it like looking at a complex 3D object from different angles to understand its true shape
  • PCA helps us find the best angles to view our data
  • A dimension reduction technique that transforms data into a new coordinate system
  • Finds the directions of maximum variance (principal components)

Common Uses of PCA

  • Reduce computational complexity, as a preprocessing step for ML models (e.g., clustering with k-means)
  • Remove noise and redundancy
  • Visualize high-dimensional data in a low-dimensional space
  • Feature extraction and selection
  • Data compression

PCA Algorithm

Input

  • \(X \in \mathbb{R}^{n \times d}\): Data matrix with \(n\) samples and \(d\) features

Center data

For each feature \(j = 1, \ldots, d\):

  • Compute the mean: \(\mu_j = \text{mean}(X[:, j])\)

  • Center the feature: \(\tilde{X}[:, j] = X[:, j] - \mu_j\)

Compute covariance matrix

\[ \Sigma = \frac{1}{n - 1} \tilde{X}^T \tilde{X} \quad \text{(a } d \times d \text{ matrix)} \]
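The centering and covariance steps can be sketched in NumPy (a minimal illustration on synthetic data, not a library implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # n = 100 samples, d = 3 features

# Center each feature by subtracting its mean
mu = X.mean(axis=0)            # mu_j = mean(X[:, j])
X_tilde = X - mu

# Covariance matrix: (1 / (n - 1)) * X_tilde^T X_tilde, a d x d matrix
n = X.shape[0]
Sigma = (X_tilde.T @ X_tilde) / (n - 1)

# np.cov with rowvar=False computes the same quantity
assert np.allclose(Sigma, np.cov(X, rowvar=False))
```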


Find eigenvectors and eigenvalues

\[ (\Sigma - \lambda I) \cdot v = 0 \]

  • Sort the eigenvalues in descending order.
  • Denote the sorted eigenvalues by \(\lambda = [\lambda_1, \lambda_2, \dots, \lambda_d ]\).
  • Denote the eigenvectors of the sorted eigenvalues by \(V = [v_1 \, v_2 \, \dots \, v_d]\).
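The eigendecomposition and sorting steps can be sketched with NumPy's `eigh`, which is appropriate for a symmetric matrix such as \(\Sigma\) (a minimal illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X_tilde = X - X.mean(axis=0)
Sigma = (X_tilde.T @ X_tilde) / (X.shape[0] - 1)

# eigh returns eigenvalues in ascending order, so reverse for descending
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
lam = eigvals[order]           # lambda_1 >= lambda_2 >= ... >= lambda_d
V = eigvecs[:, order]          # columns v_1, v_2, ..., v_d

# Each column satisfies Sigma v = lambda v
for i in range(len(lam)):
    assert np.allclose(Sigma @ V[:, i], lam[i] * V[:, i])
```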

Select components

  • Select the top \(k\) eigenvalues and their associated eigenvectors
  • Based on desired dimensionality or explained variance percentage
  • Explained variance percentage is \(\frac{\sum_{i=1}^k \lambda_i}{\sum_{i=1}^d \lambda_i}\)
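Selecting \(k\) from an explained-variance target can be sketched as follows (the eigenvalues here are hypothetical, chosen for illustration):

```python
import numpy as np

# Hypothetical sorted eigenvalues of the covariance matrix
lam = np.array([4.0, 2.5, 1.0, 0.5])

# Cumulative explained variance after keeping the top k components
explained = np.cumsum(lam) / lam.sum()   # [0.5, 0.8125, 0.9375, 1.0]

# Smallest k that reaches a 90% explained-variance target
k = int(np.searchsorted(explained, 0.90) + 1)   # k = 3
```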


Project data

\[ Z = \overbrace{\tilde{X}}^{n \times d} \cdot \overbrace{V[:, 1:k]}^{d \times k} \quad \text{(an } n \times k \text{ matrix)} \]

Output

\[ Z \in \mathbb{R}^{n \times k} \]

  • Transformed data in reduced \(k\)-dimensional space
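Putting the steps together, the projection can be sketched end-to-end (a minimal NumPy illustration on synthetic data):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

mu = X.mean(axis=0)
X_tilde = X - mu
Sigma = (X_tilde.T @ X_tilde) / (X.shape[0] - 1)
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
lam, V = eigvals[order], eigvecs[:, order]

k = 2
Z = X_tilde @ V[:, :k]         # n x k transformed data
assert Z.shape == (100, k)

# The projected features are uncorrelated: their covariance matrix
# is diagonal, with the top-k eigenvalues on the diagonal
Sigma_Z = (Z.T @ Z) / (X.shape[0] - 1)
assert np.allclose(Sigma_Z, np.diag(lam[:k]))
```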

Bonus Exercise

Prove that the variance of the projected data along an eigenvector \(v\) is equal to the corresponding eigenvalue \(\lambda\).
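A quick numerical sanity check of this claim (it does not replace the proof):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
X_tilde = X - X.mean(axis=0)
Sigma = (X_tilde.T @ X_tilde) / (X.shape[0] - 1)

eigvals, eigvecs = np.linalg.eigh(Sigma)
for lam_i, v_i in zip(eigvals, eigvecs.T):
    # Project every sample onto the eigenvector v_i
    z = X_tilde @ v_i
    # Sample variance (ddof=1 matches the 1/(n-1) covariance) equals lambda_i
    assert np.isclose(z.var(ddof=1), lam_i)
```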

PCA Algorithm

Project new point \(x\)

\[ \overbrace{z}^{1 \times k} = \overbrace{(x - \mu)}^{1 \times d} \cdot \overbrace{V[:, 1:k]}^{d \times k} \]

  • \(z\) is the projected point in the new (lower dimensional) coordinate system

Lossy reconstruction

\[ \overbrace{x_{\text{reconstructed}}}^{1 \times d} = \overbrace{z}^{1 \times k} \cdot \overbrace{V[:, 1:k]^T}^{k \times d} + \overbrace{\mu}^{1 \times d} \]

  • \(x_{\text{reconstructed}}\) is an approximation of \(x\) in the original coordinate system; the information along the \(d - k\) discarded directions is lost
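Projecting a new point and reconstructing it can be sketched as (synthetic data, minimal illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))

mu = X.mean(axis=0)
X_tilde = X - mu
Sigma = (X_tilde.T @ X_tilde) / (X.shape[0] - 1)
eigvals, eigvecs = np.linalg.eigh(Sigma)
V = eigvecs[:, np.argsort(eigvals)[::-1]]

x = rng.normal(size=4)             # a new point
k = 2
z = (x - mu) @ V[:, :k]            # project into the k-dimensional space
x_rec = z @ V[:, :k].T + mu        # lossy reconstruction in the original space

# With k = d the reconstruction is exact; with k < d it is only approximate
z_full = (x - mu) @ V
x_exact = z_full @ V.T + mu
assert np.allclose(x_exact, x)
```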