Eigenvalues and Eigenvectors — Linear Algebra for Machine Learning | Sabaoon Academy

When a matrix transforms a vector, it typically both rotates and scales it. But certain special vectors — eigenvectors — are only scaled, not rotated. Understanding them unlocks PCA, PageRank, spectral graph theory, and differential equations.

The Definition

A non-zero vector v is an eigenvector of matrix A if:

Av = λv

Where λ (lambda) is a scalar called the eigenvalue. Applying A to v simply stretches (or shrinks, or flips) v by a factor of λ — its direction is unchanged.

Example:

A = [[3, 0],
     [0, 2]]

v₁ = [1, 0]:   A @ v₁ = [3, 0] = 3 × [1, 0]   → eigenvalue λ₁ = 3
v₂ = [0, 1]:   A @ v₂ = [0, 2] = 2 × [0, 1]   → eigenvalue λ₂ = 2

For a diagonal matrix, the eigenvectors are the coordinate axes and the eigenvalues are the diagonal entries.

Finding Eigenvalues

Rearranging Av = λv:

Av - λv = 0
(A - λI)v = 0

For a non-zero solution v to exist, the matrix (A - λI) must be singular (non-invertible):

det(A - λI) = 0

This is the characteristic equation. Solving it gives the eigenvalues.

For a 2×2 matrix:

A = [[a, b],
     [c, d]]

det(A - λI) = (a-λ)(d-λ) - bc = 0
λ² - (a+d)λ + (ad-bc) = 0

In Python:

import numpy as np
A = np.array([[4., 1.],
              [2., 3.]])

eigenvalues, eigenvectors = np.linalg.eig(A)
# eigenvalues: [5., 2.]
# eigenvectors[:, 0] is the eigenvector for λ=5
# eigenvectors[:, 1] is the eigenvector for λ=2

Geometric Interpretation

Eigenvectors are the axes along which a transformation acts purely as scaling. For any input, the transformation:

Decomposes it into components along eigenvector directions
Scales each component by the corresponding eigenvalue
Recombines

This means if you know all eigenvectors and eigenvalues, you completely understand what the matrix does.

Eigenvalue	Effect on eigenvector
λ > 1	Stretches
0 < λ < 1	Shrinks
λ = 1	No change
λ = 0	Collapses to zero (matrix is singular)
λ < 0	Flips direction
λ is complex	Rotation involved

Diagonalisation

A matrix A with n linearly independent eigenvectors can be written as:

A = PDP⁻¹

Where:

P: matrix whose columns are the eigenvectors
D: diagonal matrix with eigenvalues on the diagonal

This is diagonalisation. Its power: computing Aⁿ (matrix to the nth power) becomes trivial:

Aⁿ = PDⁿP⁻¹

And Dⁿ is just each diagonal entry raised to the nth power. This appears in solving recurrences, Fibonacci, Markov chains, and graph problems.

Principal Component Analysis (PCA)

PCA is the most important application of eigenvalues in ML. It finds the directions of maximum variance in data.

Setup: given a dataset represented as an n×d matrix X (n samples, d features), centre it (subtract the mean), then compute the covariance matrix:

C = (1/n) Xᵀ X

C is a d×d symmetric matrix. Symmetric matrices always have real eigenvalues and orthogonal eigenvectors.

The eigenvectors of C are the principal components — the directions of maximum variance. The eigenvalues tell you how much variance each direction captures.

# PCA from scratch
X_centered = X - X.mean(axis=0)
C = X_centered.T @ X_centered / len(X)
eigenvalues, eigenvectors = np.linalg.eigh(C)

# Sort by descending eigenvalue
idx = eigenvalues.argsort()[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]

# Project onto top k components
k = 2
X_reduced = X_centered @ eigenvectors[:, :k]

The top k eigenvectors capture the most variance. Projecting onto them reduces dimensions while preserving as much information as possible.

Explained variance: eigenvalue λᵢ / Σλ gives the fraction of variance captured by component i. Plotting the cumulative explained variance (scree plot) shows how many components are needed.

Singular Value Decomposition (SVD)

SVD generalises eigendecomposition to non-square matrices:

A = UΣVᵀ

Where:

U (m×m): left singular vectors — eigenvectors of AAᵀ
Σ (m×n): diagonal matrix of singular values (always non-negative)
V (n×n): right singular vectors — eigenvectors of AᵀA

Singular values σᵢ = √λᵢ where λᵢ are eigenvalues of AᵀA.

U, sigma, Vt = np.linalg.svd(A)

Applications:

Image compression: keep only the top k singular values and vectors — low-rank approximation
Recommendation systems: matrix factorisation for collaborative filtering
Pseudoinverse: solve least-squares problems (A⁺ = VΣ⁺Uᵀ)
Dimensionality reduction: PCA is equivalent to SVD on the centred data matrix

PageRank: Eigenvectors of the Web

Google's original PageRank algorithm computes the principal eigenvector of the web's link matrix.

Model the web as a directed graph. Build a transition matrix M where M[j][i] = 1/k if page i has k outgoing links to page j.

The PageRank vector r satisfies:

Mr = r

This is the eigenvector equation with λ = 1. The PageRank of each page is its component in this eigenvector — proportional to how many important pages link to it.

Computing this eigenvector for billions of pages uses the power iteration method:

def power_iteration(M, iterations=100):
    r = np.ones(M.shape[0]) / M.shape[0]   # start uniform
    for _ in range(iterations):
        r = M @ r                           # apply transition
        r = r / r.sum()                     # normalise
    return r

Each iteration brings r closer to the true eigenvector. Convergence is guaranteed when the largest eigenvalue is 1 (which is engineered by the matrix construction).

Key Takeaways

An eigenvector v of matrix A satisfies Av = λv — it is only scaled, not rotated.
Eigenvalues are found by solving det(A - λI) = 0.
Diagonalisation: A = PDP⁻¹ — makes computing Aⁿ trivial.
PCA finds the eigenvectors of the covariance matrix — the directions of maximum variance. Eigenvalues quantify how much variance each direction captures.
SVD (A = UΣVᵀ) generalises eigendecomposition to non-square matrices — used in image compression, recommendation systems, and solving least-squares problems.
PageRank computes the principal eigenvector of a web link matrix using power iteration.