
Vectors and Dot Products

Everything in machine learning — images, text, audio, user preferences — is ultimately represented as a vector of numbers. Understanding vectors at the mathematical level gives you real intuition for what ML models are doing.

What Is a Vector?

A vector is an ordered list of numbers. An n-dimensional vector can be viewed as a point in n-dimensional space, or as an arrow from the origin to that point:

v = [3, 1]       # 2D vector
w = [1, 2, 3]    # 3D vector
x = [0.2, -0.5, 0.8, 1.3]   # 4D vector (e.g., a feature vector)

Geometrically in 2D: v = [3, 1] is an arrow pointing 3 units right and 1 unit up.

In NumPy:

import numpy as np
v = np.array([3, 1])
w = np.array([1, 2])
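One point of confusion is worth settling early: the "dimension" of a vector is its number of components, which is not what NumPy's `ndim` attribute reports. A small check using the 4D feature vector from above:

```python
import numpy as np

x = np.array([0.2, -0.5, 0.8, 1.3])   # the 4D feature vector from above
print(len(x))    # 4: number of components, i.e. the vector's dimension
print(x.shape)   # (4,)
print(x.ndim)    # 1: NumPy's count of array axes, not the vector's dimension
```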

Vector Operations

Addition

Vectors are added component-wise. Geometrically, place the tail of w at the head of v; the sum is the arrow from the origin to the head of w.

v + w   # [3+1, 1+2] = [4, 3]
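The head-to-tail picture can be checked numerically: walking along v and then along w ends at the same point as the component-wise sum. A minimal sketch using the same v and w:

```python
import numpy as np

v = np.array([3, 1])
w = np.array([1, 2])

s = v + w                    # component-wise: [3+1, 1+2]
print(s)                     # [4 3]

# Head-to-tail: walk from the origin along v, then along w starting at v's tip
tip_of_v = np.array([0, 0]) + v
end_point = tip_of_v + w
print(np.array_equal(end_point, s))   # True: the sum is where the second arrow ends
print(np.array_equal(v + w, w + v))   # True: order does not matter
```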

Scalar Multiplication

Multiply each component by a scalar. Geometrically, this stretches or shrinks the vector; a negative scalar also reverses its direction.

2 * v   # [6, 2]
-1 * v  # [-3, -1]
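"Stretches or shrinks" can be made precise: multiplying by c scales the length by |c|, so ‖c·v‖ = |c|·‖v‖. The sketch below checks this for a few scalars chosen for illustration (it uses `np.linalg.norm`, covered in the next section):

```python
import numpy as np

v = np.array([3, 1])
for c in [2, 0.5, -1]:
    scaled = c * v
    # Length scales by |c|; a negative c also flips the direction
    length_rule_holds = np.isclose(np.linalg.norm(scaled), abs(c) * np.linalg.norm(v))
    print(c, scaled, length_rule_holds)
```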

Magnitude (Length)

The length of a vector, by the Pythagorean theorem extended to n dimensions:

‖v‖ = √(v₁² + v₂² + ... + vₙ²)
np.linalg.norm(v)   # √(3² + 1²) = √10 ≈ 3.162
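To see that `np.linalg.norm` is doing nothing more than the formula above (its default for a 1D array is this Euclidean length), here is a plain-Python version, written here for illustration:

```python
import math
import numpy as np

def magnitude(v):
    """Euclidean length: square each component, sum, take the square root."""
    return math.sqrt(sum(x * x for x in v))

print(magnitude([3, 1]))                 # ≈ 3.162, i.e. √10
print(np.linalg.norm(np.array([3, 1])))  # same value
print(magnitude([1, 2, 2]))              # 3.0, since 1 + 4 + 4 = 9
```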

Unit Vector

A vector of magnitude 1. Normalise any nonzero vector by dividing it by its magnitude:

v̂ = v / ‖v‖
v_hat = v / np.linalg.norm(v)   # unit vector in direction of v

Normalisation appears everywhere in ML: normalising input features, normalising embeddings before computing similarity.
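Two practical details come up when normalising: the zero vector has no direction and cannot be normalised, and embeddings usually arrive as rows of a matrix that are normalised all at once. A minimal sketch; the helper name `normalize` and the `eps` threshold are choices made for this example:

```python
import numpy as np

def normalize(v, eps=1e-12):
    """Return v / ‖v‖; raise for (near-)zero vectors, which have no direction."""
    v = np.asarray(v, dtype=float)
    n = np.linalg.norm(v)
    if n < eps:
        raise ValueError("cannot normalise a zero vector")
    return v / n

v_hat = normalize([3, 1])
print(np.linalg.norm(v_hat))   # 1.0, up to floating-point rounding

# Normalise every row of an embedding matrix in one step
E = np.array([[3.0, 4.0],
              [1.0, 0.0]])
E_hat = E / np.linalg.norm(E, axis=1, keepdims=True)   # rows become [0.6, 0.8] and [1.0, 0.0]
```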

Dot Product

The dot product of two vectors is a scalar:

v · w = v₁w₁ + v₂w₂ + ... + vₙwₙ
np.dot(v, w)      # 3×1 + 1×2 = 5
v @ w             # same, using matrix multiply operator
(v * w).sum()     # same
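Written out by hand, the dot product is just "multiply matching components, then add". The explicit length check reflects that the dot product is only defined for vectors of the same dimension:

```python
import numpy as np

def dot(v, w):
    """Sum of component-wise products; vectors must have the same length."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(v, w))

print(dot([3, 1], [1, 2]))             # 5
print(np.dot([1, 2, 3], [4, 5, 6]))    # 32 = 1×4 + 2×5 + 3×6
```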

Geometric Interpretation

v · w = ‖v‖ ‖w‖ cos(θ)

where θ is the angle between the vectors.

This means:

  • v · w > 0: angle < 90°, vectors point in similar directions
  • v · w = 0: angle = 90°, vectors are orthogonal (perpendicular)
  • v · w < 0: angle > 90°, vectors point in opposing directions
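Solving v · w = ‖v‖ ‖w‖ cos(θ) for θ gives the angle directly, and the three cases above show up in the sign of the dot product. The example pairs are chosen for illustration; `np.clip` guards against rounding pushing the cosine slightly outside [-1, 1], where `np.arccos` would return NaN:

```python
import numpy as np

def angle_degrees(v, w):
    """Angle between v and w, from v · w = ‖v‖ ‖w‖ cos(θ)."""
    cos_theta = np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

pairs = {
    "similar":    ([3, 1], [1, 2]),
    "orthogonal": ([1, 0], [0, 1]),
    "opposing":   ([1, 0], [-1, 1]),
}
for name, (v, w) in pairs.items():
    print(name, np.dot(v, w), round(float(angle_degrees(v, w)), 1))
# similar 5 45.0
# orthogonal 0 90.0
# opposing -1 135.0
```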

Cosine Similarity

By rearranging the geometric formula:

cos(θ) = (v · w) / (‖v‖ ‖w‖)

This is cosine similarity — it measures how similar the directions of two vectors are, ignoring magnitude.

def cosine_similarity(v, w):
    return np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))

Range: [-1, 1]

  • 1: same direction (maximally similar)
  • 0: orthogonal (no similarity)
  • -1: opposite directions (maximally dissimilar)
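Two properties explain why cosine similarity is convenient: rescaling a vector by a positive factor leaves the score unchanged, and the score equals the plain dot product once both vectors are normalised. A quick check with the v and w used earlier:

```python
import numpy as np

def cosine_similarity(v, w):
    return np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))

v = np.array([3.0, 1.0])
w = np.array([1.0, 2.0])
print(cosine_similarity(v, w))        # ≈ 0.7071 (θ = 45°)
print(cosine_similarity(10 * v, w))   # same: positive scaling does not change direction
print(cosine_similarity(v, -v))       # ≈ -1.0: opposite directions

# Equivalent: dot product of the unit vectors
v_hat, w_hat = v / np.linalg.norm(v), w / np.linalg.norm(w)
print(np.isclose(np.dot(v_hat, w_hat), cosine_similarity(v, w)))   # True
```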

Why This Matters for ML

In embedding spaces (text embeddings, image embeddings, user embeddings), semantically similar items are placed in similar directions. Cosine similarity is the standard metric for:

  • Semantic search (find the most similar document to a query)
  • Recommendation systems (find users with similar preference vectors)
  • RAG systems (find the most relevant chunks to include in context)
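
# Semantic similarity between two text embeddings
# Assumption: `model` is a sentence-embedding model with an .encode() method,
# e.g. sentence_transformers.SentenceTransformer("all-MiniLM-L6-v2")
query_emb = model.encode("How do I sort a list?")
doc_emb   = model.encode("Python list sorting methods")
similarity = cosine_similarity(query_emb, doc_emb)   # ≈ 0.92 (exact value depends on the model)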
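Semantic search and RAG retrieval apply the same idea to many documents at once: normalise every embedding, take one matrix-vector product to get all cosine scores, and keep the highest. The sketch below is a brute-force version; the function name `top_k_cosine` and the 3D "embeddings" are invented for illustration (real embedding models output hundreds of dimensions or more):

```python
import numpy as np

def top_k_cosine(query, docs, k=2):
    """Return indices and scores of the k rows of `docs` most similar to `query`."""
    docs_hat = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    q_hat = query / np.linalg.norm(query)
    scores = docs_hat @ q_hat            # one cosine similarity per document
    order = np.argsort(-scores)[:k]      # highest scores first
    return order, scores[order]

# Toy embeddings, made up for this example
docs = np.array([
    [0.9, 0.1, 0.0],   # doc 0
    [0.1, 0.9, 0.1],   # doc 1
    [0.8, 0.3, 0.1],   # doc 2
])
query = np.array([1.0, 0.2, 0.0])
idx, scores = top_k_cosine(query, docs, k=2)
print(idx, scores.round(3))   # docs 0 and 2 rank highest; doc 1 points elsewhere
```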

Vector Spaces

A vector space is a collection of vectors that is closed under addition and scalar multiplication — adding or scaling vectors always produces another vector in the space.

The span of a set of vectors is all possible linear combinations of those vectors. If vectors v₁, v₂, ..., vₖ span a space, you can reach any point in it by choosing scalars a₁, a₂, ..., aₖ:

point = a₁v₁ + a₂v₂ + ... + aₖvₖ
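Whether a point lies in the span of some vectors can be tested by trying to find the scalars a₁, ..., aₖ. In the sketch below, two 3D vectors (chosen for illustration) span only the xy-plane; `np.linalg.lstsq` finds the best-fitting coefficients, and the point is in the span exactly when that fit reproduces it. The helper `in_span` and its tolerance are assumptions of this example:

```python
import numpy as np

# Columns are v₁ = [1, 0, 0] and v₂ = [1, 1, 0]; their span is the xy-plane
V = np.column_stack([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])

def in_span(V, target, tol=1e-9):
    """Least-squares coefficients a with V @ a ≈ target; in span if the fit is exact."""
    a, *_ = np.linalg.lstsq(V, target, rcond=None)
    return np.allclose(V @ a, target, atol=tol), a

ok, coeffs = in_span(V, np.array([3.0, 2.0, 0.0]))
print(ok, coeffs)    # True, coeffs ≈ [1, 2] since 1·v₁ + 2·v₂ = [3, 2, 0]

ok_z, _ = in_span(V, np.array([3.0, 2.0, 1.0]))
print(ok_z)          # False: no combination of v₁, v₂ has a nonzero z-component
```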

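Neural network layers are built from linear combinations. Each neuron in a dense (fully connected) layer computes a weighted sum of its inputs before the activation function is applied; taken together, the layer's Wx is a linear combination of the columns of the weight matrix W, with the input values as the coefficients (a bias b is then added).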
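The two views of a dense layer's pre-activation output (a matrix-vector product, or a linear combination of W's columns weighted by the inputs) give the same numbers. The weights, bias and input below are arbitrary values for illustration, and ReLU is just one example activation:

```python
import numpy as np

# Hypothetical layer with 3 inputs and 2 outputs
W = np.array([[1.0,  0.0, 2.0],
              [0.5, -1.0, 0.0]])
b = np.array([0.1, 0.0])
x = np.array([2.0, 1.0, 3.0])

z = W @ x + b   # pre-activation weighted sums: [8.1, 0.0]

# Same result as a linear combination of W's columns, with x as the coefficients
z_combo = x[0] * W[:, 0] + x[1] * W[:, 1] + x[2] * W[:, 2] + b
print(np.allclose(z, z_combo))   # True

activated = np.maximum(z, 0)     # ReLU, applied after the linear step
```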

Linear Independence

Vectors are linearly independent if no vector in the set can be expressed as a linear combination of the others. Dependent vectors carry redundant information.

In ML: if two input features are perfectly correlated (linearly dependent), one carries no additional information. This is why feature selection and dimensionality reduction matter.
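A standard way to test linear independence numerically is the matrix rank: k vectors are independent exactly when the matrix they form has rank k. The helper name and example vectors below are chosen for illustration, including a feature matrix where one column is the other in different units (cm vs m):

```python
import numpy as np

def linearly_independent(*vectors):
    """Vectors are independent iff the matrix they form has rank equal to their count."""
    M = np.array(vectors, dtype=float)
    return np.linalg.matrix_rank(M) == len(vectors)

print(linearly_independent([1, 0], [0, 1]))                   # True
print(linearly_independent([1, 2], [2, 4]))                   # False: second = 2 × first
print(linearly_independent([1, 0, 1], [0, 1, 1], [1, 1, 2]))  # False: third = first + second

# Height in cm and height in m: two columns, but only one independent direction
X = np.array([[170.0, 1.70],
              [182.0, 1.82],
              [165.0, 1.65]])
print(np.linalg.matrix_rank(X))   # 1: the second feature adds no new information
```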

Basis

A basis of a vector space is a set of linearly independent vectors that span the entire space. Any vector in the space can be uniquely expressed as a linear combination of basis vectors.

The standard basis in 2D: e₁ = [1, 0], e₂ = [0, 1].

[3, 1] = 3·[1, 0] + 1·[0, 1] = 3e₁ + 1e₂

Changing the basis (changing coordinates) is a core operation in PCA and other dimensionality reduction methods.
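Changing basis means finding new coefficients for the same vector. For a general basis this is a linear solve; for an orthonormal basis (unit-length, mutually orthogonal vectors, which is what PCA produces) the coordinates are simply dot products with each basis vector. The basis b₁ = [1, 1], b₂ = [1, -1] is an arbitrary choice for this sketch:

```python
import numpy as np

v = np.array([3.0, 1.0])

# Standard basis: the coordinates are just the components
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(np.allclose(3 * e1 + 1 * e2, v))   # True

# Another basis, stored as columns: b₁ = [1, 1], b₂ = [1, -1]
B = np.column_stack([[1.0, 1.0], [1.0, -1.0]])
coords = np.linalg.solve(B, v)           # solve B @ c = v for the new coordinates
print(coords)                            # ≈ [2, 1], since 2·[1, 1] + 1·[1, -1] = [3, 1]

# Orthonormal version of the same basis: coordinates are dot products Qᵀv
Q = B / np.sqrt(2)
print(np.allclose(Q.T @ v, np.linalg.solve(Q, v)))   # True
```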

Key Takeaways

  • A vector is an ordered list of numbers representing a point or direction in n-dimensional space.
  • Magnitude: ‖v‖ = √(Σvᵢ²). Normalising gives a unit vector.
  • Dot product: v · w = Σvᵢwᵢ = ‖v‖‖w‖cos(θ). Positive = similar direction, zero = orthogonal, negative = opposing.
  • Cosine similarity = dot product of normalised vectors — the standard metric for embedding similarity.
  • Neural network layers compute weighted sums (linear combinations) of their inputs before applying an activation function.
  • Linearly independent vectors carry non-redundant information; a basis is a minimal set of vectors that spans the space.