Skip to main content

Sets and Relations

Set theory is the foundation of mathematics and, more practically, the foundation of relational databases, type theory, and formal specifications. SQL is applied set theory.

Sets

A set is an unordered collection of distinct elements. Order does not matter; duplicates do not exist.

A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {1, 2, 3} = {3, 2, 1}    same set, order irrelevant

Set-builder notation: {x | x , x > 0} — "all integers x such that x is positive."

In Python:

A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

Set Operations

OperationNotationPythonMeaning
UnionA ∪ BA | BElements in A or B
IntersectionA ∩ BA & BElements in both A and B
DifferenceA − BA - BElements in A but not B
Symmetric differenceA △ BA ^ BElements in exactly one
ComplementAᶜU - AElements not in A (relative to universe U)
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

print(A | B)   # {1, 2, 3, 4, 5, 6}
print(A & B)   # {3, 4}
print(A - B)   # {1, 2}
print(A ^ B)   # {1, 2, 5, 6}

Subset and Power Set

A B (A is a subset of B) means every element of A is also in B.

A B (proper subset): A ⊆ B and A ≠ B.

A = {1, 2}
B = {1, 2, 3}
print(A.issubset(B))   # True  A  B

The power set P(A) is the set of all subsets of A. If |A| = n, then |P(A)| = 2ⁿ.

A = {1, 2, 3}
# P(A) = {, {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3}}
# 8 subsets = 

This is why problems involving "find all subsets" have O(2ⁿ) solutions — you cannot enumerate all subsets faster than that.

Cardinality

The cardinality |A| is the number of elements in A.

Inclusion-exclusion principle: for two sets:

|A  B| = |A| + |B|  |A  B|

If you just add |A| + |B|, elements in the intersection are counted twice. Subtract |A ∩ B| to correct.

Application: how many integers from 1 to 100 are divisible by 2 or 3?

  • |div by 2| = 50, |div by 3| = 33, |div by 6| = 16
  • Answer: 50 + 33 − 16 = 67

Cartesian Products

The Cartesian product A × B is the set of all ordered pairs (a, b) where a ∈ A and b ∈ B.

A = {1, 2}, B = {x, y}
A × B = {(1,x), (1,y), (2,x), (2,y)}
|A × B| = |A| × |B| = 4

This is exactly what an SQL CROSS JOIN computes. Every JOIN in relational algebra starts with a Cartesian product, then filters with a condition.

SELECT * FROM A CROSS JOIN B;   -- A × B
SELECT * FROM A JOIN B ON A.id = B.a_id;   -- filtered Cartesian product

Relations

A relation R from A to B is a subset of A × B. An element (a, b) ∈ R means "a is related to b."

Example: the "less than" relation on integers: R = {(a, b) | a < b}. So (1, 2) ∈ R, (3, 1) ∉ R.

Properties of Relations (on a set A)

PropertyMeaningExample
Reflexive(a, a) ∈ R for all a≤ ("a is ≤ a")
Symmetric(a,b) ∈ R ⟹ (b,a) ∈ R"is sibling of"
Antisymmetric(a,b) ∈ R and (b,a) ∈ R ⟹ a = b
Transitive(a,b) ∈ R and (b,c) ∈ R ⟹ (a,c) ∈ R<

Equivalence Relations

A relation that is reflexive, symmetric, and transitive is an equivalence relation. It partitions the set into equivalence classes — groups of elements that are all related to each other.

Examples:

  • "Same remainder when divided by 3" (modular arithmetic)
  • "Same connected component in a graph"
  • "Same type" in a type system
# Union-Find: efficiently tracks equivalence classes
parent = list(range(n))

def find(x):
    if parent[x] != x:
        parent[x] = find(parent[x])   # path compression
    return parent[x]

def union(x, y):
    parent[find(x)] = find(y)

Union-Find is a direct implementation of equivalence class management — it appears in Kruskal's MST algorithm, connected components, and network problems.

Partial Orders

A relation that is reflexive, antisymmetric, and transitive is a partial order. It defines a ranking where some elements are comparable and some are not.

Example: "is a subset of" — 12 and 23, but 1 and 2 are not related.

Total order (linear order): a partial order where every two elements are comparable. The integers under ≤ are totally ordered. Sorting requires a total order.

Key Takeaways

  • Set operations (union, intersection, difference) directly correspond to SQL JOIN types and Python set operations.
  • |P(A)| = 2ⁿ — the reason subset enumeration is inherently exponential.
  • Inclusion-exclusion: |A ∪ B| = |A| + |B| − |A ∩ B| — avoids double-counting.
  • A Cartesian product A × B is the foundation of relational JOIN.
  • Equivalence relations partition a set into equivalence classes — Union-Find is a practical implementation.
  • Partial orders formalise dependency ordering; total orders enable sorting.