Set theory is the foundation of mathematics and, more practically, the foundation of relational databases, type theory, and formal specifications. SQL is applied set theory.
Sets
A set is an unordered collection of distinct elements. Order does not matter; duplicates do not exist.
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {1, 2, 3} = {3, 2, 1} — same set, order irrelevantSet-builder notation: {x | x ∈ ℤ, x > 0} — "all integers x such that x is positive."
In Python:
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}Set Operations
| Operation | Notation | Python | Meaning |
|---|---|---|---|
| Union | A ∪ B | A | B | Elements in A or B |
| Intersection | A ∩ B | A & B | Elements in both A and B |
| Difference | A − B | A - B | Elements in A but not B |
| Symmetric difference | A △ B | A ^ B | Elements in exactly one |
| Complement | Aᶜ | U - A | Elements not in A (relative to universe U) |
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
print(A | B) # {1, 2, 3, 4, 5, 6}
print(A & B) # {3, 4}
print(A - B) # {1, 2}
print(A ^ B) # {1, 2, 5, 6}Subset and Power Set
A ⊆ B (A is a subset of B) means every element of A is also in B.
A ⊂ B (proper subset): A ⊆ B and A ≠ B.
A = {1, 2}
B = {1, 2, 3}
print(A.issubset(B)) # True — A ⊆ BThe power set P(A) is the set of all subsets of A. If |A| = n, then |P(A)| = 2ⁿ.
A = {1, 2, 3}
# P(A) = {∅, {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3}}
# 8 subsets = 2³This is why problems involving "find all subsets" have O(2ⁿ) solutions — you cannot enumerate all subsets faster than that.
Cardinality
The cardinality |A| is the number of elements in A.
Inclusion-exclusion principle: for two sets:
|A ∪ B| = |A| + |B| − |A ∩ B|If you just add |A| + |B|, elements in the intersection are counted twice. Subtract |A ∩ B| to correct.
Application: how many integers from 1 to 100 are divisible by 2 or 3?
- |div by 2| = 50, |div by 3| = 33, |div by 6| = 16
- Answer: 50 + 33 − 16 = 67
Cartesian Products
The Cartesian product A × B is the set of all ordered pairs (a, b) where a ∈ A and b ∈ B.
A = {1, 2}, B = {x, y}
A × B = {(1,x), (1,y), (2,x), (2,y)}
|A × B| = |A| × |B| = 4This is exactly what an SQL CROSS JOIN computes. Every JOIN in relational algebra starts with a Cartesian product, then filters with a condition.
SELECT * FROM A CROSS JOIN B; -- A × B
SELECT * FROM A JOIN B ON A.id = B.a_id; -- filtered Cartesian productRelations
A relation R from A to B is a subset of A × B. An element (a, b) ∈ R means "a is related to b."
Example: the "less than" relation on integers: R = {(a, b) | a < b}. So (1, 2) ∈ R, (3, 1) ∉ R.
Properties of Relations (on a set A)
| Property | Meaning | Example |
|---|---|---|
| Reflexive | (a, a) ∈ R for all a | ≤ ("a is ≤ a") |
| Symmetric | (a,b) ∈ R ⟹ (b,a) ∈ R | "is sibling of" |
| Antisymmetric | (a,b) ∈ R and (b,a) ∈ R ⟹ a = b | ≤ |
| Transitive | (a,b) ∈ R and (b,c) ∈ R ⟹ (a,c) ∈ R | < |
Equivalence Relations
A relation that is reflexive, symmetric, and transitive is an equivalence relation. It partitions the set into equivalence classes — groups of elements that are all related to each other.
Examples:
- "Same remainder when divided by 3" (modular arithmetic)
- "Same connected component in a graph"
- "Same type" in a type system
# Union-Find: efficiently tracks equivalence classes
parent = list(range(n))
def find(x):
if parent[x] != x:
parent[x] = find(parent[x]) # path compression
return parent[x]
def union(x, y):
parent[find(x)] = find(y)Union-Find is a direct implementation of equivalence class management — it appears in Kruskal's MST algorithm, connected components, and network problems.
Partial Orders
A relation that is reflexive, antisymmetric, and transitive is a partial order. It defines a ranking where some elements are comparable and some are not.
Example: "is a subset of" — 1 ⊆ 2 and 2 ⊆ 3, but 1 and 2 are not related.
Total order (linear order): a partial order where every two elements are comparable. The integers under ≤ are totally ordered. Sorting requires a total order.
Key Takeaways
- Set operations (union, intersection, difference) directly correspond to SQL JOIN types and Python set operations.
- |P(A)| = 2ⁿ — the reason subset enumeration is inherently exponential.
- Inclusion-exclusion: |A ∪ B| = |A| + |B| − |A ∩ B| — avoids double-counting.
- A Cartesian product A × B is the foundation of relational JOIN.
- Equivalence relations partition a set into equivalence classes — Union-Find is a practical implementation.
- Partial orders formalise dependency ordering; total orders enable sorting.