Skip to main content

Combinatorics and Probability

Counting problems are among the most varied in competitive programming. This lesson covers techniques beyond basic C(n,k) — stars and bars, inclusion-exclusion, generating functions, and expected value.

Stars and Bars

Problem: how many ways can you distribute n identical items into k distinct bins?

Answer: C(n + k - 1, k - 1)

Intuition: imagine n stars and k-1 bars. Each arrangement of stars and bars represents a distribution. Total objects: n + k - 1, choose positions for k-1 bars.

def distribute(n, k):
    return comb(n + k - 1, k - 1)

distribute(5, 3)   # ways to put 5 apples in 3 bowls = C(7,2) = 21

Variant — minimum 1 per bin: give each bin 1 item first, then distribute remaining n-k with no minimum:

def distribute_min1(n, k):
    if n < k: return 0
    return comb(n - 1, k - 1)

Variant — maximum m per bin: use inclusion-exclusion (see below).

Inclusion-Exclusion Principle

For n sets A₁, A₂, ..., Aₙ:

|A₁  A₂  ...  Aₙ| = Σ|Aᵢ| - Σ|Aᵢ  Aⱼ| + Σ|Aᵢ  Aⱼ  Aₖ| - ...

Application — derangements:

A derangement is a permutation where no element appears in its original position. D(n) = number of derangements of n elements.

By inclusion-exclusion over "element i is in position i":

D(n) = n! × Σ(-1)/k! for k=0..n
      n!/e   for large n
def derangements(n, mod=MOD):
    d = [0] * (n + 1)
    d[0] = 1
    d[1] = 0
    for i in range(2, n + 1):
        d[i] = (i - 1) * (d[i-1] + d[i-2]) % mod
    return d[n]

Application — surjections:

Number of surjective (onto) functions from n elements to k elements:

surj(n, k) = Σ(-1) C(k, i) (k-i)  for i=0..k
def surjections(n, k, mod=MOD):
    result = 0
    for i in range(k + 1):
        sign = 1 if i % 2 == 0 else -1
        result += sign * comb(k, i) * pow(k - i, n, mod)
        result %= mod
    return result

Catalan Numbers

The nth Catalan number:

Cₙ = C(2n, n) / (n + 1)

First few: 1, 1, 2, 5, 14, 42, 132, 429, ...

Catalan numbers count many equivalent structures:

  • Ways to fully parenthesise n+1 factors
  • Number of valid bracket sequences of length 2n
  • Paths from (0,0) to (n,n) not crossing the diagonal
  • Binary trees with n+1 leaves
  • Triangulations of a convex (n+2)-gon
def catalan(n, mod=MOD):
    return comb(2*n, n) * mod_inv(n + 1) % mod

Generating Functions

A generating function encodes a sequence a₀, a₁, a₂, ... as coefficients of a polynomial or power series:

A(x) = a₀ + a₁x + a₂x² + ...

Generating functions turn combinatorial problems into algebra.

Example — coin change count:

Number of ways to make amount n using coins of denominations d₁, d₂, ...:

The generating function is the product:

G(x) = (1 + x^d₁ + x^(2d₁) + ...) × (1 + x^d₂ + x^(2d₂) + ...) × ...

The coefficient of xⁿ in G(x) is the answer. Computing via DP is equivalent to expanding this product.

Convolution of two sequences (a₀, a₁, ...) and (b₀, b₁, ...) produces (c₀, c₁, ...) where cₙ = Σaₖbₙ₋ₖ — the coefficients of the product A(x)B(x). This is polynomial multiplication.

For large n, polynomial multiplication can be computed in O(n log n) using Fast Fourier Transform (FFT):

import numpy as np

def poly_multiply(a, b):
    # Multiply polynomials a and b using FFT
    n = len(a) + len(b) - 1
    fa = np.fft.rfft(a, n)
    fb = np.fft.rfft(b, n)
    return np.fft.irfft(fa * fb, n).round().astype(int)

Probability and Expected Value

Expected value E[X] = Σ x × P(X = x) — the probability-weighted average outcome.

Linearity of expectation: E[X + Y] = E[X] + E[Y], even if X and Y are not independent. This is the key tool for expected value problems.

Example — expected number of comparisons in a random sort:

The expected number of inversions when sorting a random permutation of n elements is n(n-1)/4. By linearity of expectation: sum the probability that each pair (i, j) is inverted = ½ each, over C(n,2) pairs.

Example — coupon collector problem:

How many random draws (with replacement from n coupons) until you collect all n?

Expected draws: n × Hₙ = n × (1 + 1/2 + 1/3 + ... + 1/n) ≈ n ln(n)

from math import log

def expected_coupon_collector(n):
    return n * sum(1/k for k in range(1, n+1))
    #  n * log(n) for large n

Probability with DP

Random walk: expected number of steps to reach state n from state 0, where each step goes right with probability p and left with probability (1-p).

Expected value DP: define E[i] = expected steps from state i to reach n.

# Expected steps from 0 to n, p=probability of going right
# E[n] = 0
# E[i] = 1 + p*E[i+1] + (1-p)*E[i-1]
# Solve the linear system (or use the closed form: E[0] = n/p - (1-p)/(2p-1) × ...)

Key Identities to Memorise

IdentityFormula
Vandermonde'sC(m+n, r) = Σ C(m,k) C(n, r-k)
Hockey stickΣ C(i,r) for i=r..n = C(n+1, r+1)
Sum of rowΣ C(n,k) for k=0..n = 2ⁿ
Negative binomialC(-n, k) = (-1)^k C(n+k-1, k)
DerangementsD(n) = (n-1)(D(n-1) + D(n-2))
CatalanCₙ = C(2n,n)/(n+1) = Σ Cₖ Cₙ₋₁₋ₖ

Key Takeaways

  • Stars and bars: distribute n identical into k distinct bins = C(n+k-1, k-1).
  • Inclusion-exclusion: size of union = sum of individuals - sum of pairwise intersections + ...
  • Derangements: D(n) = (n-1)(D(n-1) + D(n-2)); D(0)=1, D(1)=0.
  • Catalan numbers Cₙ = C(2n,n)/(n+1) count parenthesisations, valid bracket sequences, monotone paths.
  • Generating functions encode sequences as polynomials; convolution = polynomial multiplication = O(n log n) with FFT.
  • Linearity of expectation: E[X+Y] = E[X]+E[Y] regardless of independence — the most powerful probability tool in contests.