When you write code, you make decisions that affect how fast it runs and how much memory it uses. Big O notation is the language programmers use to describe those tradeoffs precisely — not in seconds or megabytes, but in terms of how performance scales as input grows.
The Core Question
Suppose you have a list of 1,000 names and you need to find a specific one. You could read every name from the top until you find it. That might take 1 scan or 1,000 scans — depending on luck. Now imagine the list grows to 1 million names. The time it takes scales with the size of the list.
Big O answers the question: if the input doubles, what happens to the time (or memory) required?
What Big O Measures
Big O notation describes the upper bound of an algorithm's growth rate. It deliberately ignores:
- Hardware speed (your laptop vs a data centre server)
- Constant factors (one loop vs two loops doing the same work)
- Small inputs (small inputs rarely matter in practice)
It focuses on the dominant term — the part that grows fastest as input size (called n) increases.
The Formal Definition (Simplified)
We say a function f(n) is O(g(n)) if there exist constants c and n₀ such that for all n > n₀:
f(n) ≤ c · g(n)In plain terms: after some point, g(n) multiplied by some constant is always an upper bound on f(n). We do not care what happens for tiny inputs; we care about the long-term trend.
A Concrete Example
Consider two functions that both find the maximum value in a list:
def find_max_v1(nums):
max_val = nums[0]
for n in nums: # runs n times
if n > max_val:
max_val = n
return max_val
def find_max_v2(nums):
max_val = nums[0]
for n in nums: # runs n times
for _ in nums: # runs n times per outer loop
if n > max_val:
max_val = n
return max_valBoth return the correct answer. But v1 loops through the list once — it scales with n. v2 has a nested loop — it scales with n². For 1,000 elements, v1 does ~1,000 operations; v2 does ~1,000,000.
Big O lets us write this cleanly: v1 is O(n), v2 is O(n²).
Why Constants Get Dropped
You might wonder: what if v1 actually does three operations per element? Shouldn't it be O(3n) instead of O(n)?
By convention, we drop constant multipliers. O(3n) simplifies to O(n). The reason: constants depend on the specific hardware and compiler, and they do not change the growth shape. An O(n) algorithm always beats an O(n²) algorithm at large enough scale, regardless of the constants.
Best, Average, and Worst Case
Big O typically describes the worst case — the maximum number of operations the algorithm could ever perform for an input of size n. You will also hear about:
- Best case (Ω, "Omega"): the minimum operations (a lucky scenario)
- Average case (Θ, "Theta"): the expected operations across all possible inputs
In practice, worst-case analysis is the most useful because it gives a guarantee. When someone says "this algorithm is O(n log n)," they usually mean in the worst case.
Reading Big O in Code
Here is a quick mental model for identifying complexity as you read code:
| Code Pattern | Complexity |
|---|---|
| Single statement | O(1) |
| One loop over n items | O(n) |
| Two nested loops over n items | O(n²) |
| Halving the input each step | O(log n) |
| Loop + halving | O(n log n) |
| Recursion branching two ways each call | O(2ⁿ) |
You will see all of these patterns in the next lessons with real examples.
Key Takeaways
- Big O describes how an algorithm's resource usage grows as input size grows — not the exact count of operations.
- Constants and lower-order terms are dropped; only the dominant growth term matters.
- Big O typically describes the worst case — a guaranteed upper bound.
- The same algorithm applied to the same problem can have different complexities depending on the approach.