Arrays and hash maps together cover the majority of day-to-day data structure needs. Understanding how they work internally — not just how to use them — lets you reason confidently about performance.
Arrays
An array stores elements in contiguous memory locations. Each element occupies a fixed number of bytes, so the address of element i is:
address(i) = base_address + i × element_sizeThis is why indexing is O(1) — computing the address is arithmetic, not a search.
Key Operations
| Operation | Complexity | Why |
|---|---|---|
| Access by index | O(1) | Direct address arithmetic |
| Search (unsorted) | O(n) | Must scan all elements |
| Search (sorted, binary search) | O(log n) | Halving each step |
| Insert at end | O(1) amortised | May trigger resize |
| Insert at middle | O(n) | Must shift elements right |
| Delete at middle | O(n) | Must shift elements left |
The Resize Mechanism
Dynamic arrays (Python list, Java ArrayList, C++ vector) start with a small capacity and double it when full. When the array is full and you append:
- Allocate a new array of double the size.
- Copy all n elements to the new array — O(n).
- Add the new element — O(1).
Because doubling happens infrequently (after 1, 2, 4, 8, 16... elements), the total cost of n appends is O(n) — amortised O(1) per append.
When to Use Arrays
- You need fast random access by index.
- Your data is ordered and you want cache efficiency (contiguous memory = fewer cache misses).
- You mostly append to the end.
When Arrays Hurt
- Frequent insertions or deletions in the middle (O(n) shifts).
- Unknown size that changes dramatically (constant resizing).
Hash Maps
A hash map (dictionary in Python, HashMap in Java, object in JavaScript) stores key-value pairs. It achieves O(1) average time for insert, delete, and lookup using a hash function.
How Hashing Works
Given a key, a hash function produces an integer:
hash("Alice") → 93821This integer is used as an index into an underlying array (called a bucket array):
bucket_index = hash(key) % num_bucketsTo look up "Alice": hash it, compute the bucket index, go directly to that slot. O(1) — no searching required.
Handling Collisions
Two different keys can hash to the same bucket (collision). The two main resolution strategies:
Chaining: each bucket holds a linked list of entries. On collision, the new entry is appended to the list. Lookup scans the list at that bucket.
bucket[42] → [("Alice", 25), ("Charlie", 31)]Open addressing: on collision, probe nearby buckets until an empty one is found. Lookup follows the same probe sequence.
In the average case with a good hash function and a low load factor (ratio of entries to buckets), chains are short and lookup is O(1). In the worst case (all keys collide), it degrades to O(n).
Load Factor and Rehashing
When the load factor exceeds a threshold (typically 0.75), the hash map rehashes: a new bucket array of double the size is allocated and all entries are reinserted. This is O(n) work but happens infrequently — amortised O(1).
Key Operations
| Operation | Average | Worst Case |
|---|---|---|
| Insert | O(1) | O(n) |
| Delete | O(1) | O(n) |
| Lookup | O(1) | O(n) |
The worst case is rare with a well-implemented hash function. In practice, treat hash maps as O(1).
Common Patterns
Frequency count:
def char_frequency(s):
freq = {}
for ch in s:
freq[ch] = freq.get(ch, 0) + 1
return freqTwo-sum (avoid the O(n²) nested loop):
def two_sum(nums, target):
seen = {} # value → index
for i, n in enumerate(nums):
complement = target - n
if complement in seen:
return [seen[complement], i]
seen[n] = i
return []Without the hash map: O(n²). With the hash map: O(n) time, O(n) space.
Grouping:
def group_anagrams(words):
groups = {}
for word in words:
key = tuple(sorted(word)) # "eat", "tea", "ate" → ('a','e','t')
groups.setdefault(key, []).append(word)
return list(groups.values())When to Use Hash Maps
- You need O(1) lookup by key (not by position).
- Counting frequencies, grouping, or deduplication.
- Replacing nested loops: if you find yourself checking "is X in this collection" inside a loop, a hash set or map removes the inner O(n).
When Hash Maps Hurt
- You need keys in sorted order — use a sorted map (BST-backed) for O(log n) operations.
- Memory is very tight — hash maps have overhead per bucket.
- Keys must be hashable — mutable objects (lists, other dicts) cannot be hash map keys.
Arrays vs Hash Maps
| Array | Hash Map | |
|---|---|---|
| Index access | O(1) | N/A |
| Key lookup | O(n) unsorted, O(log n) sorted | O(1) average |
| Ordered iteration | Natural | Not guaranteed |
| Memory layout | Contiguous, cache-friendly | Scattered |
| Key type | Integer index only | Any hashable type |
Key Takeaways
- Arrays give O(1) index access because of contiguous memory and direct address arithmetic. Middle insertion and deletion are O(n).
- Dynamic arrays amortise resizing cost to O(1) per append over a sequence of appends.
- Hash maps achieve O(1) average-case lookup, insert, and delete via hashing and bucket indexing.
- Collisions are handled by chaining or open addressing; rehashing keeps the load factor low.
- The most common algorithm optimisation is replacing an O(n) list search inside a loop with an O(1) hash map lookup.