Skip to main content

Arrays and Hash Maps

Arrays and hash maps together cover the majority of day-to-day data structure needs. Understanding how they work internally — not just how to use them — lets you reason confidently about performance.

Arrays

An array stores elements in contiguous memory locations. Each element occupies a fixed number of bytes, so the address of element i is:

address(i) = base_address + i × element_size

This is why indexing is O(1) — computing the address is arithmetic, not a search.

Key Operations

OperationComplexityWhy
Access by indexO(1)Direct address arithmetic
Search (unsorted)O(n)Must scan all elements
Search (sorted, binary search)O(log n)Halving each step
Insert at endO(1) amortisedMay trigger resize
Insert at middleO(n)Must shift elements right
Delete at middleO(n)Must shift elements left

The Resize Mechanism

Dynamic arrays (Python list, Java ArrayList, C++ vector) start with a small capacity and double it when full. When the array is full and you append:

  1. Allocate a new array of double the size.
  2. Copy all n elements to the new array — O(n).
  3. Add the new element — O(1).

Because doubling happens infrequently (after 1, 2, 4, 8, 16... elements), the total cost of n appends is O(n) — amortised O(1) per append.

When to Use Arrays

  • You need fast random access by index.
  • Your data is ordered and you want cache efficiency (contiguous memory = fewer cache misses).
  • You mostly append to the end.

When Arrays Hurt

  • Frequent insertions or deletions in the middle (O(n) shifts).
  • Unknown size that changes dramatically (constant resizing).

Hash Maps

A hash map (dictionary in Python, HashMap in Java, object in JavaScript) stores key-value pairs. It achieves O(1) average time for insert, delete, and lookup using a hash function.

How Hashing Works

Given a key, a hash function produces an integer:

hash("Alice")  93821

This integer is used as an index into an underlying array (called a bucket array):

bucket_index = hash(key) % num_buckets

To look up "Alice": hash it, compute the bucket index, go directly to that slot. O(1) — no searching required.

Handling Collisions

Two different keys can hash to the same bucket (collision). The two main resolution strategies:

Chaining: each bucket holds a linked list of entries. On collision, the new entry is appended to the list. Lookup scans the list at that bucket.

bucket[42]  [("Alice", 25), ("Charlie", 31)]

Open addressing: on collision, probe nearby buckets until an empty one is found. Lookup follows the same probe sequence.

In the average case with a good hash function and a low load factor (ratio of entries to buckets), chains are short and lookup is O(1). In the worst case (all keys collide), it degrades to O(n).

Load Factor and Rehashing

When the load factor exceeds a threshold (typically 0.75), the hash map rehashes: a new bucket array of double the size is allocated and all entries are reinserted. This is O(n) work but happens infrequently — amortised O(1).

Key Operations

OperationAverageWorst Case
InsertO(1)O(n)
DeleteO(1)O(n)
LookupO(1)O(n)

The worst case is rare with a well-implemented hash function. In practice, treat hash maps as O(1).

Common Patterns

Frequency count:

def char_frequency(s):
    freq = {}
    for ch in s:
        freq[ch] = freq.get(ch, 0) + 1
    return freq

Two-sum (avoid the O(n²) nested loop):

def two_sum(nums, target):
    seen = {}                          # value  index
    for i, n in enumerate(nums):
        complement = target - n
        if complement in seen:
            return [seen[complement], i]
        seen[n] = i
    return []

Without the hash map: O(n²). With the hash map: O(n) time, O(n) space.

Grouping:

def group_anagrams(words):
    groups = {}
    for word in words:
        key = tuple(sorted(word))    # "eat", "tea", "ate"  ('a','e','t')
        groups.setdefault(key, []).append(word)
    return list(groups.values())

When to Use Hash Maps

  • You need O(1) lookup by key (not by position).
  • Counting frequencies, grouping, or deduplication.
  • Replacing nested loops: if you find yourself checking "is X in this collection" inside a loop, a hash set or map removes the inner O(n).

When Hash Maps Hurt

  • You need keys in sorted order — use a sorted map (BST-backed) for O(log n) operations.
  • Memory is very tight — hash maps have overhead per bucket.
  • Keys must be hashable — mutable objects (lists, other dicts) cannot be hash map keys.

Arrays vs Hash Maps

ArrayHash Map
Index accessO(1)N/A
Key lookupO(n) unsorted, O(log n) sortedO(1) average
Ordered iterationNaturalNot guaranteed
Memory layoutContiguous, cache-friendlyScattered
Key typeInteger index onlyAny hashable type

Key Takeaways

  • Arrays give O(1) index access because of contiguous memory and direct address arithmetic. Middle insertion and deletion are O(n).
  • Dynamic arrays amortise resizing cost to O(1) per append over a sequence of appends.
  • Hash maps achieve O(1) average-case lookup, insert, and delete via hashing and bucket indexing.
  • Collisions are handled by chaining or open addressing; rehashing keeps the load factor low.
  • The most common algorithm optimisation is replacing an O(n) list search inside a loop with an O(1) hash map lookup.