Arrays and Hash Maps — Data Structures & Algorithms | Sabaoon Academy

Arrays and hash maps together cover the majority of day-to-day data structure needs. Understanding how they work internally — not just how to use them — lets you reason confidently about performance.

Arrays

An array stores elements in contiguous memory locations. Each element occupies a fixed number of bytes, so the address of element i is:

address(i) = base_address + i × element_size

This is why indexing is O(1) — computing the address is arithmetic, not a search.

Key Operations

Operation	Complexity	Why
Access by index	O(1)	Direct address arithmetic
Search (unsorted)	O(n)	Must scan all elements
Search (sorted, binary search)	O(log n)	Halving each step
Insert at end	O(1) amortised	May trigger resize
Insert at middle	O(n)	Must shift elements right
Delete at middle	O(n)	Must shift elements left

The Resize Mechanism

Dynamic arrays (Python list, Java ArrayList, C++ vector) start with a small capacity and double it when full. When the array is full and you append:

Allocate a new array of double the size.
Copy all n elements to the new array — O(n).
Add the new element — O(1).

Because doubling happens infrequently (after 1, 2, 4, 8, 16... elements), the total cost of n appends is O(n) — amortised O(1) per append.

When to Use Arrays

You need fast random access by index.
Your data is ordered and you want cache efficiency (contiguous memory = fewer cache misses).
You mostly append to the end.

When Arrays Hurt

Frequent insertions or deletions in the middle (O(n) shifts).
Unknown size that changes dramatically (constant resizing).

Hash Maps

A hash map (dictionary in Python, HashMap in Java, object in JavaScript) stores key-value pairs. It achieves O(1) average time for insert, delete, and lookup using a hash function.

How Hashing Works

Given a key, a hash function produces an integer:

hash("Alice") → 93821

This integer is used as an index into an underlying array (called a bucket array):

bucket_index = hash(key) % num_buckets

To look up "Alice": hash it, compute the bucket index, go directly to that slot. O(1) — no searching required.

Handling Collisions

Two different keys can hash to the same bucket (collision). The two main resolution strategies:

Chaining: each bucket holds a linked list of entries. On collision, the new entry is appended to the list. Lookup scans the list at that bucket.

bucket[42] → [("Alice", 25), ("Charlie", 31)]

Open addressing: on collision, probe nearby buckets until an empty one is found. Lookup follows the same probe sequence.

In the average case with a good hash function and a low load factor (ratio of entries to buckets), chains are short and lookup is O(1). In the worst case (all keys collide), it degrades to O(n).

Load Factor and Rehashing

When the load factor exceeds a threshold (typically 0.75), the hash map rehashes: a new bucket array of double the size is allocated and all entries are reinserted. This is O(n) work but happens infrequently — amortised O(1).

Key Operations

Operation	Average	Worst Case
Insert	O(1)	O(n)
Delete	O(1)	O(n)
Lookup	O(1)	O(n)

The worst case is rare with a well-implemented hash function. In practice, treat hash maps as O(1).

Common Patterns

Frequency count:

def char_frequency(s):
    freq = {}
    for ch in s:
        freq[ch] = freq.get(ch, 0) + 1
    return freq

Two-sum (avoid the O(n²) nested loop):

def two_sum(nums, target):
    seen = {}                          # value → index
    for i, n in enumerate(nums):
        complement = target - n
        if complement in seen:
            return [seen[complement], i]
        seen[n] = i
    return []

Without the hash map: O(n²). With the hash map: O(n) time, O(n) space.

Grouping:

def group_anagrams(words):
    groups = {}
    for word in words:
        key = tuple(sorted(word))    # "eat", "tea", "ate" → ('a','e','t')
        groups.setdefault(key, []).append(word)
    return list(groups.values())

When to Use Hash Maps

You need O(1) lookup by key (not by position).
Counting frequencies, grouping, or deduplication.
Replacing nested loops: if you find yourself checking "is X in this collection" inside a loop, a hash set or map removes the inner O(n).

When Hash Maps Hurt

You need keys in sorted order — use a sorted map (BST-backed) for O(log n) operations.
Memory is very tight — hash maps have overhead per bucket.
Keys must be hashable — mutable objects (lists, other dicts) cannot be hash map keys.

Arrays vs Hash Maps

	Array	Hash Map
Index access	O(1)	N/A
Key lookup	O(n) unsorted, O(log n) sorted	O(1) average
Ordered iteration	Natural	Not guaranteed
Memory layout	Contiguous, cache-friendly	Scattered
Key type	Integer index only	Any hashable type

Key Takeaways

Arrays give O(1) index access because of contiguous memory and direct address arithmetic. Middle insertion and deletion are O(n).
Dynamic arrays amortise resizing cost to O(1) per append over a sequence of appends.
Hash maps achieve O(1) average-case lookup, insert, and delete via hashing and bucket indexing.
Collisions are handled by chaining or open addressing; rehashing keeps the load factor low.
The most common algorithm optimisation is replacing an O(n) list search inside a loop with an O(1) hash map lookup.