Data Structures Internals in Python

Deep dive into how Set, Dict, OrderedDict, HeapQ, List, and Deque are implemented internally in CPython

Table of Contents

  1. OrderedDict
  2. HeapQ
  3. Set
  4. Dict & DefaultDict
  5. List
  6. Deque
  7. Hash Tables - Deep Dive
  8. Self-Balancing Binary Search Trees

1. OrderedDict

Overview

OrderedDict is a dict subclass that has methods specialized for rearranging dictionary order. It maintains insertion order while providing O(1) operations.

Key Methods

popitem(last=True) - Returns and removes a (key, value) pair in LIFO order if last is true or FIFO order if false.

move_to_end(key, last=True) - Moves an existing key to either end of an ordered dictionary. Useful in LRU Cache implementations.

from collections import OrderedDict

d = OrderedDict.fromkeys('abcde')
d.move_to_end('b')
''.join(d)  # 'acdeb'

d.move_to_end('b', last=False)
''.join(d)  # 'bacde'

Internal Implementation (CPython)

A hash table (normal dict) + a doubly linked list

This design gives:

  - O(1) key lookup through the hash table
  - O(1) reordering, and removal at either end, through the doubly linked list
  - O(n) iteration in insertion order

The OrderedDict Object

typedef struct {
    PyDictObject dict;        // underlying dict
    _ODictNode *od_first;     // head of linked list
    _ODictNode *od_last;      // tail of linked list
    Py_ssize_t od_state;      // mutation counter
} PyODictObject;

It literally embeds a PyDictObject, and order is maintained outside the hash table.

The Linked List Node

typedef struct _odictnode {
    PyObject *key;
    Py_hash_t hash;
    struct _odictnode *next;
    struct _odictnode *prev;
} _ODictNode;

Critical Detail: Each key in the dict has exactly one node in this doubly linked list.

How Lookup Works (Critical Detail)

Unlike a naive design, OrderedDict does not store nodes inside dict values.

Instead, the C implementation keeps a separate array of node pointers (od_fast_nodes) that runs parallel to the dict's internal entry table: a key's slot index in the dict doubles as its index into the node array.

So conceptually:

dict[key] → _ODictNode*

One hash lookup therefore locates both the value and the linked-list node - no extra hashing.

Insertion Flow

od[key] = value

  1. Compute hash(key)
  2. Check if key already exists in dict
  3. If new key: allocate a _ODictNode, link it at the tail of the list, and record it in the node array
  4. Store the actual value in the underlying dict as usual

Important: Order is updated only on first insertion, not on overwrite.
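This rule is easy to verify directly:

```python
from collections import OrderedDict

od = OrderedDict()
od['a'] = 1
od['b'] = 2
od['a'] = 99               # overwrite: value changes, position does not

print(list(od.items()))    # [('a', 99), ('b', 2)]
```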

Deletion Flow

del od[key]

  1. Lookup node via dict
  2. Unlink node from doubly linked list
  3. Remove entry from dict
  4. Free node

All steps are O(1).

Iteration (Why It's Fast)

Iteration does not touch the hash table order at all.

od_first → node → node → node → od_last

This is why:

  - Iterating an OrderedDict is a plain O(n) pointer walk
  - reversed(od) is equally cheap, following prev pointers from od_last
  - Iteration order is unaffected by resizes of the underlying hash table

Move-to-End Operation

This is the secret sauce behind LRU caches.

od.move_to_end(key, last=True)

Internally:

  1. Find node in dict
  2. Unlink it
  3. Reinsert at head or tail

No rehashing. No copying.
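Those three steps are enough for a minimal LRU cache. The LRUCache class below is an illustrative sketch (not functools.lru_cache):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache sketch built on OrderedDict (illustrative)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as most recently used: O(1)
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used: O(1)

cache = LRUCache(2)
cache.put('a', 1)
cache.put('b', 2)
cache.get('a')         # 'a' becomes most recently used
cache.put('c', 3)      # evicts 'b'
print(list(cache.data))  # ['a', 'c']
```

Every cache operation is a constant number of O(1) dict and linked-list steps.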

Operation Complexities

Operation Complexity Explanation
popitem(last=True) O(1) Head/tail removal
move_to_end(key, last=True) O(1) Unlink + relink node
od[key] = value O(1) average Hash lookup + append node if new
od[key] O(1) average Dict lookup
del od[key] O(1) average Dict delete + unlink node

2. HeapQ

Overview

The heapq module provides an implementation of the heap queue algorithm, also known as the priority queue algorithm. Min-heaps are binary trees for which every parent node has a value less than or equal to any of its children (heap invariant).

Why Use HeapQ?

  - O(1) access to the smallest element (heap[0])
  - O(log n) insertion and extraction
  - O(n) construction from an existing list via heapify
  - No separate data structure: it operates in place on a plain list

Function Complexities

Function Complexity Why
heappush(heap, x) O(log n) Sift-up
heappop(heap) O(log n) Sift-down
heapreplace(heap, x) O(log n) Pop + push combined
heappushpop(heap, x) O(log n) (best-case O(1)) Conditional sift
heapify(x) O(n) Floyd's algorithm

Internal Implementation

Core design choice: A heap is just a Python list.

There is:

  - No heap class
  - No node objects
  - No parent or child pointers

The tree lives entirely in list indices: the children of heap[i] are heap[2*i + 1] and heap[2*i + 2], and its parent is heap[(i - 1) // 2].

The heap invariant (min-heap):

heap[i] <= heap[2*i + 1]
heap[i] <= heap[2*i + 2]

Why an Array-Based Heap?

Python design constraints:

  - Every element is a heap-allocated PyObject, so per-node structures would be expensive
  - Element comparisons (__lt__) dominate the cost, not index arithmetic
  - Lists are contiguous and cache-friendly

So heapq is optimized to:

  - Minimize the number of comparisons
  - Shift elements instead of swapping them (one write per level instead of three)

Example Code

import heapq

li = [25, 20, 15, 30, 40]

# Convert the list into a heap
heapq.heapify(li)

print("Heap queue:", li)  # Heap queue: [15, 20, 25, 30, 40]

The Only Two Real Operations

Everything in heapq reduces to two internal helpers:

_siftdown → used when inserting

Moves an element up the tree.

_siftup → used when removing

Moves an element down the tree.

_siftdown (Bubble Up)

Called after inserting at the end.

Simplified logic (from heapq.py):

def _siftdown(heap, startpos, pos):
    newitem = heap[pos]
    while pos > startpos:
        parentpos = (pos - 1) >> 1
        parent = heap[parentpos]
        if newitem < parent:
            heap[pos] = parent
            pos = parentpos
        else:
            break
    heap[pos] = newitem

Key Internal Optimizations

  - newitem is kept in a local variable; parents are shifted down rather than swapped with it
  - Exactly one comparison and at most one write per level

⏱ O(log n) comparisons, minimal writes.

_siftup (Bubble Down)

Used after popping the root.

def _siftup(heap, pos):
    endpos = len(heap)
    startpos = pos
    newitem = heap[pos]
    childpos = 2*pos + 1

    while childpos < endpos:
        rightpos = childpos + 1
        if rightpos < endpos and heap[rightpos] < heap[childpos]:
            childpos = rightpos
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2*pos + 1

    heap[pos] = newitem
    _siftdown(heap, startpos, pos)

Why _siftdown Again at the End?

This is a classic CPython micro-optimization:

  - The element moved to the root came from a leaf, so it is usually large and belongs near the bottom
  - Sinking it straight to a leaf costs one comparison per level (which child is smaller)
  - Checking it against the smaller child at every level would roughly double the comparison count
  - The final _siftdown bubbles it back up the few levels it overshot

heappush

def heappush(heap, item):
    heap.append(item)
    _siftdown(heap, 0, len(heap) - 1)

That's literally it.

heappop

def heappop(heap):
    lastelt = heap.pop()
    if heap:
        returnitem = heap[0]
        heap[0] = lastelt
        _siftup(heap, 0)
        return returnitem
    return lastelt

Important detail:

  - The last element is popped first, so the list shrinks before any sifting happens
  - If that made the heap empty, the popped element was the only (and smallest) one, and is returned directly
  - Otherwise the old root is saved for return and the former last leaf is sifted down from position 0

Why heapify is O(n)

def heapify(x):
    for i in reversed(range(len(x)//2)):
        _siftup(x, i)

Explanation:

  - The loop skips the second half of the array - leaves are already valid heaps
  - A node at height h costs at most O(h) to sift down
  - Summing the work: about n/4 nodes of height 1, n/8 of height 2, ... and the series converges to O(n)
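The gap between O(n) heapify and n repeated pushes can be measured by counting comparisons. The Counting wrapper below is a hypothetical helper, not part of heapq; descending input is used because it is the worst case for repeated pushes:

```python
import heapq

class Counting:
    """Wrapper whose __lt__ counts comparisons (illustrative helper)."""
    count = 0
    def __init__(self, v):
        self.v = v
    def __lt__(self, other):
        Counting.count += 1
        return self.v < other.v

n = 10_000
values = list(range(n, 0, -1))   # descending: each push is a new minimum

Counting.count = 0
heapq.heapify([Counting(v) for v in values])
heapify_cmps = Counting.count    # grows linearly with n

Counting.count = 0
h = []
for v in values:
    heapq.heappush(h, Counting(v))  # each new minimum sifts all the way up
push_cmps = Counting.count          # grows like n log n

print(heapify_cmps < push_cmps)  # True
```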

How to Implement Max Heap?

Use:

heapq.heappush(heap, -x)

That is, store and pop elements by their negative values.
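For example:

```python
import heapq

nums = [5, 1, 9, 3]
max_heap = []
for x in nums:
    heapq.heappush(max_heap, -x)     # negate on the way in

largest = -heapq.heappop(max_heap)   # negate on the way out
print(largest)  # 9
```

For tuples or objects without a clean negation, a common alternative is wrapping items in a class whose __lt__ reverses the comparison.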

3. Set

Overview

A Set in Python is used to store a collection of items with the following properties:

  - Unordered: there is no positional indexing
  - Unique: duplicate items are silently discarded
  - Elements must be hashable (immutable), though the set itself is mutable

Operation Complexities

Operation Complexity (avg) Worst Notes
x in s O(1) O(n) Hash lookup
s.add(x) O(1) O(n) May resize
s.remove(x) O(1) O(n) Raises KeyError
s.discard(x) O(1) O(n) No error
len(s) O(1) O(1) Stored count
s.clear() O(n) O(n) Frees table
x not in s O(1) O(n) Negated lookup

Operations NOT Supported

Operation Reason
Indexing (s[0]) No order
Slicing No order
Stable iteration Hash-table scan
Sorting in-place Not meaningful

Internal Implementation

A Python set is a hash table, not a tree, not a bitset.

It is very similar to dict, but:

  - No values: each entry stores only (key, hash)
  - Entries are smaller (16 bytes vs 24 bytes per entry on 64-bit builds)
  - It adds C-level implementations of the set algebra (union, intersection, difference)

High-Level Structure

typedef struct {
    PyObject_HEAD
    Py_ssize_t fill;        // active + dummy entries
    Py_ssize_t used;        // active entries only
    Py_ssize_t mask;        // table size - 1
    setentry *table;        // hash table
    Py_hash_t hash;         // for frozenset only
    Py_ssize_t finger;      // iteration state
} PySetObject;

What These Mean

Field Meaning
used Number of live elements
fill Live + deleted ("dummy")
mask table_size - 1 (power of two)
table Pointer to entry array
finger Iterator progress
hash Cached hash (frozenset only)

Entry Structure

typedef struct {
    PyObject *key;
    Py_hash_t hash;
} setentry;

Special values:

  - key == NULL → the slot has never been used (empty)
  - key == dummy → the slot held an element that was deleted (tombstone)

Hashing & Indexing

Index calculation:

i = hash & mask;

Table size is always a power of two, so modulo is a cheap bitmask.

Collision Resolution (THIS IS KEY)

Python sets use open addressing with perturbation probing (not chaining, not linear probing).

Probe sequence:

i = (i * 5 + 1 + perturb) & mask;
perturb >>= 5;

Why this design?

  - The recurrence i = i*5 + 1 (mod 2^k) eventually visits every slot of a power-of-two table
  - perturb mixes in the high bits of the hash, so keys that collide in the low bits separate quickly
  - It avoids the primary clustering that plain linear probing suffers from

This same scheme is used by dict.
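The probe sequence can be sketched in pure Python. probe_sequence is an illustrative helper, not a real API; note how, once perturb is exhausted, the i*5 + 1 recurrence cycles through every slot of an 8-slot table:

```python
def probe_sequence(h, mask, n):
    """First n slots visited for hash h in a table of size mask + 1 (sketch)."""
    perturb = h
    i = h & mask
    for _ in range(n):
        yield i
        perturb >>= 5
        i = (i * 5 + 1 + perturb) & mask

# 8 probes in an 8-slot table hit all 8 slots
print(list(probe_sequence(17, 7, 8)))  # [1, 6, 7, 4, 5, 2, 3, 0]
```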

Lookup Operation (x in set)

Steps

  1. Compute hash(x)
  2. Compute initial index: hash & mask
  3. Probe until either:
     - an empty (never-used) slot is reached → x is not in the set
     - a slot with matching hash and an equal key is found → x is in the set

Complexity

Case Cost
Average O(1)
Worst (pathological hashes) O(n)

Python randomizes string hashes at startup (see PYTHONHASHSEED) to prevent DoS attacks based on engineered collisions.

Insertion (set.add(x))

  1. Lookup slot using probing
  2. If key already exists → do nothing
  3. Else: write the key and its hash into the free slot, increment used (and fill, if the slot was never used)
  4. Check load factor → resize if needed

Load Factor Rules

Resize when:

fill * 3 >= mask * 2  (~66%)

This keeps probe lengths short.

Deletion (set.remove, discard)

Deletion does not clear the slot.

Instead:

entry->key = dummy;

Why?

  - Clearing the slot would break probe chains: an element placed further along the chain would become unreachable
  - A dummy tells lookups "keep scanning past here"

Too many dummies → triggers resize.

Resizing (Rehashing)

When resizing:

  - A new table is allocated, sized from used (live elements), not fill
  - Every live entry is re-inserted at its position in the new table
  - Dummy entries are dropped, so fill collapses back to used

Cost: O(n), but amortized O(1) per operation.

Iteration

Iteration is a linear scan over the table:

for i in range(table_size):
    if entry[i].key is real:
        yield entry[i].key

Important Consequences

  - Iteration order is arbitrary and can change after any insert that triggers a resize
  - Mutating a set while iterating over it raises RuntimeError
  - Iteration cost is proportional to table size, not just to len(s)

4. Dict & DefaultDict

Overview

Python's dict (dictionary) is one of the most fundamental and heavily optimized data structures in the language. It implements an associative array that maps hashable keys to arbitrary values.

Dict vs Set

A dict is essentially a set with values attached:

  - A set entry stores (key, hash)
  - A dict entry stores (key, hash, value)

Both use the same underlying hash table implementation with perturbation probing.

Internal Structure

Modern Python dicts (3.6+) use a compact dict design:

typedef struct {
    PyObject_HEAD
    Py_ssize_t ma_used;          // number of items
    PyDictKeysObject *ma_keys;   // keys table
    PyObject **ma_values;        // values array
} PyDictObject;

The Compact Design

Before Python 3.6, dicts stored full entries directly in one sparse array. The compact design splits this into two arrays:

  - A sparse index array (dk_indices) of small integers, indexed by hash & mask
  - A dense entries array of (hash, key, value) triples, stored in insertion order

Benefits:

  - Less memory: the sparse part holds tiny indices (1-8 bytes each) instead of 24-byte entries
  - Faster iteration: it walks the dense array and skips no empty slots
  - Insertion order is preserved as a side effect (an implementation detail in 3.6, guaranteed by the language since 3.7)

Operation Complexities

Operation Average Worst Notes
d[key] O(1) O(n) Hash lookup
d[key] = value O(1) O(n) May resize
del d[key] O(1) O(n) Marks entry as deleted
key in d O(1) O(n) Hash lookup
d.get(key, default) O(1) O(n) Safe lookup
Iteration O(n) O(n) Linear scan of entries

DefaultDict

collections.defaultdict is a subclass of dict that provides a default value for missing keys:

from collections import defaultdict

# Instead of KeyError, returns default value
d = defaultdict(int)
d['missing']  # Returns 0

# Common use case: grouping
groups = defaultdict(list)
for word in ['apple', 'avocado', 'banana']:
    groups[word[0]].append(word)
# groups == {'a': ['apple', 'avocado'], 'b': ['banana']}

Implementation: DefaultDict simply overrides __missing__() to call the default_factory function when a key is not found.
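A minimal sketch of the same mechanism - DefaultInt is a hypothetical class standing in for defaultdict(int):

```python
class DefaultInt(dict):
    """Sketch of defaultdict(int) via the __missing__ hook (illustrative)."""
    def __missing__(self, key):
        value = int()      # what default_factory() would return
        self[key] = value  # insert it so the key exists afterwards
        return value

d = DefaultInt()
d['x'] += 1                # no KeyError: __missing__ supplies 0 first
print(d)                   # {'x': 1}
```

The hook is only called by d[key]; methods like d.get() bypass it, exactly as with defaultdict.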

Dict Views

Python dicts provide dynamic views:

d = {'a': 1, 'b': 2}
keys = d.keys()      # dict_keys view
values = d.values()  # dict_values view
items = d.items()    # dict_items view

These views are dynamic - they reflect changes to the dictionary in real-time and don't create a copy.
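A quick demonstration:

```python
d = {'a': 1, 'b': 2}
keys = d.keys()            # a live view, not a snapshot

d['c'] = 3                 # mutate the dict after creating the view
print('c' in keys)         # True - the view sees the new key
print(len(keys))           # 3
```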

5. List

Overview

Python's list is a dynamic array that can grow or shrink as needed. It's one of the most commonly used data structures in Python.

Internal Implementation

A list is a dynamic array with over-allocation for efficiency.

typedef struct {
    PyObject_VAR_HEAD
    PyObject **ob_item;    // pointer to array of object pointers
    Py_ssize_t allocated;  // how many slots allocated
} PyListObject;

Key points:

  - ob_item points to a contiguous C array of PyObject* - the list stores pointers, not the objects themselves
  - ob_size (inside PyObject_VAR_HEAD) is the current length; allocated is the capacity
  - allocated >= ob_size, so most appends just write into an already-free slot

Growth Strategy

When a list needs to grow, it doesn't just add one slot. Instead, it over-allocates:

Growth pattern:

new_allocated = (size >> 3) + (size < 9 ? 3 : 6) + size

This means:

  - Roughly 12.5% extra capacity, plus a small constant
  - The capacity sequence goes 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
  - The growth factor is ~1.125, much lower than the 2x some languages use

Why over-allocate? Makes append operations amortized O(1) instead of O(n).

Operation Complexities

Operation Complexity Notes
list[i] O(1) Direct array access
list.append(x) O(1) amortized May trigger reallocation
list.insert(i, x) O(n) Shifts all elements after i
list.pop() O(1) Remove from end
list.pop(i) O(n) Shifts all elements after i
list.remove(x) O(n) Search + shift
x in list O(n) Linear search
list.sort() O(n log n) Timsort algorithm
list.reverse() O(n) In-place reversal
Slicing list[a:b] O(b-a) Creates new list

Memory Considerations

Lists store pointers, not values directly:

# Each list element is a pointer (8 bytes on 64-bit)
lst = [1, 2, 3]  # Stores 3 pointers to int objects

# The actual integers are separate Python objects
# List overhead: ~56 bytes + 8 bytes per element

Why Append is Fast

Example of list growth:

import sys

lst = []
for i in range(10):
    lst.append(i)
    print(f"len={len(lst)}, bytes={sys.getsizeof(lst)}")

Note that sys.getsizeof reports bytes, not slots, and it jumps only when the capacity grows. The underlying capacity follows the sequence 0, 4, 8, 16, 25, 35, 46, 58, 72, 88...

This means appending 10 items triggers only 3 reallocations (at capacities 4, 8, and 16) instead of 10.

6. Deque

Overview

collections.deque (double-ended queue) is optimized for fast appends and pops from both ends. Unlike lists, deques use a different internal structure.

Internal Implementation

A deque is implemented as a doubly-linked list of blocks (arrays).

Structure:

typedef struct {
    PyObject_VAR_HEAD
    block *leftblock;      // leftmost block
    block *rightblock;     // rightmost block
    Py_ssize_t leftindex;  // index in leftblock
    Py_ssize_t rightindex; // index in rightblock
    size_t state;          // modification counter
    Py_ssize_t maxlen;     // max size (or -1)
} dequeobject;

Each block:

#define BLOCKLEN 64  // elements per block

typedef struct block {
    struct block *leftlink;
    struct block *rightlink;
    PyObject *data[BLOCKLEN];
} block;

Why This Design?

Hybrid approach combining benefits of:

  - Linked lists: O(1) append/pop at both ends, no large reallocations
  - Arrays: contiguous 64-element blocks give cache locality and low per-element overhead

Instead of one node per element, deque uses blocks of 64 elements, reducing memory overhead and improving cache performance.

Operation Complexities

Operation Complexity Notes
deque.append(x) O(1) Add to right end
deque.appendleft(x) O(1) Add to left end
deque.pop() O(1) Remove from right end
deque.popleft() O(1) Remove from left end
deque[i] O(n) Must traverse blocks
deque.rotate(k) O(k) Rotate elements
x in deque O(n) Linear search
len(deque) O(1) Stored count

Deque vs List

Operation List Deque
Append right O(1) amortized O(1)
Append left O(n) O(1)
Pop right O(1) O(1)
Pop left O(n) O(1)
Index access O(1) O(n)
Memory Contiguous Blocks

Use deque when:

  - You need fast appends/pops at both ends (queues, sliding windows)
  - You rarely or never need random access by index

Use list when:

  - You need O(1) indexing and slicing
  - You mostly append/pop at the right end

Bounded Deque

Deques can have a maximum length:

from collections import deque

# Bounded deque - automatically drops from opposite end
dq = deque(maxlen=3)
dq.extend([1, 2, 3])  # deque([1, 2, 3])
dq.append(4)          # deque([2, 3, 4]) - 1 was dropped

This is perfect for implementing:

  - Sliding windows over a stream
  - "Last N items" history buffers
  - Moving averages and simple rate limiters
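For example, a moving average falls out naturally. moving_average is an illustrative helper, not a library function:

```python
from collections import deque

def moving_average(stream, window):
    """Sliding-window mean using a bounded deque (illustrative)."""
    buf = deque(maxlen=window)
    for x in stream:
        buf.append(x)                  # old values fall off the left automatically
        yield sum(buf) / len(buf)

print(list(moving_average([1, 2, 3, 4], 2)))  # [1.0, 1.5, 2.5, 3.5]
```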

7. Hash Tables - Deep Dive

What a Hash Table Fundamentally Is

A hash table implements a mapping: key → value

With the goal of:

  - O(1) average-case insert, lookup, and delete
  - Memory proportional to the number of stored items

It achieves this by turning a key into an array index using a hash function.

Hash Functions

A hash function maps a key to an integer: h = hash(key)

Good hash functions aim for:

  - Determinism: equal keys always produce the same hash
  - Uniformity: outputs spread evenly, so collisions are rare
  - Speed: hashing must cost less than the lookups it saves

From Hash to Table Index

Hash tables store data in an array of size N. To map a hash to an index:

index = hash % N

In real implementations (including Python):

index = hash & (N - 1)

This only works when N is a power of two (8, 16, 32, 64, 128...). Bitwise AND is much faster than modulo.
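The equivalence is easy to check with Python integers (Python's % returns a non-negative result for a positive divisor, so it holds for negative hashes too):

```python
N = 64            # table size: always a power of two
mask = N - 1      # 0b111111

for h in (123456789, -97, 2**61 - 1):
    # for power-of-two N, masking and modulo give the same slot
    print(h & mask, h % N, (h & mask) == (h % N))
```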

Collision Resolution Strategies

A) Separate Chaining

Each bucket holds a linked list: index 3 → (k1,v1) → (k2,v2) → (k3,v3)

Simple, but pointer-heavy and cache-unfriendly. Used in many textbooks, not used in Python.

B) Open Addressing (Python uses this)

All entries live inside the array itself. If a slot is taken, you probe for another slot.

Perturbation Probing (Python's choice):

i = hash & mask
perturb = hash

while slot not usable:
    i = (i * 5 + 1 + perturb) & mask
    perturb >>= 5

Why this is brilliant:

  - Once perturb reaches 0, the recurrence i = i*5 + 1 (mod 2^k) visits every slot, so termination is guaranteed
  - While perturb is nonzero, the high bits of the hash steer the probe, scattering keys that collide in their low bits
  - It needs only a multiply, adds, and shifts - no second hash function

Deletion - The Tricky Part

You cannot just clear a slot. Why?

A collides → placed later
B collides → placed even later

If A is deleted and slot cleared,
lookup(B) will stop early → BUG

Solution: DUMMY markers

Instead of clearing: slot = DUMMY

This preserves probe chains. Too many DUMMY slots triggers a resize.
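A toy pure-Python version of the whole scheme - ToySet is a teaching sketch, not CPython's implementation; it omits resizing and simplifies slot reuse:

```python
# Teaching sketch of open addressing with tombstones.
EMPTY = object()
DUMMY = object()

class ToySet:
    def __init__(self, size=8):          # size must be a power of two
        self.table = [EMPTY] * size
        self.mask = size - 1

    def _probe(self, key):
        h = hash(key) & 0x7FFFFFFF       # force non-negative so perturb terminates
        perturb = h
        i = h & self.mask
        while True:
            yield i
            perturb >>= 5
            i = (i * 5 + 1 + perturb) & self.mask

    def add(self, key):
        for i in self._probe(key):
            slot = self.table[i]
            if slot is EMPTY or slot is DUMMY:   # simplified: take first free slot
                self.table[i] = key
                return
            if slot == key:
                return

    def __contains__(self, key):
        for i in self._probe(key):
            slot = self.table[i]
            if slot is EMPTY:            # a DUMMY does NOT stop the scan
                return False
            if slot is not DUMMY and slot == key:
                return True

    def discard(self, key):
        for i in self._probe(key):
            slot = self.table[i]
            if slot is EMPTY:
                return
            if slot is not DUMMY and slot == key:
                self.table[i] = DUMMY    # tombstone keeps probe chains intact
                return

s = ToySet()
s.add('a'); s.add('b'); s.discard('a')
print('a' in s, 'b' in s)  # False True
```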

Load Factor & Resizing

Load factor = used_slots / table_size

Python keeps it below ~66%

When exceeded:

  - A larger table is allocated
  - All live entries are rehashed into it
  - DUMMY slots are discarded in the process

Cost: O(n), but amortized O(1) per operation.

Why Hash Tables Beat Trees in Python

Structure Access Cache Allocations
Hash table O(1) Excellent Minimal
Tree O(log n) Poor Many
Linked list O(n) Poor Many

Hash tables align perfectly with modern CPUs.

Hash Randomization (Security)

Python randomizes hash seeds at startup. This prevents attackers from crafting keys that all collide, which stopped real-world DoS attacks.

8. Self-Balancing Binary Search Trees

What are Self-Balancing Binary Search Trees?

In computer science, a self-balancing binary search tree (BST) is any node-based binary search tree that automatically keeps its height (maximal number of levels below the root) small in the face of arbitrary item insertions and deletions.

The Height Problem

Most operations on a binary search tree (BST) take time directly proportional to the height of the tree, so it is desirable to keep the height small.

A binary tree with height h can contain at most:

2^0 + 2^1 + ... + 2^h = 2^(h+1) - 1 nodes

This implies that for any tree with n nodes and height h:

n ≤ 2^(h+1) - 1

Which means:
h ≥ ⌈log₂(n+1)⌉ - 1 ≥ ⌊log₂(n)⌋

In other words, the minimum height of a binary tree with n nodes is log₂(n), rounded down: ⌊log₂(n)⌋

The Degenerate Case

However, the simplest algorithms for BST item insertion may yield a tree with height n in rather common situations. For example, when items are inserted in sorted key order, the tree degenerates into a linked list with n nodes.

The difference in performance may be enormous:

For n = 1,000,000:
Minimum height: ⌊log₂(1,000,000)⌋ = 19
Worst-case height: 1,000,000

That's a 50,000x difference!
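Checking the arithmetic:

```python
import math

n = 1_000_000
min_height = math.floor(math.log2(n))  # 2**19 = 524288 <= 1e6 < 2**20
print(min_height)                      # 19
print(n // min_height)                 # 52631 - the "~50,000x" gap
```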

The Solution

Self-balancing binary trees solve this problem by performing transformations (such as tree rotations) at key insertion times, to keep the height proportional to log₂(n). Although a certain overhead is involved, it is not bigger than the always necessary lookup cost.

Language Implementations

Language Data Structure Implementation
C++ STL set, map Red-Black Tree
Java TreeSet, TreeMap Red-Black Tree
Python No native support Use bisect module or PyPI: rbtree, pyavl

Use Cases for Self-Balancing BSTs

Self-balancing BSTs can be used in a natural way to construct and maintain:

  - Ordered lists and sorted sequences with fast insertion and deletion
  - Priority queues
  - Associative arrays whose keys must stay in sorted order

Advantages vs Hash Tables

  - Keys stay in sorted order, so in-order traversal, range queries, and floor/ceiling lookups are cheap
  - Guaranteed O(log n) worst case - no pathological collision behavior
  - Requires only a comparison function, not a hash function

Disadvantages vs Hash Tables

  - O(log n) average lookup vs O(1) for hash tables
  - Pointer-heavy nodes: more allocations and worse cache behavior
  - Considerably harder to implement correctly (rotations, rebalancing)

AVL Trees

Definition

An AVL tree is a self-balancing Binary Search Tree (BST) where the difference between heights of left and right subtrees for any node cannot be more than one.

Balance Factor: For each node:

balance_factor = height(left_subtree) - height(right_subtree)

For an AVL tree: -1 ≤ balance_factor ≤ +1

The value in parentheses next to each node is its balance factor - the difference between its left and right subtree heights:

Example AVL Tree:

        12 (+1)
       /      \
    8 (+1)    18 (+1)
   /   \      /
5 (+1) 11(0) 17(0)
/
4(0)

Each node shows its balance factor in parentheses.

Insertion in AVL Tree

Insertion follows the same basic rules as in a Binary Search Tree (BST):

  1. A new key is placed in its correct position based on BST rules (left < node < right)
  2. After insertion, the balance factor of each node is checked during the path back up to the root
  3. If any node becomes unbalanced (balance factor < -1 or > +1), a rotation is required

Rotation Operations

Two basic operations can rebalance a BST without violating the BST property:

Right Rotation:
     y                x
    / \              / \
   x   T3    →     T1  y
  / \                 / \
 T1  T2              T2  T3

Left Rotation:
   x                  y
  / \                / \
 T1  y       →      x   T3
    / \            / \
   T2  T3         T1  T2
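Both rotations translate directly into pointer updates; the Node class and rotation helpers below are an illustrative sketch, not a full AVL implementation:

```python
class Node:
    """Minimal BST node for demonstrating rotations (illustrative)."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def right_rotate(y):
    x = y.left
    y.left = x.right       # T2 crosses from x to y
    x.right = y
    return x               # x is the new subtree root

def left_rotate(x):
    y = x.right
    x.right = y.left       # T2 crosses from y to x
    y.left = x
    return y

# y(2) with left child x(1) is left-heavy: a right rotation fixes it
root = right_rotate(Node(2, left=Node(1)))
print(root.key, root.right.key)  # 1 2
```

Only two or three pointers change, which is why a rotation is O(1).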

Four Cases for Rebalancing

1. Left-Left Case: Node is inserted in left subtree of left child

Solution: Single right rotation at the unbalanced node

       z                     y
      / \                  /   \
     y   T4      →        x     z
    / \                  / \   / \
   x   T3               T1 T2 T3 T4
  / \
 T1  T2

2. Left-Right Case: Node is inserted in right subtree of left child

Solution: Left rotation on left child, then right rotation on root

     z                 z                   x
    / \               / \                /   \
   y   T4     →      x   T4     →       y     z
  / \               / \                / \   / \
 T1  x             y   T3             T1 T2 T3 T4
    / \           / \
   T2  T3        T1  T2

3. Right-Right Case: Node is inserted in right subtree of right child

Solution: Single left rotation at the unbalanced node

   z                      y
  / \                   /   \
 T1  y        →        z     x
    / \               / \   / \
   T2  x             T1 T2 T3 T4
      / \
     T3  T4

4. Right-Left Case: Node is inserted in left subtree of right child

Solution: Right rotation on right child, then left rotation on root

   z                 z                     x
  / \               / \                  /   \
 T1  y      →      T1  x          →     z     y
    / \               / \              / \   / \
   x   T4            T2  y            T1 T2 T3 T4
  / \                   / \
 T2  T3                T3  T4

Insertion Algorithm

Follow these steps:

  1. Perform the normal BST insertion
  2. Update the height of the current node (ancestor of newly inserted node)
  3. Get the balance factor: left_height - right_height
  4. If balance factor > 1 (left-heavy): if the new key went into the left child's left subtree it is a Left-Left case (single right rotation), otherwise Left-Right (left rotation on the left child, then right rotation)
  5. If balance factor < -1 (right-heavy): if the new key went into the right child's right subtree it is a Right-Right case (single left rotation), otherwise Right-Left (right rotation on the right child, then left rotation)

Time Complexity

Operation Complexity Explanation
Search O(log n) Tree height is O(log n)
Insert O(log n) Search + constant rotation operations
Delete O(log n) Search + constant rotation operations
Rotation O(1) Only pointer changes

Deletion in AVL Tree

To ensure the tree remains AVL after deletion:

  1. Perform standard BST delete
  2. Update heights of affected nodes
  3. Get balance factor of each ancestor
  4. If unbalanced, perform appropriate rotations (same cases as insertion)

Important differences from insertion:

  - Insertion needs at most one rebalancing step; after deletion, rotations may be needed at several ancestors (up to O(log n) of them)
  - Balance factors must therefore be rechecked along the entire path to the root

Auxiliary Space

O(log n) for recursion call stack (since we use recursive insertion/deletion).

Why Python Doesn't Use Self-Balancing BSTs

Python's standard library does not support self-balancing BSTs natively because:

  - dict and set already give O(1) average lookup for most workloads
  - Sorted-order needs are usually covered by sorted() plus the bisect module, or by heapq for priority queues
  - Maintaining a balanced-tree C implementation adds complexity for comparatively rare use cases

When to use PyPI BST implementations (rbtree, pyavl):

  - You need a collection that stays sorted under frequent inserts and deletes
  - You need range queries, floor/ceiling lookups, or ordered iteration with guaranteed O(log n) updates

Summary

Key Takeaways:

  - Almost everything is either a hash table (dict, set, OrderedDict) or a flat array (list, heapq); deque is a hybrid of linked 64-element blocks
  - The recurring hash-table tricks are open addressing with perturbation probing, power-of-two table sizes, dummy tombstones, and resizing near 2/3 load
  - Over-allocation (lists) and sift-to-leaf (heapq) are comparison- and write-minimizing micro-optimizations
  - Python omits balanced trees because hash tables plus bisect and heapq cover the same needs with better constants

Quick Reference Table

Data Structure Best For Worst Operation Memory
List Random access, iteration Insert/delete in middle: O(n) Contiguous, over-allocated
Deque Queue, stack operations Random access: O(n) Blocks of 64 elements
Dict Key-value mapping Worst-case lookup: O(n) Sparse indices + dense entries
Set Membership testing, uniqueness Worst-case lookup: O(n) Hash table, no values
OrderedDict Ordered key-value, LRU cache Extra memory for linked list Dict + doubly linked list
HeapQ Priority queue, top-k problems Not for random access Just a list

Understanding these internals helps you write more efficient Python code and make better decisions about which data structure to use for your specific use case.