Hash Collision in Java

1. What is a Hash Collision?
A collision occurs when a hash function returns the same bucket location for two different keys.

What Happens in HashMap?

// Example of Hash Collision
HashMap<String, Integer> map = new HashMap<>();

// "Aa" and "BB" genuinely produce the same hash code (both 2112)
map.put("Aa", 1);  // hashCode: 2112, bucket: 0
map.put("BB", 2);  // hashCode: 2112, bucket: 0  ← COLLISION!

// Both keys end up in bucket 0

Why Do Collisions Occur?

  1. Limited Bucket Space
  2. Poor hashCode() Implementation

// BAD hashCode() - Always returns same value
@Override
public int hashCode() {
    return 1; // All objects go to same bucket! ❌
}

// GOOD hashCode() - Well distributed
@Override
public int hashCode() {
    return Objects.hash(field1, field2, field3);
}

2. How HashMap Works (Understanding the Basics)

Internal Structure

HashMap = Array of Buckets

HashMap Internal Structure:
┌─────────────────────────────────────┐
│ Bucket 0:  null                     │
├─────────────────────────────────────┤
│ Bucket 1:  [Key1, Value1]           │
├─────────────────────────────────────┤
│ Bucket 2:  [Key2, Value2] → [Key3, Value3]  ← Collision!
├─────────────────────────────────────┤
│ Bucket 3:  null                     │
├─────────────────────────────────────┤
│ ...                                 │
└─────────────────────────────────────┘

How HashMap Finds the Bucket

Step-by-Step Process:

// 1. Calculate hashCode
int hashCode = key.hashCode();

// 2. Apply HashMap's hash function (spread bits)
int hash = hash(hashCode);

// 3. Calculate bucket index
int bucketIndex = hash & (capacity - 1);  // Same as hash % capacity when capacity is a power of 2

// 4. Store/Retrieve from that bucket

Example:

HashMap<String, Integer> map = new HashMap<>();
map.put("John", 25);

// Internal process:
// "John".hashCode() = 2314539
// hash(2314539) = 2314539 ^ (2314539 >>> 16) = 2314504
// bucketIndex = 2314504 & 15 (assuming 16 buckets) = 8
// Store in bucket 8
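The steps above can be checked end to end. This minimal sketch reimplements the JDK's bit-spreading step (h ^ (h >>> 16), as used by HashMap.hash() in Java 8+) to show where "John" would land with the default 16 buckets:

```java
public class BucketIndexDemo {
    // Same bit-spreading HashMap applies before masking (JDK 8+ hash())
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int hashCode = "John".hashCode();  // 2314539
        int hash = spread(hashCode);       // mixes high bits into the low bits
        int bucketIndex = hash & (16 - 1); // mask with (capacity - 1), capacity = 16

        System.out.println("hashCode:    " + hashCode);
        System.out.println("bucketIndex: " + bucketIndex);
    }
}
```

The high-bit mixing matters because the mask only looks at the low bits: without it, keys differing only in their upper bits would always collide.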

What’s Inside a Bucket?

Before Java 8:

Bucket → LinkedList of Entries

After Java 8:

Bucket → LinkedList (small chains) OR Balanced Tree (large chains)

3. Impact of Hash Collision on Performance

Original Content:

Hash collisions have a negative impact on the lookup time of a HashMap.

When multiple keys end up in the same bucket, their entries (key plus value) are placed in a linked list.

On retrieval, that linked list has to be traversed to find the entry. In the worst-case scenario, when all keys map to the same bucket, the lookup time of a HashMap degrades from O(1) to O(n).

Detailed Explanation:

Best Case (No Collision)

HashMap<String, Integer> map = new HashMap<>();
map.put("Key1", 100);  // Goes to bucket 5
map.put("Key2", 200);  // Goes to bucket 8
map.put("Key3", 300);  // Goes to bucket 2

// Retrieval:
int value = map.get("Key2");
// Steps: Calculate hash → Go to bucket 8 → Get value
// Time: O(1) ✓

Performance: O(1) - Direct access, like array indexing


Worst Case (All items in one bucket)

// All these hash to bucket 0 (bad hashCode implementation)
map.put("Key1", 1);
map.put("Key2", 2);
map.put("Key3", 3);
// ... 1000 more keys

/*
Bucket 0: [Key1,1] → [Key2,2] → ... → [Key1000,1000]
All other buckets: empty
*/

// Retrieval:
int value = map.get("Key1000");
// Must traverse 1000 nodes!
// Time: O(1000) = O(n)

Performance: O(n) - HashMap becomes a linked list!


Performance Comparison Table

Scenario               | Structure               | get()    | put()    | Example
No Collision           | Direct bucket access    | O(1)     | O(1)     | Perfect hash distribution
Few Collisions (< 8)   | Linked List             | O(k)     | O(k)     | k = items in bucket
Many Collisions (> 8)  | Balanced Tree (Java 8+) | O(log n) | O(log n) | n = items in bucket

4. How Hash Collision is Handled

HashMap handles collisions by using a linked list to store the map entries that end up in the same array (bucket) location.

Detailed Explanation:

Separate Chaining (The Method HashMap Uses)

Concept: Each bucket stores a chain of entries (linked list or tree)

// Example: Multiple entries in same bucket
class Node<K,V> {
    final K key;
    V value;
    final int hash;
    Node<K,V> next;  // Points to next node in chain
}

Visual Representation:

HashMap:
┌─────────┬──────────────────────────────┐
│ Bucket 0│ → [K1,V1] → [K2,V2] → null   │
├─────────┼──────────────────────────────┤
│ Bucket 1│ → [K3,V3] → null             │
├─────────┼──────────────────────────────┤
│ Bucket 2│ → null                       │
├─────────┼──────────────────────────────┤
│ Bucket 3│ → [K4,V4] → [K5,V5] → [K6,V6] → null
└─────────┴──────────────────────────────┘

How put() Works with Collisions

HashMap<String, Integer> map = new HashMap<>();

// Step-by-step collision handling
map.put("Aa", 100);
/*
1. Calculate hash for "Aa" → hash = 2112
2. Find bucket: 2112 & 15 = 0
3. Bucket 0 is empty
4. Create new Node("Aa", 100)
5. Place in bucket 0
   Bucket 0: [Aa, 100] → null
*/

map.put("BB", 200);
/*
1. Calculate hash for "BB" → hash = 2112 (collision!)
2. Find bucket: 2112 & 15 = 0
3. Bucket 0 has [Aa, 100]
4. Check if "BB" equals "Aa"? No
5. Create new Node("BB", 200)
6. Add to chain
   Bucket 0: [Aa, 100] → [BB, 200] → null
*/

map.put("Aa", 999);  // Update existing
/*
1. Calculate hash for "Aa" → hash = 2112
2. Find bucket: 2112 & 15 = 0
3. Traverse chain: Find "Aa"
4. Update value: 100 → 999
   Bucket 0: [Aa, 999] → [BB, 200] → null
*/
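The walkthrough above can be verified directly: after the three puts, the map holds two entries, and the value for "Aa" has been updated in place rather than chained as a new node.

```java
import java.util.HashMap;

public class PutCollisionCheck {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("Aa", 100);
        map.put("BB", 200);  // same hash as "Aa" → chained in the same bucket
        map.put("Aa", 999);  // same key → value updated, no new node added

        System.out.println(map.size());    // 2
        System.out.println(map.get("Aa")); // 999
        System.out.println(map.get("BB")); // 200
    }
}
```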

How get() Works with Collisions

int value = map.get("BB");

/*
Step 1: Calculate hash
hash("BB") = 2112

Step 2: Find bucket
bucketIndex = 2112 & 15 = 0

Step 3: Traverse chain in bucket 0
- Compare first node: "BB".equals("Aa")? No
- Move to next node: "BB".equals("BB")? Yes!
- Return value: 200

Time: O(2) - traversed 2 nodes
*/

Code Example: Complete Collision Scenario

public class CollisionDemo {
    public static void main(String[] args) {
        HashMap<Key, String> map = new HashMap<>();

        // Create keys that will collide
        Key k1 = new Key("Key1");
        Key k2 = new Key("Key2");
        Key k3 = new Key("Key3");

        // All return same hashCode - guaranteed collision!
        System.out.println("k1 hash: " + k1.hashCode()); // 100
        System.out.println("k2 hash: " + k2.hashCode()); // 100
        System.out.println("k3 hash: " + k3.hashCode()); // 100

        // Put in HashMap - all go to same bucket
        map.put(k1, "Value1");
        map.put(k2, "Value2");
        map.put(k3, "Value3");

        // Retrieve - must traverse chain
        System.out.println("Get k2: " + map.get(k2)); // Value2

        // All entries are in same bucket, stored as linked list
        /*
        Internal structure:
        Bucket[X]: [k1,"Value1"] → [k2,"Value2"] → [k3,"Value3"] → null
        */
    }
}

class Key {
    private String key;

    public Key(String key) {
        this.key = key;
    }

    @Override
    public int hashCode() {
        // Bad implementation - always returns 100
        return 100;  // All keys collide! ❌
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        Key other = (Key) obj;
        return key.equals(other.key);
    }
}

5. Java 8+ Improvements: Tree-based Optimization

From Java 8 onwards, HashMap, ConcurrentHashMap, and LinkedHashMap use a balanced tree in place of a linked list to handle frequent hash collisions.

The idea is to switch to a balanced tree once the number of items in a hash bucket grows beyond a certain threshold.

This change ensures O(log n) performance in worst-case scenarios (where the hash function does not distribute keys properly) and O(1) with a proper hashCode().

The switching threshold is defined by the TREEIFY_THRESHOLD constant in the java.util.HashMap JDK 8 source. Its current value is 8, which means that once a bucket holds more than 8 elements, HashMap uses a tree instead of a linked list for that bucket.

Why This Improvement?

Problem in Java 7:

Bucket with 100 collisions:
[Entry1] → [Entry2] → [Entry3] → ... → [Entry100]
get() time: up to 100 comparisons ❌ Very slow!

Solution in Java 8:

Bucket with 100 collisions:
       [Entry50]
       /      \
   [Entry25]  [Entry75]
   /    \      /    \
  ...   ...   ...   ...

get() time: log₂(100) ≈ 7 comparisons ✓ Much faster!

Treeification Process

Constants Defined in HashMap

// From java.util.HashMap source code
static final int TREEIFY_THRESHOLD = 8;      // Convert to tree
static final int UNTREEIFY_THRESHOLD = 6;    // Convert back to list
static final int MIN_TREEIFY_CAPACITY = 64;  // Minimum capacity for treeification

When Does Treeification Happen?

Conditions:

  1. Number of entries in a bucket exceeds 8 (TREEIFY_THRESHOLD)
  2. AND HashMap capacity ≥ 64 (MIN_TREEIFY_CAPACITY)

If capacity < 64:

  • HashMap will resize (double capacity) instead of treeifying
  • Resizing redistributes entries, potentially resolving collisions

// Pseudo-code from HashMap
if (bucketSize >= TREEIFY_THRESHOLD) {
    if (capacity >= MIN_TREEIFY_CAPACITY) {
        // Convert to tree
        treeifyBucket();
    } else {
        // Resize instead
        resize();
    }
}

Example: Treeification in Action

public class TreeificationDemo {
    public static void main(String[] args) {
        HashMap<CollisionKey, String> map = new HashMap<>(64);

        // Add 10 entries that all hash to same bucket
        for (int i = 0; i < 10; i++) {
            map.put(new CollisionKey("Key" + i), "Value" + i);
        }

        /*
        Internal transformation:

        After 8 entries: Linked List
        Bucket[X]: [K0] → [K1] → [K2] → [K3] → [K4] → [K5] → [K6] → [K7]

        After 9th entry: TREEIFY!
        Bucket[X]: Red-Black Tree
                      [K4]
                     /    \
                   [K2]    [K6]
                   / \      / \
                 [K1][K3] [K5][K7]
                 /              \
               [K0]              [K8]

        After 10th entry: Tree grows
        */

        // Get operation now takes O(log n) instead of O(n)
        String value = map.get(new CollisionKey("Key9"));
        // Tree traversal: much faster than list traversal!
    }
}

class CollisionKey {
    private String key;

    public CollisionKey(String key) {
        this.key = key;
    }

    @Override
    public int hashCode() {
        return 100; // Force collision
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        return key.equals(((CollisionKey) obj).key);
    }
}

How Linked List is Replaced with Balanced Tree

Original Content:

In Java 8, HashMap replaces the linked list with a balanced tree when the number of elements in a bucket reaches a certain threshold. While converting the list to a balanced tree, the hash code is used as the branching variable.

If there are two different hash codes in the same bucket, one is considered bigger and goes to the right of the tree, and the other goes to the left. When both hash codes are equal, HashMap checks whether the keys are Comparable and compares them to determine the direction, so that some order can be maintained. It is good practice to make HashMap keys comparable.

This JDK 8 change applies only to HashMap, LinkedHashMap, and ConcurrentHashMap.


Tree Node Structure

// Simplified TreeNode from HashMap source
static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;
    boolean red;  // Red-Black tree color
}

Ordering in Tree: The Decision Process

Step 1: Compare Hash Codes

if (hash1 != hash2) {
    if (hash1 < hash2) {
        // Go left
    } else {
        // Go right
    }
}

Step 2: If Hash Codes Equal, Try Comparable

else if (key instanceof Comparable) {
    int cmp = ((Comparable)key).compareTo(otherKey);
    // Use comparison result for tree direction
}

Step 3: If Not Comparable, Use System Identity

else {
    // Use System.identityHashCode() or other tie-breakers
}
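The three steps above can be folded into a single comparison function. This is a hedged sketch, not the JDK's actual code — the real logic lives inside HashMap.TreeNode and also compares class names before falling back to identity hash codes:

```java
public class TreeOrderSketch {
    // Illustrative ordering decision for two keys in the same bucket (simplified)
    @SuppressWarnings("unchecked")
    static int treeOrder(Object k1, int h1, Object k2, int h2) {
        if (h1 != h2) {
            return Integer.compare(h1, h2);  // Step 1: hash codes differ
        }
        if (k1 instanceof Comparable && k1.getClass() == k2.getClass()) {
            int cmp = ((Comparable<Object>) k1).compareTo(k2);  // Step 2: Comparable keys
            if (cmp != 0) return cmp;
        }
        // Step 3: final tie-breaker (the JDK uses System.identityHashCode here)
        return Integer.compare(System.identityHashCode(k1), System.identityHashCode(k2));
    }

    public static void main(String[] args) {
        // "Aa" and "BB" collide (both hash to 2112), but String is Comparable
        System.out.println(treeOrder("Aa", 2112, "BB", 2112)); // negative: "Aa" goes left
    }
}
```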

Example: Comparable Keys

// GOOD: Comparable keys work well with tree structure
class Employee implements Comparable<Employee> {
    private int id;
    private String name;

    public Employee(int id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public int hashCode() {
        return 100; // Force collision for demo
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        Employee other = (Employee) obj;
        return id == other.id;
    }

    @Override
    public int compareTo(Employee other) {
        return Integer.compare(this.id, other.id);
    }
}

// Usage
HashMap<Employee, String> map = new HashMap<>(64);
for (int i = 0; i < 10; i++) {
    map.put(new Employee(i, "Emp" + i), "Data" + i);
}

/*
Tree structure (ordered by id):
        [Emp4]
       /      \
    [Emp2]    [Emp7]
    /   \     /    \
  [Emp1][Emp3]...  ...
  /
[Emp0]
*/

Untreeification: Tree Back to List

When bucket size drops to 6 or below:

static final int UNTREEIFY_THRESHOLD = 6;

// If removals reduce bucket size to 6
// Tree → Linked List (simpler structure for small chains)

Why Convert Back?

  • Trees have overhead (more memory, complex operations)
  • For small chains, linked list is simpler and sufficient
  • Threshold of 6 (not 8) provides hysteresis to avoid frequent conversions

Performance Improvement Comparison

// Test with 1000 colliding keys

// Java 7 (Linked List):
Time to add 1000 entries: ~500 ms
Time to retrieve 1000 times: O(n) = O(1000) per get
Total: Slow!

// Java 8+ (Red-Black Tree once the chain exceeds 8 entries):
Time to add 1000 entries: ~50 ms
Time to retrieve 1000 times: O(log n) = O(10) per get
Total: 10x faster!

6. Load Factor and Performance Correlation

Original Content:

Correlation between performance, hash collision, and load factor:

  • Lower load factor = more free buckets = lower chance of collision = higher performance = higher space requirement
  • Higher load factor = fewer free buckets = higher chance of collision = lower performance = lower space requirement

Understanding Load Factor

Formula:

Load Factor = Number of Entries / Bucket Capacity

Example:
16 entries in 16 buckets = Load Factor 1.0
8 entries in 16 buckets = Load Factor 0.5
12 entries in 16 buckets = Load Factor 0.75 (default)

Default Values in HashMap

// Default initial capacity
static final int DEFAULT_INITIAL_CAPACITY = 16;

// Default load factor
static final float DEFAULT_LOAD_FACTOR = 0.75f;

// When to resize?
// When: size > capacity * loadFactor
// Example: 16 * 0.75 = 12
// HashMap resizes when 13th entry is added

Impact of Different Load Factors

Load Factor 0.5 (Low)

HashMap<String, Integer> map = new HashMap<>(16, 0.5f);

// Resize trigger: 16 * 0.5 = 8 entries
// After 8 entries → resize to 32 buckets

Characteristics:

  • ✅ Fewer collisions (more empty buckets)
  • ✅ Better performance (less traversal)
  • ❌ More memory usage (many empty buckets)
  • ❌ Frequent resizing

Use Case: When performance is critical, memory is abundant


Load Factor 0.75 (Default - Balanced)

HashMap<String, Integer> map = new HashMap<>(); // Uses 0.75

// Resize trigger: 16 * 0.75 = 12 entries
// After 12 entries → resize to 32 buckets

Characteristics:

  • ✅ Good balance of time and space
  • ✅ Reasonable collision rate
  • ✅ Not too frequent resizing
  • ✅ Decent memory efficiency

Use Case: Most applications (recommended)


Load Factor 1.0 (High)

HashMap<String, Integer> map = new HashMap<>(16, 1.0f);

// Resize trigger: 16 * 1.0 = 16 entries
// After 16 entries → resize to 32 buckets

Characteristics:

  • ✅ Better memory utilization
  • ✅ Less frequent resizing
  • ❌ More collisions (buckets fuller)
  • ❌ Slower lookups

Use Case: When memory is limited, occasional slowness acceptable


Detailed Example: Load Factor Impact

public class LoadFactorDemo {
    public static void main(String[] args) {
        // Test with different load factors
        testLoadFactor(0.5f, "Low Load Factor (0.5)");
        testLoadFactor(0.75f, "Default Load Factor (0.75)");
        testLoadFactor(1.0f, "High Load Factor (1.0)");
    }

    static void testLoadFactor(float loadFactor, String description) {
        System.out.println("\n=== " + description + " ===");
        HashMap<Integer, String> map = new HashMap<>(16, loadFactor);

        // HashMap's capacity isn't directly accessible, so we mirror its
        // resize rule conceptually: resize when size > capacity * loadFactor
        int capacity = 16;
        int threshold = (int) (capacity * loadFactor);
        int resizeCount = 0;

        for (int i = 0; i < 100; i++) {
            map.put(i, "Value" + i);

            if (map.size() > threshold) {
                resizeCount++;
                System.out.println("Resize at entry " + i);
                capacity *= 2;
                threshold = (int) (capacity * loadFactor);
            }
        }

        System.out.println("Total resizes: " + resizeCount);
    }
}

/*
Output:
=== Low Load Factor (0.5) ===
Resize at entry 8
Resize at entry 16
Resize at entry 32
Resize at entry 64
Total resizes: 4
More resizes, fewer collisions

=== Default Load Factor (0.75) ===
Resize at entry 12
Resize at entry 24
Resize at entry 48
Resize at entry 96
Total resizes: 4
Balanced

=== High Load Factor (1.0) ===
Resize at entry 16
Resize at entry 32
Resize at entry 64
Total resizes: 3
Fewer resizes, more collisions
*/

Visual Comparison

Load Factor 0.5:
Buckets: [X][ ][ ][X][ ][ ][X][ ][ ][X][ ][ ][X][ ][ ][X]
Collision Probability: LOW ✓

Load Factor 0.75:
Buckets: [X][X][ ][X][X][ ][X][ ][X][X][X][ ][X][X][ ][X]
Collision Probability: MODERATE ✓

Load Factor 1.0:
Buckets: [X][X][X][X][X][X][X][X][X][X][X][X][X][X][X][X]
Collision Probability: HIGH ❌

7. Detailed Examples

Example 1: Perfect Hash Distribution (No Collision)

public class NoCollisionExample {
    public static void main(String[] args) {
        HashMap<Integer, String> map = new HashMap<>();

        // Integer keys with good hash distribution
        map.put(1, "One");
        map.put(2, "Two");
        map.put(3, "Three");
        map.put(4, "Four");

        /*
        Internal structure (assuming 16 buckets):
        Bucket 1: [1,"One"]
        Bucket 2: [2,"Two"]
        Bucket 3: [3,"Three"]
        Bucket 4: [4,"Four"]
        ... other buckets empty

        Performance:
        - put(): O(1)
        - get(): O(1)
        - Perfect scenario! ✓
        */

        long start = System.nanoTime();
        String value = map.get(3);
        long end = System.nanoTime();

        System.out.println("Value: " + value);
        System.out.println("Time: " + (end - start) + " ns");
        // Output: Very fast, direct bucket access
    }
}

Example 2: Collision with Linked List (< 8 entries)

public class LinkedListCollisionExample {
    public static void main(String[] args) {
        HashMap<BadHashKey, String> map = new HashMap<>();

        // Add 5 keys that collide (all hash to same bucket)
        for (int i = 0; i < 5; i++) {
            map.put(new BadHashKey("Key" + i), "Value" + i);
        }

        /*
        Internal structure:
        Bucket[X]: [Key0,Value0] → [Key1,Value1] → [Key2,Value2]
                   → [Key3,Value3] → [Key4,Value4] → null

        Linked List (because < 8 entries)

        Performance:
        - get(Key4): O(5) - must traverse 5 nodes
        - get(Key0): O(1) - first node
        */

        long start = System.nanoTime();
        String value = map.get(new BadHashKey("Key4"));
        long end = System.nanoTime();

        System.out.println("Value: " + value);
        System.out.println("Time: " + (end - start) + " ns");
        // Slower than no collision, must traverse list
    }
}

class BadHashKey {
    private String key;

    public BadHashKey(String key) {
        this.key = key;
    }

    @Override
    public int hashCode() {
        return 42; // All instances return same hash ❌
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        return key.equals(((BadHashKey) obj).key);
    }
}

Example 3: Collision with Tree (≥ 8 entries, Java 8+)

public class TreeCollisionExample {
    public static void main(String[] args) {
        // Capacity must be ≥ 64 for treeification
        HashMap<ComparableKey, String> map = new HashMap<>(64);

        // Add 10 keys that collide
        for (int i = 0; i < 10; i++) {
            map.put(new ComparableKey(i), "Value" + i);
        }

        /*
        Internal structure once the chain exceeds 8 entries:
        Bucket[X]: Red-Black Tree
                       [Key5]
                      /      \
                  [Key3]      [Key7]
                  /   \       /    \
              [Key1] [Key4] [Key6] [Key8]
              /   \                    \
          [Key0] [Key2]                [Key9]

        Performance:
        - get(Key9): log₂(10) ≈ 3.3, so about 4 comparisons
        - Much better than O(10) with linked list!
        */

        long start = System.nanoTime();
        String value = map.get(new ComparableKey(9));
        long end = System.nanoTime();

        System.out.println("Value: " + value);
        System.out.println("Time: " + (end - start) + " ns");
        // Faster than linked list for same collision count
    }
}

class ComparableKey implements Comparable<ComparableKey> {
    private int id;

    public ComparableKey(int id) {
        this.id = id;
    }

    @Override
    public int hashCode() {
        return 42; // Force collision
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        return id == ((ComparableKey) obj).id;
    }

    @Override
    public int compareTo(ComparableKey other) {
        return Integer.compare(this.id, other.id); // Tree ordering
    }
}

Example 4: Real-World Collision Scenario

public class RealWorldExample {
    public static void main(String[] args) {
        // Employee database
        HashMap<String, Employee> employeeMap = new HashMap<>();

        // Names might collide due to hash function
        employeeMap.put("John Smith", new Employee(101, "Engineering"));
        employeeMap.put("John Smyth", new Employee(102, "Marketing"));
        employeeMap.put("Jon Smith", new Employee(103, "Sales"));

        // Check hash codes
        System.out.println("John Smith hash: " + "John Smith".hashCode());
        System.out.println("John Smyth hash: " + "John Smyth".hashCode());
        System.out.println("Jon Smith hash: " + "Jon Smith".hashCode());

        /*
        If these hash to same bucket:
        Bucket[X]: ["John Smith", Emp101] → ["John Smyth", Emp102]
                   → ["Jon Smith", Emp103]

        When retrieving "Jon Smith":
        1. Calculate hash
        2. Go to bucket
        3. Compare "Jon Smith".equals("John Smith")? No
        4. Compare "Jon Smith".equals("John Smyth")? No
        5. Compare "Jon Smith".equals("Jon Smith")? Yes!
        6. Return Emp103
        */

        Employee emp = employeeMap.get("Jon Smith");
        System.out.println("Retrieved: " + emp);
    }
}

class Employee {
    int id;
    String department;

    public Employee(int id, String department) {
        this.id = id;
        this.department = department;
    }

    @Override
    public String toString() {
        return "Employee[id=" + id + ", dept=" + department + "]";
    }
}

Example 5: Performance Comparison Test

public class PerformanceTest {
    public static void main(String[] args) {
        // Test 1: Good hashCode (minimal collisions)
        testPerformance("Good HashCode", new GoodKey());

        // Test 2: Bad hashCode (maximum collisions)
        testPerformance("Bad HashCode", new BadKey());
    }

    static void testPerformance(String description, Object sampleKey) {
        System.out.println("\n=== " + description + " ===");
        HashMap<Object, String> map = new HashMap<>(64);

        // Add 1000 entries
        long startPut = System.currentTimeMillis();
        for (int i = 0; i < 1000; i++) {
            Object key = createKey(sampleKey, i);
            map.put(key, "Value" + i);
        }
        long endPut = System.currentTimeMillis();

        // Retrieve 1000 entries
        long startGet = System.currentTimeMillis();
        for (int i = 0; i < 1000; i++) {
            Object key = createKey(sampleKey, i);
            map.get(key);
        }
        long endGet = System.currentTimeMillis();

        System.out.println("Put time: " + (endPut - startPut) + " ms");
        System.out.println("Get time: " + (endGet - startGet) + " ms");
    }

    static Object createKey(Object sample, int id) {
        if (sample instanceof GoodKey) {
            return new GoodKey(id);
        } else {
            return new BadKey(id);
        }
    }
}

class GoodKey {
    int id;

    GoodKey() {}
    GoodKey(int id) { this.id = id; }

    @Override
    public int hashCode() {
        return Objects.hash(id); // Good distribution ✓
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        return id == ((GoodKey) obj).id;
    }
}

class BadKey {
    int id;

    BadKey() {}
    BadKey(int id) { this.id = id; }

    @Override
    public int hashCode() {
        return 1; // All collide! ❌
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        return id == ((BadKey) obj).id;
    }
}

/*
Expected Output:
=== Good HashCode ===
Put time: 5 ms
Get time: 2 ms

=== Bad HashCode ===
Put time: 150 ms (tree conversion overhead)
Get time: 50 ms (tree traversal)
*/

8. Best Practices

1. Implement Good hashCode()

Rules for Good hashCode():

✅ Use all significant fields

@Override
public int hashCode() {
    return Objects.hash(field1, field2, field3); // ✓
}

❌ Don’t use constant

@Override
public int hashCode() {
    return 1; // ❌ All objects collide!
}

✅ Use prime numbers for manual implementation

@Override
public int hashCode() {
    int result = 17;
    result = 31 * result + field1.hashCode();
    result = 31 * result + field2.hashCode();
    return result; // ✓ Good distribution
}

2. Implement equals() Correctly

Contract:

  • If a.equals(b) then a.hashCode() == b.hashCode()
  • If a.hashCode() == b.hashCode() then a.equals(b) may or may not be true

class Person {
    String name;
    int age;

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        Person other = (Person) obj;
        return age == other.age &&
               Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, age);
    }
}

3. Make Keys Immutable

Why?

  • If key changes after insertion, its hash changes
  • HashMap can’t find it anymore!

// BAD: Mutable key (hashCode depends on a mutable field)
class MutableKey {
    String value;

    public MutableKey(String value) { this.value = value; }

    public void setValue(String value) {
        this.value = value; // ❌ Can change!
    }

    @Override
    public int hashCode() { return value.hashCode(); }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof MutableKey && value.equals(((MutableKey) obj).value);
    }
}

HashMap<MutableKey, String> map = new HashMap<>();
MutableKey key = new MutableKey("original");
map.put(key, "data");

key.setValue("modified"); // Hash changes!
map.get(key); // Returns null! Lost data ❌

// GOOD: Immutable key
final class ImmutableKey {
    private final String value;

    public ImmutableKey(String value) {
        this.value = value;
    }

    // No setters - cannot change ✓ (a real key would also override equals()/hashCode())
}

4. Implement Comparable for Custom Keys

Helps tree performance:

class Employee implements Comparable<Employee> {
    private int id;

    @Override
    public int compareTo(Employee other) {
        return Integer.compare(this.id, other.id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        return id == ((Employee) obj).id;
    }
}

5. Choose Appropriate Initial Capacity

If you know size in advance:

// Will hold ~1000 entries
int expectedSize = 1000;
int capacity = (int) (expectedSize / 0.75) + 1; // Account for load factor

HashMap<String, String> map = new HashMap<>(capacity);
// Prevents multiple resizing operations ✓
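If you can target JDK 19 or newer, this capacity arithmetic is built in: HashMap.newHashMap(int numMappings) chooses the internal capacity so the expected number of mappings fits without resizing. A minimal sketch (assumes a JDK 19+ runtime):

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedMapDemo {
    public static void main(String[] args) {
        // JDK 19+: capacity is computed internally from the expected mapping count,
        // equivalent to the manual (expectedSize / loadFactor) calculation above
        Map<String, String> map = HashMap.newHashMap(1000);
        map.put("key", "value");
        System.out.println(map.size()); // 1
    }
}
```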

6. Monitor Collision Rate (Advanced)

Conceptual wrapper to track collisions (HashMap does not expose its table or capacity, so this sketch recomputes bucket indexes itself, assuming a fixed capacity of 16):

class CollisionTrackingMap<K, V> extends HashMap<K, V> {
    // Entries counted per (assumed) bucket index
    private final HashMap<Integer, Integer> bucketCounts = new HashMap<>();
    private int collisionCount = 0;
    private static final int ASSUMED_CAPACITY = 16; // real capacity is not accessible

    @Override
    public V put(K key, V value) {
        int h = (key == null) ? 0 : key.hashCode();
        int bucket = (h ^ (h >>> 16)) & (ASSUMED_CAPACITY - 1); // HashMap's spread + mask

        // Count a collision if this bucket already holds an entry
        if (bucketCounts.merge(bucket, 1, Integer::sum) > 1) {
            collisionCount++;
            System.out.println("Collision detected! Total: " + collisionCount);
        }

        return super.put(key, value);
    }

    public int getCollisionCount() {
        return collisionCount;
    }
}

9. Common Interview Questions

Q1: What is hash collision?

Answer: A collision occurs when two different keys produce the same hash code and map to the same bucket in a HashMap.


Q2: How does HashMap handle collisions?

Answer:

  • Before Java 8: uses a linked list (separate chaining)
  • Java 8+: uses a linked list for short chains and converts to a balanced tree once a bucket exceeds 8 entries (and capacity ≥ 64)

Q3: What is the time complexity of HashMap operations?

Answer:

  • Best case: O(1) - no collision
  • Average case: O(1) - few collisions
  • Worst case (Java 7): O(n) - all keys in one bucket (linked list)
  • Worst case (Java 8+): O(log n) - all keys in one bucket (tree)

Q4: What is TREEIFY_THRESHOLD?

Answer: It’s a constant (value = 8) that defines when HashMap converts a linked list in a bucket to a balanced tree. When a bucket already holds 8 entries, another entry is added, AND the HashMap capacity is ≥ 64, treeification occurs.


Q5: Why is load factor 0.75 default?

Answer: It provides a good balance between:

  • Time efficiency (not too many collisions)
  • Space efficiency (not too much wasted space)
  • Resizing frequency (not too often)

Q6: Can collision be completely avoided?

Answer: No. Due to the Pigeonhole Principle (infinite possible keys, finite buckets), collisions are inevitable. We can only minimize them with good hash functions.
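Collisions are not just theoretical — real String keys collide. This quick check follows directly from String's documented 31-based hash formula:

```java
public class InevitableCollision {
    public static void main(String[] args) {
        // 'A'*31 + 'a' = 65*31 + 97 = 2112, and 'B'*31 + 'B' = 66*31 + 66 = 2112
        System.out.println("Aa".hashCode());                    // 2112
        System.out.println("BB".hashCode());                    // 2112
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
    }
}
```

This pair of characters differs by (+1, -31), which the formula cannot distinguish — so collisions can be manufactured for any 31-based string hash.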


Q7: What happens if hashCode() returns same value for all objects?

Answer: All objects go to the same bucket, creating maximum collision. Performance degrades to O(n) for linked list or O(log n) for tree (Java 8+).


Q8: Why implement Comparable for HashMap keys?

Answer: When collision chains convert to trees (Java 8+), Comparable helps establish ordering in the tree, improving performance and reliability.


Q9: What is the difference between HashMap and Hashtable regarding collisions?

Answer: Both handle collisions with chaining. Main differences:

  • Hashtable is synchronized (thread-safe)
  • HashMap allows null keys/values
  • HashMap uses tree optimization (Java 8+)
  • Hashtable doesn’t have tree optimization

Q10: How to reduce collisions?

Answer:

  1. Implement good hashCode() using all significant fields
  2. Use proper initial capacity to avoid frequent resizing
  3. Choose lower load factor if memory permits
  4. Make keys immutable
  5. Implement Comparable for custom keys

Q11: How is hash collision handled?

Answer:

  • HashMap handles collisions by using a linked list to store the map entries that end up in the same array (bucket) location.
  • From Java 8 onwards, HashMap, ConcurrentHashMap, and LinkedHashMap use a balanced tree in place of a linked list to handle frequent hash collisions. The idea is to switch to a balanced tree once the number of items in a hash bucket grows beyond a certain threshold.
  • This change ensures O(log n) performance in worst-case scenarios (where the hash function does not distribute keys properly) and O(1) with a proper hashCode().
  • The switching threshold is defined by the TREEIFY_THRESHOLD constant in the java.util.HashMap JDK 8 source. Its current value is 8, which means that once a bucket holds more than 8 elements, HashMap uses a tree instead of a linked list for that bucket.
  • Correlation between performance, hash collision, and load factor: lower load factor = more free buckets = lower chance of collision = higher performance = higher space requirement; higher load factor = fewer free buckets = higher chance of collision = lower performance = lower space requirement.

Summary Table: Complete Overview

Aspect                 | Pre-Java 8   | Java 8+
Collision Handling     | Linked List  | Linked List → Tree (bucket > 8 entries)
Best Case              | O(1)         | O(1)
Worst Case             | O(n)         | O(log n)
Treeify Threshold      | N/A          | 8 entries
Untreeify Threshold    | N/A          | 6 entries
Min Capacity for Tree  | N/A          | 64
Default Load Factor    | 0.75         | 0.75
Default Capacity       | 16           | 16

Key Takeaways

What You Must Remember:

  1. Collision is inevitable - finite buckets, infinite keys
  2. Good hashCode() is crucial - determines distribution quality
  3. Java 8+ uses trees - improves worst case from O(n) to O(log n)
  4. Load factor matters - 0.75 balances time and space
  5. Keys should be immutable - prevents lost entries
  6. Comparable helps trees - better tree performance in Java 8+

Performance Tips:

  • Use Objects.hash() for hashCode implementation
  • Set initial capacity if size is known
  • Make custom keys Comparable
  • Monitor collision rate in production
  • Consider alternative data structures for specific needs
