1. What is Hash Collision?
A collision occurs when a hash function returns the same bucket location for two different keys.
What Happens in HashMap?
Why Do Collisions Occur?
- Limited Bucket Space
- Poor hashCode() Implementation
2. How HashMap Works (Understanding the Basics)
Internal Structure
HashMap = Array of Buckets
HashMap Internal Structure:
┌─────────────────────────────────────┐
│ Bucket 0: null │
├─────────────────────────────────────┤
│ Bucket 1: [Key1, Value1] │
├─────────────────────────────────────┤
│ Bucket 2: [Key2, Value2] → [Key3, Value3] ← Collision!
├─────────────────────────────────────┤
│ Bucket 3: null │
├─────────────────────────────────────┤
│ ... │
└─────────────────────────────────────┘
How HashMap Finds the Bucket
Step-by-Step Process:
Example:
What’s Inside a Bucket?
Before Java 8:
Bucket → LinkedList of Entries
After Java 8:
Bucket → LinkedList (small chains) OR Balanced Tree (large chains)
3. Impact of Hash Collision on Performance
Original Content:
Hash collisions have negative impact on the lookup time of HashMap.
When multiple keys end up in the same bucket, then values along with their keys are placed in a linked list.
In case of retrieval, linked list has to be traversed to get the entry. In worst case scenario, when all keys are mapped to the same bucket, the lookup time of HashMap increases from O(1) to O(n).
Detailed Explanation:
Best Case (No Collision)
Performance: O(1) - Direct access, like array indexing
Worst Case (All items in one bucket)
Performance: O(n) - HashMap becomes a linked list!
Performance Comparison Table
| Scenario | Structure | get() | put() | Example |
|---|---|---|---|---|
| No Collision | Direct bucket access | O(1) | O(1) | Perfect hash distribution |
| Few Collisions (< 8) | Linked List | O(k) | O(k) | k = items in bucket |
| Many Collisions (≥ 8) | Balanced Tree (Java 8+) | O(log n) | O(log n) | n = items in bucket |
4. How Hash Collision is Handled
HashMap handles collision by using a linked list to store map entries ended up in same array location or bucket location.
Detailed Explanation:
Separate Chaining (The Method HashMap Uses)
Concept: Each bucket stores a chain of entries (linked list or tree)
Visual Representation:
HashMap:
┌─────────┬──────────────────────────────┐
│ Bucket 0│ → [K1,V1] → [K2,V2] → null │
├─────────┼──────────────────────────────┤
│ Bucket 1│ → [K3,V3] → null │
├─────────┼──────────────────────────────┤
│ Bucket 2│ → null │
├─────────┼──────────────────────────────┤
│ Bucket 3│ → [K4,V4] → [K5,V5] → [K6,V6] → null
└─────────┴──────────────────────────────┘
How put() Works with Collisions
How get() Works with Collisions
Code Example: Complete Collision Scenario
5. Java 8+ Improvements: Tree-based Optimization
From Java 8 onwards, HashMap, ConcurrentHashMap, and LinkedHashMap will use the balanced tree in place of linked list to handle frequently hash collisions.
The idea is to switch to the balanced tree once the number of items in a hash bucket grows beyond a certain threshold.
Above changes ensure performance of O(log(n)) in worst case scenarios (hash function is not distributing keys properly) and O(1) with proper hashCode().
The threshold of switching to the balanced tree is defined as TREEIFY_THRESHOLD constant in java.util.HashMap JDK 8 code. Currently, its values are 8, which means if there are more than 8 elements in the same bucket then HashMap will use a tree instead of a linked list to hold them in the same bucket.
Why This Improvement?
Problem in Java 7:
Bucket with 100 collisions:
[Entry1] → [Entry2] → [Entry3] → ... → [Entry100]
get() time: O(100) ❌ Very slow!
Solution in Java 8:
Bucket with 100 collisions:
[Entry50]
/ \
[Entry25] [Entry75]
/ \ / \
... ... ... ...
get() time: O(log 100) = O(7) ✓ Much faster!
Treeification Process
Constants Defined in HashMap
When Does Treeification Happen?
Conditions:
- Number of entries in bucket ≥ 8 (TREEIFY_THRESHOLD)
- AND HashMap capacity ≥ 64 (MIN_TREEIFY_CAPACITY)
If capacity < 64:
- HashMap will resize (double capacity) instead of treeifying
- Resizing redistributes entries, potentially resolving collisions
Example: Treeification in Action
How Linked List is Replaced with Balanced Tree
Original Content:
In Java 8, HashMap replaces linked list with a balanced tree when the number of elements in a bucket reaches certain threshold. While converting the list to balanced tree, hashcode is used as a branching variable.
If there are two different hashcodes in the same bucket, one is considered bigger and goes to the right of the tree and other one to the left. But when both the hashcodes are equal, HashMap assumes that the keys are comparable, and compares the key to determine the direction so that some order can be maintained. It is a good practice to make the keys of HashMap comparable.
This JDK 8 change applies only to HashMap, LinkedHashMap and ConcurrentHashMap.
Tree Node Structure
Ordering in Tree: The Decision Process
Step 1: Compare Hash Codes
Step 2: If Hash Codes Equal, Try Comparable
Step 3: If Not Comparable, Use System Identity
Example: Comparable Keys
Untreeification: Tree Back to List
When bucket size drops to 6 or below:
Why Convert Back?
- Trees have overhead (more memory, complex operations)
- For small chains, linked list is simpler and sufficient
- Threshold of 6 (not 8) provides hysteresis to avoid frequent conversions
Performance Improvement Comparison
6. Load Factor and Performance Correlation
Original Content:
Correlation between performance, hash collision & loading factor.
- Lower load factor = more free buckets = less chances of collision = high performance = high space requirement
- Higher load factor = fewer free buckets = higher chance of collision = lower performance = lower space requirement
Understanding Load Factor
Formula:
Load Factor = Number of Entries / Bucket Capacity
Example:
16 entries in 16 buckets = Load Factor 1.0
8 entries in 16 buckets = Load Factor 0.5
12 entries in 16 buckets = Load Factor 0.75 (default)
Default Values in HashMap
Impact of Different Load Factors
Load Factor 0.5 (Low)
Characteristics:
- ✅ Fewer collisions (more empty buckets)
- ✅ Better performance (less traversal)
- ❌ More memory usage (many empty buckets)
- ❌ Frequent resizing
Use Case: When performance is critical, memory is abundant
Load Factor 0.75 (Default - Balanced)
Characteristics:
- ✅ Good balance of time and space
- ✅ Reasonable collision rate
- ✅ Not too frequent resizing
- ✅ Decent memory efficiency
Use Case: Most applications (recommended)
Load Factor 1.0 (High)
Characteristics:
- ✅ Better memory utilization
- ✅ Less frequent resizing
- ❌ More collisions (buckets fuller)
- ❌ Slower lookups
Use Case: When memory is limited, occasional slowness acceptable
Detailed Example: Load Factor Impact
Visual Comparison
Load Factor 0.5:
Buckets: [X][ ][ ][X][ ][ ][X][ ][ ][X][ ][ ][X][ ][ ][X]
Collision Probability: LOW ✓
Load Factor 0.75:
Buckets: [X][X][ ][X][X][ ][X][ ][X][X][X][ ][X][X][ ][X]
Collision Probability: MODERATE ✓
Load Factor 1.0:
Buckets: [X][X][X][X][X][X][X][X][X][X][X][X][X][X][X][X]
Collision Probability: HIGH ❌
7. Detailed Examples
Example 1: Perfect Hash Distribution (No Collision)
Example 2: Collision with Linked List (< 8 entries)
Example 3: Collision with Tree (≥ 8 entries, Java 8+)
Example 4: Real-World Collision Scenario
Example 5: Performance Comparison Test
8. Best Practices
1. Implement Good hashCode()
Rules for Good hashCode():
✅ Use all significant fields
❌ Don’t use constant
✅ Use prime numbers for manual implementation
2. Implement equals() Correctly
Contract:
- If
a.equals(b)thena.hashCode() == b.hashCode() - If
a.hashCode() == b.hashCode()thena.equals(b)may or may not be true
3. Make Keys Immutable
Why?
- If key changes after insertion, its hash changes
- HashMap can’t find it anymore!
4. Implement Comparable for Custom Keys
Helps tree performance:
5. Choose Appropriate Initial Capacity
If you know size in advance:
6. Monitor Collision Rate (Advanced)
Custom wrapper to track collisions:
9. Common Interview Questions
Q1: What is hash collision?
Answer: A collision occurs when two different keys produce the same hash code and map to the same bucket in a HashMap.
Q2: How does HashMap handle collisions?
Answer:
- Before Java 8: Uses linked list (separate chaining)
- Java 8+: Uses linked list for < 8 entries, converts to balanced tree for ≥ 8 entries
Q3: What is the time complexity of HashMap operations?
Answer:
- Best case: O(1) - no collision
- Average case: O(1) - few collisions
- Worst case (Java 7): O(n) - all keys in one bucket (linked list)
- Worst case (Java 8+): O(log n) - all keys in one bucket (tree)
Q4: What is TREEIFY_THRESHOLD?
Answer: It’s a constant (value = 8) that defines when HashMap converts a linked list in a bucket to a balanced tree. When a bucket has 8 or more entries AND HashMap capacity ≥ 64, treeification occurs.
Q5: Why is load factor 0.75 default?
Answer: It provides a good balance between:
- Time efficiency (not too many collisions)
- Space efficiency (not too much wasted space)
- Resizing frequency (not too often)
Q6: Can collision be completely avoided?
Answer: No. Due to the Pigeonhole Principle (infinite possible keys, finite buckets), collisions are inevitable. We can only minimize them with good hash functions.
Q7: What happens if hashCode() returns same value for all objects?
Answer: All objects go to the same bucket, creating maximum collision. Performance degrades to O(n) for linked list or O(log n) for tree (Java 8+).
Q8: Why implement Comparable for HashMap keys?
Answer: When collision chains convert to trees (Java 8+), Comparable helps establish ordering in the tree, improving performance and reliability.
Q9: What is the difference between HashMap and HashTable regarding collisions?
Answer: Both handle collisions with chaining. Main differences:
- HashTable is synchronized (thread-safe)
- HashMap allows null keys/values
- HashMap uses tree optimization (Java 8+)
- HashTable doesn’t have tree optimization
Q10: How to reduce collisions?
Answer:
- Implement good
hashCode()using all significant fields - Use proper initial capacity to avoid frequent resizing
- Choose lower load factor if memory permits
- Make keys immutable
- Implement Comparable for custom keys
Q10: How Hash Collision is handled?
Answer:
- HashMap handles collision by using a linked list to store map entries ended up in same array location or bucket location.
- From Java 8 onwards, HashMap, ConcurrentHashMap, and LinkedHashMap will use the balanced tree in place of linked list to handle frequently hash collisions. The idea is to switch to the balanced tree once the number of items in a hash bucket grows beyond a certain threshold.
- Above changes ensure performance of O(log(n)) in worst case scenarios (hash function is not distributing keys properly) and O(1) with proper hashCode().
- The threshold of switching to the balanced tree is defined as TREEIFY_THRESHOLD constant in java.util.HashMap JDK 8 code. Currently, its values are 8, which means if there are more than 8 elements in the same bucket then HashMap will use a tree instead of a linked list to hold them in the same bucket.
- Correlation between performance, hash collision & loading factor. Lower load factor = more free buckets = less chances of collision = high performance = high space requirement. Higher load factor = fewer free buckets = higher chance of collision = lower performance = lower space requirement
Summary Table: Complete Overview
| Aspect | Pre-Java 8 | Java 8+ |
|---|---|---|
| Collision Handling | Linked List | Linked List → Tree (when ≥ 8 entries) |
| Best Case | O(1) | O(1) |
| Worst Case | O(n) | O(log n) |
| Treeify Threshold | N/A | 8 entries |
| Untreeify Threshold | N/A | 6 entries |
| Min Capacity for Tree | N/A | 64 |
| Default Load Factor | 0.75 | 0.75 |
| Default Capacity | 16 | 16 |
Key Takeaways
What You Must Remember:
- Collision is inevitable - finite buckets, infinite keys
- Good hashCode() is crucial - determines distribution quality
- Java 8+ uses trees - improves worst case from O(n) to O(log n)
- Load factor matters - 0.75 balances time and space
- Keys should be immutable - prevents lost entries
- Comparable helps trees - better tree performance in Java 8+
Performance Tips:
- Use
Objects.hash()for hashCode implementation - Set initial capacity if size is known
- Make custom keys Comparable
- Monitor collision rate in production
- Consider alternative data structures for specific needs
Post a Comment