What is HashMap?
Basically, HashMap is one of the most popular Collection classes in java. HashMap implementation is based on the hash table data structure. This HashMap class extends AbstractMap class that implements the Map interface.
Few important points about
HashMap
:HashMap
uses its static inner classNode<K,V>
for storing the entries in the map.HashMap
allows at most onenull
key and multiplenull
values.- The
HashMap
class does not preserve the order of insertion of entries into the map. HashMap
has multiple buckets or bins which contain a head reference to a singly linked list. That means there would be as many linked lists as there are buckets. Initially, it has a bucket size of 16 which grows to 32 when the number of entries on the map crosses the 75%. (That means after inserting in 12 buckets bucket size becomes 32)HashMap
is almost similar to Hashtable except that it’sunsynchronized
and allows at max one null key and multiple null values.HashMap
useshashCode()
andequals()
methods on keys for theget
andput
operations. So HashMap key objects should provide a good implementation of these methods.- That’s why the
Wrapper
classes likeInteger
andString
classes are a good choice for keys for HashMap as they are immutable and their object state won’t change over the course of the execution of the program.
How to Initialize the Constructor with Our Own Capacity and Load factor?
There are mostly 4 different constructors that can be used while creating the hashmap.
These are as follows :
Map<String, Long> myPhoneBook = new HashMap<>(); //Bucket size 16 and LD=0.75
Map<String, Long> myPhoneBook = new HashMap<>(64); //Bucket size 64 and LD=0.75
Map<String, Long> myPhoneBook = new HashMap<>(128, 0.90f); //Bucket size 16 and LD=0.9
Map<String, Long> myPhoneBook = new HashMap<>(parentsPhoneBook); //Bucket copy the parentsPhoneBook to myPhoneBook
Frequently Used Hashmap Methods:
public boolean containsKey(Object key)
: Returnstrue
if this map contains a mapping for the specified key.public boolean containsValue(Object value)
: Returns true if this map maps one or more keys to the specified value.public V get(Object key)
: Returns the value to which the specified key is mapped, or null if this map contains no mapping for the key.public V put(K key, V value)
: Associates the specified value with the specified key in this map (optional operation). If the map previously contained a mapping for the key, the old value is replaced by the specified value.public void putAll(Map<? extends K, ? extends V> m)
: Copies all of the mappings from the specified map to this map. These mappings will replace any mappings that this map had for any of the keys currently in the specified map.public V remove(Object key)
: Removes the mapping for a key from this map if it is present (optional operation).public boolean isEmpty()
: A utility method returning true if no key-value mappings are present in the map.public int size()
: Returns the number of key-value mappings in this map. If the map contains more thanInteger.MAX_VALUE
elements returnInteger.MAX_VALUE
.public Set<K> keySet()
: Returns a Set view of the keys contained in this map. The set is backed by the map, so changes to the map are reflected in the set, and vice-versa.public Set<Map.Entry<K,V>> entrySet()
: This method returns a Set view of the HashMap mappings. This set is backed by the map, so changes to the map are reflected in the set, and vice-versa.
Let’s See an Example of Hashmap and Its Functions:
Let’s say we want to have a mapping of all the contacts of our friends and relatives. We’ll have a
myPhoneBook
map of a name (String type) as a key and a number(Long) as a value.Output:
— — — — -: Contacts in my Phone List : — — — — —
Name : John Number : 8923535110
Name : Vikram Number : 8149535110
Name : Mark Number : 7080535110
— — — — — — — — — — — — — — — — — — — — — — — —
No of contacts in myPhoneBook : 3
Contact number of Vikram : 8149535110
Delete Mark’s contact : 7080535110
Now Let’s Look at the Internal Working Of the HashMap:
- HashMap uses its static inner class
Node<K,V>
for storing map entries. That means each entry in hashMap is aNode
. Internally HashMap uses a hashCode of the key Object and this hashCode is further used by the hash function to find the index of the bucket where the new entry can be added. - HashMap uses multiple buckets and each bucket points to a
Singly Linked List
where the entries (nodes) are stored. - Once the bucket is identified by the hash function using hashcode, then hashCode is used to check if there is already a key with the same hashCode or not in the bucket(singly linked list).
- If there already exists a key with the same hashCode, then the
equals()
method is used on thekeys
. If the equals method returns true, that means there is already a node with the same key and hence the value against that key is overwritten in the entry(node), otherwise, a new node is created and added to this Singly Linked List of that bucket. - If there is no key with the same hashCode in the bucket found by the hash function then the new Node is added to the bucket found.
Each Node Has the Following Structure:
final int hash;
final K key;
V value;
Node<K,V> next;
In the above example, we assume that all these five keys have the same hashCode, then the bucket number or index returned by the hash function would be the same for these five keys (in this case bucket 4) and hence they are put into the same bucket. Later the keys are compared using the equals() method to check whether the key is already present or not. But we assumed these five keys are having different key content and hence 5 nodes are there in the list.
To have a high-performance hashMap we need good implementation of hashCode() and equals() method along with hash function.
A Very Important Note That You Must Know:
Before java 8, singly-linked lists were used for storing the nodes. But this implementation has changed to self-balancing BST after a thresold is crossed (static final int TREEIFY_THRESHOLD = 8;).
The motive behind this change is that HashMap buckets normally use linked lists, but for the linked lists the worst-case time is O(n) for lookup. Also note that Ordinary binary search trees have pathological cases where they become O(n) [basically BST becomes skewed], but red-black/AVL trees are specifically designed to prevent these cases.
In a HashMap with linked lists, if we have a really really awful hash function, we could end up with all the items hashing to the same bucket and get O(n) lookup, But it seems like with this red-black/AVL tree scheme, even if all the items hashed into the same bucket, we would get O(logn) lookup in worst of worst scenario.
Let’s See This Mechanism Diagrammatically:
For this example, I have assumed the threshold value as 5 but ideally, it should be by default 8.
In this example, we assume all five keys key1 to key5 have the same hashCode values and hence hash function returns the same bucket number which is 4. Also
equals
method on all these five keys returns false which means the object content of these keys is different and hence 5 nodes are created and added to the tree.Re-Hashing :
Whenever the number of entries in the hashmap crosses the
threshold value
then the bucket size of the hashmap is doubled and rehashing is performed and all already existing entries(nodes) of the map are copied and new entries are added to this increased hashmap.Threshold value = Bucket size * Load factor
Eg. If the bucket size is 16 and the load factor is 0.75 then the threshold value is 12.
Time Complexity:
- In a fairly distributed hashMap where the entries go to all the buckets in such a scenario, the hashMap has
O(1)
time for search, insertion, and deletion operations. - In the worst case, where all the entries go to the same bucket and the singly linked list stores these entries,
O (n)
time is required for operations like search, insert, and delete. - In a case where the threshold for converting this linked list to a self-balancing binary search tree(i.e. AVL/Red black) is used then for the operations, search, insert and delete
O(log(n))
is required as AVL/Red Black tree has a max length oflog(n)
in the worst case.
Why Hashmap Should Not Be Used for Multi-threaded Environments?
ConcurrentModificationException
Output:
Exception in thread “main” java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1445)
at java.util.HashMap$KeyIterator.next(HashMap.java:1469)
at ExceptionExample.main(ExceptionExample.java:16)
When a new entry got inserted into the HashMap, the Iterator for the keys is failing. Actually, the Iterator on Collection objects is
fail-fast
i.e any modification in the structure or modification of its internal structure (e.g., rehash) or the number of entries in the collection object(in our case adding an item to the phoneBook map) will trigger the exception. And hence the exception ConcurrentModificationException
is thrown. Basically, HashMap contains a variable to count the number of modifications (modCount
) and the iterator uses it when you call its next()
function to get the next entry.Note: keyIterator is a reference to phoneBook map
Now Let’s Understand a Challenging Question :
Question: If we’re iterating over a map, why would
phoneBook.remove()
cause a ConcurrentModificationException
to be thrown whereas keyIterator.remove()
wouldn’t?Now consider the below example:
If we uncomment line #1, it will work fine. But if we uncomment line #2 (but leave #1 commented) then it will cause the subsequent call to keyIterator.next() to throw ConcurrentModificationException.
Why did it happen?
The reason is that an iterator is a separate object that has some references to the internal state of the underlying map object. If we modify the map while the iterator is in operation, it could cause the iterator to behave badly, e.g. by skipping elements, repeating elements, indexing off the end of the array, etc. It attempts to detect such modifications and so it throws ConcurrentModificationException .
Another point to note is that removing elements through the iterator works and does not cause exceptions. This updates the underlying map and the iterator’s state which refers to the internals of the map, so everything can stay consistent.
However, there is nothing special about
keyIterator.remove()
that makes it work in all cases. If multiple iterators are iterating over the same map, modifications made by one will cause problems for the others.Consider this example:
Iterator<String> keyIterator1 = phoneBook.keySet().iterator();
Iterator<String> keyIterator2 = phoneBook.keySet().iterator();
keyIterator1.remove();
keyIterator2.remove();
We now have two iterators pointing to the same map. If we modify the map using one of them, it disrupts the operation of the second, so the call to
keyIterator2.remove();
will result in ConcurrentModificationException
.The above scenario can be overcome using
ConcurrentHashMap
and hence it’s advised not to use hashmap for multi-threading applications.Refer to this article for a detailed understanding:
Comparing HashMap and ConcurrentHashMap in Java
I see many readers have posted one common question that might have confused them and I want to answer that question:
Does HashMap internally use Hashtable?
The
HashMap
uses a hash table (note this is not the Hashtable
class that is there in java.util
package). Basically, the hash table is a name for a general way of writing a certain kind of data structure. You can write your own hash table; hash tables are just a general data structure design, just like you can have linked lists that aren't implemented using LinkedList
.(Note
HashSet
is actually implemented using HashMap
)Also, it is important to note that
Hashtable
is no longer recommended to use as HashMap
does a better job than Hashtable
and for multi-threaded environments, ConcurrentHashMap
can be used instead of Hashtable
.
Post a Comment