Is it better to use HashMap with an initial capacity?

Abhishek Kumar
4 min read · Aug 30, 2023

In Kotlin and Java, you can instantiate a HashMap with or without an initial capacity. But is one approach better than the other? Let’s find out.

What happens when you don’t specify an initial capacity?

A HashMap internally uses an array of “buckets”, and each bucket may contain more than one node.

The default size of this bucket array is typically 16 (this may vary between Java/Kotlin versions and implementations). Now suppose we have a great hash function that distributes an equal number of elements across all the buckets.

Say we have 16 elements: each bucket holds 1 node, so finding any element takes just 1 lookup. With 32 elements, each bucket holds 2 nodes, so a search takes at most 2 lookups. Similarly, with 64 elements a search takes at most 4 lookups. But when the number of elements grows very large, say 1,000,000, each of the 16 buckets holds 62,500 nodes (1,000,000 / 16), so a search can take up to 62,500 lookups. That is not good performance. (Java 8+ softens the worst case by converting long bucket chains into balanced trees, but lookups still slow down as buckets fill up.)
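To make the arithmetic concrete, here is a minimal Kotlin sketch (assuming a perfectly even distribution over a fixed 16-bucket table, with no resizing):

// Worst-case lookups grow linearly with the element count
// when the bucket count stays fixed at 16.
fun main() {
    val buckets = 16
    for (elements in listOf(16, 32, 64, 1_000_000)) {
        println("$elements elements -> up to ${elements / buckets} lookups")
    }
}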

To fix this problem, a HashMap performs a resize operation when needed, usually doubling the size of the internal bucket array. But this operation is costly: it allocates a new, larger array and redistributes every entry into it according to the hash function. This is where the load factor comes into play. The load factor is the threshold that decides when a resize is performed; typically this value is 0.75. So if the internal array starts at 16 buckets, a resize is triggered when the number of elements exceeds 12 (0.75 * 16).
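As a rough illustration, here is a small Kotlin sketch of this policy (assuming the common java.util.HashMap defaults: 16 initial buckets, a 0.75 load factor, and doubling on each resize), which counts how many resizes inserting n elements would trigger:

// Simulates the default grow-by-doubling policy to count resizes.
// Assumes defaults: initial capacity 16, load factor 0.75.
fun countResizes(n: Int): Int {
    var capacity = 16
    var resizes = 0
    while (n > (capacity * 0.75).toInt()) {
        capacity *= 2
        resizes++
    }
    return resizes
}

fun main() {
    for (n in listOf(12, 13, 100, 1_000_000)) {
        println("$n elements -> ${countResizes(n)} resizes")
    }
}

Under these assumptions, inserting a million elements into a default-sized map triggers roughly 17 resizes; sizing the map up front avoids all of them.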

HashMap with an initial capacity

The HashMap constructor that takes an initial capacity lets you specify, up front, an estimate of the number of elements the map will hold. This can be useful for performance optimization when you have prior knowledge of the expected size of the HashMap. Note that the capacity refers to the size of the internal bucket array, not the element count: because of the load factor, a map created with capacity n resizes once it holds more than 0.75 * n elements (and the standard java.util.HashMap rounds the requested capacity up to the next power of two).

Kotlin:

// Kotlin HashMap with an initial capacity of 5
private val hashMap = HashMap<Int, String>(5)

Java:

// Java HashMap with an initial capacity of 5
HashMap<Integer, String> hashMap = new HashMap<>(5);
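Because of the load factor, passing the expected element count directly still triggers a resize once you cross 75% of it. A common idiom (used, for example, by Guava’s Maps.newHashMapWithExpectedSize) is to divide the expected size by the load factor. Here is a minimal Kotlin sketch with a hypothetical helper:

// Hypothetical helper: sizes the map so that `expected` elements
// fit without triggering a resize (default load factor 0.75).
fun <K, V> hashMapWithExpectedSize(expected: Int): HashMap<K, V> =
    HashMap((expected / 0.75).toInt() + 1)

// Usage: room for 1,000 entries without a resize
val userNames = hashMapWithExpectedSize<Int, String>(1_000)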

So what’s better?

When you create a HashMap without specifying an initial capacity, it starts with a default capacity, and as you add more elements, it may need to be resized to accommodate the new elements as described above. Resizing involves creating a new internal array and rehashing the elements, which can be a computationally expensive operation. By providing an initial capacity that's closer to the actual number of elements you expect to store, you can potentially reduce the number of resize operations and improve performance.

However, it’s important to note that providing an initial capacity much larger than the number of elements you end up storing can waste memory. On the other hand, specifying an initial capacity that’s too small leads to more frequent resizes, which also hurts performance.

In many cases, the default capacity of a HashMap is sufficient and doesn't require manual tuning. Modern HashMap implementations are designed to resize themselves efficiently as the number of elements changes.

Here are some points to consider:

  • Known Size: If you have a reasonable estimate of the number of elements you’ll be storing, it’s beneficial to specify an initial capacity. This is common when you’re dealing with large amounts of data or with algorithms where you have insight into the data distribution.
  • Dynamic Resizing: Modern hash map implementations, like those in Kotlin’s standard library, are designed to handle resizing efficiently. They often double the capacity when resizing, which helps mitigate the need for frequent resizes.
  • Profiling and Benchmarking: If you’re unsure whether specifying an initial capacity will yield a significant improvement, it’s good practice to profile and benchmark your code under different scenarios to see if there’s a noticeable difference (see the sketch after this list).
  • Readability: To the passionate eye, code is no less than art. Clearly written code with good readability can sometimes be more important than micro-optimizations. If the initial capacity doesn’t drastically impact performance and isn’t crucial to your use case, sticking with the default capacity might result in cleaner code.
  • Flexibility: If you’re uncertain about the exact number of elements your map will hold, it might be better to let the hash map handle resizing automatically.
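As a quick illustration of the benchmarking point above, here is a minimal Kotlin sketch (a crude timing loop for intuition only; a proper harness such as JMH is the right tool for real measurements) comparing inserts into a default-sized map versus a pre-sized one:

import kotlin.system.measureTimeMillis

// Crude comparison: fill one map starting from the default capacity
// and one pre-sized for n elements, timing each.
fun main() {
    val n = 1_000_000
    val defaultTime = measureTimeMillis {
        val map = HashMap<Int, Int>()
        for (i in 0 until n) map[i] = i
    }
    val presizedTime = measureTimeMillis {
        val map = HashMap<Int, Int>((n / 0.75).toInt() + 1)
        for (i in 0 until n) map[i] = i
    }
    println("default: $defaultTime ms, pre-sized: $presizedTime ms")
}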

In summary, using a HashMap with an initial capacity can be beneficial in certain scenarios where you have a good estimate of the expected number of elements. However, it's essential to balance this decision with the potential for wasted memory and the efficiency of modern hash map resizing strategies.

If you liked this article, hit the follow button, as I’ll keep posting similar tech articles.

Thanks for reading!
