C# Programming > Data

C# HashSet

HashSet Data Structure

The C# HashSet data structure was introduced in the .NET Framework 3.5. A full list of the implemented members can be found at the HashSet MSDN page.

So first off, a Set data structure is a list with no duplicate values. It is more or less modeled after a mathematical set, which has only unique elements.

Although a HashSet is a list of elements, it does not inherit the IList interface. Instead it just inherits the ICollection interface. What this means is elements inside a HashSet cannot be accessed through indices, only through an enumerator (which is an iterator).

Additionally, since a C#.NET HashSet is modeled after a mathematical set, certain functions were implemented such as Union, Intersection, IsSubsetOf, IsSupersetOf. These can come in handy when working with multiple sets.

A resulting difference between a HashSet and a List is that the HashSet's Add method is a boolean function, which returns true of an element was added and false if it wasn't (because it was not unique).

Why not List<>

Since a C# HashSet is simply a collection of unique objects, you might wonder why it has to be a data structure. A normal List could have the same behavior by checking if an object is found in the List before adding it.

The short answer is speed. Searching through a normal List gets very slow very fast as more elements are added. A HashSet requires a structure design that will allow for fast searching and insertion speeds.

Benchmarks

Let's compare the performance speed of a C# HashSet vs a C# List.

Each trial consisted of adding integers from 0 to 9,999 to each collection. However, mod 25 was applied to each integer. Mod 25 makes the maximum types of items 25. Since 10,000 elements were added, this forced 400 collisions to occur, giving the data structures a chance to use their searching algorithms. Times were measured 3 times after 10,000 trials and averaged out.

Don't pay too much attention to the specific running times of the tests since they are dependent on my hardware, but look at how they compare to each other.

HashSet - average time: 2290 milliseconds
List - average time: 5505 milliseconds

Now let's make elements objects instead of primitive types. I wrote a quick Person class with three fields: Name, LastName, and ID. Since I did not include any specific way to compare objects, all the elements will be added without collisions. This time 1,000 Person objects were added to each collection for a single trial. The total times of 3 sets of 1,000 trials were averaged out.

HashSet - average time: 201 ms
List - average time: 3000 ms

As you can see, the difference in running times becomes astronomical when using objects, making C# HashSet advantageous.

Conclusion

The C# HashSet class is a powerful data structure. Not only is it very fast and efficient, but it comes with a bunch of built-in set functions. It is a welcomed addition to C#.NET.

Back to C# Article List