C#: GetHashCode() might cause OverflowException

Microsoft recommends if you overload Equals method you should also overload GetHashCode. Now, how to properly implement GetHashCode? There are many resources on the web that describe it. A good starting point might be this article on Stack Overflow.

Following MSDN guidlines GetHashCode must fulfill these requirements:

  • If two objects of the same type represent the same value, the hash function must return the same constant value for either object.
  • For the best performance, a hash function must generate a random distribution for all input.
  • The hash function must return exactly the same value regardless of any changes that are made to the object.

Sticking to first bullet, you (probably?) should consider the same fields in Equals and GetHashCode methods. Let’s have a look at the example in which I did so:

public class Contact
{
    public int ID { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }

    public override bool Equals(object obj)
    {
        // If parameter is null return false.
        if (obj == null)
        {
            return false;
        }

        // If parameter cannot be cast to Contact return false.
        Contact c = obj as Contact;
        if c == null)
        {
            return false;
        }

        // Return true if the fields match:
        return ID == c.ID 
            && FirstName == c.FirstName
            && LastName == c.LastName;;
    }

    public override int GetHashCode()
    {
        return ID.GetHashCode()
            + FirstName.GetHashCode()
            + LastName.GetHashCode();
    }
}

Now, what is wrong with this example of GetHashCode? There’s one drawback here. The hash is calculated as a sum of three integer values, which might give a value that is greater than int.MaxValue and that will result in OverflowException.

Default configuration of C# project results in this exception being swallowed so you won’t experience that problem. However, you can enforce checking for arithmetic overflow / underflow in project settings, advanced settings in build section:

project settings 350x298 C#: GetHashCode() might cause OverflowException

Once you do this, it’s highly possible an OverflowException will occur in your application. Therefore it might be useful to turn on checking for arithmetic overflow / underflowo only in Debug mode.

One way or the other, implementation of hash method presented above needs to be improved.

Solution

Operation on bits

One possible solution would be to change + operator to one that operates on bits, e.g. ^ (XOR). That’s bitwise exclusive-OR, whose result for two integers will always fit an integer value.

public override int GetHashCode()
{
    return ID.GetHashCode()
        ^ FirstName.GetHashCode()
        ^ LastName.GetHashCode();
}

Use unchecked

In an unchecked context, if an expression produces a value that is outside the range of the destination type, the result is truncated. That’s enough for GetHashCode.

public override int GetHashCode()
{
    unchecked
    {
        return ID.GetHashCode()
        + FirstName.GetHashCode()
        + LastName.GetHashCode();
    }
}

To learn more on unchecked and checked, please refer to this post or that article on Code Project.

Conclusion

I presented two possible fixes for risk of OverflowException thrown in GetHashCode. Now, you might expect an answer which solution is better. I honestly don’t know. Probably there are scientific papers that elaborate on that. All I can say both both options do what they should.

By the way, if you speak on better uniqueness of the calculated hash, you might include prime numbers as a carefully selected random number multiplier, which in details is described there.

10 Responses to “C#: GetHashCode() might cause OverflowException”


  • Bitshifted XOR is the correct way.

    return (ID << 24)
    ^ (FirstName.GetHashCode() << 16)
    ^ LastName.GetHashCode(); // No need for GetHashCode on int because it just returns itself.

  • Steve,

    On what basis you say so? I mean ok but can you give some explanation?

    Jarek

  • Because that is the used implementation by Microsoft :p
    For example, the Tuple that is introduced in .NET 4.0 has a CombineHashCodes method:

    internal static int CombineHashCodes(int h1, int h2)
    {
    return (((h1 << 5) + h1) ^ h2);
    }

    The + h1 is used here because the overloads just call this methods again when there are more items in the tuples:

    internal static int CombineHashCodes(int h1, int h2, int h3, int h4, int h5, int h6)
    {
    return CombineHashCodes(CombineHashCodes(h1, h2, h3, h4), CombineHashCodes(h5, h6));
    }

    Bitshifting will have less collisions then using normal ^ or +

    If your object has 2 int properties (4 and 5) then using ^ or + would give the same hashcode then an object with values (5 and 4). Bitshifting will give an unique hash for both objects.

  • Steve,

    The fact that Microsoft did so does not satisfy my curiosity, if you know what I mean :)

    Anyway, thanks for sharing that information.

    Jarek

  • Good point, so I made a little test case :) The idea of the hashCode is to provide a unique number for different instances (based on their data) without having collisions.

    So keeping with the simple case of 2 ints:

    static void PrintHashes(int x1, int x2)
    {
    Console.Write(Tuple.Create(x1, x2).GetHashCode());
    Console.Write(” – “);
    Console.Write(x1 ^ x2);
    Console.Write(” – “);
    Console.Write(x1 + x2);
    Console.WriteLine();
    }

    Which will generate the following results:

    PrintHashes(4, 5);
    // 129 – 1 – 9
    PrintHashes(4, 4);
    // 128 – 0 – 8
    PrintHashes(5, 4);
    // 161 – 1 – 9
    PrintHashes(5, 5);
    // 160 – 0 – 10

    XOR without bitshifting is dangerous because 2 identical values will cancel each other out, simple addition will generate the same hashcode if the values are in a different order, the algo used for the Tuple generates a unique hash for each different object.

  • And what if ID == null? Shouldn’t the GetHashCode() return ID.GetHashCode() return an exception?

  • ID is an int, so the default value will be 0 with GetHashCode() == 0

  • Sorry, I meant FirstName.. What I’d like to know, if you have a GetHashCode based on a nullable value, and this one come to be null.. How would you handle this case ?

  • Most implementation will also use 0 as hashcode for null.

    For example Nullable uses:

    public override int GetHashCode()
    {
    if (!this.HasValue)
    {
    return 0;
    }
    return this.value.GetHashCode();
    }

Leave a Reply