39

I'm wondering how bad it is to truncate a SHA1 and only compare, say, the first 10/12 bytes, etc. I'm working with a fixed length of 8 bytes that I need to hash for uniqueness but store with the smallest footprint possible (8 other bytes would be nice, but I guess not feasible).

A bit more information without sacrificing professional secrets:

  • Some blackbox take a 8 byte value, transforms it and transmits it to a server, where it is checked for validity against a table of know items.
  • Neither the blackbox nor the server should be able to know the original 8 bytes.

Best solution would be some kind of 1 to 1, one way relation, like asymmetrical encryption. But I don't think any encryption mechanism outputs so little bytes.

Whymarrh
  • 312
  • 3
  • 17
Agnar
  • 493
  • 1
  • 4
  • 6
  • Can you just store an ID that points to another location with the actual hash? – Gray Nov 10 '14 at 18:13
  • @Gray Given that each input should hash to a unique output, that would actually increase the storage requirements rather than reduce them. – Xander Nov 10 '14 at 18:21
  • @Xander The idea was that that column/table size could stay the same, and reference a different location. If it is a hard limit on storage in terms of bytes overall, then my kludge would not make sense. – Gray Nov 10 '14 at 18:23
  • 3
    Is the source data you are hashing only bytes long? – poke Nov 10 '14 at 18:31
  • 14
    Are you using the hash in a security application, or is this for something like a hashmap? If it's a security application, 8 bytes is already within range of a brute-force attack, and your system is insecure before you start. If it's a hashmap, you'll probably be fine as you'll have to deal with the risk of collisions anyway. – John Deters Nov 10 '14 at 18:46
  • Maybe MD5 the hash if you need it smaller. The chance of collision would be very small since you're only checking the integrity of 8 bytes. – Andrew Hoffman Nov 10 '14 at 18:56
  • We can probably give you better answers if you give us more context. (a) What are you using the truncated hash for? What properties do you need? Uniqueness? Do you have to deal with adversaries who can choose an input? What's the adversary model (threat model)? (b) Can you choose a random salt and store that? (Either the same salt for all hashes, or a different salt for each hash?) – D.W. Nov 11 '14 at 01:08
  • 1
    "the blackbox should not be able to know the original 8 bytes." and "Some blackbox take a 8 byte value" sounds like a direct contradiction to me. – CodesInChaos Nov 11 '14 at 19:04
  • But I don't think any encryption mecanism outputs so little bytes. <-- why not a stream cipher? – J-16 SDiZ Nov 12 '14 at 08:01

2 Answers2

44

There is no absolute answer, because it depends on the attack model. By truncating the hash, you make some operations easier; this is bad if the attacker wants to perform these operations, and making them easier actually makes them feasible.

There are three main characteristics that cryptographic hash functions try to fulfil:

  • Resistance to preimages: given x, it should be infeasible to find m such that h(m) = x.
  • Resistance to second preimages: given m, it should be infeasible to find m' ≠ m such that h(m) = h(m').
  • Resistance to collisions: it should be infeasible to find m' ≠ m such that h(m) = h(m').

For a perfect hash function that has an output of n bits, costs of finding preimages, second preimages or collisions will be, respectively, 2n, 2n and 2n/2. By truncating the hash output, you lower these costs correspondingly. As a rule of thumb, a cost of 264 is very hard (feasible, but it takes more than a dozen PC) and 280 is infeasible.

In your case, if a collision is interesting for the attacker, and he gets to choose what is hashed, then the attacker will try to input two data elements that imply a hash collision (on your truncated hash). Truncating to 12 bytes makes the attack quite feasible, easy if you go down to 8 bytes. On the other hand, if all the attacker can do is to try to find some data element matching a given truncated hash (second preimage), then at 10 or 12 bytes this is still too hard to be feasible, and at 8 bytes it is merely "very expensive" (so probably not worth the effort).

If there is no attacker and you are just fighting bad luck, then truncating to 8 bytes incurs the risk of spurious collisions; you should reach your first collision after (on average) entering a few billions of entries. Depending on your situation, this may or may not be tolerable.

Tom Leek
  • 170,038
  • 29
  • 342
  • 480
  • Is it conceivable that one could build a chart listing "safe, unsafe, and 'easy'" for truncating a hash of various lengths? I'm imagining one chart for preimage, second preimage, and one for collisions. I would be interested if the security increase is relatively the same per characteristic – makerofthings7 Nov 10 '14 at 19:11
  • Can you clarify the difference between your second and third criteria? Is it just that in the second case, you begin with knowledge of the initial input? – NWard Nov 10 '14 at 20:16
  • In the second one (second preimage), you are given an _m_ and need to find something that hashes to the same thing. In the third one (collision) you get to choose the _m_ yourself. Since you can choose it yourself, that makes it easier. – Buge Nov 10 '14 at 20:29
  • @NWard: for a collision, you challenge the attacker with finding two distinct inputs that hash to the same value. For a second preimage, you issue the same challenge with the additional constraint that you fix one of the two messages that are to collide. (So it makes sense that finding a second preimage is harder than finding a collision.) – Tom Leek Nov 10 '14 at 20:31
  • Thank you both, that makes sense. Are there hash functions which are resistant to the second attack but not the third? E.g where it collides only for specific inputs, and not arbitrarily? Or does resistance to (2) always guarantee resistance to (3) (and vice-versa)? – NWard Nov 10 '14 at 20:34
  • 2
    It is possible to define quite contrived hash functions that show how resistances to preimages, second preimages and collisions are distinct concepts. However, if you resist collisions then you resist second preimages at least as well (though you normally expect a much better resistance for second preimages than for collisions). – Tom Leek Nov 10 '14 at 21:02
  • @TomLeek: Can you please have a look here: http://security.stackexchange.com/questions/84226/reducing-hash-output-length? –  Mar 20 '15 at 15:04
21

Regarding truncation in general, it's not necessarily bad at all. In fact, the SHA‑384 algorithm is defined as doing a (slightly modified) SHA‑512 and then truncating the result to 384 bits. I'd like to add to the existing answers by providing the relevant material regarding truncation from the SHA standard.

FIPS 180‑4, the SHA standard, explicitly allows truncation of the leftmost bits.

Some application may require a hash function with a message digest length different than those provided by the hash functions in this Standard. In such cases, a truncated message digest may be used, whereby a hash function with a larger message digest length is applied to the data to be hashed, and the resulting message digest is truncated by selecting an appropriate number of the leftmost bits. For guidelines on choosing the length of the truncated message digest and information about its security implications for the cryptographic application that uses it, see SP 800‑107 [SP 800‑107].

SP 800-107 spells out the rules for truncation:

Let the (shortened) message digest be called a truncated message digest, and let λ be its desired length in bits. A truncated message digest may be used if the following requirements are met:

  1. The length of the output block of the approved hash function to be used shall be greater than λ (i.e., L > λ).

  2. The λ left-most bits of the full-length message digest shall be selected as the truncated message digest.

    For example, if a truncated message digest of 96 bits is desired, the SHA‑256 hash function could be used (e.g., because it is available to the application, and provides an output larger than 96 bits). The leftmost 96 bits of the 256-bit message digest generated by SHA‑256 are selected as the truncated message digest, and the rightmost 160 bits of the message digest are discarded.

  3. If collision resistance is required, λ shall be at least twice the required collision resistance strength s (in bits) for the truncated message digest (i.e., λ ≥ 2s).

It ends the section on truncation with this warning about the security of a truncated hash.

Truncating the message digest can impact the security of an application. By truncating a message digest, the expected collision resistance strength is reduced from L/2 to λ/2 (in bits). For the example in item 2 above, even though SHA‑256 provides 128 bits of collision resistance, the collision resistance provided by the 96-bit truncated message digest is half the length of the truncated message digest, which is 48 bits, in this case.

indiv
  • 311
  • 1
  • 4