Is AES feasible for secure hashing files?

Question

There was already a similar topic "Why AES is not used for secure hashing, instead of SHA-x?", but it was not about files specifically, and I, personally, am not convinced by the answers in it. "AES is not designed for this job" is not an answer.

What bothers me with those answers is, that they theoreticize about how AES is a different kind of algorithm and is not suited for the job, but nobody put forth actual cases - as in, "the suggested implementation would be cracked with this procedure". I hope they were talking in general, and that file hashing may be a different story.

An important fact must be taken into account: all of today's cryptographically secure hash algorithms are slow. Best implementations of SHA-256 are achieving a couple hundred megabytes per second at most. They have an inherent crippling property, that they cannot be computed in parallel - the input data sequence cannot be split.

Today's IO systems are already faster than the fastest single thread consumer hardware can calculate these hashes at. This means that these algorithms have become the bottleneck, and it only goes down from here, because single thread performance is not showing any serious progress anymore (for several CPU generations), whereas IO is becoming faster quickly (thanks to SSD drives and RAM disks, which finally started to push forward the long-stagnating platter drive speeds).

The biggest advantage of AES hash is that we have hardware implementations for it and that AES hash can be designed to enable parallelism.

Let's take a look at a simple scheme: AES256(DATA_BLOCK_0 XOR COUNTER_0) XOR AES256(DATA_BLOCK_1 XOR COUNTER_1) XOR ...

Last data block is padded with zeros. The AES key is known and preset. Another option is to use the data block as the key each time and to encrypt only the counter - I don't know right now if this can have bad impact on speed, as the key needs to change for every block. If there is no serious impact, it may be the better choice.

Anyway, the given scheme is massively parallelizable and can push 2.5 gigabytes per second on a modern quad-core with hardware AES acceleration. It will also scale perfectly in the future, which is more and more CPU cores.

AES hash is to be used because of speed, and it's primary purpose is to hash files, and not to initialize private keys and stuff like that. That is best left to 'real' secure hash algorithms, I agree.

Now for some analysis.

As far as detecting random errors goes, I see no problem with the above given scheme. It should mix well and react randomly for any change.

We do not need to worry about someone reverse-calculating the original data from the hash. File hashes are always distributed with their files, and their purpose is not to obscure the original data. Their purpose is to guarantee (with reasonable probability) that data has not changed, that it is an unmodified file. Hashes of private files should not be made public in either case - even with algorithms such as SHA-256.

The problematic part can only be an intelligent attack which attempts to modify the file in such a way that hash would not change. In the simplest scenario, I believe that an attacker modifies a part of one block to achieve some goal. After that, he needs to modify any one (or more) block, which is deemed unimportant, in such a way that it will produce a known hash - one that it will XOR with the first modified block hash so that the difference will be eliminated and the final hash will remain the same.

Let's leave aside the case of attacker adding data to file, as that can be easily detected and is probably not easier to calculate anyway.

I'm looking at the simple scheme above, and to me it seems that attacker has to search a 2^256 space in order to find the appropriate input, which is about the same as cracking AES. And that space is simply too big.

Now, can someone explain what procedure would enable the described attack?

Thanks.

You can use AES to compute a MAC (keyed!), but not as collision resistant hash (unkeyed!). Typical AES based MACs lose all security one an attacker learns the key. — CodesInChaos, Jan 17 '14 at 09:32
Why your obsession with speed. Who said hashing has to be fast? In fact a lot of people are deliberately slowing it down by doing multiple iterations (I'm thinking password DB here). It takes more time, and thus more computation power for the bad guys to bruteforce, which is good. — djf, Jan 17 '14 at 09:49
@djf The OP is talking about file hashing, not password hashing. Password hashing is supposed to be slow, but file hashing should be as fast as possible. In particular it should be able too keep up with IO, which can be annoying for very fast SSDs. — CodesInChaos, Jan 17 '14 at 10:12
@CodesInChaos you've got a point. I didn't read carefully enough. The other point I fail to see how the recipient would verify the hash w/o knowing the key.. — djf, Jan 17 '14 at 10:21
1) There are secure AES based MACs, but yours isn't one of them. Xorring the counter doesn't prevent block swapping, since the attacker can trivially adjust them. 2) There are faster hashes than SHA-2. You could consider Blake2 and Skein. — CodesInChaos, Jan 17 '14 at 10:25
You might want to read the paper " On the Impossibility of Highly-Efficient Blockcipher-Based Hash Function" (http://www.iacr.org/archive/eurocrypt2005/34940527/34940527.ps‎). Also that crypto stack exchange question http://crypto.stackexchange.com/questions/10284/hash-function-based-on-block-cipher-and-proof-of-security-in-the-prp-model — Bruno Rohée, Jan 17 '14 at 15:33

score 7 · Answer 1 · edited Apr 13 '17 at 12:48

Here's the file that describes our records at the First Bank of Craptology. It's a simple format with fixed-width records: a 16-byte user name and a 16-byte balance.

user37442_______0000000099999999
Gilles__________0000000000000042

Under your proposal (if I've read it correctly, your statement is not very precise), its hash is AES[K](your_name ⊗ (C + 0)) ⊗ AES[K](your_balance ⊗ (C + 1)) ⊗ AES[K](my_name ⊗ (C + 2)) ⊗ AES[K](my_balance ⊗ (C + 3)) where C is the initial counter value and K is some key. Here's another file with the same hash, assuming that C is a multiple of 4:

user37442_______0000000000000040
Gilles__________0000000099999997

So you have a cryptographic scheme that you can't break? Big deal. Your task is to come up with a scheme that nobody can break.

I'm sure that you can come up with a simple modification of your scheme that would make my counter-example fail. Quite possibly if you set your mind to it you'll be able to find a scheme that I can't break. But I'm not a cryptographer.

If you want to be taken seriously, then:

Understand what you're talking about. Read up on what defines a cryptographic hash. What you're looking for here is weaker than a hash — a cryptographic hash must have preimage resistance. Ok, there may be a point in a weaker primitive, but you need to be clearer about what you're attempting to do. Define your security problem properly.
Work seriously on attacking your own scheme. Don't just handwave that “it should mix well and react randomly”. Read up on how other people's schemes have been broken and try the same techniques against your proposal.
Once you have something that you can't break — and that you've seriously tried to break — publish it.
Wait until years have passed and many professional cryptographers have attempted to attacks your scheme and they have all failed. At that point, your scheme can be given serious consideration for use in applications.

On the specific topic of using AES-based algorithms for hashing, here's a bit of reading:

Having a CPU that can keep up with an IO system isn't that big a deal in most applications. Usually the IO system is used by several instances in parallel anyway.

score 4 · Answer 2 · answered Jan 17 '14 at 12:38

Rejecting general arguments and insisting on having your specific "proposal" be broken in a way which convinces you is a known fallacy. It is the main tactic of most crackpots, who try to lure some experts into an endless cycle of "it is broken this way -- yes but what if I change this bit ? -- then it is broken that way -- ok but what about XORing that bit there ? -- ...". When the expert finally gets bored of talking to the network equivalent of a parrot, or has work to do, and walks away, the crackpot claims victory.

A decent cryptographic proposal comes with positive arguments that it cannot be broken, not arguments about how you do not know how to break it. Otherwise, it is useless and nobody will look at it.

Though in your case, preimages are trivial. If K is known, then decryption by K is as easy as encryption by K; correspondingly, given AES_K(data XOR counter0), obtaining "data XOR counter0" is a matter of 28 clock cycles, no more. You "hash" cannot even be called "weak"; it offers no resistance whatsoever to attacks. It is even poor against non-attacks, and will incur more spurious collisions when used on "normal data" than a CRC32 or MD4.

In fact I remember having this very example (with another block cipher) used in an introductory course on cryptography to demonstrate how these things are not improvised. First-time students are expected to find it trivially breakable.

Is AES feasible for secure hashing files?

2 Answers2