There was already a similar topic "Why AES is not used for secure hashing, instead of SHA-x?", but it was not about files specifically, and I, personally, am not convinced by the answers in it. "AES is not designed for this job" is not an answer.
What bothers me about those answers is that they theorize about how AES is a different kind of algorithm and is not suited to the job, but nobody put forth an actual case - as in, "the suggested implementation would be broken by this procedure". I hope they were talking in general, and that file hashing may be a different story.
An important fact must be taken into account: all of today's cryptographically secure hash algorithms are slow. The best software implementations of SHA-256 achieve a couple of hundred megabytes per second at most on a single thread. They also have an inherently crippling property: they cannot be computed in parallel, because the input data sequence cannot be split.
Today's IO systems are already faster than the fastest single-threaded consumer hardware can calculate these hashes. This means that these algorithms have become the bottleneck, and it will only get worse from here, because single-thread performance has not shown any serious progress for several CPU generations, whereas IO is quickly becoming faster (thanks to SSDs and RAM disks, which have finally started to push past the long-stagnating speeds of platter drives).
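The single-thread SHA-256 figure is easy to check on your own machine. Below is a minimal sketch using only Python's standard `hashlib`; the function name and the default sizes are my own choices for illustration, and the result depends heavily on the CPU and on whether the underlying library uses SHA hardware extensions.

```python
import hashlib
import time

def sha256_throughput_mb_s(total_mb: int = 64, chunk_kb: int = 1024) -> float:
    """Hash roughly total_mb MiB of zero bytes on one thread, return MiB/s."""
    chunk = bytes(chunk_kb * 1024)            # one reusable in-memory chunk
    n_chunks = max(1, (total_mb * 1024) // chunk_kb)
    hasher = hashlib.sha256()
    start = time.perf_counter()
    for _ in range(n_chunks):
        hasher.update(chunk)                  # sequential: each update depends on the last
    hasher.digest()
    elapsed = time.perf_counter() - start
    hashed_mb = n_chunks * chunk_kb / 1024    # amount actually hashed
    return hashed_mb / elapsed

if __name__ == "__main__":
    print(f"SHA-256: {sha256_throughput_mb_s():.0f} MiB/s on one thread")
```

The data is hashed from memory, so the number reflects the hash function alone and not disk speed.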
The biggest advantages of an AES-based hash are that we have hardware implementations of AES and that such a hash can be designed to enable parallelism.
Let's take a look at a simple scheme: AES256(DATA_BLOCK_0 XOR COUNTER_0) XOR AES256(DATA_BLOCK_1 XOR COUNTER_1) XOR ...
The last data block is padded with zeros. The AES key is known and preset. Another option is to use the data block as the key each time and to encrypt only the counter - I don't know right now whether this would have a bad impact on speed, as the key needs to change for every block. If there is no serious impact, it may be the better choice.
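To pin down what I mean, here is a minimal sketch of the first variant (fixed key, data XOR counter). Python's standard library has no AES, so `standin_encrypt_block` is NOT AES: it is a keyed BLAKE2b call used purely as a placeholder so the sketch runs, and a real AES-256 single-block encryption would be swapped in. All function names and the all-zero `KEY` are hypothetical.

```python
import hashlib

BLOCK = 16  # AES block size in bytes (128 bits), independent of key length

def standin_encrypt_block(key: bytes, block: bytes) -> bytes:
    # Placeholder only, NOT AES: keyed BLAKE2b truncated to one block.
    return hashlib.blake2b(block, key=key, digest_size=BLOCK).digest()

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def xor_hash(data: bytes, key: bytes, encrypt_block=standin_encrypt_block) -> bytes:
    # Pad the last block with zeros, as described above.
    if len(data) % BLOCK:
        data += b"\x00" * (BLOCK - len(data) % BLOCK)
    acc = bytes(BLOCK)
    for offset in range(0, len(data), BLOCK):
        counter = (offset // BLOCK).to_bytes(BLOCK, "big")
        mixed = xor_bytes(data[offset:offset + BLOCK], counter)
        acc = xor_bytes(acc, encrypt_block(key, mixed))  # XOR-combine the outputs
    return acc

KEY = bytes(32)  # hypothetical fixed, publicly known 256-bit key

if __name__ == "__main__":
    print(xor_hash(b"example file contents", KEY).hex())
```

Note that the result is 16 bytes long, because an AES output block is 128 bits whatever the key size.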
Anyway, the given scheme is massively parallelizable and can push 2.5 gigabytes per second on a modern quad-core with hardware AES acceleration. It should also scale with the future trend of more and more CPU cores.
The AES hash is to be used because of its speed, and its primary purpose is to hash files, not to initialize private keys and the like. That is best left to 'real' secure hash algorithms, I agree.
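The reason it parallelizes is that XOR is commutative and associative, so each worker can process its own range of blocks and the partial results can be combined in any order. A rough sketch of that structure follows; `contribution` again uses keyed BLAKE2b as a placeholder and is NOT AES, the names are hypothetical, and Python threads are used only to show the shape of the computation, not to demonstrate any speedup.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

BLOCK = 16  # AES block size in bytes

def contribution(index: int, block: bytes, key: bytes) -> bytes:
    # Placeholder for AES256(block XOR counter); NOT AES.
    counter = index.to_bytes(BLOCK, "big")
    padded = block.ljust(BLOCK, b"\x00")      # zero-pad a short last block
    mixed = bytes(x ^ y for x, y in zip(padded, counter))
    return hashlib.blake2b(mixed, key=key, digest_size=BLOCK).digest()

def xor_all(parts) -> bytes:
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), parts, bytes(BLOCK))

def hash_range(data: bytes, key: bytes, start: int, stop: int) -> bytes:
    # XOR of the contributions of blocks start .. stop-1.
    return xor_all(contribution(i, data[i * BLOCK:(i + 1) * BLOCK], key)
                   for i in range(start, stop))

def parallel_xor_hash(data: bytes, key: bytes, workers: int = 4) -> bytes:
    n_blocks = -(-len(data) // BLOCK)         # ceiling division
    step = max(1, -(-n_blocks // workers))
    ranges = [(s, min(s + step, n_blocks)) for s in range(0, n_blocks, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda r: hash_range(data, key, *r), ranges)
        return xor_all(partials)              # order of combination does not matter
```

Splitting the work into one, three or four ranges gives the same final value, which is the property the speed claim relies on.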
Now for some analysis.
As far as detecting random errors goes, I see no problem with the scheme given above. It should mix well, and the hash should change unpredictably in response to any change in the input.
We do not need to worry about someone reverse-calculating the original data from the hash. File hashes are always distributed with their files, and their purpose is not to obscure the original data. Their purpose is to guarantee (with reasonable probability) that the data has not changed, that it is an unmodified file. Hashes of private files should not be made public in any case - even with algorithms such as SHA-256.
The only problematic part can be an intelligent attack which attempts to modify the file in such a way that the hash does not change. In the simplest scenario, I believe, an attacker modifies part of one block to achieve some goal. After that, he needs to modify one (or more) other blocks, which are deemed unimportant, in such a way that they produce a known value - one that, XORed with the encrypted output of the first modified block, eliminates the difference so that the final hash remains the same.
Let's leave aside the case of an attacker adding data to the file, as that can be easily detected and is probably not easier to calculate anyway.
I'm looking at the simple scheme above, and to me it seems that the attacker has to search a 2^256 space in order to find the appropriate input, which is about the same as cracking AES. And that space is simply too big.
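To state the attacker's goal precisely, here is a small sketch that computes the value the compensating block must encrypt to. As before, `contribution` is a keyed BLAKE2b placeholder and NOT AES, and every name here is hypothetical. The sketch only computes the target; it says nothing about how hard it is to find a block that hits it, which is exactly what I am asking.

```python
import hashlib

BLOCK = 16        # AES block size in bytes
KEY = bytes(32)   # hypothetical fixed, publicly known 256-bit key

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def contribution(index: int, block: bytes, key: bytes = KEY) -> bytes:
    # Placeholder for AES256(block XOR counter); NOT AES.
    counter = index.to_bytes(BLOCK, "big")
    return hashlib.blake2b(xor_bytes(block, counter), key=key,
                           digest_size=BLOCK).digest()

def required_contribution(index_a: int, old_a: bytes, new_a: bytes,
                          index_b: int, old_b: bytes) -> bytes:
    # Difference introduced into the final hash by changing block a.
    delta = xor_bytes(contribution(index_a, old_a), contribution(index_a, new_a))
    # Block b's new contribution must absorb that difference.
    return xor_bytes(contribution(index_b, old_b), delta)
```

The attacker then needs some `new_b` for which `contribution(index_b, new_b)` equals the returned target.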
Now, can someone explain what procedure would enable the described attack?
Thanks.