49

Consider this. Many websites with software downloads also make available MD5 or SHA1 hashes, for users to verify the integrity of the downloaded files. However, few of these sites actually use HTTPS encryption or digital signatures on the website itself.

So, if you're downloading a file from what is effectively an unauthenticated source, and validating it with a hash from the same source (or even another unauthenticated source), what is the real value of hashing the file? Does this not establish a false sense of security, since (in the absence of a digital signature) both the download and the hash could have been tampered with, without the user's knowledge?

Iszi
  • 1
    I'd wager that the false sense of security is a strong argument for not providing hashes...ever. I don't know the last time TCP/IP failed and I got a corrupt download. That's happened...never. – mlissner Jul 04 '13 at 16:34
  • MD5 is a message digest algorithm, and neither MD5 nor SHA1 are intended for authentication of the author; simply verification of content with high probability assuming no malefactor. MD5 collisions are completely possible. That alone should clue you into the fact that these aren't intended to be security features (or, at least, if they are billed as such the person doing the billing is incompetent). – Parthian Shot Jul 15 '15 at 23:04

11 Answers

61

So, if you're downloading a file from what is effectively an unauthenticated source, and validating it with a hash from the same source (or even another unauthenticated source), what is the real value of hashing the file?

The provided hash lets you double-check that the file you downloaded was not corrupted accidentally in transit, or that the file you downloaded from another source (a faster mirror) is the same as the file available for download at this website.

However, there is not really much additional security. A sufficiently skilled cracker can replace the file with a maliciously modified version and the hash with one that matches the modified file; or he can MITM the requests over the network and replace both the file requested with his own and the hash with his own.
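To make both points concrete, here is a minimal Python sketch. It assumes SHA-256 rather than the MD5/SHA1 hashes mentioned in the question, and the byte strings and the `site` dictionary are hypothetical stand-ins for a download page, not any real site's layout.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def passes_check(downloaded: bytes, published_hash: str) -> bool:
    """True when the download hashes to the value shown on the page."""
    return sha256_hex(downloaded) == published_hash


# Hypothetical stand-ins for the real file and an attacker's replacement.
original = b"genuine installer bytes"
malicious = b"installer with a backdoor"

# The site publishes the file and its hash side by side.
site = {"file": original, "hash": sha256_hex(original)}

# Case 1: one byte is damaged in transit; the check catches it.
corrupted = original[:-1] + b"?"
corruption_detected = not passes_check(corrupted, site["hash"])

# Case 2: the attacker swaps both the file and the hash on the same site
# (or rewrites both in transit); the check passes and reveals nothing.
site = {"file": malicious, "hash": sha256_hex(malicious)}
tampering_detected = not passes_check(site["file"], site["hash"])

print(corruption_detected, tampering_detected)  # prints: True False
```

The check is only as trustworthy as the channel the hash arrived through, which is why it helps against accidents but not against someone who controls that channel.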

yfeldblum
  • 1
    Good answer. Personally, I've never even thought of using the hash for security; I thought it was for error checking only! – Matthew Read Jan 17 '11 at 22:32
  • @MatthewRead, That doesn't make sense. They are made for security. – Pacerier May 23 '15 at 21:12
  • 4
    @Pacerier Hashing certainly didn't originate for security. If you're disagreeing with this answer and the premise of the question, better to address that than a 4+ year old offhand comment. – Matthew Read May 23 '15 at 22:12
  • @MatthewRead, The question was talking about SHA1 and MD5. These are called secure hashes which are fundamentally different from CRC checksums. – Pacerier May 23 '15 at 22:15
  • 1
    @Pacerier as the other answers, by security professionals, have made clear, an attacker can typically change the hashes also, and hashes are not designed to be used by themselves to sign things. Furthermore, MD5 was broken years ago. It is easy now to create two files with the same MD5 hash. SHA1 is also suspect, and does not meet modern requirements. But as noted, even switching to SHA-2 or SHA-3 by itself, even with https, would provide hardly any protection. – nealmcb Apr 06 '16 at 00:45
20

The benefit there is indeed limited. As you pointed out, if you can replace one thing on a site, you can probably replace both.

This does, however, have some benefits:

  • It allows other sites to host large files with verified integrity. With that I can grab the file from some random 3rd party who I have no reason to trust and still verify that it is a good file based on the source site (which may have limited bandwidth).
  • If your site is popular enough, it is likely that there are enough copies of the old hash to quickly confirm a compromise to the public, including via archive.org and various search engine caches.

While there is added security to be gained by signing files with a public/private key pair, the practical benefit for most applications is no greater with a key than without. If I post my key on the website and sign everything instead of posting the hash, an attacker can achieve the same effect by replacing the key as by replacing the hash. Truly large-scale projects that have independent distribution need this extra layer (Debian comes to mind), but I think that few others would benefit from it.

Jeff Ferland
13

For the hash to have some security-related value, the two following conditions must both be fulfilled:

  • the hash value must be distributed through a protocol which guarantees integrity (e.g. HTTPS);
  • the downloaded file must not be distributed through a protocol which guarantees integrity;

because, if the file is also served with HTTPS, then the hash tends to be pointless: SSL already ensures the integrity during the transfer.

The hash will not protect against an attacker who takes control of the server, because he could then modify the hash just as he could modify the file.

An example of the usefulness of the published hash is when you download the file at some date, and then want to check later on that the copy you have is correct, because you do not necessarily trust the integrity of your local storage (e.g. it was on a USB key which could have been temporarily purloined by a person with evil intent). Another example is when the download itself came from a p2p network, because such things are very efficient for distributing bulk software to a lot of clients (that's what Blizzard Downloader does): use the p2p to get the file, then get the small hash value from the main HTTPS site. A third example (which I professionally encounter) is when building a trusted system under heavy audit conditions (e.g. creating a new root CA): you want the auditor to be able to verify that the used software is genuine. If a hash file is distributed (through HTTPS) then the auditor just has to check it against a local archive; otherwise, the auditor must witness the whole download. Given auditors' hourly rates, the hash is preferable.
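For the "check it later" case, the verification step might look like the sketch below. It is only an illustration: the function names are made up for this example, SHA-256 is an assumption (use whatever algorithm the publisher lists), and the published value is presumed to have been copied from the HTTPS page by hand.

```python
import hashlib


def file_digest_hex(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large downloads need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path, published_hex, algorithm="sha256"):
    """Compare the local file's digest with a hex digest copied from the trusted page."""
    # Pages differ in case and often carry stray whitespace when copy-pasted.
    expected = published_hex.strip().lower()
    return file_digest_hex(path, algorithm) == expected
```

The same function works for a file pulled from a p2p network or for an archive handed to an auditor: only the small hash value has to come from the trusted channel.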

Thomas Pornin
7

So, if you're downloading a file from what is effectively an unauthenticated source, and validating it with a hash from the same source (or even another unauthenticated source), what is the real value of hashing the file?

In this situation, where a hash sits immediately beside a link to the file, the value is primarily in ensuring that the file isn't damaged or corrupted in transit.

From a security standpoint, you're right; this has very little value to prove that the file hasn't been tampered with, because anyone who could get in and upload a hacked file (or change the link to point to a hacked copy) could also replace the hash sitting alongside it. That's why it's almost never done this way when the objective is security.

Typically, when a hash digest is offered, it comes directly from the software producer. They offer software for download, but are unwilling to directly host the software on their own servers; they are a software company, not a hosting company, and don't have the bandwidth in and out of their own servers to allow thousands of concurrent broadband downloads. So, they rent space and bandwidth from a cloud provider to provide the same service.

Now, they don't control this cloud; it's a different system maintained by a different company, "my house, my rules". The software provider is worried about hacking and having their good name tarnished by a compromise of the hosting company, and for good reason; this is a common method of attack, and it's the software company's name on the software that allowed an attacker to get into a corporate user's network and cause havoc.

The solution is for the software producer to hash the files being offered for download on the hosting site, and present that hash from its own systems under its control. Now, you, the end user, can download the software from the big hosting site, and verify that what you got was what the software company put up there by going to the software company's site and comparing their listed file hash with one you compute from the downloaded file. This takes much less bandwidth for the software company than hosting the actual file for download. Now, there isn't a single point of vulnerability anymore; an attacker must hack both the cloud and the software producer's site in order to put a file in place that will pass the hash-checking. They can mess up people who don't bother to check the hashes (which is a lot of people) by just hacking the hosting site and replacing the file, or they can mess up people who do check hashes by hacking the software provider's site and changing the hashes so the real file's hash no longer matches, but they can only truly masquerade a hacked file as the software company's own by doing both, which is much more difficult.
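The four possible situations can be enumerated in a few lines of Python. This is a toy model under stated assumptions: SHA-256 stands in for whatever hash the producer publishes, and `GOOD`, `EVIL` and `outcome` are names invented for the illustration.

```python
import hashlib
from itertools import product


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-ins for the genuine release and a hacked copy.
GOOD = b"genuine release"
EVIL = b"trojaned release"


def outcome(host_hacked: bool, producer_hacked: bool) -> str:
    """What a hash-checking user experiences for each combination of compromises."""
    served_file = EVIL if host_hacked else GOOD              # from the hosting site
    listed_hash = digest(EVIL if producer_hacked else GOOD)  # from the producer's site
    if digest(served_file) != listed_hash:
        return "mismatch detected"
    return "malware accepted" if served_file == EVIL else "genuine file accepted"


for host_hacked, producer_hacked in product([False, True], repeat=2):
    print(host_hacked, producer_hacked, outcome(host_hacked, producer_hacked))
```

Only the case where both sites are compromised ends in "malware accepted"; a compromise of either one alone shows up as a mismatch for anyone who actually performs the check.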

KeithS
7

You're right that just having a raw hash is of limited benefit, since you also need to securely distribute the hash, and most people don't know how to properly check it or navigate the many possible issues.

The right way to get good security is to use public-key technology to sign the hashes, and have it seamlessly integrated into the whole software distribution scheme. It is still necessary to securely distribute the public key, but this can be done once, e.g. when the operating system is installed. I assume that is essentially what is done with Windows and MacOS for OS updates and a few major packages like Office.

Best of all is when nearly all the software you need is covered by the same standard keys. That is essentially what happens with most open source software distributions, like Debian, Ubuntu, Red Hat, Suse, etc. They securely distribute literally tens of thousands of packages, all automatically signed with keys that are managed as part of the distribution, and thus highly secure. And it mostly happens without anyone needing to do any manual checks.

nealmcb
  • There's no point in this if the page was already HTTPS right? Then a simple SHA2 would suffice. – Pacerier May 23 '15 at 21:38
  • @Pacerier HTTPS only protects data in transit between your client and the site. It doesn't tell you anything about the files that are served by the site. Since an attacker can often get files on to a site, via a wide variety of methods (up-front, surreptitious or illegal), you really need file-level signatures. – nealmcb May 23 '15 at 22:47
  • When you say "an attacker can often get files on to a site", do you mean they can modify the binary? If they are able to modify the binary, couldn't they also modify the checksum as well? – Pacerier May 24 '15 at 13:08
  • @Pacerier Yes, an attacker can modify the binary, and they can modify a simple checksum. That is the whole point of this question - simple checksums provide little additional security. But public key signatures provide lots of additional security, since if someone changes the signature, it won't be valid when checked. And they would have to have access to the signer's private key (which is hopefully offline and may be secured in a hardware security module) to create a valid signature for a modified binary. – nealmcb May 24 '15 at 17:13
5

Generally speaking, yes - a hash from an HTTP site provides little-to-no assurance that the data came from a trusted source.

But it depends on your definition of security, i.e. your threat model. One aspect of security is data integrity, and checking a hash will often help you avoid wasting time on a bad CDROM or a corrupted download.

It is much better to get software from a package system that provides good authentication all the way from the programmer, through version control, to packaging and distribution.

nealmcb
4

For a home user it is usually comfort 'enough' that the file is correct (by which I mean the download worked and there is no corruption), although from a security perspective an attacker could replace the hashes just as easily as the files if both are stored in the same location.

For a corporate security person, or for something a little more sensitive, you'd really want to validate the hash, probably via an out-of-band mechanism.

Rory Alsop
3

In the situation you described, the only use for a signature is really just to make sure the file isn't corrupt.

Steve
  • But note that a hash is not in any way a signature. It is a one-way function, a secure checksum, but contains no evidence that anyone is vouching for anything. – nealmcb May 26 '11 at 14:05
2

There are a few cases where checking the hash of a file could provide a security benefit, even if the hash wasn't downloaded through a secure connection:

  • If only your connection is being tampered with, you could visit the web page containing the hash using another connection (e.g. a secure VPN). If the page, when viewed through that other connection, displays the same hash it is far less likely to have been tampered with.
    Of course, you could achieve the same result by downloading the same file twice through different connections, but this is much faster.
  • If the attacker, whether human or software, did not care to calculate the hash of the malicious file and replace the original hash with it. Be careful though, relying on attackers being lazy is not the best defense.
  • If the file is downloaded from a different site than the hash and that site has been compromised but the site providing the hash was not. For example, the file may be distributed through a mirror or CDN.
  • If the hash provided is digitally signed with a trusted key. For example hashes signed with a trusted PGP signature.

Still, when providing hashes through an unsecured connection, the main advantage is that you can verify that the file transmitted wasn't corrupted in transit. Nowadays this doesn't happen very often, but it does occur. Burning a corrupt Linux ISO can give you a corrupted install that you may only notice when it's already too late. Flashing your BIOS with a corrupted download could be even worse.

One other thing to consider is that, whilst providing hashes may indeed give a false sense of security, it is very likely that the person downloading the file would have downloaded it anyway, even if no hashes were present. In that case, the small security benefit provided by hashes may be better than nothing at all.

user2428118
  • In your third bullet point, you mentioned that if the hash is hosted on a different site then it would be OK to send the hash to the user over an unsecured connection. But can't an attacker change the hash that is being sent to the user? – Rads Oct 26 '16 at 07:49
  • 1
    @Rads The scenario I'm thinking of for #3 is not an active man-in-the-middle attack against a user, but a third-party website hosting the software itself being compromised and the website with information about the software (including a hash) not. For example, in August this year open-source software site [FossHub](http://www.ghacks.net/2016/08/03/attention-fosshub-downloads-compromised/) got hacked and some of their downloads replaced with malware. (Continued in next comment.) – user2428118 Oct 26 '16 at 08:13
  • 1
    FossHub is used by many open-source projects to handle their downloads, most of which have their own website (e.g. [qBittorrent](http://www.qbittorrent.org/download.php) and many other projects). By checking the setup file downloaded from FossHub against the hashes provided at the qBittorrent website, one could have discovered the file had been tampered with. (qBittorrent was not one of the files that got replaced with malware in this particular case, but you get the idea. I picked qBittorrent because they actually provide hashes on their own website.) – user2428118 Oct 26 '16 at 08:13
0

@Iszi here's a short story... (ok, maybe not "short"... sorry for that :P)

Let's say you want to download VLC. For Windows. Latest version (ver. 1.1.5). OK?

Your first place to go would be http://www.videolan.org (official site). But you can find and get the "same" prog from 5.780.000 sites. Right? (google "vlc download 1.1.5").

The official site says that the MD5 of the file is "988bc05f43e0790c6c0fd67118821d42" (see link). And you can get this prog (ver. 1.1.5) from the official videolan.org WEB server (NO HTTPS). Or click the link, get redirected to Sourceforge and get it there. And again with NO HTTPS. But Sourceforge is a big name. Trustworthy. Right? And guess what. Your uncle who has VLC sends you a rapidshare link via email to download it. And your friend from work too.

So you download it from these "trustworthy sources".

Friends, sites, versions, uncles. You trust them all. Right ? I don't think so. At least you shouldn't.

There is one (and only one) way to check that what you got, is the original file. Unaltered, unmodified. Untouched. And that is to compare the hash of it. But with what? With the hash from the official source.

No HTTP(S), no digital signatures, no "Secure" or "Trusted" server. Nothing. You don't need any of these. Data integrity is your friend.

It's the same with PGP/GnuPG. You can detect whether a message has been altered since it was completed.

@Justice said that

A sufficiently skilled cracker can replace the file with a maliciously modified version and the hash with one that matches the modified file

Sure, MD5 Collisions have been found to exist. And SHA-1 collisions also exist. But to quote Wikipedia:

Cryptographic hash functions in general use today are designed to be collision resistant, but only very few of them are absolutely so. MD5 and SHA-1 in particular both have published techniques more efficient than brute force for finding collisions. However, some compression functions have a proof that finding collision is at least as difficult as some hard mathematical problem (such as integer factorization or discrete logarithm). Those functions are called provably secure.

And I don't think that a "sufficiently skilled cracker" could take the original file/binary/whatever and produce a "bad version" with the same MD5/SHA-1 as the original. Let alone make it look the same (pic), have the same file size, or still run or make sense (text). Not even close. He can make it undetectable by antivirus (if malicious). That's a different story. But there are some really bad cases of collisions reported.

So, download from ANY (good) source you like, but compare hash from the original source. And there is always one original source.

labmice
  • 3
    You seem to have @Justice's comment confused. They were not saying that Mallory could alter the VLC Player in such a way that the new file (we'll call this VLC-M) matches the hash of the old one. They were (and I am) saying that, because there are no digital signatures involved on the sites or files in question, Mallory could hack VLC's website, post VLC-M in place of VLC, and change the posted hash from that of VLC to one that matches VLC-M - and the user would have no way of knowing this happened. – Iszi Jan 19 '11 at 17:26
  • Well yes. In that case all bets are off. But... 1) Zbot Authors Forge Kaspersky Digital Signature (http://news.softpedia.com/news/Zbot-Authors-Forge-Kaspersky-Digital-Signature-150817.shtml) 2) Infected File Signed by Symantec Outlines Industry Problem (http://news.softpedia.com/news/Infected-File-Signed-by-Symantec-Outlines-Industry-Problem-152120.shtml) 3) New Stuxnet-Related Malware Signed Using Certificate from JMicron (http://news.softpedia.com/news/New-Stuxnet-Related-Malware-Signed-Using-Certificate-from-JMicron-148213.shtml) Are we talking about the same thing? – labmice Jan 20 '11 at 06:36
  • I think we're still not quite speaking the same language here. I'm talking about file *hashes*, like your basic MD5 and SHA1 - in *absence* of digital signatures. Interesting news links, though. – Iszi Jan 23 '11 at 22:19
  • Good point, @iszi. @labmice, your news links are good examples of how 1) a digitally signed hash still needs to actually be checked against the software in question, 2) sometimes bad stuff gets signed, so look for the latest copy and 3) signing keys can also be stolen. But thankfully, most signing keys are far better-protected than most sites, and when compared with the huge number of compromised web sites out there (where unsigned hashes could trivially be changed), digital signatures are a huge win. – nealmcb May 26 '11 at 13:56
0

If you really think about it, having multiple sites host your downloadable content together with the hashes would stop most run-of-the-mill replacement attacks. Again, the assumed threat model is one where the attacker has to replace the file in situ as opposed to in transit. Even with a single source, the copies would stop anyone from replacing both the file and the hash.

And files can get corrupted while downloading.

mincewind