Checksums vs Hashes vs Signatures
You mention checksums, PGP, and SHA in your question title, but these are all different things.
What is a checksum?
A checksum simply verifies with a high degree of confidence that there was no corruption causing a copied file to differ from the original (for varying definitions of "high"). In general a checksum provides no guarantee that intentional modifications weren't made, and in many cases it is trivial to change the file while still having the same checksum. Examples of checksums are CRCs, Adler-32, XOR (parity byte(s)).
What is a cryptographic hash?
Cryptographic hashes provide additional properties over simple checksums (all cryptographic hashes can be used as checksums, but not all checksums are cryptographic hashes).
Cryptographic hashes (that aren't broken or weak) provide collision and preimage resistance. Collision resistance means that it isn't feasible to create two files that have the same hash, and preimage resistance means that it isn't feasible to create a file with the same hash as a specific target file.
MD5 and SHA1 are both broken in regard to collisions, but are safe against preimage attacks (due to the birthday paradox collisions are much easier to generate). SHA256 is commonly used today, and is safe against both.
Using a cryptographic hash to verify integrity
If you plan to use a hash to verify a file, you must obtain the hash from a separate trusted source. Retrieving the hash from the same site you're downloading the files from doesn't guarantee anything. If an attacker is able to modify files on that site or intercept and modify your connection, they can simply substitute the files for malicious versions and change the hashes to match.
Using a hash that isn't collision resistant may be problematic if your adversary can modify the legitimate file (for example, contributing a seemingly innocent bug fix). They may be able to create an innocent change in the original that causes it to have the same hash as a malicious file, which they could then send you.
The best example of where it makes sense to verify a hash is when retrieving the hash from the software's trusted website (using HTTPS of course), and using it to verify files downloaded from an untrusted mirror.
How to calculate a hash for a file
On Linux you can use the md5sum
, sha1sum
, sha256sum
, etc utilities. Connor J's answer gives examples for Windows.
What is a signature?
Unlike checksums or hashes, a signature involves a secret. This is important, because while the hash for a file can be calculated by anyone, a signature can only be calculated by someone who has the secret.
Signatures use asymmetric cryptography, so there is a public key and a private key. A signature created with the private key can be verified by the public key, but the public key can't be used to create signatures. This way if I sign something with my key, you can know for sure it was me.
Of course, now the problem is how to make sure you use the right public key to verify the signature. Key distribution is a difficult problem, and in some cases you're right back where you were with hashes, you still have to get it from a separate trusted source. But as this answer explains, you may not even need to worry about it. If you're installing software through a package manager or using signed executables, signature verification is probably automatically handled for you using preinstalled public keys (i.e. key distribution is handled by implied trust in the installation media and whoever did the installation).
Related Questions