2

Reading this question about extracting PDF password hashes to feed into John the Ripper has me wondering: why are password hashes in a PDF in the first place? Shouldn't it use the supplied password to derive a symmetric key, and then decrypt the contents of the PDF?


UPDATE: this question explains that PDF encryption uses a symmetric key, as I expected. My question still stands: what is pdf2john.pl doing to extract a password hash from a PDF file?

Mike Ounsworth

1 Answer

2

My best guess is that the file stores the hash so that a reader can check the password for correctness before trying to decrypt the file contents (which might be very long). Hopefully the contents also carry a MAC or some other integrity / authentication tag, so the password can be verified after decryption as well, but on a huge file (as some PDFs are) that would be a lot of work just to tell the user "whoops, wrong password".
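To make the idea concrete, here is a minimal sketch of a stored password verifier. It is not the actual PDF algorithm: the PBKDF2 parameters, the `b"password-check"` label, and all function names are assumptions chosen for illustration. The point is only that a short value derived from the key lets a reader reject a wrong password without touching the encrypted contents, and that this same value is what a cracking tool can test guesses against.

```python
import hashlib
import hmac
import os


def derive_key(password: str, salt: bytes) -> bytes:
    # Illustrative key derivation; a real format defines its own scheme.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def make_verifier(key: bytes) -> bytes:
    # Short value stored in the file next to the ciphertext (hypothetical label).
    return hmac.new(key, b"password-check", hashlib.sha256).digest()


def password_is_correct(password: str, salt: bytes, stored_verifier: bytes) -> bool:
    # Cheap check: no need to decrypt (or even read) the file contents.
    candidate = make_verifier(derive_key(password, salt))
    return hmac.compare_digest(candidate, stored_verifier)


# What the writer of the file would store alongside the encrypted data.
salt = os.urandom(16)
stored = make_verifier(derive_key("correct horse", salt))

print(password_is_correct("correct horse", salt, stored))
print(password_is_correct("wrong guess", salt, stored))
```

Anything that can read the salt and the stored verifier out of the file can run the same check offline, once per guess.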

If the attacker knows part of the plaintext (or at least knows what it should look like, such as that an embedded image should have a recognizable image file header), that can be used as a "crib" to check the password without either resorting to the hash or trying to decrypt the whole file. However, that doesn't work very well for the general case.
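A crib check can be sketched the same way. Everything here is an assumption made for illustration: a toy keystream built from SHA-256 stands in for the real cipher, the key is a bare hash of the password, and the encrypted stream is assumed to be a PNG image. Only the first few bytes are decrypted and compared against the known file header.

```python
import hashlib

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # the 8-byte PNG file signature


def key_from_password(password: str) -> bytes:
    # Toy key derivation, for illustration only.
    return hashlib.sha256(password.encode("utf-8")).digest()


def keystream(key: bytes, length: int) -> bytes:
    # Toy stream cipher: SHA-256 of key plus a counter. Not a real cipher.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_crypt(key: bytes, data: bytes) -> bytes:
    # XOR with the keystream; the same call encrypts and decrypts.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def crib_matches(password: str, ciphertext: bytes) -> bool:
    # Decrypt only the first 8 bytes and compare them with the expected header.
    prefix = ciphertext[: len(PNG_MAGIC)]
    return xor_crypt(key_from_password(password), prefix) == PNG_MAGIC


# An "encrypted image" whose plaintext begins with the PNG signature.
plaintext = PNG_MAGIC + b"\x00" * 1024
ciphertext = xor_crypt(key_from_password("hunter2"), plaintext)

print(crib_matches("hunter2", ciphertext))
print(crib_matches("hunter3", ciphertext))
```

The limitation is visible in the sketch: it only works because the attacker already knows what the first bytes of the plaintext should be.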

CBHacking