
When extracting files from an encrypted zip archive, the user is asked to give a password in order to read the original file.

How does an encrypted ZIP detect when the user has given the correct password?

Obviously it does not connect to some backend service, and it doesn't contain the actual password to compare against. So how exactly does it check? Is there some hash of the original password included in the archive? Is it easy to find this hash?

Selcuk
CodyBugstein

2 Answers


The Cyclic Redundancy Check (CRC) field is used to determine whether or not the file is decrypted correctly. Quoted from the original ZIP format specification:

After the header is decrypted, the last 1 or 2 bytes in Buffer SHOULD be the high-order word/byte of the CRC for the file being decrypted, stored in Intel low-byte/high-byte order. Versions of PKZIP prior to 2.0 used a 2 byte CRC check; a 1 byte CRC check is used on versions after 2.0. This can be used to test if the password supplied is correct or not.
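The check described in the quote can be sketched in C as follows. This is a minimal illustration of the traditional PKWARE stream cipher only, not code from any particular ZIP tool: the names `zc_keys`, `zc_init`, `zc_update`, `zc_stream_byte`, `zc_encrypt_header` and `zc_password_plausible` are hypothetical, and the caller is assumed to have already read the 12-byte encryption header and the expected check byte (normally the high-order byte of the stored CRC) from the archive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Three 32-bit keys that make up the cipher state (hypothetical type name). */
typedef struct { uint32_t k0, k1, k2; } zc_keys;

/* One byte of the standard reflected CRC-32 (polynomial 0xEDB88320), bitwise. */
static uint32_t crc32_step(uint32_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int i = 0; i < 8; i++)
        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    return crc;
}

/* Mix one plaintext byte into the key state. */
static void zc_update(zc_keys *k, uint8_t plain)
{
    k->k0 = crc32_step(k->k0, plain);
    k->k1 = (k->k1 + (k->k0 & 0xffu)) * 134775813u + 1u;
    k->k2 = crc32_step(k->k2, (uint8_t)(k->k1 >> 24));
}

/* Next keystream byte, derived from the low 16 bits of k2. */
static uint8_t zc_stream_byte(const zc_keys *k)
{
    uint32_t t = (k->k2 | 2u) & 0xffffu;
    return (uint8_t)((t * (t ^ 1u)) >> 8);
}

/* Start from the fixed initial keys and feed in the password bytes. */
static void zc_init(zc_keys *k, const char *password)
{
    k->k0 = 0x12345678u;
    k->k1 = 0x23456789u;
    k->k2 = 0x34567890u;
    for (; *password; password++)
        zc_update(k, (uint8_t)*password);
}

/* Encrypt a 12-byte header; included only so the check can be exercised. */
void zc_encrypt_header(const char *password, const uint8_t plain[12], uint8_t out[12])
{
    zc_keys k;
    zc_init(&k, password);
    for (size_t i = 0; i < 12; i++) {
        out[i] = plain[i] ^ zc_stream_byte(&k);
        zc_update(&k, plain[i]);
    }
}

/* Decrypt the 12-byte header and compare its last byte with the expected
 * check byte. Returns 1 if the password passes the 1-byte check, else 0.
 * A return of 1 does not prove the password is right. */
int zc_password_plausible(const char *password, const uint8_t enc_header[12],
                          uint8_t expected_check)
{
    zc_keys k;
    uint8_t plain = 0;
    assert(password != NULL && enc_header != NULL);
    zc_init(&k, password);
    for (size_t i = 0; i < 12; i++) {
        plain = enc_header[i] ^ zc_stream_byte(&k);
        zc_update(&k, plain);
    }
    return plain == expected_check;
}
```

A caller would pass the candidate password, the first 12 bytes of the encrypted file data, and the check byte taken from the file's header fields. Because only a single byte is compared, a wrong password can still pass by chance.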

Update: As can be seen from Info-ZIP's unzip source code, the CRC value is used to check if the password is correct:

https://github.com/LuaDist/unzip/blob/master/crypt.c#L617

```c
#ifdef ZIP10 /* check two bytes */
    c = hh[RAND_HEAD_LEN-2], b = hh[RAND_HEAD_LEN-1];
    Trace((stdout,
      "  (c | (b<<8)) = %04x  (crc >> 16) = %04x  lrec.time = %04x\n",
      (ush)(c | (b<<8)), (ush)(GLOBAL(lrec.crc32) >> 16),
      ((ush)GLOBAL(lrec.last_mod_dos_datetime) & 0xffff))));
    if ((ush)(c | (b<<8)) != (GLOBAL(pInfo->ExtLocHdr) ?
                           ((ush)GLOBAL(lrec.last_mod_dos_datetime) & 0xffff) :
                           (ush)(GLOBAL(lrec.crc32) >> 16)))
        return -1;  /* bad */
#else
    b = hh[RAND_HEAD_LEN-1];
    Trace((stdout, "  b = %02x  (crc >> 24) = %02x  (lrec.time >> 8) = %02x\n",
      b, (ush)(GLOBAL(lrec.crc32) >> 24),
      ((ush)GLOBAL(lrec.last_mod_dos_datetime) >> 8) & 0xff));
    if (b != (GLOBAL(pInfo->ExtLocHdr) ?
        ((ush)GLOBAL(lrec.last_mod_dos_datetime) >> 8) & 0xff :
        (ush)(GLOBAL(lrec.crc32) >> 24)))
        return -1;  /* bad */
#endif
    /* password OK:  decrypt current buffer contents before leaving */
    for (n = (long)GLOBAL(incnt) > GLOBAL(csize) ?
             (int)GLOBAL(csize) : GLOBAL(incnt),
         p = GLOBAL(inptr); n--; p++)
        zdecode(*p);
    return 0;       /* OK */
```
Selcuk
  • With a 1-byte CRC especially (and also with a 2-byte) there will be a high rate of false positives. That's probably for the best, to prevent use of the CRC as an oracle during brute force attacks (although validation of internal structure can also rule out a large fraction of attempts, unless the true content has intentionally inconsistent structure). – Ben Voigt Dec 12 '18 at 04:06
  • @BenVoigt I agree, the short CRC actually helps with the security. The fact that the file names are not encrypted will help the attacker to validate the internal structure though. That being said, ZIP encryption is mostly a convenient method for confidentiality, not for security or integrity. – Selcuk Dec 12 '18 at 04:10
  • filenames will be somewhat helpful, but I was actually thinking of other information in the file catalog (particularly that offsets and sizes shouldn't extend beyond the length, etc) – Ben Voigt Dec 12 '18 at 04:11
  • you can zip twice to hide filenames, adding a password to the "outer zip". It's not just CRC methinks; if the pw is wrong, it will be "corrupt" or non-parse-able. – dandavis Dec 12 '18 at 17:36
  • @dandavis Define "corrupt". Any byte sequence is a perfectly valid file. And using zip twice makes it even more vulnerable as the attacker now knows that the inner file must start with the bytes `PK` as part of zip header. – Selcuk Dec 12 '18 at 22:49
  • corrupt as in; (eg.) deflate tries to unpack it and says "WTF, this ain't deflated!!!" – dandavis Dec 12 '18 at 22:51
  • @dandavis I see, but compression (deflate, implode, etc) is optional in zip format and the files can also be stored uncompressed. The CRC is a generic mechanism to detect password correctness (to some degree). – Selcuk Dec 12 '18 at 22:55
  • @Selcuk: isn't there a 1/256 chance that the CRC will be accidentally correct? – dandavis Dec 12 '18 at 22:55
  • @dandavis Yes, that's why I said "to some degree" and it is also mentioned in the first comment posted by Ben Voigt. The CRC is just a convenient way to avoid generating corrupt files. It does not aim to provide security or integrity. – Selcuk Dec 12 '18 at 23:00
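
The false-positive rate discussed in the comments above can be written out as a rough estimate. It assumes that a wrong password produces an effectively uniformly distributed check value, which is a simplifying assumption rather than something stated in the specification; N is a hypothetical number of wrong guesses.

```latex
\begin{align*}
P(\text{wrong password passes, 1-byte check}) &\approx 2^{-8} = \tfrac{1}{256} \approx 0.39\% \\
P(\text{wrong password passes, 2-byte check}) &\approx 2^{-16} = \tfrac{1}{65536} \approx 0.0015\% \\
E[\text{false positives among } N \text{ wrong guesses, 1-byte check}] &\approx \tfrac{N}{256},
  \quad \text{e.g. } N = 10^{6} \Rightarrow \approx 3906
\end{align*}
```

So for the 1-byte check, a brute-force run still has to test each surviving candidate some other way, for example against the full CRC of the decrypted data.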

How does an encrypted ZIP detect when the user has given the correct password?

Obviously it does not connect with some backend service, and it doesn't contain the actual password to compare against. So how exactly does it check?

Short answer: most zip extraction programs probably don't check the password. They simply attempt to decrypt the data, and possibly check if it looks like real data. Selcuk's answer indicates some programs will probably use a CRC included in the file to get some level of confidence that the data decrypted correctly, but from the "should" and "can be used" in the text it sounds like this is optional (although encouraged) for applications using zip format.

Incidentally this also may allow (partial) data recovery in case of file corruption, as an application could simply ignore the CRC check and extract what it can from the encrypted zip, relying on the user to find the bad files (which could be indistinguishable from a wrong password depending on how the rest of the spec is written).

Ben
  • How can a program check if something "looks like real data"? – CodyBugstein Dec 14 '18 at 16:36
  • Easiest is some checksum in the file. But not required. E.g.: MS Office documents use a zip format internally. I don't know enough about the MS Office format to know if they ever use the zip encryption features, but if they did, they *could* check that the resulting data after decrypting with a given password is validly formatted for an office document. Other files have similarly strict formatting you can check for. For plaintext format of any kind, for a suitably large file, if there are no unprintable or invalid characters (for the assumed encoding), it's a good chance it's real data. Etc. – Ben Dec 14 '18 at 22:16
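
The two kinds of "looks like real data" checks mentioned in this thread (a known leading signature, and plain text with no unprintable characters) could look roughly like this in C. Both functions are hypothetical helpers written for illustration, not part of any ZIP library, and the text check assumes plain ASCII rather than handling other encodings.

```c
#include <stddef.h>

/* Returns 1 if the buffer starts with the ZIP local file header
 * signature "PK\x03\x04", as a nested archive or a ZIP-based document
 * normally would. */
int looks_like_zip(const unsigned char *buf, size_t len)
{
    return len >= 4 && buf[0] == 'P' && buf[1] == 'K'
        && buf[2] == 0x03 && buf[3] == 0x04;
}

/* Returns 1 if the buffer is non-empty and every byte is printable ASCII
 * (0x20..0x7e) or a tab, line feed, or carriage return. */
int looks_like_ascii_text(const unsigned char *buf, size_t len)
{
    if (len == 0)
        return 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];
        int printable = (c >= 0x20 && c <= 0x7e);
        int whitespace = (c == '\t' || c == '\n' || c == '\r');
        if (!printable && !whitespace)
            return 0;
    }
    return 1;
}
```

Neither check is conclusive on its own. A short buffer of random bytes can pass the text check by luck, so the heuristic is more convincing on larger inputs, as the comment above notes.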