2

I'm considering encrypting a series of TAR archives using AES. Something I'm concerned about is the TAR format being quite predictable regarding field content and block sizes:

Each file object includes any file data, and is preceded by a 512-byte header record. The file data is written unaltered except that its length is rounded up to a multiple of 512 bytes. The original tar implementation did not care about the contents of the padding bytes, and left the buffer data unaltered, but most modern tar implementations fill the extra space with zeros.
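To make the predictability concrete, here is a minimal sketch using Python's standard `tarfile` module. The file names and contents are made up for illustration, and the helper `build_tar` is hypothetical; the sketch only shows the layout the quoted passage describes, namely a 512-byte header followed by data padded with zeros to a 512-byte boundary.

```python
import io
import tarfile

BLOCK = 512  # tar block size in bytes


def build_tar(files):
    """Build an uncompressed ustar archive in memory from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Illustrative contents only.
archive = build_tar({"a.txt": b"hello", "b.txt": b"x" * 700})

header = archive[:BLOCK]
name_field = header[:100].rstrip(b"\0")   # file name, NUL-padded to 100 bytes
magic_field = header[257:263]             # format marker at a fixed offset
data_block = archive[BLOCK:2 * BLOCK]     # the single data block of a.txt
padding = data_block[5:]                  # everything after the 5-byte payload

print(name_field, magic_field)
print("padding is all zeros:", padding == b"\0" * len(padding))
print("archive length is a multiple of 512:", len(archive) % BLOCK == 0)
```

An attacker who knows the archive is a tarball therefore knows a good deal of plaintext at fixed offsets, which is exactly the known-plaintext setting the question asks about.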

Should an attacker get hold of a big tarball with hundreds of files, is the AES encryption compromised?

Andreas

2 Answers

5

First, AES is not something that you want to use by itself. AES is a pseudorandom permutation family, which is roughly academic cryptography jargon for ‘here be dragons: do not enter unless you are a wizard who can harness them’.

You want to use an authenticated cipher like AES-GCM or NaCl crypto_secretbox_xsalsa20poly1305, in which the dragons—along with other useful things—have already been harnessed into a useful security contract: if you pick a key uniformly at random and assign to each message a unique message number, for up to a gigabyte of data per message and about a terabyte of data total under a single key, then AES-GCM prevents an adversary who can intercept messages in transit from (a) learning what's in them in any more detail than their length, and (b) forging them.

(NaCl crypto_secretbox_xsalsa20poly1305 has a slightly better security contract—it's safer for much larger volumes of data, you can safely pick the message numbers uniformly at random, and it's designed to invite resistance to side channel attacks in software implementations, unlike AES-GCM.)

This security contract holds even if the adversary can choose the patterns of data in the message. The modern standard for secrecy of a cipher (IND-CPA, or indistinguishability under chosen-plaintext attack, which authenticated encryption implies) requires that the adversary be unable to find any pair of messages whose ciphertexts they can tell apart with more than negligible probability, even if given arbitrarily many other plaintext/ciphertext pairs of their choice.
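One common textbook way to write this down (a sketch in standard notation, not taken from the answer itself): the adversary $\mathcal{A}$ picks two equal-length messages $m_0, m_1$, receives an encryption of $m_b$ for a secret uniformly random bit $b$, and outputs a guess $b'$. Its advantage is

$$\mathrm{Adv}^{\text{ind-cpa}}(\mathcal{A}) = \left|\Pr[b' = b] - \tfrac{1}{2}\right|$$

and the cipher is called IND-CPA secure if this advantage is negligible for every efficient adversary, even one with access to an encryption oracle. A tar archive's fixed headers and zero padding are just one particular choice of message, so under this definition they give the adversary nothing extra.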

Squeamish Ossifrage
  • As I understand it, authenticated encryption refuses to decrypt invalid data (data which hasn't been encrypted with the same key), to prevent attackers from decrypting carefully constructed data and analysing the result to help figure out the key. That seems easy enough to understand without having to dumb it down to dragons. Although is the sales pitch about authenticated encryption necessary? I assume I'd be hearing a lot more about it if it were either (a) really important for the average person or (b) not already used basically everywhere. – NotThatGuy Nov 09 '19 at 22:23
  • Related: [How secure is AES-256?](https://crypto.stackexchange.com/questions/2251/how-secure-is-aes-256) – NotThatGuy Nov 09 '19 at 22:30
  • @NotThatGuy Authenticated encryption doesn't just keep the message secret; it also detects attempted forgeries and prevents you from acting on malicious adversary-controlled messages—which might have led you to act in a way that leaks secrets anyway. It took a long time for major protocols like TLS and SSH to catch up, but these days they use authenticated encryption. If you are reaching for the letters A-E-S yourself and not for an authenticated cipher, you are probably about to shoot yourself in the foot [like OpenPGP did](https://crypto.stackexchange.com/a/59241). – Squeamish Ossifrage Nov 09 '19 at 22:49
  • @NotThatGuy I cannot overstate enough how critical it is for application engineers to reach for an authenticated cipher as a unit—like AES-GCM or NaCl crypto_secretbox_xsalsa20poly1305—rather than trying to cobble something together out of using AES directly or juggling the acronym soup of ‘block cipher modes of operation’, a conceptual mistake that is a relic of the dark ages of cryptography engineering from decades past. – Squeamish Ossifrage Nov 09 '19 at 22:53
  • @NotThatGuy The question you linked is about a primitive component that you shouldn't be touching directly, like talking about a transistor in a question of programming language design. Are transistors and their reliability important to computing? Sure, but a discussion of physics of transistors is not really helpful to a web developer trying to make their application reliable on a MySQL database back end—much more important is higher-level ideas like ACID and databases that actually provide them like PGSQL. – Squeamish Ossifrage Nov 09 '19 at 22:58
  • "The question you linked is about a primitive component that you shouldn't be touching directly" - that question doesn't seem all that different from this question in that regard (notwithstanding this *answer* focusing on the primitive component). This question is about whether encrypting tar compromises AES, which this is, at best, a *very* indirect and tangential answer to (where is the adversary when encrypting something *yourself*?) and doesn't do anything to convince me that authenticated encryption would make any difference with tar more than it would make when encrypting anything else. – NotThatGuy Nov 09 '19 at 23:13
  • Anyone who tries to cobble together an authenticated cipher by themselves will probably end up with something not much more secure than someone trying to "cobble something together out of using AES directly" - one way or the other, the better option is just to use a well-regarded established encryption implementation. A useful note, definitely, but that doesn't address what the question is asking. – NotThatGuy Nov 09 '19 at 23:16
  • @NotThatGuy The question was whether the pattern of data in a tar file enables an adversary to break the cryptography. I answered that, _if_ you are using an authenticated cipher, then no, there's _no_ pattern—not even an adversary-chosen pattern—that gives them any such advantage. (All bets are off if you jury-rig something out of AES that isn't actually an authenticated cipher, of course!) I don't know where you get the idea of cobbling together an authenticated cipher all by oneself—I specifically named two off-the-shelf authenticated ciphers, one of which even has the letters A-E-S in it. – Squeamish Ossifrage Nov 09 '19 at 23:41
2

This is known as a known-plaintext attack (KPA), and secure ciphers are designed to resist it. AES has not been proven secure against KPA, but it is conjectured to be.

There is one problem here: multi-target attacks, which apply to AES and in fact to any block cipher. If you use a different key for each archive, some of those keys can be found faster than by brute-forcing a single key. The expected cost of breaking one of t AES-128 targets is 2^128/t. For a billion targets (about 2^30), the cost would be below 2^100 and the time would be below 2^70.

Considering that the collective work of Bitcoin miners reached ≈2^92 SHA-256 hashes per year as of 6 August 2019, this could be a serious threat when there are many targets.

If you must use AES, then use AES-256.
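The arithmetic above can be checked with a short Python sketch. The function name is hypothetical, and treating one AES key trial as costing about the same as one SHA-256 hash is a rough assumption made only to get an order-of-magnitude comparison with the Bitcoin figure.

```python
import math


def log2_multi_target_cost(key_bits, targets):
    """log2 of the expected number of key trials needed to hit any one of
    `targets` independent keys, i.e. log2(2**key_bits / targets)."""
    return key_bits - math.log2(targets)


targets = 10**9                      # "a billion targets", a little under 2^30
bitcoin_log2_hashes_per_year = 92    # figure quoted in the answer (August 2019)

aes128 = log2_multi_target_cost(128, targets)   # about 98.1, below 100
aes256 = log2_multi_target_cost(256, targets)   # about 226.1

# Years of the whole network's 2019 hash rate, under the rough assumption
# that one AES trial costs about one SHA-256 hash.
years_aes128 = 2 ** (aes128 - bitcoin_log2_hashes_per_year)

print(f"AES-128, 1e9 targets: about 2^{aes128:.1f} trials")
print(f"AES-256, 1e9 targets: about 2^{aes256:.1f} trials")
print(f"AES-128 at the 2019 Bitcoin rate: about {years_aes128:.0f} years")
```

Under these assumptions the AES-128 multi-target search lands within decades of 2019-scale Bitcoin effort, while AES-256 stays far out of reach even with a billion targets.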

kelalaka