23

According to Wikipedia, the initialization vector (IV) does not have to be secret when using the CBC mode of operation. Here is the schema of CBC encryption (also from Wikipedia):

[Figure: CBC mode encryption diagram, from Wikipedia]

What if I encrypt a plaintext file, where the first block has a known, standardized structure, such as a header?

Let's imagine the following scenario:

I encrypt file.pnm using AES-CBC. The pnm file has a known header structure, such as:

P6
1200 800
255

Moreover, the dimensions (1200 x 800) and color mode (P6) can be guessed from the encrypted file size.

If both IV and the first block of plain text are known, doesn't this compromise the whole CBC chain?

Peter Mortensen
Martin Vegter
  • It's a problem if the adversary knows the IV before you encrypt the data. But once the data has been encrypted, there is no need to keep the IV secret. – kasperd May 04 '16 at 17:32
  • 2
    This question seems to be purely about cryptographic theory (as opposed to *using* cryptography to implement a security system, which would be on topic here). While it already has received some good answers here (in part due to the considerable community overlap between crypto and security), perhaps it would be better migrated to [crypto.se]. – Ilmari Karonen May 04 '16 at 18:52
  • 3
    The easiest way to see why the IV does not need to be secret is to compare the first block to the second block. The ciphertext clearly does not need to be secret. And the IV serves as the ciphertext input for the first block. If the first block were not secure with a known IV, the second block would not be secure with known ciphertext, and ciphertext is known. So if the second block is secure, the IV need not be kept secret. – David Schwartz May 04 '16 at 21:54
  • @David Schwartz - how can AES-CBC be used in hard disk encryption (LUKS), when the ciphertext (i.e. disk sectors) is not static? The IV keeps changing every time data is written to disk. – Martin Vegter May 05 '16 at 10:14
  • @David Schwartz - I realize this might be off topic, so I have created a new [question](http://security.stackexchange.com/questions/122377/how-can-aes-cbc-be-used-for-luks-encryption-when-the-ciphertext-is-not-static) – Martin Vegter May 05 '16 at 10:21

6 Answers

34

I think it's easier to split this into its component parts, and consider them as separate entities: AES and CBC.

AES itself does not "basically consist of XORing together chunks of the block" - it's a much more complicated affair. Ignoring the internals of it for a moment, AES is considered secure in that without knowing the key, it's practically impossible to recover the plaintext or any information about the plaintext given only an encrypted block, or even in situations where you're given parts of the plaintext and you need to find the remainder. Without the key, AES might as well be a one-way function (and there are MAC schemes which rely upon this!). Discussing the technicalities around the security of AES and similar block ciphers is extremely involved and not something I can cover in an answer, but suffice to say that thousands of cryptographers have been looking at it for almost two decades and nobody has found anything remotely practical in terms of an attack.

The diagram you posted above describes CBC. Block ciphers, such as AES, aim to be secure for encrypting one block with a secret key. The problem is that we rarely want to just encrypt one block, but rather a data stream of indeterminate length. This is where block modes, like CBC, come into play.

Block modes aim to make ciphers secure for encrypting multiple blocks with the same key. The simplest block mode is ECB, which offers zero security in this regard. ECB involves independently encrypting each block with the same key, without any data fed between blocks. This leaks information about the plaintext in two ways: first, if you have two identical plaintext blocks, you'll get two identical ciphertext blocks if you use the same key; second, you'll get two identical ciphertext streams for two encryptions of the same message with the same key.
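The two ECB leaks described above can be reproduced in a few lines. This is only a sketch: to stay within the Python standard library it uses a toy 4-round Feistel permutation built from HMAC-SHA256 as a stand-in for AES. That toy cipher, the demo key and all function names are assumptions made for illustration, and none of it is secure.

```python
import hashlib
import hmac

BLOCK = 16  # block size in bytes, the same as AES


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel permutation. A stand-in for AES, NOT secure."""
    left, right = block[:8], block[8:]
    for i in range(4):
        f = hmac.new(key, bytes([i]) + right, hashlib.sha256).digest()[:8]
        left, right = right, xor(left, f)
    return left + right


def split_blocks(data: bytes):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # Every block is encrypted independently; nothing is fed between blocks.
    return b"".join(toy_encrypt_block(key, b) for b in split_blocks(plaintext))


key = b"0123456789abcdef"  # hypothetical demo key
message = b"A" * 16 + b"B" * 16 + b"A" * 16

ct = split_blocks(ecb_encrypt(key, message))
print(ct[0] == ct[2])  # True: equal plaintext blocks give equal ciphertext blocks
print(ct[0] == ct[1])  # False: different plaintext blocks
print(ecb_encrypt(key, message) == ecb_encrypt(key, message))  # True: same message, same ciphertext
```

An observer who sees only `ct` learns that blocks 0 and 2 of the plaintext are identical, without knowing the key.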

CBC solves this problem by introducing a "cascading" effect. Each plaintext block is XORed with the previous ciphertext block before it is encrypted, so plaintext blocks that were originally equal are no longer equal at the encryption step and no longer produce equal ciphertext blocks. For the first plaintext block, there is no previous ciphertext block (you haven't encrypted anything yet), and this is where the IV comes in. Consider, for a moment, what would happen if instead of an IV we just used zeroes for the -1th block (i.e. the imaginary ciphertext block "before" the first plaintext block). While the cascade effect would make equal plaintext blocks produce different ciphertext blocks, the same entire message would cascade the same way each time, resulting in an identical ciphertext when the same full message is encrypted multiple times with the same key. The IV solves this. By picking a fresh IV for every message (for CBC, a random, unpredictable one), you ensure that two ciphertexts are practically never the same, regardless of whether the plaintext message being encrypted is the same or different each time.
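The effect of the IV on CBC can be sketched as follows. As an assumption for illustration, the block cipher is again a toy Feistel permutation built from HMAC-SHA256 rather than AES, so that only the standard library is needed. The demo key and the function names are hypothetical, and the code is not secure.

```python
import hashlib
import hmac
import secrets

BLOCK = 16  # block size in bytes, the same as AES


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel permutation. A stand-in for AES, NOT secure."""
    left, right = block[:8], block[8:]
    for i in range(4):
        f = hmac.new(key, bytes([i]) + right, hashlib.sha256).digest()[:8]
        left, right = right, xor(left, f)
    return left + right


def split_blocks(data: bytes):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    prev, out = iv, []
    for block in split_blocks(plaintext):
        # XOR with the previous ciphertext block (the IV for the first block),
        # then run the result through the block cipher.
        prev = toy_encrypt_block(key, xor(block, prev))
        out.append(prev)
    return b"".join(out)


key = b"0123456789abcdef"  # hypothetical demo key
message = b"A" * 48        # three identical plaintext blocks

# All-zero "IV": the whole message cascades the same way every time.
zero_iv = bytes(BLOCK)
fixed_a = cbc_encrypt(key, zero_iv, message)
fixed_b = cbc_encrypt(key, zero_iv, message)
print(fixed_a == fixed_b)  # True: same key, same IV, same message

# The cascade still hides the repetition inside one message.
distinct_blocks = len(set(split_blocks(fixed_a)))

# Fresh random IV per message: the ciphertexts are expected to differ.
rand_a = cbc_encrypt(key, secrets.token_bytes(BLOCK), message)
rand_b = cbc_encrypt(key, secrets.token_bytes(BLOCK), message)
differs = rand_a != rand_b
```

Here `distinct_blocks` is expected to be 3 and `differs` is expected to be true; both could fail only with negligible probability (a 128-bit collision).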

This should, hopefully, help you understand why the IV doesn't need to be secret. Knowing the IV doesn't get an attacker anywhere, because the IV is only there to ensure non-equality of ciphertexts. The secret key is what protects the actual data.

To emphasise this even further, you don't even need the IV to decrypt all but the very first block. The decryption process for CBC works in reverse: decrypt a block using the secret key, then xor the result with the previous ciphertext block. For all but the very first block, you know the previous ciphertext block (you've got the ciphertext) so decryption is just a case of knowing the key. The only case where you need the IV for decryption is the very first encrypted block, where the previous ciphertext block is imaginary and replaced with the IV.
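The point about decryption can be checked with a small sketch. It decrypts a CBC ciphertext with a deliberately wrong IV and compares the result block by block. The block cipher is a toy Feistel permutation built from HMAC-SHA256 standing in for AES (an assumption made so that the example needs only the standard library); the key, the sample message and the function names are hypothetical, and the code is not secure.

```python
import hashlib
import hmac
import secrets

BLOCK = 16  # block size in bytes, the same as AES


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _f(key: bytes, i: int, half: bytes) -> bytes:
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:8]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel permutation. A stand-in for AES, NOT secure."""
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, xor(left, _f(key, i, right))
    return left + right


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = xor(right, _f(key, i, left)), left
    return left + right


def split_blocks(data: bytes):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    prev, out = iv, []
    for block in split_blocks(plaintext):
        prev = toy_encrypt_block(key, xor(block, prev))
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    prev, out = iv, []
    for block in split_blocks(ciphertext):
        # Decrypt with the key, then XOR with the previous ciphertext block.
        out.append(xor(toy_decrypt_block(key, block), prev))
        prev = block
    return b"".join(out)


key = b"0123456789abcdef"  # hypothetical demo key
iv = secrets.token_bytes(BLOCK)
message = b"P6\n1200 800\n255\n" + b"first pixel row." + b"next pixel row.."
assert len(message) % BLOCK == 0  # no padding in this sketch

ciphertext = cbc_encrypt(key, iv, message)
wrong_iv = xor(iv, b"\xff" * BLOCK)  # guaranteed to differ from the real IV
recovered = cbc_decrypt(key, wrong_iv, ciphertext)

print(cbc_decrypt(key, iv, ciphertext) == message)  # True: correct IV
print(recovered[BLOCK:] == message[BLOCK:])         # True: later blocks need only the key
print(recovered[:BLOCK] == message[:BLOCK])         # False: only the first block needs the IV
```

Note that the key is required in every case; the IV only affects the first block.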

Polynomial
  • A weakness with CBC is that if two messages happen to yield the same first block of ciphertext, the xor of the corresponding blocks in the two messages will be the xor of the initialization vectors. On the other hand, if two messages happen to have the same block of ciphertext in any other location, the xor of the corresponding blocks in the two messages will be the xor of the preceding blocks in the ciphertext (to which an adversary would presumably also have access). – supercat May 04 '16 at 19:00
  • 2
    @supercat This is true. I felt it a bit much to start delving into weaknesses of CBC, padding, etc. and instead focused on the core issues and intents around OP's question. If I went into detail on all of it it'd be an incredibly long answer. – Polynomial May 04 '16 at 19:11
  • @supercat Hmm, while I see your point, I fail to see the issue. The attacker listening in to the conversation would have access to the IVs using normal CBC as well, since they are sent along in plaintext. So there is no real difference between the two situations. Also, you end up getting the same ciphertext blocks only very rarely, assuming that IVs are chosen randomly ($P_1 \oplus IV_1 = P_2 \oplus IV_2$ should be VERY rare: you would need roughly $\sqrt{2^n}$ blocks before expecting a collision, by the birthday paradox), and any decent cipher should behave mostly like a random permutation, so again roughly $\sqrt{2^n}$ by the birthday paradox for the following ciphertext blocks. – DRF May 05 '16 at 07:13
  • @supercat Ahh, I see your point actually, though I'm not seeing it in what you wrote. Assuming you get the same ciphertext block at the beginning, you can retrieve the xor of the plaintext blocks. This, though, also happens if you get the same ciphertext block anywhere in the middle ($C_1 \oplus P_1 = C_2 \oplus P_2$ gives $C_1 \oplus C_2 = P_1 \oplus P_2$) and you know $C_1$ and $C_2$. The argument about the rarity of such a result still applies, though. – DRF May 05 '16 at 07:19
  • @DRF: A good cryptosystem is supposed to avoid revealing anything about the relative composition of two ciphertexts. Further, it's not implausible that if a system uses no care to ensure that IVs never vary in a systematic way with regard to the text being encrypted, the likelihood of files starting with identical ciphertexts could be much greater than chance. – supercat May 05 '16 at 14:08
  • @DRF: I've sometimes thought there would be value in having a cryptosystem which included the ability to add an xor stage after running about a third of the rounds. Adding a full AES block-crypt cycle on each horizontal arrow in the diagram above would eliminate the aforementioned vulnerabilities, but double the processing time. Performing the xor in the middle of an AES cycle would seem like it should minimize the extra time, and I can't see how it would add any new vulnerabilities, but since I'm not an expert I could be missing something. – supercat May 05 '16 at 14:41
  • @supercat The problem with that is that any state recovery through that mode would allow you to attack a reduced-round version of AES, which is much more practical. – Polynomial May 05 '16 at 18:56
  • @Polynomial: Fair point. On the other hand, it still seems that running two full rounds of AES would be overkill. Perhaps what would be best would be to add a non-keyed permutation function? Even if it was a relatively crummy one, I would think that would greatly reduce the usefulness of information that could be gleaned by noticing that corresponding portions of two files contain the same ciphertext. – supercat May 05 '16 at 19:05
  • @supercat A better option is to use a mode such as GCM or EAX that doesn't suffer from this issue and many other common problems with CBC. – Polynomial May 05 '16 at 19:08
11

No, because the key is secret.

The "block cipher encryption" block in the diagram scrambles the data depending on the key. The XOR in the diagram does not provide the security, the encryption does. The XOR and the IV are just to make sure the same plaintext encrypts as different ciphertext for each block.

Sjoerd
  • If you know the plaintext, the IV, and the encryption algorithm used, can't you recover the key? I mean, what happens inside the "block cipher encryption" box is a perfectly reversible operation, not some one-way hash function. – Martin Vegter May 04 '16 at 10:39
  • 4
    @MartinVegter: you need the key to make it reversible. It's not something as easy as `ciphertext = plaintext * key + IV` where you can easily retrieve any of the parameters if you know the others. – Yuriko May 04 '16 at 10:53
  • @Yuriko - do you know this for a fact, or are you just speculating? Do you have any link to substantiate your claims? From what I could understand by (briefly) looking at the Wikipedia description, `AES` basically consists of XORing together chunks of the block, repeatedly. The XOR operation is commutative and associative, which (again, with my basic understanding) makes it reversible. AES encryption is not a one-way hash function. – Martin Vegter May 04 '16 at 11:03
  • 2
    @MartinVegter [AES is one-way](http://stackoverflow.com/questions/810533/is-it-possible-to-reverse-engineer-aes256) if you don't have the key. – Sjoerd May 04 '16 at 11:16
  • 3
    @MartinVegter: it's a fact: you can't *easily* reverse AES-CBC without the key. (In terms of computations) You should ask on [Crypto.SE](https://crypto.stackexchange.com) if you want more information. It doesn't *just* XOR, it also *scrambles/substitutes* the bytes. – Yuriko May 04 '16 at 12:37
  • @Sjoerd One-way would imply that you can compute it in one direction. But without the key you can neither encrypt nor decrypt. – kasperd May 04 '16 at 17:27
5

All modern encryption methods (AES, Blowfish, etc.) are designed to be much more secure than you seem to expect. Let us quickly look at some attacks which such ciphers are designed to resist.

Known-plaintext attack - In this case we assume the attacker has access to many plaintext blocks along with the corresponding ciphertext blocks encrypted under a given key K. His goal is to find K.

Chosen-plaintext attack - In this case we assume the attacker gets to choose many plaintext blocks and gets the corresponding ciphertext blocks encrypted under a given key K. His goal is to find K.

Chosen-ciphertext attack - In this case we assume the attacker gets to choose many ciphertext blocks for which he gets the corresponding plaintext blocks decrypted under a given key K. His goal is to find K.

Adaptive chosen-ciphertext attack - In this case we assume the attacker gets to choose many ciphertext blocks for which he gets the corresponding plaintext blocks decrypted under a given key K, and then he gets to repeat this process with his new knowledge. His goal is to find K.

If an attack is found in any of these scenarios which manages to get K for a modern-day cipher, such a cipher is declared broken and will not be used further. Actually, the requirements are much stricter. If an attack is found that shows you can get even partial information about K, say its parity, that's already a huge deal. Or if you can show that you can't really find the key K, but can search for it significantly faster than brute force (even 2^110 vs 2^126 would usually be considered worth publishing), the cipher would usually be declared broken.

So in conclusion, no, you can't get the key of a cipher just because you know a block of plaintext and its corresponding ciphertext.
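To tie this back to the CBC scenario in the question, here is a short worked version. The notation is assumed here for illustration: $E_K$ and $D_K$ are block encryption and decryption under the key K, and $P_i$ and $C_i$ are the plaintext and ciphertext blocks.

$$
C_1 = E_K(P_1 \oplus IV), \qquad C_i = E_K(P_i \oplus C_{i-1}) \quad (i \ge 2)
$$

If the IV and $P_1$ are both known, the attacker can compute $X_1 = P_1 \oplus IV$ and therefore holds one input/output pair $(X_1, C_1)$ of $E_K$. That is exactly the known-plaintext setting described above, which a sound block cipher is designed to withstand. Recovering the next block still requires the key:

$$
P_2 = D_K(C_2) \oplus C_1
$$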

DRF
3

The IV has the same security requirements as the encrypted blocks.

For CBC to work, you need to XOR the unencrypted data in the current block with the encrypted data from the previous block. Because there is no block before the first block (so no encrypted block can be obtained) an IV is used instead.

schroeder
Trisped
1

Let's tell a parable! Here are the participants:

  1. Alice, the sender
  2. Bob, the recipient
  3. Eve, the eavesdropper
  4. Jake, Eve's hopelessly unhelpful assistant

Alice sends an encrypted message to Bob. Eve intercepts the ciphertext, and is trying to decode it. In order to "help" her, Jake tosses a fair coin 128 times, writes down his results, encodes them as a hexadecimal numeral, and gives it to Eve: 1eff4bb16388e2ee263eb5a8a2bf56b1. Eve is puzzled at this, but Jake insists that the results of these random coin tosses will help her decipher Alice's message.

Wouldn't you think that Jake is completely nuts? The results of Jake's 128 coin flips have nothing to do with the plaintext of the message that Alice sent Bob, or the key that she used to encrypt it. So Eve can't possibly learn anything about the message's plaintext from Jake's coin flips!


And that's why the IV for CBC mode doesn't need to be secret. CBC requires a random IV for each encryption. The IVs are therefore completely uncorrelated to the plaintexts or the keys; so knowledge of the IV reveals no information about the plaintext.

Luis Casillas
-4

The initialization vector is a random number, also called a nonce, which when combined with a secret key makes the original data completely unreadable. When the IV is first XORed with the plaintext data, it randomizes it. The subsequent encryption with the secret key makes it even harder to read. Hence the IV need not be secret, since the encryption with a secret key provides the required secrecy. Also, the data inside the encrypted file cannot be guessed in AES-CBC, as it goes through many rounds of encryption.

  • 1
    An IV and a nonce are semantically different. An IV implies a unique, *unpredictable* value. A nonce merely implies a unique value. This is an important distinction, as predictability of the IV destroys the security of CBC mode. – Xander May 04 '16 at 20:26