62

If I encrypt some data with a randomly generated Key and Initialization Vector, then store all three pieces of information in the same table row; is it necessary to encrypt the IV as well as the Key?

Simplified table structure:

  • Encrypted data
  • Key (encrypted using a second method)
  • IV (encrypted?)

Please assume that the architecture and method are necessary; the explanation behind them is lengthy and dull.

Stu Pegg
  • 5
    Is there a reason why you store the IV separately at all? You can just concatenate the IV with the data, encrypt, and store the whole blob. Later, when decrypting and using the data, you simply ignore the first _length_of_iv_ bytes. I could see how it would make sense to store them separately if you wanted to add an index on the data to be able to search for it, but encrypted data is (hopefully!) unreadable and pseudo-random, so that's pretty useless. – Damon Mar 03 '14 at 13:28
  • @Damon: Encrypting and decrypting takes processing time. In the end there was only a single key, so there would be more than one IV per key (as originally intended by the Key/IV combination). – Stu Pegg Mar 03 '14 at 14:17
  • 4
    But that is not how an IV works. It's fine to have a single key, or many keys, this makes no difference. It is the IV that must be different (and strong random / unpredictable if you have any means) for every message. That ensures that a) two identical messages encode differently, i.e. you cannot tell that two messages are the same and b) it is very hard to derive information from the plaintext or run known-plaintext attacks at all, since the _beginning_ of the plaintext (the IV) is random garbage. You can't analyze an awful lot because one unknown random garbage looks like the other. – Damon Mar 03 '14 at 14:29
  • For that, it is necessary to initialize the cipher "in some way" with the IV. The easiest way is to simply concatenate the IV with the actual data. You don't need to remember it otherwise, because it's not good for anything! Once encrypted, it is all the same random garbage, and once decrypted, you simply ignore the IV, since you know that it isn't good for anything. Storing the IV _might_ actually make a known-plaintext attack feasible. But as long as it's an unknown random sequence, all the attacker really knows is "random garbage in, random garbage out". – Damon Mar 03 '14 at 14:31
  • So the short version is that you think the IV should be hidden somewhere. This disagrees with the below answer. Please feel free to add another answer so that it can be voted upon. I'm afraid I lack the time or willingness to debate at great length on a question I asked 2 workplaces ago. – Stu Pegg Mar 03 '14 at 15:14
  • 2
    @Damon How do you want to retrieve the IV from encrypted data? To decrypt the data in the first place you already need the IV to initialize a decoding cipher... or am I missing something. – robert Aug 22 '15 at 16:49
  • @robert: The IV is usually transmitted with the message, so there is no need to "retrieve" it. You just decode the message ignoring the fact that there is an IV, and then throw away the first N bytes. The IV doesn't have a purpose per se, it's just there so the first block of a message is guaranteed to be different/unique even if the plaintext is the same (imagine e.g. email headers) and no such thing as known plaintext can happen (since even if it's something like a standard header, at least part of the first block is random and unknown). – Damon Aug 22 '15 at 21:24
  • 2
    @Damon but if the IV is inside the encoded message it cannot be used to decode. And the IV is needed to decode, isn't it? So the IV has to be kept outside the encryption but it may be prepended for convenience, say into the same byte[]. Do we mean the same thing? – robert Aug 23 '15 at 01:08
  • 2
    @robert: You only need to know the key. Instead of `Message`, you encrypt e.g. `lkjoiukqMessage`, or `ylmqtclrMessage` on another day. Even though `Message` is the same on both days, the random-looking output of the cipher will be different, there is no way of knowing it's the same, nor is there a way of guessing the key from the fact that the input to the cipher is the known plaintext `Message` (because _it isn't_). When you decrypt, you get back `xxxxxxxxMessage` where `xxxxxxxx` is something that you simply ignore, knowing that it's just meaningless random data. – Damon Aug 23 '15 at 10:33
  • The main reasons why you want this is that **a)** all chained identical blocks have the same identical ciphertext up to the first difference. This is bad, so you want the first block to always be different for any two pairs of messages (well, ideally... it's enough if it's different for any two messages with a common prefix). And **b)** it's not possible for an attacker to build a dictionary or (without knowing the IV, and usually you _don't_ know it) to perform analysis based on known plaintext, since the random stuff isn't known, and thus the plaintext as a whole in that block is unknown. – Damon Aug 23 '15 at 10:40
  • 2
    @Damon Thank you for the explanation, I really appreciate it. I believe I understand the value of CBC/IV (better and better). But my point is more mundane: When I AES/CBC a message with a specific IV, then I will need the same IV to decrypt the message. However, if the only place to keep/store/find the IV again is inside the encrypted message (which is what you suggest as I understand) ...then I would have locked myself out, wouldn't I? – robert Aug 23 '15 at 11:03

3 Answers

72

Update, 2022:

This answer is now 10 years old. Its advice is correct, but you should not use CBC mode in new designs today. Instead use an AEAD such as ChaCha20-Poly1305 or AES-GCM, and put the IV in the associated data so that it is authenticated.

While CBC is not strictly broken, it's really easy to shoot yourself in the foot while trying to use it in a real-world implementation, and improper implementation can easily lead to a complete break of your design. Real-world AES-CBC implementations (even those written by experienced security-conscious developers) frequently fall victim to padding oracle attacks and other side-channel issues. Using AES-CBC securely requires significantly more cryptographic engineering work than just using an AEAD. The less cryptographic engineering work you have to do, the less likely it is that you'll introduce a vulnerability.

If you just want an easy life, libsodium's secretbox API will take care of the cryptographic decision-making and implementation details for you. You provide a message, a nonce (IV), and a key, and it'll encrypt/decrypt and authenticate the data securely. It also has APIs for securely generating keys & IVs. There are libsodium bindings for most programming languages, so you're not limited to just C/C++. I would highly recommend libsodium to anyone building production systems.

Original answer below.


From Wikipedia:

An initialization vector has different security requirements than a key, so the IV usually does not need to be secret. However, in most cases, it is important that an initialization vector is never reused under the same key. For CBC and CFB, reusing an IV leaks some information about the first block of plaintext, and about any common prefix shared by the two messages.

You don't need to keep the IV secret, but it must be random and unique.

The IV should also be protected against modification. If you authenticate the ciphertext (e.g. with an HMAC) but fail to authenticate the IV, an attacker can abuse the malleability of CBC to arbitrarily modify the first block of plaintext. The attack is trivial: xor the IV with some value (known as a "tweak"), and the first block of plaintext will be xor'd with that same value during decryption.
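
One way to picture "authenticate the IV as well" is the minimal encrypt-then-MAC sketch below. It uses only Python's standard library and treats the ciphertext as opaque bytes, so the CBC encryption step itself is assumed to happen elsewhere. The helper names `seal` and `open_sealed`, the 16-byte IV length, and the blob layout (IV, then ciphertext, then tag) are illustrative assumptions, not part of the original answer. The MAC key is assumed to be separate from the encryption key.

```python
import hashlib
import hmac
import secrets

TAG_LEN = 32  # HMAC-SHA256 output size in bytes


def seal(mac_key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Return iv || ciphertext || tag, where the tag covers BOTH iv and ciphertext."""
    tag = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    return iv + ciphertext + tag


def open_sealed(mac_key: bytes, blob: bytes, iv_len: int = 16):
    """Verify the tag before anything is decrypted; return (iv, ciphertext)."""
    if len(blob) < iv_len + TAG_LEN:
        raise ValueError("blob too short")
    body, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("authentication failed")
    return body[:iv_len], body[iv_len:]


mac_key = secrets.token_bytes(32)     # assumed: independent of the encryption key
iv = secrets.token_bytes(16)
ciphertext = secrets.token_bytes(32)  # stand-in for real CBC output

blob = seal(mac_key, iv, ciphertext)

# Flip one bit of the IV, as an attacker applying a "tweak" would.
tampered = bytes([blob[0] ^ 0x01]) + blob[1:]
try:
    open_sealed(mac_key, tampered)
    iv_tamper_detected = False
except ValueError:
    iv_tamper_detected = True
print("IV tampering detected:", iv_tamper_detected)
```

Because the IV has a fixed length here, concatenating it with the ciphertext before computing the tag is unambiguous. If the tag covered only the ciphertext, the tampered blob would verify successfully.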

Polynomial
  • 1
    If you had to keep the IV secret, it would be part of the key. The "key" (unless qualified as a "public key") is, by definition, whatever you have to keep secret. – David Schwartz Jul 11 '12 at 05:48
  • @DavidSchwartz Not always: there are nonces that are not called keys (but they wouldn't be called IV either) and that must be kept secret. The *k* parameter in DSA, for example. – Gilles 'SO- stop being evil' Jul 11 '12 at 09:21
  • @DavidSchwartz: The point of the IV is that it is unique per encryption with a key. Having the IV in the key would negate its purpose. – Stu Pegg Jul 11 '12 at 10:41
  • 2
    @StuartPegg, actually, for CBC mode, it needs to be not only unique but also truly random. (A counter would not be a good choice of IV, for CBC mode.) – D.W. Nov 01 '12 at 21:11
  • This answer entirely skips over the problem of what happens if the IV is *predictable (to the attacker)* under some attacks. It also doesn't explain that other modes may not require a random IV and even fare worse under a random IV due to the birthday problem. – Maarten Bodewes Aug 30 '21 at 07:56
  • You say: "`...it must be random and unique`" The question here is when decrypting AES, am I forced to use the same IV with the decryption key? If the answer is no (depending on your statement above) so why when I use a different IV some decrypted text contain some invalid characters at the very beginning of the file? – Ahmed Suror Feb 14 '22 at 13:17
  • @AhmedSuror In CBC mode encryption, each block of plaintext is xor'd with the previous block of ciphertext before it is encrypted by the block cipher. The first block has no previous block of ciphertext, so it is xor'd with the IV instead. During CBC decryption, each ciphertext block is decrypted by the block cipher, and the result of that is then xor'd with the previous ciphertext block to recover the plaintext. In the case of the first block, there is no "previous" ciphertext block, so the IV is used. That's why only the first block is broken when you decrypt with the wrong IV. – Polynomial Feb 16 '22 at 20:18
  • 1
    @AhmedSuror The CBC diagrams on the Wikipedia page for block cipher modes are really helpful in understanding this behaviour: https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Cipher_block_chaining_(CBC) – Polynomial Feb 16 '22 at 20:18
12

Although in your case the IV should be okay in plaintext in the DB, there is a severe vulnerability if you allow the user to control the IV.

The IV in decryption is used (and only used) to XOR the first block into the final plaintext - so if an attacker can control the IV they can arbitrarily control the first block of data, and the rest of the plaintext will survive without modification.

If the attacker knows the original plaintext of the first block, then the problem is magnified again as the attacker can choose arbitrary data for the first block without trial and error.

This is particularly important in the case where encrypted data is being transmitted through untrusted channels with the IV, maybe into a browser or an app etc.
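
The attack described above can be sketched in a few lines. To keep the example self-contained, it does not use AES: the block cipher below is a toy 4-round Feistel construction built from HMAC-SHA256, an assumption made purely so the code runs with the standard library. The CBC chaining around it is the standard construction, and the IV manipulation would work the same way with AES. All function names and the sample plaintext are hypothetical.

```python
import hashlib
import hmac
import secrets

BLOCK = 16  # block size in bytes, same as AES


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Round function of the toy Feistel cipher (NOT AES, demo only).
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:8]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, xor(left, _round(key, i, right))
    return left + right


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = xor(right, _round(key, i, left)), left
    return left + right


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    assert len(plaintext) % BLOCK == 0  # padding is out of scope here
    prev, out = iv, b""
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_encrypt_block(key, xor(plaintext[i:i + BLOCK], prev))
        out += prev
    return out


def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    prev, out = iv, b""
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out += xor(toy_decrypt_block(key, block), prev)
        prev = block
    return out


key = secrets.token_bytes(32)
iv = secrets.token_bytes(BLOCK)
plaintext = b"role=guest;uid=7" + b"balance=00000100"
ciphertext = cbc_encrypt(key, iv, plaintext)

# The attacker knows the first plaintext block and controls the IV,
# but never learns the key and never touches the ciphertext.
known = b"role=guest;uid=7"
desired = b"role=admin;uid=0"
forged_iv = xor(iv, xor(known, desired))

tampered = cbc_decrypt(key, forged_iv, ciphertext)
print(tampered)
```

Only the first block changes; every later block decrypts exactly as before, which is why the modification is hard to notice without a MAC that also covers the IV.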

George Powell
  • wait, what? you're saying that if the attacker knows the IV, he can decrypt first block of the cyphertext? – nicks Nov 03 '16 at 08:16
  • 1
    Notice how the first block has to go through the key first before it is XORd with the IV. So... Regarding second paragraph - if the attacker changes the IV, the first block comes out garbled. Regarding the third paragraph - if the attacker already knows the plaintext of the first block (say it's a common header text), and also has the IV, he has a better shot at figuring out the key if he can reverse the decryption process, giving him the ability to decrypt the other blocks. – Bondolin Nov 08 '16 at 21:04
  • 1
    MAC (which you should do) mitigates this kind of attack. – rustyx Apr 23 '18 at 06:19
  • @Bondolin The part about figuring out the key is incorrect; AES is secure under known plaintext / ciphertext, no change in the use of IV or mode of operation can change that. – Maarten Bodewes Aug 30 '21 at 07:58
2

If I encrypt some data with a randomly generated Key and Initialization Vector, then store all three pieces of information in the same table row; is it necessary to encrypt the IV as well as the Key?

No, only the key may need to be stored encrypted; it is not necessary to encrypt a random IV.

Obviously the (secret) key needs to be kept secret. Whether you do that using encryption (i.e. shifting the burden to another key or key pair) or any other method is up to the architect. However, if you are going to store it in the database next to the data, then a good key wrapping method such as AES-WRAP or AES-SIV is a logical choice as part of securing your key at rest. Other security mechanisms, such as access control and protection against side channel attacks, need to be in place when using the key.

The IV needs to be unpredictable to an adversary for CBC mode. If it is predictable, CBC fails under an active attack in which the attacker has access to an oracle that encrypts attacker-chosen plaintext. In general you should assume that such an oracle exists. For instance, you should assume that the attacker can send any messages to your DB, which you then encrypt and store in the database for the attacker to read back. Generating a random IV with a well-seeded cryptographically secure (pseudo-)random number generator (CSPRNG), such as /dev/urandom, avoids this kind of problem.
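
As a rough illustration of the storage layout from the question, the sketch below generates a fresh IV per record with Python's `secrets` module (which draws from the operating system's CSPRNG) and stores it unencrypted in the same row as the ciphertext and the wrapped key. The table name, column names, and the random placeholder bytes standing in for the real ciphertext and wrapped key are all assumptions made for the example.

```python
import secrets
import sqlite3

IV_LEN = 16  # a CBC IV is one block long; 16 bytes for AES


def new_iv() -> bytes:
    """Fresh, unpredictable IV for every encryption under the same key."""
    return secrets.token_bytes(IV_LEN)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records ("
    " id INTEGER PRIMARY KEY,"
    " ciphertext BLOB NOT NULL,"
    " wrapped_key BLOB NOT NULL,"  # the data key, encrypted by a second method
    " iv BLOB NOT NULL)"           # stored in the clear
)

for _ in range(2):
    iv = new_iv()
    ciphertext = secrets.token_bytes(32)   # placeholder for the real CBC output
    wrapped_key = secrets.token_bytes(40)  # placeholder for the real wrapped key
    conn.execute(
        "INSERT INTO records (ciphertext, wrapped_key, iv) VALUES (?, ?, ?)",
        (ciphertext, wrapped_key, iv),
    )

stored_ivs = [row[0] for row in conn.execute("SELECT iv FROM records ORDER BY id")]
print([value.hex() for value in stored_ivs])
```

The point of the sketch is only that the IV is generated at encryption time, never derived from a counter or timestamp, and never reused for a second record.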

Encrypting the IV is a dangerous practice, as the IV gets XOR'ed with the plaintext. If you do so, you should encrypt it with a different key; otherwise you may also harm the security of CBC mode. If you have to use your current key, then you should use the resulting ciphertext as the IV instead of the plaintext input. However, for a random IV that's completely unnecessary: you can just store it next to the ciphertext without protection.


Supplemental: message integrity & authenticity

In general it is always a good idea to at least verify the integrity and authenticity of the stored messages before using them. Using a mode of operation such as GCM or EAX should be preferred.

Note that neither GCM nor EAX needs an unpredictable IV; the IV just needs to be unique. There are no other requirements apart from size, and using a fully random IV/nonce of 128 bits is of course fine as well. Both modes automatically include the IV in the calculation of the authentication tag.
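
Since uniqueness is the only requirement for these modes, a nonce can be built deterministically instead of drawn at random. The sketch below follows the general shape of the deterministic construction in NIST SP 800-38D (a fixed field identifying the encrypting party, followed by an invocation counter) for a 96-bit GCM nonce. The class name, the 4-byte/8-byte split, and the in-memory counter are illustrative assumptions; a real system would have to persist the counter so that it never repeats after a restart, and give each writer its own fixed field.

```python
import struct


class NonceCounter:
    """Hypothetical helper: 4-byte fixed field || 8-byte big-endian counter."""

    def __init__(self, fixed_field: bytes):
        if len(fixed_field) != 4:
            raise ValueError("fixed field must be 4 bytes")
        self._fixed = fixed_field
        self._counter = 0

    def next(self) -> bytes:
        if self._counter >= 2 ** 64:
            # Never wrap around: a repeated nonce under the same key breaks GCM.
            raise OverflowError("nonce space exhausted, rotate the key")
        nonce = self._fixed + struct.pack(">Q", self._counter)
        self._counter += 1
        return nonce


gen = NonceCounter(b"\x00\x00\x00\x01")
first, second = gen.next(), gen.next()
print(first.hex(), second.hex())
```

This kind of predictable nonce is acceptable for GCM or EAX but, as noted above, not as an IV for CBC.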

Using an HMAC over the ciphertext (also known as encrypt-then-MAC) has proven very successful at providing integrity / authenticity as well, but please note that you should include the IV in the calculation if you decide to go with HMAC.

Maarten Bodewes