2

I have found lots of DON'Ts with Google, but so far no simple resource that explains how to correctly use a modern symmetric block cipher to someone like me, who knows he is not smart enough to get this right without help.

I plan to use this to exchange a bunch of control messages over a connectionless, datagram-oriented transport protocol. I plan to distribute some initial secrets over SSH-tunneled HTTP connections, and the remaining secrets will then be spread over the network through the already-secure channels. It is thus important to me to keep each secret as small as possible, so that as many of them as possible fit within a single datagram.

To summarize: what cipher should I pick, and how should I set up the IVs, the MACs, etc.? Alternatively, a hint on resources that could give me enough background to make these decisions myself would be fine, but I am still struggling to find something that is somewhat more practical than crypto textbooks.

Polynomial
mathieu

2 Answers

5

This should probably be on the crypto Stack Exchange instead of here.

Regardless, the best course of action is to not touch the symmetric block cipher yourself. There are mature, cryptographer-audited security libraries like NaCl and KeyCzar that will make the correct decisions for you.

That said, if you do choose to go it alone, a generally safe choice is AES-128 in either EAX mode or GCM mode. Both of these modes authenticate your ciphertext in addition to keeping it confidential, which is crucial for avoiding things like padding oracle attacks. To use either of these modes, you need a cryptographic key, an initialization vector, a plaintext, and, optionally, associated authentication data.

The key must be 128 bits long and generated with a cryptographically secure random number generator. If you want a password-protected key, take the randomly-generated key and pass it through PBKDF2-HMAC-SHA-256 with the password, the key in place of the salt, a number of rounds calibrated to take as long as you're comfortable with (10_000 rounds on my laptop requires 0.5s, which is a reasonable amount of time), and a 128-bit output size. The output will be the "real" key to use, and you can re-run this with the original key and password any time you need to access the real key.
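As a rough sketch of the derivation described above, using only the Python standard library: the function name `derive_real_key` and the example password are made up for illustration, and the round count is the 10_000 figure quoted above, which you would calibrate on your own hardware.

```python
import hashlib
import secrets


def derive_real_key(random_key: bytes, password: str, rounds: int = 10_000) -> bytes:
    """Derive the 128-bit "real" key from the stored random key and a password."""
    return hashlib.pbkdf2_hmac(
        "sha256",                   # PBKDF2-HMAC-SHA-256
        password.encode("utf-8"),   # the password, as bytes
        random_key,                 # the random key, used in place of the salt
        rounds,                     # calibrate to your own latency budget
        dklen=16,                   # 16 bytes = 128-bit output
    )


# 128 bits from the operating system's cryptographically secure generator.
random_key = secrets.token_bytes(16)

# Hypothetical password, for illustration only.
real_key = derive_real_key(random_key, "correct horse battery staple")
```

Re-running `derive_real_key` with the same stored key and password yields the same real key, so only the random key needs to be kept on disk.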

The initialization vector for EAX and GCM modes (note: this may not be the case for other modes, like CBC, which has even stricter IV requirements) must be unique across all encryptions with a given key. A simple counter is considered to be sufficient. However, you need to make sure this counter guarantees uniqueness even when run across multiple invocations of the process, multiple simultaneous processes, and multiple machines. A properly implemented version 1 UUID should suffice for these purposes. This isn't in the format encouraged by RFC 5116, section 3.2, however, and someone more knowledgeable than me will have to clarify whether or not that format is important to abide by. Alternatively, a securely generated random number of appropriate size should be sufficient, but I haven't seen people do this in practice with CTR-derived modes (which GCM and EAX are based on). Again, someone more knowledgeable will have to comment on whether or not this is a good idea.
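One way a counter-based IV could look, as a minimal sketch: a 12-byte value made of a fixed per-sender field followed by a counter, loosely in the spirit of the fixed-plus-counter layout in RFC 5116. The `NonceCounter` class, the 4-byte/8-byte split, and the sender IDs are assumptions for illustration. The sketch does not persist the counter, so on its own it does not cover the multiple-invocation case mentioned above.

```python
import struct


class NonceCounter:
    """Hands out 12-byte nonces: 4-byte sender ID followed by an 8-byte counter."""

    def __init__(self, sender_id: int, start: int = 0):
        if not 0 <= sender_id < 2**32:
            raise ValueError("sender_id must fit in 32 bits")
        self._sender_id = sender_id
        # A real deployment must restore this from durable storage on restart,
        # otherwise nonces repeat under the same key.
        self._next = start

    def next_nonce(self) -> bytes:
        if self._next >= 2**64:
            raise OverflowError("counter exhausted; switch to a new key")
        # ">IQ": big-endian, 4-byte unsigned int then 8-byte unsigned int.
        nonce = struct.pack(">IQ", self._sender_id, self._next)
        self._next += 1
        return nonce


# Two hypothetical senders sharing one key never collide,
# because their fixed fields differ.
sender_a = NonceCounter(sender_id=1)
sender_b = NonceCounter(sender_id=2)
first_a = sender_a.next_nonce()
second_a = sender_a.next_nonce()
first_b = sender_b.next_nonce()
```

Giving each process or machine its own sender ID is what keeps simultaneous senders apart; the counter only has to be unique within one sender.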

The plaintext can be anything you please, but do not fall into the trap of thinking the input of AES is "characters", "ASCII", "Unicode", or any such nonsense. The appropriate input is an array of bytes.
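To make the bytes-versus-characters point concrete, here is a small sketch. The sample string and the control-message layout (a 2-byte type followed by a 4-byte sequence number) are invented for illustration.

```python
import struct

# Text has to be encoded explicitly before it can be encrypted.
message = "na\u00efve caf\u00e9"         # 10 characters, two of them non-ASCII
plaintext = message.encode("utf-8")      # 12 bytes: each accented letter takes 2
recovered = plaintext.decode("utf-8")    # decode only after decryption

# Binary control messages are packed into bytes directly.
# Hypothetical layout: 2-byte message type, 4-byte sequence number.
control = struct.pack(">HI", 7, 1234)
msg_type, seq = struct.unpack(">HI", control)
```

The character count and the byte count differ here, which is why the encoding has to be fixed by the protocol and not left to the cipher layer.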

The optional associated authentication data may also be anything you like. This is data that is incorporated into the ciphertext in order to verify authenticity when a decryption is performed. But it is not part of the plaintext protected inside of the ciphertext. For example, if you're encrypting data on behalf of users stored in a database, you could use the user's user_id in the database. In the event that one user found a way to copy another user's encrypted data into his account, he wouldn't be able to get your application to decrypt it, since his user_id wouldn't match the one used to encrypt the data. I may have explained this poorly, so please let me know if it was confusing or unclear.
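The user_id example can be sketched as follows. This assumes the third-party `cryptography` package and its `AESGCM` class, which is not something the answer itself prescribes; the helper names, the user IDs, and the fixed counter value used as the nonce are hypothetical.

```python
from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=128)
aead = AESGCM(key)


def encrypt_for_user(user_id: int, nonce: bytes, plaintext: bytes) -> bytes:
    # The user_id is authenticated but is not stored inside the ciphertext.
    aad = str(user_id).encode("ascii")
    return aead.encrypt(nonce, plaintext, aad)


def decrypt_for_user(user_id: int, nonce: bytes, ciphertext: bytes) -> bytes:
    # The caller supplies the user_id from its own context (e.g. the session).
    aad = str(user_id).encode("ascii")
    return aead.decrypt(nonce, ciphertext, aad)


# Illustration only: a 12-byte counter value; it must never repeat under this key.
nonce = (1).to_bytes(12, "big")
secret = b"alice's data"
ciphertext = encrypt_for_user(42, nonce, secret)

owner_view = decrypt_for_user(42, nonce, ciphertext)

# A row copied into user 43's account fails authentication.
try:
    decrypt_for_user(43, nonce, ciphertext)
    copy_accepted = True
except InvalidTag:
    copy_accepted = False
```

The ciphertext is 16 bytes longer than the plaintext because GCM appends its authentication tag, which is worth budgeting for when sizing datagrams.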

For storage, the key must be kept secret. Key management and lifetime is beyond the scope of my answer, but it is a crucially important part of ensuring the security of your protected data. The initialization vector, ciphertext, and authentication data have no requirement on their secrecy, but the authentication data should ideally be something provided to the cryptography layer from an external source (e.g., the user_id mentioned earlier). When used, it should not be something a potential attacker can supply to you or otherwise exercise control over. Storing, transmitting, and copying the authentication data along with the other values defeats the purpose. One way of considering the authentication data is that it should provide "context" for the protected data.

That should do it. Keep in mind that this is complicated, and excruciatingly difficult to do correctly without leaking protected information. I don't recommend you go it on your own, but if you do, the above should be a relatively safe point to start from. At the very least, you should be better off than 95% of the websites out there that try to implement cryptography themselves. Assuming, of course, that I haven't completely failed at describing a secure implementation approach.

Stephen Touset
2

The best way to use a block cipher is -- don't. Instead, use a higher level of abstraction.

Follow the advice at Don't roll your own crypto. For data in motion, use TLS. For data at rest, use GPG. If you can't do that, use a high-level crypto library, like cryptlib, GPGME, Keyczar, or NaCl.

D.W.