
I'm looking at some code that encrypts data in a database using AES. Before doing so, the encryption key is passed through a PBKDF2 function. Reading up on this, it appears this is for when the key space can be guessed, i.e. like someone's password that might use words out of an English dictionary. The encryption key however is purely random, something like:

M39UrEEveje3J#PB=jPG9+&eUSTJG*SAK&s_xHLRu$?Hrbg&7Vn5X^P298$W2z2#r6_!yfGQMQ@ArXjgefq-?9^b?y786ZL5cYcqE6#!c4@rE$scZxR$$e6cYPX$U-m7

In a predictable key space, you want to increase the workload for brute-forcing the encryption, hence hashing the password through many rounds of PBKDF2. If the key space is completely uniform and unpredictable, though, isn't this just a waste of CPU cycles? Is it good practice to do it anyway?

Matt Molnar
  • Out of curiosity, are you planning on doing the encryption in the database itself, or in the application before it gets sent to the database? There are [issues](https://security.stackexchange.com/q/184464/151903) you should be aware of if using the database's native support for encryption. – AndrolGenhald Aug 10 '18 at 19:29
  • in theory it's not needed but in practice it doesn't hurt and guards against randomness flaws. compromise: go fewer rounds than a plain-text password, but don't leave it out. – dandavis Aug 10 '18 at 19:43
  • Can you clarify how that "random" string is being generated? I hope you're using SecureRandom (or `/dev/random` or equiv) to populate a byte array, and that string is some kind of pretty-print of the byte array. On the other hand, if you're actually generating ASCII strings and using them as keys, then your key space is probably a lot smaller than you realize and you should see AndrolGenhald's answer. – Mike Ounsworth Aug 10 '18 at 19:50
  • @AndrolGenhald doing it in the application and bypassing DB native encryption for reasons specified. – Matt Molnar Aug 11 '18 at 16:34
  • @MikeOunsworth this was just an example key I grabbed from an online tool. Very valid point however, unless I can guarantee that all 256 values of a byte are used for each "character", I need to run it through a hash. – Matt Molnar Aug 11 '18 at 16:35
  • @MattMolnar If you're using a String, then you're guaranteed not to use all 256 values because both ASCII and Unicode have unused "characters". Running it through a hash will not increase your keyspace; say there are only two possible strings: `yes` and `no`, then there are only two possible keys: `hash(yes)` and `hash(no)`. The only correct way to generate AES keys is as byte arrays directly from a cryptographic random number generator (ex. `SecureRandom` in Java, `/dev/random` on a unix system, etc) – Mike Ounsworth Aug 11 '18 at 16:53
  • @MikeOunsworth I was wondering about that as I always thought a hash ended up as a string, so is PBKDF2 the wrong mechanism to use here? Or is it simply better to use a longer key to increase key space? – Matt Molnar Aug 14 '18 at 14:26
  • @MattMolnar Uhh, neither? You should not be using PBKDF2 here. You should only be working over byte arrays. There is no reason that generating an AES key should involve passwords or Strings. Correct way: make `new byte[32]` (32 bytes for AES-256, 16 bytes for AES-128), and fill it from `SecureRandom` (assuming you're in Java). That byte array *is* your AES key. Better yet, use the keygen function provided by your language / library: https://stackoverflow.com/questions/18228579/how-to-create-a-secure-random-aes-key-in-java – Mike Ounsworth Aug 14 '18 at 14:41
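The approach described in the comments above can be sketched in a few lines. This is a minimal illustration in Python rather than Java, using only the standard library; `generate_aes_key` is a hypothetical helper name, not part of any library. It also shows the point about hashing: two possible inputs still give only two possible keys.

```python
import hashlib
import secrets


def generate_aes_key(bits: int = 256) -> bytes:
    """Return a fresh AES key drawn from the OS CSPRNG (hypothetical helper)."""
    if bits not in (128, 192, 256):
        raise ValueError("AES keys are 128, 192, or 256 bits")
    return secrets.token_bytes(bits // 8)


# 32 random bytes: this byte string *is* the AES-256 key, no KDF needed.
key = generate_aes_key(256)

# Hashing does not enlarge a small key space: two inputs, two possible keys.
candidate_keys = {hashlib.sha256(s.encode()).digest() for s in ("yes", "no")}
```

An attacker who knows the input was either `yes` or `no` only has to try the two digests in `candidate_keys`, however strong the hash is.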

1 Answer


If you have a 128 bit secret random value (or 192 or 256 bits for AES-192 or AES-256), then yes, you can use that as an AES key without running it through a key derivation function.

However, the example you provided does not contain random bytes, it contains (presumably) random characters from a certain character space with at least 56 characters and probably not more than 100. A truly random byte will have one of 256 values with equal probability, whereas a byte in this secret has one of 56-100(ish) values with (hopefully) equal probability.

It is also not 128, 192, or 256 bits. It is actually 128 bytes, which suggests to me that someone may have misunderstood what key size they needed, and generated it incorrectly.
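To put rough numbers on this, here is a small sketch of the arithmetic. It assumes every character is chosen uniformly and independently from the alphabet, which cannot be verified from a single sample; `entropy_bits` is a hypothetical helper.

```python
import math


def entropy_bits(length: int, alphabet_size: int) -> float:
    """Entropy of `length` symbols drawn uniformly and independently."""
    return length * math.log2(alphabet_size)


# 128 characters from the alphabet sizes discussed above (56 to roughly 100).
low = entropy_bits(128, 56)    # about 743 bits
high = entropy_bits(128, 100)  # about 850 bits

# For comparison: 128 truly random bytes carry 8 bits each.
random_bytes = entropy_bits(128, 256)  # 1024 bits
```

Under that assumption each character carries roughly 5.8 to 6.6 bits instead of 8, so the string as a whole holds far more entropy than any AES key needs, but no 16- or 32-character slice of it is a full-strength key.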

If you try using the secret directly as a key, likely one of three things will happen:

  • The software already does its own key derivation (not necessarily a slow key derivation like PBKDF2, but some sort of key derivation). Since the key (hopefully) contains significantly more than 128 bits of entropy, this is fine.
  • The software will warn you that the key isn't the correct length and fail.
  • The software will truncate your key to the correct length and use it. This can be very bad in some cases, but sadly it does happen.
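
The three outcomes above can be sketched as three hypothetical functions. None of these models a specific library's behaviour; the single SHA-256 in `derive_key` is just an illustrative choice of a fast derivation, and the 128-byte `secret` is a placeholder.

```python
import hashlib

AES_KEY_SIZES = (16, 24, 32)  # bytes, for AES-128 / AES-192 / AES-256


def derive_key(secret: bytes) -> bytes:
    """Case 1: fast derivation, here a single SHA-256 (illustrative choice)."""
    return hashlib.sha256(secret).digest()


def strict_key(secret: bytes) -> bytes:
    """Case 2: refuse anything that is not a valid AES key length."""
    if len(secret) not in AES_KEY_SIZES:
        raise ValueError(f"invalid AES key length: {len(secret)} bytes")
    return secret


def truncated_key(secret: bytes, size: int = 32) -> bytes:
    """Case 3: silently keep only the first `size` bytes (the risky case)."""
    return secret[:size]


secret = b"A" * 128  # placeholder standing in for a 128-character secret
```

With `derive_key`, all 128 characters feed into the key. With `truncated_key`, only the first 16 or 32 characters matter; if each character carries about 5.8 to 6.6 bits (an assumption based on a 56 to 100 character alphabet), a 16-byte key ends up with roughly 93 to 106 bits of entropy instead of 128.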
AndrolGenhald
  • Are you sure that's not base64 or some other encoding of random bytes? – Mike Ounsworth Aug 10 '18 at 19:35
  • @MikeOunsworth It's definitely not standard base64, and it's not base64 with special characters swapped like base64url. I count at least 11 non-alphanumeric characters. I suppose it could be some weird encoding, but that's still not something you should use _directly_ as a key without decoding it (unless the encryption function actually handles keys with that encoding), and I worry that OP may not understand that. – AndrolGenhald Aug 10 '18 at 19:41
  • Fair. If the OP is generating random _strings_ and using that, then the PBKDF2 is probably there to convert the string to a suitably whitened byte array and your answer is appropriate. However if that is just an output representation of a number, then it may be fine as-is. Worth a clarifying question. – Mike Ounsworth Aug 10 '18 at 19:46
  • (minor edit to your answer: I see lowers, uppers, numbers, 11 symbols which gives a character space of at least 73 characters) – Mike Ounsworth Aug 10 '18 at 19:55
  • @MikeOunsworth you're probably right that the keyspace is larger, I ran it through `echo '...' | fold -w 1 | sort | uniq -c | sort -n` and came up with 56 characters and some appearing 4, 5, or 6 times. Not being a statistician I jumped to conclusions that can't be supported with the small sample size. – AndrolGenhald Aug 10 '18 at 20:16
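
The same character count as the shell pipeline in the last comment can be done in Python. This is only a sketch: `char_histogram` is a hypothetical helper, and the input string is a placeholder rather than the key from the question.

```python
from collections import Counter


def char_histogram(secret: str) -> Counter:
    """Count how often each character occurs in `secret`."""
    return Counter(secret)


hist = char_histogram("example-secret")  # placeholder input
distinct = len(hist)                     # number of distinct characters seen

# Most frequent characters first, like `sort | uniq -c | sort -n` reversed.
most_common = hist.most_common(3)
```

The number of distinct characters in one sample is only a lower bound on the alphabet size, which is why a count of 56 does not rule out a larger character space.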