
I have a web application that has the following use case:

  • User creates account with username and password -- hashed password is stored in database.
  • User logs in (persists across sessions) -- login token is stored in cookie.
  • User inputs and submits text data -- data is stored in database, but is sensitive and shouldn't be exposed even if database is compromised.
  • Only the (logged in) user, and no one else, can read the submitted data.
  • For convenience, user should not need to enter any passphrase to encrypt/decrypt the data.

Is this feasible? How should the data be encrypted?

Code

2 Answers


You can use a key derivation function to convert the user's password into an encryption key. Then you would use a cryptographically secure pseudorandom number generator to generate a separate key that encrypts the user's data. You would then use the password-derived key to encrypt the generated key. The resulting ciphertext of the data encryption key can then be stored safely in the user table of your database (call the field "encryptedkey" if you like). In this way, the user's password becomes the means to decrypt the user's encrypted key. The key that actually encrypts the data is only decrypted long enough to decrypt the data that it encrypted. You'll need to store the password-derived key in the session to avoid asking the user for his password on each decryption.
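A minimal Python sketch of this scheme, assuming the third-party `cryptography` package is installed. The answer does not name a cipher; AES-256-GCM is used here as one reasonable choice, with the random nonce stored in front of the ciphertext. The function names (`derive_kek`, `wrap`, `unwrap`) are hypothetical, and the 200,000 PBKDF2-SHA256 iterations simply follow the figure suggested in the comments.

```python
import hashlib
import os
import secrets

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

ITERATIONS = 200_000  # PBKDF2 work factor; tune for your hardware
NONCE_LEN = 12        # 96-bit nonce, the standard size for AES-GCM


def derive_kek(password: str, salt: bytes) -> bytes:
    """Password -> 256-bit key-encryption key (KEK) via PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=32)


def wrap(kek: bytes, dek: bytes) -> bytes:
    """Encrypt the data-encryption key (DEK) under the KEK; the nonce is prepended."""
    nonce = os.urandom(NONCE_LEN)  # fresh for every encryption
    return nonce + AESGCM(kek).encrypt(nonce, dek, None)


def unwrap(kek: bytes, wrapped: bytes) -> bytes:
    """Inverse of wrap(); raises cryptography.exceptions.InvalidTag for a wrong KEK."""
    nonce, ciphertext = wrapped[:NONCE_LEN], wrapped[NONCE_LEN:]
    return AESGCM(kek).decrypt(nonce, ciphertext, None)


# Registration: random DEK from a CSPRNG, wrapped with the password-derived KEK.
salt = secrets.token_bytes(16)
dek = secrets.token_bytes(32)
encryptedkey = wrap(derive_kek("user password", salt), dek)  # store alongside salt

# Login: the same password re-derives the KEK and recovers the same DEK.
assert unwrap(derive_kek("user password", salt), encryptedkey) == dek
```

Because GCM is authenticated, a wrong password surfaces as an explicit `InvalidTag` error instead of silently producing a garbage key.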

Alternatively, you can keep the key encryption key in a Key Management Service such as AWS KMS. This way you would obtain the decrypted data key from Amazon over TLS using only a reference to the master key; the key encryption key itself never leaves the KMS. Of course, in this case you still need to store the authentication credentials for the KMS somewhere in your architecture, possibly in a remotely retrieved, highly secured config file.
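A sketch of the KMS variant in Python, assuming AWS KMS accessed through boto3, whose KMS client provides `generate_data_key` and `decrypt`. The client is passed in as a parameter so the functions can be exercised without AWS; the key alias in the usage comment is hypothetical. One design consequence to keep in mind: whoever holds the KMS credentials can decrypt every stored data key, so this variant no longer ties decryption to the user's password.

```python
def new_data_key(kms, key_id: str) -> tuple[bytes, bytes]:
    """Ask KMS for a fresh 256-bit data key.

    Returns (plaintext_key, encrypted_key). Use the plaintext copy right away and
    discard it; persist only the encrypted copy (e.g. in the user table).
    """
    resp = kms.generate_data_key(KeyId=key_id, KeySpec="AES_256")
    return resp["Plaintext"], resp["CiphertextBlob"]


def open_data_key(kms, encrypted_key: bytes) -> bytes:
    """Send the stored encrypted data key to KMS and receive the plaintext key.

    The key-encryption key stays inside KMS; access is governed by the
    credentials the application uses to call the service.
    """
    return kms.decrypt(CiphertextBlob=encrypted_key)["Plaintext"]


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   kms = boto3.client("kms")
#   dek, stored = new_data_key(kms, "alias/my-app-key")  # hypothetical alias
#   dek_again = open_data_key(kms, stored)
```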

  1. Random Number Generator ⟶ Helps create Key #1.
    This key encrypts your data. It stays constant over time. You must generate this key when the user first registers. Use a CSPRNG (cryptographically secure pseudo-random number generator) to ensure sufficient randomness and unpredictability.

  2. Password ⟶ Converted into Key #2 with PBKDF2.
    This key, Key #2, is used to encrypt Key #1. You'll want to persist Key #2 in the user's session. Store the encrypted form of Key #1 in the user table, in a field called (perhaps) "encryptedkey".

  3. Changing passwords
    Whenever the user changes their password, you only have to repeat step #2 rather than re-encrypting all of your data. Decrypt Key #1 with the current Key #2 (for example, the one held in the session), convert the new password into a new Key #2, re-encrypt Key #1 with it, and overwrite the old value of the encrypted form of Key #1.

  4. Encrypting/decrypting data
    When the user has logged in, execute step #2. Once you have the password converted into a key, just decrypt Key #1. Now that you have Key #1 decrypted, you can use Key #1 to encrypt and decrypt your data.
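The four steps above can be sketched in Python as follows. This again assumes the third-party `cryptography` package and AES-256-GCM; the `row` dict stands in for the user table, and all function names are hypothetical. Note that `change_password` only re-encrypts Key #1, so data encrypted before the change still decrypts afterwards.

```python
import hashlib
import os
import secrets

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

ITERATIONS = 200_000


def password_to_key2(password: str, salt: bytes) -> bytes:
    """Step 2: password -> Key #2 (the key-encryption key) with PBKDF2."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=32)


def seal(key: bytes, plaintext: bytes) -> bytes:
    """AES-256-GCM encrypt; a fresh 12-byte nonce is prepended to the output."""
    nonce = os.urandom(12)
    return nonce + AESGCM(key).encrypt(nonce, plaintext, None)


def unseal(key: bytes, blob: bytes) -> bytes:
    """Inverse of seal(); raises InvalidTag if the key is wrong or data was altered."""
    return AESGCM(key).decrypt(blob[:12], blob[12:], None)


def register(password: str) -> dict:
    """Step 1: generate Key #1 with a CSPRNG and store it encrypted under Key #2."""
    salt = secrets.token_bytes(16)
    key1 = AESGCM.generate_key(bit_length=256)
    return {"salt": salt, "encryptedkey": seal(password_to_key2(password, salt), key1)}


def login(row: dict, password: str) -> bytes:
    """Step 2 at login: derive Key #2, which the answer keeps in the session."""
    return password_to_key2(password, row["salt"])


def change_password(row: dict, key2: bytes, new_password: str) -> None:
    """Step 3: re-encrypt Key #1 only; data encrypted with Key #1 is untouched."""
    key1 = unseal(key2, row["encryptedkey"])
    row["salt"] = secrets.token_bytes(16)
    row["encryptedkey"] = seal(password_to_key2(new_password, row["salt"]), key1)


def encrypt_data(row: dict, key2: bytes, text: str) -> bytes:
    """Step 4: decrypt Key #1 just long enough to encrypt the user's data."""
    return seal(unseal(key2, row["encryptedkey"]), text.encode("utf-8"))


def decrypt_data(row: dict, key2: bytes, blob: bytes) -> str:
    """Step 4: decrypt Key #1 just long enough to decrypt the user's data."""
    return unseal(unseal(key2, row["encryptedkey"]), blob).decode("utf-8")


row = register("old password")  # the dict stands in for the user table
session_key2 = login(row, "old password")
stored = encrypt_data(row, session_key2, "sensitive text")
change_password(row, session_key2, "new password")
session_key2 = login(row, "new password")
assert decrypt_data(row, session_key2, stored) == "sensitive text"
```

Python does not let you reliably wipe `key1` from memory; the sketch only keeps it in a local variable for the duration of one call.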

vrtjason
  • What kind of key derivation function should I use? – Code Apr 15 '17 at 22:46
  • Also, I don't understand why the generated key has to be encrypted and stored in the database. Why not just store the generated key in a cookie and use it to encrypt/decrypt directly? – Code Apr 16 '17 at 01:03
  • A good key derivation function is PBKDF2, provided that you feed it a randomly generated salt and give it a healthy number of iterations (say, 200,000). You should also follow OWASP recommendations on enforcing password-complexity rules for your users. This will make it harder for attackers to determine the key. – vrtjason Apr 16 '17 at 15:15
  • If you store the generated key in a cookie, then the user's encrypted data can only be decrypted on a later date by that same key, and by that time he may be logging in from a different browser which has no such cookie or key, and therefore decryption would fail. Also, even if he returns again using the same browser, his cookies may have been deleted by then, and decryption would fail. Also, attackers can conceivably read the cookie and steal the key. The key should be exposed only as long as it is necessary. – vrtjason Apr 16 '17 at 15:21
  • Lastly, if the generated data encryption key is itself encrypted by a password-derived key, then this allows more flexibility by letting the user change his password. If he changes his password, then you only need to re-encrypt and store the generated data encryption key; in other words, you don't have to re-encrypt all the data that the generated key might have encrypted. You want that generated data key to remain (1) relatively constant over time, and (2) protected from eavesdropping. So encrypt the generated data encryption key with the key derived from the user's password, and store it. – vrtjason Apr 16 '17 at 15:27
  • When storing the encrypted data in the database is it better to encrypt properties separately and put each encrypted property in its own column or to put all the properties into a JSON string or something similar, encrypt the string, and stick that into a single column? – Wilfred Oct 29 '17 at 22:49
  • @vrtjason it'd be great to show some kind of diagram, or flow chart, or even just a numbered list of steps. One (e.g. me) could get lost at *use the derived key to encrypt the generated key*. Thanks anyway for your solution! – superjos Mar 20 '18 at 17:23
  • @superjos I couldn't think up a diagram, so instead I just edited the answer to add an ordered list. Hopefully that helps. – vrtjason Mar 20 '18 at 18:43
  • This is a great solution; the "only" problem I can think of is that the data becomes useless. So if I need to ship something to User 3441 and his address is encrypted, I'm not going to be able to ship anything to that dude... I think that top managers in the company offering the service are expected to see most of the data. Otherwise, why ask for it in the first place?! – Alexis Wilke Jan 23 '19 at 03:44
  • Would Diffie-Hellman work? – mLstudent33 Jun 09 '20 at 04:40
  • What about the *Forgot Password* scenario? – Arman Fatahi Apr 12 '21 at 06:42
  • @vrtjason I'd like some guidance on how to store the salt used when generating the password-derived key (Key #2). In my scenario, I'm not storing the password or its hash in the database; it is known only to the user. As you mentioned, Key #2 is used to encrypt the random Key #1. One option is to store the salt along with the user's encrypted data (encrypted with Key #1), but I'm not sure if that's wise from a security standpoint. – Shubhan Aug 23 '21 at 04:11
  • @Shubhan You can store the salt right there with the ciphertext of the data encryption key. A salt is not a secret. Its usefulness is in ensuring a unique hash value. If using PBKDF2, I would recommend creating a JSON object (serialized) containing (a) the ciphertext of the data encryption key, (b) the salt, (c) the quantity of iterations you used, and (d) the digest method used. That can all be stored in a text field (if using MySQL, use the TEXT type rather than VARCHAR, since there's no need to do indexed lookups on it). – vrtjason Aug 24 '21 at 07:00
  • @vrtjason Since I have stateless APIs and no session data, what do you think of sending an encrypted version of the KEK (encrypted with an application key and salt) to the browser client? I could do this every time I ask the user for their data password (not used for login). – Shubhan Aug 24 '21 at 19:32
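A stdlib-only Python sketch of the record vrtjason describes for Shubhan: the encrypted data key, salt, iteration count and digest serialized together as JSON (e.g. into a MySQL TEXT column) so the KEK can be re-derived at login. The field names and base64 encoding are assumptions, and `b"<wrapped DEK bytes>"` is a placeholder for the output of whatever key-wrapping step you use.

```python
import base64
import hashlib
import json
import secrets


def derive_kek(password: str, salt: bytes, iterations: int, digest: str) -> bytes:
    """PBKDF2 with the parameters read back from the stored record."""
    return hashlib.pbkdf2_hmac(digest, password.encode("utf-8"), salt, iterations, dklen=32)


def pack_key_record(encrypted_dek: bytes, salt: bytes, iterations: int, digest: str) -> str:
    """Serialize everything needed to re-derive the KEK next to the wrapped DEK.

    The salt and KDF parameters are not secret; the wrapped DEK is useless
    without the user's password.
    """
    return json.dumps({
        "encryptedkey": base64.b64encode(encrypted_dek).decode("ascii"),
        "salt": base64.b64encode(salt).decode("ascii"),
        "iterations": iterations,
        "digest": digest,
    })


def unpack_key_record(text: str) -> dict:
    """Parse the JSON record and decode the binary fields."""
    rec = json.loads(text)
    rec["encryptedkey"] = base64.b64decode(rec["encryptedkey"])
    rec["salt"] = base64.b64decode(rec["salt"])
    return rec


# Usage: store `record` in the user table; at login, re-derive the KEK from it.
salt = secrets.token_bytes(16)
record = pack_key_record(b"<wrapped DEK bytes>", salt, 200_000, "sha256")
rec = unpack_key_record(record)
kek = derive_kek("user password", rec["salt"], rec["iterations"], rec["digest"])
```

Storing the iteration count and digest in the record lets you raise the work factor later for new or re-wrapped keys without breaking existing ones.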

Not sure whether this would work better as a comment than as an answer; happy to change it.

The only purpose is to give a visual representation of @vrtjason's accepted answer, as that includes a number of scenarios and involves using a couple of (encryption) keys. Here you go:

[Diagram: scenarios and used keys]

For completeness, the keys involved in such a protocol play roles known in the literature as:

  • User-Key: DEK, Data Encryption Key
  • Password-Key: KEK, Key Encryption Key

HTH

superjos
  • I would only add that you might take user experience into consideration. When the user logs in, persist his password-derived key in his session. That way, you don't have to ask him for his old password as a condition for changing to a new password. Also, keeping the password-derived key in the session kind of protects the more sensitive data-encryption key, which you should expose only for as long as needed and then promptly remove from RAM. You never know if there might be a leak in the session, so it's probably best not to persist the more sensitive data-encryption key in the session. – vrtjason Mar 21 '18 at 21:16
  • W.R.T. user experience, let me disagree on this specific use case: when the user wants to change to a new password, I'd still explicitly request that s/he provide the current password as well. For the rest: yes, you're right. I did not mean to imply skipping the storing of the password-derived key in the session by not drawing it, but yes, I'd better show that step as well in the diagram. Will update it. – superjos Mar 22 '18 at 00:15
  • How can I implement a forgot-password feature if I want to use the same security model? – Sagar Shah Oct 20 '18 at 15:14
  • If I recall correctly, in that scenario your sensitive encrypted data is lost along with the user's password. – superjos Oct 22 '18 at 08:00
  • @SagarShah, yeah... good point... you can't... – Alexis Wilke Jan 23 '19 at 03:47
  • For password recovery: why not store in the db a second copy of the DEK, encrypted with a secret question/answer? – bilal.haider Mar 03 '19 at 18:44
  • @bilal.haider That's acceptable from a UX perspective (user will more easily remember their parent's maiden names than pw) but it's not solving any crypto problem - if the user can't provide _some_ unknown-to-system piece of info, they will lose the data. Even better UX is to tell the user to write down their passphrase/recovery key in a secure place, or put it in a password manager. – snugghash Oct 08 '21 at 07:47
  • @bilal.haider At that point your account is only as secure as your secret answer. Security is only as strong as its weakest link. – rshea0 Dec 23 '22 at 20:53
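A sketch of the recovery idea from the comments: the same DEK is wrapped twice, once under the password-derived KEK and once under a KEK derived from a recovery secret. Following snugghash's suggestion (and rshea0's weakest-link point), the recovery secret here is a random code shown to the user once rather than a secret answer. It assumes the third-party `cryptography` package and AES-256-GCM; the field and function names are hypothetical. If the user loses both the password and the code, the data is still unrecoverable.

```python
import hashlib
import os
import secrets

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def kek_from(secret: str, salt: bytes) -> bytes:
    """Derive a 256-bit KEK from a password or recovery code with PBKDF2."""
    return hashlib.pbkdf2_hmac("sha256", secret.encode("utf-8"), salt, 200_000, dklen=32)


def wrap(kek: bytes, dek: bytes) -> bytes:
    """AES-256-GCM wrap with a prepended 12-byte nonce."""
    nonce = os.urandom(12)
    return nonce + AESGCM(kek).encrypt(nonce, dek, None)


def unwrap(kek: bytes, blob: bytes) -> bytes:
    """Inverse of wrap(); raises InvalidTag for a wrong KEK."""
    return AESGCM(kek).decrypt(blob[:12], blob[12:], None)


def enroll(password: str) -> tuple[dict, str]:
    """Wrap one DEK under two KEKs; return the DB row and a code to show the user once."""
    dek = AESGCM.generate_key(bit_length=256)
    recovery_code = secrets.token_urlsafe(24)  # 192 random bits; the user writes it down
    row = {"pw_salt": secrets.token_bytes(16), "rc_salt": secrets.token_bytes(16)}
    row["dek_by_password"] = wrap(kek_from(password, row["pw_salt"]), dek)
    row["dek_by_recovery"] = wrap(kek_from(recovery_code, row["rc_salt"]), dek)
    return row, recovery_code


def reset_password(row: dict, recovery_code: str, new_password: str) -> None:
    """Forgot-password path: unwrap via the recovery code, re-wrap under the new password."""
    dek = unwrap(kek_from(recovery_code, row["rc_salt"]), row["dek_by_recovery"])
    row["pw_salt"] = secrets.token_bytes(16)
    row["dek_by_password"] = wrap(kek_from(new_password, row["pw_salt"]), dek)
```

The server never stores the recovery code itself, so the database alone still cannot decrypt anything; the trade-off is that the code is now a second key to the same data and must be kept as safely as the password.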