I am planning to build a web application which stores a lot of personal data (info, photos...). To give users a sense of protection, I want to encrypt the data before storing it in the database.

So, in the database I will store a hash of the password (to validate against) and the user data (encrypted with the password as the key, or with a key derived from the password).

How can I provide a forget-my-password option? I see that the only way is to store the key in some place, which removes the benefit of encrypting the data in the first place.

Yousf
  • That really seems like "a sense of protection" only. What's that encryption supposed to protect against? A stolen database backup? Sure! But with anything else (somebody getting control of the server), all is lost and encryption won't help anymore. The server (controlled by the attacker) will still receive the user's password on login (before hashing it to compare against the stored value) and thus can derive the encryption key. Even if encryption happened locally (using JavaScript), the server could send malicious JavaScript to the user which in turn sends back the password to the server. – caw Apr 09 '17 at 18:13

3 Answers

The standard way of doing this is as follows:

  1. Create an authentication hash using a salted key derivation algorithm such as PBKDF2 or bcrypt.
  2. Generate a random surrogate key for data encryption. Encrypt all data with this key.
  3. Generate a "locking" key from the password using a different salt, but still using a strong KDF.
  4. Xor the surrogate key with the locking key and store that value. This makes the surrogate key unrecoverable unless the password is known.
  5. On login, generate the locking key and xor it with the stored value, which gives you the real surrogate key.

The benefit of this system is that it allows for easy password changes - just use the old password to recover the surrogate key, then generate a new locking key from the new password and store the surrogate key xored with it. The data itself never has to be re-encrypted.
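A minimal sketch of the scheme above, assuming PBKDF2-HMAC-SHA256 from Python's standard library as the KDF. The function names (`enroll`, `login`, `change_password`), the record layout, and the iteration count are illustrative assumptions, not something specified in this answer.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

KEY_LEN = 32          # surrogate and locking keys must have the same length
ITERATIONS = 200_000  # illustrative work factor (assumption); tune for your hardware


def kdf(password: str, salt: bytes) -> bytes:
    """Derive KEY_LEN bytes from the password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=KEY_LEN
    )


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _record_for(password: str, surrogate: bytes) -> dict:
    """Build what gets stored: auth hash plus the xor-locked surrogate key."""
    auth_salt = secrets.token_bytes(16)
    lock_salt = secrets.token_bytes(16)  # step 3: a different salt
    return {
        "auth_salt": auth_salt,
        "auth_hash": kdf(password, auth_salt),                    # step 1
        "lock_salt": lock_salt,
        "locked_key": xor(surrogate, kdf(password, lock_salt)),   # step 4
    }


def enroll(password: str) -> Tuple[dict, bytes]:
    surrogate = secrets.token_bytes(KEY_LEN)                      # step 2
    return _record_for(password, surrogate), surrogate


def login(record: dict, password: str) -> Optional[bytes]:
    """Step 5: check the password, then unlock the surrogate key."""
    candidate = kdf(password, record["auth_salt"])
    if not hmac.compare_digest(candidate, record["auth_hash"]):
        return None
    return xor(record["locked_key"], kdf(password, record["lock_salt"]))


def change_password(record: dict, old: str, new: str) -> dict:
    surrogate = login(record, old)
    if surrogate is None:
        raise ValueError("wrong password")
    # Same surrogate key, new lock: encrypted data stays untouched.
    return _record_for(new, surrogate)
```

The surrogate key itself is never stored; only `locked_key` is, and it is useless without the password.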

When using such a mechanism, the only way to decrypt the data is to have the password, unless you create a second encrypted copy of the surrogate key using another secret.

Here's how I'd do it:

  1. Ask the user to provide you with 3 secret answers. Convert the answers to uppercase and concatenate them. Run that value through a KDF to create a secondary locking key.
  2. Xor the surrogate key with the secondary locking key and store that as a backup key.
  3. When the user forgets their password, ask them for their secret answers and generate the secondary locking key, then use that to compute the surrogate key.

The secret answers option is the easiest, but it's possible to do this with any secret value.
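A rough sketch of the backup-key steps, again assuming PBKDF2-HMAC-SHA256 as the KDF. `make_backup` and `recover` are hypothetical names, and stripping surrounding whitespace before uppercasing is an extra normalisation step I'm assuming here.

```python
import hashlib
import secrets
from typing import Sequence

KEY_LEN = 32
ITERATIONS = 200_000  # illustrative work factor (assumption)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def answers_key(answers: Sequence[str], salt: bytes) -> bytes:
    """Uppercase and concatenate the answers, then run them through the KDF."""
    normalized = "".join(a.strip().upper() for a in answers)
    return hashlib.pbkdf2_hmac(
        "sha256", normalized.encode("utf-8"), salt, ITERATIONS, dklen=KEY_LEN
    )


def make_backup(surrogate: bytes, answers: Sequence[str]) -> dict:
    """Store the surrogate key xored with the secondary locking key."""
    salt = secrets.token_bytes(16)
    return {
        "backup_salt": salt,
        "backup_key": xor(surrogate, answers_key(answers, salt)),
    }


def recover(backup: dict, answers: Sequence[str]) -> bytes:
    """Recompute the secondary locking key and unlock the surrogate key."""
    return xor(backup["backup_key"], answers_key(answers, backup["backup_salt"]))
```

Note that with plain xor, wrong answers do not raise an error; they silently produce a wrong key, so the caller needs some other way to check the result (for example, trying to decrypt a known value).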

Polynomial
  • This just boils down to "ask the user for two different passwords, and hope he doesn't forget both". – CodesInChaos Feb 03 '13 at 11:50
  • Your crypto scheme is pretty weird too. Why XOR instead of a normal authenticated encryption method? Why run the expensive KDF twice? – CodesInChaos Feb 03 '13 at 11:52
  • Sure it does, but isn't that the standard mechanism for most things? You need to know your password, and if you don't know that you need to know your secret answers. The difference is that the latter should *always* be memorable. If they forget the name of their first schoolteacher and their first pet's name, that's just too bad. Security comes at the price of some usability. – Polynomial Feb 03 '13 at 12:04
  • There's no reason to use a full encryption method if you're generating two keys of equal length. The xor operation is ideal for the situation - it's essentially a one time pad with a key generated by a KDF, so you can guarantee that the security of that operation is *at least* as strong as the KDF and therefore at least as strong as the password. You're somewhat right about running the expensive KDF twice - you could run a cheap KDF for the second option, but then you're running into a situation where offline cracking from a database dump is relatively easy. I prefer the full KDF. – Polynomial Feb 03 '13 at 12:08
  • Oh you mean that kind of "secret question". Those are only good enough to prevent DoS against recovery emails, not as keys. Using public information as secondary key sounds completely useless to me. – CodesInChaos Feb 03 '13 at 12:08
  • "but then you're running into a situation where offline cracking from a database dump is relatively easy" only if you're using a really dumb construction. Use `MasterKey = ExpensiveHash(salt, password); LoginVerificator = Hash(MasterKey, "Verify"); EncryptionKey = Hash(MasterKey, "Encryption")` – CodesInChaos Feb 03 '13 at 12:10
  • @CodesInChaos Which is why I said "The secret answers option is the easiest, but it's possible to do this with any secret value". Also, secret questions should not be *easily attainable* public information - they should be things like your first high-school crush's name, or the name of your first stuffed toy as a kid. In reality such things are very difficult to guess or discover. Plus you should always provide soft-security techniques such as requiring a reset email to go to the user with the reset link, before any questions are asked. – Polynomial Feb 03 '13 at 12:11
  • @CodesInChaos Sure, that works too. It's not really any different in terms of security, just more efficient. – Polynomial Feb 03 '13 at 12:12
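
The single-KDF construction from the comments above could look roughly like this. It is only a sketch: I'm assuming PBKDF2-HMAC-SHA256 for `ExpensiveHash` and HMAC-SHA256 for the cheap `Hash`, neither of which the comment specifies, and `derive_keys` is a made-up name.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def derive_keys(password: str, salt: bytes, iterations: int = 200_000) -> Tuple[bytes, bytes]:
    """Run the expensive KDF once, then split the result into two independent keys."""
    # MasterKey = ExpensiveHash(salt, password)
    master = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # LoginVerificator = Hash(MasterKey, "Verify")
    verifier = hmac.new(master, b"Verify", hashlib.sha256).digest()
    # EncryptionKey = Hash(MasterKey, "Encryption")
    encryption_key = hmac.new(master, b"Encryption", hashlib.sha256).digest()
    return verifier, encryption_key


# The server stores only the salt and the verifier; the encryption key is
# recomputed from the password at login and never written to the database.
salt = secrets.token_bytes(16)
stored_verifier, _ = derive_keys("example password", salt)
```

An attacker holding `stored_verifier` still has to pay for the expensive hash on every password guess, and cannot get from the verifier to the encryption key without the master key.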

There are two ways out of the "forgotten password" issue:

  1. Encrypt the data not with one password, but with two passwords. To make that efficient, use an intermediate key: a data file is encrypted with a random file key K, and key K is encrypted twice: once with the first password, and once with the second password. You actually want an intermediate key anyway, to support password changes without having to reencrypt all the data.

    In that scenario, the second password will be the "backup password" which the user will not forget. This kind of miracle (how will the user not forget a password he never uses, since he managed to forget the password he uses regularly?) can be achieved in several ways: the backup password could be a long sequence of characters which the user writes down on a piece of paper, stored in a safe (or his wallet); the backup password could consist of answers to "security questions" (as @Polynomial suggests); and so on.

  2. Use key escrow. A copy of the user's password, or of an intermediate key K, could be stored by a "Trusted Third Party", to be unlocked in case of emergency. The TTP would have to be adequately protected, and agree to unlock escrowed secrets only as part of an official, controlled and audited ceremony where the data owner (the user who forgot his password) proves his identity through some physical means (e.g. by coming in person and showing his driver's license). The escrow step can be performed without interacting with the TTP if asymmetric encryption is used: the TTP has an RSA key pair, the user's password is encrypted with the public key, and the TTP uses the private key to unlock the lost password.

    Whether key escrow is applicable really depends on the context and, ultimately, how much you can make the user pay for the unlocking of his precious data. In a different situation, Microsoft's BitLocker drive encryption technology (included in recent/expensive versions of Windows) supports key escrow so that the sysadmin can save the data of users who forgot their password.

    Key escrow means that the server (with the help of the TTP) can theoretically unlock the user data without his consent or even knowledge. This is good when the user data is not really his own and the user is unavailable (e.g. in an enterprise, when a user is struck by a bus, his successor must be able to read the stored business data, and the previous tenant can no longer give his password). Conversely, the much-hyped Mega site deliberately forgoes all kinds of escrow and lost password recovery precisely to ascertain, in a very legal way, that they have no way of accessing the data without the collaboration of the user.
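
To make option 1 concrete, here is a small sketch of one random key K stored in two password-protected "slots". A real system would wrap K with a standard authenticated method (for example AES-GCM or AES Key Wrap from a vetted library); since Python's standard library has no AES, this sketch substitutes a one-time pad plus an HMAC tag, both derived from PBKDF2. `wrap`, `unwrap`, the slot layout and the iteration count are all illustrative assumptions.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

KEY_LEN = 32
ITERATIONS = 200_000  # illustrative work factor (assumption)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _subkeys(password: str, salt: bytes) -> Tuple[bytes, bytes]:
    """One expensive KDF call, then two cheap, independent subkeys."""
    kek = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    pad = hmac.new(kek, b"pad", hashlib.sha256).digest()      # masks K
    mac_key = hmac.new(kek, b"mac", hashlib.sha256).digest()  # authenticates the slot
    return pad, mac_key


def wrap(file_key: bytes, password: str) -> dict:
    salt = secrets.token_bytes(16)  # fresh salt, so each pad is used only once
    pad, mac_key = _subkeys(password, salt)
    ct = xor(file_key, pad)
    tag = hmac.new(mac_key, salt + ct, hashlib.sha256).digest()
    return {"salt": salt, "ct": ct, "tag": tag}


def unwrap(slot: dict, password: str) -> bytes:
    pad, mac_key = _subkeys(password, slot["salt"])
    expected = hmac.new(mac_key, slot["salt"] + slot["ct"], hashlib.sha256).digest()
    if not hmac.compare_digest(expected, slot["tag"]):
        raise ValueError("wrong password or corrupted slot")
    return xor(slot["ct"], pad)
```

A password reset then touches only one slot: unwrap K with the backup password and wrap it again under the new main password, e.g. `slots["main"] = wrap(unwrap(slots["backup"], backup_pw), new_pw)`. The data encrypted under K is left alone.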

Thomas Pornin

Put in a somewhat less descriptive, but more conceptual way, basically you want to provide two different means to get to the same key. This is the idea of having a data key, typically unique to a record, or perhaps to all of a user's records, depending on security needs. You then encrypt that key twice: once with a key derived from their password, and once with a key derived from some other piece of information. It could be an alternate thing that they know or it could be something that your server knows. This can either be symmetric cryptography, or for greater security, the server can encrypt with a public key whose private key is not held on the server.

Then if the user ever loses their password and is thus unable to access the data key, the alternate encryption of it can be decrypted to retrieve the key. In order of security, symmetrically storing the key on the server is least secure, as if both the server and DB are compromised, the encryption is rendered useless. The user information might be a little better, but for it to be easy to remember, it will also likely be easily researchable or guessable, which is insecure even if the server isn't compromised. The asymmetric option, where the server doesn't have access to the information necessary for a reset (because it only knows how to encrypt the recovery version of the data key), is the most secure, but it is also the most difficult from a usability standpoint as the decryption then has to either be done a) by hand or b) by a secondary, highly secured server, preferably holding the private key in a TPM (trusted platform module) or some other secure hardware keystore.

Either way though, the basic principle is the same. You always make the key unrecoverable without the secret, you just store it multiple times with different secrets.

AJ Henderson