
I have some experience building PHP-based websites with MySQL access, storing encrypted user details but leaving all other fields as plain text. My most recent project will require sensitive data to be stored in the database.

I want to set up a system where a user can access his own entries and see the plain text results, but even if he was able to access someone else's they would be encrypted, unintelligible strings.

I have an idea of how to accomplish this, but perhaps it's not optimal or there exist tools to do this already in a more efficient way.

1.) Store username as plain text, and password hashed with sha1() or better hashing algorithm plus random salt etc.

2.) Take the user's password (not the hashed one, but the one he typed in) and use it to define a key specific to that username and password, which will then be stored as a session variable. Since the key is never stored anywhere except in the user's head (in the form of him knowing his plain-text password, which will then be converted to a key somehow by mixing it with his username and some salt, hashing etc., as he inputs it to log in), this should be a good solution, right?

3.) Encrypt or decrypt all of that user's data with that key.

In my opinion, even if someone gained access to the database and saw a list of plain-text usernames and hashed passwords, they couldn't figure out the specific key since it's not stored anywhere. Therefore, even if access was gained, they couldn't decipher the content of the sensitive database fields. Of course I'm building in ways to stop them accessing the database anyway, but as a catch-all effort this seems like a good set of steps.

Could the experts please comment on this, and offer some advice? Thanks a lot.

I posted this in the programmers.SE forum and they advised me that this was a better location for the question, and more specifically for the question of why I shouldn't do this myself. If I shouldn't, what alternatives are there?

Joey O
  • Are you using just sha1() on the password or are you applying a salt and a number of hashing rounds? – Lucas Kauffman Jul 16 '13 at 13:43
  • Sha1() with salt and multiple hashing. I'm also aware better algorithms exist for this purpose so will explore those for this new project where security is more important than for any other one I've done. – Joey O Jul 16 '13 at 13:49
  • Of course the first step is to prevent someone accessing the database, or a user's data, but this question was more along the lines of minimising the effect if they indeed did access things. So I'm curious about encrypting and decrypting all the data with a user specific key, coming from some username, password combination or similar, and the possible shortfalls or vulnerabilities of doing so? – Joey O Jul 16 '13 at 13:51
  • PBKDF2 compliant algorithms are normally alright, apart from that you also have scrypt and bcrypt. – Lucas Kauffman Jul 16 '13 at 13:52

2 Answers


What you are talking about is a derived key. The technique you describe is frequently used for high-security systems, though I would further recommend that you use the derived key to encrypt a data key rather than the data itself. This allows the data key to be stored under more than one derived key and limits the use of each derived key, which in turn allows for data sharing, keyrings and account recovery. Also, keep in mind that the derived key itself is only as strong as the password, so salting and slow key derivation functions should still be used to ensure that password-derived keys can't be computed for all the users' passwords at once.
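As a rough illustration of a salted, slow key derivation, here is a minimal sketch in Python using PBKDF2 from the standard library (PHP's `hash_pbkdf2` plays a similar role). The function name `derive_key`, the iteration count and the example password are assumptions made for this sketch, not values from the question.

```python
import hashlib
import secrets

# Assumed work factor for illustration; tune it to your own hardware.
PBKDF2_ITERATIONS = 200_000


def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte key from the typed password and a per-user salt."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS, dklen=32
    )


# Each user gets a random salt, stored in plain text next to the username.
salt_alice = secrets.token_bytes(16)
salt_bob = secrets.token_bytes(16)

# The same password yields unrelated keys under different salts, so an
# attacker cannot precompute derived keys for all users at once.
key_alice = derive_key("correct horse battery", salt_alice)
key_bob = derive_key("correct horse battery", salt_bob)
```

The salt is not secret; its job is only to force the attacker to repeat the slow derivation separately for every account.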

Derived keys tend to be weaker than truly random keys, so you want to encrypt the minimum amount possible with them. If you encrypt only the user's data key with it, you can decrypt the user's data key and then store it in the session. That truly random key can then be used for encrypting and decrypting the user's data.
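The data-key arrangement could look roughly like the sketch below. To keep it runnable with only the Python standard library, the wrapping step uses a stand-in construction (an HMAC-SHA256 keystream plus an HMAC tag); that is an assumption of this sketch, and real code should use an authenticated cipher such as AES-GCM from a vetted library instead. The names `wrap_key`, `unwrap_key`, `derive_key` and `recovery_key` are hypothetical.

```python
import hashlib
import hmac
import secrets


def derive_key(password: str, salt: bytes) -> bytes:
    """Password-derived key (PBKDF2; the iteration count is an assumed value)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 200_000, dklen=32)


def _subkey(kek: bytes, label: bytes) -> bytes:
    """Separate subkeys for encryption and authentication."""
    return hmac.new(kek, label, hashlib.sha256).digest()


def wrap_key(kek: bytes, data_key: bytes) -> bytes:
    """Stand-in key wrap: returns nonce (16) + masked key (32) + tag (32)."""
    if len(data_key) != 32:
        raise ValueError("this sketch only wraps 32-byte keys")
    nonce = secrets.token_bytes(16)
    stream = hmac.new(_subkey(kek, b"enc"), nonce, hashlib.sha256).digest()
    body = bytes(a ^ b for a, b in zip(data_key, stream))
    tag = hmac.new(_subkey(kek, b"mac"), nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def unwrap_key(kek: bytes, blob: bytes) -> bytes:
    """Verify the tag, then unmask; raises ValueError for a wrong key."""
    if len(blob) != 80:
        raise ValueError("malformed blob")
    nonce, body, tag = blob[:16], blob[16:48], blob[48:]
    expected = hmac.new(_subkey(kek, b"mac"), nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or corrupted blob")
    stream = hmac.new(_subkey(kek, b"enc"), nonce, hashlib.sha256).digest()
    return bytes(a ^ b for a, b in zip(body, stream))


# Registration: a truly random data key, stored only in wrapped form.
salt = secrets.token_bytes(16)
data_key = secrets.token_bytes(32)
wrapped = wrap_key(derive_key("a long passphrase", salt), data_key)

# The same data key can also be stored under a second key, e.g. for recovery.
recovery_key = secrets.token_bytes(32)
wrapped_for_recovery = wrap_key(recovery_key, data_key)

# Login: re-derive from the typed password and unwrap into the session.
recovered = unwrap_key(derive_key("a long passphrase", salt), wrapped)
```

Because unwrapping fails for a wrong password, a successful unwrap can double as the password check, and changing the password only requires re-wrapping the 32-byte data key rather than re-encrypting every record.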

A further enhancement is to associate keys with records and then encrypt each record key with the user key in order to give the user access. If you use asymmetric cryptography here, it would be possible for a user to grant record-level access to another user by encrypting the record key with the public key of the user they wish to share with and adding it to the other user's keyring.

Similarly, for account recovery, the user keys could be encrypted with a public recovery key. The private key could be held offline or associated to an admin user (with similar derived key protection) and only used by customer service when necessary to restore an account.

AJ Henderson
  • Oh, one other thought, if you use a derived key, you don't have to actually store a hash of the password either. Just have it derive the key, then use the user key that it decrypts to encrypt a login string of some type. If it encrypts properly, they were able to decrypt the user key which means they have the derived key and thus the password is valid. – AJ Henderson Jul 16 '13 at 14:49
  • thanks a lot for this detailed explanation. So you're suggesting to assign each user a randomly generated key, which is used to encrypt and decrypt most of their data. This key itself is encrypted and decrypted with another key which is derived from their password, meaning it is also sufficient to log them in (check for valid password). Why is it that using a derived key to decrypt a random key which can decrypt sensitive data is safer than just using a derived key for the same purpose? If someone had the derived key they could obtain the random key and get the data anyway? – Joey O Jul 16 '13 at 15:23
  • It has to do with the methods of trying to crack a key. Derived keys by definition are less secure than a random key because they have less entropy and the key itself has to follow certain rules on how it is derived. It is more likely that an attack could be found that could use statistical analysis to reveal portions of the key due to the added complexity. If you limit the data encrypted and decrypted with it, you also limit the surface area for statistical analysis. It might or might not actually be a gain (since such attacks might not exist) but it's a fairly cheap step. – AJ Henderson Jul 16 '13 at 15:32
  • It also allows account recovery since if the password is lost, the user key may still be recoverable if it is encrypted to another derived key. (Since the user's data key can be shared without needing the user's password.) So the net effect is that it is more versatile and potentially more secure against particular types of attacks. – AJ Henderson Jul 16 '13 at 15:33
  • @JoeyO - note that good selection of a key derivation function is still important as well, just like good hash selection is. You don't want someone to be able to brute force the keygen easily, so salts are still advisable. – AJ Henderson Jul 16 '13 at 15:38
  • God. Came to that question because I had exactly the same idea as the OP. Your answer is f***ing great! – Áxel Costas Pena Dec 25 '13 at 22:04

use it to define a key (…), which will then be stored as a session variable. Since the key is never stored anywhere except in the user's head (…)

Although only temporarily, the key is being stored on the server (e.g. a basic session handler uses files in /tmp, and a memory dump of memcached could reveal keys even after they have expired). You should check how the server keeps sessions to verify that it doesn't (easily) give away those keys.

You may want to re-encrypt that value-stored-in-the-session with a random cookie value. Or not store it at all.
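One simple way to read "re-encrypt with a random cookie value" is to split the key: the server-side session holds only the key XORed with a random value of the same length, and that random value lives only in the user's cookie. The sketch below assumes this interpretation; the function names `split_for_session` and `recover_key` are hypothetical, and the cookie would still need the usual protections (HTTPS only, not readable by scripts).

```python
import secrets


def split_for_session(key: bytes) -> tuple:
    """Return (cookie_value, session_value); neither half alone reveals the key."""
    cookie_value = secrets.token_bytes(len(key))  # fresh random pad per login
    session_value = bytes(a ^ b for a, b in zip(key, cookie_value))
    return cookie_value, session_value


def recover_key(cookie_value: bytes, session_value: bytes) -> bytes:
    """Recombine the two halves on each request."""
    if len(cookie_value) != len(session_value):
        raise ValueError("length mismatch between cookie and session values")
    return bytes(a ^ b for a, b in zip(cookie_value, session_value))


# At login: the key goes neither into the session nor into the cookie as-is.
user_key = secrets.token_bytes(32)
cookie_value, session_value = split_for_session(user_key)

# On a later request: the browser sends the cookie, the server adds its half.
restored = recover_key(cookie_value, session_value)
```

With this split, a leaked session file or cache dump on its own contains only random-looking bytes; the attacker would also need the matching cookie from the user's browser.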

Bonus points if you perform all encryption on the client side (using JavaScript). Then the server never sees your data, and thus neither can an attacker who compromises it.

(Note that the attacker could still replace the legitimate code with malicious code and wait for the user to access the site again in order to steal his data. This was already possible before.)

Ángel