22

I'm developing a web service that stores sensitive personal information (such as telephone numbers, addresses, full names, and email addresses) in a database, such that it can be accessed from anywhere with an Internet connection.

As part of this, it's obviously a good idea to encrypt that data to add security in case the database is hacked. It has to be encryption rather than hashing because the data has to be accessed again.

I would also like to do this encryption such that each user has their own decryption key. This means that even administrators can't access the data, which also protects against internal corruption.

Given that I've already got user accounts, is it acceptable to use the user's password as the encryption key? That leaves me with a process like this (roughly sketched in code after the list):

  • User signs up. Hash their password, as normal.
  • User creates some entries and saves them.
  • User signs out. Ask for their password, check it against the stored hash to verify it, then use it to encrypt everything they have in the database.
  • User signs in. Verify their password, then use it to decrypt everything they have.
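
A minimal sketch of that flow, only to make the steps concrete. It assumes a made-up account record and the third-party Python `cryptography` package for the encryption itself; the answers below explain why a KDF plus a separate random data key is the better design.

```python
import base64
import hashlib
import hmac
import os

from cryptography.fernet import Fernet, InvalidToken


def signup(password: str) -> dict:
    """Step 1: store a salted hash of the password, as normal."""
    salt = os.urandom(16)
    pw_hash = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return {"salt": salt, "pw_hash": pw_hash}


def _key_from_password(password: str, salt: bytes) -> bytes:
    # Fernet expects a urlsafe-base64-encoded 32-byte key.
    raw = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000, dklen=32)
    return base64.urlsafe_b64encode(raw)


def sign_out(account: dict, password: str, rows: list[bytes]) -> list[bytes]:
    """Step 3: re-check the password against the stored hash, then encrypt everything."""
    check = hashlib.pbkdf2_hmac("sha256", password.encode(), account["salt"], 600_000)
    if not hmac.compare_digest(check, account["pw_hash"]):
        raise ValueError("wrong password")
    f = Fernet(_key_from_password(password, account["salt"]))
    return [f.encrypt(row) for row in rows]


def sign_in(account: dict, password: str, rows: list[bytes]) -> list[bytes]:
    """Step 4: verify the password by decrypting everything for the session."""
    f = Fernet(_key_from_password(password, account["salt"]))
    try:
        return [f.decrypt(row) for row in rows]
    except InvalidToken:
        raise ValueError("wrong password or tampered data")
```

Note how steps 3 and 4 put the raw password on the data path every time, which is exactly what the comments and answers below push back on.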

Obviously this isn't the best possible security - but are there any major flaws with this method?

ArtOfCode
  • 2
    One big issue is that it requires transmission of the raw password. At the very least you should pre-hash the password before sending it to the server. Also, if the server is compromised, cracking the password gets the bonus data in the form of personal information, making further attacks on that person easy. – AstroDan Mar 25 '16 at 18:19
  • 8
    Will your users get mad if they skip step 3 and their data doesn't *actually* get saved? –  Mar 25 '16 at 18:19
  • @drewbenn In that case, their data would be saved but not encrypted. Which I guess one could get round by implementing idle-time-based sign outs. – ArtOfCode Mar 25 '16 at 18:23
  • 5
    In that case, will your users get mad if they think their data is stored encrypted and *it actually isn't*? –  Mar 25 '16 at 18:26
  • 4
    @AstroDan From a data protection perspective, pre-hashing the password doesn't buy you anything. It's the data that is valuable, not the password. Pre-hashing just means that it's the hash of the password that becomes the target, not the password itself. And it's moot for a web app in any case, where the code could be changed to send passwords directly to the server at any time. – Xander Mar 25 '16 at 18:27
  • 1
    @drewbenn That's the flaw in the system that I can see. Data *is* stored encrypted, but is stored decrypted while they're logged in. – ArtOfCode Mar 25 '16 at 18:27
  • 5
    @ArtOfCode well technically it's stored decrypted *until they explicitly encrypt it*. Which might be a bad UI decision. You can leave a page without logging out, for example if you move out of range of a cell tower, or you hibernate your computer, or close your browser or it crashes, or your library/lab computer session times out. –  Mar 25 '16 at 18:29
  • @ArtOfCode I wouldn't call it a flaw. It's simply a reality of the architecture. If you want to have access to data, you have to trust the components that are allowing you to access that data, whether a cloud service or the PC on your desk. – Xander Mar 25 '16 at 18:30
  • @drewbenn You can leave a page, yes, but if you're automatically logged out after 15 minutes of inactivity, then data will be cleartext for a maximum of your session plus 15 minutes. – ArtOfCode Mar 25 '16 at 18:31
  • 1
    @ArtOfCode but you said the user has to re-enter their password in order to encrypt the data. –  Mar 25 '16 at 18:31
  • 1
    @drewbenn Ah, true. Yes, that presents a more pressing problem, then. – ArtOfCode Mar 25 '16 at 18:32
  • @Xander Very true. However the plain text of the password is more valuable due to password reuse. – AstroDan Mar 25 '16 at 18:36

4 Answers

40

If you do the encryption and decryption on the server side, there is always a chance for an administrator to decrypt the data without the user's knowledge, by modifying the system so that when a user legitimately decrypts his or her data, the decryption key is stored for later use by other interested parties, for instance in the service of warrants.

That said, the scheme you describe is generally along the lines of how systems like this are in fact built, and in normal use they are reasonably secure. There are a couple of caveats, or clarifications, however.

  1. You would not use the password directly as a key. You would use a PBKDF such as bcrypt, scrypt, or PBKDF2 to turn the password into a strong random key using as high a work factor as makes sense for your application.

  2. You would generally not use this key to encrypt the data directly. You would generate a strong random key on the server that will be the actual data encryption key (DEK). The key derived from the password will then be used as a key-encryption-key (KEK) to encrypt the DEK. This way, if a user decides to change their password, you generate a new KEK and only have to re-encrypt the DEK, not all of the data. (See the sketch after this list.)

  3. With this system, you're not actually required to store a password hash at all. When a user logs in, you merely need to derive the KEK, decrypt the DEK, and determine if it can correctly decrypt data for the user. If it can, the password is correct. If not, it isn't, and you can fail the authentication attempt. This may or may not be desirable depending on the application.
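
A minimal sketch of points 1-3, assuming the third-party Python `cryptography` package; the record layout and function names here are illustrative only, not a prescribed API.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.scrypt import Scrypt


def derive_kek(password: str, salt: bytes) -> bytes:
    # Point 1: password -> key via a PBKDF (scrypt here), never the raw password.
    return Scrypt(salt=salt, length=32, n=2**15, r=8, p=1).derive(password.encode())


def new_account(password: str) -> dict:
    # Point 2: a strong random data-encryption key (DEK), wrapped by the
    # key-encryption key (KEK) derived from the password.
    salt, nonce = os.urandom(16), os.urandom(12)
    dek = AESGCM.generate_key(bit_length=256)
    wrapped_dek = AESGCM(derive_kek(password, salt)).encrypt(nonce, dek, b"dek-wrap")
    return {"salt": salt, "nonce": nonce, "wrapped_dek": wrapped_dek}


def unlock(account: dict, password: str) -> bytes:
    # Point 3: if authenticated decryption of the DEK succeeds, the password was
    # correct, so no separate password hash is strictly required.
    kek = derive_kek(password, account["salt"])
    try:
        return AESGCM(kek).decrypt(account["nonce"], account["wrapped_dek"], b"dek-wrap")
    except InvalidTag:
        raise ValueError("wrong password")
```

The user's data would then be encrypted under the DEK (for example with another AESGCM instance), and a password change only re-wraps the DEK under a new KEK.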

Xander
  • 3
    What happens if the user forgets their password? Even if the user can then prove their identity, is the existing encrypted data lost forever? – Boon Mar 26 '16 at 04:13
  • 11
    @Boon The same thing happens when you encrypt data yourself and then forget either the key or the password to the key. Such is life. – Omniwombat Mar 26 '16 at 05:01
  • 5
    @drewbenn No, I am not. I am just worried that most users might not be aware of that consequence. It is quite hard to explain that to the user. Too often, users forget their password and expect that just by proving their identity, we can reset things back to normal. – Boon Mar 26 '16 at 13:05
  • @Omniwombat Perhaps what you said is what we have to say to the user to explain the situation. – Boon Mar 26 '16 at 13:05
  • 3
    @Boon There are several ways you can deal with the lost password scenario. The safest, and a relatively common way is to simply state clearly up front to the users "If you lose your password, your data cannot be recovered." 1Password works this way, and they're explicit about the fact. Additionally, you could also encrypt the DEK twice, once with the user's KEK generated from the password, and the second time with an alternate master key that could be saved by the user as a recovery key, or kept by a trusted third party in escrow, for instance. – Xander Mar 28 '16 at 18:04
17

This has (at least) two big flaws:

  • users can't easily change their passwords
  • if the data leaks, it's protected by a likely weak password

Better is to use a full-strength encryption key generated randomly, and encrypt that with the user's password. Users can change their password easily (you only have to decrypt/reëncrypt one thing), and if encrypted data gets in the hands of attackers, it's protected by 128 or 256 bits of entropy.
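
For instance, a password change then only re-wraps the single random key. This is a sketch under the assumption that the key is wrapped with a password-derived key via AES-GCM, using the third-party Python `cryptography` package; the record and field names are made up.

```python
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC


def _pw_key(password: str, salt: bytes) -> bytes:
    # Password -> 32-byte wrapping key (PBKDF2 here; bcrypt/scrypt also work).
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt, iterations=600_000)
    return kdf.derive(password.encode())


def change_password(record: dict, old_pw: str, new_pw: str) -> dict:
    """Re-wrap the one random data key; the bulk encrypted data is never touched."""
    data_key = AESGCM(_pw_key(old_pw, record["salt"])).decrypt(
        record["nonce"], record["wrapped_key"], None)  # raises InvalidTag if old_pw is wrong
    new_salt, new_nonce = os.urandom(16), os.urandom(12)
    wrapped = AESGCM(_pw_key(new_pw, new_salt)).encrypt(new_nonce, data_key, None)
    return {"salt": new_salt, "nonce": new_nonce, "wrapped_key": wrapped}
```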

Stephen Touset
  • 1
    As the question is worded, password changing is not a problem, as _all_ data gets decrypted on logon and encrypted on logout. A password change in between would not be a problem in this case. Also, using a random key and encrypting that with the user password only moves the problem a layer upwards. Better to ask the user for sound pass phrases to begin with. – Tobi Nary Mar 25 '16 at 18:22
  • 3
    Expecting (or worse, trying to force) users to provide high-entropy passphrases is an exercise doomed to failure. – Stephen Touset Mar 25 '16 at 18:30
  • 1
    I fail to see how this proposed solution is any better. If the attacker gets their hands on the encrypted data, they'll probably have their hands on the ciphertext that contains the encryption of the full-strength key with the user's password, and we're back right where we started. Your construction does allow users to easily change their passwords, but that's all it does -- it doesn't meaningfully increase security against dictionary attacks on the password. – D.W. Mar 27 '16 at 17:50
  • 1
    If one uses a separate datastore for the keys than they do for the data, I think it provides a meaningful improvement. – Stephen Touset Mar 28 '16 at 02:55
3

You're right to be paying careful attention to client login/authentication/personal data systems, but your initial post only considers users' private encrypted data in terms of encryption techniques, not in terms of how it is then accessed and stored. I'd especially highlight the reliance on approaches likely to have a single point of failure in the pathway, or a general hope that your software/systems will be "well enough set up". Long term that's not a very safe assumption, although it's very commonly made "by default". Assume this instead:

  • Almost any widely used software will regularly have new vulnerabilities discovered. Your chosen software/DBs/OSes/web-facing services too.
  • Many businesses will be targeted. Yours too.

So "layering" security matters here as much as anywhere. That means don't make the encryption become a "bottleneck" able to take your users' data security with it. Don't trust just a one-layer system (of encryption or any other kind) to keep it secure. Work with the mental view that at some point your DB and the encrypt/decrypt pathway or certificates will turn vulnerable at the same time, allowing attackers or insiders to read stored private user data if there is no "further hurdle" standing in the way, and face that threat directly as well.

So here is a further mix of ideas for security measures to design in, on top of everything else. I've described it in terms of authentication/login for ease, but the same principle holds true for all the data you store for clients if it's sensitive:

Split the encrypted login data for each user account into two or more parts and separate them. For example, put the encrypted passwords or user data on one SQL DB server, but ensure that the individualised random salts needed to decrypt them are held on a second, independent SQL DB server. Or, once encrypted, store the individualised per-account decryption keys in two halves on two separated DB servers.

Basically it forces a person to break into and exfiltrate from two servers, not one. They also have to leave two audit trails, not one.
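
As a toy illustration of the split-key idea, here is an XOR split rather than literally cutting the key in half (either literal half would leak key bits on its own); storing each share on its own server is left as a placeholder.

```python
import secrets


def split_key(key: bytes) -> tuple[bytes, bytes]:
    # Share A is pure randomness; share B is key XOR share A.
    # Either share alone is statistically independent of the key.
    share_a = secrets.token_bytes(len(key))
    share_b = bytes(x ^ y for x, y in zip(key, share_a))
    return share_a, share_b  # store each on a separate, independently secured server


def recombine(share_a: bytes, share_b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(share_a, share_b))


key = secrets.token_bytes(32)
a, b = split_key(key)
assert recombine(a, b) == key  # an attacker needs both servers (and leaves both audit trails)
```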

Make sure that a person gaining access to one doesn't trivially also gain access to the other (different DB engines/OS flavours to mitigate RDBMS/OS vulnerabilities, and perhaps different login credentials or sites if it's viable, to make it harder for a single individual to gain physical/logical access without flagging an alert).

The future "person gaining access" could as easily be an insider as an outsider, or pivot using other accessible devices on your networks or that you and your staff own (staff working at home for example), so make it hard for insiders to copy/skim/replicate the DB other than in controlled scripted ways.

Then look at access control for those DB servers. The public-facing web service servers, probably on the same LAN as most staff, are perhaps higher risk, but the authentication/client RDBMS servers only need maintenance access by very few staff and are dedicated to just storing the client data, making any abuse/exception stand out much more sharply against the lower "background chat" in the logs. So perhaps the servers holding sensitive data might be secured by watching for unusual access patterns; by being on better-hardened or specialist-hosted separate networks run by a reputable business (not yours, in case it's an insider); by only being responsive in a very limited way to the outside, for example restricting the IPs they respond to, to your own IPs for the servers providing the public-facing web service plus a dedicated console used only for maintenance of the client-login authentication DB server; or by only responding to very limited and strictly formatted plaintext requests that are translated into DB calls locally (to prevent SQL or other common API abuses).

Then look at the rest of the pathway in the same way. Where can an attacker sit in the wider system that lets them grab data or enhance their access in meaningful ways (or gain access to data that becomes meaningful once combined with other data they get)? Who actually has to be able to sit there, and can the inherent scope for harm be reasonably limited?

For example, very few people need access to a login/authentication server, or private keys/certs. Physical and logical access may only be needed by a handful of people/devices as well.

(If you can't do "end to end" encryption fully, perhaps you can still, to an extent, "black box" that side of things. For example, maybe the servers doing the actual encrypt/decrypt are only accessible one way, the DB servers holding encryption keys/data-at-rest are only accessible another way, and each of these only has very limited and well-defined access needs for maintenance and only communicates with specific secured non-internet IPs and (for user data processing) via a very limited data API. Your web service treats these as a black box and receives back encrypted data (end to end encryption from its perspective), and other than these very few points, your other systems or users only ever see encrypted user data.)

Don't forget to pay a bit of attention to routine risks as well (backups, power/hardware/connectivity issues, sudden loss of key people).

Put these together and it starts to get a bit harder for things to go "so catastrophically wrong" as to cause serious business damage.

What I am hoping to convey is, don't just think of encryption as your "solve-it-all". Look at the pathways that would allow an insider or outsider to get user data either when it's stored/modified, or retrieved/used, or in transit. Consider what can be done to strip those pathways down to their minimum (less is easier to secure and to notice patterns). Consider what you can do to make it hard to get the data they want from any other point than a very few points, and how to secure those very few points. Consider how to better detect bulk downloads or slow exfiltration. Layer it, and don't trust single points of security failure any more than you would single points of hardware failure. It won't be perfect but it's an approach that will pay off.

This kind of thinking has the potential benefit that even if your DB server is penetrated or internal encryption/decryption credentials accessed, perhaps useful user-data in bulk still can't be obtained or exfiltrated so easily. It adds a hurdle, and conveniently it doesn't need to be a very expensive or complicated one. It's almost the principle of it.

Stilez
2

What you describe seems reasonable, as long as:

  • The pass phrases of the users are good (cf. "correct horse battery staple")
  • You are using a key derivation function to derive the encryption key from the pass phrase in a sound way (cf. PBKDF2; a sketch follows this list)
  • You dispose of the key as soon as possible
  • You stop any other process from copying the keys or the decrypted data (backups? bad access rights? anything else?) as long as the user is logged in.
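
A minimal sketch of the key-derivation and disposal points, standard library only; note that Python cannot guarantee key material is actually wiped from memory, so disposal here is best-effort.

```python
import hashlib
import os


def with_derived_key(passphrase: str, salt: bytes, use_key) -> None:
    # Derive the encryption key from the pass phrase with PBKDF2 (a sound KDF).
    key = hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, 600_000, dklen=32)
    try:
        use_key(key)  # encrypt/decrypt only while the user is logged in
    finally:
        del key  # drop the reference as soon as possible (best effort in Python)


salt = os.urandom(16)
with_derived_key("correct horse battery staple", salt, lambda k: print(len(k), "byte key"))
```
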
Tobi Nary