Least insecure way to encrypt a field in the database so that it can still be included in queries

Question

I have a table where all columns need to be encrypted (aside from the PK). This will be achieved using AES at the application level. However there is one particular field that needs to remain searchable.

For example a Message table has:

MessageTitle (Encrypted)
MessageBody (Encrypted)
SenderID (Encrypted but matchable)

Given a plainText SenderID I should be able to select all the related messages.

A few constraints / decisions:

I do need to encrypt the data at rest, including this column. I have no choice.
The SenderID is of a fixed format, so if I hashed the value it wouldn't be hard to brute force
I have chosen to encrypt specific fields rather than the whole database (TDE) to mitigate threats other than just the disks being stolen.

Currently my preferred solution is to encrypt the SenderID (and only the SenderID) with a fixed IV. This way the data remains encrypted but we can still match on it in queries.

A downside to this approach is that plaintext attacks could occur on the data; DBAs could create records with different SenderIDs and try to find a matching record. However I feel this can be mitigated at the Encryption Service with suitable access control and monitoring.

Does this seem like a valid solution?

Isn't "encrypting the SenderID with a fixed IV" the same thing as hashing the field ? Well, except for the "revertible" part, of course, but mostly regarding the security properties, it seems to me that you'd be better with a proper crypto hash function — Stephane, Jul 16 '13 at 12:31
What do you mean by "searchable" exactly ? Do you mean you need to be able to find an exact match from a given input (cleartext, I assume) or do you need to be able to get something else out of it (order, partial matching) — Stephane, Jul 16 '13 at 12:33
If the SenderID was plaintext you would know that a message had been sent by a particular user even if you couldn't read the content. For our threat model this is unacceptable data leakage. — Dan Rowlands, Jul 16 '13 at 12:36
@Stephane, yes sorry for the confusion. I mean the field needs to be matchable. No sorting, ordering or partial searches required. And yes, we'll have the SenderID in plaintext to execute the search. — Dan Rowlands, Jul 16 '13 at 12:39
@Stephane, regarding the fixed IV, I assumed that the encryption would be more secure than hashing because there is a secret key component. An attacker with the data base could generate hashes of all possible SenderIDs (as it's a fixed format). Whereas even if they had the data dump they wouldn't know the encryption key. — Dan Rowlands, Jul 16 '13 at 12:43
What is the risk in the SenderID being exposed? Easier to give answers if the threat model is known. — David, Jul 17 '13 at 03:10
@David It's possible for SenderID to be traced back to a real person. So if you imagine that it's a person's name (it's not but just as an example) and that name is present in a table called Bankrupcy or PreviousArrests etc. You may not be able to read all the content but the very fact they exist in the table leaks information. — Dan Rowlands, Jul 17 '13 at 09:28
Similar question: [Encrypt DB Fields preserving search functionality](https://security.stackexchange.com/questions/105806/encrypt-db-fields-preserving-search-functionality) — sleske, Mar 10 '22 at 13:34

Stephane · Accepted Answer · 2013-07-16T14:06:18.000

Given what you've told us, I would still use a hash function: that's exactly what they have been designed for.

You cannot salt your hash value, unfortunately, since you want to be able to find a specific record based only on the cleartext value but you can still used a HMAC algorythm to protect against direct attacks: an attacker will need to have access to the key used in the HMAC computation in order to derive a hash from a given cleartext: assuming you're storing your keys properly (probably the most critical issue in such an application), your SenderID will remain hidden but you'll be able to find it at any time by a simple lookup.

Using a HMAC will also help you if your SenderID has low entropy (although it will not solve THAT issue completely).

Please note that anyone who has (or can gain) access to the key used in your HMAC calculation will be able to try to match given cleartext with your data but that is something that cannot be helped, I think.

Be aware, though, that, while you can (and should) have a different key on each installation, you will NOT be able to change that key afterward or you will lose the ability to find your records. One way around this is to add an encrpyted version of the SenderID within your record: in you need to re-key everything, you can use it to recompute a new hash.

I would, however, suggest that you use a strong, modern hash function for your HMAC. (SHA-3, for instance, even if the final specs haven't been published yet)

Thanks Stephane this is an interesting, I briefly considered HMAC but didn't put much time into researching the option. I'll take another look. — Dan Rowlands, Jul 16 '13 at 13:53
This does look like a good solution, as you say it was designed for this purpose and is obviously better than simple hashing. Could you provide any information on why the HMAC approach is more secure than a Fixed IV encryption approach? — Dan Rowlands, Jul 17 '13 at 08:50
First, a hash/HMAC is supposed to be irreversible. That means that you can actually leak the hash without putting the data it was derived from at risk. Second, you're using standard algorythm in a standard way instead or trying to reinvent your own system: your not placing yourself in a situation where you're introducing new risks because you do not fully understand the consequences of what you're doing (which is always the risk with encryption and security systems). — Stephane, Jul 17 '13 at 09:16
This is the solution I'm going to use. Thanks for your help. — Dan Rowlands, Jul 18 '13 at 08:42

score 1 · Answer 2 · answered Jul 16 '13 at 12:37

1

If it's a hash you can compute the hash and search using that value since the same phrase will make the same hash value. You may not necessarily need to salt this hash since I'm assuming the SenderID is unique but salting is never a bad idea. Also why have just a senderID what about password? If you need to make the message body searchable you would have to get the encrypted contents back, decrypt it then look for a match.

For example the word bob will always compute to 9f9d51bc70ef21ca5c14f307980a29d8 using md5 so if the senderID happens to be bob the user will input bob. Your code will hash that string and get 9f9d51bc70ef21ca5c14f307980a29d8 then select * from email where senderid='9f9d51bc70ef21ca5c14f307980a29d8'

answered Jul 16 '13 at 12:37

Four_0h_Three

1,245
2
8
13

My concern with this method, is that the SenderID follows a fixed format, so I don't imagine it would take very long to brute force the values even with a "slow" hash. – Dan Rowlands Jul 16 '13 at 12:46
In this case SenderID just identifies a person. It's entirely separate from username/password authentication. – Dan Rowlands Jul 16 '13 at 12:47
@DanRowlands the amount of time it will take to break the hash will depend on the value used. If you are just using integers then it will take no time at all. If you salt those integers it may take longer. If you really want to ratchet up the entropy use a GUID rather than an integer. – Four_0h_Three Jul 16 '13 at 12:57
Unfortunately in my real world problem the SenderID equivalent is a 10 digit integer. – Dan Rowlands Jul 16 '13 at 13:01
That is a problem. I would salt it then. Make a table where you can look up a salt by senderID then mix the salt and senderID to get the salted hash. – Four_0h_Three Jul 16 '13 at 13:05
He cannot salt it. If he does, then in order to find a given record, he'll need to recalculate a hash for each record using the specific salt and compate that with the sender ID: that is NOT going to work on the large scale and that makes him very vulnerable to a DOS (since for each "logon" attempt, he'll have to compute, on the average, n/2 hashes, n being the number of row concerned). – Stephane Jul 16 '13 at 13:12
He's already going to have to compute the hash each time anyway. If you have one table that's simply senderID and salt and you index senderID that query will take no time at all. If adding in one simple query will DOS your system then there's a problem with your setup. – Four_0h_Three Jul 16 '13 at 13:15
If I understand salts correctly, they exist to prevent pre-computed lists of hashes being used to brute force a hashed value. However considering we only have a 10 digit integer and the salt would be essentially stored with the hash, would it make any real difference in how long it would take to run through all the combinations? – Dan Rowlands Jul 16 '13 at 13:24
If he uses a second table to hold the SenderID in clear text, then he's not following his own requirement. What you're suggesting would simply make it harder to match a given SenderID with a given data record – Stephane Jul 16 '13 at 13:24
@DanRowlands The proper use of a salt value is to create a different salt value for each record, store it (in clear) with the record and the result of a HMAC computation. That means that an attacker would have to create a new table for each record he wants to break. You cannot, unfortunately, do that so using a secret, static, constant key instead is the next best solution (or, in your words "less insecure way") – Stephane Jul 16 '13 at 13:27
@Stephane I see what you mean, you're right. Store the salt with the record and hash the salt with the senderID value. Storing the senderID in plaintext would be dumb :) – Four_0h_Three Jul 16 '13 at 13:34
@PsudeoReality If you do that, you're back to the state where every time you try to find the correct record for a given SenderID, you have to walk through each record and compute the salted hash result. – Stephane Jul 16 '13 at 13:36

Least insecure way to encrypt a field in the database so that it can still be included in queries

2 Answers2

Linked