Encrypt DB Fields preserving search functionality

Question

I would like to protect sensitive information stored on a Postgresql database, but I would not like to encrypt it all. The idea is to encrypt only those fields which hold sensitive information. I was thinking on using AES 256 to store the data, and I've read some posts here with some insights and recommendations about this topic.

The problem is, AES with CBC generates a different ciphertext for the same plaintext (provided the IV is different everytime) so I lose the ability to search on those fields. One of those fields is the person ID number, which is used on our webapp to search for a person's data. What I would like to implement in our app is that when the request comes from a user to search for a person with id '123', the app encrypts 123 and searches for the encrypted value (hex-encoded) in the database. Note that the ID field is not a primary key in the table.

However I read that providing the same IV for AES is not a good idea. Is there any cipher suite I can use that produces the same ciphertext given the same plaintext, and offers a good level of security against cracking? Or is it OK to use AES 256 CBC with the same IV to produce the same ciphertext in this scenario?

One of the posts I read suggested adding a new field to the table to store the hash of the plaintext for searching. While this is a good idea, I prefer to keep my tables without changes because we want to use the same persistence beans for encrypted and unencrypted databases.

What I'm trying to prevent is DBAs seeing sensitive person data performing selects on tables (the database is not under our control).

Any advice is greatly appreciated!

References:

Hashing doesn't help. Either you salt the hashes in which case you can't search on them or you don't salt them in which case they're easily cracked. — Neil Smithline, Nov 17 '15 at 22:53
What you're looking for is called [Homomorphic encryption](https://en.wikipedia.org/wiki/Homomorphic_encryption) — Stephane, Nov 17 '15 at 23:41
Similar Q on StackOverflow: [Using a constant IV with single-block encryption](https://stackoverflow.com/questions/5196421/using-a-constant-iv-with-single-block-encryption) — sleske, Mar 10 '22 at 15:23

d0nut · Answer 1 · 2015-11-17T23:45:39.560

3

I would actually not advise encrypting something like a user id. There should be practically no risk to anyone seeing it and you're really making a lot more work for yourself than necessary. An id identifying a user is not what I imagine a lot of people are going to call "sensitive information".

I'm actually having a hard time trying to think of a case of sensitive information that you would like to use a WHERE clause on in SELECT. I wouldn't never want to search against SSNs, for example.

If you wanted to compare user input against something like an SSN there's generally other information such as name and DOB that's entered alongside the SSN. I would pull all records out into an array with that match that information (name, dob) and then check any resultant entries with the SSN provided and the SSN from the entry, decrypted.

edited Nov 17 '15 at 23:45

answered Nov 17 '15 at 23:23

d0nut

876
7
13

Encrypting the person id is a requirement. We are trying to avoid people seeing who is in our database. If someone has a list of people IDs (along with contact info) and finds one in the database, he will begin to contact our clients and spam then with offers (that's why we want to encrypt name, telephone, mail and address too). However when our staff logs in our app and attends a client, the client usually gives his ID, and the app has to search information about the client in order to fill a form. – rober710 Nov 19 '15 at 16:51
@rober710 i think you're thinking of this correctly or are using the incorrect/ambiguous terminology. you can encrypt contact info but you shouldn't have to encrypt an `id`. If they have the `id` but can't read the contact info, they can't do anything, correct? – d0nut Nov 19 '15 at 16:56
Sorry, I think I didn't explain it very well. Besides hiding contact information, we also want to hide who is registered in our database. Even if a DBA cannot see the person's name or contact info, he can make a list of all our clients `(SELECT id FROM client)`, and sell that list to a marketing company or someone else. With the id, they can cross reference information with other databases and figure out who the person is. – rober710 Nov 19 '15 at 17:22
@rober710 so this is where i think the ambiguity is ensuing. When you say `id` i think you mean an auto incremented column in your table that assigned each user a number like `john smith -> 1, jane doe -> 2, ...` – d0nut Nov 19 '15 at 17:25
@rober710 can you elaborate on what the ID is exactly? How does the `id` identify a client, exactly. – d0nut Nov 19 '15 at 17:25
Oh! sorry, now I understand your confusion. What I meant by id is not the primary key of the clients table, but the ID number on a person's ID card (we don't use Social Security Numbers for a person's identification, but "ID numbers" instead). Thinking in SSNs, we would like to keep the SSN secret so DBAs cannot see which SSNs are registered in our database, while being able to search for them in a query. I wanted to get the SSN received from the client, encrypt it, encode it in hex or base64, and search for it in the database. – rober710 Nov 19 '15 at 19:37
@rober710 Most people in the DB community think of an ID as a meaningless identifier, so thus the confusion. – Steve Sether Nov 19 '15 at 20:34

Steve Sether · Accepted Answer · 2015-11-19T21:05:06.993

You can't use AES CBC mode because it would require either require you to use the same IV for each user ID, or encrypt the ID multiple times with each unique IV in the DB and do a compare. The first would defeat the purpose of CBC mode, and the second would be extremely slow.

If you really have no other choice, use AES-ECB mode, which always encrypts the same plaintext with the same ciphertext, given the same key. It's normally not recommended because it preserves patterns in your data, as seen in this answer. If your data fits in a single block size of 128 bits, this isn't much of a concern. If the ID is simply an integer, it almost certainly does fit within a single block. If the ID is a String, it may fall into 2 or more blocks depending on length and encoding (unicode for instance). More blocks would lead to the potential for further analysis of what's inside the blocks based on the patterns inside the data (like the analogy of the picture in the link). I'm not qualified enough in crypto to tell you the risks of this.

I'd heavily recommend against storing the hash of the ID plaintext anywhere, at least if you want to keep this ID secure. An ID as you describe has a very low amount of entropy. It'd be trivial to hash a billions IDs until it matched the hash, defeating the purpose of the encryption entirely.

Your scenario is probably OK, if the threat model is just protecting from rogue DBAs. Just remember that the application MUST somehow get access to the encryption key. If the application runs on the clients computer, then any skilled attacker would likely be able to gain access to it.

Thanks @Steve, seems I don't have another choice. According to other answers, AES-CBC with known or predictable IVs is worse than using AES with ECB. Also, the link you provided showed me the existence of SIV mode of encryption, which seems to be the best choice for what I want. Unfortunately, it is not widely implemented and there's no provider in Java to use it. As you said, my threat model is basically to protect the data from rogue DBAs, but I would like to protect it from everyone in the future. — rober710, Nov 24 '15 at 22:54

Encrypt DB Fields preserving search functionality

2 Answers2

Linked