Obfuscating IDs for greater security in DB?

Question

Original post: https://laracasts.com/discuss/channels/general-discussion/best-way-to-secure-healthcare-data-in-db

I have a problem dealing with highly sensitive (healthcare) data. I know about encryption and I'm encrypting some of my fields.

But what I have been told is to "obfuscate IDs" between tables.

The idea is: even if someone gets the DB, he cannot see which doctor a patient had appointments with (basic example).

But I'm googling and reading all over the place that it's not a very good practice (because it's harder to make joins, poor performance etc...).

Is obfuscating IDs necessary?

I don't know if it's a good or bad idea to do this. In general, should I ignore encrypting IDs and focus security elsewhere (i.e. not at the data model level)? — Charkhan, Nov 03 '15 at 16:00
If you're dealing with health care data then you should be following HIPAA/HITECH, which specify the legally required way to handle issues like this. Whether that's the "best way" or not, it's the one that can result in major legal action against you and your company. Thank god for democrazy! — Dave, Nov 03 '15 at 16:25
Is HIPAA worldwide or only applicable to the US? The project we're working on is targeted at GB. — Charkhan, Nov 03 '15 at 17:20

score 4 · Answer 1 · answered Nov 03 '15 at 18:39

If you are attempting to be compliant and/or avoid the breach notification rule under HIPAA in the United States, then obfuscating data relationships within your database will not help you at all.

Under HIPAA, if data is encrypted you need not notify patients/insurance providers/etc about the breach. However, encrypted means that it was encrypted using a FIPS 140-2 compliant cipher and only this encrypted data was obtained by the third party.

On a database, you can make use of tools such as transparent database encryption, or third-party encryption suites. This would mean if someone got a copy of the database, it would be FIPS 140-2 compliantly encrypted, and thus not triggering breach notifications or other consequences under HIPAA.

However, simply having a hard to understand database is covered nowhere under HIPAA, and security-by-obscurity doesn't win you any points on a CMS audit...even if you explain obscurity is a "compensating control" for a HIPAA requirement.

Also, while security typically means keeping unauthorized individuals out, the next worst thing that can happen to an organization holding PHI is losing that data. Having a DB with complex relationships makes it easy to lose that data, and the integrity of your data should be considered a security concern when dealing with HIPAA.

Note that transparent database encryption or other on-line encryption methods will do little against the online compromise of the database, e.g. through SQL injection. If someone gets a DB dump by running SELECT * statements, this data wouldn't be encrypted. For this, you'll need to secure/segment your database users and also look into best practices for securing the frontend web application.
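To make the SQL injection point concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The table, the rows, and the function names are invented for illustration and are not taken from the question. It only shows why input must be passed as bound parameters rather than spliced into the SQL text; it says nothing about how at-rest encryption is configured.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with a hypothetical patients table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO patients (id, name) VALUES (?, ?)",
        [(1, "Alice"), (2, "Bob"), (3, "Carol")],
    )
    return conn


def find_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable: the input becomes part of the SQL statement itself.
    query = "SELECT id, name FROM patients WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterized: the input is bound as a value and never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM patients WHERE name = ?", (name,)
    ).fetchall()


conn = make_db()
payload = "x' OR '1'='1"

leaked = find_unsafe(conn, payload)   # the WHERE clause is now always true
blocked = find_safe(conn, payload)    # no patient is literally named like that

print(len(leaked), "rows leaked by the concatenated query")
print(len(blocked), "rows returned by the parameterized query")
```

The application reads decrypted rows in both cases, which is why encryption at rest does not help here: the query runs with the application's own database privileges.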

In short, encrypt the database somehow; but DB encryption should only be a small part of a larger HIPAA compliance strategy. Security by obscurity, however, should not be any part of your security strategy.

Thanks for the long description! But do you know if there is a HIPAA equivalent for England? I'm not encrypting data with SQL commands but via my app, so in theory it's harder to decrypt? I didn't fully understand what you describe; I will try to look into it. — Charkhan, Nov 03 '15 at 21:51
HIPAA is a U.S. law so it does not govern healthcare data in England. However, the Data Protection Act (DPA) and the soon-to-be-adopted EU GDPR will govern data located in the UK. I am no expert in the GDPR, but it does include provisions similar to HIPAA regarding the use of encryption. — Herringbone Cat, Nov 03 '15 at 22:01

score 3 · Answer 2 · edited Mar 17 '17 at 10:46

You asked for best, let me start by qualifying my answer as best in a sense that it is my best effort to allow you to make your own choice perhaps with more confidence. Since 'best' can be quite subjective, and I am not the all powerful knower of things, I'm quite sure another best answer can be found.

I also want to quote AviD's Rule of Usability:

Security at the expense of usability comes at the expense of security.

(While there is no direct connection to this, I do want to point to it if only to make mention that if you break usability, then all is for naught)

My first thought is "Oh no, your entire DB is stolen!". I would hope that your first priority is to prevent this (I don't know if you're actually worried about this or if it's just planning for a worst case scenario). Then make every necessary effort to hide all required data. Any additional data that you can hide without a noticeable performance hit is gravy.

I've done a very small amount of HIPAA in my time and have found some rules to be perhaps intentionally vague. I assume this is to allow new techniques to be used without having to change the rules or something to that effect.

In short, if obfuscating ids is required in your situation, then you must do that (requirements are requirements), if not, then you can do that if you feel the difficulty and performance hit is tolerable. Follow the requirements first. Go above and beyond where it makes sense.

In your case, it seems as though it doesn't make sense to go to all that extra trouble only to make future maintenance that much more difficult. It might be a better use of time to focus on securing the DB itself while maintaining encryption/obfuscation only on the required fields. You can obfuscate ids all you want, but if the DB is easily stolen, then what is the point?

PS: I'm not a lawyer, nor do I play one on TV. Make sure you check HIPAA requirements yourself if you are in fact dealing with HIPAA.

Thanks for the answer! This is more of a worst-case scenario plan than anything else. There are many things to do before someone gets to steal the DB data, but I'm not well aware of the security requirements (to make it more fun: the requirements differ depending on the country...). But don't you think the solution I provided in the original post (the Optimus package) would be kind of weak for securing the data? — Charkhan, Nov 03 '15 at 15:38
I was not able to follow the link to the optimus package, but you sound quite correct that obscuring ids will still allow for a pattern recognition. If you change one id to another equivalent id then you still have matching ids. If someone accesses your entire DB, it is reasonable to believe that they may also have access to the controlling files including the methods of obfuscation and encryption. — KnightHawk, Nov 03 '15 at 16:00
Optimus could work if you used it on only one table (the patient or doctor table, for example). Then, if someone got the entire DB but not the Optimus constructor values, they could not easily make any direct connections (although comparing appointment times could perhaps be used to circumvent this entirely). It still, however, adds a layer of difficulty to building and maintaining the entire system, and the payoff is not encryption but rather security through obscurity. It's merely a more sophisticated obfuscation than using base64 or another encoding method. — KnightHawk, Nov 03 '15 at 16:19
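The pattern-recognition point in the comments above can be sketched in a few lines of Python. This is an approximation of the kind of reversible integer transform such ID-obfuscation packages use (multiply, mask, XOR), written from scratch for illustration; it is not the Optimus package's code, and the constants and sample rows below are arbitrary assumptions.

```python
MASK = 2**31 - 1                          # keep values within 31 bits
MULTIPLIER = 1580030173                   # arbitrary odd number, so it is invertible mod 2**31
INVERSE = pow(MULTIPLIER, -1, MASK + 1)   # modular inverse (Python 3.8+)
XOR_KEY = 1163945558                      # arbitrary 31-bit value


def encode(value: int) -> int:
    """Map an id in [0, 2**31) to a scrambled-looking id in the same range."""
    return ((value * MULTIPLIER) & MASK) ^ XOR_KEY


def decode(value: int) -> int:
    """Undo encode() using the secret constants."""
    return ((value ^ XOR_KEY) * INVERSE) & MASK


# Hypothetical appointment rows: (patient_id, doctor_id).
appointments = [(1, 10), (2, 10), (1, 11)]

# Store the patient id obfuscated, leaving the doctor id as it is.
stored = [(encode(patient), doctor) for patient, doctor in appointments]

# The mapping is deterministic, so rows for the same patient still share a value.
same_patient = stored[0][0] == stored[2][0]
print("rows 0 and 2 still link to one patient:", same_patient)
print("round trip ok:", decode(encode(12345)) == 12345)
```

Anyone holding the dump can still group rows by the obfuscated value, and anyone who also holds the three constants can reverse it outright, which is the difference between this and encryption with a properly protected key.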

Steve Sether · Answer 3 · 2015-11-03T16:13:53.033

This is ultimately a terrible idea, and will cost you in terms of data integrity. Data relationships that are obscured are data relationships that will become corrupted, and unreliable. It doesn't even make you more secure. Obscurity protects only from the mildest of curiosity. Look at the long history of cracked obscured systems for evidence of this.

This approach will lead to lost appointments, possibly appointments made with the wrong doctor, and a generally unreliable system.

In general, these sorts of requirements (HIPAA, PCI, etc.) are written by non-technologists, and are open to broad interpretation.

Ask the person driving the requirement if he/she is willing to sacrifice the integrity and reliability of the system for security through obscurity. That's essentially what's being asked for. My guess is someone is interpreting HIPAA in a very narrow way if they're asking for crazy things like "obscuring database table IDs".

My general advice is to find someone with expertise in interpreting HIPAA requirements in terms of what you can actually do, rather than someone trying to interpret them in the most restrictive sense possible.

score 2 · Answer 4 · answered Jan 17 '17 at 21:43

Just started looking into HIPAA compliance for an upcoming application I'm building, and from what I see you're taking the rules/regulations a little overboard. As far as obfuscating your data goes, one of the main requirements is protecting your Electronic Protected Health Information (ePHI). This includes stuff like names, SSI numbers, or other private or patient-sensitive data. One of the internal IDs in your database wouldn't fall under this category, and trying to obfuscate it would just create a whole world of problems for you while not accomplishing anything. For example, if some data breach were to happen and they see that ID 5 is tied to ID 8, it would not mean much to anyone without some other personal data, which would be encrypted.

Again, I may not be the best source since I am just learning about HIPAA compliance too, but I know a good amount about PCI compliance and they seem pretty similar.

score 0 · Answer 5 · answered Jan 18 '17 at 15:30

I agree with the others: this is bad practice, from every POV, including design, audit, recovery, security, performance. The list goes on.

If a hacker gets into the database, the game is over. It doesn't matter if you obfuscated the data; it will only be a matter of time before it is de-obfuscated.

RDBMS these days allow column-based permissions, and so, using real keys to join to other tables will allow standard SQL queries to be used, without the overhead to de-obfuscate. Moreover, your auditors will appreciate the attention to ability to scale, upgrade, troubleshoot, and still maintain integrity and security.

Having said that, some vendors, like Microsoft and Oracle, maybe others, have stepped up and allowed very focused encryption. Keys are stored in a separate system, and access to columns is protected. Microsoft SQL Server, for example, uses TDE (transparent data encryption) and extensible key management (EKM) to do just this. Therefore, your table structures don't change, and you don't incur the overhead of joining on obfuscated keys. So, while you might have a doctor and a patient, you won't necessarily know the names or other details of either, since TDE can allow encryption of these fields.

RDBMS systems also allow for encryption of the entire database, to handle instances where the database itself is stolen.

In short, don't obfuscate the keys to the data. Rather, encrypt the data that is sensitive. You can look at it like this: don't obfuscate the metadata; rather, only encrypt the data. Think of the keys and foreign keys as metadata.

Be careful, too, not to fall victim to the practice of placing strong security on the front door, all the while leaving the back door unlocked: your transaction logs, error logs, application, application logs, backup systems, and people who are trusted to have access to this data and applications are all sources for leaks, so, shore up permissions, and be careful what you log, and what you do with those logs.

Once you apply these principles, it becomes less important to obfuscate keys.
