1

During my IT school years, I was told that including rows identifiers from a database in the resource URL, in the context of a REST API, is a bad practice. To my understanding, the rationale behind this statement is that exposing technical database identifiers is a security breach.

For instance, say I have a table USER_ACCOUNTS(id, username) in my database. Say I’m using a REST API to expose the user accounts data. How would I expose the URL without using the real column ID ?

If the user ID is 3 in my database and I want to access his profile, how would I design the URL without having something like /users/3 ?

Is it true that exposing database IDs in the URL is a bad practice ? If so, what are the alternatives ?

I thought of using a hash & salt value instead of the real, plain ID, but I can already see many drawbacks to this solution.

How do you deal with exposing database identifiers in your URLs ? What are some of the best practices to deal with this issue ?

user244072
  • 11
  • 3

2 Answers2

1

The answer is not far away, you need an Id to extract the user, however, the Id would be an UUID instead of running number which can easily be pooled. You can easily use code libraries to generate the UUID for your continence.

Guy
  • 11
  • 1
  • many modern databases also can generate a UUID or even use it as a primary key; depending on database that might have some performance impact when the tables get really large (but there are ways around that). example for uuid support: https://mysqlserverteam.com/mysql-8-0-uuid-support/ btw. you might want to explain what pooling means in this context. – Frank Hopkins Oct 13 '20 at 00:36
1

Why we don't put certain data in the URL

First of all, one big reason not to include certain data in the URL is because (1) URLs are saved in your browsing history, which makes it much easier to get your data with just your history (cookies are usually changed as you use your web applications and request data is usually not saved), and (2) URLs aren't protected by HTTPS, which is used to encrypt your data as you browse (the URL is encrypted by HTTPS). This simply means The URL is much more easily captured by anyone.

Making access tokens safe from enumeration

Your users will have some control over the data they're sending to your server. They can be legitimate users, editing their cookies, headers, and other data either directly in the browser (Chrome/Firefox developer console) or using a proxy (Burp Suite, zaproxy, etc.). This means they can see and manipulate the user_id=3 in the requests.

We should not use incremental values for user IDs. Instead, we use a unique and randomly generated ID for every user. For example, if I sign up with the email JohnDoe@example.com, and the next person signs up with Alice@company.com, we should each have unique IDs with no clear relation to our email or sign up order. If I see user_id=10 in a query, I will try changing it to user_id=11 and so on. However, seeing id=MTExIC1uCg== will make it look more random and harder to crack (at least until you notice it's base64-encoded, which is not encryption). If I see a totally random & long string (e.g. pE5Y9G7HqNHe3bUq7Scz) as the user ID, I will consider it safe from enumeration since I most likely can't grant me access to another account by simply changing a character and, unless I'm into cryptanalysis, move on.

Make sure your access tokens are unique

Using salted hashes is nice and all, but there's the possibility of hash collisions, where 2 different salted passwords generate the same hash, in which case you'll have to find a way to resolve such cases, however unlikely they are. It's not a good day when I login and someone get access to some random user's account.

For this, you can do what YouTube does and generate a random & unique ID for every user in just a few steps

1. Generate a random value
2. value already exists -> go to step 1
3. Value is unique -> use value as ID

Of course, you have to use a Cryptographically secure pseudorandom number generator (CSPRNG) for generating the random IDs so that people can't analyze existing keys and guess valid keys.

ChocolateOverflow
  • 3,472
  • 4
  • 17
  • 34
  • 1
    URL’s are protected by HTTPS, except the server name due to SNI, which exposes the server name during the TLS handshake. But the path and any query parameters are protected. On the other hand, every URL is stored in the browser history which could potentially be exposed at rest on the client device. – Craig Tullis Oct 13 '20 at 05:07
  • Thank you for your detailed answer. Could you please clarify the following point ? To avoid slowing down the database JOIN queries using the publicly displayed UUID value, can I still use an auto incremented integer as a primary key ? Given this auto incremented integer would never be displayed publicly and would only be used to speed up the JOIN operations ? Would it defeat the purpose of using UUIDs ? Thank you – user244072 Oct 13 '20 at 19:30
  • I'm not very good with databases but you may wanna use [hash tables](https://www.hackerearth.com/practice/data-structures/hash-tables/basics-of-hash-tables/tutorial/) with randomly generated IDs rather than incremental IDs. A very simple implementation of this would be: (1) Generate a random *integer* as the user ID, say `user.id = 514`; (2) If your database is an array of length 1000, that user ID would be the index of that user in the array. With a hash table, you can extend that example to use not just integers, but much longer strings like alphanumerical & special characters. – ChocolateOverflow Oct 14 '20 at 05:31