
I'm designing an API with token authentication.

I don't want to store tokens as plain text in the database, for the same reason user passwords are not stored as plain text: if the database gets compromised, the attacker should not be able to extract any usable token from it.

My current plan is to generate tokens 40 chars in length, composed this way:

  • the first 20 chars would be the token "ID" (the primary key in the database)
  • the next 20 chars would be the token "password"

Upon generation of the token, I would send the full token to the client, and store in my database:

  • the token ID
  • a SHA1 hash of the token password

This way, my database only holds half of the actual token sent to the client, and can only verify tokens, not retrieve them.
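Roughly, the scheme I have in mind looks like this (a minimal Python sketch; the alphabet and the `lookup_hash_by_id` database helper are just placeholders):

```python
import hashlib
import secrets
import string

# Placeholder 67-character alphabet (62 alphanumerics + 5 extra symbols).
ALPHABET = string.ascii_letters + string.digits + "-_.~!"

def generate_token():
    """Return (full_token, token_id, password_hash); only the last two are stored."""
    token = "".join(secrets.choice(ALPHABET) for _ in range(40))
    token_id, token_password = token[:20], token[20:]
    password_hash = hashlib.sha1(token_password.encode()).hexdigest()
    return token, token_id, password_hash

def verify_token(token, lookup_hash_by_id):
    """lookup_hash_by_id stands in for the database query by primary key."""
    token_id, token_password = token[:20], token[20:]
    stored_hash = lookup_hash_by_id(token_id)
    candidate = hashlib.sha1(token_password.encode()).hexdigest()
    return stored_hash is not None and secrets.compare_digest(stored_hash, candidate)
```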

I'm not planning to add a salt: as I understand it, the whole point of a salt is to prevent hash table / rainbow table attacks against commonly used or short passwords, while in my case the passwords are totally random, with enough entropy (67 possible chars, 20 chars in length = 4×10^36 combinations). Unless I missed something, adding a salt in this case would be the same as creating a longer random password.
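A quick back-of-the-envelope check of that entropy figure (just arithmetic, nothing specific to my implementation):

```python
import math

combinations = 67 ** 20         # about 3.3e36, i.e. roughly 4 × 10^36
bits = math.log2(combinations)  # about 121 bits of entropy
print(f"{combinations:.2e} combinations, {bits:.0f} bits")
```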

Also, I'm not planning to use an expensive hashing technique such as bcrypt, because it would be too costly: unlike user authentication, where the user authenticates once and then gets a session ID, the token is the only method of authentication here and is sent with every single API call; a 50 ms hashing method is just not acceptable. Nor do I consider an expensive hashing technique particularly more secure, for the same reason given in the previous point: the password is random with enough entropy, so even with a powerful hashing machine, it would still take billions of years to brute-force.
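To put rough numbers on that, here is a timing sketch (PBKDF2 from the standard library as a stand-in for an expensive KDF, since bcrypt isn't in the stdlib; the iteration count is arbitrary):

```python
import hashlib
import os
import timeit

secret = os.urandom(20)
salt = os.urandom(16)

fast = timeit.timeit(lambda: hashlib.sha256(secret).digest(), number=1000) / 1000
slow = timeit.timeit(
    lambda: hashlib.pbkdf2_hmac("sha256", secret, salt, 600_000), number=10
) / 10

print(f"plain SHA-256:       {fast * 1e6:.1f} µs per call")
print(f"PBKDF2 (600k iters): {slow * 1e3:.1f} ms per call")
```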

Is there any flaw in my approach?

The only one I can think of (provided someone gets access to the database!), is if a vulnerability is found in SHA1, so that it becomes feasible to find an input that gives a given hash as output (this somehow happened to MD5, I heard). But I guess this is the same thing for every hashing algorithm out there, Bcrypt included?

BenMorel

1 Answer


It's routine to use a cryptographic hash function for such purposes. The relevant property is called preimage resistance: for a hash function H, given an arbitrary output y, it is not feasible to find an input x such that H(x) = y.

(Even SHA-1 has this property, despite not being collision resistant. However, you should still use something else, like SHA-2.)

Preimage resistance is part of what makes a hash function one-way. The best attack against hashed passwords is guess-and-check. If you make totally uninformed guesses against a 512-bit hash then the probability of guessing a valid x is 2^-512. The expected number of guesses you need to make before you succeed is unimaginably large.
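To make "unimaginably large" concrete (a throwaway calculation, not part of any implementation):

```python
expected_guesses = 2 ** 512        # roughly the number of guesses needed on average
print(len(str(expected_guesses)))  # 155 decimal digits
```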

When it comes to recovering a password from its hash, it doesn't take nearly that many guesses for a typical user's password. Password crackers have a good understanding of what kind of passwords humans choose. They succeed because they prioritize checking plausible candidate passwords.

If someone has an unrealistically strong password, then it is almost certain that the password cracking attempt will fail, just as if the cracker were making uninformed guesses. But since people don't choose nearly strong enough passwords, developers should choose a dedicated password hashing algorithm. The goal of these algorithms is to make the process of testing candidate passwords more expensive. (People should use Argon2id today. It's expensive, relatively efficient on commodity hardware, and harder to parallelize than bcrypt or PBKDF2.)
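For human-chosen passwords, a dedicated password hash looks like this in practice (a sketch using the third-party argon2-cffi package with its default parameters; not a tuning recommendation):

```python
# pip install argon2-cffi
from argon2 import PasswordHasher
from argon2.exceptions import VerifyMismatchError

ph = PasswordHasher()  # Argon2id with the library's default cost parameters

encoded = ph.hash("correct horse battery staple")  # output embeds salt and parameters

try:
    ph.verify(encoded, "correct horse battery staple")
    print("password ok")
except VerifyMismatchError:
    print("wrong password")
```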

If every person used passwords that were sufficiently unpredictable, then expensive hash algorithms wouldn't be needed or useful. If each password were unique (and never reused) then salting wouldn't be necessary either. (But salting is necessary for ordinary passwords for multiple reasons.)

If you control the "password" then you can make sure that those values are sufficiently unpredictable. (At that point, you may as well call it a "key".) As long the the hash function is preimage resistant and inputs each are sufficiently unpredictable, then plain old (secure) hashing is safe, even if the table of hashes gets leaked.

Because it doesn't matter if the hashes are leaked, you don't need a separate ID to use as a primary key. The hash of a password itself can be used as the primary key.

(You wouldn't want to do this for user-chosen passwords. A side channel (like timing attack based on database lookup) might leak the hash output, which potentially could allow offline password cracking.)

If the entropy of each token is large enough (256 bits is definitely safe; see "birthday paradox") then you can be assured that each token is unique. Similarly, if a hash function's output is large enough, then distinct (real-world) inputs will result in distinct hash outputs. Thus, it is okay to generate a 40-character token, give the token to the user, and store a 64-byte hash in a database (alongside the account ID, expiration time, or anything else relevant).
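Putting that together, a minimal sketch of issuing and checking such a token (Python; SHA-512 as an example of a hash with 64-byte output, and a dict standing in for the database table):

```python
import hashlib
import secrets

tokens = {}  # hash (primary key) -> record; stand-in for the database table

def issue_token(account_id):
    token = secrets.token_urlsafe(32)              # 256 bits of entropy (~43 chars; a
                                                   # 40-char token from a larger
                                                   # alphabet works just as well)
    key = hashlib.sha512(token.encode()).digest()  # 64-byte hash, used as the key
    tokens[key] = {"account_id": account_id}       # plus expiration time, etc.
    return token                                   # the raw token is never stored

def check_token(token):
    key = hashlib.sha512(token.encode()).digest()
    return tokens.get(key)                         # None if the token is unknown

# Usage
t = issue_token(account_id=42)
assert check_token(t)["account_id"] == 42
assert check_token("not-a-real-token") is None
```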

Future Security
  • Excellent answer, thank you. It did not even cross my mind to use the hash itself as the primary key, that's a very interesting idea: indeed there is no reason to split the token into an "id" and a "password", I can hash the whole token and use that as PK; the only drawback *could* be performance by requiring a longer primary key, but I'm not sure if this makes a practical difference. – BenMorel Apr 14 '19 at 20:13