Maybe I'm misunderstanding the purpose of k-anonymity, but I don't see why HIBP uses it when checking user passwords.

This website, which explains HIBP's implementation of it, says, "The client will then truncate the hash to a predetermined number of characters (for example, 5) resulting in a Hash Prefix of a94a8. This Hash Prefix is then used to query the remote database for all hashes starting with that prefix (for example, by making a HTTP request to example.com/a94a8.txt). The entire hash list is then downloaded and each downloaded hash is then compared to see if any match the locally generated hash."

I can see why this would be beneficial over an unsecured connection like HTTP, as anyone snooping on the connection would get a whole list of possible passwords instead of just the plain-text or hashed password. That being said, why not just send the password over an HTTPS connection to begin with? After all, it's encrypted in such a way that the data (i.e. the password the user submitted for checking AND any matches found) would be incredibly difficult to decrypt, maintaining data security and thus rendering the list of possible passwords unnecessary.

Am I misunderstanding (or just flat out missing) something here? It seems like extra work for no real reason.

noahG3
  • Does this answer your question? [How to explain "the k-anonymity model used by HaveIBeenPwned for pwned passwords doesn't expose your passwords" to a layman?](https://security.stackexchange.com/questions/238149/how-to-explain-the-k-anonymity-model-used-by-haveibeenpwned-for-pwned-passwords). In short: the idea is that HIBP does not get the password you are trying to check, so that one can use this service without fully trusting it not to collect passwords. – Steffen Ullrich May 18 '22 at 05:24
  • @SteffenUllrich well, it already has the password if found, but this way it doesn't even get the hash – user253751 May 18 '22 at 09:44
  • @user253751: *"it already has the password if found"* - when using the API, HIBP gets neither the original password the OP checks nor the full hash of this password. – Steffen Ullrich May 18 '22 at 10:17
  • @SteffenUllrich it has the password already. It just doesn't know which one is being searched for. – user253751 May 18 '22 at 10:19
  • @user253751: HIBP does not have every possible password. So it might have the password checked by the client or it might not. It just presents options where the hash starts the same way as the hash for the checked password. This also means that HIBP cannot collect new passwords this way. – Steffen Ullrich May 18 '22 at 11:00
  • @SteffenUllrich In case HIBP has the password it already has the password and doesn't need you to send it. In case HIBP does not have the password, having the password hash is assumed not to be useful since it's not the password. However, it could allow HIBP to crack a few weak passwords. What's much more of a problem is HIBP being able to associate the password with the IP address, timestamp etc when it was requested. – user253751 May 18 '22 at 11:05
  • @SteffenUllrich That makes sense, thank you. And just to clarify, HIBP doesn't necessarily use k-anonymity for data security reasons, just to give users assurance that their data isn't being collected, correct? – noahG3 May 18 '22 at 20:28
  • @gorge42: *"HIBP doesn't necessarily use k-anonymity for data security reasons, just to give users assurance that their data isn't being collected"* - I don't see much difference in these two options. The best security is the one where the party does not need to be trusted to not collect sensitive data but where it is not able to collect sensitive data by design. Solely relying on trust might fail if the site gets hacked or the business model changes. – Steffen Ullrich May 18 '22 at 20:41

1 Answer

It's so HIBP doesn't know your password. HTTPS only protects data in transit; the server is the endpoint that decrypts the request, so it sees whatever you send it.
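
To make that concrete, here is a minimal sketch of the client side of the scheme quoted in the question, assuming SHA-1 hashes and a 5-character prefix (which matches the a94a8 example). The names `split_hash`, `is_listed`, `fake_fetch_range` and `seen_by_server` are made up for illustration, and the "server" is an in-memory stand-in that returns `SUFFIX:COUNT` lines rather than a real HTTP call.

```python
import hashlib
from typing import Callable, List, Tuple


def split_hash(password: str, prefix_len: int = 5) -> Tuple[str, str]:
    """Hash locally and split the hex digest into (prefix, suffix)."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:prefix_len], digest[prefix_len:]


def is_listed(password: str, fetch_range: Callable[[str], str]) -> bool:
    """Return True if the password's hash appears in the range response.

    Only the prefix is passed to fetch_range; the suffix never leaves
    the client, and the comparison happens locally.
    """
    prefix, suffix = split_hash(password)
    body = fetch_range(prefix)
    for line in body.splitlines():
        candidate, _, _count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return True
    return False


# Hypothetical stand-in for the remote service: it records what it was
# asked for so we can inspect exactly what a server would learn.
seen_by_server: List[str] = []


def fake_fetch_range(prefix: str) -> str:
    seen_by_server.append(prefix)
    listed_prefix, listed_suffix = split_hash("test")
    if prefix != listed_prefix:
        return ""
    decoy_suffix = "0" * 34 + "A"  # another 35-character suffix in the same bucket
    return "\n".join([f"{decoy_suffix}:3", f"{listed_suffix}:12345"])


if __name__ == "__main__":
    print(is_listed("test", fake_fetch_range))
    print(is_listed("not in the list", fake_fetch_range))
    print(seen_by_server)  # only 5-character prefixes, never a full hash
```

Whether or not the password is in the list, the only thing the server side ever receives is the 5-character prefix, which is shared by many other hashes. Sending the password (or its full hash) over HTTPS would hide it from eavesdroppers but hand it directly to the server.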

eesiraed
  • So would this be more about peace of mind for the user in the sense that they aren't sending sensitive data to some random server instead of actual data security? – noahG3 May 18 '22 at 20:23
  • @gorge42 I would count not sending your passwords to some random server as "actual data security." Even if you trust that website, others might not and that's perfectly reasonable. Also, as Steffen Ullrich mentioned, websites get hacked from time to time. – eesiraed May 19 '22 at 04:06