36

According to Is it common practice to log rejected passwords?, I know that logging rejected passwords in plain text is a bad idea, but what if I store the hashed form of rejected passwords instead? I want more information about failed logins for analysis, e.g.:

  1. Check if it is just typo : if a user often failed login at first time with the same hashed password but later it logs in successfully, then it is possibly just a typo

  2. Check if there is someone trying to guess password : Some common passwords, just like 123456 and birthday, which has fixed hash, if those attempts exists, the failed login may be done by password guessers.

  3. Check if a user have alternative account : some users may have alternative account but with different passwords, if a user usually failed to login with the same hash, and the hash is the same as another account but then login successfully, then it may be a user trying to login with an alternative account but forgot to switch password.

My question is: does logging the hashed form of rejected passwords have the same problems as logging rejected passwords in plain text?

ocomfd
  • 55
    This sounds like a really bad idea. – nulldev Jul 03 '18 at 06:19
  • 28
    None of these 1.2.3. can be done if password log is properly hashed (each hash is purely unique). See the good answer below for complete infos. – Xenos Jul 03 '18 at 08:55
  • 6
    1 is solved by logging username and times of success/failure. Failure followed almost immediately by success suggests typo. Multiple failures until the account is locked suggests either an attack or a user who has completely forgotten password. – Matthew Jul 03 '18 at 09:04
  • 38
    Scenario 3 offends my sensibilities as a user. If I have separate accounts it's for a reason and I resent your clumsy attempts to tie them both to a particular person. – Omnifarious Jul 03 '18 at 13:24
  • No. 2 sounds plausible, though it would require you to store hashes of a worst-passwords list, e.g. https://www.symantec.com/connect/blogs/top-500-worst-passwords-all-time , but I don't see why you should use it. – mootmoot Jul 03 '18 at 14:03
  • 5
    @mootmoot Number 2 sounds like a good idea to implement by storing the 500 most common passwords in plain text in a lookup table in your application, and force user to not use these during sign-up. If you have a working app with user base, you could also force users to change their password on next signup if they use one of the common passwords. For this to work securely, no salt is needed at all, because the most common passwords are not a secret... – Alexander Jul 03 '18 at 15:39
  • @Alexander Agree. The check should be enforced during user registration, not during user login, even though it can be done then. In fact, I doubt many organisations have the resources to audit such information (which is mostly useless). – mootmoot Jul 03 '18 at 15:52
  • 1
    @Alexander Better to use something like this: https://haveibeenpwned.com/passwords Also: online you can easily find lists of the most common XXk passwords... 500 is way too small to be effective. – Bakuriu Jul 03 '18 at 17:52
  • For number 1: you can basically take the user input and create some variants and try all of those. In fact Facebook does this: it inverts the case of the password so you can login even if your caps lock is on – Bakuriu Jul 03 '18 at 17:53
  • 1
    @Matthew Locking the account in that case is a bad idea as it would make the system a trivial target for DoS attacks. A much better approach is to temporarily block IP addresses from which you see many incorrect passwords. Even that is risky though as you may come across such silliness as an attacker appearing from the same IP as a legitimate user because they happen to be accessing your site through a NAT. – kasperd Jul 04 '18 at 07:42
  • @kasperd A lot of attacks nowadays come from multiple IP addresses - I've seen brute force attacks against systems coming from 100s of IP addresses, all trying to get into a relatively small number of accounts. I tend to recommend temporary locking of accounts, which means unless the attacker is really persistent, they'll probably be unlocked by the time the legitimate user wants to access - IP blocking can be very risky if you have large numbers of mobile users, for example, who are NAT-ed in huge groups in some countries. – Matthew Jul 04 '18 at 07:48
  • @Matthew Even if each IP doesn't attempt more than one failed login per account it should still be blocked if it attempts failed logins against many accounts. And in some cases it is appropriate to expand the block to IP ranges if you see multiple IPs blocked in a small range with no known legitimate logins. The problem about NAT can be addressed in a couple of ways: Recommend that users stop using NAT and instead do something more appropriate like switching to an IP version with enough addresses. – kasperd Jul 04 '18 at 07:58
  • @Matthew If you upon successful logins store a cookie indicating that a client has previously logged in as a specific user you can use that cookie to allow that client to bypass the block when attempting to log in as that same user again. That will avoid a lot of the collateral damage caused by NAT. – kasperd Jul 04 '18 at 08:00
  • 2
    @kasperd They aren't useful things to suggest though: not many users have the choice about whether they are going through NAT or not, let alone about whether they can use IPv6 rather than IPv4. Not many businesses are willing to potentially block legitimate users as collateral damage from preventing locked accounts. Cookie concept sounds fine - it's essentially how Facebook allow login from known devices without additional verification. – Matthew Jul 04 '18 at 08:05
  • 1
    @Matthew Your initial suggestion of locking an account after a few failed attempts would allow an attacker to lock out a legitimate user without even needing to come from the same IP address. That is even worse. – kasperd Jul 04 '18 at 20:30
  • Here is an easier way to detect that it is possibly the same user, is it the same IP address and web browser? Seriously, much better ways to do this, and it is a bad idea regardless. – ewanm89 Jul 04 '18 at 23:36
  • One good reason I can think of to do this would be *if* you lock an account or force password reset after X failed logins, you could use this to prevent lockout after entering the *same* one or two typos X times (maybe caps lock is on or a keyboard key is sticky). But I don't know about what ramifications it may have in other areas. – Ben Jul 08 '18 at 17:54

7 Answers

101

If properly hashed (i.e. with random salt and strong hash) a hashed password is not reversible and hashed passwords for different accounts differ even if the passwords are the same.

This means that almost none of the analysis you want can be done with properly hashed (i.e. randomly salted) passwords in the first place. You gain almost nothing from logging them, and you stand to lose something: the hashes leak into places where an attacker might get easier access, since logs are usually not considered as sensitive as stored passwords.

When using a plain hash instead (no salt), some of the things you mention are possible, at the cost of an increased attack surface, since an attacker can now use pre-computed hash tables to reverse your logged passwords.
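
To make the difference concrete, here is a minimal Python sketch. It assumes PBKDF2-HMAC-SHA256 from the standard library as the strong hash and an arbitrary iteration count; the function names are made up for illustration and are not taken from any particular system.

```python
import hashlib
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # assumed work factor, chosen only for this illustration


def salted_hash(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is drawn when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def unsalted_hash(password: str) -> bytes:
    """Plain SHA-256 with no salt, shown only to illustrate why it is weak."""
    return hashlib.sha256(password.encode("utf-8")).digest()


# The same password hashed for two accounts, each with its own random salt.
salt_a, hash_a = salted_hash("123456")
salt_b, hash_b = salted_hash("123456")
print("salted hashes equal:  ", hash_a == hash_b)

# Without a salt the digest is fixed, so a pre-computed table can reverse it.
print("unsalted hashes equal:", unsalted_hash("123456") == unsalted_hash("123456"))
```

With random salts the two digests differ, so they cannot be compared across accounts or against a list of known hashes. The unsalted digests are identical, which is exactly what makes both the analysis and the pre-computed table attack possible.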

Maybe you have some misconceptions about what proper password hashing means. I recommend reading How to securely hash passwords? to get an idea of how proper password hashing is done and why it is done this way. In the following I will address some of the misconceptions you seem to have:

Check if it is just typo : if a user often failed login at first time with the same hashed password but later it logs in successfully, then it is possibly just a typo

You cannot detect a typo using the hashed passwords, since even a small change in the input results in a huge change in the output. You also cannot check for the typo in the original passwords, since you cannot reverse the hash to get the password for comparison. To check whether the wrong password entered is always the same, you could log the hash salted with the same salt as the stored (correct) password, as Jon Bentley suggested in a comment. If the logs are at least as well protected as the stored passwords, this would only slightly increase the attack surface, but as I said, logs are commonly not considered as sensitive as password storage.
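
A rough sketch of that last idea, assuming PBKDF2 from Python's standard library and hypothetical helper names: the failed attempt is hashed with the account's own salt, so a later attempt can be compared against it, while a near miss still looks completely unrelated.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for this illustration


def hash_with_salt(password: str, salt: bytes) -> bytes:
    """Hash an attempt with the salt already stored for the account."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def same_wrong_password(account_salt: bytes, logged_hash: bytes, attempt: str) -> bool:
    """True if this attempt is identical to the previously logged failed attempt."""
    return hmac.compare_digest(logged_hash, hash_with_salt(attempt, account_salt))


account_salt = os.urandom(16)                        # stands in for the stored salt
logged_hash = hash_with_salt("hunter3", account_salt)  # first failed attempt

print(same_wrong_password(account_salt, logged_hash, "hunter3"))  # identical retry
print(same_wrong_password(account_salt, logged_hash, "hunter4"))  # one character off
```

This only answers "was it the same wrong password again?". It says nothing about how close the attempt was to the real password, and the logged hash is exactly as attackable as the stored one.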

... Some common passwords, just like 123456 and birthday, which has fixed hash, ...

Proper password hashing uses a random salt to make attacks using pre-computed hashes of common passwords impossible. This means that the same password yields a different hash each time it is hashed with a new salt, i.e. your assumption that these passwords have a fixed hash is wrong. Again, you could use easier-to-reverse unsalted hashes here, at the cost of an increased attack surface.

It might instead be better to do that kind of analysis when the entered password is still available and only log the result of this analysis.

... if a user usually failed to login with the same hash, and the hash is the same as another account but then login successfully, ...

Since hashes for the same password differ you need the originally entered password to do this kind of comparison. Since you cannot get this back from the logged hash it does not help to have the hashed password logged. Again, you could do this kind of analysis with unsalted hashes, but this would mean that all of your accounts must have their password available in the insecure unsalted way - which is a large increase of the attack surface. You could probably do this kind of analysis with salted passwords too but then would need to log the newly entered passwords salt-hashed with all the salts you currently have in use (i.e. one for each account).

psmears
Steffen Ullrich
  • 5
    In the context of what OP wants, wouldn't the natural choice be hashing with the salt of the account's actual password (the same hash operation used to verify the password)? It's still a really bad idea, but unlike independently random salt it would give information about whether the user tried the same wrong password they did at some time in the past. – R.. GitHub STOP HELPING ICE Jul 03 '18 at 15:39
  • 2
    @R..: I think your proposal is covered by the answer. To cite from it: *"To check if the entered wrong password is always the same you could __log the hash salted with the same salt as the stored (correct) password__ as Jon Bentley suggested in a comment"*. – Steffen Ullrich Jul 03 '18 at 16:20
  • OK, I missed that. Still seems useful to highlight. – R.. GitHub STOP HELPING ICE Jul 03 '18 at 16:22
  • 3
    You mention increased attack surface/vectors a couple of times, but I think those mentions could use some more alarmism. The risks posed by using weak hashes or not using salts are properly regarded as unacceptable for password hashes. The OP, being this unfamiliar with password protection, might not have a grasp of just how bad the problem is just from those phrases. – jpmc26 Jul 03 '18 at 23:15
  • 1
    One caveat I am missing in this answer is the possibility of a user mistakenly typing in their password for another system. If the user did accidentally type in their password for another system it would be very bad to store that information in any form, even if it is properly hashed. And there is no reasonable way to verify if the typed in password was for another system and for that reason shouldn't be stored. Thus the only way to avoid that is to just not log any information about the contents of incorrect passwords. – kasperd Jul 04 '18 at 07:49
18

This is not a common practice, and it goes against security in general. IMHO, none of the reasons you list are good enough to log hashed passwords. In fact, I can't think of a good reason to log them. You might run into compliance/legislation issues (PCI, for example). Users typically fail logins through typos, forgetting which password they used for which site, etc. You will also have these passwords in a place other than your database, possibly even on a remote server if you ship logs via syslog.

None of these are reasons to log hashed passwords. To answer your question directly, it is “better” to log hashed passwords compared to plaintext ones, but both are very bad and should not be done.

Example: Part of HIPAA focuses on the need to adequately and effectively protect Electronic Protected Health Information (ePHI) by adhering to good and standard practices. According to the auditor I work with, logging passwords (hashed or otherwise) would leave you non-compliant because it is not a recommended and standard practice.

pm1391
  • 12
    This answer would be better if you justified it. E.g. *"goes against security in general"* - in what way? *"might run into compliance issues"* - might? can you give concrete examples? *"I can't think of a good reason"* - the OP provided reasons; perhaps you can explain why they are wrong? *"very bad and should not be done"* - why? This answer reads like *"this is a bad idea because I say so"* but really the OP is seeking an explanation as to why it is a bad idea. – Jon Bentley Jul 03 '18 at 07:10
  • @JonBentley Noted and agreed. However, I believe I offered an example of why it is bad (storing passwords in log files, when you usually don't protect those the same way as a database). My answer simply tried to answer the question asked, "is it a common practice to log passwords" – pm1391 Jul 03 '18 at 12:07
13

You're making many assumptions about:

  • The usefulness of these actions.
  • Whether hashing and storing the passwords is the only way to achieve those actions.
  • Whether some of those actions can even be derived from a properly salted and hashed password.

If a user fails a login, it could be for many reasons. For example, they might have multiple passwords and be cycling through them, wondering which one they used on the site. Now you have gone from storing a (salted) hash of the user's password for that one site to potentially storing multiple (salted) hashed passwords that the user tends to use. It is similar with emails: I have multiple emails, and I may not recall which one I used for the site. Now you're storing my emails to correlate them (this portion depends on your logic for correlating passwords to users). Basically, you're building a compendium of username/email and password combinations for a potential future adversary of your site. Some of your ideas:

"Check if it is just typo"

Why do you care? If it is a typo, the user made a typo and figured it out. If your goal is to develop fingerprinting to detect an attacker brute-forcing, there are other ways. This also isn't possible with a properly salted and hashed password. That is the very goal of salting and hashing: you yourself shouldn't be able to reverse engineer the original plain text without considerable effort. If you don't have the plain text and the hashing algorithm is good, you will not be able to tell whether a typo was made.

"Check if there is someone trying to guess password"

First, make sure actual users don't have these passwords. Second, there are other methods. Third, if you really wanted to, you could hash the input in-stream and compare it to hashes of known weak passwords. If it matches, log the fact that it matched; there is no need to log the password itself in any form.
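
The third option could look roughly like the sketch below. The three-entry list is a placeholder for a real common-password list, and the function and field names are invented for the example. An unsalted digest is acceptable for the lookup table only because the listed passwords are public anyway; the attempt itself is never written out.

```python
import hashlib
import logging

logger = logging.getLogger("auth")

# Placeholder list; a real deployment would load a much larger public list.
COMMON_PASSWORDS = ("123456", "password", "qwerty")
COMMON_DIGESTS = frozenset(
    hashlib.sha256(p.encode("utf-8")).digest() for p in COMMON_PASSWORDS
)


def is_common_password(attempt: str) -> bool:
    """Check the attempt against the list while it is still in memory."""
    return hashlib.sha256(attempt.encode("utf-8")).digest() in COMMON_DIGESTS


def log_failed_login(username: str, attempt: str) -> dict:
    """Log only the outcome of the check, never the attempt or its hash."""
    record = {
        "user": username,
        "event": "login_failed",
        "common_password": is_common_password(attempt),
    }
    logger.info("login failed: %s", record)
    return record


print(log_failed_login("alice", "123456"))
print(log_failed_login("alice", "Tr0ub4dor&3"))
```

The log then carries a single boolean per failed attempt, which is enough to spot guessers working through a dictionary without keeping anything that could be reversed.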

"Check if a user have alternative account"

Unless you have an explicit reason to do so, such as something that goes against your terms, this is unneeded from a user standpoint. Further, a properly salted and hashed password will not be immediately comparable to other records in your system. That is the point of salting: pre-computed hashes of various plain text inputs can't be matched. Thus, if someone did have the same password across two different accounts, the hashes would not match, because the salt added to each is random. Even if they did match, it could be an entirely different user with the same password, or a password that is one character off (typo, or password variation).

The very idea pretty much violates every security notion that exists. I would not recommend it.

5

Steffen's answer is spot on but I think it's important to know that you have other options without logging the hashed passwords. My answers assume that you're running an external website with a login. If it's internal, you can store just about everything about the computer/device they logged in on.

Check if it is just typo : if a user often failed login at first time with the same hashed password but later it logs in successfully, then it is possibly just a typo

Just log when a user fails and succeeds. You can run reports and see if a user frequently fails the first login.

Check if there is someone trying to guess password : Some common passwords, just like 123456 and birthday, which has fixed hash, if those attempts exists, the failed login may be done by password guessers.

Log the IP addresses of login attempts. While this is a common practice, you'll have to make sure that you do this in a legal way in the places where you operate. Many sites require additional information if you log in from an unfamiliar place.

Check if a user have alternative account : some users may have alternative account but with different passwords, if a user usually failed to login with the same hash, and the hash is the same as another account but then login successfully, then it may be a user trying to login with an alternative account but forgot to switch password.

This seems like a combination of the other two answers.

MiniRagnarok
1

You can accomplish what you want without password logging. But it's also important to ask "why?" Why do these checks? What are you realistically going to do with this information? Can they be accomplished by passive means?

Check if it is just typo : if a user often failed login at first time with the same hashed password but later it logs in successfully, then it is possibly just a typo

Check if there is someone trying to guess password : Some common passwords, just like 123456 and birthday, which has fixed hash, if those attempts exists, the failed login may be done by password guessers.

Logging the account, timestamp, IP, and whether they succeeded is sufficient. You can deduce a lot from this information.

If you see a pattern of fail, fail, fail, pass in quick succession for the same account it was probably a typo. If you see a great many failures in a row and very quickly that's likely an attack on that account.
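
As a sketch of that kind of deduction, assuming a log of per-account events with exactly those four fields: the 60-second window and the threshold of 10 failures are arbitrary values picked for the example, and the names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LoginEvent:
    account: str
    timestamp: float  # seconds since the epoch
    ip: str
    success: bool


def classify(events: Sequence[LoginEvent], window: float = 60.0,
             attack_threshold: int = 10) -> str:
    """Classify the most recent burst of login events for a single account."""
    if not events:
        return "nothing notable"
    ordered = sorted(events, key=lambda e: e.timestamp)
    last = ordered[-1]
    # Only look at events close in time to the most recent one.
    recent = [e for e in ordered if last.timestamp - e.timestamp <= window]
    failures = sum(1 for e in recent if not e.success)
    if failures >= attack_threshold:
        return "possible attack"
    if last.success and failures >= 1:
        return "probable typo"
    return "nothing notable"


typo_burst = [
    LoginEvent("alice", 0.0, "203.0.113.5", False),
    LoginEvent("alice", 5.0, "203.0.113.5", False),
    LoginEvent("alice", 10.0, "203.0.113.5", False),
    LoginEvent("alice", 15.0, "203.0.113.5", True),
]
rapid_failures = [LoginEvent("bob", i * 0.1, "198.51.100.7", False) for i in range(20)]

print(classify(typo_burst))
print(classify(rapid_failures))
```

No password material is involved at any point; the timing and outcome of the attempts carry the signal.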

But a smart attacker will play the numbers game and distribute their attacks across multiple accounts. If the attacker has no information about your accounts, any given password guess is just as likely to work on any given account. If they have 100 passwords to try it doesn't matter if they try them all on 1 account or spread them across 100 accounts, the odds are the same.

Similarly they might use multiple IP addresses. If you see a group of nearly all failures from many accounts but a single IP that could be a distributed attack, or it could be a bunch of real users behind the same NAT or VPN. It's difficult to tell.

Realistically there's a much simpler way. These attacks are made more costly by rate limiting and soft lockouts. Guessing a password relies on being able to make a great many login attempts very quickly. Logins should already be rate limited per account; there's no reason a real user will attempt to log in multiple times a second. Pick the strictest rate limit you can that is still not noticeable to your users. This will foil rapid-fire automated login attempts. If there are persistent attempts anyway, automatically and exponentially tighten the rate limit for that account (and possibly IP), and relax it once the alleged probes subside.
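
One possible shape for such a per-account soft lockout, as a sketch only: the class name, the one-second base delay, and the one-hour cap are assumptions, and for brevity the delay is simply reset on a successful login instead of decaying gradually as probes subside.

```python
class LoginThrottle:
    """Per-account delay that doubles with each consecutive failed login."""

    def __init__(self, base_delay: float = 1.0, max_delay: float = 3600.0):
        self.base_delay = base_delay
        self.max_delay = max_delay
        self._failures = {}      # account -> consecutive failure count
        self._next_allowed = {}  # account -> earliest time of the next attempt

    def allowed(self, account: str, now: float) -> bool:
        """True if the account may attempt a login at time `now`."""
        return now >= self._next_allowed.get(account, 0.0)

    def record_failure(self, account: str, now: float) -> float:
        """Register a failed login and return the delay now imposed."""
        count = self._failures.get(account, 0) + 1
        self._failures[account] = count
        delay = min(self.base_delay * 2 ** (count - 1), self.max_delay)
        self._next_allowed[account] = now + delay
        return delay

    def record_success(self, account: str) -> None:
        """A successful login clears the penalty for the account."""
        self._failures.pop(account, None)
        self._next_allowed.pop(account, None)


throttle = LoginThrottle()
print(throttle.record_failure("alice", now=0.0))  # first failure
print(throttle.record_failure("alice", now=1.0))  # second failure, longer delay
print(throttle.allowed("alice", now=2.0))         # still inside the delay
```

A legitimate user who mistypes once or twice barely notices the delay, while an automated guesser quickly runs into waits of minutes and then an hour per attempt.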

In addition, you can provide good feedback about password strength when users are creating their accounts to avoid easily guessed passwords.

In this way you can foil automated, rapid-fire probes while not annoying legitimate users who are just having a hard time typing today.

Check if a user have alternative account : some users may have alternative account but with different passwords, if a user usually failed to login with the same hash, and the hash is the same as another account but then login successfully, then it may be a user trying to login with an alternative account but forgot to switch password.

This seems more a business requirement than a security issue. You're essentially trying to detect the unique person behind an account, and that's quite difficult to get right: the internet very much does not want you to have that information. The cost to get it right is high, and as you get more and more users it's more and more likely to have a lot of false positives annoying your real users.

Unless that's your business, if you're a data mining company for example, consider why you have this requirement: what's its real effect on your system? What metrics do you have to back that up?

If you decide you really need to prevent alts, make use of various semi-unique real world IDs to add a cost to alt accounts and reduce their frequency. Email address is the most common. Making accounts cost money or requiring a credit card number is another. Even tax identification numbers if you're a big business site. It all depends on what you're doing.

Schwern
1

Is this common practice? Certainly not. Is it good practice? Probably not. However, I have seen this once, and to have a complete discussion on the subject I wish to go against the grain here and say what it can be useful for, and what precautions were taken to avoid the problems noted by others.

In a context of incessant external attacks on well-known logins, using brute force, dictionary attacks, one-password-any-user attacks, spearphishing and password-sniffing malware, the organization was having severe problems with mobile phones, tablets, and password change.

When the mobile session got reset following a password change, the mobile phones typically tried the old password several times before giving up, usually some multiple of three (maybe mail, calendar, tasks, or something like that), before prompting the user for a new password. This locked the user's account because of an administratively and technically non-negotiable three-strikes rule on the authentication backend.

It is very possible that this behavior of mobile devices has been corrected since then (this was some years ago), and I would be interested to know if this is the case.

In short, users got a "change your password" on their desktop every month, and those who did not immediately change the password on all their mobile devices got locked out across all devices, even those connected to the (more) secure internal network, which was a problem more severe than "merely" being locked out coming from the Internet. Sometimes, immediately wasn't enough, and there was a procedure with a complicated dance involving airplane mode to avoid being locked out. Users having both mobile phone and tablet of course had even more problems.

The organization implemented logging of the hashed password in the following manner:

  • inserted as a layer between the application and the authentication backend.
  • hashing using PBKDF2.
  • one salt from midnight to midnight, one salt from midday to midday, two hashes stored for each login. The salt was kept in the layer and not stored anywhere else (a layer restart meant a new salt, and when the salt was regenerated the old one was lost, since it was not stored alongside the hash as bcrypt would have done).
  • pepper with login (meaning that identical passwords for different accounts were not detectable).
  • stored in a specific (non-helpdesk-accessible) database and (optionally) logged to a specific datastore in the form failure(date, user, source IP, hash1, hash2, useragent . . .) and success(date, user, source IP, useragent . . .).
  • database used an API forbidding any actions not specifically listed here (for example, listing hashes for all users).
  • upon receipt of an authentication request:
    • check in the database if a password with the same hash for that user has been rejected (in the preceding 24 hours), and if so, refuse the login without presenting the authentication request to the backend, and log that reason in the helpdesk-accessible logs (without any hash).
    • check in the database if the same source IP has failed authentication for multiple user accounts in the last n hours, with no successful logins at all, and if so, refuse the login without presenting the authentication request to the backend, and log that reason in the helpdesk-accessible logs (still without any hash).
    • otherwise, pass the request to the backend, and log/store the result.
  • when the user password is changed, the authentication backend triggers removal of all records for that user from the database (which logs the fact that the password is changed). This was the only change to the authentication backend.
  • records older than 24 hours were expunged from the database.
  • logs were only accessible to the security auditors.
  • in the case where logs were accessed, the API to access them strips the hashes, presenting only names in the form PASS1, PASS2 . . . PASSN to the security auditor. Access to the raw database was extremely restricted, basically only on presenting credentials of a senior developer and of a security manager. This was not really because of the presence of the hashes, it was because of the sensitive nature of the logs in general, and was (is) standard procedure in the organization for access to raw databases outside the approved API.
  • it was debated whether it was necessary to actually store the user name. The same desired features could probably have been achieved without doing so, and that would have avoided unnecessary user tracking. In the end it was deemed important to know whether an attacker is targeting one user specifically or has access to a list of users.
  • there was also a function logging non-existent users in a secure (hashed) way (made possible by the authentication backend returning a detailed reason for any refusal), so that an external agent trying many non-existent users would also be blacklisted for some considerable amount of time.
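
A heavily simplified sketch of the core of such a layer follows. It is not the organization's code: it uses a single in-memory salt instead of the two overlapping daily salts, an in-memory dict in place of the restricted database, and it leaves out the per-IP check and the non-existent-user logging. All names, including the fake backend, are hypothetical.

```python
import hashlib
import os
import time


class RejectedPasswordCache:
    """Refuse repeats of an already rejected password before they reach the backend."""

    RETENTION = 24 * 3600  # seconds; older records are expunged
    ITERATIONS = 100_000   # assumed PBKDF2 work factor

    def __init__(self, backend_check):
        self._salt = os.urandom(16)  # kept in memory only, lost on restart
        self._rejected = {}          # (login, digest) -> time of rejection
        self._backend_check = backend_check

    def _digest(self, login: str, password: str) -> bytes:
        # Mixing in the login means identical passwords on different
        # accounts produce different digests.
        salt = self._salt + login.encode("utf-8")
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, self.ITERATIONS)

    def authenticate(self, login: str, password: str, now=None) -> str:
        now = time.time() if now is None else now
        self._rejected = {k: t for k, t in self._rejected.items()
                          if now - t < self.RETENTION}
        key = (login, self._digest(login, password))
        if key in self._rejected:
            return "refused: repeated wrong password"  # backend is not consulted
        if self._backend_check(login, password):
            return "accepted"
        self._rejected[key] = now
        return "rejected by backend"

    def password_changed(self, login: str) -> None:
        """Drop all records for the user, as triggered by the backend."""
        self._rejected = {k: t for k, t in self._rejected.items() if k[0] != login}


backend_calls = []


def fake_backend(login: str, password: str) -> bool:
    """Stand-in for the real authentication backend with its three-strikes rule."""
    backend_calls.append(login)
    return password == "correct horse"


layer = RejectedPasswordCache(fake_backend)
print(layer.authenticate("alice", "old-password", now=0.0))
print(layer.authenticate("alice", "old-password", now=5.0))
print("backend calls:", len(backend_calls))
```

The second attempt with the same stale password never reaches the backend, so a phone retrying an old password counts as one strike instead of three.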

In short, the concept behind this implementation was that dictionary attacks are a problem, but repeated tries of the same (wrong) password do not constitute a dictionary attack.

This implementation:

  • successfully avoided getting users locked out when one of their devices repeatedly tried the same password, which was the desired and extremely urgent goal
  • prevented brute-force attacks targeting common passwords against multiple accounts, which was not a function that the authentication backend could provide
  • decoupled authentication from Internet from the internal system, so that attacks from the Internet would not impact the user logging in on their workstation
  • greatly reduced calls to help-desk on the subject
  • greatly cut down on escalations to tier 3 because tier 2 now had access to the reason for rejected authentication for a given account (with the user-agent but without any hashes).

To answer the question . . . you present several reasons for logging the hashes of rejected passwords, but as you see this concrete example of actually choosing to log the hashes of rejected passwords does not apply to those reasons. I personally think that the reasons you give are not valid.

  1. Check if it is just a typo: as long as a user actually logs in after a limited number of attempts, you do not have a problem. It can be a typo or an old password or the password of another account, but should you care?

  2. Check if there is someone trying to guess password: yes, but you do not and should not compare to hashes of known passwords. If you want to prevent common passwords being used, you should enforce rules when the user chooses a password, not afterwards.

  3. Check if a user has an alternative account: linking accounts can possibly be done by source IP, but I do not think that knowing that a user is first trying some other user's password before their own is something that will help you prevent unauthorized access. If you have some pressing and legal reason for preventing users from having multiple accounts, you would do better by tracking source IPs, devices, and login times than by tracking mistakes in passwords.

More could have been done by identifying the mobile device, but the real solution here was Mobile Device Management and a hardware MFA device for desktop logins, and once the situation was stabilized, that was the choice of the organization in question.

Hope this helps.

Law29
0

As several answers on this post indicate, it may not be practical to analyse failed passwords against stored (hashed) passwords for the use cases you describe, because of security features such as salting.

One real-world requirement I can see comes from the HIPAA (Health Insurance Portability and Accountability Act) standard, but it too does not insist on recording the failed passwords themselves, only on recording and monitoring the other information described below:

As per HIPAA standard [164.308(a)(5)]: Procedures for recording logon activity, including failed login attempts

As per this procedure, you have to record logons and logouts occurring within your systems. You have to track failed logon attempts, including those using an incorrect password and those made against a non-existent account. The logging should include the IP address of the failed attempt, logon name, error type, and system name.
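
A failed-logon record with just those fields might be built as in the sketch below. The field names, the error-type strings, and the JSON layout are my own choices for illustration, not something the standard prescribes; the point is that nothing derived from the password appears in the record.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FailedLogon:
    timestamp: str
    source_ip: str
    logon_name: str
    error_type: str   # e.g. "incorrect_password" or "unknown_account"
    system_name: str


def failed_logon_record(source_ip: str, logon_name: str,
                        error_type: str, system_name: str) -> str:
    """Serialize one failed logon as a JSON log line with no password material."""
    event = FailedLogon(
        timestamp=datetime.now(timezone.utc).isoformat(),
        source_ip=source_ip,
        logon_name=logon_name,
        error_type=error_type,
        system_name=system_name,
    )
    return json.dumps(asdict(event), sort_keys=True)


print(failed_logon_record("203.0.113.5", "alice", "incorrect_password", "portal-01"))
```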

Hope this is useful...

Sayan