39

For input validation on a website, are there any security concerns with disclosing to the user exactly what characters are valid or invalid for a given field?

CWE-200: Information Exposure says one should try not to disclose information "that could be useful in an attack but is normally not available to the attacker". A specific example would be preventing a system from exposing stack traces (addressed by CWE-209: Information Exposure Through an Error Message).

Should one ensure that error messages are vague, like "The text you entered contains invalid characters"?

Is it a security vulnerability to include the regular expressions used to validate input in client-side code, such as JavaScript, that would be visible to attackers? The alternative, validating inputs only server-side, would reduce usability somewhat, as it would require more backend communication (e.g. the site could respond and display errors more slowly).

Or is this a form of "security through obscurity" as the user/attacker can deduce what characters are valid by repeatedly submitting different characters to see if they produce errors?

Is it worth the hits to user experience to potentially slow down an attack?

I should note, as far as I can tell, OWASP's Input Validation Cheat Sheet and Data Validation development guide don't provide direction on this topic.

Edit 2020-01-17:

There have been several questions (including answers that I went to the effort of writing comments on that have since been deleted) as to why one should be doing any input validation.

First off, thank you to @Loek for the comment pointing to OWASP's Application Security Verification Standard, which provides guidance on passwords on pages 23 and 24: "Verify that there are no password composition rules limiting the type of characters permitted. There should be no requirement for upper or lower case or numbers or special characters."

I think we can all agree that limiting characters in a password is generally a bad idea (see @Merchako's answer). As @emory points out, it probably can't be a hard and fast rule (e.g. I have seen many mobile apps that use an easier-to-use secondary "PIN" to secure the app even if someone else has access to log into the device). I didn't really have passwords in mind when I asked this question, but that's one direction the comments and answers went. So for the purposes of this question, let's consider it to be for non-password fields.

Input validation is part of "defense in depth" for websites, web services, and apps to prevent injection attacks. Injection attacks, as stated by OWASP, "can result in data loss, corruption, or disclosure to unauthorized parties, loss of accountability, or denial of access. Injection can sometimes lead to complete host takeover. The business impact depends on the needs of the application and data."

See OWASP's #1 vulnerability, A1-Injection, and CWE-20: Improper Input Validation for more detailed information. Note they both say input validation is not a complete defense but rather one layer of that "defense in depth" for software products.

csrowell
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackexchange.com/rooms/103507/discussion-on-question-by-csrowell-is-it-a-security-vulnerability-to-tell-a-user). – Rory Alsop Jan 20 '20 at 14:51

7 Answers

127

... that could be useful in an attack but is normally not available to the attacker

Knowledge of invalid input characters is useful, but an attacker can easily find it out with just a few tries. Thus this information is not really secret, and keeping all users unaware of what exactly went wrong does not actually deter attackers; it only keeps innocent users away, since they cannot continue.
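To make the "few tries" point concrete, here is a minimal sketch of how the rejected characters could be enumerated from the outside. The rule in `_NAME_RULE` and the `server_accepts` function are hypothetical stand-ins for a real form submission; they are not taken from any particular site.

```python
import re
import string

# Hypothetical server-side rule: letters, digits, space, apostrophe, hyphen.
_NAME_RULE = re.compile(r"[A-Za-z0-9 '\-]+")


def server_accepts(value: str) -> bool:
    """Stand-in for submitting the form and checking for a vague error."""
    return _NAME_RULE.fullmatch(value) is not None


def discover_rejected_chars(accepts, candidates=string.printable):
    """Submit each candidate character once and collect the rejected ones."""
    return {ch for ch in candidates if not accepts("a" + ch)}


rejected = discover_rejected_chars(server_accepts)
print(f"{len(string.printable)} requests were enough to map the rule; "
      f"rejected: {sorted(rejected)}")
```

One request per candidate character is all it takes, so a vague error message costs the attacker almost nothing while leaving ordinary users guessing.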

Conor Mancone
Steffen Ullrich
  • 4
    IMO that's the whole point of the question. I agree with the answer, but the same happens with the server version, if disclosed via HTTP headers. The latter, however, falls into the scope of the CWE because 1) the accepted input characters are *useful* to the user and easy to discover, and 2) the server software version *has no usability/compatibility* effect while increasing attack surface – usr-local-ΕΨΗΕΛΩΝ Jan 14 '20 at 21:06
  • 5
    @usr-local-ΕΨΗΕΛΩΝ: In case of invalid characters the attacker can quickly find out which ones are invalid by trying a few. In case of the server version the attacker cannot easily determine the version though unless the server explicitly provides it (in the header). Therefore providing the server version is a problem while providing which characters are invalid not. – Steffen Ullrich Jan 14 '20 at 21:20
  • 15
    While not a security vulnerability in itself, when a website tells me not to use `'`, `"`, backslash and `-` in my password that immediately brings into question whether they have SQL injection issues and/or use plain text storage for the entered information. – SeinopSys Jan 15 '20 at 18:43
  • 4
    @SeinopSys That could also be a security in depth approach. You can use parametrized queries everywhere but also disallow typical SQL-injection characters if they can be avoided in certain inputs. So if somebody introduces a possible vulnerability during development, there is a chance that it's a little bit harder to exploit. – Giacomo Alzetta Jan 16 '20 at 08:51
  • 3
    "server software version has no usability/compatibility effect while increasing attack surface" **this is security by obscurity, and it does not prevent anything**. Rather than hiding the server version, one should update it to a secure version. Lying about it or hiding it does nothing to prevent the issue and only makes it slightly harder to attack. – Vinz243 Jan 16 '20 at 13:12
  • @Vinz243: This would only be security by obscurity if the server ran an insecure software version and the admin knew it and did not patch it. Otherwise, hiding such information is just part of a defense in depth, i.e. making it harder for the attacker to gain information about the infrastructure. – Steffen Ullrich Jan 16 '20 at 17:26
  • 2
    Security.SE already has plenty of dedicated questions on the topic of hiding server versions: [Hiding version - valuable or just security by obscurity?](https://security.stackexchange.com/q/14709/10843) , [Is displaying what server I am running on the error pages a security risk?](https://security.stackexchange.com/q/4940/10843) , and more generally [The valid role of obscurity](https://security.stackexchange.com/q/2430/10843). – Brian Jan 16 '20 at 22:43
40

Creating a psychologically frustrating situation for users could incline them toward less secure decisions. For example, they try to write a password in Swedish, but your input refuses the character å without explanation. Instead of picking a different good password without å, they throw up their hands and use password123—one that's easily defeated with a dictionary attack.

Thus, you have attempted to create a "more secure" system through obscurity, but once we account for user behaviors, it is in fact less secure.

Valid characters are ultimately discoverable through systematic means. Hiding them is not worth the damage you might do to user behaviors.


For further reading, see "Security and Cognitive Bias: Exploring the Role of the Mind" and "Measuring the Security Impacts of Password Policies Using Cognitive Behavioral Agent-Based Modeling" by Sean Smith et al.

Merchako
18

It depends.

If the message describes an error from the normal user's point of view, like "The 3rd character must be a digit", it is not a security vulnerability. But displaying the regular expression used for validation means disclosing implementation details, which can be a weakness: e.g. some libraries use recursive calls extensively, and some expressions can cause stack overflow or high memory and CPU usage, which may lead to DoS.
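As a rough illustration of messages written from the user's point of view, the sketch below validates a made-up field format (two uppercase letters followed by four digits) with plain positional checks. The format, the function name `validate_product_code`, and the message wording are all assumptions for the example. Because it walks the input once and uses no regular expression, it also sidesteps the backtracking concern mentioned above.

```python
from typing import List


def validate_product_code(code: str) -> List[str]:
    """Return user-facing error messages; an empty list means the input is valid.

    Hypothetical format: two uppercase letters followed by four digits,
    for example "AB1234".
    """
    errors: List[str] = []
    if len(code) != 6:
        errors.append("The code must be exactly 6 characters long.")
        return errors
    for position, ch in enumerate(code, start=1):
        if position <= 2 and not ("A" <= ch <= "Z"):
            errors.append(f"Character {position} must be an uppercase letter A-Z.")
        elif position > 2 and not ("0" <= ch <= "9"):
            errors.append(f"Character {position} must be a digit 0-9.")
    return errors


print(validate_product_code("ABx234"))
```

The messages tell the user what to fix without revealing anything about how the check is implemented.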

CWE-200 defines disclosure of information as a weakness only if the user is not explicitly authorized to have access to that information. You are considering user input. That means the user is allowed to enter this information, is therefore allowed to see it, and is naturally allowed to see any validation error messages related to it.

Stack traces, as well as other technical messages, can provide information about the technologies used in the application (e.g. which versions of which libraries are used), about the environment (e.g. directory layout and permissions, OS type and version), etc. An attacker can use this information to effectively choose exploits for these particular technologies, libraries, OS, etc. That's why disclosing such information to the user is potentially a weakness.

Not explaining to the user what is wrong with their input is definitely bad UX, and it is not a security improvement.

Don't confuse that with similar cases where the data is sensitive and any validation related to it can be sensitive too. For instance, if the user enters letters in a field where you expect a year, then explaining that error to the user is safe. But if the user enters an email address and you check whether that address is registered in your system, then either result (address is known, address is unknown) can be a weakness. That's why, when you implement validation messages or any other feedback on user input, you should evaluate the consequences. But prohibiting all feedback by default for all fields can mean poor UX with a false feeling of security.
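The distinction between the two kinds of feedback can be sketched as follows. This is only an illustration under assumed names: `REGISTERED`, `GENERIC_REPLY`, and `request_password_reset` are hypothetical, and a plain list stands in for a mail queue. Format errors are explained openly, while the reply never reveals whether the address exists in the system.

```python
from typing import List

# Hypothetical user store; a real application would query its database.
REGISTERED = {"alice@example.com"}

GENERIC_REPLY = "If that address is registered, a reset link has been sent."
FORMAT_ERROR = "Please enter a valid email address, such as name@example.com."


def request_password_reset(email: str, outbox: List[str]) -> str:
    """Handle a reset request without revealing which addresses are registered."""
    normalized = email.strip().lower()
    if "@" not in normalized:
        # Safe to explain: this says nothing about the system's data.
        return FORMAT_ERROR
    if normalized in REGISTERED:
        outbox.append(normalized)  # stand-in for queueing the reset email
    # Same reply whether or not the address is known.
    return GENERIC_REPLY


sent: List[str] = []
print(request_password_reset("alice@example.com", sent))
print(request_password_reset("mallory@example.com", sent))
```

A real implementation would also want the two paths to take similar time, which this sketch does not attempt.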

mentallurg
  • Sorry, it wasn't my intention to suggest showing the user the RegEx. It was more along the lines of wondering if a RegEx should not be embedded in client-side code. I'll revise my question to make this clearer. – csrowell Jan 14 '20 at 21:43
  • 2
    @csrowell Don't forget ALL that is client side is "shown". Even if you don't show it on the screen it is shown to the malicious type of user (who certainly looks through all client code he gets transmitted). – Thomas Jan 15 '20 at 11:17
  • 1
    @Thomas: yes, and I think that was the OP's point. Is it safe to use regexes in client-side JS in the code that helps the user figure out why their input was invalid? (Because that will disclose the rules to attackers). – Peter Cordes Jan 16 '20 at 03:03
  • @PeterCordes I don't see how it matters, since the check must still be duplicated on the server (otherwise a malicious user could craft an unacceptable password and submit it, bypassing the client-side check). If you already do checks client-side in addition to server-side (for example to reduce server load on invalid attempts), there is no point in hiding unacceptable characters. – Dan M. Jan 20 '20 at 12:15
  • @DanM.: Right, the answer to the question is that keeping this obscured hurts usability more than it helps defend anything. I'm only trying to state what the OP was asking, not ask it myself. – Peter Cordes Jan 20 '20 at 20:05
9

This is just one of the many scenarios where security tradeoffs are expected in favor of the usability of the system.

However, there is a huge difference between showing stack traces (poor error handling) and disclosing which characters are expected or prohibited in an input field.

For most scenarios, one can argue that disclosing which characters are expected or prohibited in an input field is not a security vulnerability at all, if other security mechanisms are in place, such as limiting password attempts.

Using a threat modelling framework, it can be determined that this is an accepted risk of the system.

Filipe dos Santos
9

Obscurity is not security. A good system is secure even when an attacker knows everything about it. In your case, cutting the attacker's wasted cracking guesses by a certain percentage, say 25%, should not make or break the practicality of cracking: 0.75 trillion years is just as impractical as 1 trillion years. Leaving implementation details unobscured also helps new good guys defend.
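The arithmetic behind the 25% figure can be checked in a couple of lines. This is a small sketch, with `bits_lost` being a helper name made up for the example: it expresses the eliminated share of guesses as a loss in bits of search space.

```python
import math


def bits_lost(fraction_eliminated: float) -> float:
    """Bits of search space removed when a fraction of the guesses is ruled out."""
    return -math.log2(1.0 - fraction_eliminated)


# Ruling out a quarter of all candidate guesses:
print(f"{bits_lost(0.25):.3f} bits")
```

Ruling out a quarter of the guesses removes less than half a bit, so one extra character of length more than makes up for it.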

There are also social considerations. A "BEWARE OF DOG" sign might leak security implementation details, but it also discourages a certain scale of common attacks. The dog helps, known or unknown, but a sign could reduce fence maintenance costs, which frees resources for fighting other battles.

dandavis
  • 3
    "A good system is secure even when an attacker knows everything about it." This conventional wisdom is of course, true. But when you gain no benefit of disclosing information to the user, don't. This is because complicated systems are essentially never 100% secure. In this specific case there's obviously a huge usability benefit, and definitely not a security issue, but you need to be careful when applying this logic to the general case – Cruncher Jan 15 '20 at 20:41
1

The other answers show how disclosing this is not a security issue.

I will argue with a true anecdote that not disclosing can lead not only to frustrating users but even to less security.

This happened to me more than once on sites that didn't make clear what characters are allowed. At that time I didn't use a password generator so I had a mental scheme to remember passwords. It involved using a few special characters like $@.&. A few frustrated attempts where all I got was "invalid characters in password" and I progressively started to eliminate special characters. Then came "your password is not safe enough, try adding some digits and symbols" ... yeah ... I am definitely relaxed at this point.

I finally got it to accept my password, which not only contained just one special character but also didn't fit any of my schemes for remembering passwords, so I had to ... hmm ... store it somewhere else (this was before I even knew about password managers, so I will let you imagine how I secured it. I will tell you it was like a helicopter and like a string).

bolov
0

I don't see a problem in disclosing the set of characters the user is able to choose from, as long as the rules are also enforced and checked everywhere they matter. That is, if the user is not allowed to use \ in the field, it is not enough to check this in the JS input handling; it must also be checked before the value is stored (e.g. in the DB) and when the data is used. The rules have to apply at all times (TOCTOU).

The other part is that reducing the set of characters is no problem as long as enough entropy remains. You can reduce the password characters to "a" and "b", but then you have to increase the minimum password length accordingly (and check that users are not just holding down the "a" key until it is long enough). A long enough random password over that alphabet would still be secure. In the end, every character is just bits, and everyone knows these are 0 and 1, so the (real) alphabet is always known; how many possible arrangements there are is up to you to specify (and check).
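The trade-off between alphabet size and length can be worked out directly. The sketch below assumes passwords are chosen uniformly at random, and the 80-bit target is an arbitrary figure picked for illustration; `min_length` is a helper name invented for the example.

```python
import math


def min_length(alphabet_size: int, target_bits: float) -> int:
    """Shortest length at which a uniformly random string reaches target_bits."""
    bits_per_char = math.log2(alphabet_size)
    return math.ceil(target_bits / bits_per_char)


# Assumed target of 80 bits, for a few alphabet sizes.
for size, label in [(2, '"a" and "b" only'), (26, "lowercase letters"), (94, "printable ASCII")]:
    print(f"{label}: at least {min_length(size, 80)} characters")
```

This only holds for random choices; a user holding down one key produces a long string with almost no entropy, which is why the answer also asks for a check against that.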

Sango