HTML encoding to protect against XSS

Question

While going through some references about protection against XSS, I found that it is good practice to encode user-supplied data before using it to generate a dynamic page. I could not find a detailed explanation of this statement. My questions are:

How does encoding help prevent XSS?
Does it provide full protection against stored XSS?
Are there any other countermeasures that need to be taken into consideration?

the meta discussion about this question being a duplicate http://meta.stackexchange.com/q/172307/201081 — Shurmajee, Jun 13 '13 at 05:58

score 18 · Accepted Answer · edited May 23 '17 at 12:40

(Copied from my answer on Stack Overflow.)

No.
HtmlEncode simply does NOT cover all XSS attacks.
Encoding is the correct solution, but not always HTML encoding - you need context-sensitive encoding.

For instance, consider server-generated client-side JavaScript: if the server dynamically outputs HTML-encoded values directly into the client-side JavaScript, HtmlEncode will not stop injected script from executing.
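A minimal Python sketch of this case, using `html.escape` as a stand-in for HtmlEncode. The payload, the `userId` variable, and the `js_string_literal` helper are made up for illustration; a vetted encoding library is preferable to a hand-rolled helper like this one.

```python
import html
import json

# Hypothetical payload: it contains nothing that HTML encoding touches.
payload = "1; alert(document.cookie)"

# HTML-encoding the value before placing it in a script block changes nothing,
# so the injected statement still ends up as live JavaScript.
encoded = html.escape(payload)
unsafe_script = "<script>var userId = " + encoded + ";</script>"


def js_string_literal(value: str) -> str:
    """Encode value as a quoted JavaScript string literal for an inline script.

    json.dumps escapes quotes, backslashes and control characters; the extra
    replacements keep '<', '>' and '&' from ending the script element early.
    """
    literal = json.dumps(value)
    return (literal.replace("<", "\\u003c")
                   .replace(">", "\\u003e")
                   .replace("&", "\\u0026"))


# With JavaScript-context encoding the payload is just the content of a string.
safe_script = "<script>var userId = " + js_string_literal(payload) + ";</script>"
```

Here `encoded` is identical to `payload`, while `safe_script` keeps the whole value inside a quoted string.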

Next, consider the following pseudocode:

<input value=<%= HtmlEncode(somevar) %> id=textbox>

Now, in case it's not immediately obvious, if somevar (sent by the user, of course) is set, for example, to

a onclick=alert(document.cookie)

the resulting output is

<input value=a onclick=alert(document.cookie) id=textbox>

which would clearly work. Obviously, this can be (almost) any other script... and HtmlEncode would not help much.
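The same thing can be reproduced in Python, with `html.escape` standing in for HtmlEncode. Quoting the attribute, as in the second half, is one common mitigation and is an assumption here, not something the pseudocode above does.

```python
import html

somevar = "a onclick=alert(document.cookie)"  # payload from the example above

# Unquoted attribute: HTML encoding leaves the payload untouched because it
# contains no <, >, & or quote characters, so the space starts a new attribute.
unquoted = "<input value=" + html.escape(somevar) + " id=textbox>"

# Quoted attribute: the value can only end the attribute with a literal double
# quote, and html.escape(quote=True) turns those into &quot;.
quoted = '<input value="' + html.escape(somevar, quote=True) + '" id="textbox">'

# A payload that tries to break out of the quoted attribute is neutralised.
breakout = '" onclick="alert(document.cookie)'
quoted_breakout = (
    '<input value="' + html.escape(breakout, quote=True) + '" id="textbox">'
)
```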

There are a few additional vectors to be considered, including the third flavor of XSS, called DOM-based XSS (wherein the malicious script is generated dynamically on the client, e.g. based on the URL fragment, the part after #).

Also don't forget about UTF-7 type attacks - where the attack looks like

+ADw-script+AD4-alert(document.cookie)+ADw-/script+AD4-

Nothing much to encode there...
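A short Python check of why HTML encoding has nothing to grab onto here, again using `html.escape` as a stand-in for HtmlEncode. The `decode("utf-7")` call only simulates what a browser would do if it guessed UTF-7 for the page.

```python
import html

payload = "+ADw-script+AD4-alert(document.cookie)+ADw-/script+AD4-"

# None of these characters is special to an HTML encoder.
unchanged = html.escape(payload) == payload

# Interpreting the same bytes as UTF-7 reveals the script tag.
decoded = payload.encode("ascii").decode("utf-7")
```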

The solution, of course (in addition to proper and restrictive white-list input validation), is to perform context-sensitive encoding: HtmlEncoding is great IF your output context IS HTML, or maybe you need JavaScriptEncoding, or VBScriptEncoding, or AttributeValueEncoding, or... etc.
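As a rough sketch of what choosing the encoder by context can look like, using only the Python standard library. The `encode_for` helper and its context names are hypothetical, and it covers just three contexts; JavaScript, CSS and others each need their own encoder.

```python
import html
from urllib.parse import quote


def encode_for(context: str, value: str) -> str:
    """Encode value for one output context (hypothetical helper)."""
    if context == "html_body":
        return html.escape(value, quote=False)
    if context == "html_attribute":  # the attribute itself must be quoted
        return html.escape(value, quote=True)
    if context == "url_component":
        return quote(value, safe="")
    raise ValueError("no encoder for context: " + context)


name = 'Tom & "Jerry" <b>'

body_part = "<p>" + encode_for("html_body", name) + "</p>"
attr_part = '<input value="' + encode_for("html_attribute", name) + '">'
link_part = '<a href="/search?q=' + encode_for("url_component", name) + '">search</a>'
```

The same input produces three different encodings, which is why a single HtmlEncode call cannot serve every location in a page.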

If you're using MS ASP.NET, you can use their Anti-XSS Library, which provides all of the necessary context-encoding methods.

Note that encoding should not be restricted to user input; it should also be applied to stored values from the database, text files, etc.

Oh, and don't forget to explicitly set the charset, both in the HTTP header AND the META tag, otherwise you'll still have UTF-7 vulnerabilities...
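For illustration, a minimal WSGI application in Python that declares the charset in both places. The choice of WSGI, the page content, and the `fake_start_response` helper are assumptions made for the sketch; any framework has an equivalent way to set the header.

```python
def app(environ, start_response):
    """Minimal WSGI app that declares UTF-8 in the header and in the markup."""
    body = (
        "<!DOCTYPE html><html><head>"
        '<meta charset="utf-8">'
        "<title>demo</title></head><body>hello</body></html>"
    ).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# Exercise the app without a server by capturing what it passes on.
captured = {}


def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


page = b"".join(app({}, fake_start_response))
```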

Last point, regarding Stored XSS - since you would be doing the encoding during the page generation, on the data output, it is agnostic as to the source of the data, whether from user input (i.e. Reflected XSS) or Database/files (i.e. Stored/Persistent XSS). (So basically yes.)
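A small Python sketch of that point: the value is stored raw and encoded only at render time, so the reflected and stored paths go through the same function. The `comments` table and `render_comment` helper are hypothetical, and `html.escape` is only right because the output context here is an HTML body.

```python
import html
import sqlite3


def render_comment(text: str) -> str:
    """Encode at output time, whatever the source of text was."""
    return "<p>" + html.escape(text) + "</p>"


payload = "<script>alert(1)</script>"

# Stored case: the raw value is saved as-is and encoded only when rendered.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE comments (body TEXT)")
db.execute("INSERT INTO comments (body) VALUES (?)", (payload,))
stored = db.execute("SELECT body FROM comments").fetchone()[0]

from_request = render_comment(payload)   # reflected case
from_database = render_comment(stored)   # stored case
```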

For some more information, and a pretty definitive list (no longer updated), check out RSnake's Cheat Sheet: http://ha.ckers.org/xss.html

DOM based XSS from the point of defenders is the same as reflected XSS except it's reflected not by server but client code and should be escaped there. — Smit Johnth, Mar 18 '13 at 07:33
_http://ha.ckers.org/xss.html_ - I lol'd, who calls a page of 100 kb text a _cheat sheet_? — Smit Johnth, Mar 18 '13 at 07:39

score 6 · Answer 2 · edited Mar 14 '13 at 10:48

Basically XSS happens when an attacker is successful in executing some kind of unauthorized script on a webpage viewed by a potential victim. So if you HtmlEncode the fields before printing on the webpage, the page will not interpret the data as script. It will interpret the characters as content and the content will be printed on to the page as it is.

E.g., say you have something like

<form action="/printname" >
  <input type="text" name="name">
  <input type="submit">
</form>

and on the server side you have something like

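from django.http import HttpResponse
def printname(request):
    k = request.GET['name']          # untrusted user input
    return HttpResponse("hai" + k)   # vulnerable: written to the page unencoded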

If the user inputs the name as

<script>alert("hia")</script>

the page would execute the script instead of printing it. However, if the HttpResponse function HTML-encoded the output before sending it to the client, < would be replaced by &lt;, > by &gt;, and the double quotes by &quot;.

The result would be the page interpreting the output as content instead of script.
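A minimal sketch of that encoding step in plain Python, using the standard library's `html.escape`. The `greet` function is a made-up stand-in for the server-side handler, not a framework API.

```python
import html


def greet(name: str) -> str:
    """Build the response body with the user-supplied name HTML-encoded."""
    return "hai" + html.escape(name)


body = greet('<script>alert("hia")</script>')
# body is now: hai&lt;script&gt;alert(&quot;hia&quot;)&lt;/script&gt;
```

The browser displays the encoded text as the literal characters the user typed, and no script element is created.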

Hope my answer is clear. And it does help in preventing stored XSS, because the stored value is encoded before printing.

Unfortunately, HTML Encoding [doesn't provide full protection against XSS.](https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet#Why_Can.27t_I_Just_HTML_Entity_Encode_Untrusted_Data.3F). You're better off by [using a well-vetted security library.](https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet#You_Need_a_Security_Encoding_Library) — Adi, Mar 14 '13 at 10:28
right.. it is better to use the rules mentioned in the OWASP page — aRun, Mar 14 '13 at 11:00
+100500 to template frameworks and +100500 to OWASP. Only PHP monkeys seem not to use frameworks, see "PHP: Fractal of bad design". — Smit Johnth, Mar 18 '13 at 07:35
