16

Why does this vector:

<svg><script>alert&#40/1/.source&#41</script>

work in http://jsfiddle.net/ZgPY4/2/ and this one

<script>alert&#40/1/.source&#41</script>

doesn't? How is <svg> making it work?

Hendrik Brummermann
Daniel
  • When `<svg>` is used, ASCII encoding is possible; otherwise it is not. Anyway, here is an XSS JS append payload for when the following characters are NOT allowed: ? = ( ) ; : Someone might find it useful ;) [http://jsfiddle.net/sh00pac/GrnqV/](http://jsfiddle.net/sh00pac/GrnqV/) – sh00pac Jun 01 '13 at 15:37

2 Answers

21

HTML <script> has special powers other elements don't: it is a "CDATA element". That means that any < or & characters up to the end of the element are taken as literally meaning those characters. So what is passed to the JS interpreter is:

alert&#40/1/.source&#41

which is obviously not valid JS syntax.

The concept of a "CDATA element" comes from the SGML world from which HTML developed. But SVG comes from the XML world where things are simplified and there are no CDATA elements. Consequently the SVG <script> element has no special powers: inside an SVG script, < introduces a tag and & introduces an entity or character reference.

Consequently the &#40 gets parsed to ( and the resulting string passed to the JS interpreter is:

alert(/1/.source)

In XML terms the <script> element is in the HTML namespace and the <svg><script> element is in the SVG namespace, so they are different elements. HTML5 makes this all less clear by hiding the namespace prefixes and applying a non-XML parser to the SVG (which is why &#40 works even though it is not well-formed XML; it should be &#40;).
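
The difference can be modelled in a few lines of JavaScript. This is only a rough sketch under stated assumptions, not the real HTML tokenizer: decodeNumericCharRefs and sourceSeenByEngine are hypothetical helper names, the decoder handles numeric character references only (decimal or hex, with or without the trailing semicolon), and new Function is used purely as a syntax check, so nothing is executed.

```javascript
// Simplified model (an assumption, not the real HTML tokenizer): decode
// decimal and hexadecimal numeric character references, with or without
// the trailing semicolon.
function decodeNumericCharRefs(text) {
  return text.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));?/g, (match, hex, dec) =>
    String.fromCodePoint(hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10))
  );
}

// Hypothetical helper: the text the JS engine would receive for a script body.
// An HTML <script> keeps its content literally; an SVG <script> has its
// character references decoded first.
function sourceSeenByEngine(scriptBody, insideSvg) {
  return insideSvg ? decodeNumericCharRefs(scriptBody) : scriptBody;
}

// Syntax check only: new Function compiles the source without running it.
function isValidJs(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const body = "alert&#40/1/.source&#41";
const htmlSource = sourceSeenByEngine(body, false);
const svgSource = sourceSeenByEngine(body, true);

console.log(htmlSource, isValidJs(htmlSource)); // alert&#40/1/.source&#41 false
console.log(svgSource, isValidJs(svgSource));   // alert(/1/.source) true
```

To see the real parser's behaviour instead of this model, a browser's DOMParser can parse both snippets as text/html and the textContent of each resulting script element can be compared.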

bobince
  • Brilliant explanation! +1. May I, then, ask what the equivalent way of getting this script to work without the SVG tag would be? – Lex May 31 '13 at 10:31
  • 4
    You'd just go with the unescaped version, `<script>alert(/1/.source)</script>`. You would only need to consider the SVG workaround if you were dealing with an input filter that blocked use of `(` and `)` characters. There are many funny little workarounds for particular filters that arbitrarily block certain characters. – bobince May 31 '13 at 13:10
  • Much appreciated. – Lex May 31 '13 at 13:24
  • If anyone else is wondering where this is talked about in the spec... the contents of `<svg>` are parsed as "foreign content". (`<math>` too). I feel unclean... – sourcejedi Jun 02 '13 at 12:51
  • 2
    FWIW, we did this, because the SVG WG wanted the XML serialization of SVG to be copyable and pasteable into the HTML serialization in a text editor. – hsivonen Jun 07 '13 at 13:32
  • 3
    Moral of the story: Always sanitize user-provided markup by parsing with a real HTML parser, applying a whitelist-based sanitizer to the parser output and re-serializing the sanitizer output. – hsivonen Jun 07 '13 at 13:34
3

The # (hash, sharp, pound) breaks your JavaScript. Inside an <svg> tag pair (you did not provide the closing tag), the numeric character reference is translated to its proper character as the markup is parsed as HTML, and then the engine executes the ECMAScript (which it is allowed to do for drawing).

OWASP.org documents this as HTML character references without trailing semicolons (&#40 is the decimal form; the hex equivalent would be &#x28).

AbsoluteƵERØ