48

How do I use the Markdown library safely? What do I need to do to make sure that its output is safe to include in my web page?

I want to allow untrusted users to enter content (in Markdown format). I'll use the Markdown processor to generate HTML, and I'd like to include that HTML in my web page. What do I need to do to make sure this is secure, and is not a self-inflicted XSS vulnerability? What arguments do I need to pass? Is there any preprocessing or postprocessing I need to do? I'm using the python-markdown library, if that is relevant.

D.W.
  • 2
    Just a thought: You could use the [JavaScript version "PageDown"](http://stackoverflow.com/questions/134235/is-there-any-good-markdown-javascript-library-or-control/135155#135155) which would convert text to HTML when the page loads, that would let you use the sanitizer which probably has already been well vetted for use in the StackExchange system. However, not so good for SEO or JavaScript disabled clients. – 700 Software May 06 '12 at 23:08

2 Answers

19

Recommended usage. The short answer is: Use markdown(untrusted, safe_mode='remove', enable_attributes=False).

Make sure you have an up-to-date version of the Markdown library, as older versions have some security problems.

You could also run the output through an HTML sanitizer, like HTML Purifier.

Rationale. It is a good idea to disable enable_attributes. While the latest development versions of the Python markdown library will disable enable_attributes by default if you set safe_mode, earlier versions did not do so. Consequently, just setting safe_mode is not enough on most versions of the Markdown library. If you just set safe_mode, the result is insecure:

>>> import markdown
>>> markdown.markdown("{@onclick=alert('hi')}some paragraph", safe_mode=True)
u'<p onclick="alert(\'hi\')">some paragraph</p>'

At the moment, the fixes are only present in git. At the time of this writing, the latest released version of Python Markdown (2.1.1) remains vulnerable unless you explicitly set enable_attributes=False. Therefore, it is plausible that many systems currently using Python Markdown may be vulnerable.

The documentation could be better at warning users of Markdown about these pitfalls. It says things like "You may also want to set enable_attributes=False when using safe_mode", without disclosing that failing to do so creates an XSS hole with all but the very latest versions of the library. Later versions of the documentation say that setting enable_attributes "could potentially allow an untrusted user to inject JavaScript into your documents"; it would be clearer to say that setting enable_attributes does enable users to inject JavaScript into your documents and is thus highly insecure if the Markdown might come from an untrusted source.
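If you want a cheap regression check that attribute injection of this kind is not reaching your pages, you could scan the generated HTML for event-handler attributes. The following is only a sketch using the standard library; the names EventAttrFinder and find_event_attrs are made up for illustration, and it detects problems rather than fixing them.

```python
from html.parser import HTMLParser


class EventAttrFinder(HTMLParser):
    """Collects (tag, attribute) pairs whose attribute name starts with "on"."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        for name, _value in attrs:
            if name.startswith("on"):
                self.found.append((tag, name))


def find_event_attrs(markup):
    """Return every event-handler attribute found in the given HTML string."""
    finder = EventAttrFinder()
    finder.feed(markup)
    finder.close()
    return finder.found


# The vulnerable output shown earlier is flagged; clean output is not.
vulnerable = find_event_attrs('<p onclick="alert(\'hi\')">some paragraph</p>')
clean = find_event_attrs("<p>some paragraph</p>")
print(vulnerable, clean)
```

A check like this belongs in a test suite or an audit script. It is not a substitute for passing enable_attributes=False.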

Doubts. That said, I'm not 100% certain whether the result will be secure, even when using it as recommended above. The developers have made comments like the following:

"safe-mode" was a poor name choice that we continue to use for backward comparability (old code still works with our newer versions). What it really is is a no-markup mode. In other words, it is just a way to disallow raw html and really doesn't guarantee safety.

Those kinds of comments are a bit scary.

In earlier versions of the Python Markdown library, its HTML sanitization looks a bit fragile to me, so I'm not sure whether I'd trust earlier versions of the Markdown library, regardless of what flags are passed. Consider the following:

>>> markdown.markdown("[Example](javascript://alert%28%22xss%22%29)", safe_mode=True)
u'<p><a href="javascript://alert%28%22xss%22%29">Example</a></p>'

Allowing javascript:-style URLs through Markdown's processing seems to me like a pretty dubious design decision. It feels like this is within a hop, skip, and a jump of XSS. All that's missing is a way to break out of the C++-style comment (the //), and it is game over. For instance:

>>> markdown.markdown("[Example](javascript://\nalert%28%22xss%22%29)", safe_mode=True)
u'<p><a href="javascript://&#10;alert%28%22xss%22%29">Example</a></p>'

How confident should I be that no browser will execute that Javascript? I dunno, but it is not giving me warm, fuzzy feelings. If it is secure, it is just blind luck.
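One way to avoid relying on luck here is to check link targets against an allowlist of URL schemes in a postprocessing step. Below is a minimal sketch using only the standard library. The names ALLOWED_SCHEMES and is_safe_url are hypothetical, and the function assumes it receives the already entity-decoded attribute value (for example, as delivered by an HTML parser), not raw HTML.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; adjust to the schemes your site actually needs.
ALLOWED_SCHEMES = {"http", "https", "mailto"}


def is_safe_url(url: str) -> bool:
    """Return True if url is relative or uses an allowlisted scheme."""
    # Drop whitespace and control characters first, so that a value such as
    # "java\nscript:..." cannot slip past the scheme check.
    cleaned = "".join(ch for ch in url if ch > " " and ch != "\x7f")
    try:
        scheme = urlsplit(cleaned).scheme
    except ValueError:
        return False  # unparseable URLs are rejected
    # An empty scheme means a relative URL, which is allowed here.
    return scheme == "" or scheme in ALLOWED_SCHEMES


print(is_safe_url("https://example.com/page"))
print(is_safe_url("javascript://\nalert%28%22xss%22%29"))
```

Using an allowlist rather than a blocklist is deliberate: schemes such as data: or vbscript: are rejected without having to be enumerated.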

Fortunately, the very latest released version of Markdown appears to do stricter filtering of script if you set enable_attributes=False. But make sure you set enable_attributes=False, otherwise Markdown falls back to the fragile HTML sanitization found in earlier versions, and I'm not confident in the security of that scheme.

What not to do. The following is not secure: markdown(escape(untrusted)).

  • You might think that first escaping the input would remove all of the HTML and make this usage safe. In fact I have seen this used in some systems and recommended by some. However, it is actually unsafe, since escaping is not enough to make URLs safe. For instance, this usage of Markdown can be defeated by "[clickme](javascript:alert%28%22xss%22%29)". In general, escaping the input to Markdown is not the right approach; the right approach is to invoke Markdown in the appropriate way (and possibly apply an HTML filter to its output as well, if you want extra protection).
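To see why escaping does not help, note that the payload above contains no characters that HTML escaping touches. A quick illustration, using the standard library's html.escape as a stand-in for whatever escape function your framework provides (that substitution is an assumption):

```python
import html

# The payload from the bullet above: a Markdown link with a javascript: URL.
payload = "[clickme](javascript:alert%28%22xss%22%29)"

escaped = html.escape(payload)

# html.escape only rewrites &, <, > and quote characters. The payload contains
# none of them (its quotes are percent-encoded), so it passes through unchanged
# and Markdown still sees an ordinary link with a javascript: target.
print(escaped == payload)

# Raw tags, by contrast, are neutralized, which is what makes escaping look safe.
escaped_tag = html.escape("<script>")
print(escaped_tag)
```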

If you use Django. If you use Django, the following should be a safe way to use Markdown:

{{ untrusted | markdown:"safe" }}

As of Django 1.4, this is safe: when you pass the "safe" argument, Django has special support to set safe_mode and disable enable_attributes. But make sure to update to Django 1.4 or later; in earlier versions, this usage was insecure.

D.W.
  • What are disadvantages of using HTML sanitizer in comparison with safe_mode and enable_attributes? IMO advantages are: 1) safety (your doubts won't apply in case of sanitizer), 2) sanitizer will allow HTML (allowing HTML was one of design goals of markdown) – Andrei Botalov May 07 '12 at 21:02
  • 2
    @AndreyBotalov, I know of no serious disadvantages. Admittedly, it is another piece of software you have to install and configure properly. You also have to select a HTML sanitizer that will be trustworthy (some of the packages out there are crummy; others, like HTML Purifier, are excellent). But these seem minor. I agree with your list of advantages. – D.W. May 08 '12 at 08:00
  • Django doesn't ship with a `markdown` template tag by default. – Flimm Aug 12 '16 at 14:24
  • 1
    FYI, `safe_mode` is now [deprecated](https://python-markdown.github.io/change_log/release-2.6/#safe_mode-deprecated). – HBat Dec 19 '18 at 21:54
13

Markdown alone would not be sufficient for sanitizing output, since it allows arbitrary HTML/JavaScript input and simply passes it through unprocessed.

E.g. this is valid Markdown:

## heading

text

But also this:

## heading

text <script>alert('hello');</script>

From the markdown syntax page:

For any markup that is not covered by Markdown’s syntax, you simply use HTML itself. There’s no need to preface it or delimit it to indicate that you’re switching from Markdown to HTML; you just use the tags.

I just did a quick test using python-markdown and it does seem to work this way.

That said, given the limited character set that's used by markdown syntax, it might be easier to filter the character set you allow users to provide before you feed it to markdown (e.g. something like a-zA-Z* #+:/&?=-_()>), but even those might be sufficient to confuse some code that parses/encodes it... So I'm not really sure how much safety you get purely from the fact that you use markdown.
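As a rough sketch of that character-filtering idea (the function name filter_markdown_input is made up, and allowing newlines is my own addition so that multi-line input survives):

```python
import re

# Everything outside the character set suggested above is stripped. The "-" is
# escaped so it is a literal hyphen rather than a range; "\n" is an added
# assumption so that line breaks are kept.
DISALLOWED = re.compile(r"[^a-zA-Z* #+:/&?=\-_()>\n]")


def filter_markdown_input(text: str) -> str:
    """Remove every character that is not in the allowlisted set."""
    return DISALLOWED.sub("", text)


filtered = filter_markdown_input(
    "## heading\n\ntext <script>alert('hello');</script>"
)
print(filtered)

# Limitation: a javascript: URL built only from allowed characters survives.
survivor = filter_markdown_input("javascript:alert(xss)")
print(survivor)
```

This removes the opening angle brackets, so raw tags are broken up, but it also strips digits, periods and other ordinary punctuation, and it does nothing about javascript:-style URLs. That matches the doubt above about how much safety this buys.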

UPDATE:

Upon further research I found this answer on SO, which seems quite sensible.

I then also searched further and discovered the safe_mode switch (mentioned here and here).

A quick test seems to work quite well, but it might deserve further research...

>>> import markdown
>>> markdown.markdown("<script>alert('hello');</script> hello <strong>world</strong>")
u"<script>alert('hello');</script>\n\n<p>hello <strong>world</strong></p>"
>>> markdown.markdown("<script>alert('hello');</script> hello <strong>world</strong>", safe_mode=True)
u'<p>[HTML_REMOVED]</p>\n<p>hello [HTML_REMOVED]world[HTML_REMOVED]</p>'

The full option set for safe_mode is available on the documentation page, which also mentions setting enable_attributes to False for safety.

Yoav Aner