Recommended usage. The short answer is: use markdown(untrusted, safe_mode="remove", enable_attributes=False).
Make sure you have an up-to-date version of the Markdown library, as older versions have some security problems.
You could also run the output through an HTML sanitizer, such as HTML Purifier.
Rationale. It is a good idea to disable enable_attributes. While the latest development versions of the Python Markdown library disable enable_attributes by default when you set safe_mode, earlier versions did not. Consequently, setting safe_mode alone is not enough on most versions of the library. If you set only safe_mode, the result is insecure:
>>> import markdown
>>> markdown.markdown("{@onclick=alert('hi')}some paragraph", safe_mode=True)
u'<p onclick="alert(\'hi\')">some paragraph</p>'
At the moment, the fixes are present only in git. At the time of this writing, the latest released version of Python Markdown (2.1.1) remains vulnerable unless you explicitly set enable_attributes=False. Therefore, many systems currently using Python Markdown may well be vulnerable.
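One way to check whether a particular installation is affected is to scan the rendered output for event-handler attributes. The following is only a minimal sketch using Python 3's standard library; it is a detector, not a sanitizer, and it is not part of Python Markdown. The names EventAttributeFinder and find_event_attributes are made up for illustration, and the sample string is the output shown above.

```python
from html.parser import HTMLParser


class EventAttributeFinder(HTMLParser):
    """Collect every attribute whose name starts with "on" (onclick, onload, ...)."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        for name, value in attrs:
            if name.startswith("on"):
                self.findings.append((tag, name, value))


def find_event_attributes(html_text):
    """Return a list of (tag, attribute, value) tuples for event handlers."""
    finder = EventAttributeFinder()
    finder.feed(html_text)
    finder.close()
    return finder.findings


# Output of the vulnerable call shown above.
rendered = '<p onclick="alert(\'hi\')">some paragraph</p>'
print(find_event_attributes(rendered))
```

An empty list for the probe input suggests that attribute injection is blocked in that configuration; it says nothing about other attack vectors.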
The documentation could do a better job of warning users of Markdown about these pitfalls. It says things like "You may also want to set enable_attributes=False when using safe_mode", without disclosing that failing to do so creates an XSS hole in all but the very latest versions of the library. Later versions of the documentation say that setting enable_attributes "could potentially allow an untrusted user to inject JavaScript into your documents"; it would be clearer to say that setting enable_attributes does enable users to inject JavaScript into your documents and is thus highly insecure if the Markdown might come from an untrusted source.
Doubts. That said, I'm not 100% certain whether the result will be secure, even when using it as recommended above. The developers have made comments like the following:
"safe-mode" was a poor name choice that we continue to use for backward comparability (old code still works with our newer versions). What it really is is a no-markup mode. In other words, it is just a way to disallow raw html and really doesn't guarantee safety.
Those kinds of comments are a bit scary.
In earlier versions of the Python Markdown library, its HTML sanitization looks a bit fragile to me, so I'm not sure whether I'd trust earlier versions of the Markdown library, regardless of what flags are passed. Consider the following:
>>> markdown.markdown("[Example](javascript://alert%28%22xss%22%29)", safe_mode=True)
u'<p><a href="javascript://alert%28%22xss%22%29">Example</a></p>'
Allowing javascript:-style URLs through Markdown's processing seems to me like a pretty dubious design decision. It feels like this is within a hop, skip, and a jump of XSS. All that's missing is a way to break out of the C++-style comment (the //), and it is game over. For instance:
>>> markdown.markdown("[Example](javascript://\nalert%28%22xss%22%29)", safe_mode=True)
u'<p><a href="javascript:// alert%28%22xss%22%29">Example</a></p>'
How confident should I be that no browser will execute that JavaScript? I dunno, but it is not giving me warm, fuzzy feelings. If it is secure, it is just blind luck.
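For comparison, a more conservative design is to allow only a short list of URL schemes rather than trying to defang dangerous ones. The sketch below is my own illustration using Python 3's standard library, not what Python Markdown does; ALLOWED_SCHEMES and has_allowed_scheme are hypothetical names, and the particular set of schemes is an assumption you would adjust for your application.

```python
from urllib.parse import urlsplit

# Assumption: only these schemes are wanted in user-supplied links.
ALLOWED_SCHEMES = {"http", "https", "mailto"}


def has_allowed_scheme(url):
    """Return True if url is relative or uses a scheme in ALLOWED_SCHEMES."""
    # Drop whitespace and control characters first: browsers ignore some of
    # them inside URLs, so "java\nscript:" must not look like a harmless path.
    cleaned = "".join(ch for ch in url if ch > " " and ch != "\x7f")
    try:
        scheme = urlsplit(cleaned).scheme.lower()
    except ValueError:
        # Malformed URL (for example a broken IPv6 literal): reject it.
        return False
    return scheme == "" or scheme in ALLOWED_SCHEMES


for candidate in (
    "https://example.com/",
    "javascript://alert%28%22xss%22%29",
    "javascript://\nalert%28%22xss%22%29",
):
    print(repr(candidate), has_allowed_scheme(candidate))
```

With a check like this applied to every href in the output, both javascript: examples above would be rejected outright, so their safety would no longer depend on how a browser treats the // comment.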
Fortunately, the very latest released version of Markdown appears to do stricter filtering of script if you set enable_attributes=False. But make sure you set enable_attributes=False; otherwise, Markdown falls back to the fragile HTML sanitization found in earlier versions, and I'm not confident in the security of that scheme.
What not to do. The following is not secure: markdown(escape(untrusted)).
- You might think that first escaping the input would remove all of the HTML and make this usage safe. In fact, I have seen this used in some systems and recommended by some people. However, it is actually unsafe, since escaping is not enough to make URLs safe. For instance, this usage of Markdown can be defeated by "[clickme](javascript:alert%28%22xss%22%29)". In general, escaping the input to Markdown is not the right approach; the right approach is to invoke Markdown in the appropriate way (and possibly apply an HTML filter to its output as well, if you want extra protection).
If you use Django. If you use Django, the following should be a safe way to use Markdown:
{{ untrusted | markdown:"safe" }}
As of Django 1.4, this is safe. When you pass the "safe" argument, Django now has special support to set safe_mode and disable enable_attributes. But make sure to update to Django 1.4 or later; in earlier versions, this usage was insecure.