ASP.NET MVC: Prevent XSS with automatic HTML encoding
ASP.NET, MVC, Security
December 19th, 2007
There’s an interesting (and sometimes heated) debate on the ASP.NET MVC forums about HTML encoding.
It started with a proposal for a helper method to HTML-encode strings as soon as they are received from the visitor, so they’d be stored HTML-encoded in the database. That way, you don’t have to HTML-encode them for display to prevent cross-site scripting. If that were the default behaviour for the UpdateFrom() method, the idea of encoding for storage would no doubt be widely adopted.
Almost everyone else on the forum, though, has a strong preference for not encoding anything until the moment of display. There are some obvious benefits to this approach - you don’t have to remember which strings were pre-encoded (according to their origin), and you don’t have to un-encode them when outputting to any non-HTML format. But it does mean you have to remember to encode things wherever you output them.
Sadly the two methods are incompatible, and you will have to choose one side or the other. I am very definitely in the encode-when-displaying camp.
Another solution
What I’d really like is to change the default behaviour of ASPX’s <%= … %> syntax so that it HTML-encodes the result by default. That’s what you want 95% of the time, so why should you keep writing <%= HttpUtility.HtmlEncode(…) %> all the time?
| | Current reality | In my ideal world |
| Output unencoded string | <%= value %> | <%= (RawHtml) value %> |
| Output encoded string | <%= HttpUtility.HtmlEncode(value) %> | <%= … %> |
This would give us the best of both worlds. You wouldn’t need to remember to HTML-encode your strings (since that happens by default), so there’d be no need to store things pre-encoded in the database and then worry about double-escaping, sharing data with external systems, unencoding for output to non-HTML format and all that other nonsense.
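To make the double-escaping worry concrete, here is a small sketch of what happens when a value that was already encoded for storage gets encoded again at display time. It uses System.Net.WebUtility.HtmlEncode instead of HttpUtility.HtmlEncode purely so that it runs without a System.Web reference; the EncodingDemo class and its method names are made up for illustration.

```csharp
using System;
using System.Net;

// Hypothetical demo class, not part of any library mentioned in this post.
public static class EncodingDemo
{
    // Encode-at-display: the raw string is stored as-is and encoded once on output.
    public static string EncodeOnce(string raw)
    {
        return WebUtility.HtmlEncode(raw);
    }

    // Encode-for-storage followed by encode-at-display: the entities
    // themselves get escaped, so the visitor sees "&lt;b&gt;" as literal text.
    public static string EncodeTwice(string raw)
    {
        return WebUtility.HtmlEncode(WebUtility.HtmlEncode(raw));
    }
}
```

For the input `<b>Tom & Jerry</b>`, EncodeOnce returns `&lt;b&gt;Tom &amp; Jerry&lt;/b&gt;`, while EncodeTwice returns `&amp;lt;b&amp;gt;Tom &amp;amp; Jerry&amp;lt;/b&amp;gt;`. Encoding in exactly one place, at output, avoids the second case.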
Spike implementation
It’s a great credit to the ASP.NET architecture that we can actually implement that change of behaviour ourselves, and with not much code either. The idea is to intercept the code generation phase that happens when an ASPX file is compiled.
You can specify your own compiler implementation by editing this section of the web.config:
<system.codedom>
  <compilers>
    <compiler language="c#;cs;csharp" type="Microsoft.CSharp.CSharpCodeProvider .. etc" extension=".cs" warningLevel="4" />
  </compilers>
</system.codedom>
… and, helpfully, you can subclass CSharpCodeProvider, override the GenerateCodeFromStatement() method, and redirect all the <%= … %> evaluations through a suitable helper function.
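As a rough sketch of that idea (not the actual SafeEncodingHelper source), the provider below looks for statements shaped like writer.Write(snippet) and wraps the snippet in a call to a helper. Two things here are assumptions: that the page compiler emits each <%= … %> block as a Write call with a single code-snippet argument, and the helper name EncodingHelper.Encode, which is hypothetical and only appears in the generated text. The class name SafeEncodingSketchProvider is also invented.

```csharp
using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

// Hypothetical provider for illustration; names are not from the demo project.
public class SafeEncodingSketchProvider : CSharpCodeProvider
{
    public override void GenerateCodeFromStatement(
        CodeStatement statement, TextWriter writer, CodeGeneratorOptions options)
    {
        // Assumed shape of a <%= expr %> block: someWriter.Write(<snippet>);
        var exprStatement = statement as CodeExpressionStatement;
        var call = exprStatement == null
            ? null
            : exprStatement.Expression as CodeMethodInvokeExpression;

        if (call != null
            && call.Method.MethodName == "Write"
            && call.Parameters.Count == 1)
        {
            var snippet = call.Parameters[0] as CodeSnippetExpression;
            if (snippet != null)
            {
                // Build a new statement rather than mutating the original,
                // so generating the same tree twice does not wrap it twice.
                var wrapped = new CodeSnippetExpression(
                    "EncodingHelper.Encode(" + snippet.Value + ")");
                var rewritten = new CodeExpressionStatement(
                    new CodeMethodInvokeExpression(
                        call.Method.TargetObject, "Write", wrapped));
                rewritten.LinePragma = statement.LinePragma; // keep error line mapping
                statement = rewritten;
            }
        }

        base.GenerateCodeFromStatement(statement, writer, options);
    }
}
```

Any other statement passes through to the base provider unchanged. A real implementation would probably need a stricter test than the method name alone, since unrelated Write calls with a snippet argument would be wrapped too.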
Demonstration
You can download a demonstration project to see this in action. To install the behaviour into your own project, follow these steps:
1. Download the SafeEncodingHelper assembly (or build it yourself - the demo project includes sources), and add a reference to it in your project.
2. In your web.config, edit the system.codedom.compilers element, to look like this:
<compiler language="c#;cs;csharp" type="SafeEncodingHelper.SafeEncodingCSharpCodeProvider, SafeEncodingHelper" extension=".cs" warningLevel="4">
  <providerOption value="v3.5" name="CompilerVersion" />
  <providerOption value="false" name="WarnAsError" />
</compiler>
3. Also in web.config, under pages/namespaces, add a reference to the SafeEncodingHelper namespace:
<namespaces>
  <add namespace="System.Web.Mvc" />
  <add namespace="System.Linq" />
  <add namespace="SafeEncodingHelper" />
</namespaces>
That’s all! You will now find that <%=…%> encodes its output, or you can get unencoded output by casting your value to the RawHtml type, i.e. <%= (RawHtml)myValue %>.
What about MVCToolkit?
You might be thinking that this is going to break the MVC toolkit, since you use it to build HTML controls with a syntax like this:
<%= Html.TextBox("myinput", "It's nice") %>
You might, reasonably, expect this now to render a bunch of useless HTML-encoded nonsense. There’s a neat solution, though - the MVC toolkit could return values of the RawHtml type (which is merely a wrapper around System.String which adds no functionality). This is specially recognised by the SafeEncodingHelper compiler, and bypasses the HTML encoding. So, you can keep your clean syntax for any methods that you specifically want to render unencoded HTML.
Also, if someone isn’t using SafeEncodingHelper, no problem! The RawHtml type has a .ToString() method that simply returns the underlying value, so the MVC toolkit methods would still work just as well.
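A wrapper type along these lines could be as small as the following sketch. The explicit conversion from string is what would make the (RawHtml)myValue cast compile, and ToString() hands back the wrapped value for anyone not using the custom compiler. This is an illustration rather than the code in the demo project: the EncodingHelper class and its Encode method are hypothetical names, and System.Net.WebUtility.HtmlEncode stands in for HttpUtility.HtmlEncode so the sketch needs no System.Web reference.

```csharp
using System;
using System.Net;

// Marker wrapper around a string: "this is already HTML, do not encode it".
public sealed class RawHtml
{
    private readonly string value;

    public RawHtml(string value)
    {
        this.value = value ?? string.Empty;
    }

    // Enables the cast syntax: (RawHtml)someString
    public static explicit operator RawHtml(string value)
    {
        return new RawHtml(value);
    }

    // Returns the underlying markup unchanged.
    public override string ToString()
    {
        return value;
    }
}

// Hypothetical helper that rewritten <%= ... %> output could be routed through.
public static class EncodingHelper
{
    public static string Encode(object value)
    {
        if (value == null) return string.Empty;
        if (value is RawHtml) return value.ToString();    // bypass encoding
        return WebUtility.HtmlEncode(value.ToString());   // encode everything else
    }
}
```

With this in place, EncodingHelper.Encode("<i>") yields `&lt;i&gt;`, while EncodingHelper.Encode((RawHtml)"<i>") yields `<i>` untouched. Taking object rather than string means the decision is made on the runtime type, so a helper method declared to return RawHtml is recognised without any cast in the page.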
The demonstration project contains an alternative MVC toolkit that behaves this way. Actually, it only has a single facility (TextBox), but it’s enough to give you the idea.
Should I really use this then?
Firstly, this code comes with no warranties at all. Use it if you want, but beware - I just cooked it up on impulse and there may be any number of special cases I haven’t accounted for. It’s a proof of concept, that’s all.
Unless Microsoft chooses to support the RawHtml type in their MVC toolkit and related methods, you would have to remember to cast all MVC toolkit output to RawHtml, or write your own wrapper methods or something. Not much fun, sorry.

December 19th, 2007 at 8:07 pm
>>Rob Conery of Microsoft is advising that we should HTML-encode all strings as soon as we receive them from the visitor, and store them HTML-encoded in the database<<
Steve this is not what I’m saying at all. As I’ve mentioned I need the most secure solution possible - and the discussion is ongoing.
December 19th, 2007 at 10:20 pm
Rob, I’m sorry if I misrepresented your position. It did seem to me that you were advocating that strategy for quite a while, as anyone can see if they read the thread. Thanks for reconsidering now. I know you’re on our side!
December 20th, 2007 at 1:24 am
[…] and Phil Haack who I believe in to push this from inside and Steve Sanderson who came up with an elegant prototype on how to tackle this at the […]
December 20th, 2007 at 9:55 am
Cool thing. cleans up a lot of HttpUtility.HtmlEncode calls
I’m just implementing a similar approach on AspView.
I’m also adding a helper method ‘RawHtml’ to allow
which is (imo) a bit more expressive than
In WebForms I’d add this method to my custom BasePage
December 20th, 2007 at 6:27 pm
I can see the confusion :). There’s a difference, I think, that I didn’t make clear. I wasn’t advocating that you must do this (put encoded data in your DB), I was advocating that if you want unencoded data, you ask for it explicitly. In this way you are 1) aware of the choice and 2) don’t find out the hard way.
People are saying that we’re “training people to do dumb things” - I don’t buy that argument. On my blog I suggested that if this supposition is true, then Microsoft is also training people to write bad HTML :).
Encoded data is only nasty (in terms of search) if people input HTML. 99% of the time (like with this comment, which gets stored encoded in your DB :):):) it doesn’t matter since people aren’t entering HTML.
All in all - the solution you offer here is a good one, and the ASP.NET team is reading…
May 25th, 2008 at 2:23 am
Microsoft will not support the Raw Html
May 25th, 2008 at 12:51 pm
Thanks Hip-Hop. Would you like to expand on this point?