This is usually a Java-only site, but I thought this paper was fairly interesting, so thought I’d do a quick post. The implementation is .NET, but the concepts are transferrable.
The good folks at Microsoft Research have come up with a clever new technique for XSS “prevention” called ScriptGard. The paper is located here. (Looks like there was a previous version of the paper in 2010, but this is the latest updated copy.) The paper is a good read, but I just wanted to summarize for folks with short attention spans like me :>.
Essentially, they’ve come up with a performant solution for runtime context-sensitive XSS protection in legacy applications, without requiring changes to the code. That’s quite an achievement, so how do they do it?
Here are the basic steps:
1. Instrument the code and insert hooks to perform taint propagation
2. Apply a simulation algorithm to execute the code and see where tainted data reaches a sink
3. Evaluate whether or not the trace (flow) from source to sink is XSS’able (via a browser implementation that actually analyzes the output context)
4. Mark and cache any “bad” flows
5. If any “bad” flows are executed, process the input and appropriately encode/sanitize the output.
There is a lot more that you can get from the paper, but that’s the gist as I read it.
There are several interesting points to consider from the paper:
- The proposed solution is meant to work on legacy code, in contrast to templating libraries which seem to be a good solution for new code. Templating libraries are good in that they help structure the code better and provide a better separation of code and data, but that doesn’t really help for legacy code, which is quite a challenge. [templating libraries such as OWASP jxt or Google AutoEscape are examples from Java]
- They initially tried doing all this work at runtime, but it was prohibitive from a performance perspective (one extreme example noted 175X – not %, 175 times – worse performance). This meant the process had to be broken up and some data carried over from testing and analysis to runtime.
- They found a significant number of examples in real tested code where a) the wrong encoding was used [bad, common], b) encoding was applied, but not necessary [ok, reasonably common], or c) the right encoders were applied, but in the wrong order [bad, becoming more common]. Lesson: applying encoding/sanitization is hard, even if you are paying attention and know what you’re doing. The rules can become somewhat esoteric, especially when browser bugs cause weird things to happen in corner cases and browser vendors don’t always agree on how things should be handled. Automation is your friend here.
- In order to figure out the context, they actually had to include a browser. I’ve been saying this was required for years now, but had no idea how to make it performant. Looks like they are smarter than I am (DOH!). This does have a certain issue, however, regarding browser implementation quirks. What if the internal browser implementation is wrong? Now the trained algorithm is wrong and the sanitization is wrong …
In conclusion, this paper is a good read if you have the time. XSS doesn’t appear to be going away anytime soon, but research like this gives me hope for novel ways of solving the problem.
Thanks to @securityninja for pointing this paper out.