User Input, Part 1 - Encoding

The Problem

Security 101: make sure you implement input validation to prevent cross-site scripting (XSS). From the OWASP Top 10, Attack Vector #2:

You need to ensure that all user supplied input sent back to the browser is verified to be safe (via input validation), and that user input is properly escaped before it is included in the output page. Proper output encoding ensures that such input is always treated as text in the browser, rather than active content that might get executed.

Example 1: outputting request parameters

<cfset form.name = "Jon" />
Thank You, <cfoutput>#form.name#</cfoutput>

In the above example we set the form field to "Jon". But what if we collect the value from a user (as forms normally do), and the user supplies the name as follows:

<cfset form.name = "<script>document.location=''</script>" />
Thank You, <cfoutput>#form.name#</cfoutput>

What happens when that output gets "displayed" on the screen? The browser executes an instruction! The script redirects the browser to Wikipedia. Imagine if that website were Evil. Really Evil. We're not talking "the diet coke of evil" or the "margarine" of evil, we're talking Real Evil. Bad news for your browser.

So we need a way to tell the browser not to treat these characters as instructions. But before we figure out how to do that, which characters are we talking about?


Character   Encoding
<           &lt;
>           &gt;
"           &quot;
'           &#39;
&           &amp;
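To make the table concrete, here is a minimal sketch of applying those five substitutions by hand in cfscript, assuming ColdFusion 9+ or Lucee for the script-style function syntax. encodeForHtmlSketch() is a hypothetical name used only for illustration; it is not a ColdFusion built-in.

```cfml
<cfscript>
// encodeForHtmlSketch() is a hypothetical helper, not a ColdFusion built-in.
// It applies the five substitutions from the character table.
function encodeForHtmlSketch(required string text) {
    var result = arguments.text;
    // Replace "&" first; otherwise the "&" inside "&lt;" and friends would be
    // encoded a second time (e.g. "&lt;" would become "&amp;lt;").
    result = replace(result, "&", "&amp;", "all");
    result = replace(result, "<", "&lt;", "all");
    result = replace(result, ">", "&gt;", "all");
    result = replace(result, '"', "&quot;", "all");
    // "##" is how you write a literal "#" inside a cfscript string.
    result = replace(result, "'", "&##39;", "all");
    return result;
}

// Usage: the script payload from Example 1 should now display as inert text.
writeOutput(encodeForHtmlSketch("<script>document.location=''</script>"));
</cfscript>
```

The ordering is the one design choice that matters here: encoding & last would double-encode every entity produced by the earlier steps.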

You already know that the angle bracket characters can embed a <script> tag into a page. These are big naughty characters. Big. Naughty.

But what about the " and ' characters? Well, check out this example:

<cfset form.name = 'Jon" onMouseOver="javascript:alert(document.location);"' />
<input name="name" type="text" value="<cfoutput>#form.name#</cfoutput>" />

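The double quote in the value closes the value attribute early, and everything after it becomes a brand-new onMouseOver attribute whose JavaScript runs as soon as the user hovers over the field. So it's pretty obvious where this is going: any character used for markup should be encoded, lest it be used for big naughty things. Or small naughty things, for that matter.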
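One way to see the breakout is to build the tag as a plain string and look at what the browser would actually receive. This is a sketch assuming ColdFusion 9+ or Lucee; userName is a hypothetical stand-in for the form field, and xmlFormat() is used here only so the resulting markup is displayed as text instead of being rendered.

```cfml
<cfscript>
// userName is a hypothetical variable standing in for the form field.
userName = 'Jon" onMouseOver="javascript:alert(document.location);"';

// Build the same markup as the <input> example, with no encoding at all.
html = '<input name="name" type="text" value="' & userName & '" />';

// The first " in the value closes the value attribute, so the browser sees
// value="Jon" followed by a separate onMouseOver attribute.
// xmlFormat() here only makes the markup show up as text on this page.
writeOutput(xmlFormat(html));
</cfscript>
```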

The solution (or at least a solution)

The simplest solution is to encode the data on the way out (when it is sent to the browser). So in the above examples, if we simply squeeze our output through the built-in ColdFusion function xmlFormat(), we're good to go:

<cfset form.name = "<script>document.location=''</script>" />
Thank You, <cfoutput>#xmlFormat(form.name)#</cfoutput>

<br />
<cfset form.name = 'Jon" onMouseOver="javascript:alert(document.location);"' />
<input name="name" type="text" value="<cfoutput>#xmlFormat(form.name)#</cfoutput>" size="100" />
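A quick way to convince yourself the encoding worked is to run both payloads through xmlFormat() and check that no raw < or " survives, since those are the characters that open a tag or close an attribute. This is a sketch assuming ColdFusion 9+ or Lucee; the variable names are hypothetical.

```cfml
<cfscript>
// Hypothetical variables holding the two payloads from the examples.
scriptPayload = "<script>document.location=''</script>";
attributePayload = 'Jon" onMouseOver="javascript:alert(document.location);"';

safeScript = xmlFormat(scriptPayload);
safeAttribute = xmlFormat(attributePayload);

// After encoding, safeScript has no raw "<" and safeAttribute has no raw ",
// so neither can start a tag or break out of the value attribute.
writeOutput("Thank You, " & safeScript & "<br />");
writeOutput('<input name="name" type="text" value="' & safeAttribute & '" />');
</cfscript>
```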


Next Steps:

In part deux, we'll look at a slight drawback to using the xmlFormat() function. We'll also examine an approach to encoding the input on the way in. Finally, we'll look at filtering input to wipe out any other characters that are used for naughty purposes, such as tabs, carriage returns and other non-printable characters. Until then, stay safe...