Tuesday, July 6, 2010

Final step for handling PUT data: URL decoding

As we saw earlier, we can get the PUT data from the body of the HTTP request and decode it into name-value pairs. However, our values are still URL encoded: spaces have been converted to "+" characters, and other reserved or non-ASCII characters are encoded as a "%" followed by two hex digits (for example, %26 for "&"). See:
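To make the encoding concrete, here is a minimal sketch that encodes a sample value the same way a form submission would. The class name, the helper name and the sample string are made up for illustration; only java.net.URLEncoder comes from the standard library.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class UrlEncodingDemo {
    // Hypothetical helper: form-encode a plain-text value using UTF-8.
    static String encode(String plain) {
        try {
            return URLEncoder.encode(plain, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 must be supported by every Java platform, so this is not expected
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String plain = "Open Office & more";   // made-up sample value
        System.out.println(encode(plain));     // prints Open+Office+%26+more
    }
}
```

The space becomes "+" and the "&" becomes %26, which is the form our stored values are in at this point.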

URL encoding at Wikipedia

We need to decode this into plain text. Fortunately, the standard java.net package has a URLDecoder class that will do the job for us:

URLDecoder documentation at Sun.com

This has two decode methods. The simpler one (taking only the string to decode as an argument) has been deprecated, so we'll not use it. The second takes the string to be decoded and a string naming the character encoding. This is usually (but not always) UTF-8. So our code to decode the PUT values is now:

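// decode() is a static method, so URLDecoder.decode(...) also works without an instance.
// It throws UnsupportedEncodingException if the named encoding is not supported.
URLDecoder dc = new URLDecoder();
System.out.println("String was " + dc.decode((String) hm.get("Software"), "UTF-8"));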

Remember from last time that the name-value pairs are stored in a HashMap (here hm). "Software" is the name of the field we are going to retrieve.
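As a self-contained sketch of the lookup and decode step, the following wraps it in a small helper. The helper name decodeField, the typed Map<String, String> and the sample value are assumptions for illustration; the real map from the earlier post may be declared differently.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;

class PutValueDecoder {
    // Hypothetical helper: look up a field and URL-decode it, or return null if absent.
    static String decodeField(Map<String, String> hm, String name) {
        String raw = hm.get(name);
        if (raw == null) {
            return null; // the field was not present in the request body
        }
        try {
            return URLDecoder.decode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 must be supported by every Java platform, so this is not expected
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> hm = new HashMap<String, String>();
        hm.put("Software", "Open+Office+%26+more"); // made-up sample value
        System.out.println("String was " + decodeField(hm, "Software"));
    }
}
```

Checking for a missing field first avoids passing null to decode, and wrapping the checked exception keeps the calling code short.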

One last thing to do before sending this off for storage in a database: to avoid cross-site scripting (XSS) attacks, we should escape any HTML in the value. This stops users from putting text such as <script>alert("test")</script> into the input and having it run when the value is later displayed in a page. We'll use the StringEscapeUtils class from Apache Commons Lang to deal with this:

String escape utils

Our code for dealing with the name value pairs now looks like:

// Decode first, then escape, so that markup hidden behind %3C and %3E is also caught.
String Software = org.apache.commons.lang.StringEscapeUtils.escapeHtml(
      dc.decode((String) hm.get("Software"), "UTF-8"));
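To see the two steps working together without adding a library to the classpath, here is a rough sketch that uses a hand-rolled escaper. The method escapeHtmlBasic is a simplified stand-in written for this example, not the Commons Lang implementation, and it only handles four characters; the class name and sample input are also made up.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class DecodeThenEscape {
    // Simplified stand-in for an HTML escaper (assumption: only &, <, > and " matter here).
    static String escapeHtmlBasic(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    // URL-decode the raw value, then escape any HTML in the result.
    static String decodeAndEscape(String raw) {
        try {
            return escapeHtmlBasic(URLDecoder.decode(raw, "UTF-8"));
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // URL-encoded form of <script>alert("test")</script>
        String raw = "%3Cscript%3Ealert(%22test%22)%3C%2Fscript%3E";
        // prints &lt;script&gt;alert(&quot;test&quot;)&lt;/script&gt;
        System.out.println(decodeAndEscape(raw));
    }
}
```

The order matters: escaping before decoding would leave the %3C and %3E sequences untouched, and the later decode would turn them back into live angle brackets.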
