What the &#x0

Friday, October 02, 2009 / Posted by Luke Puplett /

The character combination above is not an obfuscated expletive. It's the HTML notation for NULL, and I know this for many reasons. Firstly, because it says so here, but also because it ended up being appended to a JSON object I was passing around.

I have a WCF web method that takes a string, but its design is to take some entity as XML. Unfortunately, when the payload is big, XML wrapped in XML seems to cause an HTTP 400 Bad Request to be spat out by IIS or WCF.

Rather than troubleshoot that, and because I need to get on, my brainwave was to switch serializers. So out went my XmlSerializer and in came DataContractJsonSerializer.

Immediately the deserialization routine threw out a message saying something like invalid character \0 in my Json string.

Reaching for Fiddler, I inspected the raw data going over the wire. My Json was all there, but so too was a run of null characters at the end of the object.

I have a helper class which has serialization boilerplate inside and one of the methods takes an array of Char as the raw data to deserialize. Looking at the array of characters as it went in revealed a whole bunch of character 0 elements.

My initial thought was that the object being serialized has an array that has not been trimmed (TrimExcess) and serialization is making an exact representation, excess included.

Trimming didn’t fix it so I wondered if Microsoft’s DataContractJsonSerializer was pushing arrays around and MS had forgotten to trim the excess.

No. My helper class goes like this (this one is for Xml):

public static void Serialize(object serializable, out string xml)
{
    var memoryStream = new SysIO.MemoryStream();
    Serialize(serializable, memoryStream); // delegates to overload
     
    var encoding = new System.Text.UTF8Encoding();
    xml = encoding.GetString(memoryStream.GetBuffer()); // bug: decodes the whole buffer, unused capacity included
}

While looking at the MemoryStream instance I noticed its buffer size was the same as the character array length at my service end.

The problem/solution is this: the GetBuffer() method returns the whole underlying buffer (okay, it's obvious now), while the ToArray() method, which is what I should be using, returns just the data that was written. I assume the buffer has automatically sized itself and has spare capacity.
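Here is a minimal sketch of the difference, using only the framework's MemoryStream and Encoding.UTF8. The class and method names (BufferDemo, ViaGetBuffer, ViaToArray) are made up for illustration and are not part of my helper class:

```csharp
using System.IO;
using System.Text;

// Hypothetical demo class, not the helper from the post.
public static class BufferDemo
{
    // Reproduces the bug: decodes the stream's entire internal buffer,
    // so any unused capacity comes back as '\0' characters.
    public static string ViaGetBuffer(string text)
    {
        using (var stream = new MemoryStream())
        {
            byte[] bytes = Encoding.UTF8.GetBytes(text);
            stream.Write(bytes, 0, bytes.Length);
            return Encoding.UTF8.GetString(stream.GetBuffer());
        }
    }

    // The fix: ToArray() copies only the bytes actually written.
    public static string ViaToArray(string text)
    {
        using (var stream = new MemoryStream())
        {
            byte[] bytes = Encoding.UTF8.GetBytes(text);
            stream.Write(bytes, 0, bytes.Length);
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }
}
```

For a short input, ViaGetBuffer should hand back the text followed by a tail of null characters, while ViaToArray hands back the text alone. If the extra copy made by ToArray() is a worry, decoding with GetString(buffer, 0, (int)stream.Length) over the GetBuffer() result reads the same bytes without copying.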

What is unusual is that the problem is only evident when deserializing Json object strings. The empty data is encoded in the raw string, so it must have been there when the text was XML, and so it must also have been there when I successfully deserialized it at the service. Maybe the XmlSerializer simply ignores character 0 and gets on with the job.

If you don’t have control over the creator of the data, then writing a function to clean the trailing 0s from the end of a string.ToCharArray() would likely fix the issue.
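A rough sketch of such a clean-up function, assuming the payload can never legitimately end in a null character. NullTrimmer and TrimTrailingNulls are hypothetical names of my own, not framework APIs:

```csharp
using System;

// Hypothetical helper for stripping buffer padding from received data.
public static class NullTrimmer
{
    // Removes any run of '\0' characters from the end of a string.
    public static string TrimTrailingNulls(string raw)
    {
        return raw.TrimEnd('\0');
    }

    // The same idea for an array of Char: find the last non-null
    // element and copy everything up to and including it.
    public static char[] TrimTrailingNulls(char[] raw)
    {
        int length = raw.Length;
        while (length > 0 && raw[length - 1] == '\0')
        {
            length--;
        }

        var trimmed = new char[length];
        Array.Copy(raw, trimmed, length);
        return trimmed;
    }
}
```

Only the trailing run is removed, so a null character in the middle of the data would still reach the deserializer and still be reported.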


1 comment:

Comment by Luke Puplett on Friday, November 26, 2010

Using GetBuffer when serializing or deserializing objects can also cause "hexadecimal value 0x00, is an invalid character." because the deserializer converts all the null buffer elements to bazillions of \0 sequences.
