Why do I get the error 'Invalid character in the given encoding'?

Document Encoding does not match Encoding attribute
When loading a 3rd party supplied XML document into the generated classes, you may see the error "Invalid character in the given encoding. Line xx, position yy".

The issue appears when the XML document has not been saved in the same encoding as is specified in the document's encoding declaration (typically in the first line of the document).

In other words, if a document is saved in the Windows-1252 encoding, simply stating encoding="utf-8" in the first line of the document does not make the encoding UTF-8.

Whilst this will not show as an error for the common ASCII characters (which are stored as the same bytes in both encodings), it will fail when the document contains a character outside that range, e.g. the character ö. Such a character is valid in Windows-1252, but the single byte used to store it there is not a valid byte sequence when the parser reads the document as UTF-8.

In Windows-1252, the character ö is stored as a single byte (0xF6). In UTF-8, characters outside the ASCII range are stored as 2 to 4 bytes each; ö is stored as the two bytes 0xC3 0xB6.
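The difference in byte width can be checked with a few lines of Python. This is only an illustrative sketch and is not part of the generated classes; the helper name decodes_as_utf8 is hypothetical, and "cp1252" is Python's codec name for Windows-1252.

```python
# The same character produces different bytes in each encoding.
text = "ö"
cp1252_bytes = text.encode("cp1252")
utf8_bytes = text.encode("utf-8")
print(cp1252_bytes)  # b'\xf6'      (1 byte)
print(utf8_bytes)    # b'\xc3\xb6'  (2 bytes)


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form valid UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

A document saved as Windows-1252 but read as UTF-8 fails this check as soon as it contains a byte such as 0xF6, which is the same situation that produces the parser error.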

So either the 3rd party XML document needs to be saved correctly in UTF-8 encoding, or the first line header needs to declare the correct encoding, e.g.
<?xml version="1.0" encoding="windows-1252"?>
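If the declaration already states UTF-8 and you would rather correct the bytes than the header, the document can be re-encoded before it is loaded. The Python sketch below assumes the file was really saved as Windows-1252; the function name transcode_to_utf8 and its source_encoding parameter are hypothetical.

```python
def transcode_to_utf8(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode the bytes using the encoding the file was actually saved in,
    then re-encode them as UTF-8 so they match an encoding="utf-8" declaration.
    """
    return data.decode(source_encoding).encode("utf-8")


# Example: a document that declares UTF-8 but contains the Windows-1252 byte for ö.
raw = b'<?xml version="1.0" encoding="utf-8"?><name>J\xf6rg</name>'
fixed = transcode_to_utf8(raw)
```

This only helps when the real encoding is known. Guessing the wrong source encoding will silently produce the wrong characters rather than raise an error.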

Missing BOM Marker
When loading an XML document that contains Unicode characters and does not have a BOM (Byte Order Mark) at the start of the file, the error 'Invalid character in the given encoding' may be raised.

A workaround would be either to get the system that produces the XML documents to write the Unicode BOM at the start of the file, or to preprocess the file in your application.
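As a minimal sketch of the preprocessing option, the Python function below adds a UTF-8 BOM to the document bytes if one is not already present. It assumes the document really is UTF-8 encoded (a UTF-16 document would need a different BOM); the function name ensure_utf8_bom is hypothetical.

```python
import codecs


def ensure_utf8_bom(data: bytes) -> bytes:
    """Return the data with a UTF-8 BOM (EF BB BF) at the start.

    Assumes the bytes are UTF-8 encoded. Data that already starts
    with the BOM is returned unchanged.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data
```

The bytes returned can then be written to a temporary file or passed to the loader in place of the original document.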

Liquid XML Studio has an option to write the BOM (the default) or to omit it for compatibility with older applications. Please see:
http://www.liquid-technologies.com/SmarterTrack/KB/a4/byte-order-marker-bom.aspx

Article ID: 84, Created: 12/21/2011 at 12:00 PM, Modified: 3/9/2012 at 8:07 AM
