Byte Order Marker (BOM) and Document Encoding

Note: For further information about the Byte Order Marker (BOM), see Byte Order Marker.

When Liquid XML Studio loads a document, it uses the BOM (if present) to determine the encoding and decodes the document into an internal Unicode format. The encoding attribute in the XML declaration (if present) is used to refine the decoding of the data into its Unicode form.

If the BOM and the encoding attribute conflict, a best guess is made (typically relying on the BOM).
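
The loading rules above can be illustrated with a minimal Python sketch. This is not Liquid XML Studio's actual implementation; the names detect_encoding and decode_xml are hypothetical, and the sketch assumes that the BOM always wins when it conflicts with the encoding attribute and that UTF-8 is the fallback when neither is present.

```python
import codecs
import re

# BOM signatures, UTF-32 first so that UTF-32 LE (FF FE 00 00)
# is not mistaken for UTF-16 LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Matches the encoding attribute of an XML declaration at the start of
# the data. Only usable when the bytes are ASCII-compatible.
_DECL = re.compile(
    rb"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def detect_encoding(data: bytes, default: str = "utf-8"):
    """Return (encoding_name, bom_length) for raw XML bytes."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            # Assumption: on conflict, the BOM takes precedence.
            return name, len(bom)
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower(), 0
    return default, 0


def decode_xml(data: bytes):
    """Decode raw XML bytes to str and report the encoding that was used."""
    encoding, bom_length = detect_encoding(data)
    return data[bom_length:].decode(encoding), encoding
```

For example, a document that starts with a UTF-8 BOM but declares encoding="ISO-8859-1" is decoded as UTF-8 in this sketch, because the BOM is checked first.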

The XML document is stored and manipulated internally as Unicode, and the encoding given by the encoding attribute (or the BOM) is kept as a property of the document.

Whenever you paste data into the document, it is dealt with as Unicode.

When you save the document, it is written out using the encoding property associated with the document (the user can change this in the properties window). So you should have no issues with encoding.

Issues when using a BOM with UTF-8 Encoded Documents

Unicode files normally carry a BOM to identify their encoding and byte order. The only time the BOM becomes an issue is with UTF-8. For UTF-8, no standard says whether the BOM should be written or not. Most Microsoft applications and newer applications write the BOM; some older applications do not write it and, worse still, don’t understand it when they read a document that contains one. They may just see the bytes EF BB BF (shown as the characters ï»¿ when read as Windows-1252) at the start of the document and assume the document is invalid.
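
As an illustration of what a BOM-unaware application sees, the following Python sketch decodes the same bytes in three ways. The sample document is made up for the example; the codec names (cp1252, utf-8, utf-8-sig) are standard Python codecs.

```python
import codecs

# A small UTF-8 document that starts with a BOM (sample data).
doc = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="UTF-8"?><root/>'

# The UTF-8 BOM is the three bytes EF BB BF.
bom_hex = codecs.BOM_UTF8.hex(" ").upper()

# A reader that assumes a single-byte code page such as Windows-1252
# shows three stray characters before the XML declaration.
as_cp1252 = doc.decode("cp1252")

# A plain UTF-8 decode keeps the BOM as an invisible U+FEFF character.
as_utf8 = doc.decode("utf-8")

# The "utf-8-sig" codec removes a leading BOM if one is present.
as_utf8_sig = doc.decode("utf-8-sig")

print(bom_hex)
print(as_cp1252[:8])
print(repr(as_utf8[0]))
print(as_utf8_sig[:5])
```

A parser that requires the document to start with "<?xml" would reject the first two decoded forms, which is the failure described above.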

To accommodate this, Liquid XML Studio writes the BOM by default, but there is a global option to turn it off for UTF-8.

To stop the BOM from being written, change the following setting in the menu:

Tools->Options->Environment->Write Byte Order Marker (BOM)
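
The effect of this option can be sketched in Python as follows. The function save_utf8 and its write_bom parameter are hypothetical names chosen for the example and only mimic the behaviour of the setting; they are not part of Liquid XML Studio.

```python
import codecs
from pathlib import Path


def save_utf8(path, text: str, write_bom: bool = True) -> None:
    """Write text as UTF-8, optionally preceded by the BOM (EF BB BF)."""
    payload = text.encode("utf-8")
    if write_bom:
        # Mirrors the default behaviour described above.
        payload = codecs.BOM_UTF8 + payload
    Path(path).write_bytes(payload)
```

Calling save_utf8("out.xml", xml_text, write_bom=False) would produce a file that older, BOM-unaware applications can read, at the cost of the file no longer identifying itself as UTF-8 through a BOM.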