Extensible Markup Language (XML) has become a readily used and widely accepted general-purpose markup language. XML is an open standard that has been adopted by many business and non-business entities for use in a variety of applications. One notable use of XML is for the sharing of structured data across different information systems, such as via the Internet and the World Wide Web (WWW). Greater detail concerning XML may be found at www.w3.org/XML; in the XML 1.0 standard specification, Extensible Markup Language (XML) 1.0 (Fourth Edition), W3C Recommendation 16 Aug. 2006, edited by Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, and François Yergeau (available at www.w3.org/TR/xml/); and in the XML 1.1 standard specification, Extensible Markup Language (XML) 1.1 (Second Edition), W3C Recommendation 16 Aug. 2006, edited by Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, François Yergeau, and John Cowan (available at www.w3.org/TR/xml11/), the content of each of which is hereby incorporated by reference in its entirety, including for purposes of more fully describing the standard form and use of XML and requirements for well-formed XML and valid XML.
Like computers and other computing devices and systems, printers and printing systems may also use XML data rather than a proprietary data format and/or a proprietary or fixed single data encoding scheme. Printing systems, such as those for printing barcodes and for transmitting data to a barcode printer, are widely used. While many such printing systems and printers use proprietary data formats and/or proprietary methods of data encoding, and may not be interchangeable and/or compatible with other printing systems and barcode printers, some printers and printing systems have been developed that use the XML data format and widely accepted data encoding formats, such as XML data encoded according to UTF-8.
The XML standard specification allows for XML data to be stored using multiple character encoding schemes, including but not limited to ISO-8859-1, Extended Unix Code for Korean text (EUC-KR), UTF-8, and UTF-16. The XML standard specification requires that processors of XML support the Unicode character encodings UTF-8 and UTF-16. Use of other encodings, including more limited encodings such as those based on ISO/IEC 8859 as well as UTF-32, is acknowledged, and such encodings are widely used and supported, but support for them is not a mandatory requirement of the XML specifications. In XML, a declaration may include attributes (also referred to as pseudo-attributes), such as in an XML declaration that states what version of XML is being used. An XML declaration may also contain information about character encoding (also referred to as an encoding declaration). For example, an XML document may begin with the XML declaration <?xml version="1.0" encoding="UTF-8"?>, indicating that XML version 1.0 is being used and that the encoding is UTF-8. Thus, the primary method used by computing devices and systems, including printers, to accurately detect the encoding used in XML data and thereby decode the XML data is to examine the encoding attribute in the XML declaration at the start of the XML data stream, such as <?xml version="1.0" encoding="UTF-8"?>.
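By way of illustration only, examining the encoding attribute can be sketched as follows. This is a minimal sketch under stated assumptions, not a conforming XML parser: the function name declared_encoding and the regular expression are hypothetical, and the sketch assumes that the leading bytes of the data are ASCII-compatible (for example, UTF-8 or ISO-8859-1) so that the declaration can be read byte by byte.

```python
import re

# Hypothetical pattern for illustration: matches an XML declaration at the very
# start of the data and captures the optional encoding pseudo-attribute.
_DECL_RE = re.compile(
    rb'^<\?xml\s+version\s*=\s*(["\'])[^"\']*\1'
    rb'(?:\s+encoding\s*=\s*(["\'])([A-Za-z][A-Za-z0-9._\-]*)\2)?'
)


def declared_encoding(data: bytes):
    """Return the encoding name stated in the XML declaration, or None.

    Assumption: the first bytes are ASCII-compatible. Data serialized as
    UTF-16 will not match, because every character occupies two bytes.
    """
    match = _DECL_RE.match(data)
    if match is None or match.group(3) is None:
        return None
    return match.group(3).decode("ascii")


if __name__ == "__main__":
    print(declared_encoding(b'<?xml version="1.0" encoding="UTF-8"?><label/>'))
```

The sketch returns None both when no declaration is present and when the declaration carries no encoding attribute; a caller would then fall back to whatever default it considers appropriate.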
However, this is a problem for UTF-16, which is not ASCII-transparent and is a byte-serialized encoding scheme that may be either big-endian (BE) or little-endian (LE), the endianness defining the order of the bytes in the encoding scheme. A computing device or system is not able to decode an encoding attribute that is itself encoded in UTF-16 unless the computing device or system first knows that the incoming XML data is encoded in UTF-16. This presents a logical Catch-22. To resolve this potential problem, the XML standard specification requires that any XML data encoded in UTF-16 be prefaced with a valid Unicode UTF-16 byte-order mark (BOM), as described in ISO/IEC 10646 and the Unicode Standard, namely the Zero Width No-Break Space character, #xFEFF (byte sequence FE FF in UTF-16BE and byte sequence FF FE in UTF-16LE). Greater detail concerning character encoding in XML may be found, for example, in Section 4.3.3, entitled Character Encoding in Entities, in Extensible Markup Language (XML) 1.1 (Second Edition) and in Sections 2.5 and 2.6, entitled Encoding Forms and Encoding Schemes, respectively, in The Unicode Standard, Version 5.0.
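By way of illustration only, the BOM check described above can be sketched as follows. The function name utf16_byte_order_from_bom is hypothetical, and the sketch makes a simplifying assumption: it considers only the two UTF-16 byte sequences named above and does not attempt to distinguish other encodings whose signatures begin with the same bytes.

```python
import codecs


def utf16_byte_order_from_bom(data: bytes):
    """Return 'utf-16-be' or 'utf-16-le' based on a leading BOM, else None.

    Simplifying assumption: only the two UTF-16 BOM byte sequences are
    checked; no other encoding signatures are considered.
    """
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le"
    return None


if __name__ == "__main__":
    text = '<?xml version="1.0" encoding="UTF-16"?><label/>'
    data = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
    order = utf16_byte_order_from_bom(data)
    # Skip the two BOM bytes, then decode with the detected byte order.
    print(order, data[2:].decode(order) == text)
```

Once the byte order is known, the remainder of the data can be decoded and the encoding attribute read in the usual way. When the function returns None, the BOM gives no information and the byte order remains undetermined.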
However, the Unicode standard itself states that the use of a BOM is optional. This presents another problem for the use of UTF-16 with XML. For example, many utilities that are used to create XML data are Unicode compliant but are not necessarily written specifically for XML, and may therefore produce data that is valid Unicode but not well-formed XML. As a result, UTF-16 XML data may not be preceded by the required BOM. The XML standard specification provides that it is a fatal error for an entity which begins with neither a Byte Order Mark nor an encoding declaration to use an encoding other than UTF-8.
This problem is exacerbated when a computing device or system must interpret an incoming data stream of XML data, such as from multiple hosts/sources, each of which may be using its own encoding scheme, rather than individual XML data files, a single host/source, or multiple hosts/sources using a single, known encoding scheme. For example, a computing device or system may not be able to detect the start of a new XML declaration and/or may not be able to determine the encoding scheme used for the subsequent XML data, particularly where the XML data is encoded in UTF-16 but no BOM is provided.
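By way of illustration only, the following sketch shows why the missing BOM matters. It follows the general idea of the non-normative autodetection guidance in the XML specification, which compares the first bytes of an entity against the characters "<?" in each serialization; it is not presented as the approach of any particular device. The function name guess_serialization and the returned labels are hypothetical, and the sketch assumes that the buffer begins exactly at the first byte of an XML declaration.

```python
def guess_serialization(head: bytes):
    """Guess how a BOM-less XML declaration is serialized from its first bytes.

    Assumption: head starts exactly at the first byte of the declaration.
    Returns 'utf-16-be', 'utf-16-le', 'ascii-compatible', or None.
    """
    if head.startswith(b"\x00<\x00?"):  # 00 3C 00 3F: '<?' as big-endian 16-bit units
        return "utf-16-be"
    if head.startswith(b"<\x00?\x00"):  # 3C 00 3F 00: '<?' as little-endian 16-bit units
        return "utf-16-le"
    if head.startswith(b"<?xm"):        # 3C 3F 78 6D: one byte per character
        return "ascii-compatible"
    return None


if __name__ == "__main__":
    text = '<?xml version="1.0" encoding="UTF-16"?><label/>'
    for codec in ("utf-16-be", "utf-16-le", "utf-8"):
        print(codec, "->", guess_serialization(text.encode(codec)))
```

The assumption that the buffer starts at the declaration is exactly what a continuous stream from multiple sources does not guarantee. When the position and byte alignment of the next declaration are unknown, the same four-byte patterns can match at an offset shifted by one byte, so a test of this kind is not sufficient on its own in that setting.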
A need exists for printers and other electronic devices, systems, methods, and computer program products that can receive an incoming data stream and unambiguously and automatically detect and determine a UTF-16 encoding scheme, and the endianness thereof, for an XML declaration in the incoming XML data stream without a BOM.