A programming language is used to write source code that contains instructions that can be compiled and then executed by a computer. The source code usually contains identifiers, which refer to user-defined entities such as variables, methods, functions, classes, etc. Every programming language has a syntax that defines rules for forming identifiers. The syntax specifies a set of characters that can be used to form the identifiers. This set of characters may be referred to as “legal identifier characters.” Programming languages typically have reserved words, such as “function” or “class.” The syntax typically also specifies that reserved words cannot be used in identifiers.
A markup language is a set of words and symbols for describing the information that is contained in a document. An example of a markup language is the Extensible Markup Language (XML). Markup languages also have syntax for forming identifiers. For example, each markup language has a set of characters that can be used to form identifiers. As a particular example, the Extensible Markup Language (XML) defines a specific set of characters that can be used to form identifiers that can be used in documents as names of user-defined elements and attributes. Thus, XML has a set of legal identifier characters.
The syntax for forming identifiers in XML is different from that of at least some other languages. For example, the legal identifier characters in the XML syntax are different from those of at least some other languages, such as the JAVA™ programming language. As a particular example, the XML syntax allows the punctuation character “-” to be used in an identifier. However, the JAVA programming language does not allow a “-” character to be used in an identifier. Rather, that character is reserved for use as a minus sign in an expression.
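The difference in legal identifier characters can be illustrated with a short sketch in the JAVA programming language. The class name IdentifierCheck and the method usesOnlyJavaIdentifierChars are hypothetical names chosen for illustration. The sketch relies on the standard methods Character.isJavaIdentifierStart and Character.isJavaIdentifierPart, and it does not check for reserved words.

```java
public class IdentifierCheck {

    // Returns true if every character of the name is a legal identifier
    // character in the Java programming language. Assumption: this sketch
    // checks characters only and does not reject reserved words.
    static boolean usesOnlyJavaIdentifierChars(String name) {
        if (name.isEmpty() || !Character.isJavaIdentifierStart(name.charAt(0))) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            if (!Character.isJavaIdentifierPart(name.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "fooBar" uses only legal Java identifier characters.
        System.out.println(usesOnlyJavaIdentifierChars("fooBar"));  // prints true
        // "foo-bar" is a legal XML name, but "-" is not a legal Java identifier character.
        System.out.println(usesOnlyJavaIdentifierChars("foo-bar")); // prints false
    }
}
```

Under these assumptions, a name such as “foo-bar” that is acceptable as an XML element name is rejected as an identifier in the JAVA programming language.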
The different syntaxes that languages have for forming identifiers can sometimes present problems. For example, it may be desirable to allow an existing programming language to work with (e.g., create and modify) Extensible Markup Language (XML) documents. This may require a program that is not written in XML (“non-XML program”) to reference an XML identifier. If the syntax of XML and the syntax of the non-XML language are incompatible, then it may not be possible for the program written in the non-XML language to reference the XML identifier.
To allow a non-XML language to work with XML documents, a set of library APIs can be defined for creating and manipulating data structures that represent XML documents. The non-XML program uses these APIs in order to create and manipulate those data structures. In this API technique, XML identifiers are represented as strings in the non-XML program. A drawback with this approach is that the XML identifiers are merely strings to the non-XML language, whereas a more intimate connection between the two languages is desired.
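As one possible illustration of the API technique, the following sketch uses the standard DOM API of the JAVA platform (javax.xml.parsers and org.w3c.dom). The text above does not name a particular library, so the choice of DOM is an assumption. The class name DomStringIdentifiers, the method buildDocument, and the attribute name “item-count” are hypothetical and chosen for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomStringIdentifiers {

    // Builds a small in-memory XML document. The XML identifiers
    // "foo-bar" and "item-count" appear only as string constants,
    // never as identifiers of the Java program itself.
    static Document buildDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element root = doc.createElement("foo-bar");
            root.setAttribute("item-count", "2");
            doc.appendChild(root);
            return doc;
        } catch (ParserConfigurationException e) {
            // Wrapped so that callers in this sketch need no checked exception handling.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element root = buildDocument().getDocumentElement();
        System.out.println(root.getTagName());               // prints foo-bar
        System.out.println(root.getAttribute("item-count")); // prints 2
    }
}
```

Because the names are ordinary strings in this sketch, the compiler of the non-XML language does not treat them as identifiers.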
Another technique for allowing an existing programming language to work with XML documents is the E4X standard. The E4X standard (ECMA-357) defines an embedding of XML into the ECMAScript programming language. In this embedding, if the XML identifier happens to be a legal ECMAScript identifier, the XML identifier can be written as-is, without any special notation. However, if the XML identifier is not a legal ECMAScript identifier, then it must be written as an ordinary string constant and surrounded by square brackets. For example, the XML identifier “foo-bar” would be written as follows, where “ns” is a variable representing the identifier's namespace.
ns::[“foo-bar”]
The E4X approach has several drawbacks. One drawback is the complexity of the syntax required to represent XML identifiers that are not legal ECMAScript identifiers. Another drawback with the E4X approach is that it may not be clear to the programmer whether the identifier is intended as an XML identifier or a regular program identifier. The E4X approach does define rules for the compiler to interpret the identifier as either an XML identifier or a regular program identifier. However, because no special notation is used for the XML identifier when it is also a legal ECMAScript identifier, it is not readily apparent to the programmer by examining the identifier itself whether it is intended to be an XML identifier or an ECMAScript identifier.
Because of these and potentially other drawbacks, these approaches do not provide wholly satisfactory results. Consequently, an improved mechanism for allowing programs that use different legal identifier characters to work together is desired. Furthermore, an improved mechanism for allowing programs written in languages other than XML to work with XML documents is desired.