The development of computerized distributed information systems, such as the Internet, allows users to link with servers and networks, and thus to retrieve vast amounts of electronic information that was previously unavailable using conventional electronic media.
Users may be linked to the Internet through a hypertext-based service commonly referred to as the World Wide Web (WWW). (The term WWW may also be used in a broader sense to refer to the whole constellation of resources that can be accessed using one or more of the protocols that make up the TCP/IP suite, described further below.) With the World Wide Web, an entity may register a “domain name” correlated with an electronic address (referred to as an IP address) representing a logical node on the Internet, and may create a “web page” or “page” that can provide information and some degree of interactivity.
The Internet is based upon a suite of communication protocols known as Transmission Control Protocol/Internet Protocol (TCP/IP), which sends packets of data between a host machine, such as a server computer on the Internet commonly referred to as a web server, and a client machine, such as a user's computer connected to the Internet. WWW communications typically use the Hypertext Transfer Protocol (HTTP), which is supported by the TCP/IP transmission protocols; however, file transfer and other services via the WWW may use other communication protocols, for example the File Transfer Protocol (FTP).
A computer user may “browse”, i.e., navigate around, the WWW by utilizing a suitable web browser, e.g., Netscape™ or Internet Explorer™, and a network gateway, e.g., an Internet Service Provider (ISP). A web browser allows the user to specify or search for a web page on the WWW and subsequently to retrieve and display web pages on the user's computer screen. Such web browsers are typically installed on personal computers or workstations to provide web client services, but increasingly may be found on other devices, for example wired devices such as personal digital assistants (PDAs) or wireless devices such as cell phones.
As noted above, transactions between a Web client and a Web server may be dynamic and may be interactive. A user of a Web client may, for example, request information from the Web server, such as a stock quotation (which is typically dynamic, that is, it changes over time) or product information (which may be static information maintained in a database by the provider of the Web server). The request message may be communicated to the server in accordance with HTTP, and may additionally be encapsulated in accordance with an information exchange protocol. One such open-architecture protocol is the Simple Object Access Protocol (SOAP), which is a protocol for the exchange of information in a distributed environment. (A specification for SOAP 1.1 may be found in World Wide Web Consortium (W3C) Note 8 May 2000, copyright 2000, which is hereby incorporated herein by reference.) SOAP is an eXtensible Markup Language (XML) based protocol, whereby the SOAP message may be encoded using XML. (A markup language is a mechanism to identify structures in a document, and an extensible markup language constitutes a meta-language for defining particular markup languages. XML is a particular extensible markup language, having, as recognized by those in the art, an open specification. Another example is the Standard Generalized Markup Language (SGML). Another, non-extensible, markup language is the Hypertext Markup Language (HTML).) Note that a request message may include a remote procedure call (RPC) whereby a server-side application procedure may be invoked to service the request. That is, the message may be an interapplication communication. SOAP messages may be carried in HTTP, that is, may be embedded in an HTTP request. Hence, SOAP provides a mechanism for carrying RPCs via HTTP. The response to the request may be returned to the client via an HTTP response carrying a SOAP message that encapsulates the response, encoded as an XML text stream.
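By way of illustration only, the following minimal Python sketch shows how an RPC might be encoded as a SOAP 1.1 message in XML and embedded in an HTTP request. The envelope namespace is the one defined for SOAP 1.1; the application namespace `urn:example:quotes`, the procedure name `GetQuote`, the `symbol` parameter, the host, and the path are hypothetical and are not taken from any particular service. The request is only assembled, not sent.

```python
import xml.etree.ElementTree as ET

SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:quotes"  # hypothetical application namespace (assumption)


def build_soap_request(symbol: str) -> bytes:
    """Return a SOAP 1.1 envelope whose body carries one RPC-style call."""
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    # "GetQuote" and "symbol" are hypothetical names for the remote procedure
    # and its single argument.
    call = ET.SubElement(body, f"{{{APP_NS}}}GetQuote")
    ET.SubElement(call, f"{{{APP_NS}}}symbol").text = symbol
    return ET.tostring(envelope, encoding="utf-8")


def wrap_in_http_post(payload: bytes, host: str, path: str) -> bytes:
    """Embed the SOAP payload in the text of an HTTP POST request."""
    headers = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        'Content-Type: text/xml; charset="utf-8"',
        f"Content-Length: {len(payload)}",
        f'SOAPAction: "{APP_NS}#GetQuote"',  # hypothetical action URI
    ]
    # A blank line separates the HTTP headers from the message body.
    return "\r\n".join(headers).encode("ascii") + b"\r\n\r\n" + payload


if __name__ == "__main__":
    request = wrap_in_http_post(
        build_soap_request("XYZ"), "quotes.example.com", "/quote"
    )
    print(request.decode("utf-8"))
```

A server receiving such a request would parse the XML character stream in the HTTP body, locate the call element inside the SOAP body, and invoke the corresponding server-side procedure.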
Thus, transactions between a client and a server may include a sequence of messages, each of which may constitute a stream of characters in which the characters are defined in accordance with a markup language specification. Each character stream may be parsed into the elements constituting the message in accordance with the message encapsulation protocol, such as SOAP. The parser determines whether the characters in the stream are valid characters as defined in the markup language specification. Each character may be represented in accordance with the markup language specification by an n-bit value; however, not all n-bit values need necessarily represent a character within the specification of a particular markup language. For example, in XML, characters are represented by sixteen-bit values; however, not all such values correspond to valid characters in the XML specification. Typically, parsers validate characters by applying a set of “IF-THEN” rules. However, applying such a rule set, which may be complex, to validate each character may consume significant data processing resources. Consequently, there is a need in the art for systems and methods for parsing character streams that reduce the consumption of processor resources, particularly processing cycles.
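The rule-based validation described above may be sketched as follows. This is a minimal illustration, not the implementation of any particular parser: the ranges are taken from the `Char` production of the XML 1.0 Recommendation, restricted (as an assumption of this sketch) to sixteen-bit values to match the description above, and the function names `is_valid_xml_char` and `first_invalid_index` are hypothetical.

```python
def is_valid_xml_char(code: int) -> bool:
    """Apply a chain of IF-THEN rules to one sixteen-bit character value."""
    # Rule 1: tab, line feed, and carriage return are the only valid controls.
    if code in (0x09, 0x0A, 0x0D):
        return True
    # Rule 2: values from space up to the start of the surrogate block.
    if 0x20 <= code <= 0xD7FF:
        return True
    # Rule 3: values after the surrogate block, excluding 0xFFFE and 0xFFFF.
    if 0xE000 <= code <= 0xFFFD:
        return True
    # Anything else (including values above sixteen bits, which this
    # simplified sketch does not handle) is rejected.
    return False


def first_invalid_index(stream: str) -> int:
    """Return the index of the first invalid character, or -1 if none."""
    for index, ch in enumerate(stream):
        # Every character in the stream passes through the full rule chain.
        if not is_valid_xml_char(ord(ch)):
            return index
    return -1


if __name__ == "__main__":
    print(first_invalid_index("<a>ok</a>\n"))
    print(first_invalid_index("ab\x00c"))
```

Because every character in the stream is tested against the rule chain in turn, the cost of validation grows with both the length of the stream and the number of comparisons in the rule set.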