1. Field of the Invention
The invention generally relates to command string interpreter usage implemented in firmware and/or software. The invention more particularly relates to embeddable hypertext transfer protocol (HTTP) and hypertext markup language (HTML) interpreters used in networked environments. Relevant networked environments include, inter alia, the Internet.
2. Description of the Related Art
A plethora of information is now readily available to computer users. Online services and networks, especially the Internet, and in particular the World Wide Web (WWW) implemented thereon, make a large amount of information accessible to almost any personal computer connected to the Internet.
The widely used layout language for a WWW document is Hypertext Markup Language (HTML). HTML exists in several versions, or revision levels, and extensions providing session level features such as encryption are available (SHTML, PHTML, etc.).
WWW documents are typically given a "Uniform Resource Locator" (URL). A URL is essentially a protocol selector together with an address path identifying the server computer that hosts the desired document, together with the location of the document on the server's file-system (also known as "filestore"). Using browser software such as NETSCAPE® NAVIGATOR®, a person can send a request from a client computer to access a document stored at a server referenced by a URL. When the server receives the client's request, the server sends a representation of the data in the requested HTML WWW document to the client computer, where a representation of the document can be displayed upon a Cathode Ray Tube (CRT).
A session protocol known as "Hypertext Transfer Protocol" (HTTP) is typically used in making a request for a WWW document and in transferring representations of WWW documents. Servers that maintain HTML Web documents are commonly known as "Web Sites". For more background information about WWW, see for example T. Berners-Lee, et al. "The World Wide Web," Communications of the ACM, vol. 37 (8), August 1994.
One of the design aims of HTML is that information about data structure is transferred, but the finer details of data presentation are a matter for the client software to decide. In principle, the client software could (for example) translate the document into a foreign language or a different typeface. A common client (browser) presentation feature is to vary the color of the displayed representation of hypertext links present within the HTML according to whether the further referenced document is, or is not, cached. By convention, blue is the preferred color for a non-cached hyperlink. Linking using hypertext links is well known in the WWW arts. Presentation software in a client can perform many things; for example, if the display is limited in capability (e.g. a limited size liquid crystal display (LCD)), then adjustments to optimize presentation are possible.
The widespread adoption of the WWW, especially by those who use personal computers, has led to a great proliferation in server software and expertise in the creation, manipulation and use thereof. Hence, it is unremarkable that this has in turn led to extensive re-use of the relevant system programs. In particular, Internet Protocol (IP), HTTP, and HTML have all found use in the world of dedicated and embedded systems (in addition to the more obvious usage in personal computers (PCs)).
There is also a trend towards incorporation of specialized Internet access into devices that are not dedicated computers. Such appliances might include domestic appliances (e.g. refrigerators, washing machines and various cooking devices), terminals in public places such as airports, display devices in motor vehicles and so forth.
Embedded computers are paradigmatically used as part of dedicated equipment in relatively stable applications. Such equipment is often capable of unattended operation, and the relative lack of need for flexibility (as compared with PCs) provides a beneficial opportunity to pare costs and improve performance by eschewing features.
As intimated above, costs are an important factor in embedded systems, especially as such systems are commonly sold as a bundled package. Moreover, many embedded systems also have a SCADA (Supervision, Control And Data Acquisition) aspect or instrumentation aspect and thus have a need for high-performance real-time capabilities. Thus, it is desirable in those embedded systems that incorporate browsers (or other HTML etc. interpreters) to use microprocessors (MPUs or, sometimes, CPUs) that offer high computing power at low prices. Such MPUs are typically modish and popular consumer units, and (at the present time) usually have a 32-bit architecture. However, the pace of MPU development is hot and the market for MPUs is dynamic, so it is important that any software and/or firmware code used be portable across architectures. Thus, the MPUs may be selected according to whichever provides the prevailing optimal price/performance tradeoff. In particular, small-endian (more commonly called little-endian) CPU architectures, as typified by INTEL® products, vie for price/performance advantage with large-endian (big-endian) architectures such as PowerPC® products from IBM® and MOTOROLA®. Large-endian and small-endian architectures are well known in the relevant arts.
Implementing computer code for portability typically conflicts with implementing for run-time performance, and the prior practice of handcrafting optimized instruction codes is thought to be too burdensome nowadays. This is especially so as compared to the excellent results of which modern optimizing compilers are capable.
Referring to FIG. 1 (prior art), a code fragment rendered in the popular 'C' language shows a simple parse loop for a leading command in a text string utilizing the well-known strcmp( ) function. Whilst the code is highly portable, it is apparent to a practitioner in the art that this code (FIG. 1) will consume needlessly many MPU clock cycles when executed. Still referring to FIG. 1, re-implementing strcmp( ) as an in-line function may reduce the number of clock cycles to execute, but may produce startling performance variations across platforms and indeed across compilers. Although this code may exploit MPU optimization features including pipelining, branch prediction and data and instruction caching, the scope for exploiting other features such as out of order instruction execution, wide memory access and loop unrolling is limited. Moreover, there are simply too many instruction executions per iteration (even if the loop were unrolled).
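FIG. 1 itself is not reproduced in this text. As a rough illustration only, a parse loop of the general kind described might resemble the following sketch. The command names, the table layout and the function name parse_command_strcmp are assumptions made for the example and are not taken from the figure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical command set, chosen only for illustration. */
enum command { CMD_UNKNOWN = 0, CMD_GET, CMD_HEAD, CMD_POST };

struct command_entry {
    const char *name;
    enum command id;
};

static const struct command_entry command_table[] = {
    { "GET", CMD_GET },
    { "HEAD", CMD_HEAD },
    { "POST", CMD_POST },
};

/* Identify the leading command of a text line with a strcmp() loop. */
enum command parse_command_strcmp(const char *line)
{
    char token[8];
    size_t len;
    size_t i;

    assert(line != NULL);

    /* Copy the leading word (up to the first space or end of string). */
    len = strcspn(line, " ");
    if (len == 0 || len >= sizeof token)
        return CMD_UNKNOWN;
    memcpy(token, line, len);
    token[len] = '\0';

    /* Linear search: one byte-by-byte strcmp() call per table entry. */
    for (i = 0; i < sizeof command_table / sizeof command_table[0]; i++) {
        if (strcmp(token, command_table[i].name) == 0)
            return command_table[i].id;
    }
    return CMD_UNKNOWN;
}
```

Each iteration of such a loop performs a function call and a byte-wise comparison, which is the per-iteration cost the passage above objects to.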
Another approach is illustrated in FIG. 2 (prior art). This involves nested parsing of single characters from the string. Although less tidy in source, and perhaps still quite portable, this approach also produces disappointing average execution times for the parsing of rich and complex command and/or markup languages. In addition, this type of code can be expensive and tedious to maintain or enhance.
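FIG. 2 is likewise not reproduced here. The following sketch suggests what nested single-character parsing can look like for a small, hypothetical command set; the commands and the names parse_command_nested and is_end are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command set, chosen only for illustration. */
enum command { CMD_UNKNOWN = 0, CMD_GET, CMD_HEAD, CMD_POST, CMD_PUT };

/* A command word ends at a space or at the end of the string. */
static int is_end(char c)
{
    return c == ' ' || c == '\0';
}

/* Identify the leading command by testing one character at a time. */
enum command parse_command_nested(const char *s)
{
    assert(s != NULL);

    /* The && chains stop at the first mismatch, so no byte beyond the
       string terminator is ever read. */
    switch (s[0]) {
    case 'G':
        if (s[1] == 'E' && s[2] == 'T' && is_end(s[3]))
            return CMD_GET;
        break;
    case 'H':
        if (s[1] == 'E' && s[2] == 'A' && s[3] == 'D' && is_end(s[4]))
            return CMD_HEAD;
        break;
    case 'P':
        /* Two commands share the first letter, so a second level is needed. */
        switch (s[1]) {
        case 'O':
            if (s[2] == 'S' && s[3] == 'T' && is_end(s[4]))
                return CMD_POST;
            break;
        case 'U':
            if (s[2] == 'T' && is_end(s[3]))
                return CMD_PUT;
            break;
        default:
            break;
        }
        break;
    default:
        break;
    }
    return CMD_UNKNOWN;
}
```

Every new command that shares a prefix with an existing one adds another level of nesting, which hints at why such code becomes tedious to maintain.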
Thus a need exists for a method and apparatus to improve efficiency in character oriented protocol interpreters as typified by HTML interpreters. Preferably, such an improved interpreter should be implemented in a manner that exploits the characteristics of modern MPUs as catered for in various optimizing high-level language compilers and in a manner that provides for a good measure of platform independence.
Accordingly, the present invention provides a method and apparatus to improve efficiency in character oriented protocol interpreters, whilst providing a good measure of platform independence.
Embodiments of the invention parse data protocol streams and, according to the data content therein, conditionally cause execution of instruction codes constituting corresponding service routines.
In one embodiment of the invention, there is provision for compiling a function of a sub-string from a byte oriented protocol stream into on-chip scratchpad memory and performing multiple (many or very many) word-oriented comparisons therefrom. This can take any of a number of forms. For example, four bytes of the string may be converted to a 32-bit integer by scaling each byte in turn, so that the bit patterns in the four bytes are orthogonal in the intermediate 32-bit result code. Alternatively, a 64-bit integer may be preferred to hold the result if that were a favored word size in a CPU architecture. However, in the interests of portability it may be a better compromise to use a data type that is bound to the optimal precision at compile time, for example a "long int" in the popular 'C' language.
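A minimal sketch of the four-byte example follows, under stated assumptions: the command set, the PACK4 macro and the function names pack_prefix and parse_command_word are hypothetical, and the fixed-width uint32_t type is used for definiteness rather than the "long int" mentioned above. Because the word is built arithmetically rather than by reinterpreting memory, the result does not depend on the byte order of the CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command set, chosen only for illustration. */
enum command { CMD_UNKNOWN = 0, CMD_GET, CMD_HEAD, CMD_POST, CMD_PUT };

/* Build a 32-bit key from four characters at compile time. */
#define PACK4(a, b, c, d)                         \
    (((uint32_t)(unsigned char)(a) << 24) |       \
     ((uint32_t)(unsigned char)(b) << 16) |       \
     ((uint32_t)(unsigned char)(c) << 8)  |       \
      (uint32_t)(unsigned char)(d))

/* Scale each of the first four bytes in turn into one 32-bit word.
   Strings shorter than four bytes are padded with zero bytes. */
uint32_t pack_prefix(const char *s)
{
    uint32_t word = 0;
    int i;

    assert(s != NULL);
    for (i = 0; i < 4; i++) {
        unsigned char byte = (unsigned char)*s;
        if (byte != '\0')
            s++;                    /* never advance past the terminator */
        word = (word << 8) | byte;  /* each byte occupies its own 8 bits */
    }
    return word;
}

/* One word-oriented comparison per candidate instead of a byte loop. */
enum command parse_command_word(const char *line)
{
    switch (pack_prefix(line)) {
    case PACK4('G', 'E', 'T', ' '): return CMD_GET;
    case PACK4('H', 'E', 'A', 'D'): return CMD_HEAD;
    case PACK4('P', 'O', 'S', 'T'): return CMD_POST;
    case PACK4('P', 'U', 'T', ' '): return CMD_PUT;
    default:                        return CMD_UNKNOWN;
    }
}
```

In this sketch the three-letter commands are matched together with their trailing space, so that every key is exactly four bytes wide; a real interpreter would need its own policy for shorter and longer command words.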
In another embodiment of the invention, there is provision for a small, but nonetheless useful, pre-parsing of the data stream. In this embodiment, bytes are compiled into words until a terminating character code is encountered, a maximum string length is reached, or some combination of such conditions occurs. Again, this embodiment may be portable across disparate architectures whilst retaining high efficiency in terms of MPU clock cycles.
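The pre-parsing step might be sketched as follows. This is an illustration under assumptions, not the claimed implementation: the limits MAX_TOKEN_BYTES and WORD_BYTES, the function name preparse_token, and the choice to treat the end of the string as an additional stop condition are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limits, chosen only for illustration. */
#define MAX_TOKEN_BYTES 8
#define WORD_BYTES      4
#define MAX_WORDS       (MAX_TOKEN_BYTES / WORD_BYTES)

/* Compile bytes of s into 32-bit words until the terminating character,
   the end of the string, or MAX_TOKEN_BYTES is reached. Unused byte
   positions are left as zero. Returns the number of bytes compiled. */
size_t preparse_token(const char *s, char terminator,
                      uint32_t words[MAX_WORDS])
{
    size_t n = 0;
    size_t i;

    assert(s != NULL);
    assert(words != NULL);

    for (i = 0; i < MAX_WORDS; i++)
        words[i] = 0;

    while (n < MAX_TOKEN_BYTES && s[n] != '\0' && s[n] != terminator) {
        /* The first byte of each word lands in the most significant bits. */
        size_t shift = 8 * (WORD_BYTES - 1 - n % WORD_BYTES);
        words[n / WORD_BYTES] |= (uint32_t)(unsigned char)s[n] << shift;
        n++;
    }
    return n;
}
```

For instance, pre-parsing the text TITLE>Hello with '>' as the terminator compiles five bytes into two words, which can then be compared against precomputed constants a word at a time.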
All the features and advantages of the present invention will become apparent from the following detailed description of its preferred embodiment whose description should be taken in conjunction with the accompanying drawings.