1. Field of Invention
The present invention relates generally to the field of remote procedure calls. More specifically, the present invention is related to performing remote procedure calls utilizing a markup language which is encoded utilizing tokens as the marshalling format.
2. Discussion of Relevant Art
In computer processing systems, and distributed or parallel processing systems in particular, one of the issues which must be faced is interprocess communication and synchronization. Interprocess communication and synchronization concerns itself with how different processes, typically running in parallel, cooperate. For example, if a data item D is needed by a first process P1 and the data item D is the result of a second process P2, there must be a method of transferring the data D between the two processes. In addition, if process P2 has not been able to communicate the data D to process P1, then the first process P1 must be able to wait for the data D.
One of the ways in which interprocess communication is performed is via message passing. In message passing communication, as opposed to other methods such as data sharing, a sender process sends a message to or invokes a receiving process. As part of the message or invocation, parameters are provided to the receiving process. These parameters are items that the receiver process needs to perform its function.
The most elementary primitive for message passing communications is one-way, point-to-point passing of the message. However, most interactions between processes are essentially two-way interactions. While this can be simulated using two point-to-point messages, having a single construct for two-way messaging is more efficient. One such construct is the remote procedure call (RPC). A remote procedure call is just like a normal procedure call except that the caller and the callee are different processes, such as processes running in two different applications, or on different machines. For an RPC, a first process A calls a remote procedure R of process B and sends the input parameters P to B. When B receives the invocation request, it executes the procedure R and returns the output parameters back to A. After A calls R, A is blocked until it receives back the output parameters.
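The call sequence described above can be sketched in a few lines. This is only a minimal illustration, assuming two threads stand in for the processes A and B and in-memory queues stand in for the message channel; the names `call_remote`, `process_b`, and `remote_procedure_r` are hypothetical.

```python
import queue
import threading

# Channel carrying (procedure name, input parameters, reply queue) from A to B.
requests = queue.Queue()

def remote_procedure_r(x, y):
    """Hypothetical procedure R owned by process B."""
    return x + y

PROCEDURES = {"R": remote_procedure_r}

def process_b():
    """B waits for one invocation request, executes it, and returns the output."""
    name, params, reply = requests.get()
    reply.put(PROCEDURES[name](*params))

def call_remote(name, *params):
    """A's side of the call: send the request, then block until the reply arrives."""
    reply = queue.Queue(maxsize=1)
    requests.put((name, params, reply))
    return reply.get()  # A is blocked here until B returns the output parameters

worker = threading.Thread(target=process_b)
worker.start()
result = call_remote("R", 2, 3)
worker.join()
print(result)
```

From the caller's point of view, `call_remote("R", 2, 3)` reads like an ordinary procedure call, although the work is carried out elsewhere.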
The remote calls, such as those made by A, are marshalled into a format that is understood by both processes. Machines which are running the same software have no problem understanding the calls initiated by another machine, because the marshalled formats will be the same. For instance, two machines running Windows™ can be networked together and perform RPCs without any problems. However, difficulties exist when RPCs are to be made across platforms, as the processes may not be able to agree on the marshalled format.
This difficulty leads to the need for a standardized cross-platform approach for performing RPCs. With a cross-platform approach, a system running Windows™ and a system running Unix™ can easily perform RPCs. This need has led to the development of a cross-platform RPC approach known as XML-RPC. For XML-RPC, XML is utilized as the marshalling format. XML-RPC leverages technologies, such as XML, which were designed to be platform independent. The XML-RPC protocol performs remote procedure calls over HTTP. The XML-RPC message is an HTTP-POST request. As is well known, the HTTP-POST method is used to send data which is to be processed in some way by the server. The body of an XML-RPC request is in XML. Based upon the request, a procedure executes on the server and the value returned by the procedure is formatted in XML and returned to the client. The procedure parameters can be scalars, such as numbers, strings, and dates, and can also be complex records and list structures. The drawback to XML-RPC is that it generates large HTTP messages, which consume a correspondingly large amount of network bandwidth. There is a need to reduce this excessive bandwidth usage when performing remote procedure calls utilizing XML-RPC.
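The size overhead of XML as a marshalling format can be seen by marshalling a small call. The following is a sketch using the `xmlrpc.client` module of the Python standard library; the method name `example.echo` and the parameter values are assumptions chosen only for illustration, and no HTTP request is actually sent.

```python
import xmlrpc.client

# Hypothetical method name and parameters, chosen only for illustration.
params = (42, "hello", [1.5, 2.5])
request_body = xmlrpc.client.dumps(params, methodname="example.echo")

# The XML text that would travel as the body of the HTTP POST request.
print(request_body)

# Unmarshalling on the receiving side recovers the parameters and method name.
decoded_params, decoded_method = xmlrpc.client.loads(request_body)

# Rough comparison: compact textual form of the values vs. the XML message.
payload_bytes = len(repr(params))
message_bytes = len(request_body.encode("utf-8"))
print(payload_bytes, message_bytes)
```

Most of the message consists of element tags rather than parameter values, which is the overhead the paragraph above refers to.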
One method of dealing with bandwidth problems in general has been the use of compression. Considerable work has already been done on lossless data compression (Mark Nelson, The Data Compression Book, M&T Books, 1992). Researchers have developed fast and powerful algorithms for data compression. Their principles are mostly based on Claude Shannon's Information Theory. A consequence of this theory is that a symbol that has a high probability has a low information content and will need fewer bits to encode. In order to compress data well, one needs to select models that predict symbols with high probabilities. Huffman coding (Huffman, D. A., “A Method for the Construction of Minimum-redundancy Codes,” Proceedings of the IRE, Vol. 40, No. 9, September 1952, pp. 1098–1101) achieves the minimum amount of redundancy possible in a fixed set of variable-length codes. It provides the best approximation for coding symbols when each code must occupy a whole number of bits. Huffman coding uses a statistical model because it reads and encodes a single symbol at a time using the probability of that symbol's appearance. A dictionary-based compression scheme uses a different concept. It reads input data and looks for groups of symbols that appear in a dictionary. If a string match is found, a pointer or index into the dictionary can be output instead of the codes for the individual symbols. The longer the match, the better the compression ratio. In LZ77 compression (Ziv et al., “A Universal Algorithm for Sequential Data Compression,” IEEE Transactions on Information Theory, Vol. 23, No. 3, May 1977, pp. 337–343), for example, the dictionary consists of all the strings in a window into the previously read input stream. The deflate algorithm (P. Deutsch, “DEFLATE Compressed Data Format Specification version 1.3,” RFC 1951, Aladdin Enterprises, May 1996) uses a combination of the LZ77 compression and the Huffman coding. It is used in popular compression programs like GZIP (P. Deutsch, “GZIP File Format Specification Version 4.3,” RFC 1952, Aladdin Enterprises, May 1996) or ZLIB (Deutsch et al., “ZLIB Compressed Data Format Specification Version 3.3,” RFC 1950, May 1996).
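Two of the ideas above can be illustrated briefly. The sketch below first computes the information content of a symbol as the negative base-2 logarithm of its probability, and then applies the deflate algorithm through the `zlib` module of the Python standard library. The sample data and the helper name `information_bits` are assumptions made for illustration.

```python
import math
import zlib

def information_bits(probability):
    """Information content, in bits, of a symbol with the given probability."""
    return -math.log2(probability)

# A likely symbol carries little information; an unlikely one carries more.
common = information_bits(0.5)
rare = information_bits(1 / 64)
print(common, rare)

# Hypothetical, highly repetitive XML-like input.
sample = b"<value><int>1</int></value>" * 50

# zlib applies deflate (LZ77 matching followed by Huffman coding).
compressed = zlib.compress(sample)
restored = zlib.decompress(compressed)
print(len(sample), len(compressed))
```

Because the sample repeats the same string, the LZ77 stage finds long matches in its window and the compressed output is much shorter than the input. Decompression restores the input exactly, since the scheme is lossless.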
One drawback of these text compression algorithms is that they perform compression at the character level. If the algorithm is adaptive (as, for example, with LZ77), the algorithm slowly learns correlations between adjacent pairs of characters, then triples, quadruples and so on. The algorithm rarely has a chance to take advantage of longer range correlations before either the end of input is reached or the tables maintained by the algorithms are filled to capacity, especially with small files. To address this problem, R. Nigel Horspool and Gordon V. Cormack explore the use of words as basic units of the algorithm (Horspool et al., “Constructing Word-Based Text Compression algorithms,” IEEE Transaction on Information Theory, 1992). In most implementations of dictionary-based compression, the encoder operates online, incrementally inferring its dictionary of available phrases from previous parts of the message. An alternative approach proposed by N. Jasper Larsson and Alistair Moffat (Larsson et al., “Offline Dictionary-Based Compression,” IEEE Transaction on Information Theory, 1999) is to infer a complete dictionary offline to optimize the choice of phrases so as to maximize compression performance. An additional disadvantage of these algorithms is the fact that they are unable to retain the structure of an XML document.
The Wireless Application Protocol Forum has proposed an encoding format for XML based on a table (the code space) that matches tokens to XML tags and attribute names (“WAP Binary XML Content Format”). It takes advantage both of the offline approach (the code space can be built offline) and of the word-based compression (tags and attribute names are usually the most frequent words in an XML document). Moreover, unlike the previous compression algorithms, it retains the structure of XML documents. However, it compresses neither the character data content nor the attribute values that are not defined in the Document Type Definition (DTD). Moreover, it does not suggest any strategy for building the code space efficiently. The preferred encoding format utilized by the present invention addresses both of these drawbacks: it is designed to compress character data, and it defines a strategy for building the code space. The present invention allows for remote procedure calls to be performed utilizing XML-RPC with a reduction in bandwidth utilization.
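The general idea of a code space can be sketched as follows. This is a simplified illustration only: the token values, the `END` and `TEXT` markers, and the function name `tokenize` are hypothetical and do not reproduce the WAP Binary XML format or the encoding of the present invention. The sketch assumes attribute-free tags, no whitespace between elements, and character data shorter than 256 bytes.

```python
import re

# Hypothetical code space: one byte per known tag name.
CODE_SPACE = {
    "methodCall": 0x05, "methodName": 0x06, "params": 0x07,
    "param": 0x08, "value": 0x09, "int": 0x0A,
}
END = 0x01   # hypothetical token that closes the current element
TEXT = 0x03  # hypothetical token introducing length-prefixed character data

def tokenize(xml_text):
    """Replace each tag with a one-byte token; copy character data unchanged."""
    out = bytearray()
    for closing, name, text in re.findall(r"<(/?)(\w+)>|([^<]+)", xml_text):
        if text:
            data = text.encode("utf-8")
            out += bytes([TEXT, len(data)]) + data  # character data is not compressed
        elif closing:
            out.append(END)
        else:
            out.append(CODE_SPACE[name])
    return bytes(out)

document = ("<methodCall><methodName>add</methodName><params><param>"
            "<value><int>7</int></value></param></params></methodCall>")
encoded = tokenize(document)
print(len(document), len(encoded))
```

The nesting of elements is preserved because each opening token is later matched by an `END` token. The character data (`add` and `7`) passes through uncompressed, which is the limitation noted above.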