Parsing and pattern-matching are important functions for many different applications, including compilers for programming languages, security, e.g., intrusion detection, virus scanning, etc., and data compression.
As is known in the art, parsing and pattern-matching typically involves testing an input document, in the form of an input stream of characters, to see if it meets or matches one or more conditions. Examples of such conditions include testing if a character is part of a reserved word or variable or attribute name, if the character obeys any defined naming conventions, e.g., what characters are allowed to be part of such a name, and/or if the character string obeys the relevant syntax rules, etc. The latter may also include checking the character string against certain document state information, such as, for example, testing a new variable name against a list of all previously processed variable names to determine if the new variable name is unique. Another example would be to test if a value assigned to a variable is in accordance with the type declaration of that variable.
These functions are typically implemented in software. However, the Applicants have recognized that the parsing and pattern-matching performance of existing, conventional software implementations can be limited, e.g., due to the way in which the processors implementing the software tend to operate.
Furthermore, newer applications are becoming more commonplace that impose significantly higher processing throughput requirements on parsing and pattern-matching functions, e.g., in terms of the number of characters and/or documents that need to be parsed and/or searched for patterns per unit time, and/or that require significantly reduced latency in the parsing/pattern-matching process.
An example of such applications is emerging applications based on the Extensible Markup Language (XML), which provides, as is known in the art, a standard format to exchange electronic documents. These applications of XML include web-pages, data storage and retrieval, communications protocols, e.g., XML-RPC and SOAP, object serialization, etc. These applications have in common that they require a high-performance parser function for processing the XML-based information.
The Applicants therefore believe that there will be an increasing need for more efficient parsing and pattern-matching systems.
Objects and Advantages of the Invention
According to a first aspect of the present invention, there is provided an apparatus for pattern-matching characters in a stream of received characters, the apparatus comprising:    a character processing unit comprising means for storing characters, and means for comparing a received input character with one or more stored characters; and    a controller for controlling the character processing unit, the controller including means for receiving an input stream of characters to be pattern-matched and means for controlling the character processing unit to compare characters from the input stream with characters stored by the character processing unit.
According to a second aspect of the present invention, there is provided a method of pattern-matching characters in a stream of received characters, the method comprising:    storing one or more characters in a character processing unit comprising means for storing characters;    selectively providing one or more characters from a received input character stream to the character processing unit; and    controlling the character processing unit to compare a received input character with one or more of the characters stored by the character processing unit.
The pattern-matching system of the present invention includes a character processing unit that is able to store and compare characters under the control of a controller of the system.
The Applicants have recognized that conventional parsing and pattern-matching systems that use general purpose processors can be restricted by the fact that conventional general purpose processors do not handle character and string functions very efficiently. This is because “basic” character handling functions such as encoding, combining characters in a string, string copy, and compare operations, etc., may each require a significant number of instructions to implement them, thereby resulting in decreased performance.
However, providing a character processing unit that can store and compare characters, and that can be used by a controller receiving the character stream, e.g., document, to be parsed or pattern-matched, as in the present invention, helps to alleviate and overcome these problems. This is because it allows the character handling functions to be offloaded to the character processing unit, rather than having to be carried out in software on the main processor of the system, as would be the case with existing software-implemented parsing and pattern-matching.
The character processing unit carries out character handling functions, including at least storing and comparing characters. Most preferably the character processing unit can store and retrieve (read and write) characters and character strings, and compare stored characters and/or character strings with input characters or character strings that it receives from the input character stream.
Thus, in a preferred embodiment, the character processing unit includes a memory into which it can write characters, and character strings, and from which it can retrieve characters and character strings for, e.g., comparison purposes.
The character processing unit is preferably able to store characters received in the input character stream for later retrieval, e.g., as those characters are provided to it by the controller.
In a particularly preferred embodiment, characters or character strings can also be pre-stored in the character processing unit, i.e., the character processing unit can be preloaded with characters and character strings as well as storing characters from an input character stream to be pattern-matched. Thus the character processing unit preferably stores one or more predetermined characters or character strings. This may be useful where, e.g., particular, known and predefined character strings may be expected in the input character stream and it is desired to identify such character strings in the input character stream, which, as is known in the art, is a relatively common occurrence in, e.g., programming language parsing.
The way that the character strings are preloaded in the character processing unit can be selected as desired. They could, for example, be stored by providing an appropriate input character stream containing the character strings in question to the character processing unit, or by writing them directly to the character processing unit, e.g., its memory, using a dedicated, e.g., memory, interface.
The character processing unit is preferably able to identify particular strings of characters that it has stored or is storing. Thus it is preferably able to “combine”, e.g., stored, characters into strings. This is preferably done by storing information allowing the set of characters forming the desired character string to be identified. Most preferably, this is done by storing information identifying the boundaries of the character string, such as the start and end characters in the string, e.g., by storing the addresses of the start and end characters in the string. It would also, e.g., be possible to store the data identifying the characters forming a string at a certain memory location/address, and to then use that memory address directly or indirectly as the character string identifier.
Preferably each stored character string is given a unique identity that can be used to identify and retrieve the character string. Most preferably the character string identity is stored in association with the data, e.g., start and end addresses, identifying the characters forming the string. In a particularly preferred embodiment, individual character strings are associated with “tokens”, with each token having a, preferably unique, identifier, thereby identifying the character string, and having stored associated therewith data identifying the stored characters forming the string, preferably in form of the start and end addresses for the character string.
Thus, in a particularly preferred embodiment, the character processing unit includes a memory that stores the character string information, e.g., start and end addresses, relating to each character string (token). As will be appreciated by those skilled in the art, this “token” memory could be a separate memory device to the character-storing memory, or simply part of the same overall memory device.
The character processing unit is accordingly preferably able to combine characters into an identifiable character string, e.g., by creating a new character token and storing the start and end address of the character string in the token memory, where provided.
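By way of illustration only, the token arrangement described above might be modelled in software along the following lines. This is a minimal Python sketch rather than the hardware implementation. The class name `CharacterStore`, its method names, and the use of inclusive start and end addresses are all assumptions made for the example.

```python
class CharacterStore:
    """Toy model of a character memory with a separate token memory."""

    def __init__(self):
        self.chars = []     # character memory: one character per address
        self.tokens = {}    # token memory: token id -> (start, end) addresses
        self._next_id = 0

    def store_string(self, text):
        """Write characters to memory and record their boundaries in a new token."""
        if not text:
            raise ValueError("cannot create a token for an empty string")
        start = len(self.chars)
        self.chars.extend(text)
        end = len(self.chars) - 1   # inclusive end address (assumption)
        token_id = self._next_id
        self._next_id += 1
        self.tokens[token_id] = (start, end)
        return token_id

    def read_string(self, token_id):
        """Retrieve a stored string using only its token identifier."""
        start, end = self.tokens[token_id]
        return "".join(self.chars[start:end + 1])

    def is_last_char(self, token_id, address):
        """Report whether the given address holds the last character of the string."""
        return address == self.tokens[token_id][1]


store = CharacterStore()
t0 = store.store_string("book")
t1 = store.store_string("title")
```

Here the token memory holds only addresses, so the same character memory can serve any number of strings.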
The character processing unit can preferably also select and, e.g., read out, a stored character, character string or selected characters from a stored character string, etc. This is preferably done by using the character string identifier, e.g., token, stored for the character string. In another preferred embodiment, character strings can also or instead be retrieved on a last-in, first-out (LIFO) basis. The Applicants believe that this latter arrangement may be particularly suited to situations such as can occur in, e.g., XML parsing, where it is desired to check that successive character strings in a given document match each other. Most preferably the character processing unit can also identify if the character it has currently retrieved from its memory is the last character of the current character string.
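The LIFO retrieval described above can be illustrated with a short Python sketch that checks whether closing element names match the most recently opened ones, as in XML. The function name `tags_balanced` and the pre-tokenized event format are assumptions made for the example; a real system would derive these events from the raw character stream.

```python
def tags_balanced(events):
    """Check open/close element names using last-in, first-out retrieval.

    events: iterable of ("open", name) or ("close", name) pairs (assumed format).
    """
    stack = []  # LIFO store of element names that are still open
    for kind, name in events:
        if kind == "open":
            stack.append(name)
        elif kind == "close":
            if not stack:
                return False            # nothing stored to compare against
            stored = stack.pop()        # most recently stored string comes out first
            # Compare character by character, as a comparison unit would.
            if len(stored) != len(name):
                return False
            if any(a != b for a, b in zip(stored, name)):
                return False
        else:
            raise ValueError("unknown event kind: %r" % (kind,))
    return not stack                    # every opened element must have been closed
```

For example, the event sequence for `<a><b></b></a>` is accepted, while a close of `b` directly after an open of `a` is rejected.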
As discussed above, the character processing unit includes means such as suitable logic for comparing characters with one another. Most preferably a character of a stored character string can be compared with a “current” input character from the input character stream to be analyzed.
The character processing unit is preferably also able to provide an output that can then be used, e.g., as feedback, by the controller to further control the overall input character stream processing operation. Thus, for example, the character processing unit can preferably output the results of each character comparison, e.g., whether characters being compared match, indicate to the controller whether the current character being assessed is the last character of the current character string and/or provide other character or character string related feedback.
The character processing unit can preferably also output an indication of whether the characters being analyzed are of a particular type, e.g., whether a character is part of a certain, e.g., predefined, group of characters, such as “white space”, e.g., space, carriage return, tab, or line feed characters, or if the character is a valid character for use in, e.g., an element, variable or attribute name or value. The character processing unit can also preferably determine, and provide feedback on, encoding of or in the input bit stream, and/or whether other conditions have occurred, such as the end of the input stream being reached, that there are no more character strings stored in its memory, etc.
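A character-type indication of this kind might be sketched in Python as follows. The white space group follows the characters listed above. The set of valid name characters is a deliberately simplified assumption and is not the full rule set of any particular language, and the function name `classify` is hypothetical.

```python
import string

# White space group as listed in the text: space, carriage return, tab, line feed.
WHITE_SPACE = frozenset(" \r\t\n")

# Simplified, assumed set of characters allowed in a name.
NAME_CHARS = frozenset(string.ascii_letters + string.digits + "_-.:")


def classify(ch):
    """Return type flags for one character, as feedback for a controller."""
    return {
        "white_space": ch in WHITE_SPACE,
        "name_char": ch in NAME_CHARS,
    }
```

Because each flag is an independent set membership test, all flags for a character could be produced at the same time.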
In a particularly preferred embodiment, as well as or instead of being able to provide feedback relating to characters in the input character stream as discussed above, the character processing unit can output characters themselves, e.g., for subsequent inclusion in an output character stream. This could be desirable where, e.g., it is desired to translate character strings in the input character stream into another form, e.g., from XML to HTML, to add new character strings to the input character stream, or to correct errors in the input character stream, etc. Thus in a particularly preferred embodiment the character processing unit can output characters and/or character strings. This could be the current input character or character string, a character or character string retrieved from the character processing unit memory, and/or a character or character string provided by the controller, e.g., as an operand value associated with a given command (see below).
The character processing unit is accordingly preferably able to carry out character and/or character string conversion operations. This could be, e.g., to convert a hexadecimal string representation of a number into the actual number (integer), e.g., to convert “0x12AB” into the actual hexadecimal value, and/or to convert character encoding, e.g., UTF-8 into UTF-16.
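The two conversions mentioned above can be sketched in Python using only built-in facilities. The function names are hypothetical, and the choice of little-endian UTF-16 without a byte order mark is an assumption made for the example.

```python
def hex_string_to_int(text):
    """Convert a string such as "0x12AB" into the integer it represents."""
    if not text.lower().startswith("0x"):
        raise ValueError("expected a 0x prefix: %r" % (text,))
    return int(text[2:], 16)


def utf8_to_utf16(data):
    """Re-encode UTF-8 bytes as UTF-16 (little-endian, no byte order mark)."""
    return data.decode("utf-8").encode("utf-16-le")
```

For instance, `hex_string_to_int("0x12AB")` yields 4779, which is the decimal value of hexadecimal 12AB.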
The character processing unit is preferably implemented, so far as possible, in hardware, although it could still be implemented or at least partially implemented in software, where appropriate or desired. Thus in a particularly preferred embodiment, the character processing unit includes a memory unit, character comparison logic and appropriate control logic. It is preferably implemented on a single chip (silicon substrate), although that is not essential.
As discussed above, the system of the present invention also includes a controller that receives the input character stream to be pattern-matched or parsed, and then controls the character processing unit to carry out character comparisons, store and retrieve characters, etc.
Thus the controller can preferably send commands or instructions to the character processing unit, e.g., to command the character processing unit to write and read characters and/or character strings to and from its memory, to compare stored characters or character strings with the input character stream, and/or to output, e.g., stored, characters or character strings. This is preferably done, where appropriate, by the controller providing to the character processing unit the identity of the character string, e.g., token identifier, in question.
These operations are preferably carried out by the controller in response to the characters received in the input character stream. Thus, the controller can preferably assess each character in the input character stream and then selectively, for example, on the basis of whether the input character is of a type that should be compared with a previously received or stored character, control and use the character processing unit on the basis of that assessment.
The controller also preferably receives the outputs from the character processing unit and processes and/or acts upon those outputs accordingly. Thus it can preferably evaluate the “feedback” or result information from the character processing unit, e.g., whether the input character matched a stored character or not, and react thereto.
The controller preferably also controls the overall output of the system, e.g., whether the input character stream is simply output in the form that it is received, whether parts of the input character stream are deleted or replaced with other characters or character strings, whether characters or character strings stored by the character processing unit are inserted in the output stream (as discussed above), etc. Thus in a particularly preferred embodiment the controller is operable to perform one or more of the following output functions: write the current input character to the output character stream; write a character or character string from the character processing unit, e.g., the “current” character in the character memory, to the output stream; output information otherwise generated by the controller, e.g., by writing it to the output character stream; and/or provide no output at all.
The controller also controls the provision of the input character stream to, e.g., the character processing unit and/or to the output of the system. In a particularly preferred embodiment, the controller is able to delay or pause the processing and input of the input character stream. This would make it possible, e.g., to process a single input character using a function that takes longer to execute and/or using multiple functions that are executed sequentially. It would also, e.g., facilitate the insertion of additional information within an input character stream that is, e.g., being “copied” to the output of the system.
The controller itself can be any suitable device, e.g., processor, that can control the operation of the character processing unit and operate as described above. It is preferably programmable. In a particularly preferred embodiment, the controller is a “fast” device that can control, and respond to, the character processing unit substantially in “real time”, e.g., can preferably respond to outputs of the character processing unit within one or only a few clock cycles.
As discussed above, the controller receives the input character stream and operates to analyze and pattern-match it, using the character processing unit where appropriate. It preferably does this by evaluating each character in the input stream in turn.
Most preferably the controller can evaluate multiple conditions for, e.g., a given character and then select a corresponding action all in a relatively short time period. Most preferably it can evaluate multiple conditions for, e.g., a given character, in parallel and/or simultaneously. This is all most preferably done within a single clock cycle.
This facilitates finer grain control of the “instruction execution flow”, which the Applicants have recognized is desirable to allow more efficient evaluation of multiple conditions that can typically occur at the level of individual characters, as well as strings of consecutive characters, in the overall “stream” of characters that is to be parsed or pattern-matched. This should be contrasted with a more “conventional” software approach, in which conditions can typically only be evaluated one at a time and are typically used to control conditional branch functions, e.g., jump on zero, which means that conditions will typically only affect the “instruction execution flow” at a coarse granularity of multiple blocks of sequentially executed instructions.
In a particularly preferred embodiment, the controller is in the form of a programmable state machine. The use of a programmable state machine is advantageous, because, as is known in the art, a programmable state machine can evaluate multiple conditions in parallel and select a corresponding action, typically within a single clock cycle, which as discussed above is particularly advantageous for pattern-matching and parsing applications.
Indeed, the Applicants believe that the provision of a programmable state machine in combination with a character processing unit as in the present invention is particularly advantageous as this can provide a pattern-matching system that is both programmable and that can achieve high performance through tight control of the character and character string handling functions by supporting fast evaluation of multiple conditions in parallel and reaction thereto, which features are important for many parsing and pattern-matching applications.
Thus according to a third aspect of the present invention, there is provided an apparatus for performing pattern matching of an input character stream, comprising:    a character processing unit that can store characters and compare characters provided to it with characters that it has stored; and    a programmable state machine for receiving the input character stream and for controlling the character processing unit to compare characters in the input character stream with characters stored by the character processing unit.
According to a fourth aspect of the present invention, there is provided a method of performing pattern-matching of an input character stream, the method comprising:    receiving the input character stream at a programmable state machine; and    the programmable state machine controlling a character processing unit that has stored one or more characters to compare characters in the received input character stream with a character or characters stored by the character processing unit.
These aspects and embodiments of the present invention can, as will be appreciated by those skilled in the art, include any one or more or all of the preferred and optional features of the invention described herein.
Where the controller is implemented as a programmable state machine then any suitable programmable state machine design can be used.
However, in a particularly preferred embodiment, the programmable state machine uses transition rules that include a ternary test vector, e.g., in the form of a test value/mask that tests for bit values “0”, “1”, or “wildcard” (“don't care”), that is compared against the current state register value and, optionally, other conditions, such as, e.g., the current (character) input value. The matching transition rule with the highest priority is then selected as the state transition to be triggered by the input character and used to determine the next state. In a preferred arrangement, plural so-called state spaces are used to facilitate the use of state registers with a limited fixed size, thereby increasing the efficiency of the implementation (this will be discussed further below).
Thus, in a particularly preferred embodiment of the present invention, the controller is in the form of a programmable state machine in which state transitions are represented as a list of state transition rules that involve match operators and priorities, with the next state and output being determined by searching the state transition rule list for the highest priority state transition rule that matches the current state and input. Preferably the state transition rules are in the form of ternary test vectors. Preferably the state transition rules involve wildcards, e.g., “don't care” conditions, and/or priorities. This arrangement provides a set of state transition rules for the programmable state machine that is more efficient than in conventional programmable state machines.
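The selection of the highest priority matching rule from a list of ternary rules can be illustrated with the following Python sketch. It uses a simple linear search purely for clarity and does not implement any particular search algorithm. The `Rule` layout, the 4-bit state and 8-bit input widths, and the function names are assumptions made for the example.

```python
from collections import namedtuple

# value/mask form a ternary test vector: mask bit 1 = must match, 0 = wildcard.
Rule = namedtuple("Rule", "priority value mask next_state")


def ternary_match(rule, vector):
    """True if every non-wildcard bit of the rule equals the corresponding input bit."""
    return (vector & rule.mask) == (rule.value & rule.mask)


def select_rule(rules, state, char_code):
    """Return the highest priority rule matching the current state and input."""
    vector = (state << 8) | char_code   # assumed layout: 4-bit state, 8-bit character
    matching = [r for r in rules if ternary_match(r, vector)]
    if not matching:
        return None
    return max(matching, key=lambda r: r.priority)


rules = [
    Rule(priority=0, value=0x000, mask=0x000, next_state=0),                # any state, any input
    Rule(priority=1, value=0x100, mask=0xF00, next_state=1),                # state 1, any input
    Rule(priority=2, value=0x100 | ord("<"), mask=0xFFF, next_state=2),     # state 1, input '<'
]
```

With these three rules, an input of `<` in state 1 matches all of them, and the priority field selects the most specific one.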
The highest priority state transition rule is preferably searched for using a form of the BaRT algorithm (as described, e.g., in J. van Lunteren, “Searching very large routing tables in wide embedded memory,” Proceedings of the IEEE Global Telecommunications Conference GLOBECOM'01, vol. 3, pp. 1615-1619, San Antonio, Tex., November 2001). This further reduces the state transition rule storage requirements. Thus, in a preferred embodiment, the transitions (rules) are selected using a form of the BaRT algorithm.
Where the BaRT algorithm is being used, the encoding of the states, which will be discussed in more detail below, is preferably such that all the encoding bit positions that are part of the hash index determined using the BaRT algorithm are at consecutive positions in the encoding vectors for the states, as that allows the bits that form the hash index to be extracted more easily from the state vector, e.g., by performing a mask operation, e.g., bitwise AND operation with a vector, on the state value.
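The benefit of keeping the index bits at consecutive positions can be seen in a small Python sketch of the extraction step alone. This is not an implementation of the BaRT algorithm; it only shows the mask operation referred to above, followed by a shift to align the result. The function name `extract_index` is hypothetical.

```python
def extract_index(state_vector, index_mask):
    """Extract a contiguous run of index bits from a state vector."""
    if index_mask == 0:
        return 0
    # Position of the lowest set bit in the mask.
    shift = (index_mask & -index_mask).bit_length() - 1
    run = index_mask >> shift
    # A contiguous run of ones has the form 2**k - 1, so run & (run + 1) == 0.
    if run & (run + 1):
        raise ValueError("index bits are not at consecutive positions")
    return (state_vector & index_mask) >> shift
```

With consecutive index bits, a single AND and a single shift suffice; scattered bits would instead have to be gathered one at a time.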
It is also preferred to partition the state transition diagram into multiple smaller segments, i.e., to distribute the possible states over multiple state spaces implemented using separate state transition rule lists. In this case, each state-transition rule is, e.g., extended with the index mask and a base address pointer of the appropriate state transition rule list for the next state indicated by the state-transition rule.
The way that the controller operates to control and use the character processing unit in response to the input character stream can be selected as desired. As discussed above, the controller preferably “tests” or evaluates characters in the input stream (in turn) and operates and controls the character processing unit accordingly in response to the test result. Thus, for example, where the controller is in the form of a programmable state machine, the form of the input character could be used to trigger a particular state transition, as is known in the art. The input character can preferably also trigger the sending of a command, etc., to the character processing unit, if appropriate.
Thus, for example, where the controller is in the form of a programmable state machine, the state transition rules preferably additionally include, e.g., a command field that can be used to indicate a required operation of the character processing unit if, e.g., particular test criteria, such as a current state and input character conditions, are met. Thus, for example, the state transition rules preferably further include one or more of a test option field, command option field, e.g., for the character processing unit, and an operand field, to facilitate control of the character processing unit in response to the received input character stream.
It is believed that such arrangements may be new in the context of programmable state machines generally. Thus, according to a fifth aspect of the present invention, there is provided a programmable state machine in which one or more state transitions can cause the programmable state machine to send a command or instruction to a processing unit under the control of the programmable state machine.
According to a sixth aspect of the present invention, there is provided a method of creating a data structure for a programmable state machine, comprising:    deriving and storing a set of state transitions for the state machine;    wherein one or more of the stored state transitions can cause the programmable state machine to send a command or instruction to a processing unit under the control of the programmable state machine.
This aspect of the invention can include any one or more or all of the preferred and optional features of the invention described herein. Thus, for example, the state transitions are preferably represented by state transition rules, with one or more of the state transition rules accordingly including a command field or portion that can be used to control the issuing of a command or instruction to an associated processing unit, such as a character processing unit in accordance with the present invention.
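A state transition rule carrying a command field might be modelled as in the following Python sketch. The state names, the command names (`NEW_TOKEN`, `STORE_CHAR`, `END_TOKEN`), the `RecordingUnit` stand-in and the dictionary-based rule table are all assumptions made for the example and do not reflect any particular command set.

```python
class RecordingUnit:
    """Stand-in for a processing unit: it simply records the commands it receives."""

    def __init__(self):
        self.log = []

    def execute(self, command, char):
        self.log.append((command, char))


# (state, input char or None as wildcard) -> (next state, command or None)
TRANSITIONS = {
    ("idle", "<"): ("name", "NEW_TOKEN"),
    ("name", ">"): ("idle", "END_TOKEN"),
    ("name", None): ("name", "STORE_CHAR"),
    ("idle", None): ("idle", None),
}


def run_commands(text, unit):
    """Step through the input; each transition may issue a command to the unit."""
    state = "idle"
    for ch in text:
        # An exact (state, char) rule takes priority over the wildcard rule.
        rule = TRANSITIONS.get((state, ch), TRANSITIONS[(state, None)])
        state, command = rule
        if command is not None:
            unit.execute(command, ch)
    return state
```

Running this over the input `a<bc>d` issues one command per character between and including the angle brackets, and none for `a` or `d`.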
It is also preferred for the controller to be able to control the input character stream, e.g., to “hold” or pause the input of characters to the controller. This would allow, e.g., the same input character to be processed in multiple consecutive cycles, or the input stream to be paused (held) while executing functions that last multiple cycles. Where the controller is in the form of a programmable state machine, this is again preferably facilitated by including an appropriate command field in the state transition rules.
Thus according to a seventh aspect of the present invention, there is provided a programmable state machine in which one or more state transitions can cause the programmable state machine to pause the input of data to the programmable state machine.
According to an eighth aspect of the present invention, there is provided a method of creating a data structure for a programmable state machine, comprising:    deriving and storing a set of state transitions for the state machine;    wherein one or more of the stored state transitions can cause the programmable state machine to pause the input of data to the programmable state machine.
This aspect of the invention can again include any one or more or all of the preferred and optional features of the invention described herein. Thus, for example, the state transitions are preferably represented by state transition rules, with one or more of the state transition rules accordingly including a command field or portion that can be used to pause or hold the input, e.g., of characters from the input character stream, to the programmable state machine.
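The effect of a hold field in a state transition rule can be sketched in Python as follows. When the selected rule sets the hold flag, the input position is not advanced, so the same character is presented again in the next cycle. The rule table layout, the state names and the `max_cycles` safety limit are assumptions made for the example.

```python
def run_with_hold(text, rules, start_state, max_cycles=1000):
    """Run a state machine whose rules may pause (hold) the input stream.

    rules: dict mapping (state, char) -> (next_state, hold_input).
    Returns the final state and a trace of (state, char) pairs, one per cycle.
    """
    state = start_state
    position = 0
    trace = []
    cycles = 0
    while position < len(text):
        cycles += 1
        if cycles > max_cycles:
            raise RuntimeError("input held for too many cycles")
        ch = text[position]
        next_state, hold = rules[(state, ch)]
        trace.append((state, ch))
        state = next_state
        if not hold:
            position += 1   # only consume the character when the input is not held
    return state, trace


rules = {
    ("scan", "a"): ("scan", False),
    ("scan", ";"): ("flush", True),    # hold: ';' is processed again in state "flush"
    ("flush", ";"): ("scan", False),
}
```

In this example the character `;` is seen in two consecutive cycles, once in each state, before the input advances.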
In a particularly preferred embodiment where a programmable state machine is being used, a mechanism is provided whereby the process can be sent from one or more states to a subset or subroutine of state transitions that may be common to different “locations” within the overall state diagram. Thus preferably there is a common set of states/transitions that can be invoked from different locations within the state diagram, with the system then returning to its original location, or, indeed, a different location, once the subset of state transitions has been completed. This provides a form of procedure call and return mechanism for common sets of states/state transitions that would otherwise have to be “stored” for multiple different locations within the overall state diagram.
This function is preferably achieved by using a stack for state space addresses in which the current state space address and a state register value can be stored (“pushed”) for later retrieval (“popping”) once the invoked subset (subroutine) of states/transitions has been completed to allow the system to return to the desired part of the overall state diagram. In a preferred such embodiment, a “state space” identification, e.g., identifying the relevant part or segment of the overall state diagram (as discussed above), and a corresponding mask are pushed/popped onto/from the state stack.
In these embodiments, the return state from the procedure call, i.e., the state that is returned to, could, e.g., be the location from which the procedure originally jumped, i.e., from where the procedure call was made. In a preferred embodiment it would also or instead be possible to select a different location for the system to return to. Thus, preferably, the system provides a means of selecting or varying the return location. It would also instead be possible to, e.g., fix the return state, e.g., to state S0, for some or all “procedure calls”. In this latter case, there would be no need to store the “return” state in the state block.
It is again believed that these arrangements may be new in the context of programmable state machines generally. Thus, according to a ninth aspect of the present invention, there is provided a programmable state machine comprising means for invoking a single common set of state transitions from more than one location in the state diagram that the state machine is programmed to represent.
According to a tenth aspect of the present invention, there is provided a method of creating a data structure for a programmable state machine, comprising:    deriving and storing a set of state transitions for the state machine; and    deriving and storing a plurality of other state transitions that will invoke the stored set of state transitions.
These aspects and arrangements of the present invention can again include any one or more or all of the preferred and optional features of the invention described herein. Thus, for example, the state transitions are preferably represented as (sets of) state transition rules, preferably involving wildcards and/or priorities, and, most preferably, the BaRT algorithm is used for searching the state transitions (rules).
Such arrangements involving “procedure calls” and a state stack could also be viewed as the system comprising multiple finite state machines that each, e.g., implement a given procedure, with one finite state machine being active at any given time. A different finite state machine could then be activated (called) from the “current” finite state machine by a given state transition.
In this case, a “procedure call” would accordingly involve calling another finite state machine while the current active finite state machine and a local return state within that current finite state machine would be stored for later retrieval, i.e., pushed on the state stack.
Then, when the new, called finite state machine reaches the “return” state transition, the stored finite state machine identity and local state can be retrieved (popped) in order to return the system to the original, calling finite state machine and a desired local state within that finite state machine. In another such arrangement, the return state within the “calling” finite state machine could, e.g., be predetermined or fixed, in which case it would not be necessary to store the local return state in the state stack.
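The call and return arrangement described above, viewed as multiple finite state machines sharing a state stack, might be sketched in Python as follows. The machine names, the action tuples and the convention that every called machine begins in a local state named `start` are assumptions made for the example.

```python
# Each machine maps (local state, char) -> action.
# Actions: ("goto", state), ("call", machine, return_state), ("return",)
MACHINES = {
    "main": {
        ("start", "("): ("call", "digits", "after_paren"),
        ("start", "["): ("call", "digits", "after_bracket"),
        ("after_paren", ")"): ("goto", "done"),
        ("after_bracket", "]"): ("goto", "done"),
    },
    "digits": {  # common sub-machine, invoked from two locations in "main"
        ("start", "0"): ("goto", "start"),
        ("start", "1"): ("goto", "start"),
        ("start", ";"): ("return",),
    },
}


def run_machines(text):
    """Run the machines over the input, using a stack for calls and returns."""
    machine, state = "main", "start"
    stack = []  # state stack of (calling machine, local return state)
    for ch in text:
        action = MACHINES[machine][(state, ch)]
        if action[0] == "goto":
            state = action[1]
        elif action[0] == "call":
            stack.append((machine, action[2]))   # push caller and return state
            machine, state = action[1], "start"
        elif action[0] == "return":
            machine, state = stack.pop()         # pop to resume the caller
    return machine, state, stack
```

The `digits` transitions are stored once but are reached from two different locations, each with its own return state on the stack.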
Where the controller is in the form of a programmable state machine that uses state transition rules to represent state transitions, then in a particularly preferred embodiment, the state transition rules can be of a plurality of different types, with each type of rule involving different test conditions. For example, one type of rule could involve test conditions relating to the current state and input character (as discussed above), and another type of rule could instead relate, e.g., to error conditions, such as memory overflow situations, that may, e.g., not be particular to any given state or input character. In such an arrangement, the highest priority matching transition rule is again preferably determined, but in order to determine if a rule is matching, different conditions will be evaluated, depending on the test conditions of each rule. An arrangement in which one set of rules relates to error conditions could be used, e.g., to make a transition into a certain error state upon the occurrence of an error, irrespective of the current state and input.
It is again believed that such an arrangement may be new and advantageous in the context of programmable state machines more generally. Thus, according to an eleventh aspect of the present invention, there is provided a programmable state machine in which state transitions are represented by state transition rules, wherein one or more of the state transition rules include one set of test condition types, and one or more other of the state transition rules include a different set of test condition types.
According to a twelfth aspect of the present invention, there is provided a method of creating a data structure for a programmable state machine, comprising:    deriving and storing a set of state transitions for the state machine that include one set of test condition types; and    deriving and storing another set of state transitions for the state machine that include a different set of test condition types.
These aspects of the present invention can again include one or more or all of the preferred and optional features of the invention described herein. Thus, for example, one of the sets of test condition types is preferably dependent on the current state and/or current input value, and the other set of test condition types is preferably additionally or instead dependent on an error condition.
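By way of illustration only, state transition rules of two different types, each with its own set of test condition types, might be represented as sketched below. The class names `StateInputRule` and `ErrorRule`, the dictionary keys `state`, `char` and `errors`, the use of `None` as a wildcard, and the numeric priorities are all assumptions made for the example and are not taken from the description above.

```python
# Illustrative sketch only; every name below is hypothetical.
from dataclasses import dataclass
from typing import Optional

WILDCARD = None  # assumption: None stands for a "don't care" test condition


@dataclass(frozen=True)
class StateInputRule:
    """Rule type whose test conditions are the current state and input."""
    state: Optional[str]
    char: Optional[str]
    next_state: str
    priority: int

    def matches(self, ctx):
        return (self.state in (WILDCARD, ctx["state"])
                and self.char in (WILDCARD, ctx["char"]))


@dataclass(frozen=True)
class ErrorRule:
    """Rule type whose only test condition is an error condition."""
    error: str
    next_state: str
    priority: int

    def matches(self, ctx):
        return self.error in ctx["errors"]


def select(rules, ctx):
    """Return the highest priority matching rule, or None if none match."""
    matching = [rule for rule in rules if rule.matches(ctx)]
    return max(matching, key=lambda rule: rule.priority, default=None)


RULES = [
    StateInputRule("name", "<", "tag", priority=1),
    StateInputRule(WILDCARD, WILDCARD, "name", priority=0),
    ErrorRule("memory_overflow", "error", priority=10),
]

normal = select(RULES, {"state": "name", "char": "<", "errors": set()})
failed = select(RULES, {"state": "name", "char": "<",
                        "errors": {"memory_overflow"}})
```

Here `normal.next_state` is `"tag"`, whereas `failed.next_state` is `"error"`: the error rule is evaluated against a different set of conditions and, being given the highest priority in this example, takes effect irrespective of the current state and input.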
This arrangement of the state transition test conditions facilitates in particular the use of state transition rules that can be considered to be “global” in nature. “Normal” state transition rules are specifically related to a particular state space and can accordingly be regarded as “local” rules. “Global” rules, in contrast, are not related to a specific state space, but instead apply more generally across the state diagram, and can be used together with the “local” rules that are dependent on a particular state. Error condition dependent rules, as discussed above, are an example of such “global” rules.
The use of such global state transition rules avoids, e.g., the need to store each “global” rule multiple times, once for each state space where it might apply, which might otherwise particularly be necessary where the state machine uses plural state spaces as discussed above. The global transition rules are accordingly preferably only inserted once in the state diagram data structure. The use of global transition rules in this way also facilitates more flexible and storage-efficient implementation of programmable state machines.
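The storage saving can be illustrated with a short sketch. The state space names, rule labels and helper functions below are hypothetical, and the rules are represented simply as labels, since only the number of stored entries matters for the comparison.

```python
# Illustrative sketch only; every name below is hypothetical.

def candidates(local_rules, global_rules, state_space):
    """Rules to evaluate for a state space: its local rules plus the
    global rules, which are stored once and shared."""
    return list(local_rules.get(state_space, [])) + list(global_rules)


def stored_entries_shared(local_rules, global_rules):
    """Entries stored when each global rule is inserted only once."""
    return sum(len(rules) for rules in local_rules.values()) + len(global_rules)


def stored_entries_duplicated(local_rules, global_rules):
    """Entries stored if each global rule were copied into every state space."""
    return sum(len(rules) + len(global_rules) for rules in local_rules.values())


LOCAL = {
    "prolog": ["p1", "p2"],
    "element": ["e1", "e2", "e3"],
    "attribute": ["a1"],
}
GLOBAL = ["on_memory_overflow"]

shared = stored_entries_shared(LOCAL, GLOBAL)          # 6 local + 1 global
duplicated = stored_entries_duplicated(LOCAL, GLOBAL)  # 6 local + 3 copies
```

With three state spaces and a single global rule, the shared arrangement stores 7 entries where the duplicated arrangement would store 9. The difference grows with both the number of state spaces and the number of global rules.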
Thus, according to a thirteenth aspect of the present invention, there is provided a programmable state machine which includes state transition rules that are specifically related to particular states and state transition rules that do not relate to any particular state.
According to a fourteenth aspect of the present invention, there is provided a method of creating a data structure for a programmable state machine comprising:    deriving state transition rules that are specifically related to particular states; and    deriving state transition rules that do not relate to any particular state.
In the above aspects and arrangements of the invention, the two or more different types of state transition rules, e.g., “global” and “local” rules, are preferably evaluated in parallel, and are preferably evaluated separately.
As discussed above, a priority scheme arrangement is preferably further used for selecting which rule is to be used to control the transition to a new state, in the event that two or more of the different rule types, e.g., both a “global” transition rule and a “local” transition rule, are found to be matched. For example, “error condition” rules could be given the highest priority so as to ensure that an error condition is always responded to.
Thus the programmable state machine preferably includes some form of rule selector for supporting the multiple types of transition rules. This could, e.g., be in the form of separate transition rule memories and rule selectors which operate in parallel, with a multiplexer then selecting between, e.g., the highest priority matching transition rule or rules found for each transition rule type by each (individual) rule selector. Additionally or alternatively, if the number of transition rules of a certain type is relatively small, for example covering a limited set of error conditions, then that set of rules could, e.g., be implemented directly in a set of registers with corresponding comparator functions.
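A software model of such a rule selector arrangement is sketched below, purely by way of illustration. It assumes one rule memory searched by its own selector for the “local” rules, a small set of registers with comparators for the error condition rules, and a multiplexer that chooses between the two results by priority. The function names, the tuple layouts and the choice to remain in the current state when no rule matches are all assumptions of the sketch. In hardware the two selectors would operate in parallel; here they are simply called one after the other.

```python
# Illustrative sketch only; every name below is hypothetical.

def table_selector(rule_memory, state, char):
    """Selector for "local" rules stored as
    (state, char, next_state, priority); None acts as a wildcard."""
    best = None
    for r_state, r_char, next_state, priority in rule_memory:
        if r_state in (None, state) and r_char in (None, char):
            if best is None or priority > best[1]:
                best = (next_state, priority)
    return best


def register_selector(registers, error_flags):
    """Selector for a small rule set held in registers as
    (error_code, next_state, priority), one comparison per register."""
    hits = [(next_state, priority)
            for code, next_state, priority in registers
            if code in error_flags]
    return max(hits, key=lambda hit: hit[1], default=None)


def multiplexer(*results):
    """Choose the highest priority result among the individual selectors."""
    live = [result for result in results if result is not None]
    return max(live, key=lambda result: result[1], default=None)


def step(rule_memory, registers, state, char, error_flags):
    """One transition: evaluate both rule types separately, then select."""
    local = table_selector(rule_memory, state, char)
    global_ = register_selector(registers, error_flags)
    winner = multiplexer(local, global_)
    # Assumption: stay in the current state if nothing matches.
    return winner[0] if winner is not None else state


RULE_MEMORY = [("idle", "<", "tag", 1), ("tag", None, "tag", 0)]
REGISTERS = [("memory_overflow", "error", 100)]

next_ok = step(RULE_MEMORY, REGISTERS, "idle", "<", set())
next_err = step(RULE_MEMORY, REGISTERS, "idle", "<", {"memory_overflow"})
```

In this example `next_ok` is `"tag"` and `next_err` is `"error"`, because the error condition rule is given a higher priority than any rule in the rule memory.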
In use of the present invention, the input character stream, e.g., document to be parsed, will be provided to the controller which will then assess each received character and carry out operations in response to the character, such as, for example, providing it to the character processing unit for storage and/or comparison, performing another operation, or providing the input character to the output character stream, etc. It will also monitor any feedback signals from the character processing unit and operate accordingly, for example to accept or reject the input document, cause the character processing unit to write characters to the output character stream, etc. The controller preferably also carries out a lexical analysis of the input character stream to, e.g., divide it into selected character strings.
The present invention can be implemented as desired. As will be appreciated by those skilled in the art, it will find particular application in servers and network systems, particularly where it is desired, e.g., to parse many XML documents in quick succession. Thus the present invention also extends to a computer system and to a server including any of the above aspects of the present invention.
The present invention can be used whenever it is desired to carry out pattern-matching on a stream of characters, such as for parsing. As discussed above, it is believed that the present invention will be particularly, but not exclusively, useful for parsing of XML documents. Thus the present invention also extends to the use of the methods and apparatus of the present invention for XML parsing.
The methods in accordance with the present invention may be implemented at least partially using software, e.g., computer programs. It will thus be seen that, when viewed from further aspects, the present invention provides computer software specifically adapted to carry out the methods hereinabove described when installed on data processing means, and a computer program element comprising computer software code portions for performing the methods hereinabove described when the program element is run on data processing means. The invention also extends to a computer software carrier comprising such software which, when used to operate a pattern-matching or parsing system comprising data processing means, causes, in conjunction with said data processing means, said system to carry out the steps of the method of the present invention. Such a computer software carrier could be a physical storage medium such as a ROM chip, CD ROM or disk, or could be a signal such as an electronic signal over wires, an optical signal or a radio signal such as to a satellite or the like.
It will further be appreciated that not all steps of the method of the invention need be carried out by computer software and thus from a further broad aspect the present invention provides computer software and such software installed on a computer software carrier for carrying out at least one of the steps of the methods set out hereinabove.
The present invention may accordingly suitably be embodied as a computer program product for use with a computer system. Such an implementation may comprise a series of computer readable instructions either fixed on a tangible medium, such as a computer readable medium, for example, diskette, CD-ROM, ROM, or hard disk, or transmittable to a computer system, via a modem or other interface device, over either a tangible medium, including but not limited to optical or analogue communications lines, or intangibly using wireless techniques, including but not limited to microwave, infrared or other transmission techniques. The series of computer readable instructions embodies all or part of the functionality previously described herein.
Those skilled in the art will appreciate that such computer readable instructions can be written in a number of programming languages for use with many computer architectures or operating systems. Further, such instructions may be stored using any memory technology, present or future, including but not limited to, semiconductor, magnetic, or optical, or transmitted using any communications technology, present or future, including but not limited to optical, infrared, or microwave. It is contemplated that such a computer program product may be distributed as a removable medium with accompanying printed or electronic documentation, for example, shrink-wrapped software, pre-loaded with a computer system, for example, on a system ROM or fixed disk, or distributed from a server or electronic bulletin board over a network, for example, the Internet or World Wide Web.