1. Field of the Invention
Aspects of the present invention are directed generally to methods of providing efficient storage and, more particularly, to methods of providing efficient storage for finite state machines.
2. Description of the Background
Generally, finite state machines are used in language processing, such as natural language processing, for a wide range of tasks including the processing of dictionaries and expressions. Traditional methods of storing finite state machines, however, suffer either from excessive memory usage or from poor performance.
The diagram shown in FIG. 1 is a graphical description of a finite state machine, such as a sequential finite state transducer, which can be used to translate the names of the days of the week into a numeric counterpart. The circles define the states of the machine and the arcs define transitions.
With reference to FIG. 1, one method of storing the data of the states and the transitions uses a transition matrix, a production matrix and a finality vector, as shown in FIGS. 2A, 2B and 2C, respectively, which reflect an assumption that the automaton for the states and the transitions starts with State 1.
As may be seen from FIGS. 2A and 2B, both the transition and production matrices are sparsely populated, so that each matrix wastes a relatively large amount of memory. The production matrix, in particular, is sparse even with respect to the transition matrix.
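By way of illustration only, the matrix storage described above might be sketched as follows. The machine here is a hypothetical two-word transducer ("mo" to "1", "tu" to "2") chosen for brevity, not the machine of FIG. 1, and the names `TRANSITION`, `PRODUCTION`, `FINAL` and `translate` are assumptions made for this sketch.

```python
# Hypothetical machine (not the one in FIG. 1): "mo" -> "1", "tu" -> "2".
ALPHABET = "motu"
COL = {ch: i for i, ch in enumerate(ALPHABET)}

# States are numbered from 1; 0 means "no transition".
# Row 0 is unused padding so that a state number indexes its row directly.
TRANSITION = [
    [0, 0, 0, 0],  # unused
    [2, 0, 3, 0],  # state 1: 'm' -> 2, 't' -> 3
    [0, 4, 0, 0],  # state 2: 'o' -> 4
    [0, 0, 0, 4],  # state 3: 'u' -> 4
    [0, 0, 0, 0],  # state 4: no outgoing transitions
]

# Output emitted when the corresponding transition is taken.
PRODUCTION = [
    ["", "", "", ""],
    ["1", "", "2", ""],  # state 1 emits the digit on the first letter
    ["", "", "", ""],
    ["", "", "", ""],
    ["", "", "", ""],
]

# Finality vector: only state 4 is accepting.
FINAL = [False, False, False, False, True]


def translate(word):
    """Return the output for `word`, or None if the word is rejected."""
    state, out = 1, []  # the automaton starts with State 1
    for ch in word:
        col = COL.get(ch)
        nxt = TRANSITION[state][col] if col is not None else 0
        if nxt == 0:
            return None
        out.append(PRODUCTION[state][col])
        state = nxt
    return "".join(out) if FINAL[state] else None
```

Each lookup is a single indexed access, but most cells of both matrices hold no data, and the production matrix holds fewer populated cells than the transition matrix.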
As a solution to these and other problems, data compression may be used to reduce the sizes of the transition and production matrices. In particular, one method of reducing the amount of memory required for the transition matrix is row-compressed storage. In this method, all non-zero data (e.g., each transition) is placed in an array along with the extra information needed to identify the state and the input character to which a transition refers. This format uses an amount of storage that is proportional to the numbers of states and transitions, but finding the transition for a given state and input character requires a search operation, which degrades the performance of this solution.
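By way of illustration only, one possible row-compressed layout might be sketched as follows, assuming a hypothetical four-state machine with the transitions 1 -m-> 2, 1 -t-> 3, 2 -o-> 4 and 3 -u-> 4. The array names `ROW_START`, `CHARS` and `TARGETS`, and the choice of a binary search within each row, are assumptions made for this sketch; other layouts and search strategies are possible.

```python
from bisect import bisect_left

# Only the non-zero transitions are stored, grouped by source state and
# sorted by input character within each group.
# The transitions of state s occupy indices ROW_START[s] .. ROW_START[s + 1] - 1.
# State 0 is unused, so its slice is empty.
ROW_START = [0, 0, 2, 3, 4, 4]
CHARS = ["m", "t", "o", "u"]   # input character of each stored transition
TARGETS = [2, 3, 4, 4]         # destination state of each stored transition


def next_state(state, ch):
    """Return the destination state, or 0 if there is no such transition."""
    lo, hi = ROW_START[state], ROW_START[state + 1]
    # The search step: locate ch within this state's slice.
    i = bisect_left(CHARS, ch, lo, hi)
    if i < hi and CHARS[i] == ch:
        return TARGETS[i]
    return 0
```

Storage grows with the number of states plus the number of transitions, but each lookup costs a search over the state's slice rather than a single indexed access.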
Another solution, called row-displaced storage, overlays the rows of the matrix to form a single array, and includes a second array of input characters as a means of identifying the transitions that belong to a state. The performance of this solution is similar to that of matrix storage, and its space requirements are often similar to, and sometimes even smaller than, those of row-compressed storage. Unfortunately, the rows of the transition matrix sometimes do not fit well together, especially where the input alphabets are large.
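By way of illustration only, row-displaced storage might be sketched as follows for the same kind of hypothetical four-state machine. The first-fit packing strategy, the requirement that each state receive a distinct offset, and the names `pack`, `BASE`, `NEXT` and `CHECK` are assumptions made for this sketch rather than details taken from the methods described above.

```python
ALPHABET = "motu"
COL = {ch: i for i, ch in enumerate(ALPHABET)}

# Sparse rows of a hypothetical machine: state -> {input character: next state}
ROWS = {1: {"m": 2, "t": 3}, 2: {"o": 4}, 3: {"u": 4}, 4: {}}


def pack(rows):
    """Overlay the rows into one array using a first-fit displacement."""
    base, nxt, check = {}, [], []
    used_bases = set()
    for state, row in rows.items():
        b = 0
        # Slide the row right until all of its entries land on free slots.
        # Offsets are kept distinct because `check` stores only characters.
        while b in used_bases or any(
            b + COL[c] < len(check) and check[b + COL[c]] is not None
            for c in row
        ):
            b += 1
        used_bases.add(b)
        base[state] = b
        # Pad so that every column of this row is a valid index.
        while len(nxt) < b + len(ALPHABET):
            nxt.append(0)
            check.append(None)
        for c, target in row.items():
            nxt[b + COL[c]] = target
            check[b + COL[c]] = c
    return base, nxt, check


BASE, NEXT, CHECK = pack(ROWS)


def next_state(state, ch):
    """Return the destination state, or 0 if there is no such transition."""
    col = COL.get(ch)
    if col is None:
        return 0
    i = BASE[state] + col
    # The second array confirms that slot i holds a transition on ch.
    return NEXT[i] if CHECK[i] == ch else 0
```

Lookup is again a single indexed access plus one comparison. How compact the overlaid array becomes depends on how well the rows interleave, which is why rows that do not fit well together reduce the benefit.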