In computers, searches for a predetermined piece or unit of data, such as a character string, in a database are frequent. Such a search is often executed by software, but, for shorter processing times, it can be accomplished by hardware. A conventional search circuit for accomplishing this is illustrated in FIG. 17A.
The search circuit 500 illustrated in FIG. 17A comprises a plurality of registers 502, a shift register 504, a plurality of comparators 506, and an AND circuit 508.
Retrieval data S0 through Sn are stored in the plurality of registers 502. The shift register 504 holds a portion of data to be searched D0 . . . . The comparators 506 compare the data held in the registers 502 with the data held in the shift register 504. When a match is obtained, a high-level comparing signal is fed from the respective comparator 506 to the AND circuit 508. The output of the AND circuit 508 is therefore high only when all of the comparators 506 report a match.
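The operation of the search circuit 500 can be sketched in software as follows. This is a minimal behavioral model, not a description of the actual circuit: the function name `search_17a` and the use of Python strings for the retrieval data and the data to be searched are assumptions made purely for illustration.

```python
def search_17a(retrieval, data):
    """Behavioral sketch of the search circuit 500 of FIG. 17A (hypothetical model).

    `retrieval` plays the role of the contents of the registers 502 and
    `data` is the data to be searched, shifted through the shift register 504.
    Returns the positions at which every comparator reports a match.
    """
    n = len(retrieval)
    hits = []
    for pos in range(len(data) - n + 1):
        window = data[pos:pos + n]  # current contents of the shift register
        # One comparator 506 per register 502 / shift register stage pair.
        comparing_signals = [r == w for r, w in zip(retrieval, window)]
        # The AND circuit 508 is high only if all comparing signals are high.
        if all(comparing_signals):
            hits.append(pos)
    return hits


if __name__ == "__main__":
    print(search_17a("abc", "xxabcabc"))
```

In this model each iteration of the loop corresponds to one shift of the data to be searched, so the number of comparison steps grows with the length of that data.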
In a conventional associative memory, i.e., a content addressable memory (CAM), when data is designated, the address of the storage region in which that data is stored is output. A data search device using a conventionally proposed associative memory (e.g., Japanese Patent Application Laid-Open No. 2-66671) is illustrated conceptually in FIG. 17B as circuit 510. Here, the registers 502 and comparators 506 of FIG. 17A are replaced by associative memory portions 514.
The associative memory portions 514, each comprised of comparing circuits, store data of a predetermined length and compare the stored data with input data. A search is effected by storing data in the associative memory portions 514 and comparing it with input data held in the shift register 504. The comparison output is fed to a precharged match line 516, which is set low, i.e., discharged, when no match is found.
Accordingly, the match line 516 is maintained high (i.e., charged) only when the comparison results of all of the associative memory portions 514 are "matching". Therefore, a high level on the match line 516 is determinative of the existence of a match.
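The behavior of the match line 516 can be illustrated with a small model. It is only a sketch under simplifying assumptions: the function name `match_line` is hypothetical, each associative memory portion 514 is modeled as holding one character, and electrical timing is ignored.

```python
def match_line(stored, window):
    """Sketch of the precharged match line 516 (hypothetical model).

    `stored` holds the data in the associative memory portions 514 and
    `window` the input data held in the shift register 504.
    """
    line_high = True  # the match line starts precharged (high)
    for s, w in zip(stored, window):
        if s != w:
            # Any single mismatching portion discharges the shared line.
            line_high = False
    return line_high


if __name__ == "__main__":
    print(match_line("abc", "abc"), match_line("abc", "abd"))
```

Because every portion can only pull the line low, the line acts as a wired AND of the individual comparison results.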
However, when data of an undefined length (hereinafter, "variable-length data") is searched for, both the length of data which can be stored by the shift register 504 and the number of associative memory portions 514 must correspond to the maximum length of the search data. Consequently, when the search data is short, not all of the circuits are required, and the excess circuits uselessly consume electric power.
In order to prevent the comparison outputs from the unnecessary associative memory circuits from influencing the results of the search circuit 510, it is necessary, for example, to add circuits effecting "Don't Care" (DC) processing. In DC processing, either "matching" comparison results are compulsorily provided from the unnecessary associative memory circuits, or the comparison outputs are ignored and the number of relevant associative memory circuits is varied in accordance with the length of the search data. For associative memory devices having such a DC processing function, it is difficult to design an integrated circuit structure having a high degree of integration, and as a result, it is difficult for the apparatus to be made compact.
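The first form of DC processing described above, in which unnecessary circuits are forced to report "matching", can be sketched as follows. The function name `match_with_dont_care` and the `length` parameter are assumptions introduced for illustration only; they do not correspond to any element of circuit 510.

```python
def match_with_dont_care(stored, window, length):
    """Sketch of "Don't Care" processing (hypothetical model).

    Only the first `length` associative memory portions hold valid search
    data. Portions at index >= length are forced to report a match so that
    they cannot discharge the match line.
    """
    return all(
        i >= length or s == w  # forced match for the unused portions
        for i, (s, w) in enumerate(zip(stored, window))
    )


if __name__ == "__main__":
    # Four portions are provided, but the search data "ab" uses only two.
    print(match_with_dont_care("ab??", "abzz", 2))
```

Note that the model still evaluates every portion, which mirrors the point made above: the unused circuits remain present and active even when the search data is short.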
Accordingly, various drawbacks arise when a search for variable-length search data is to be effected by using associative memories.
Moreover, when a search for fixed-length data is undertaken, the search time increases in proportion to the length of the data to be searched.
With the advent of multimedia technology, however, there is a need to handle large amounts of various data, such as image data, voice data, document data, programs and the like, in various systems. It is very advantageous to store and transmit such large amounts of data after compression. Consequently, the importance of data compression techniques has rapidly increased.
Data compression techniques can be roughly classified into two types: lossy compression and lossless compression. Although the compression rate in lossy compression is high, information is lost in the processes of compressing data and restoring the compressed data. Therefore, lossy compression can only be applied to specific fields. On the other hand, although the compression rate of lossless compression is low, information is not lost in the processes of data compression and restoration of the compressed data. Because the restored data completely matches the original data before compression, lossless compression has a wide range of applicability.
In 1977, Lempel and Ziv proposed a universal lossless compression algorithm, LZ77, based on the dictionary technique. The compression rate of LZ77 is high as compared with entropy codes such as the well-known Huffman code. LZ77 basically searches for repeating data included in the original data and compresses the data by replacing the repeating data with another code, thereby eliminating redundancy. In LZ77, it is relatively easy to restore the compressed data. However, because the repeating data is of undefined, variable length, much work is required in searching for the repeating data during compression.
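The asymmetry between compression and restoration can be seen in a naive sketch of the LZ77 idea. This is a simplified brute-force illustration, not a faithful reproduction of the 1977 scheme or of any method described in this document: the (offset, length, next character) token format, the `window` parameter, and the function names are assumptions chosen for brevity.

```python
def lz77_compress(data, window=255):
    """Naive LZ77-style compression into (offset, length, next_char) tokens."""
    tokens = []
    i = 0
    while i < len(data):
        best_off, best_len = 0, 0
        # Brute-force search of the preceding window for the longest repeat.
        for j in range(max(0, i - window), i):
            k = 0
            # Stop one character short so a literal always follows the match.
            while i + k < len(data) - 1 and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_off, best_len = i - j, k
        tokens.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return tokens


def lz77_decompress(tokens):
    """Restore the original data; no searching is needed."""
    buf = []
    for off, length, ch in tokens:
        for _ in range(length):
            buf.append(buf[-off])  # copy one character from `off` back
        buf.append(ch)
    return "".join(buf)


if __name__ == "__main__":
    text = "abracadabra abracadabra"
    packed = lz77_compress(text)
    print(len(text), len(packed), lz77_decompress(packed) == text)
```

Restoration is a simple copy loop, whereas compression must try every candidate start position and extend each match to an unknown length. That nested, variable-length search is the costly step referred to above.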
Software techniques for such data compression are also known. In these techniques, original data is converted into tree structure data, and a search for the repeating data is made.
However, because the algorithm for this approach (converting data into tree structure data) is extremely complicated, much time is required for processing and a large load is placed on the CPU.
To avoid these drawbacks, it has been proposed to replace these software techniques with hardware. However, this cannot be realized easily, because the tree structures differ depending on the contents of the original data, and the time required for compression is therefore not uniform.