1. Field of the Invention
The present invention relates to data security and, more specifically, to recognition of a data pattern in a data stream.
2. Description of Related Art
As the number of communication network users grows rapidly, networks are expected to deliver data more effectively. In other words, efficient use of transmission bandwidth is key to the success of communication networks. An economical way to use limited bandwidth efficiently is to send a smaller amount of data by applying data compression. Accordingly, compressed pattern matching (CPM), which performs pattern search directly on compressed data without first decompressing it, is gaining increasing attention.
Compressed pattern matching (CPM) is an emerging research field addressing the following problem: given a compressed sequence and a pattern, find the pattern occurrence(s) in the (uncompressed) sequence with minimal (or no) decompression. It can be applied directly to compressed files to detect computer viruses and leakage of confidential information.
Since the Lempel-Ziv-Welch (LZW) compression algorithm is one of the most effective and popular lossless compression algorithms, CPM in LZW compressed sequences is quite important. In the last decade, many related studies have been conducted. The first CPM algorithm, which finds the first pattern occurrence in an LZW compressed file, takes O(n + m^2) time and space, where n and m are, respectively, the lengths of the compressed text and the pattern. With different implementations, however, one can trade off the amount of extra space used against the algorithm's time complexity. This algorithm is referred to as the Amir-Benson-Farach (ABF) algorithm.
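To make the problem setting concrete, the following Python sketch shows the naive baseline that CPM aims to avoid: fully decompress an LZW code sequence and then search the recovered text. It is an illustration only and is not the ABF algorithm or the method of the present invention. The function names (lzw_compress, lzw_decompress, find_all) and the 256-entry initial dictionary are assumptions made for this example.

```python
from typing import List


def lzw_compress(text: str) -> List[int]:
    """Encode text as LZW codes (assumed initial dictionary: code points 0..255)."""
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    current = ""
    codes = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate          # keep extending the current phrase
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = next_code
            next_code += 1
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes: List[int]) -> str:
    """Rebuild the text by reconstructing the same dictionary from the codes."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Code refers to the phrase that is being defined right now.
            entry = previous + previous[0]
        else:
            raise ValueError("invalid LZW code: %d" % code)
        out.append(entry)
        dictionary[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(out)


def find_all(text: str, pattern: str) -> List[int]:
    """Return the start index of every (possibly overlapping) occurrence."""
    if not pattern:
        return []
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


codes = lzw_compress("abababab")
matches = find_all(lzw_decompress(codes), "aba")
```

In this baseline the search cost depends on the length of the uncompressed text, and the whole text must be held in memory, which is the overhead that searching the compressed sequence directly is intended to remove.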
There were proposals to extend the ABF algorithm to find all pattern occurrences. The basic idea is to use a flag to indicate that a complete pattern occurs inside a compressed data block, in addition to checking for pattern occurrences across two consecutive blocks. However, the algorithm cannot tell how many occurrences there are inside a block. Another CPM algorithm was proposed to perform decompression and pattern matching on the fly. The drawback of that algorithm is its high computational complexity, because it still needs partial decompression. Another algorithm, Navarro-Raffinot (NR), presents a general scheme to find all pattern occurrences in sequential blocks and realizes the scheme by using the technique of bit-parallelism. This scheme can be applied to LZ-family compression algorithms such as LZW and LZ77.
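As a brief illustration of the bit-parallelism technique mentioned above, the following Python sketch implements the classic Shift-And search on uncompressed text. It shows only how a single machine word (here, a Python integer) can track all partial matches of a pattern at once. It does not reproduce the NR scheme for compressed blocks, and the function name shift_and_search is an assumption made for this example.

```python
from typing import List


def shift_and_search(text: str, pattern: str) -> List[int]:
    """Return start indices of all occurrences of pattern, using Shift-And."""
    m = len(pattern)
    if m == 0:
        return []

    # masks[ch] has bit i set when pattern[i] == ch.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    accept = 1 << (m - 1)   # bit that signals a complete match
    state = 0               # bit i set: pattern[0..i] matches text ending here
    hits = []
    for pos, ch in enumerate(text):
        # Extend every partial match by one character, and start a new one.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append(pos - m + 1)
    return hits
```

Each text character is handled with a shift, an OR, and an AND, so the per-character work is constant as long as the pattern fits in one machine word.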
Notwithstanding all the algorithms that have been proposed, their implementation involves complex computation and demands complex hardware and large memory. Therefore, it is desirable to have an apparatus and method that finds occurrences of a specified pattern in a sequence of compressed data with no decompression, and it is to such an apparatus and method that the present invention is primarily directed.