In database systems, it is often desirable to provide efficient, high-speed access and search capabilities for the stored data. A typical database system provides an index mechanism for accessing data records without having to search through every element of data stored in every record. Many database indexing, accessing, and searching techniques are in widespread use.
Tree-based indexes are one form of indexing and searching mechanism. In tree-based index database systems, a common data field in each database record is used as a keyword to create the index. The index is organized as a tree data structure, having a head node where searches begin, and one or more branch nodes referenced from the head node. All other nodes below the head node may also contain one or more branches referring to other nodes. Each index node contains one or more pointers, such as record numbers, to that node's respective data record within the database.
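For illustration only, the index organization described above can be sketched as a simple two-branch tree. This is a minimal sketch under stated assumptions, not the structure of any particular database product: the names `IndexNode`, `insert`, `before`, and `after` are hypothetical, the keyword field is assumed to be a surname, and record numbers are assumed to be list positions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IndexNode:
    """One node of a hypothetical tree index."""
    key: str                                   # value of the common keyword field
    record_numbers: List[int] = field(default_factory=list)  # pointers to data records
    before: Optional["IndexNode"] = None       # branch for keys that sort before this key
    after: Optional["IndexNode"] = None        # branch for keys that sort after this key


def insert(head: Optional[IndexNode], key: str, record_number: int) -> IndexNode:
    """Add a (key, record number) pair to the index and return the head node."""
    if head is None:
        return IndexNode(key, [record_number])
    node = head
    while True:
        if key == node.key:
            # Same keyword seen again: the node points at one more record.
            node.record_numbers.append(record_number)
            return head
        branch = "before" if key < node.key else "after"
        child = getattr(node, branch)
        if child is None:
            setattr(node, branch, IndexNode(key, [record_number]))
            return head
        node = child


# Assumed example data: the keyword field of four records, in record-number order.
surnames = ["miller", "adams", "young", "adams"]
head = None
for record_number, surname in enumerate(surnames):
    head = insert(head, surname, record_number)
```

In this sketch the first record inserted becomes the head node, and a keyword shared by several records yields a single node holding several record numbers.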
To search the tree index, a search value is provided by a user or program. The search value is then compared with node values beginning with the head node. At each node in the tree, if the search value occurs, for example, alphabetically before the current node's value, one branch may be followed to the next node, but if the search value occurs alphabetically after the current node's value, another branch to a different node may be taken. If the search value and node value are equal, a matching node has been found. The matching node's corresponding database record reference is used to retrieve the matching search data from the database.
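The search procedure just described can be sketched as follows. The sketch assumes a two-branch node and a small hand-built index; `Node`, `search`, and the sample keys are hypothetical names chosen for illustration, and Python's built-in string comparison stands in for alphabetical ordering (which holds for the lowercase ASCII keys used here).

```python
from typing import List, Optional


class Node:
    """Hypothetical index node: a key, record numbers, and two branches."""

    def __init__(self, key: str, record_numbers: List[int],
                 before: Optional["Node"] = None,
                 after: Optional["Node"] = None) -> None:
        self.key = key
        self.record_numbers = record_numbers
        self.before = before   # followed when the search value sorts earlier
        self.after = after     # followed when the search value sorts later


def search(head: Optional[Node], search_value: str) -> List[int]:
    """Return the record numbers of the matching node, or [] if none matches."""
    node = head
    while node is not None:
        if search_value == node.key:
            return node.record_numbers          # matching node found
        # Follow one branch or the other depending on the comparison.
        node = node.before if search_value < node.key else node.after
    return []                                   # ran off the tree: no match


# Assumed sample index and database, built by hand for the example.
head = Node("miller", [0],
            before=Node("adams", [1, 3]),
            after=Node("young", [2]))
database = ["miller record", "adams record A", "young record", "adams record B"]

# The record references from the matching node are used to fetch the data.
matches = [database[n] for n in search(head, "adams")]
```

Each comparison discards one branch of the tree, so a lookup visits at most one node per level rather than every record in the database.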
These and other database tree-structure searching algorithms have been commonly used in the prior art for the purpose of searching databases. Other specific examples of such searching algorithms include Apostolico, Galil, and Oxford pattern matching algorithms. While such techniques are common in conventional database searching, they have not been applied to other technologies such as scanning data for virus signatures.
The generation and spread of computer viruses is a major problem in modern-day computing. Generally, a computer virus is a program that is capable of attaching to other programs or sets of computer instructions, replicating itself, and performing unsolicited or malicious actions on a computer system. Computer viruses are typically designed to spread by attaching to floppy disks or data transmissions between computer users, and to do damage while remaining undetected. The damage done by computer viruses may range from mild interference with a program, such as the display of an unwanted political message in a dialog box, to the complete destruction of data on a user's hard drive.
It is estimated that new viruses are created at a rate of over 100 per month. As a result, suspect data must now be searched for tens of thousands of virus signatures, and virus searching algorithms require a large amount of time and computer resources when scanning for those signatures. There is thus a need for the application of advanced techniques to optimize the virus scanning process.