Traditional databases have historically been used as a convenient tool for organizing large amounts of data in a structured and easily searchable format. Unfortunately, in order to provide this level of convenience, traditional databases are structured in such a way that their utility is considerably limited in several applications, especially those requiring fast access time and protection of highly valuable data when the database is copied to a new location. For example, applications such as those involved in the field of automated data capture and perfection frequently compare data contained in an input stream to reference data contained in a database. Often the reference data contained in the database has been accumulated at great expense, representing an extremely valuable resource to the developer of the database. To capitalize on this value, the developer must protect the information in the database from being copied or extracted.
Moreover, modern automated data capture and perfection systems, such as those used in mail-sorting operations, typically handle input streams with items passing at a rate of between ten and twenty per second, which leaves roughly 50 to 100 milliseconds to process each item. In order to provide convenience and flexibility and to allow the user to access data records at the desired rate, traditional databases have made the data available in a clear text form searchable with a general query language. As used throughout this specification, the term “general query language,” or the like, refers to query language of a general nature, such as the industry standard structured query language (SQL). These traditional databases can be very large, requiring vast amounts of memory for storage. Also, in order to allow query language of a general, user-definable nature, traditional databases are heavily indexed, often with each data record referencing or pointing to multiple other data records. This indexing also makes the database much larger and requires more memory, resulting in slower access times.
FIG. 1 shows a traditional database system typical of the prior art. With reference to FIG. 1, a customer-defined query 10 is entered into a general query language module 12. The general query language module 12, which supports general query language, interacts with the database server 14. The database server 14 can also interact with a transaction module and locking module 16 or a metadata table 15. The metadata table 15 is typically a catalogue of database contents. The database server 14 then accesses the database 18 and any of several data tables 19 potentially within the database 18. In a traditional database system, the database 18 typically stores data in a clear text format. Because a traditional database 18 has to support a wide variety of possible queries that can be constructed by the database customer using the general query language module 12, the traditional database 18 usually requires dedicating large amounts of memory to data structure storage and indexing to support general queries. Also, the ability of the customer to create a customer-defined query 10 requires enabling multiple searches and makes the overall database 18 insecure when installed at the database customer's site by allowing the customer to use a customer-defined query 10 to extract data methodically from the database 18.
Having a traditional database that is in clear text or that allows general query language is a greater source of concern for the developer of the database when the database is distributed or copied beyond the site at which it was developed. Enabling general queries and storing the data records in clear text allow a user of the database to extract or copy the data from the database. Once the data has been extracted, the value of the original database is minimal. Traditional technological attempts to remedy this problem, such as through using data encryption, have typically been unsuccessful for database developers because the data records must be decrypted before they can be accessed, thus increasing the access time of the records too significantly for many high-speed database applications. Moreover, once the data is decrypted, the user can still use general queries or other methods to extract the data. Accordingly, an operable traditional database is typically only secure at the site of its creation. When created, a traditional database can be encrypted or stored securely; it can even be encrypted for transport to a customer. However, once a traditional database is provided to a customer in an operational form, the customer can either copy the entire database or use general query language to extract the valuable data.
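By way of illustration only, the extraction risk described above can be sketched with Python's standard `sqlite3` module standing in for a traditional clear-text database. The table name `addresses`, its columns, the sample rows, and the helper names `build_reference_db` and `extract_everything` are all hypothetical and are not taken from any particular prior-art system. The sketch assumes only that the customer can issue general queries, including queries against the catalogue of database contents.

```python
import sqlite3


def build_reference_db():
    """Create a small clear-text reference database (hypothetical contents)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE addresses (id INTEGER PRIMARY KEY, street TEXT, city TEXT)"
    )
    conn.executemany(
        "INSERT INTO addresses (street, city) VALUES (?, ?)",
        [("1 Main St", "Springfield"), ("22 Oak Ave", "Shelbyville")],
    )
    return conn


def extract_everything(conn):
    """Use only general queries to copy every record out of the database.

    The catalogue (sqlite_master in SQLite) lists the data tables, and an
    unrestricted SELECT then returns each table's records in clear text.
    """
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    ]
    return {
        table: conn.execute(f'SELECT * FROM "{table}"').fetchall()
        for table in tables
    }


if __name__ == "__main__":
    for table, rows in extract_everything(build_reference_db()).items():
        print(table, rows)
```

In this sketch, two general queries suffice to recover the full contents of the database without any knowledge of its structure beyond what the catalogue itself discloses.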
There are numerous additional problems with traditional databases. For example, once a database has been given to a user, it is often difficult to repossess the database or restrict future access to it after the term of use has expired. Another problem with traditional databases is that they require persistent efforts to keep the database records current. Mass updates of new records or changes in data structure can be time consuming and difficult to incorporate into an existing database because of the elaborate indexing system and interconnectedness of the various existing data records. Significant system downtime is typically required to update the data records. Even more downtime is required to install the updated database for the user and to make sure that all of the user's applications function with the updated database. Such downtime can be very costly to all parties involved.
Finally, the inability of traditional databases to allow the use of customized indexing methods limits their usefulness for applications requiring fast access for verifying input stream data with a high degree of uncertainty or distortion. Most traditional databases are indexed for doing exact matching of data fields as much as possible. More complex databases can use wildcards, such as an asterisk, to do leading-edge matching as well. For example, a user can query the database to search for the word “database” and an exact search will bring back the data record for “database.” In a leading-edge search, the user can query the database to search for “dat*” and the database will supply results including “data,” “database,” “datum,” etc. However, if the user were to query the database to search for the word “*bas*,” a traditional database would have to search every record in the data table or index to supply any results that contain the search string “bas.” That procedure would make a traditional database with a standard indexing structure too slow to be useful for many high-speed database applications. The present invention solves all of these problems associated with the use of traditional databases in a simple and efficient manner.
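The difference among the three kinds of search described above can be sketched as follows. This is a minimal illustration that assumes the index is simply a sorted list of field values held in memory; the sample words and the function names `exact_match`, `leading_edge_match`, and `substring_match` are hypothetical and do not describe any specific database product. Exact and leading-edge searches can use the sort order to go directly to the relevant entries, whereas an embedded-string search such as “*bas*” cannot and must examine every entry.

```python
from bisect import bisect_left

# Hypothetical sorted index over a single text field.
INDEX = sorted(["base", "data", "database", "date", "datum", "debase", "index"])


def exact_match(index, word):
    """Binary search the sorted index for one exact value."""
    i = bisect_left(index, word)
    return i < len(index) and index[i] == word


def leading_edge_match(index, prefix):
    """Binary search to the first candidate, then read the contiguous run."""
    i = bisect_left(index, prefix)
    results = []
    while i < len(index) and index[i].startswith(prefix):
        results.append(index[i])
        i += 1
    return results


def substring_match(index, fragment):
    """The sort order does not help: every entry must be inspected."""
    examined = 0
    results = []
    for entry in index:
        examined += 1
        if fragment in entry:
            results.append(entry)
    return results, examined


if __name__ == "__main__":
    print(exact_match(INDEX, "database"))
    print(leading_edge_match(INDEX, "dat"))
    print(substring_match(INDEX, "bas"))
```

In this sketch the embedded-string search reports that it examined all of the entries in the index, a cost that grows in direct proportion to the number of records, while the exact and leading-edge searches touch only a small portion of the index.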