The present invention relates to the field of controlled-access database systems for securely maintaining data records in an otherwise unsecure location.
Traditional databases have historically been used as a convenient tool for organizing large amounts of data in a structured and easily searchable format. Unfortunately, in order to provide this level of convenience, traditional databases are structured in such a way that their utility is considerably limited in several applications, especially those requiring fast access time and protection of highly valuable data when the database is copied to a new location. For example, applications such as those involved in the field of automated data capture and perfection frequently compare data contained in an input stream to reference data contained in a database. Often the reference data contained in the database has been accumulated at great expense, representing an extremely valuable resource to the developer of the database. To capitalize on this value, the developer must protect the information in the database from being copied or extracted.
Unfortunately, modern automated data capture and perfection systems, such as those used in mail-sorting operations, typically handle input streams with items passing at a rate of between ten and twenty per second. In order to provide convenience and flexibility and to allow the user to access data records at the desired rate, traditional databases have made the data available in a clear text form searchable with a general query language. As used throughout this specification, the term “general query language,” or the like, refers to query language of a general nature, such as the industry standard structured query language (SQL). These traditional databases can be very large, requiring vast amounts of memory for storage. Also, in order to allow query language of a general, user-definable nature, traditional databases are heavily indexed, often with each data record referencing or pointing to multiple other data records. This also makes the database much larger and requires more memory, with the result being slower access time.
FIG. 1 shows a traditional database system typical of the prior art. With reference to FIG. 1, a customer-defined query 10 is entered into a general query language module 12. The general query language module 12, which supports general query language, interacts with the database server 14. The database server 14 can also interact with a transaction module and locking module 16 or a metadata table 15. The metadata table 15 is typically a catalogue of database contents. The database server 14 then accesses the database 18 and any of several data tables 19 potentially within the database 18. In a traditional database system, the database 18 typically stores data in a clear text format. Because a traditional database 18 has to support a wide variety of possible queries that can be constructed by the database customer using the general query language module 12, the traditional database 18 usually requires dedicating large amounts of memory to data structure storage and indexing to support general queries. Also, the ability of the customer to create a customer-defined query 10 requires enabling multiple searches and makes the overall database 18 insecure when installed at the database customer's site by allowing the customer to use a customer-defined query 10 to extract data methodically from the database 18.
Having a traditional database that is in clear text or that allows general query language is of even greater concern to the developer of the database when the database is distributed or copied beyond the site at which it was developed. Enabling general queries and storing the data records in clear text allow a user of the database to extract or copy the data from the database. Once the data has been extracted, the value of the original database is minimal. Traditional technological attempts to remedy this problem, such as data encryption, have typically been unsuccessful for database developers because the data records must be decrypted before they can be accessed, which slows access to the records too much for many high-speed database applications. Once the data is decrypted, the user can still use general queries or other methods to extract the data. Accordingly, an operable traditional database is typically only secure at the site of its creation. When created, a traditional database can be encrypted or stored securely; it can even be encrypted for transport to a customer. However, once a traditional database is provided to a customer in an operational form, the customer can either copy the entire database or use general query language to extract the valuable data.
There are numerous additional problems with traditional databases. For example, once a database has been given to a user, it is often difficult to repossess it or to restrict future access to it after the term of use has expired. Another problem with traditional databases is that they require persistent efforts to keep the database records current. Mass updates of new records or changes in data structure can be time consuming and difficult to incorporate into an existing database because of the elaborate indexing system and interconnectedness of the various existing data records. Significant system downtime is typically required to update the data records. Even more downtime is required to install the updated database for the user and to make sure that all of the user's applications function with the updated database. Such downtime can be very costly to all parties involved.
Finally, the inability of traditional databases to allow the use of customized indexing methods limits their usefulness for applications requiring fast access for verifying input stream data with a high degree of uncertainty or distortion. Most traditional databases are indexed primarily for exact matching of data fields. More complex databases can also use wildcards, such as an asterisk, to do leading-edge matching. For example, a user can query the database for the word “database,” and an exact search will return the data record for “database.” In a leading-edge search, the user can query the database for “dat*,” and the database will supply results including “data,” “database,” “datum,” etc. However, if the user were to query the database for “*bas*,” a traditional database would have to search every record in the data table or index to supply any results that contain the search string “bas.” That procedure would make a traditional database with a standard indexing structure too slow to be useful for many high-speed database applications. The present invention solves all of these problems associated with the use of traditional databases in a simple and efficient manner.
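By way of illustration only, the following Python sketch shows why a leading-edge search is cheap on a conventionally sorted index while an infix search is not: the prefix query needs only two binary searches, whereas the infix query must examine every key. The key list is hypothetical, and the sketch models a single sorted index rather than any particular database product.

```python
import bisect
from typing import List, Tuple

def prefix_search(sorted_keys: List[str], prefix: str) -> List[str]:
    """Leading-edge match ("dat*"): two binary searches on a sorted index."""
    lo = bisect.bisect_left(sorted_keys, prefix)
    # Keys that start with `prefix` sort before prefix + the highest code point.
    hi = bisect.bisect_left(sorted_keys, prefix + "\U0010ffff")
    return sorted_keys[lo:hi]

def infix_search(sorted_keys: List[str], fragment: str) -> Tuple[List[str], int]:
    """Infix match ("*bas*"): sort order does not help, so every key is examined."""
    hits: List[str] = []
    examined = 0
    for key in sorted_keys:
        examined += 1
        if fragment in key:
            hits.append(key)
    return hits, examined

keys = sorted(["data", "database", "datum", "basis", "debase", "zebra"])
print(prefix_search(keys, "dat"))  # ['data', 'database', 'datum']
print(infix_search(keys, "bas"))   # (['basis', 'database', 'debase'], 6)
```

The examined count for the infix query grows with the size of the table, which is the cost the passage above attributes to a standard indexing structure.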
Embodiments of the present invention relate to a limited-access modular database designed to regulate access to the data records within the database while affording rapid data access rates and reduced requirements for data storage memory. The database design allows for efficient incorporation of high volumes of updated data records, and affords database designers a convenient method for obtaining updated information from an input stream with which the database is being used.
In accordance with the present invention, data is first obtained and stored within a clear text database. The data within the clear text database is then restructured, reorganized, and incorporated into a crystallized database. Throughout this specification and the attached claims, the term “crystal” or “crystallized” is used to connote the structured and compact nature of the data records in the database. The crystallized database may include one or more data crystals. Similarly, each data crystal can contain a plurality of data records, which, in turn, can be stored in multiple data tables. Data records within an individual data crystal can be linked to various other data crystals via pointers or indices. Once created, the data crystals, including the data records within the data crystals, are obfuscated. As used in this specification and the attached claims, the terms “obfuscated,” “obfuscation,” or the like are used generally to refer to one of several methods known in the computer programming art for inhibiting the potential for data to be accessed as clear text or in an unadulterated manner. Examples of obfuscation include simple compression, encryption, exclusive-OR calculations, and similar alterations to the data. By compressing data records in the data crystals, the database requires much less storage memory than traditional databases, but it still remains rapidly accessible as required for use with many application programs.
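As a minimal sketch of two of the obfuscation examples named above, the following Python code compresses a data record and then applies an exclusive-OR with a repeating key. The key and the record layout are hypothetical, and a repeating-key XOR inhibits casual reading of the data; it is not strong encryption.

```python
import zlib
from itertools import cycle

# Hypothetical key for illustration only.
DEMO_KEY = b"crystal-demo-key"

def obfuscate(record: str, key: bytes = DEMO_KEY) -> bytes:
    """Compress the record, then XOR each compressed byte with the repeating key."""
    compressed = zlib.compress(record.encode("utf-8"))
    return bytes(b ^ k for b, k in zip(compressed, cycle(key)))

def deobfuscate(blob: bytes, key: bytes = DEMO_KEY) -> str:
    """Undo the XOR (it is its own inverse), then decompress."""
    compressed = bytes(b ^ k for b, k in zip(blob, cycle(key)))
    return zlib.decompress(compressed).decode("utf-8")

# Hypothetical, highly repetitive records compress well.
record = "MAIN ST|SPRINGFIELD|12345\n" * 50
blob = obfuscate(record)
print(len(record.encode("utf-8")), len(blob))
```

Compressing before the XOR step means the stored form is both smaller than the clear text and not directly readable, which reflects the storage and access benefits described in the paragraph above.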
Consistent with the present invention, iterators are created for accessing data records within the data crystals, typically in response to one or more queries of a predefined type. The predefined query types do not include general queries such as those available with general query language applications. As used throughout this specification and the attached claims, “iterator” refers to code containing instructions on how to locate, access, extract, or reconstruct the data records, including conducting any necessary obfuscation or de-obfuscation measures. Because of the additional capabilities of iterators, as described herein, the term “iterator” is defined to encompass more than is encompassed by use of that same term in the context of a Standard Template Library as used in computer programming. Database customer applications call queries belonging to the predefined query types to instruct the iterator to access the data records in the database. The database customer can be given access to select predefined query types, and the calling of the queries can be done by the customer application itself as part of standard operations. Database customer applications are only allowed to interact with queries. They cannot interact directly with the iterators. This prevents a customer from using the iterators to extract the entire contents of the crystallized database. The predefined query types are typically designed by the database designer to answer specific questions or to solve specific types of problems the database designer anticipates the customer to have. The complexity of the method used by the predefined query types to answer a particular question depends on the complexity of the question being asked or problem being solved. For example, a query can have multiple procedures for procuring information, and the actual procedures implemented can be responsive to the exact information needed.
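The separation between predefined queries and internal iterators can be sketched in Python as follows. The ZIP-code table and both query names are hypothetical, chosen to fit the mail-sorting context. Python's double-underscore name mangling only discourages direct access to the iterator; a deployed system would need stronger enforcement, such as compiled code, so this is a structural illustration rather than a security mechanism.

```python
from typing import Dict, Iterator, Optional, Tuple

class CrystalDatabase:
    """Hypothetical sketch: customer code may call only the predefined queries."""

    def __init__(self, zip_to_city: Dict[str, str]) -> None:
        # Stored privately; no public method returns the whole table.
        self.__records = dict(zip_to_city)

    def __locate(self, zip_code: str) -> Optional[str]:
        # Internal iterator step: locate and reconstruct one record.
        return self.__records.get(zip_code)

    def __iterate(self) -> Iterator[Tuple[str, str]]:
        # Internal iterator: the only code that walks every stored record.
        yield from self.__records.items()

    # Predefined query type 1: the city served by one ZIP code.
    def city_for_zip(self, zip_code: str) -> Optional[str]:
        return self.__locate(zip_code)

    # Predefined query type 2: how many ZIP codes serve a city. It returns an
    # aggregate, so the caller never receives the underlying records.
    def zip_count_for_city(self, city: str) -> int:
        return sum(1 for _, c in self.__iterate() if c == city)

db = CrystalDatabase({"12345": "SPRINGFIELD", "12346": "SPRINGFIELD", "54321": "SHELBYVILLE"})
print(db.city_for_zip("12345"))             # SPRINGFIELD
print(db.zip_count_for_city("SPRINGFIELD"))  # 2
```

Each query answers one anticipated question. Neither query offers a way to enumerate the table, which is the property the passage relies on to prevent wholesale extraction.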
However, because the database designer does not have to structure the database to support customer-definable general queries, the database does not require the vast amounts of memory that traditional databases commonly devote to numerous indexing and pointer structures. The database structure can be determined by the types of queries the customer application will be calling. Because the database only has to support limited types of queries, databases made in accordance with the present invention typically have a more compact data structure than traditional databases, resulting in faster access and lower storage memory requirements.
In a preferred embodiment, the iterator is preprogrammed code that is accessed by the queries and is designed to obfuscate or de-obfuscate data records in the data crystal in order to write or read a data record. In this embodiment, a database customer only has access to use the data in the database but does not have access either directly to view or to copy the data itself. The division of the crystal database into discrete data crystals also facilitates easy updating of the database. Access to new data crystals can be granted, and modified iterators or query types can be added as needed without taking the entire system offline for prolonged periods of time.
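A minimal Python sketch of this embodiment follows. The iterator obfuscates a record when writing it and de-obfuscates it when reading it, and each data crystal is held as an independent table so that a new crystal can be added while existing crystals remain available. Simple compression stands in for obfuscation here, and the class, method, and crystal names are hypothetical.

```python
import zlib
from typing import Dict, Optional

class CrystalStore:
    """Hypothetical sketch: a crystallized database built from independent crystals."""

    def __init__(self) -> None:
        self._crystals: Dict[str, Dict[str, bytes]] = {}

    @staticmethod
    def _write(value: str) -> bytes:
        # Obfuscate on write (here: simple compression).
        return zlib.compress(value.encode("utf-8"))

    @staticmethod
    def _read(blob: bytes) -> str:
        # De-obfuscate on read.
        return zlib.decompress(blob).decode("utf-8")

    def add_crystal(self, name: str, records: Dict[str, str]) -> None:
        # Only the new crystal's table is built; existing crystals are untouched.
        self._crystals[name] = {k: self._write(v) for k, v in records.items()}

    def iterator_get(self, crystal: str, key: str) -> Optional[str]:
        blob = self._crystals.get(crystal, {}).get(key)
        return None if blob is None else self._read(blob)

store = CrystalStore()
store.add_crystal("streets", {"12345": "MAIN ST"})
# Later update: a second crystal is added without rebuilding the first.
store.add_crystal("cities", {"12345": "SPRINGFIELD"})
print(store.iterator_get("streets", "12345"), store.iterator_get("cities", "12345"))
```

Keeping each crystal self-contained is what allows the update described above to proceed without taking the whole system offline.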
Several additional features can be added to the crystal database system to afford greater security. Keys can be implemented in either hardware or software to restrict use of the database to a particular customer or site. Similarly, the database designer can provide the customer with a database containing several different data crystals but only provide a key to access a limited number of the data crystals or a limited number of the predefined queries. Should the database designer wish to provide the customer with access to additional crystals or predefined queries, a new key could be provided to permit the customer the additional access without having to supply an entire new database. Additionally, because the user only has limited and specifically authorized access to the database, by incorporating an expiration date into the key or crystals, the database designer can effectively repossess the database from the user by restricting future access to the data.
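The software key described above can be sketched in Python as a signed record listing the licensed site, the accessible crystals, and an expiration date. The secret, the site identifier, and the key format are hypothetical. This sketch uses an HMAC, which requires the verifying code to hold the same secret; a real deployment would more likely use a public-key signature so that the customer's copy of the database cannot mint new keys.

```python
import hashlib
import hmac
import json
from datetime import date
from typing import List

# Hypothetical designer-side secret, for illustration only.
DESIGNER_SECRET = b"designer-only-secret"

def _sign(payload: str) -> str:
    return hmac.new(DESIGNER_SECRET, payload.encode("utf-8"), hashlib.sha256).hexdigest()

def issue_key(site_id: str, crystals: List[str], expires: date) -> str:
    """Designer side: encode the grant and append its signature."""
    payload = json.dumps({"site": site_id, "crystals": sorted(crystals),
                          "expires": expires.isoformat()}, sort_keys=True)
    return payload + "." + _sign(payload)

def may_access(key: str, site_id: str, crystal: str, today: date) -> bool:
    """Database side: accept only an untampered, unexpired key for this site and crystal."""
    payload, _, tag = key.rpartition(".")  # the hex tag itself contains no "."
    if not hmac.compare_digest(tag, _sign(payload)):
        return False  # forged or altered key
    claims = json.loads(payload)
    return (claims["site"] == site_id
            and crystal in claims["crystals"]
            and today <= date.fromisoformat(claims["expires"]))

key = issue_key("site-42", ["streets"], date(2030, 1, 1))
print(may_access(key, "site-42", "streets", date(2029, 12, 31)))  # True
print(may_access(key, "site-42", "streets", date(2030, 1, 2)))    # False
```

Granting additional crystals then amounts to issuing a new key with a longer crystal list, without shipping a new database.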
An embodiment of this invention also provides convenient methods for both obtaining new data and updating data already in the database. For a database customer who is exposed to a large or continuous input stream of data onsite, the application calling queries of the predefined types can also store new data to a predetermined storage location. Examples of the predetermined storage location include another data crystal, an appendix to an existing data crystal, a clear text file or spreadsheet, another database, or an external site, such as the database designer's Internet Web site, accessible via a network. The newly gathered information can then be collected by the database designer for addition to the original database and incorporation in future data crystals. The new information also can be analyzed for statistical significance before being added to the original database. Statistical analysis prevents an erroneous version of a current data record from being added to the database as a new data record, and it can also use several flawed examples to reconstruct the correct version of a data record for inclusion into the database.
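One simple form of the statistical analysis described above can be sketched in Python: a newly captured value is promoted to a candidate record only if it recurs at least a threshold number of times, and several flawed reads of the same value are combined by a per-position majority vote. The threshold, the sample values, and the equal-length assumption are all hypothetical simplifications; real captured variants would generally need to be aligned first.

```python
from collections import Counter
from typing import List, Set

def accept_new_records(observations: List[str], known: Set[str], min_count: int = 3) -> List[str]:
    """Keep unseen values that recur often enough to be unlikely capture errors."""
    counts = Counter(o for o in observations if o not in known)
    return sorted(v for v, n in counts.items() if n >= min_count)

def reconstruct(variants: List[str]) -> str:
    """Per-position majority vote over flawed reads of equal length."""
    if not variants or len({len(v) for v in variants}) != 1:
        raise ValueError("variants must be non-empty and of equal length")
    return "".join(Counter(chars).most_common(1)[0][0] for chars in zip(*variants))

observed = ["ELM ST", "ELM ST", "ELM ST", "ELN ST", "OAK AVE"]
print(accept_new_records(observed, known={"OAK AVE"}))  # ['ELM ST']
print(reconstruct(["MAPLE", "MAPIE", "NAPLE"]))         # MAPLE
```

In this example the single misread "ELN ST" falls below the threshold and is not added as a new record, while three flawed reads are combined into the correct "MAPLE."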
Certain aspects of the limited-access database can also incorporate functionality of networks such as LANs, WANs, wireless networks, or the Internet. Data storage and access, authorization key access, or the accumulation of new data can each occur through network links to sites external to the system operating the database. For example, rather than containing data, a data record in a crystal database can contain a URL or hyperlink to retrieve data from a third party Web site.
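A data record that holds a URL in place of data can be sketched in Python as follows. The record fields and the example URL are hypothetical. The fetch function is passed in as a parameter so that the retrieval mechanism, here the standard library's `urlopen` by default, can be replaced or restricted without changing the record format.

```python
from dataclasses import dataclass
from typing import Callable, Optional
from urllib.request import urlopen

@dataclass(frozen=True)
class Record:
    key: str
    value: Optional[str] = None  # data held in the crystal itself, or
    url: Optional[str] = None    # a link to data held at an external site

def default_fetch(url: str) -> str:
    """Retrieve the linked data over the network."""
    with urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")

def resolve(record: Record, fetch: Callable[[str], str] = default_fetch) -> str:
    """Return inline data if present; otherwise follow the record's URL."""
    if record.value is not None:
        return record.value
    if record.url is not None:
        return fetch(record.url)
    raise ValueError(f"record {record.key!r} has neither data nor a URL")

local = Record("12345", value="SPRINGFIELD")
remote = Record("54321", url="https://example.com/records/54321")  # hypothetical URL
print(resolve(local))  # SPRINGFIELD
```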