This invention relates generally to database management and, more particularly, relates to an improved method for creating and searching data files in a database.
In the prior art, commercial databases are well known. Commercial databases are general tools created for a wide variety of activities such as, for example, performing arbitrarily complex queries on data, on-line transaction updates, on-line information retrieval, batch report generation, database schema creation, database schema modification, table indexing, and data integrity. Typically, all of these activities occur for multiple simultaneous users.
The performance of the above-described activities in currently implemented databases typically relies on the following philosophies:
(a) Information stored in a database is often stored in human-readable format;
(b) To minimize disk space, key data is only stored once;
(c) The database is designed for an overall acceptable performance level for all activities, e.g., queries, updates, adds and deletes; and
(d) Index tables into the database rely on the entire content of a key field.
Unfortunately, the use of these philosophies in database management has serious drawbacks. For example, formatting data in human-readable form increases the time it takes a computer to perform a search. Furthermore, when a commercial database is updating, modifying, or deleting records, it has considerable overhead in maintaining up-to-date indexes of table integrity. Still further, the search of key fields is performed linearly, requiring that an excessive number of records be examined.
Additional problems have also been recognized in commercially known databases. In this regard, most commercially known databases are large and cumbersome. Furthermore, enterprise licenses and system administrators for such commercially known databases are expensive. In addition, while these databases are powerful, they cannot always be tuned or modified to produce the best overall performance for what otherwise is a simple task that needs to be performed over and over. Many of these systems also utilize data files that are organized in a manner that makes them difficult to port to other platforms.
From the foregoing, it is evident that a need exists for an improved system and method for storing and retrieving data in a database.
In accordance with these needs, an improved system and method for storing and retrieving data in a database is provided. The described system and method are implemented in a manner that takes advantage of knowing ahead of time the operating constraints and context, the nature of the data, and the needs of the customer utilizing the system. More particularly, the system utilizes the following philosophies:
(a) Data is stored in a bitmask form to provide more efficient searching, disk reading, and memory usage (bitmask content is preferably converted back to human readable formats at the very last moment for use by customers);
(b) Data files are created with a very large degree of redundancy to improve access times (since disk accesses are much slower than memory accesses, the system takes advantage of the fact that it is often much easier and faster to discard a small amount of extra, incorrect information obtained in one disk access than to use numerous disk accesses to find an exact result);
(c) To minimize system overhead, the data is read-only such that there are no updates to prefix files once they are copied to disk (updates of new information would be available at given intervals, such as monthly, and would replace old prefix files); and
(d) Instead of creating an index table telling the searching program where to look, a prefix file is utilized which allows the search engine to find the possible domain of matching records in a minimal number of disk accesses.
The system also preferably utilizes data files that are simple flat files, which allows the system to be exceedingly portable across different platforms.
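By way of illustration only, the philosophies above can be sketched in Python as follows. The sketch assumes a hypothetical fixed-width record layout (a 12-byte keyword, a 4-byte product identifier, and a 1-byte attribute bitmask), a two-character prefix, and hypothetical attribute names; none of these particulars are taken from the described system, and the prefix table is held in memory here purely for brevity. The sketch shows a single seek and a single read returning every record that shares a prefix, with the extra, non-matching records discarded in memory and the bitmask converted to readable names only at the end.

```python
import io
import struct

# Hypothetical attribute flags, packed into one integer per record.
FLAGS = {"in_stock": 1 << 0, "on_sale": 1 << 1, "new": 1 << 2}

# Hypothetical fixed-width record: 12-byte keyword, 4-byte id, 1-byte bitmask.
RECORD = struct.Struct("<12sIB")
PREFIX_LEN = 2  # assumed prefix length


def build_prefix_file(rows):
    """Return (flat_bytes, prefix_table) for (keyword, product_id, flag_names) rows."""
    rows = sorted(rows, key=lambda r: r[0])  # same-prefix records become contiguous
    buf = bytearray()
    table = {}  # prefix -> [index of first record, number of records]
    for i, (keyword, pid, names) in enumerate(rows):
        mask = 0
        for name in names:
            mask |= FLAGS[name]
        buf += RECORD.pack(keyword.encode("ascii"), pid, mask)
        entry = table.setdefault(keyword[:PREFIX_LEN], [i, 0])
        entry[1] += 1
    return bytes(buf), table


def search(flat_file, table, keyword):
    """Fetch the candidate block in one seek and one read, then filter in memory."""
    entry = table.get(keyword[:PREFIX_LEN])
    if entry is None:
        return []
    first, count = entry
    flat_file.seek(first * RECORD.size)
    block = flat_file.read(count * RECORD.size)
    hits = []
    for raw, pid, mask in RECORD.iter_unpack(block):
        # Records that share the prefix but not the keyword are simply discarded.
        if raw.rstrip(b"\0").decode("ascii") == keyword:
            # The bitmask is converted to readable names only at the last moment.
            names = sorted(n for n, bit in FLAGS.items() if mask & bit)
            hits.append((pid, names))
    return hits


rows = [
    ("hammer", 101, ["in_stock"]),
    ("handle", 102, ["on_sale", "new"]),
    ("hammer", 103, ["in_stock", "on_sale"]),
    ("wrench", 104, []),
]
data, table = build_prefix_file(rows)
results = search(io.BytesIO(data), table, "hammer")
```

In this sketch, a search for "hammer" reads the three records filed under the prefix "ha" in one access and drops the "handle" record in memory. Because the file is written once and never updated in place, no index maintenance is performed at search time.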
As will be appreciated from the detailed description that follows, the described database chooses the best balance of system resources, such as memory, CPU speed, and disk space, to create a searching facility that responds quickly and helps customers find products they need when a keyword is supplied. Furthermore, the improved database minimizes certain undesirable situations that can occur in commercial databases. These situations include, for example, timing out (when response times are too long, the search engine may automatically cease, assuming that an error occurred in the search), displaying and stopping the search after the first N matching records are found, which prevents the customer from seeing what parts of the implied data hierarchy contained matching records, and indicating to the customer that their search is too vague or ambiguous when too many matching records are found.