1. Field of the Invention
The invention relates to the field of data management and, more particularly, to indexing or sorting data by key field to reduce search time.
2. Description of Related Art
For purposes of this discussion, large datasets typically include billions of unique records, corresponding to terabytes of raw data. Searching such datasets is a complex task, and searching their key fields efficiently is extremely challenging.
Traditional approaches that rely on a general-purpose database are impractical because of the cost and complexity of the required resources, including both infrastructure and personnel.
An exhaustive search of such a large dataset, for example, is easy to implement. Such a search is inefficient, however, because search time grows in proportion to the size of the dataset. Using current computer technology, a direct search of 20 billion records may take five or more days on a single multiprocessor computer system.
Approaches that substantially reduce the time needed to search such large datasets depend on high-cost solutions. Indexing algorithms, for example, have been introduced to reduce the time needed to search datasets. Traditional indexing algorithms, although efficient for searching, are impractical to implement on large active datasets because of the number of records added each day. The volume of records, together with the virtually unlimited range of key field values, is the main limiting factor in this approach. Because computing resources are finite, and an operating system has only a limited number of file descriptors and a limited amount of I/O bandwidth, memory, and storage, traditional indexing algorithms simply break down when data is generated faster than it can be indexed on a single multiprocessor computer.
Splitting the work among a set of servers can overcome at least some of the limitations of traditional indexing algorithms, but doing so is extremely complex and increases overall cost in both personnel and hardware. Likewise, implementing a distributed database is not a realistic, or at least not a preferred, option, again because of infrastructure and personnel costs and the limited products and solutions available.
Accordingly, there is a recognized need for a system, program product, and methods that can sort or otherwise index the data records of a dataset by key field value into a finite number of containers, reducing the time needed to search the dataset by a factor equal to the number of containers while still placing the records in linear time, and that can also eliminate any requirement to distribute either the indexing or the searching across multiple physical systems.
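The container approach described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each record's container is chosen by hashing its key field value (CRC-32 modulo the number of containers) and that in-memory lists stand in for containers; the function names `container_for`, `place_records`, `exhaustive_search`, and `container_search` are hypothetical and do not come from the source. Placement is a single pass over the records (linear time), and a lookup scans only the one container that can hold the requested key. With roughly uniform hashing across N containers, a lookup therefore examines about 1/N of the records, compared with all of them for an exhaustive search.

```python
import zlib
from typing import Dict, List, Tuple

# Hypothetical record shape: a mapping from field name to value.
Record = Dict[str, str]


def container_for(key: str, num_containers: int) -> int:
    """Map a key field value to a container index (assumed CRC-32 hash, stable across runs)."""
    return zlib.crc32(key.encode("utf-8")) % num_containers


def place_records(records: List[Record], key_field: str,
                  num_containers: int) -> List[List[Record]]:
    """Single linear pass: append each record to the container chosen by its key value."""
    containers: List[List[Record]] = [[] for _ in range(num_containers)]
    for record in records:
        containers[container_for(record[key_field], num_containers)].append(record)
    return containers


def exhaustive_search(records: List[Record], key_field: str,
                      value: str) -> Tuple[List[Record], int]:
    """Baseline full scan; returns the matches and the number of records examined."""
    matches = [r for r in records if r[key_field] == value]
    return matches, len(records)


def container_search(containers: List[List[Record]], key_field: str,
                     value: str) -> Tuple[List[Record], int]:
    """Scan only the container that can hold `value`; returns matches and records examined."""
    bucket = containers[container_for(value, len(containers))]
    matches = [r for r in bucket if r[key_field] == value]
    return matches, len(bucket)


if __name__ == "__main__":
    # Synthetic data: 100,000 records spread over 1,000 distinct key values.
    records = [{"user": f"user{i % 1000}", "event": str(i)} for i in range(100_000)]
    containers = place_records(records, "user", num_containers=64)
    full, scanned_full = exhaustive_search(records, "user", "user42")
    fast, scanned_fast = container_search(containers, "user", "user42")
    print(f"full scan examined {scanned_full} records; container scan examined {scanned_fast}")
```

Both searches return the same matches, because placement preserves each record's relative order within its container. Hashing is only one way to assign key values to a finite set of containers; a range-based assignment would also work, but it requires knowing the key distribution in advance to keep the containers balanced.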