Digital information is everywhere. TV shows, computer programs, books, text documents, patent applications, statistics, and many more types of information exist in digital forms. Furthermore, more and more digital data is created each day. Accordingly, systems and methods of both storing and providing access to digital data are desirable.
One common technique for storing and/or accessing digital data is through a database. Databases are organized collections of data. The organization of data stored in databases typically allows for faster access than data located in an ordinary “flat” file (or other “unorganized” techniques).
In an unorganized data scenario, the cost of a search for a given piece of information may be linearly related to the amount of data that is to be searched. For example, information on the last names of employees in a company may be stored in an unorganized manner. The cost of a search for a given last name will then typically be linearly related to the number of last names that need to be searched. In contrast, data stored in a database may be pre-sorted, allowing for faster access. One technique accomplishes this using an index.
Indexes typically use a key, where the stored keys are pre-sorted to allow for faster retrieval of the keys. In the above example, an index created against the last names may cause the last names to be pre-sorted in an alphanumeric manner. Thus, a search for a given name (e.g., “Jones”) may be carried out more efficiently because the location of “Jones” relative to the other names may be more quickly determined. For example, “Jones” will not appear after the name “Rich” in the index. As such, the records after “Rich” are not considered during the search and the cost of looking for “Jones” is reduced. This generally leads to a search efficiency that is better than the linear search efficiency of unorganized data.
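The difference between the two approaches may be illustrated with a minimal sketch. The function names and the sample last names below are hypothetical, and a binary search over a sorted list is assumed here as one possible way of exploiting pre-sorted keys; it is not presented as the indexing structure of any particular database.

```python
from bisect import bisect_left

def linear_search(names, target):
    """Scan entries in order; return (position or -1, comparisons made)."""
    comparisons = 0
    for i, name in enumerate(names):
        comparisons += 1
        if name == target:
            return i, comparisons
    return -1, comparisons

def indexed_search(sorted_names, target):
    """Binary search over pre-sorted keys; return position or -1."""
    pos = bisect_left(sorted_names, target)
    if pos < len(sorted_names) and sorted_names[pos] == target:
        return pos
    return -1

# Hypothetical sample data: an unorganized list and an index built from it.
unsorted_names = ["Rich", "Smith", "Adams", "Young", "Jones", "Baker"]
index = sorted(unsorted_names)
```

In this sketch, the linear scan may examine every entry in the worst case, whereas the binary search halves the remaining range at each step, so entries sorted after “Rich” are never examined when looking for “Jones”.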
One downside of indexes is that they are typically updated every time a new piece of data related to the index is inserted. An update of an index may require a re-sorting of the index. This can be an expensive process. Further, as the quantity of data being indexed increases, maintaining the index may become more and more expensive in terms of both storage and computational costs.
While the cost of searching in databases increases more slowly than in unorganized collections of data, it still may increase. As noted above, the amount of data being created and stored is growing rapidly. Further, it is often desired to maintain digital data that has already been created (e.g., “old” data). This combination of requirements can lead to rapidly increasing quantities of data being stored in a database (and presumably accessible in some manner).
The growth of stored data is typically faster than increases in computing power. For example, performance of a database may be increased by storing more data in RAM instead of non-volatile storage (e.g., a hard-drive) to allow for faster response and access times. However, continually storing all data in RAM may become prohibitively expensive, as RAM is traditionally more expensive and able to hold less information than non-volatile storage. Additionally, the performance of databases may be improved by upgrading, for example, the computer processor that is used to a newer, faster model. However, these solutions can be outstripped by the ever-increasing amounts of stored data.
The performance problems (e.g., index rebuilds and/or access times) associated with storing data may be relevant to systems ranging from very large systems holding vast amounts of data (e.g., the Library of Congress) to smaller systems holding more personalized data (e.g., a contact list or list of emails on a smart phone).
Accordingly, it would be desirable to develop systems and/or methods that improve the performance, efficiency, and the like of database systems.
In certain example embodiments, a partitioned index is provided. The partitioned index for use with a database provides (or recreates) the functionality of a traditional data table (in some cases all of the functionality), while at the same time encompassing (or integrating) indexing and search methods.
Certain example embodiments may be adapted to partition data tables into equally sized smaller units, each having capabilities and functionality generally associated with a data table.
Certain example embodiments may be adapted to facilitate constant-time performance for index operations on new records added to, edited in, or deleted from a data table over the entire lifecycle of the system.
Certain example embodiments may be adapted to facilitate increased search performance over traditional database models by allowing parallel concurrent execution of search methods across multiple partitions.
Certain example embodiments may be adapted to manage memory dynamically and efficiently by using a cache system that stores the most recently used (or most likely to be used) text data in a restricted-memory structure that avoids runaway memory usage, yet may still provide good performance.
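A restricted-memory cache of this kind may be sketched as follows. This is a minimal illustration only: the class name `BoundedTextCache`, the `load` callback, and the choice of evicting the least recently used entry are assumptions made for the example, and the limit is expressed as an entry count rather than a byte size for simplicity.

```python
from collections import OrderedDict

class BoundedTextCache:
    """Hypothetical cache that never holds more than max_entries items."""

    def __init__(self, max_entries):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self._entries = OrderedDict()  # ordered oldest-used to newest-used

    def get(self, key, load):
        """Return cached text for key, calling load(key) on a miss."""
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        value = load(key)
        self._entries[key] = value
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used
        return value

    def __len__(self):
        return len(self._entries)
```

Because the number of entries is capped, memory usage in this sketch stays bounded regardless of how many distinct keys are requested, while recently used text remains available without a reload.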
Certain example embodiments may be adapted to enhance system resource flexibility by making partitions hardware-independent. In other words, different hardware units may host different partitions of the same data table.
Certain example embodiments may be adapted to allow users a flexible system memory usage model with customization options for the allocated memory per partition and the number of partitions simultaneously loaded into system memory.
Furthermore, in certain example embodiments a memory management technique may include a failsafe mechanism that dynamically decreases the memory footprint of each partition when system memory is nearly exhausted. Further, this memory management technique may be implemented with the partitioning techniques according to certain example embodiments.
In certain example embodiments, a computer-implemented method for managing a database that is configured for use with a processing system is provided. A structured organization of data is maintained over a plurality of partitions within the database, where each one of the plurality of partitions has a size limit. New data is inserted into the structured organization of data. A new partition is automatically added when the inserted new data results in the size of one of the plurality of partitions meeting or exceeding the respective size limit. The data within each one of the plurality of partitions is indexed.
In certain example embodiments, a database-management system for managing a database is provided. The system includes a processing system that is configured to maintain a structured organization of data over a plurality of partitions within the database, where each one of the plurality of partitions has a size limit. The processing system is further configured to insert new data into the structured organization of data. A new partition is automatically added when the inserted new data results in the size of one of the plurality of partitions meeting or exceeding the respective size limit. The processing system is configured to maintain an index on each of the partitions for the data located on the associated partition.
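The method described above may be sketched, under stated assumptions, as follows. The class names `Partition` and `PartitionedTable` are hypothetical, the size limit is assumed to be a record count, and each per-partition index is modeled as a sorted list of (key, record position) pairs; an actual embodiment may measure size and organize its indexes differently.

```python
from bisect import insort

class Partition:
    """Hypothetical partition holding records plus its own sorted index."""

    def __init__(self, size_limit):
        self.size_limit = size_limit
        self.records = []
        self.index = []  # sorted (key, record position) pairs

    def is_full(self):
        return len(self.records) >= self.size_limit

    def insert(self, key, value):
        position = len(self.records)
        self.records.append((key, value))
        insort(self.index, (key, position))  # touches this partition only

class PartitionedTable:
    """Hypothetical table that adds a partition when the current one fills."""

    def __init__(self, size_limit):
        self.size_limit = size_limit
        self.partitions = [Partition(size_limit)]

    def insert(self, key, value):
        current = self.partitions[-1]
        current.insert(key, value)
        # Add a new partition once the size limit is met or exceeded.
        if current.is_full():
            self.partitions.append(Partition(self.size_limit))
```

In this sketch, an insertion only ever updates the index of the newest partition, so the work per insertion is bounded by the size limit rather than by the total amount of data stored.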
In certain example embodiments, a non-transitory computer readable storage medium is provided for performing a string search against a database system storing string data over a plurality of partitions. Each one of the partitions includes: 1) a string file configured to include a plurality of strings, and a table pointing to each one of the strings within the respective string file, and 2) a plurality of index files, each one of the plurality of index files associated with a word or words within the string file. In certain example embodiments, each one of the plurality of index files stores a reference to instances of the associated word within the partition. The stored instructions are configured to execute a search, in parallel over the multiple partitions, that is associated with at least one word that is within the string file. The stored instructions are configured to locate at least one of the plurality of index files related to the at least one word and identify at least one record index and at least one position index within the at least one of the plurality of index files. In certain example embodiments, the reference information (e.g., the position index and the record index) may be read and used to retrieve the string/word that is being searched for from the correct string file.
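The arrangement described above may be illustrated with a simplified in-memory sketch. All names are hypothetical. The string file is modeled as a single string, the table as a list of (start, length) entries, and the index files as a dictionary mapping each word to (record index, position index) pairs. The position index is assumed here to be the position of the word within its string, and whitespace-delimited, case-insensitive words are likewise an assumption.

```python
from concurrent.futures import ThreadPoolExecutor

class StringPartition:
    """Hypothetical partition: string file, pointer table, per-word index."""

    def __init__(self, strings):
        self.string_file = ""
        self.table = []       # (start, length) of each string in string_file
        self.word_index = {}  # word -> [(record index, position index), ...]
        for record, text in enumerate(strings):
            self.table.append((len(self.string_file), len(text)))
            self.string_file += text
            for position, word in enumerate(text.lower().split()):
                self.word_index.setdefault(word, []).append((record, position))

    def read(self, record):
        """Follow the table entry to retrieve one string from the string file."""
        start, length = self.table[record]
        return self.string_file[start:start + length]

    def search(self, word):
        """Look up the word's index entries, then fetch the matching strings."""
        hits = self.word_index.get(word.lower(), [])
        return [(record, position, self.read(record)) for record, position in hits]

def parallel_search(partitions, word):
    """Search every partition concurrently; results keep partition order."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        per_partition = pool.map(lambda p: p.search(word), partitions)
        return [(n, *hit) for n, hits in enumerate(per_partition) for hit in hits]
```

Each result in this sketch is a (partition number, record index, position index, string) tuple. A thread pool is used only to show the structure of a search executed over multiple partitions at once; whether threads, processes, or separate hardware units are used is left open here.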