Databases are computerized information storage and retrieval systems. A Relational Database Management System (RDBMS) is a database management system (DBMS) which uses relational techniques for storing and retrieving data. Relational databases are organized into physical tables which consist of rows and columns of data. The rows are formally called tuples. A database will typically have many physical tables and each physical table will typically have multiple tuples and multiple columns. The physical tables are typically stored on random access storage devices (RASD) such as magnetic or optical disk drives for semi-permanent storage. Additionally, logical tables or “views” can be generated based on the physical tables and provide a particular way of looking at the database. A view arranges rows in some order, without affecting the physical organization of the database.
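By way of illustration only, the relationship between a physical table and a view may be sketched as follows using Python's built-in sqlite3 module. The table name employee, the view name sales_staff, and the sample rows are hypothetical and are not taken from any particular system.

```python
import sqlite3

# In-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")

# A physical table consisting of rows (tuples) and columns.
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee (id, name, dept) VALUES (?, ?, ?)",
    [(1, "Chen", "Sales"), (2, "Adams", "Engineering"), (3, "Baker", "Sales")],
)

# A logical table ("view") defined over the physical table.
conn.execute(
    "CREATE VIEW sales_staff AS SELECT name FROM employee WHERE dept = 'Sales'"
)

# Looking at the data through the view, in a chosen order.
view_rows = conn.execute("SELECT name FROM sales_staff ORDER BY name").fetchall()

# The physical table itself is unchanged by defining or querying the view.
table_rows = conn.execute("SELECT id, name, dept FROM employee ORDER BY id").fetchall()
```

In this sketch the view exposes only the Sales rows, while the underlying employee table still holds all three rows.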
RDBMS software using a Structured Query Language (SQL) interface is well known in the art. The SQL interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO).
The SQL interface allows users to formulate relational operations on the tables either interactively, in batch files, or embedded in host languages, such as C and COBOL. SQL allows the user to manipulate the data. The definitions for SQL provide that an RDBMS should respond to a particular query with a particular set of data given a specified database content, but the method that the RDBMS uses to actually find the required information in the tables on the disk drives is left up to the RDBMS. Typically, there will be more than one method that can be used by the RDBMS to access the required data. The RDBMS will optimize the method used to find the data requested in a query in order to minimize the computer time used and, therefore, the cost of performing the query.
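The point that one query may be answered by more than one access method can be sketched, under stated assumptions, with SQLite's EXPLAIN QUERY PLAN statement. The table, the index name idx_employee_name, and the data are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee (id, name, dept) VALUES (?, ?, ?)",
    [(1, "Chen", "Sales"), (2, "Adams", "Engineering"), (3, "Baker", "Sales")],
)

QUERY = "SELECT id, dept FROM employee WHERE name = 'Baker'"

def plan_text(connection, sql):
    """Return the access method chosen for sql as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " ".join(row[-1] for row in rows)

# With no index on name, the only available method is to scan the table.
plan_before = plan_text(conn, QUERY)
result_before = conn.execute(QUERY).fetchall()

# Once an index exists, a second access method becomes available.
conn.execute("CREATE INDEX idx_employee_name ON employee (name)")
plan_after = plan_text(conn, QUERY)
result_after = conn.execute(QUERY).fetchall()
```

The query returns the same set of data in both cases; only the method chosen to find that data differs.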
One way to optimize retrieval of data is to use an index. An index is an ordered set of references to the records or rows in a database file or table. The index is used to access each record in the file using a key (i.e., one of the fields of the record or attributes of the row). When data is to be retrieved, an index is used to locate records. If a join operation is involved, the join operation is performed before the data is retrieved and sorted. The data is then sorted into a user-specified order and returned to the user. Although conventional indexes are useful, they serve only to locate data; the data itself must still be retrieved from a data store (e.g., a database or file system).
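The sequence described above (locate through the index, retrieve from the data store, then sort) may be sketched as follows. This is a minimal illustration that assumes a sorted list of (key, position) pairs as the index; the names records, build_index, and locate are hypothetical, and real systems typically use structures such as B-trees instead.

```python
import bisect

# Hypothetical data store: a record's position in the list stands in
# for its location on disk.
records = [
    {"id": 10, "name": "Chen"},
    {"id": 11, "name": "Adams"},
    {"id": 12, "name": "Baker"},
    {"id": 13, "name": "Adams"},
]

def build_index(rows, key):
    """Build an ordered set of (key value, record position) references."""
    return sorted((row[key], pos) for pos, row in enumerate(rows))

def locate(index, value):
    """Return the positions of all records whose key equals value."""
    # (value, -1) sorts before every real entry for value.
    start = bisect.bisect_left(index, (value, -1))
    positions = []
    for entry_key, pos in index[start:]:
        if entry_key != value:
            break
        positions.append(pos)
    return positions

name_index = build_index(records, "name")

# Step 1: the index only locates the records.
positions = locate(name_index, "Adams")

# Step 2: the data must still be retrieved from the data store.
fetched = [records[pos] for pos in positions]

# Step 3: the data is sorted into a user-specified order (here, id descending).
result = sorted(fetched, key=lambda row: row["id"], reverse=True)
```

Step 2 is the step that incurs I/O in a real system, since the index holds references rather than the records themselves.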
It is typically very time consuming to retrieve data. The amount of time required to access data stored within databases and/or file systems is adversely affected by I/O (i.e., input/output) sub-system performance and cache designs. A cache is a high speed data storage mechanism that may be implemented as a portion of memory in a computer. Data that may be used more than once may be retrieved from a data store and stored in a cache for easy and quick access. Current cache designs do not guarantee that desired data will be present in memory when needed. When desired data is not in a cache, additional time is required to retrieve data from I/O sub-systems. This causes delays and fluctuations in access times needed to retrieve desired data.
FIG. 1 is a diagram illustrating a basic data store design using a non-persistent cache area. A user submits search requests 100, which are forwarded to a search engine 102. A search request 100 is, for example, a SQL query. The search engine 102 attempts to locate the data in the relational non-persistent cache 104. The term “non-persistent” indicates that data is held in the cache 104 only temporarily. If the search engine 102 locates the data in the relational non-persistent cache 104, the search engine 102 retrieves the data and returns search results 112. If the search engine 102 does not locate the data in the relational non-persistent cache 104, the search engine 102 uses the relational index 106 to retrieve relational data 110 from a data store and return search results 112. Therefore, some of the search results 112 may be provided from relational non-persistent cache memory, but this is not guaranteed. The relational non-persistent cache 104 is limited in size. Also, the relational index 106 and relational data 110 are located on magnetic media, and so I/O resources are needed to access this data. The relational data 110 may be in the form of a file.
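The flow of FIG. 1 may be sketched, for illustration only, as follows. The class name SearchEngine, the io_reads counter, and the least-recently-used eviction policy are assumptions made for this sketch; the description above states only that the cache is limited in size and does not specify how entries are discarded.

```python
from collections import OrderedDict

class SearchEngine:
    """Illustrative stand-in for search engine 102 of FIG. 1."""

    def __init__(self, index, data, cache_size):
        self.index = index          # stands in for relational index 106
        self.data = data            # stands in for relational data 110
        self.cache = OrderedDict()  # stands in for non-persistent cache 104
        self.cache_size = cache_size
        self.io_reads = 0           # counts accesses that would require I/O

    def search(self, key):
        # First, attempt to locate the data in the cache.
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key]
        # Cache miss: use the index, then read from the data store.
        self.io_reads += 1
        position = self.index.get(key)
        if position is None:
            return None
        record = self.data[position]
        # Keep the record for later requests; evict the least recently
        # used entry when the size limit is exceeded (assumed policy).
        self.cache[key] = record
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)
        return record

data = ["row-a", "row-b", "row-c"]
index = {"a": 0, "b": 1, "c": 2}
engine = SearchEngine(index, data, cache_size=2)

# Request sequence: a (miss), a (hit), b (miss), c (miss, evicts a), a (miss).
results = [engine.search(key) for key in ["a", "a", "b", "c", "a"]]
```

In this sketch only the second request is served from the cache; the final request for "a" requires I/O again because the entry was evicted in the meantime, which illustrates why cached results are not guaranteed.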
There are many disadvantages to using this technique. In particular, the relational non-persistent cache typically contains only data that has been received in response to prior requests for data. In many cases, users submit requests for data that was not recently retrieved and is therefore not present in the cache. In these cases, the data is retrieved from the basic data store. Accessing data from this basic data store typically uses up system resources for I/O. This conventional system has performance, capacity, and cost issues as data stores and user load increase in size.
Thus, there is a need in the art for an improved technique of storing, updating, locating, and retrieving data.