There are many situations in which someone may wish to query a large database containing over one hundred million rows of data in order to extract information based on a number of search criteria. For example, people working in marketing often work with large databases containing details of all the potential customers whom they may wish to target with a new offer. In order to tailor the offer to the requirements of their target audience, they need to retrieve information from the database for all people who fit a particular profile. For instance, a typical database search query might be to find all clients who are aged under 20, are married and earn over £40,000 per year. This identifies the relevant people to the marketer, who can then analyse additional data relating specifically to that group in order to tailor the offer.
Given that people such as marketers perform frequent database queries of this nature, it is important that results are obtained quickly; query responses within one second are typically required. However, a large database may contain over a billion rows of data, so searching it may be time-consuming. There are a number of options for improving the speed of a query response. First, the user can purchase very fast hardware to reduce the access time of read operations, which is the main contributor to the response time. Such hardware may reside in equipment optimised for faster hard disk read/write operations. However, this option is very expensive, offers only a limited reduction in response time, and is ultimately constrained by the capability of the hardware that is available. A second option is to implement a system which calculates in advance the results of all possible search queries, so that when the user later enters a search query, the result can be obtained from a look-up table. However, this option is complex and inflexible, in that it does not allow the data within the database to be updated. A further option is to use a column-oriented database which has been optimised for data retrieval. In known arrangements of this kind, search efficiency is improved by storing the results of previous searches in a cache memory that does not require disk access. Alternatively, the cache may be provided on disk; even then, retrieving a stored result from the cache takes less time than conducting the query against the data of the large database. Either way, if the user repeats a search at a later stage, the result can be retrieved quickly from the cache, bypassing the process of querying the database again, which may itself involve time-consuming disk access.
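The basic idea of caching previous search results can be illustrated with a minimal Python sketch. It assumes a toy in-memory list of rows standing in for the database, and it represents a query as a hypothetical frozenset of `(field, operator, value)` criteria, so that the same criteria entered in a different order map to the same cache key. The names `ResultCache` and `scan` are illustrative only and are not part of any described system.

```python
import operator
from typing import Callable, Dict, FrozenSet, List, Tuple

Row = Dict[str, object]
# Hypothetical query representation: a set of (field, operator, value) criteria.
# Using a frozenset makes the key independent of the order the criteria are given in.
Criterion = Tuple[str, str, object]
Query = FrozenSet[Criterion]

OPS: Dict[str, Callable[[object, object], bool]] = {
    "<": operator.lt,
    ">": operator.gt,
    "==": operator.eq,
}


def scan(rows: List[Row], query: Query) -> Tuple[int, ...]:
    """Slow path: check every row and return the indices of rows meeting all criteria."""
    return tuple(
        i
        for i, row in enumerate(rows)
        if all(OPS[op](row[field], value) for field, op, value in query)
    )


class ResultCache:
    """Stores the result of each distinct query so that repeats skip the scan."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows
        self.results: Dict[Query, Tuple[int, ...]] = {}
        self.scans = 0  # how many times the "database" was actually scanned

    def search(self, query: Query) -> Tuple[int, ...]:
        if query in self.results:  # cache hit: bypass the database entirely
            return self.results[query]
        self.scans += 1  # cache miss: run the full query and remember the result
        result = scan(self.rows, query)
        self.results[query] = result
        return result


rows: List[Row] = [
    {"age": 19, "married": True, "income": 45000},
    {"age": 35, "married": True, "income": 52000},
    {"age": 18, "married": False, "income": 41000},
]
cache = ResultCache(rows)
q1 = frozenset({("age", "<", 20), ("married", "==", True), ("income", ">", 40000)})
q2 = frozenset({("income", ">", 40000), ("age", "<", 20), ("married", "==", True)})
print(cache.search(q1))  # (0,)
print(cache.search(q2))  # (0,)  same criteria, served from the cache
print(cache.scans)       # 1
```

The dictionary in this sketch is unbounded, so it grows with every distinct query; a practical cache has to limit its size and decide which stored results to discard.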
The use of caching is well known in the art, and cache memory is employed in a number of applications in the field of computer science. For example, the central processing unit (CPU) of a computer has a block of cache memory, typically fast static RAM, which is used to store information that is likely to be required again in the near future. Similarly, computer hard disks incorporate cache memory to speed up retrieval of frequently accessed data, as do web browsers and web servers.
A cache memory must have a finite size; otherwise it would continue to grow indefinitely and would eventually become larger than the database with which it is associated. If the cache memory becomes very large, retrieving data from it can become more time-consuming, and it can also consume an excessive share of the system's limited resources. The size of the cache memory is therefore chosen to strike a balance between having enough capacity to store a useful number of results and being small enough to be searched quickly without consuming too much of the available resources. Eventually, then, the cache memory will become full, and no new search results can be added without removing existing ones. A common approach to managing this is simply to remove the results that have been used least recently. However, this does not take into account how useful those search results are, so the cache can often discard search query results that are more useful than the new ones which replace them. As a result, the set of query results retained in the cache memory is not optimal.
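The least-recently-used (LRU) approach described above can be sketched in Python to make its limitation concrete. This is a minimal, generic illustration under stated assumptions, not the arrangement of any particular system: the class name `LRUResultCache`, the query keys and the per-entry `cost` value (standing in for, say, the seconds a query took to compute) are all hypothetical. Because eviction looks only at recency, the cost is recorded but never consulted.

```python
from collections import OrderedDict
from typing import Hashable, Optional, Tuple


class LRUResultCache:
    """Fixed-capacity result cache with least-recently-used (LRU) eviction.

    Each entry also records a hypothetical 'cost' (e.g. seconds the query took
    to compute). LRU eviction stores this value but never consults it.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # Ordered from least recently used (front) to most recently used (back).
        self.entries: "OrderedDict[Hashable, Tuple[object, float]]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[object]:
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # a hit makes this entry the most recent
        return self.entries[key][0]

    def put(self, key: Hashable, result: object, cost: float) -> Optional[Hashable]:
        """Insert a result; return the key evicted to make room, if any."""
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = (result, cost)
        if len(self.entries) > self.capacity:
            evicted_key, _ = self.entries.popitem(last=False)  # drop the oldest
            return evicted_key
        return None


cache = LRUResultCache(capacity=2)
cache.put("expensive_query", [0, 7, 42], cost=30.0)
cache.put("cheap_query_1", [3], cost=0.1)
print(cache.put("cheap_query_2", [5], cost=0.1))  # expensive_query
print(cache.get("expensive_query"))               # None
```

With a capacity of two, inserting a third result evicts the expensive entry solely because it is the least recently used, although recomputing it would cost far more than recomputing either of the cheap entries that remain.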
It is desired to overcome or substantially reduce at least some of the above-described problems with the database searching systems that currently form the state of the art.