The proliferation of computer technology has made it an essential tool wherever people wish to store and extract information from large pools of data. Marketing and customer intelligence are just two examples of fields which rely increasingly on computer technology for storing data and extracting useful information from it. For example, it is now common practice for marketers to work with very large databases containing details of their existing and prospective customers. In order to tailor their advertising, they need to be able to select customers fitting a particular profile. For instance, an exemplary query could be ‘find all clients under 20, who are married and are earning over £40,000 per year’. The database must therefore be able to perform this type of query efficiently.
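By way of illustration only, such a profile query might be expressed over a simple in-memory table as follows; the field names and records are hypothetical and not taken from the source:

```python
# Hypothetical client records; field names and values are illustrative only.
clients = [
    {"name": "A", "age": 19, "married": True,  "income": 45000},
    {"name": "B", "age": 19, "married": False, "income": 50000},
    {"name": "C", "age": 35, "married": True,  "income": 42000},
]

# 'Find all clients under 20, who are married and are earning
# over £40,000 per year.'
matches = [c for c in clients
           if c["age"] < 20 and c["married"] and c["income"] > 40000]

print([c["name"] for c in matches])  # ['A']
```

In a real deployment the same predicate would be evaluated over millions of records held in external storage rather than a small in-memory list, which is precisely where the performance problem discussed below arises.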
As marketers frequently need to perform this type of query, performance expectations for speed and efficiency are high, while datasets are becoming increasingly large, making searches slower and more computationally expensive. Traditional databases scan their large external datastores for the required results, perform any further processing there, and only then return the final result to the client. However, a database may contain vast amounts of data, making the search time-consuming, while large datastores generally have slow retrieval times.
Accordingly, existing data storage and searching techniques and systems are not sufficiently fast to address the mismatch between the increasing workload placed on data storage systems and the need for speed and efficiency in querying databases and retrieving data selections.
Several approaches have been employed to seek to address this problem, with limited success.
One popular approach has been to incorporate very fast integrated-circuit memory to reduce reliance on hard disc access for each database access, as disc access is one of the most time-consuming parts of a search and retrieval process. A major disadvantage of this approach, however, is its very high cost. There are also physical limits to the improvement in access times achievable in this way, ultimately limiting the increase in speed that can be obtained.
An alternative approach is to pre-calculate all possible queries and query combinations, so as to eliminate the time required to retrieve the results when a client performs a search. This technique, however, does not actually solve the problem of slow data retrieval, but merely conceals it from the user by transferring it to the provider. A further disadvantage is that updating the database becomes more complicated, as all the pre-calculated results need to be recalculated.
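A minimal sketch of this pre-calculation strategy, assuming a small fixed set of query predicates (the predicate names and data are illustrative, not from the source):

```python
# Hypothetical dataset and a fixed set of query predicates.
records = [
    {"age": 19, "income": 45000},
    {"age": 30, "income": 30000},
    {"age": 18, "income": 52000},
]

predicates = {
    "under_20": lambda r: r["age"] < 20,
    "over_40k": lambda r: r["income"] > 40000,
}

# Pre-calculate the result of every predicate (and, in a full system,
# every combination of predicates) ahead of time, so that a client query
# becomes a simple lookup rather than a scan of the datastore.
precomputed = {name: [r for r in records if p(r)]
               for name, p in predicates.items()}

# Query time: retrieval is now a dictionary lookup.
print(len(precomputed["under_20"]))  # 2

# The drawback noted above: any update to 'records' invalidates every
# precomputed result, all of which must then be recalculated.
```

Even this toy version shows why the approach scales poorly: the number of predicate combinations grows exponentially, and every insert or update forces the provider to redo the work.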
Another approach is the use of columnar databases. As the data in a column is of uniform type, adjacent records are similar and therefore more compressible. This compression permits columnar operations such as MIN, MAX, SUM, COUNT and AVG to be performed very rapidly. Another benefit of columnar databases over row-oriented databases is that they are self-indexing, thereby occupying less disc space than a relational database management system containing the same data; columnar compression can reduce the disc space required further. However, due to the compression, data retrieval is very inefficient, and decompression is also time-consuming, adding an extra step to the process.
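The column-oriented layout described above can be sketched as follows; the run-length encoding shown is one simple compression scheme chosen for illustration, and the data is hypothetical:

```python
# Row-oriented layout: one tuple per record.
rows = [("UK", 100), ("UK", 150), ("FR", 200), ("FR", 120)]

# Column-oriented layout: one list per field. Adjacent values in a
# column are of uniform type and often similar, which aids compression.
columns = {
    "country": ["UK", "UK", "FR", "FR"],
    "sales":   [100, 150, 200, 120],
}

def rle(values):
    """Run-length encode a column: repeated adjacent values compress well."""
    out = []
    for v in values:
        if out and out[-1][0] == v:
            out[-1] = (v, out[-1][1] + 1)
        else:
            out.append((v, 1))
    return out

print(rle(columns["country"]))  # [('UK', 2), ('FR', 2)]

# Aggregates such as SUM, MIN, MAX, COUNT and AVG touch only one column,
# rather than reading every field of every row.
print(sum(columns["sales"]))                          # 570
print(min(columns["sales"]), max(columns["sales"]))   # 100 200
```

Retrieving a complete record from this layout, by contrast, requires reassembling one value from each column (and decompressing any encoded columns first), which illustrates the retrieval inefficiency noted above.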
The present invention seeks to address at least some of the problems and provide an improved data retrieval method.