These days much data is generated and stored in digital form. Since the 1980s the world's capacity to digitally store information has increased by over twenty percent per year. In 2012, about 2.5 exabytes (2.5×10^18 bytes) of data were created every day. Some of this data is publicly available; other parts are held privately by companies.
The term ‘big data’ is often used in this connection for a collection of data so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
Much of this data is stored in large databases, sometimes referred to as data warehouses. Such databases can store millions or even billions of records. Each record can be associated with thousands of items of data.
There is a general need to be able to query databases to uncover records that match predetermined content, e.g. to determine which records contain certain items of data. However, with the explosive growth in the number of data records, it becomes increasingly difficult to formulate queries that properly yield records providing the desired information. It will be clear that a query yielding a large number of records still leaves the user in doubt as to which records are more or less relevant.
Therefore, there is a specific need to query databases efficiently and intuitively. It is also of great importance to be able to perform queries in real time, i.e. with minimal delay. Delays are often seen as a severe hindrance in querying, and may dissuade people from continuing to query a database. In other words, people simply give up and stop querying if delays are perceived as annoying. In present times of relatively fast computing, delays of as little as a few tenths of a second can already be perceived as prohibitively annoying.