In general, a database is an organized collection of data. A relational database, conceptually, can be organized as one or more tables, where a table is a two-dimensional structure with data values organized in rows and columns. A row of a table contains the data values for one record of the table. A column of the table contains the data values of one field of the table across multiple records (rows) of the table. A database management system (“DBMS”) mediates interactions among a database, users and applications in order to organize, create, update, capture, analyze and otherwise manage the data in the database.
Some DBMSs implement column-oriented storage of data in a database. A database that uses column-oriented storage is a column-store database. A column-store database can include one or more tables. In a column-store database, a table of data is partitioned into separate columns, and the values of each column are stored contiguously in storage or memory. The columns of a table typically have the same length (number of records, or rows). The columns are independent, in that a column does not necessarily have to be written directly after the column that precedes it in the table. Column-oriented storage is efficient when aggregating values in a single column. Column-oriented storage also facilitates compression. On the other hand, inserting a new record, selecting a whole record, or processing data values on a record-after-record basis in a column-store database involves writing or reading values in multiple columns, which can be inefficient.
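The trade-off described above can be illustrated with a minimal sketch in Python. The table, its field names (id, region, amount) and the helper functions are hypothetical, and plain Python lists stand in for contiguous storage; this is not how any particular DBMS lays out its data.

```python
# Hypothetical three-record table with fields id, region and amount.
FIELDS = ("id", "region", "amount")
rows = [
    (1, "north", 120.0),
    (2, "south", 75.5),
    (3, "north", 30.0),
]

# Column-oriented layout: the values of each column are kept together,
# and every column has the same length (one entry per record).
column_store = {
    name: [record[position] for record in rows]
    for position, name in enumerate(FIELDS)
}

# Aggregating a single column reads only that column's values.
total_amount = sum(column_store["amount"])


def select_record(store, index):
    """Selecting a whole record reads one value from every column."""
    return tuple(store[name][index] for name in FIELDS)


def insert_record(store, record):
    """Inserting a record writes one value to every column."""
    for name, value in zip(FIELDS, record):
        store[name].append(value)


insert_record(column_store, (4, "east", 10.0))
second_record = select_record(column_store, 1)
```

In this sketch the aggregate touches one list, while select_record and insert_record each touch all three, which mirrors the cost asymmetry noted in the paragraph above.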
In many scenarios, data stored in a database can be accessed using a client application. For example, a client application transforms user input from a data analyst into queries issued to the DBMS managing the database. Typically, a database query is written in a database query language such as a structured query language (“SQL”). SQL is a special-purpose language designed for manipulation of data managed by a DBMS. SQL is an example of a declarative language. (In general, a declarative language specifies what a computer program should accomplish, without specifying how to accomplish it as a sequence of steps or actions. In contrast, an imperative language specifies a computer program as statements that change the program's state, using an explicit sequence of steps or actions.) A DBMS receives a database query from a client application, processes the database query and returns results of the database query to the client application. When it processes the database query, the DBMS can generate an intermediate representation of the database query, which specifies operations for retrieval and transformation of data responsive to the database query, then execute the intermediate representation of the database query.
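As a rough illustration of the declarative/imperative distinction, the following Python sketch computes the same result both ways. It uses the sqlite3 module from the Python standard library purely as a convenient stand-in for a DBMS (SQLite is not a column-store database), and the sales table and its contents are hypothetical.

```python
import sqlite3

# In-memory database acting as a stand-in DBMS for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "north", 120.0), (2, "south", 75.5), (3, "north", 30.0)],
)

# Declarative: the query states what result is wanted; the DBMS decides
# how to retrieve and combine the data.
declarative_total = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = ?", ("north",)
).fetchone()[0]

# Imperative: the client spells out each step and updates program state
# (the running total) explicitly.
imperative_total = 0.0
for _record_id, region, amount in conn.execute(
    "SELECT id, region, amount FROM sales"
):
    if region == "north":
        imperative_total += amount

conn.close()
```

Both variables end up holding the same sum; the difference lies in who chooses the sequence of steps, the DBMS or the client code.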
For a column-store database, a database query can define an operation (a so-called linear operation) performed on one or more columns of data, on a column-after-column basis. A database query can also define an operation (a so-called non-linear operation) performed on one or more rows of data, on a row-after-row basis. Linear operations and non-linear operations can easily be specified in SQL. When a DBMS processes a database query, performing linear operations specified in SQL is typically fast and efficient, but performing non-linear operations specified in SQL can be slow and inefficient.
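The two kinds of operations can be contrasted with a small Python sketch over an in-memory column layout. The column names, the data and the per-record discount rule are hypothetical and chosen only for illustration; the sketch shows the access pattern of each kind of operation, not how a DBMS actually executes it.

```python
# Hypothetical column-store table: one list of values per column.
columns = {
    "quantity": [2, 5, 1],
    "unit_price_cents": [1000, 400, 9900],
}

# Column-after-column: each operation makes one pass over a single column.
total_quantity = sum(columns["quantity"])
max_unit_price_cents = max(columns["unit_price_cents"])


def iter_rows(cols):
    """Reassemble records by reading one value from every column per row."""
    names = list(cols)
    for values in zip(*(cols[name] for name in names)):
        yield dict(zip(names, values))


def line_total_cents(record):
    """Hypothetical per-record rule: 10% off when quantity is 5 or more."""
    total = record["quantity"] * record["unit_price_cents"]
    if record["quantity"] >= 5:
        total -= total // 10
    return total


# Row-after-row: every step needs values from several columns at once.
line_totals = [line_total_cents(record) for record in iter_rows(columns)]
```

The column-wise aggregates read one list each, whereas the row-wise computation has to visit every column for every record, which is the access pattern the paragraph above associates with slower processing.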