The present invention relates to storage of data in databases, and in particular, to processing a database query.
Unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
A database is an electronic filing system that stores data in a structured way. The primary storage structure in a database is a table. A database may contain multiple tables and each table may hold information of a specific type. Database tables store and organize data in horizontal rows and vertical columns. Rows typically correspond to real-world entities or relationships that represent individual records in a table. Columns may denote specific attributes of those entities or relationships, such as “name,” “address” or “phone number.” For example, Company X may have a database containing a “customer” table listing the names, addresses and phone numbers of its customers. Each row may represent a single customer and the columns may represent each customer's name, address and phone number.
Databases are generally stored in computer memory that is one-dimensional. Two-dimensional database tables must therefore be mapped onto a one-dimensional data structure to be stored within a database. One mapping approach involves storing a table in a database row-by-row (i.e., a row-oriented storage model). This approach keeps information about a single entity together. For example, row-by-row storage may store all information about a first customer first, then all information about a second customer and so on. Alternatively, a table may be stored in a database column-by-column (i.e., a column-oriented storage model). This approach keeps like attributes of different entities together. For example, column-by-column storage may store all customer names first, then all customer addresses and so on.
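By way of illustration only, the two mappings described above may be sketched as follows. The table contents and the function names `to_row_store` and `to_column_store` are hypothetical and are chosen solely for this example; an actual database may lay out its data quite differently.

```python
# Hypothetical "customer" table with columns: name, address, phone number.
table = [
    ("Alice", "1 Main St", "555-0100"),
    ("Bob", "2 Oak Ave", "555-0101"),
]


def to_row_store(rows):
    """Flatten row-by-row: every field of the first record, then the second, and so on."""
    return [value for row in rows for value in row]


def to_column_store(rows):
    """Flatten column-by-column: all names, then all addresses, and so on."""
    return [value for column in zip(*rows) for value in column]


row_store = to_row_store(table)
column_store = to_column_store(table)
```

In this sketch, `row_store` keeps each customer's fields adjacent to one another, while `column_store` keeps all values of the same attribute adjacent to one another.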
Data must generally be accessed from a table in the same manner that it was stored. That is, conventional computer storage techniques require dedicated query operators that can access specific types of storage models. For example, row query operators are used to process data stored in a database in row-formatted storage models and column query operators are used to process data stored in column-formatted storage models. Choosing which storage model to use thus often depends on how data will be used. Row-oriented storage models are commonly well-suited for transactional queries, while column-oriented storage models are generally well-suited for analytical queries. Accordingly, conventional query processing schemes are tightly bound to the underlying storage model of the database being queried.
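The dependence of query operators on the storage model may be illustrated with a minimal sketch. The operators `row_lookup` and `column_scan`, the sample data, and the fixed table dimensions are assumptions made for this example and do not represent the operators of any particular database system.

```python
# Hypothetical table with columns: id, name, balance.
rows = [(1, "Alice", 120.0), (2, "Bob", 80.0), (3, "Carol", 200.0)]
NUM_ROWS = len(rows)
NUM_COLS = len(rows[0])

# The same table in the two one-dimensional layouts.
row_store = [value for row in rows for value in row]
column_store = [value for column in zip(*rows) for value in column]


def row_lookup(store, i):
    """Row operator: fetch one whole record as a single contiguous slice."""
    return tuple(store[i * NUM_COLS:(i + 1) * NUM_COLS])


def column_scan(store, j):
    """Column operator: read one attribute of every record as a single contiguous slice."""
    return store[j * NUM_ROWS:(j + 1) * NUM_ROWS]


record = row_lookup(row_store, 1)            # transactional-style point lookup
total = sum(column_scan(column_store, 2))    # analytical-style aggregate
```

Each operator in this sketch assumes one specific layout. Applying `row_lookup` to `column_store`, or `column_scan` to `row_store`, would return values from unrelated records or attributes, which is one way to see why such operators are bound to their storage model.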
In practice, however, a database that stores certain data in a column-formatted storage model may be asked to handle a transactional query relating to that data, and a database that stores certain data in a row-formatted storage model may be asked to handle an analytical query relating to that data. For example, a database having data stored in a row-formatted storage model may receive a mixed set of queries requiring both transactional and analytical processing of that data.
In responding to such a mixed set of queries, a query engine may seek to perform a mixed join operation, that is, a join between data stored in row format and data stored in column format. U.S. patent application Ser. No. 12/982,673 entitled “Processing Database Queries Using Format Conversion” was filed Dec. 30, 2010 and is hereby incorporated by reference in its entirety for all purposes. That patent application describes performing a mixed join indirectly in a conversion-based way. According to certain embodiments of this approach, row table data is converted into column format and then the join is performed in the column engine, or column table data is converted into row format and then the join is performed in the row engine.
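The general shape of a conversion-based mixed join may be sketched as follows. This is a simplified illustration under stated assumptions and is not the implementation described in the incorporated application: the functions `rows_to_columns` and `column_join`, the hash-based equi-join, and the sample tables are all hypothetical, and the two tables are assumed to share no column names other than the join key.

```python
def rows_to_columns(rows, column_names):
    """Conversion step: copy row-format data into a column-format table."""
    return {name: [row[i] for row in rows] for i, name in enumerate(column_names)}


def column_join(left, right, key):
    """Toy 'column engine' equi-join over two column-format tables."""
    # Build a hash index from join-key value to positions in the right table.
    index = {}
    for pos, value in enumerate(right[key]):
        index.setdefault(value, []).append(pos)

    result = {name: [] for name in list(left) + [n for n in right if n != key]}
    for lpos, value in enumerate(left[key]):
        for rpos in index.get(value, []):
            for name in left:
                result[name].append(left[name][lpos])
            for name in right:
                if name != key:
                    result[name].append(right[name][rpos])
    return result


# Hypothetical inputs: one table in row format, one in column format.
customers_rows = [(1, "Alice"), (2, "Bob")]
orders_cols = {"customer_id": [2, 1, 2], "amount": [30, 10, 25]}

# Convert the row table first, then join entirely in column format.
customers_cols = rows_to_columns(customers_rows, ["customer_id", "name"])
joined = column_join(orders_cols, customers_cols, "customer_id")
```

Note that `rows_to_columns` materializes a complete second copy of the row table before the join begins. That extra copy is the conversion step of this sketch, performed in addition to the join itself.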
However, the conversion overhead of a mixed join is usually not trivial in terms of performance and memory consumption. Therefore, it may not be desirable to use conversion-based mixed join queries in situations involving performance-critical workloads. Accordingly, the present disclosure addresses this and other issues with systems and methods for implementing a conversion-free native mixed join function.