Column-oriented data structures, which serve as the basis for column stores, have recently matured into a powerful component of today's enterprise applications. Column stores can be found as standalone in-memory database systems (e.g. the SAP HANA platform, available from SAP AG of Walldorf, Germany) or, alternatively, integrated into other types of systems, for example business software systems such as enterprise resource planning (ERP) systems or the like. Column-store-based data management architectures have proven to be superior to traditional row stores in terms of performance, in particular for the online analytical processing (OLAP) workloads that are common in data warehousing and business intelligence (BI) applications. One reason for this improved performance relative to previously available approaches is the ability to efficiently process (compressed) column-oriented, in-memory data structures through hardware-optimized scans. Several recent developments, such as transactional environments and compression techniques, have further improved performance and functionality.
For traditional warehouse and BI applications, the primary focus in terms of information needs has been on transactional data. Such data includes information about products, manufacturers, suppliers, customers, sales and shipment transactions, and the like. While storage and query processing techniques have been highly optimized for analytical workloads operating on such data, the fact that the majority of the data also has a geographic component has mostly been neglected. Although point-of-sale and customer records typically contain address information, the characteristics of such data, and tailored functionality such as spatial and topological predicates, have received little attention. A typical current approach is to geocode address information and to manage the resulting latitude and longitude values for addresses in extra fields. This approach is also applied in conventional column-oriented data management architectures in support of BI applications. With such arrangements, geographic coordinates corresponding to addresses are managed in standard columns that simply contain floating-point numbers. In other words, geographic data is not handled natively but is managed and queried using techniques that are employed for traditional numeric and textual data.
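The conventional arrangement described above can be illustrated with a minimal sketch. The column names, the sample coordinates, and the helper `rows_in_bounding_box` are hypothetical and chosen only for illustration; the point is that a spatial question is reduced to plain numeric range comparisons over two floating-point columns.

```python
from array import array

# Hypothetical columns: one entry per customer record, aligned by row position.
customer_id = array("q", [101, 102, 103, 104])
latitude = array("d", [49.29, 52.52, 48.14, 49.49])
longitude = array("d", [8.64, 13.40, 11.58, 8.47])


def rows_in_bounding_box(lat_col, lon_col, lat_min, lat_max, lon_min, lon_max):
    """Return row positions whose coordinates fall inside the box.

    The spatial question is expressed as four numeric range comparisons,
    evaluated by scanning both columns in full.
    """
    return [
        row
        for row, (lat, lon) in enumerate(zip(lat_col, lon_col))
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max
    ]


# Rectangular window query; anything beyond a rectangle (distance, containment
# in a polygon, adjacency) would need further hand-written arithmetic.
hits = rows_in_bounding_box(latitude, longitude, 49.0, 50.0, 8.0, 9.0)
matching_ids = [customer_id[row] for row in hits]
```

Nothing in this representation tells the system that the two columns form a point, which is why spatial and topological predicates have to be emulated rather than evaluated natively.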
An intuitive approach to adding spatial features to column stores is to employ some of the proven spatial index structures that are used in relational database management systems (DBMS). Viable candidates for such an approach are R-tree variants, k-d trees, or quadtrees, among many other (specialized) spatial index structures. However, these index structures are mainly targeted towards efficient access to secondary storage structures and therefore focus on block-optimized read and write operations. For column-organized data managed in an in-memory database, such tree-based index structures are not appropriate. There are several reasons for this limitation. First, column stores gain their performance through optimized scans of vector data that is not chunked into blocks. Second, tree-based index structures like the ones mentioned above incur overhead in space and time complexity because the index itself needs to be managed. In particular, nodes in these tree structures are linked in support of search and update procedures. Thus, they do not provide for continuous scans of main-memory structures but instead require following link structures.
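The difference between the two access patterns can be sketched as follows. This is an illustrative simplification, not one of the index structures named above: a one-dimensional, unbalanced binary search tree stands in for a linked tree index, and the names `Node`, `insert`, `scan_range`, and `tree_range` are hypothetical.

```python
from array import array
from dataclasses import dataclass
from typing import Optional

# Contiguous column: values sit next to each other and are read in storage order.
values = array("d", [8.47, 8.64, 11.58, 13.40])


def scan_range(column, low, high):
    """Sequential scan: visit every position of the vector exactly once."""
    return [pos for pos, v in enumerate(column) if low <= v <= high]


@dataclass
class Node:
    """Hypothetical tree node; the child links are extra state the index must maintain."""
    key: float
    pos: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, key, pos):
    """Insert a key by following links from the root (unbalanced, for brevity)."""
    if root is None:
        return Node(key, pos)
    if key < root.key:
        root.left = insert(root.left, key, pos)
    else:
        root.right = insert(root.right, key, pos)
    return root


def tree_range(node, low, high, out):
    """Range search: each step dereferences a link instead of advancing an offset."""
    if node is None:
        return out
    if low < node.key:  # smaller keys live in the left subtree
        tree_range(node.left, low, high, out)
    if low <= node.key <= high:
        out.append(node.pos)
    if node.key <= high:  # equal or larger keys live in the right subtree
        tree_range(node.right, low, high, out)
    return out


root = None
for pos, v in enumerate(values):
    root = insert(root, v, pos)

scan_hits = scan_range(values, 8.0, 9.0)
tree_hits = sorted(tree_range(root, 8.0, 9.0, []))
```

Both functions return the same row positions. The scan touches the column in a single pass over contiguous memory, whereas the tree stores additional link fields per entry and reaches each candidate by following references, which is the overhead the paragraph above describes.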