Technological advancements and cost reductions over time have enabled computers to become commonplace in society. Enterprises employ computers to collect and analyze data. For instance, computers can be employed to capture data about business customers that can be utilized to track sales and/or customer demographics. In addition, individuals interact with a plurality of non-enterprise computing devices including home computers, laptops and mobile devices. As a consequence of computer ubiquity, an enormous quantity of digital data is generated daily by both enterprises and individuals.
Large quantities of such data are housed in one or more databases and/or data warehouses. A database is a collection of data or facts organized in a systematic manner and persisted to a storage device. Similarly, a data warehouse is a much larger repository composed of a plurality of databases. In one instance, businesses can store customer information (e.g., name, address, product(s) purchased, date, location . . . ) in one or more databases. For example, a transactional database can capture current data, and aged data can be pushed to a warehouse. In another instance, entity and/or individual web pages can be housed in one or more databases.
Various components and/or systems are associated with respective stores to facilitate interaction with database data. For example, database management systems (DBMS) and warehouse management systems (WMS) provide functionality to manage requests or queries from users and/or programs, amongst other things. Upon receipt of a query, such a system returns results that satisfy the query. In this manner, users need not be knowledgeable as to how and where data is physically stored. Rather, programmers implement and expose an interface to users, which hides or abstracts such details.
While such functionality is convenient for users, back-end query processing is difficult to implement efficiently, especially over large data sets. Processing large quantities of data for data mining or analysis, for example, is problematic at least because the data set size is not conducive to simple sequential processing, which results in unacceptable response latency. Consequently, techniques need to be developed and employed to facilitate parallel and/or distributed processing. While problematic in its own right, this task is complicated by the fact that programmers conventionally need to understand many of the low-level details necessary for parallel and distributed execution.
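By way of illustration only, the contrast between sequential and partitioned processing can be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular system: the function names (count_sales, partition, parallel_count_sales) and the (product, quantity) record shape are hypothetical, and a thread pool stands in for whatever parallel or distributed back end would actually be employed. Even in this small form, the programmer must explicitly split the data, dispatch the work, and merge the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_sales(records):
    """Sequential baseline: total quantity sold per product."""
    counts = Counter()
    for product, quantity in records:
        counts[product] += quantity
    return counts


def partition(records, n):
    """Split records into at most n contiguous chunks of similar size."""
    size = max(1, -(-len(records) // n))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def parallel_count_sales(records, workers=4):
    """Count each partition independently, then merge the partial counts.

    Assumption: a thread pool is used only to keep the sketch small; a
    process pool or a cluster of machines would play the same role.
    """
    chunks = partition(records, workers)
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_sales, chunks):
            total.update(partial)  # Counter.update adds counts together
    return total
```

Because the per-partition counts are simply added together, the merged result equals the sequential result regardless of how the records are split.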