Advancements in networking and computing technologies have enabled transformation of computers from low performance/high cost devices capable of performing basic word processing and executing basic mathematical computations to high performance/low cost machines capable of a myriad of disparate functions. For example, a consumer level computing device can be employed to aid a user in paying bills, tracking expenses, communicating nearly instantaneously with friends or family across large distances by way of email or instant messaging, obtaining information from networked data repositories, and numerous other functions/activities. Computers and peripherals associated therewith have thus become a staple in modern society, utilized for both personal and business activities.
Additionally, electronic storage mechanisms have enabled massive amounts of data to be accumulated by individuals and/or companies. For instance, data that previously required volumes of books for recordation can now be stored electronically without the expense of printing paper and in a fraction of the physical space needed for storage of paper. In one particular example, deeds and mortgages that were previously recorded in paper volumes can now be stored electronically. Moreover, advances in sensors and other electronic mechanisms now allow massive amounts of data to be collected and stored. For instance, GPS systems can determine location of an individual or entity by way of satellites and GPS receivers, and electronic storage devices connected thereto can then be employed to retain the locations so determined. Various other sensors and data collection devices can also be utilized to obtain and store data.
Some business models rely heavily on an ability to process extremely large amounts of data. For instance, a search engine can collect a significant amount of data relating to millions of users, such as age and other demographic information. In another example, a database that tracks alterations in the stock market can be associated with a tremendous amount of data, particularly if such tracking is done in a granular manner. If a user desires to retrieve a particular entry or multiple entries from such a collection of data, the user can generate a query in a particular database query language, and data is organized and extracted from the database according to the query.
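By way of illustration only, the query-based retrieval described above can be sketched as follows. The sketch assumes SQL as the database query language and uses the SQLite engine bundled with Python; the table name `ticks`, its columns, and the sample rows are hypothetical and are not drawn from any particular system.

```python
import sqlite3

# Hypothetical in-memory database standing in for a much larger data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (symbol TEXT, ts INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO ticks VALUES (?, ?, ?)",
    [
        ("AAA", 1, 10.0),
        ("AAA", 2, 10.5),
        ("BBB", 1, 20.0),
        ("AAA", 3, 9.75),
        ("BBB", 2, 21.0),
    ],
)

# The query states which entries are wanted and how they are organized;
# the database extracts and orders the matching rows accordingly.
rows = conn.execute(
    "SELECT ts, price FROM ticks WHERE symbol = ? ORDER BY ts",
    ("AAA",),
).fetchall()

print(rows)
conn.close()
```

The same query text applies regardless of how many rows the table holds; only the time required to evaluate it changes as the amount of data grows.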
When there is a small amount of data, such as within a spreadsheet application, this data processing can be undertaken quite quickly. When the amount of data becomes quite large, however (e.g., multiple terabytes), processing such data can be computationally expensive and require a great deal of time. One conventional manner of reducing processing time is to select a sample set of the data and perform processing on such sample set, wherein a size of the sample set can be dependent upon an amount of time necessary to process such sample set. While this reduces processing time, accuracy is compromised, particularly in data mining applications, because results computed from the sample set only approximate results for the full collection of data. Another available approach is to reduce functionality and thereby lower the computing resources necessary to process large amounts of data.
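The sampling approach described above can be illustrated with a minimal sketch. It assumes that processing cost is roughly proportional to the number of records, so that a time budget and a processing rate together determine the sample size; the function names, the rate of 1,000 records per second, and the synthetic data are all hypothetical.

```python
import random
import statistics


def sample_size_for_budget(time_budget_s, records_per_second, total_records):
    """Largest sample that fits the time budget, capped at the full data set."""
    if records_per_second <= 0:
        raise ValueError("records_per_second must be positive")
    return min(total_records, int(time_budget_s * records_per_second))


def approximate_mean(records, time_budget_s, records_per_second, seed=0):
    """Estimate the mean from a uniform random sample sized to the budget."""
    n = sample_size_for_budget(time_budget_s, records_per_second, len(records))
    if n == 0:
        raise ValueError("time budget too small to process any record")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(records, n)
    return statistics.fmean(sample), n


# Synthetic data standing in for a large collection.
records = [float(i % 1000) for i in range(100_000)]

exact = statistics.fmean(records)
estimate, n = approximate_mean(records, time_budget_s=0.5, records_per_second=1000)

print(f"sample size: {n} of {len(records)}")
print(f"exact mean: {exact}, sampled estimate: {estimate:.2f}")
```

In this sketch only 500 of the 100,000 records are processed, which shortens processing time considerably, but the estimate generally differs from the exact value. A smaller time budget yields a smaller sample and, typically, a larger error, which is the accuracy trade-off noted above.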