The present invention relates generally to the field of data management, and more particularly to analyzing data sets.
A data set is a collection of data, typically organized as a table in which every column represents a particular variable and each row corresponds to a given member of the data set. The data set lists a value for each of the variables, such as the height and weight of an object, for each member of the data set. Commonly, a data set corresponds to the contents of a single database table or a single statistical matrix. The values in a data set may be numbers, such as real numbers or integers (e.g., representing a person's height in centimeters), but may also be nominal data (i.e., data not consisting of numerical values), for example, representing a characteristic of a person. More generally, values may be of any of the kinds described as a level of measurement. For each variable, the values are normally all of the same kind. However, some values may be missing.
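By way of illustration only, the row-and-column organization described above can be sketched as follows. The sketch assumes a small hypothetical data set of three members; the variable names (height_cm, weight_kg, eye_color), the helper functions, and the use of None to mark a missing value are assumptions made for this example and are not drawn from any particular database system.

```python
# Hypothetical data set: each dict is one row (a member of the data set),
# and each key is one column (a variable). None marks a missing value.
data_set = [
    {"height_cm": 180.0, "weight_kg": 75.5, "eye_color": "brown"},
    {"height_cm": 165.0, "weight_kg": None, "eye_color": "blue"},
    {"height_cm": 172.5, "weight_kg": 68.0, "eye_color": None},
]


def column(rows, variable):
    """Return every value recorded for one variable, in row order."""
    return [row[variable] for row in rows]


def missing_count(rows, variable):
    """Count the members for which the variable has no recorded value."""
    return sum(1 for value in column(rows, variable) if value is None)


def value_kinds(rows, variable):
    """Return the set of Python type names seen among the non-missing values."""
    return {type(v).__name__ for v in column(rows, variable) if v is not None}


if __name__ == "__main__":
    for name in ("height_cm", "weight_kg", "eye_color"):
        print(name, value_kinds(data_set, name), missing_count(data_set, name))
```

In this sketch, height_cm and weight_kg hold numerical values while eye_color holds nominal values, and each variable holds values of a single kind apart from the missing entries.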
Database analytics has seen an emerging emphasis on analyzing massive and complex data sets (i.e., big data). Big data is a term for a collection of data sets so large or complex that the collection becomes difficult to process using traditional data processing applications. Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time.