The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also correspond to implementations of the claimed inventions.
The technology disclosed relates to automatic generation of tuples from a record set for outlier analysis. With this technology, users need not specify which 1-tuples to combine into n-tuples. The tuples are generated from structured records or objects organized into features (which may also be called properties, fields, objects or attributes). Tuples are generated from combinations of feature values in the records, and thresholding is applied to limit the number of tuples generated.
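One way to picture tuple generation with thresholding is the minimal sketch below. It is not the disclosed implementation. It assumes that each record is a dict mapping a feature name to a value, that a 1-tuple is a single (feature, value) pair, and that the threshold is a minimum occurrence count. The function name `generate_tuples` and the parameters `max_n` and `min_count` are hypothetical. The pruning rule is borrowed from Apriori-style frequent-itemset mining: an n-tuple is counted only if every (n-1)-tuple inside it survived the previous threshold. Any outlier scoring on the kept tuples is outside the scope of the sketch.

```python
from collections import Counter
from itertools import combinations


def generate_tuples(records, max_n=2, min_count=2):
    """Hypothetical sketch: build n-tuples of (feature, value) pairs from
    records (dicts) and keep only tuples seen at least min_count times."""
    kept = {}
    # 1-tuples: every (feature, value) pair observed in each record.
    counts = Counter()
    for rec in records:
        for item in sorted(rec.items()):
            counts[(item,)] += 1
    frequent = {t for t, c in counts.items() if c >= min_count}
    kept.update({t: counts[t] for t in frequent})

    # Combine surviving tuples into larger n-tuples, one size at a time.
    for n in range(2, max_n + 1):
        counts = Counter()
        for rec in records:
            # Sorting by feature name gives each combination a canonical order.
            items = sorted(rec.items())
            for combo in combinations(items, n):
                # Assumed pruning rule: skip combos with any sub-tuple
                # that fell below the threshold at the previous size.
                if all(sub in frequent for sub in combinations(combo, n - 1)):
                    counts[combo] += 1
        frequent = {t for t, c in counts.items() if c >= min_count}
        kept.update({t: counts[t] for t in frequent})
    return kept
```

With three records such as `{"provider": "A", "procedure": "X", "state": "CA"}`, `{"provider": "A", "procedure": "X", "state": "NV"}` and `{"provider": "B", "procedure": "Y", "state": "CA"}` and `min_count=2`, only the three 1-tuples seen twice and the 2-tuple pairing procedure X with provider A survive. Raising `min_count` or lowering `max_n` shrinks the output further, which is the role the text assigns to thresholding.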
Big data systems now analyze large data sets in many useful ways. However, systems that implement big data approaches often depend heavily on the expertise of an engineer who has studied the data set and its expected structure. The more features a data set has (features are sometimes called fields or attributes of a record), the more combinations of features and feature values there are to analyze.
Accordingly, an opportunity arises to analyze large data sets automatically, quickly and effectively. Automatically spotting outliers in data relationships can reveal patterns and trends in many ways. Such patterns and trends sometimes indicate fraud, as in insurance reimbursement claims, and at other times have commercial value.