Unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Embodiments relate to data handling for analysis purposes, and in particular, to methods and apparatuses for automatically enriching a data set with data types via inference. Specific embodiments automatically assign a data type to data stored in a database, utilizing one or more inferences applied alone or in combination.
Databases, and overlying applications referencing the data stored therein, offer a powerful way of analyzing large volumes of data that are related in various ways. In particular, discrete values of stored data may be organized into larger data structures comprising related fields.
Such larger data structures may also be referred to as data objects, and may be represented in the form of tables having rows and columns. Through the skillful and intuitive presentation of such data structures in the form of tables and charts, a user can describe complex issues and the factual/forecast data underlying those issues.
A data type represents data corresponding to a common category. For example, in a data structure comprising a United States address, one data type may be the zip code field, whose values comprise five (or nine) digits.
In certain cases raw data received for handling by an information storage system may not include any indication of data type. For analytical purposes, however, it can be desirable to classify that raw data as a particular data type in order to facilitate its inclusion within larger data structures.
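As an illustration only, the classification of untyped raw data can be sketched as a simple pattern-based inference. The sketch below is not the method of any particular embodiment; the function name `infer_zip_code_type`, the label `"zip_code"`, and the 0.9 match threshold are hypothetical choices made for this example.

```python
import re

# Assumed pattern: five digits, optionally followed by four more digits,
# with or without a separating hyphen (e.g. "94105", "94105-1234", "941051234").
ZIP_PATTERN = re.compile(r"\d{5}(?:-?\d{4})?")


def infer_zip_code_type(values, threshold=0.9):
    """Return "zip_code" if enough non-empty values look like zip codes, else None.

    `threshold` is a hypothetical tuning parameter: the fraction of non-empty
    values that must match the pattern before the type is assigned.
    """
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return None  # nothing to infer from
    matches = sum(1 for v in cleaned if ZIP_PATTERN.fullmatch(v))
    return "zip_code" if matches / len(cleaned) >= threshold else None


# Hypothetical raw column received without any data type indication.
raw_column = ["94105", "10001-1234", "60614", " 73301 "]
inferred = infer_zip_code_type(raw_column)
```

A threshold below 1.0 is used here so that a small number of malformed entries does not prevent the type from being assigned; whether such tolerance is appropriate would depend on the data set.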
Such enrichment of a data set with data type may be valuable to inform the user regarding the nature of additional data values expected to be received. Furthermore, enriching data sets with data types can allow deduction of the operations to be applied to different portions of the data. For example, an aggregation operation performed on a “zip code” field of a “U.S. Address” data object would be expected to return groupings of data useful in geographic analysis.
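The aggregation example can be made concrete with a minimal sketch. The records, field names (`zip_code`, `amount`), and the helper `aggregate_by` below are hypothetical and serve only to show how knowing that a field holds zip codes suggests a grouping useful for geographic analysis.

```python
from collections import defaultdict

# Hypothetical "U.S. Address" style records carrying a numeric measure.
records = [
    {"city": "San Francisco", "zip_code": "94105", "amount": 120.0},
    {"city": "San Francisco", "zip_code": "94105", "amount": 80.0},
    {"city": "New York", "zip_code": "10001", "amount": 50.0},
]


def aggregate_by(rows, key_field, value_field):
    """Sum `value_field` over all rows that share the same `key_field` value."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_field]] += row[value_field]
    return dict(totals)


# Grouping on the field typed as a zip code yields per-area totals.
totals_by_zip = aggregate_by(records, "zip_code", "amount")
```

Without the data type, nothing distinguishes the zip code field from any other string of digits, and a system would have no basis for proposing this grouping.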
Data types can be manually assigned by a user. Such manual approaches, however, may be time-consuming, as they require user insight to recognize complex or subtle relationships between data fields, and then to implement those relationships in abstract underlying data structures.