This description relates to data sets.
A table of a typical relational database, for example, represents a data set of records. Each record has data values in fields that have been defined for the table. Each field can have at most one value for the attribute represented by the field. The table has a unique key that distinguishes the records from one another unambiguously. The relationships among the tables of the database are normally defined in advance, and all of the data and the tables are represented in a commonly shared native format. In addition to performing transactions in the database, a user typically can view, through an interface provided by a database application, the records of each table and combinations of data contained in related tables.
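By way of illustration only, the properties of such a table can be sketched using Python's standard sqlite3 module. The table names, field names, and sample values below are hypothetical and are not taken from any particular database; the sketch merely shows a unique key rejecting a second record with the same key value, and a combined view of two tables whose relationship was defined in advance.

```python
import sqlite3

# In-memory database; all table and field names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Each table defines its fields in advance and has a unique key.
conn.execute(
    "CREATE TABLE customer ("
    " customer_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL)"
)
# The relationship between the two tables is also defined in advance.
conn.execute(
    "CREATE TABLE purchase ("
    " purchase_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(customer_id),"
    " item TEXT NOT NULL)"
)

conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.execute("INSERT INTO purchase VALUES (10, 1, 'lamp')")


def insert_duplicate_key(connection):
    """Try to add a second record with key 1; return True if accepted."""
    try:
        connection.execute("INSERT INTO customer VALUES (1, 'Bob')")
    except sqlite3.IntegrityError:
        # The unique key keeps records unambiguously distinguishable.
        return False
    return True


# A combined view of data contained in the two related tables.
rows = conn.execute(
    "SELECT c.name, p.item"
    " FROM customer AS c"
    " JOIN purchase AS p ON p.customer_id = c.customer_id"
).fetchall()
```

In this sketch the join plays the role of the combined view that a database application would present to a user.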
Sometimes, related data of an enterprise are not held in a predefined, well-disciplined database but are generated as separate files, data sets, or data streams that may have different, unrelated formats. Although the data in each of these sources may be construed as records, the delimitation of the records into fields, for example, may not be defined within the sources. Sometimes the data in different sources, though related, may be inconsistent or repetitive.
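As a hypothetical illustration of this situation, the following Python sketch shows two sources that hold the same record in unrelated formats, one pipe-delimited and one fixed-width. Neither source states where its fields begin and end, so in this sketch the field layout is assumed to be supplied from outside the source. The function names, field names, and sample data are illustrative assumptions only.

```python
def split_delimited(line, delimiter, field_names):
    """Split a delimited line into named fields (layout supplied externally)."""
    return dict(zip(field_names, line.split(delimiter)))


def split_fixed_width(line, layout):
    """Slice a fixed-width line using (name, start, end) column positions."""
    return {name: line[start:end].strip() for name, start, end in layout}


# Source A: pipe-delimited text with no header naming its fields.
source_a_line = "1001|Ada Lovelace|London"

# Source B: the same record as fixed-width text (widths 4, 14, and 10).
source_b_line = "1001" + "Ada Lovelace".ljust(14) + "London".ljust(10)

# Field definitions that are not present in either source.
field_names = ["id", "name", "city"]
fixed_layout = [("id", 0, 4), ("name", 4, 18), ("city", 18, 28)]

record_a = split_delimited(source_a_line, "|", field_names)
record_b = split_fixed_width(source_b_line, fixed_layout)

# The two sources repeat the same data once their fields are delimited.
is_repetitive = record_a == record_b
```

Only after the fields are delimited in this way can the repetition between the two sources be detected by comparing the resulting records.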
U.S. Pat. No. 7,512,610, issued Mar. 31, 2009, owned by the same company as this patent application, and incorporated here by reference in its entirety, describes a way to process a source file, data stream, or data set to make its data easily accessible and viewable as records that can be manipulated and analyzed by a user.