There exist numerous systems for storing data entities for subsequent retrieval. Typically, these systems take the form of an electronic database. Electronic databases are available in numerous types, such as flat file or relational. A relational database is typically a collection of data items organized as a set of formally described tables from which data can be accessed or reassembled in different ways without having to reorganize the database tables.
Information systems can be examined at different levels of abstraction, principally as a physical data model and as a logical data model. As an example of a physical data model, a relational database management system (RDBMS) can be implemented physically using, for example, an indexed file capability executing on an operating system. The RDBMS presents a logical model to its user: one consisting of tables with rows and columns, typically supporting SQL queries and amenable to techniques such as data normalization. While relational databases have proven useful, there are significant limitations to this method of storing data.
In particular, the logical data model does not capture semantic meaning. For example, a relational database table might store city names in a first column and the respective population values in a second column, with one city per row. The bare fact that two records are adjacent provides no information about the relationship between those records. Thus, while useful information can be extracted from a relational database, there is additional semantic value that existing relational databases do not inherently capture or represent.
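The point about adjacency can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table name, city names, and population figures below are hypothetical values chosen only for illustration. The same rows are read back in two orders, and the set of neighbouring records changes even though the stored data does not.

```python
import sqlite3

# Hypothetical table and values, used only to illustrate the point.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO city (name, population) VALUES (?, ?)",
    [("Springfield", 120000), ("Shelbyville", 95000), ("Capital City", 450000)],
)

# Read the same rows back in two different orders.
by_name = conn.execute(
    "SELECT name, population FROM city ORDER BY name"
).fetchall()
by_population = conn.execute(
    "SELECT name, population FROM city ORDER BY population"
).fetchall()


def neighbours(rows):
    """Return the pairs of city names that sit next to each other in rows."""
    return [(a[0], b[0]) for a, b in zip(rows, rows[1:])]


print(neighbours(by_name))
print(neighbours(by_population))
```

Both queries return identical records, yet which records are adjacent depends entirely on the requested ordering, so adjacency by itself carries no semantic relationship.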
Furthermore, prior art relational database systems are ill-suited to storing information relating to taxonomies and anatomies. Taxonomies and anatomies are methods for describing a system in which an element's relative and absolute location in the system provides information about the element's specific role or function in the whole system. While this type of information from a taxonomy or anatomy could be stored in an additional field in a relational database, the relationship between the database records still would not provide any information about the relationship between the data entities.
In another existing system, OLAP cubes allow for “spinning and slicing” (pivot table-like) manipulation of N-dimensional cubes of data, and the elements of the N dimensions can be organized as a hierarchy. However, OLAP cubes have several limitations. They are not well suited to handling variable tree depths, nor are they designed for navigating to a component within the N-cube space. Rather, OLAP cubes simply facilitate analysis across the several dimensions. OLAP cubes are also not intended to support sparse matrices in which, at some levels in some dimensions, data is absent for valid reasons.
While semantic web technology (such as OWL, OPML, RDF, etc.) can represent taxonomies and ontologies and is purposely designed to do so, its focus is on knowledge representation of an ad hoc nature and on creating relationships between such ad hoc knowledge. This capability is due in part to the semantic web's having been derived significantly from artificial intelligence approaches to knowledge representation and relationships, famously the “IS-A” relationship: [crimson is-a red], [red is-a color], and [color is-a physical-attribute] together allow the conclusion that crimson is a physical-attribute. Notably, however, the semantic web does not readily enable navigation via multiple standardized, ordered coordinates. Furthermore, these semantic web approaches do not attempt to implement or take advantage of domain-specific, statistically relevant categories in defining such standardized coordinates as an inherent part of their technology.
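The “IS-A” inference described above can be sketched in a few lines of Python. This is only an illustration of chaining is-a links transitively, not a representation of how any particular semantic web toolkit works. The names IS_A, ancestors, and holds are hypothetical.

```python
# Direct is-a links taken from the example in the text.
IS_A = {
    "crimson": "red",
    "red": "color",
    "color": "physical-attribute",
}


def ancestors(term, links=IS_A):
    """Follow is-a links upward and return every broader term, nearest first."""
    chain = []
    seen = {term}
    while term in links:
        term = links[term]
        if term in seen:  # guard against accidental cycles in the links
            break
        seen.add(term)
        chain.append(term)
    return chain


def holds(term, category, links=IS_A):
    """Return True if 'term is-a category' follows from the stored links."""
    return category in ancestors(term, links)


print(ancestors("crimson"))
print(holds("crimson", "physical-attribute"))
```

The inference runs along a single chain of broader terms. Nothing in this structure assigns crimson a position along several independent, ordered coordinates, which is the limitation the passage identifies.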
The inability to derive semantic meaning from data structures becomes particularly acute when prior art systems operate on unstructured data. While some taxonomies and anatomies have a fundamental data structure, other types of data may appear to be completely unstructured.
For example, there exist various economic taxonomies such as the Global Industry Classification Standard (GICS) developed by MSCI and Standard & Poor's (S&P) for use by the global financial community. The GICS structure consists of 10 sectors, 24 industry groups, 68 industries and 154 sub-industries into which S&P has categorized all major public companies. The system is similar to ICB (Industry Classification Benchmark), a classification structure maintained by Dow Jones Indexes and FTSE Group. This taxonomy, however, has certain limitations. In particular, it does not define relationships common to all companies. Consequently, a user cannot compare two companies' GICS classifications to derive relevant meaning unless the classifications share a common ancestor in the taxonomy. While the taxonomy allows for great granularity and differentiation between companies, it lacks standardized values for comparison, since the rules for being at one level in one group are unrelated to the rules for being at the corresponding level in another group. Thus, meaningful comparisons across disparate industries are not possible. Furthermore, the ten-sector structure can be criticized as an arbitrary identification of ten sectors; there is not necessarily a relationship between one sector and the next. As a result, storing such data in a relational database table could not provide any additional semantic meaning, because the original data entities did not have a consistent and uniform relationship among themselves.
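The common-ancestor limitation can be made concrete with a short Python sketch. It assumes a hierarchical eight-digit code in which each successive pair of digits identifies one of the four levels; this layout is an assumption here and is not stated in the text above. The specific codes used are illustrative values, and the function names are hypothetical.

```python
LEVELS = ("sector", "industry group", "industry", "sub-industry")


def path(code):
    """Split an eight-digit code into its four cumulative level prefixes."""
    if len(code) != 8 or not code.isdigit():
        raise ValueError("expected an eight-digit classification code")
    return [code[: 2 * (i + 1)] for i in range(len(LEVELS))]


def deepest_shared_level(code_a, code_b):
    """Return the deepest level two codes have in common, or None."""
    shared = None
    for level, prefix_a, prefix_b in zip(LEVELS, path(code_a), path(code_b)):
        if prefix_a != prefix_b:
            break
        shared = level
    return shared


# Illustrative codes only: the first pair diverges at the last level,
# the second pair diverges at the very first level.
print(deepest_shared_level("10102010", "10102020"))
print(deepest_shared_level("10102010", "45103010"))
```

When two codes share no sector, the comparison yields nothing at all: the taxonomy offers no common coordinate on which companies in disparate sectors could be measured against each other.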
Due to these types of limitations and others, no prior art classification system has been based on clearly defined relationships between its constituent elements, because there has not been an underlying data model for the attributes on which such a system is based. Additionally, there has been no underlying model for economics and, even if there had been, there would have been no suitable data structure for representing that model.