The analysis of complex systems, including the search and navigation of large data sets associated with, for example, biological, chemical, mechanical, physical, political, and economic systems, is challenging without an underlying functional model. Representations of these systems and their constituent subsystems frequently have been derived inductively or in an ad hoc fashion, and often are irreconcilable with one another.
The lack of ontological consistency, expressiveness, and interoperability of these representations inhibits the capacity to characterize phenomena associated with complex systems, to predict their behavior accurately, to develop normative models of their outcomes, and to search complex functional data.
In existing search and navigation systems, search results are displayed as independent categories; successive iterations rely on metrics gathered from user-generated search and browsing history, as well as on ad hoc content categorization systems derived from observation of phenomena.
When data is organized in coordinate format geographically or temporally, it is significantly easier to use for query, navigation, and action. Other data management domains lack prior art methods for establishing such an underlying standardized coordinate model. Applications and search tools derived from prior art methods are frequently most effective when applied toward short-term, individualized requests with clear geographic and temporal aspects, such as restaurant delivery, current celebrity information, and taxi retrieval services. For substantive issues related to complex systems, such as environmental, economic, and political systems, the lack of a structured syntax for organizing the underlying qualitative information leaves the associated search and recommendation systems vulnerable to misinformation, and it provides users with little to no mechanism to engineer and improve the underlying systems or to understand relationships between the parts and the whole.
Current machine learning techniques that seek to model phenomena regarding complex systems frequently suffer from the curse of dimensionality. This is a result of a nonsystematic approach to generating a representative space: there is frequently no ex ante reason for any relationship among the dimensions, and any correlations among them are likely to be random.
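One symptom of the curse of dimensionality is distance concentration: as the number of unrelated dimensions grows, the nearest and farthest points in a sample become nearly indistinguishable, so distance-based modeling degrades. The sketch below (illustrative only; the function name and sample sizes are assumptions, not part of any prior art method) measures the relative spread of distances to random points in the unit cube as the dimension increases.

```python
import math
import random

def concentration_ratio(dim: int, n_points: int = 200, seed: int = 0) -> float:
    """Relative spread (d_max - d_min) / d_min of distances from the origin
    to uniformly random points in the unit cube of the given dimension."""
    rng = random.Random(seed)
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.sqrt(sum(x * x for x in point)))
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min

# As the dimension grows, distances concentrate: the relative spread shrinks,
# so "near" and "far" carry progressively less information.
for d in (2, 10, 100, 1000):
    print(d, round(concentration_ratio(d), 3))
```

Running the loop shows the ratio falling steadily with dimension, which is why a representative space assembled from arbitrary, unrelated dimensions tends to defeat neighborhood-based search and learning.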
A machine learning application usually begins by asking a question. As a generalized example, one might want to classify data into a set of categories, such as tagging the content of a video with descriptive labels, or determining whether a patient has a disease, or is exposed to a health risk, given a set of test results. Often in these classification use cases, the categories that the application predicts have no a priori proximity relationship with one another. As a result, when comparing prediction results from such tests, machine learning applications that rely on error-prone classification systems can fall short. In such cases, while one can identify which data were classified incorrectly and the confidence of each prediction, it is impossible to tell the machine how far off a classification was, only that it was wrong. Both the predictions and the assessments of those predictions lack an adequate notion of proximity in the output space.
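The distinction can be made concrete with a small sketch. Assuming a hypothetical ordinal layout over the categories (here, invented disease-severity stages), a flat 0/1 error rates every mistake identically, while a proximity-aware error distinguishes a near miss from a gross one. The metric names and data are illustrative assumptions, not part of any described system.

```python
def flat_error(y_true, y_pred):
    """Standard 0/1 classification error: every mistake counts equally."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def proximity_error(y_true, y_pred, index):
    """Distance-aware error: penalize a prediction by how far the predicted
    category sits from the true one in an assumed ordinal layout."""
    return sum(abs(index[t] - index[p]) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical ordinal categories, e.g. disease-severity stages.
stages = {"none": 0, "mild": 1, "moderate": 2, "severe": 3}

y_true  = ["mild", "moderate", "severe", "none"]
model_a = ["moderate", "mild", "moderate", "mild"]  # always off by one stage
model_b = ["severe", "none", "none", "severe"]      # off by two or three stages

# Both models are wrong on every example, so flat error cannot tell them apart...
print(flat_error(y_true, model_a), flat_error(y_true, model_b))  # 1.0 1.0
# ...but a proximity-aware metric reveals that model A is far closer.
print(proximity_error(y_true, model_a, stages))  # 1.0
print(proximity_error(y_true, model_b, stages))  # 2.5
```

Without some proximity structure over the output categories, no such graded assessment is possible, which is the shortcoming the passage above describes.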
The techniques are often highly opaque even to those who design, implement, and use them, frequently rendering it virtually impossible to audit or verify their results until they have already impacted the underlying system. The techniques also generally require extremely large data sets to attain an acceptable level of accuracy, and that accuracy tends to decline significantly when phenomena in a test set diverge even modestly from those of a training set, a common challenge when dealing with information about complex systems.