Field of the Disclosure
The present embodiments define a means of computing with a numerical distance field that defines concepts with respect to their attributes or descriptions, in order to provide a relativistic conceptual distance measurement system. This relativistic distance-field measurement between attribute or feature data, whose clusters represent concepts, is used to induce an ontology as a directed graph over a network of concepts by adding a vector-valued potential function to each point of the distance field. Methods of quantum computing are well suited to the present embodiments because all representations can be made in a Hilbert space for other operations based on the relativistic concept measuring system.
The relativistic conceptual distance measurement system is applied to data clustering and concept similarity measures, and to reasoning with the distance field model when inducing an ontology from raw data about concepts. The encoding of semantic information in numerical distance fields according to the present embodiments leads naturally to a field-structured representation of conceptual semantic knowledge in which the discrete algebra describing the semantic entities is directly related to the continuous algebra of (vector) fields. The method maps the discrete definitional attributes from the ontology onto a continuous distance field in multidimensional space. This permits search functionality by simply writing search equations, and permits the use of algorithms such as “beam sweep” to identify non-obvious though related concepts within a range of salience in a space of concept distance fields.
An ontology can be simply a list of sets of concepts, arranged from left to right with the leftmost concept being the most general and the rightmost being the most specific. Every ontology has a Top and a Bottom. Therefore, a very simple ontology appears as: [Top, Transportation, Vehicle, Car, (Honda, Ford, BMW, Chevrolet), Bottom]
In this ontology, the set (Honda, Ford, BMW, Chevrolet) is a subset of “Car”, “Car” is a subset of “Vehicle”, and “Vehicle” is a subset of “Transportation”. The Top level of an ontology subsumes everything (i.e., it is the set that contains every subset in the universe exclusive of itself). The Bottom is the set that contains no subset and, being empty, also contains itself.
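By way of non-limiting illustration, the simple ontology above can be sketched in Python as an ordered list of levels, most general first. The names `ONTOLOGY`, `level_of` and `subsumes` are hypothetical and are introduced here only for illustration; the sketch assumes the strictly linear arrangement of the example and is not the distance-field representation of the present embodiments.

```python
# Illustrative sketch only: the example ontology as an ordered list of
# levels, arranged from most general (left) to most specific (right).
# All names here are hypothetical assumptions, not part of the disclosure.
ONTOLOGY = [
    {"Top"},
    {"Transportation"},
    {"Vehicle"},
    {"Car"},
    {"Honda", "Ford", "BMW", "Chevrolet"},
    {"Bottom"},
]


def level_of(concept):
    """Return the index of the level that holds `concept` (0 is the Top)."""
    for depth, level in enumerate(ONTOLOGY):
        if concept in level:
            return depth
    raise KeyError(f"unknown concept: {concept}")


def subsumes(general, specific):
    """True if `general` lies strictly to the left of `specific`.

    This holds only for a linear ontology such as the example above.
    """
    return level_of(general) < level_of(specific)


if __name__ == "__main__":
    print(subsumes("Vehicle", "Honda"))  # Vehicle is more general than Honda
    print(subsumes("Honda", "Car"))      # Honda is not more general than Car
```

In this sketch the Top subsumes every other entry and the Bottom is subsumed by every other entry, which mirrors the ordering described above.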
Furthermore, any data may be clustered and ranked using the numerical distance computation of the present invention, which associates semantic distances between data partitioned as concepts using an ontology of metadata, in order to provide similarity, search and reasoning processes.
Description of the Related Art
There is no related art in the quantum computing literature addressing how to represent concepts or ontologies in vector-valued or affine distance fields to provide a relativistic concept measurement system. In contrast, the system and method of the present embodiments can be used for reasoning or inducing ontologies from the distance-field representation. Superposition of multiple distance fields and the representation of relationships between different conceptual contexts are handled seamlessly by the present embodiments, as is described below.
Ontologies provide the relationships between concepts, and computing distance measures enables better concept clustering when processing data. Sometimes it is difficult to identify the ontology from raw data, and at other times it is hard to use ontologies to cluster or reason with data. For example, measures of semantic similarity based on WordNet (an ontology of the English language from Princeton) have been widely used in Natural Language Processing. The measures rely on the hierarchical structure of WordNet to produce a numeric score that represents how similar (or not) two concepts are, each concept being represented by a sense or synset. In their simplest form these measures use path length to identify concepts that are physically close to each other and are therefore considered to be more similar than concepts that are further apart. All the measures in the literature generally rely to varying degrees on the idea of a least common subsumer (LCS); this is the most specific concept that is a shared ancestor of the two concepts. None of the measures is nonlinear and relativistic. For example, all the measurement approaches in the literature give the measure from any concept in the hierarchy to the Top as being greater than, and not equal to, zero.
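By way of illustration of the related-art measures only, a path-length measure based on a least common subsumer can be sketched in Python as follows. The toy hierarchy `PARENT`, the function names, and the inverse form 1/(1 + path length) are assumptions chosen for illustration; they are not WordNet data and are not the measure of the present embodiments.

```python
# Illustrative sketch of a conventional path-length measure.
# PARENT is a hypothetical toy hierarchy (child -> parent), not WordNet.
PARENT = {
    "Transportation": "Top",
    "Vehicle": "Transportation",
    "Car": "Vehicle",
    "Truck": "Vehicle",
    "Honda": "Car",
    "Ford": "Car",
}


def ancestors(concept):
    """Return the chain from `concept` up to the root, concept first."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def least_common_subsumer(a, b):
    """Most specific concept found on both ancestor chains."""
    b_chain = set(ancestors(b))
    for node in ancestors(a):  # walks from most specific to most general
        if node in b_chain:
            return node
    raise ValueError("concepts share no ancestor")


def path_length(a, b):
    """Number of edges from `a` up to the LCS plus from `b` up to the LCS."""
    lcs = least_common_subsumer(a, b)
    return ancestors(a).index(lcs) + ancestors(b).index(lcs)


def path_similarity(a, b):
    """Assumed inverse-path-length form: shorter paths score higher."""
    return 1.0 / (1 + path_length(a, b))


if __name__ == "__main__":
    print(least_common_subsumer("Honda", "Ford"))   # Car
    print(path_length("Honda", "Truck"))            # 3
    print(path_length("Honda", "Top"))              # 4
```

In this sketch the path length from any concept other than the Top to the Top is strictly positive, which is the property of the conventional measures noted above.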
A good distance measure greatly improves clustering and data-mining processes and reduces false positives and false negatives. A good ontology provides precision, accuracy and coverage of data from the broadest to the most specific and detailed levels, in the form of a partitioning of the attributes and features or primitive concepts that compose other more complex or hierarchical concepts.
The “background” description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description which may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present invention.