1. Field of the Invention
The present invention relates to computer network-based structured data and, more specifically but not exclusively, to semantic similarity measures for structured data.
2. Description of the Related Art
This section introduces aspects that may help facilitate a better understanding of the invention. Accordingly, the statements of this section are to be read in this light and are not to be understood as admissions about what is prior art or what is not prior art.
Data on a computer-based network, such as the World Wide Web, can be linked using structured metadata. Structured data enables applications to be developed and to interoperate using machine-readable and machine-understandable vocabularies. For instance, in network management, the structured data of different pieces of equipment can be compared to detect failures and to propose recovery solutions. In a banking context, so-called “structured big data” can represent banking transactions and user profiles, which can be analyzed to create value, for example by proposing targeted products or advertisements to customers.
In the context of data structured using Semantic Web principles, data is annotated with concepts and properties that have been formally defined in an ontology, i.e., defined using the logical constructors of a given description logic. Such semantically enriched structures are usually compared by applying one or more similarity measures that attempt to characterize how similar different structures are or how they relate to each other. A plethora of similarity measures for ontological data have been designed, relying on different points of view to interpret data descriptions (e.g., based on the main concept that they embody, taking into account all their features, etc.).
The problem with existing similarity measures is that the methodology used to compute similarities may easily lead to poor results when complex semantic descriptions are based on highly expressive description logics. In particular, either the similarity measures ignore most of the semantics (i.e., the logical constructs used to represent the concepts and properties mapped onto data) or they take such semantics into account too strongly, leading to weak similarity measurements for two concepts that would be considered close from a human point of view.
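The two failure modes described above can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: the Description class, the two toy measures, and the router example are assumptions and do not correspond to any specific existing similarity measure or ontology. A description is reduced here to a main concept plus a set of (property, filler) restrictions, which is far simpler than a description in an expressive description logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Description:
    """Hypothetical, highly simplified semantic description."""
    main_concept: str
    restrictions: frozenset  # set of (property, filler) pairs


def name_only_similarity(a: Description, b: Description) -> float:
    """Toy measure that ignores the logical constructs entirely:
    only the main concept name is compared."""
    return 1.0 if a.main_concept == b.main_concept else 0.0


def strict_similarity(a: Description, b: Description) -> float:
    """Toy measure that takes the constructs into account too strongly:
    any difference in the restrictions yields zero similarity."""
    return 1.0 if a == b else 0.0


# Two router descriptions that share one restriction and differ in another.
router_a = Description(
    "Router",
    frozenset({("hasPort", "EthernetPort"), ("runsProtocol", "OSPF")}),
)
router_b = Description(
    "Router",
    frozenset({("hasPort", "EthernetPort"), ("runsProtocol", "BGP")}),
)

# The first measure cannot see that the routers differ at all.
print(name_only_similarity(router_a, router_b))
# The second measure treats the routers as entirely unrelated.
print(strict_similarity(router_a, router_b))
```

In this sketch the first measure reports the two descriptions as identical even though they differ in one restriction, while the second reports them as entirely dissimilar even though a human reader would consider them close.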