1. Technical Field
The embodiments herein generally relate to data mining and particularly relate to extracting and resolving relationships from a large collection of data. The embodiments herein more particularly relate to a method and system for building a relationship hierarchy from the extracted relationships.
2. Description of the Related Art
Big data is a large collection of information that derives its content from a plurality of structured, unstructured and semi-structured data sources. Big data calls for a paradigm shift in the way data has been looked at in the past. Data can no longer reside in isolated pockets that cannot talk to each other. It is imperative that all of the data be processed as a single whole. Recognizing the entities and the relationships that the entities share is a first step toward understanding big data.
An entity is a unit of data which has an independent existence and is also referred to as an object that makes independent sense. A relationship describes an association between two or more entities. The relationship between two or more entities helps in understanding the characteristics and behavior of the entities. In a database context, the entities and relationships help in structurally storing the contents of data. While relationships are rather explicit in structured data, they are largely implicit in unstructured data.
Identifying relationships in text data is itself a fairly complex problem and requires multi-stage resolution techniques to arrive at semantically resolved relationships. These resolutions can be at the grammar level, such as the resolution of anaphora, or at a contextual level specific to a domain. However, these relationships need to be further resolved into generic groups of similar relationships, and a hierarchy among the relationships must be established in order to bring out the implied inferences while querying.
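By way of illustration only, the role of a hierarchy in bringing out implied inferences while querying can be sketched as follows. The relationship names, the `PARENT` table, the sample facts and the `ancestors` and `query` helpers are all hypothetical and are not taken from the embodiments; the sketch assumes that each relationship type has at most one parent.

```python
# Hypothetical child -> parent links between relationship types.
PARENT = {
    "ceo_of": "employed_by",
    "employed_by": "affiliated_with",
    "founder_of": "affiliated_with",
}

# Hypothetical resolved facts: (entity, relationship, entity).
facts = [
    ("Alice", "ceo_of", "Acme"),
    ("Bob", "founder_of", "Acme"),
    ("Carol", "lives_in", "Paris"),
]

def ancestors(relation):
    """Return the relation followed by all of its ancestors in the hierarchy."""
    chain = [relation]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def query(relation, facts):
    """Return facts whose relation is the queried one or a descendant of it."""
    return [fact for fact in facts if relation in ancestors(fact[1])]

# A query on the generic relationship also returns the more specific facts.
affiliated = query("affiliated_with", facts)
```

In this sketch a query on the generic relationship "affiliated_with" returns the "ceo_of" and "founder_of" facts even though neither fact states that relationship explicitly, which is the kind of implied inference the hierarchy is meant to expose.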
The existing technology discusses relationship extraction by either Natural Language Processing (NLP) techniques or Machine Learning (ML) techniques. Neither the NLP nor the ML techniques are self-learning, and both require human intervention. The ML models require the preparation of tagged data for training. Ideally, a separate model needs to be built for identifying each type of relationship. This necessitates that the number and types of relationships be fixed first and that the work proceed within that limited space. Open information extraction models, on the other hand, find an unbounded number of relationships: different segments of each sentence are parsed, the entities involved are identified, and a relationship between them is suggested. However, this leads to many forms of the same relationship and therefore to innumerable unresolved relationships.
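By way of illustration only, the problem of many surface forms of one relationship can be sketched as follows. The triples, the `CANONICAL` lookup table and the `group_by_canonical` helper are hypothetical; a hand-written table stands in for the resolution step and is not the method of the embodiments.

```python
from collections import defaultdict

# Hypothetical (subject, relation phrase, object) triples, as an open
# information extraction step might emit them from different sentences.
triples = [
    ("Alice", "works for", "Acme"),
    ("Alice", "is employed by", "Acme"),
    ("Alice", "is an employee of", "Acme"),
    ("Bob", "founded", "Acme"),
]

# Assumed hand-written mapping from surface phrase to a canonical relation.
CANONICAL = {
    "works for": "employed_by",
    "is employed by": "employed_by",
    "is an employee of": "employed_by",
    "founded": "founder_of",
}

def group_by_canonical(triples):
    """Group triples by (subject, canonical relation, object)."""
    groups = defaultdict(list)
    for subj, phrase, obj in triples:
        # Phrases missing from the table stay unresolved under their own text.
        relation = CANONICAL.get(phrase, phrase)
        groups[(subj, relation, obj)].append(phrase)
    return dict(groups)

resolved = group_by_canonical(triples)
```

Here four extracted triples collapse into two resolved relationships. Any phrase absent from the table remains a separate, unresolved relationship, which is how the number of unresolved forms grows when no resolution step is applied.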
Hence, there is a need for a system and method for building a relationship hierarchy which provides an accurate response to a query on big data. Further, there is a need for a method and system for relationship extraction and resolution without any human intervention. Moreover, there is a need for a method and system for resolving relationships using NLP techniques.
The above-mentioned shortcomings, disadvantages and problems are addressed herein, as will be understood by reading and studying the following specification.