1. Field of the Disclosure
The present disclosure relates to the field of text analytics. More particularly, the present disclosure relates to a method and system for natural language processing and monitoring within groups of unstructured data, i.e., text data.
2. Description of the Related Art
According to the Butler Group, approximately 85% of all information is text data stored in an “unstructured” format. Further, according to the Gartner Group, over 7 million web pages having additional text data are added every day. In addition, other streams of data such as social networking sites, e.g., Facebook and Twitter, offer an ever-increasing amount of unstructured text data each day.
Analyzing this unstructured data is of particular interest to the field of text analytics. The unstructured data provides unparalleled data sources for analysis such as feedback on a particular product or service, an area of interest for a particular community, or predicting trends according to opinion sentiment representative of a particular community.
Conventional techniques to analyze data are directed at analysis of data within a record, e.g., a file. Such techniques identify a subject of interest and also related key terms, e.g., verbs or adjectives, to predict and provide an overall sentiment, e.g., positive or negative, of the subject of interest.
Additional techniques offer a simplistic approach and require user input to create a Boolean search, i.e., a search for, and count of, records satisfying a condition such as “X AND Y NOT Z”.
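By way of illustration only, a Boolean search of the kind described above may be sketched as follows. The sketch assumes whole-word, case-insensitive matching; the function names, the tokenization rule, and the sample records are hypothetical and are not taken from any particular conventional system.

```python
import re


def matches_boolean_query(text, required, excluded):
    """Return True if text contains every required term and no excluded term.

    Assumption: terms are matched as whole, lower-cased word tokens.
    """
    tokens = set(re.findall(r"[a-z0-9']+", text.lower()))
    has_all_required = all(term in tokens for term in required)
    has_any_excluded = any(term in tokens for term in excluded)
    return has_all_required and not has_any_excluded


def count_matches(records, required, excluded):
    """Count the records satisfying "required AND ... NOT excluded"."""
    return sum(matches_boolean_query(r, required, excluded) for r in records)


# Hypothetical records; the query is "battery AND screen NOT great".
records = [
    "The battery life is great and the screen is sharp",
    "Battery died quickly, screen is fine",
    "Great screen but no battery complaints here",
]
count = count_matches(records, required={"battery", "screen"}, excluded={"great"})
print(count)
```

Each record in such a sketch is evaluated in isolation and only against terms the user supplied in advance.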
All of the above-mentioned techniques suffer from a narrowly defined set of rules. That is, each of the above-mentioned techniques searches data within a single set of data, e.g., a comment, a sentence, a paragraph, or a document (referred to as a file or record), but, ultimately, each technique fails to analyze the context of the data, such as the relative context provided by comparison amongst different groups of records.
In addition, the above-mentioned technique of a Boolean search requires a user to input a term or terms of interest. Where a new or previously unencountered slang expression appears, the user is likely to overlook it and thus fail to analyze the significance of its occurrence. Moreover, a user may overlook an occurrence of a known term or terms if the user fails to contemplate such term or terms at the time of analysis.
Accordingly, there is a need for a system and method for text analytics that overcomes, alleviates, and/or mitigates one or more of the aforementioned and other deleterious effects of the prior art.