This invention relates to autonomously categorizing textual data. More specifically, statements are extracted from the textual data and classified based on a taxonomy.
Text analytics is essential for understanding unstructured and semi-structured data. Standard methods are used to classify and categorize large amounts of textual data, e.g., call-center data. A conventional approach to text analysis includes determining the relevant facts to be extracted from a source, e.g., a company name or a stock price, determining the relationships shared by those facts, and developing extractors to extract the predefined facts and relationships from the source. This approach requires the relevant facts and relationships to be identified in advance, which is difficult to do.
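The conventional approach described above can be illustrated with a minimal sketch. It assumes two hypothetical extractors, built from regular expressions, for a company name and a stock price, and a single hypothetical relationship named `trades_at`; none of these names come from any particular system. The sketch is intended only to show that every fact type and relationship must be chosen before any text is processed.

```python
import re

# Hypothetical predefined extractors: each pattern targets one fact type
# that was chosen in advance (assumption made for illustration only).
COMPANY = re.compile(r"\b([A-Z][A-Za-z]+ (?:Inc|Corp|Ltd))\b")
PRICE = re.compile(r"\$(\d+(?:\.\d+)?)")


def extract_facts(sentence):
    """Return the predefined facts found in a sentence, plus their relation."""
    company = COMPANY.search(sentence)
    price = PRICE.search(sentence)
    facts = {
        "company": company.group(1) if company else None,
        "stock_price": float(price.group(1)) if price else None,
    }
    # The relationship is also fixed in advance: a company "trades_at" a price.
    if facts["company"] is not None and facts["stock_price"] is not None:
        facts["relation"] = (facts["company"], "trades_at", facts["stock_price"])
    return facts
```

A sentence such as "The customer complained about a late delivery." yields no facts at all under this sketch, because neither of the predefined fact types occurs in it, even though the sentence carries information of interest.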
Some text analytics utilize a parse tree generated for each sentence to extract data from a source. These analytics operate on the surface form of the words, without disambiguation or further classification. Specifically, verb usage is not disambiguated to ascertain different meanings, nor are the facts or relationships classified into different categories. Accordingly, a complete understanding of the sentiment expressed in the extracted data cannot be attained.
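The limitation described above can be illustrated with a minimal sketch. It assumes a hypothetical lookup table that maps the surface form of a verb to a category, and it substitutes simple whitespace tokenization for the parse tree; both are simplifications made for illustration. Because the lookup is keyed only on word form, two different senses of the same verb receive the same category.

```python
# Hypothetical category lookup keyed only on the surface form of a verb
# (assumption made for illustration; no sense disambiguation is performed).
VERB_CATEGORY = {"charged": "billing", "dropped": "call_quality"}


def classify_by_word_form(sentence):
    """Return the category of the first known verb form, or None if none is found."""
    for token in sentence.lower().rstrip(".").split():
        if token in VERB_CATEGORY:
            return VERB_CATEGORY[token]
    return None
```

Under this sketch, "The bank charged me a late fee." and "I charged my phone overnight." are both assigned the hypothetical category `billing`, although only the first sentence concerns billing.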