Many approaches to automated text analysis require a set of keywords which, by applying Boolean matching rules to a large text corpus, define a set of documents for further study (as in search or e-discovery), for counting (as in sentiment and topic analysis), or as a first step in most sophisticated approaches to automated text analysis. In most areas, users choose these keywords and document sets by hand. In some applications, mostly involving search and advertising, methods exist to suggest new keywords by analyzing structured databases, such as those derived from weblogs, meta-tags, or search queries. However, such approaches generally require pre-structured databases, which necessarily limit the scope of new keywords and the directions of further analysis because they rely on the original keywords used to structure the data. As a result, these approaches are often unreliable or provide incomplete information when applied to rapidly evolving bodies of text or discussions. In addition, they incur the time and complexity of producing the structured data from which new keywords are developed. Although some existing techniques mine keywords directly from textual sources, these techniques are quite limited and typically generate new keywords by looking up existing keywords in a thesaurus.
In view of the foregoing, there is a need for systems and methods for analyzing text by extracting keywords from a broader range of information contained in unstructured data.