This invention relates generally to the searching of documents and, more specifically, to methods and apparatus that use sets of semantically similar words for text classification.
Many text classifiers in use today rely on words or combinations of words, sometimes referred to as keywords, to retrieve information or documents. Such text classifiers are typically used in computer search engines and rely on word matching processes. Word matching identifies the documents, paragraphs, or sentences in which the selected keywords appear. The assumption is that, by selecting and searching for several keywords that are likely to be used together in a document, paragraph, or sentence, the retrieved document will contain information of interest to the user.
Existing solutions retrieve information based on one or more of these keywords. While they are effective at finding documents that contain the keywords, they often retrieve tens of thousands, and sometimes millions, of documents. The user is then left with the task of manually inspecting the retrieved documents to find the specific information requested.
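By way of illustration only, the keyword matching described above might be sketched as follows. The function name `match_documents`, the `require_all` flag, and the sample sentences are hypothetical and are not drawn from any particular search engine. The sketch shows that two documents on unrelated subjects are both retrieved because each contains the selected keywords.

```python
import re


def match_documents(documents, keywords, require_all=True):
    """Return the indices of documents that contain the keywords.

    Matching is case-insensitive and on whole words only. When
    require_all is False, a single matching keyword is sufficient.
    """
    wanted = {keyword.lower() for keyword in keywords}
    hits = []
    for index, text in enumerate(documents):
        # Split the text into lowercase alphanumeric words.
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        found = wanted & words
        if (require_all and found == wanted) or (not require_all and found):
            hits.append(index)
    return hits


# Hypothetical sample documents.
documents = [
    "The engine failed after the fuel pump repair.",
    "Fuel prices rose while engine sales fell.",
    "A guide to pump maintenance.",
]

# Both the repair report and the market report match "engine" and "fuel".
print(match_documents(documents, ["engine", "fuel"]))  # [0, 1]
```

The matcher has no notion of what the user is actually looking for, so the number of matches grows with the size of the collection and the user must still inspect each result.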