The present invention relates to document classification, and more specifically, this invention relates to analyzing and classifying textual data within a plurality of documents.
Data classification is an important element of data analysis and management. A large number of websites are served by computers on the Internet and are accessible from many devices. The web pages of these websites contain a large amount of textual data, as do offline and local network data stores, and this text needs to be indexed and classified for data retrieval purposes. However, the sheer number of documents containing such textual data may make it difficult for users to find what they are looking for in a reasonable time and in a logical way.