Text classification is an important component of text mining. Given a set of predefined subject categories, each document is assigned to one of them. An automatic text classification system of this kind helps people find the information they need more effectively. On the one hand, classifying information is one of the most fundamental cognitive processes, and conventional classification research has produced rich results and practical applications. On the other hand, with the rapid growth of text information, especially the proliferation of online text, text classification has come to be regarded as a key technology for processing and organizing large quantities of data. At present, text classification is widely used in various fields. However, as the volume of web-based information increases and users demand higher accuracy and better verification, the requirements placed on text classification technology are also growing. Accordingly, constructing an effective text classification system remains one of the main areas of research in the field of text mining.
In the field of natural language processing, texts are mainly represented using the vector space model (VSM). This method assumes that each text contains a number of concepts expressed by independent attributes, and that each attribute can be regarded as one dimension of the concept space. These independent attributes are called text features, so that a text can be expressed as a set of features, that is, as a vector. The degree of similarity between two vectors is commonly measured by the cosine of the angle between them. The degree of similarity between the text vector and the vector of each candidate category is then used to categorize the text.
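The vector space approach described above can be illustrated with a minimal sketch. It is not taken from any particular system: it assumes plain term counts as feature weights, whitespace tokenization, and one hand-written vector per category. The names `to_vector`, `cosine_similarity`, `classify`, and the two example categories are hypothetical and chosen only for illustration. The similarity is the dot product of the two vectors divided by the product of their lengths.

```python
import math
from collections import Counter


def to_vector(text):
    """Map a text to a sparse term-frequency vector (feature -> count).

    Assumption: lowercase whitespace tokens stand in for text features.
    """
    return Counter(text.lower().split())


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(text, category_vectors):
    """Return the candidate category whose vector is most similar to the text."""
    text_vector = to_vector(text)
    return max(
        category_vectors,
        key=lambda name: cosine_similarity(text_vector, category_vectors[name]),
    )


# Hypothetical category vectors, built here from short keyword lists.
category_vectors = {
    "sports": to_vector("football match team score goal player"),
    "finance": to_vector("stock market bank interest loan price"),
}

label = classify("the team scored a late goal in the match", category_vectors)
print(label)
```

In this sketch the text must be compared against every candidate category, so the cost of classifying one text grows with the number of categories and the number of features per vector.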
With current technologies, the degree of similarity between the text vector and every candidate category has to be calculated. Each of these computations is lengthy, since it relies on the cosine of the angle between the two vectors as its measure. Furthermore, current technologies do not specify how semantics are handled, and the resulting classification is not very accurate.