1. Technical Field
The embodiments herein generally relate to text fragment classification systems and, more particularly, to a system and method for rule-based classification of a text fragment from a multimedia content.
2. Description of the Related Art
Automated text classification is the task of automatically mapping input text to one or more classes. With the increasing availability of large collections of text, classification plays a critical role in managing information and knowledge. Conventional approaches, such as machine learning algorithms, classify sentences in a given language using statistical characteristics of a raw corpus of that language's vocabulary. These methods transform the raw corpus into numerical representations based on the statistical characteristics of the language. Such methods are suited to general-purpose use, where the statistical usage characteristics of the language yield adequate normalization. For example, the Porter stemming algorithm reduces English words to their stems using suffix-stripping rules designed for general English usage. However, in a bounded context governed by specialized domain rules, where each word has a specific meaning, such an algorithm may not work, because it may associate multiple contexts with a word that has only one meaning in that domain. Accordingly, there remains a need for a system that classifies text fragments by implementing domain rules, as distinct from standard natural language processing (NLP) techniques that rely primarily on the probabilistic characteristics of the language concerned.
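The limitation described above can be illustrated with a short Python sketch. It is a hypothetical toy, not Porter's algorithm and not the classification method of the embodiments herein: `generic_stem` is an assumed, simplified suffix stripper standing in for a general-purpose stemmer, and `DOMAIN_RULES` is an assumed rule table for a bounded context in which each word has one fixed meaning.

```python
# Hypothetical illustration only: a toy suffix-stripping normalizer in the
# spirit of general-purpose stemmers. It is NOT Porter's algorithm.
GENERIC_SUFFIXES = ("ization", "ations", "ation", "ings", "ing", "ers", "er", "s")


def generic_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    w = word.lower()
    for suffix in GENERIC_SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


# Hypothetical domain rule table: in a bounded context each surface form
# has one fixed meaning, so it maps directly to a single class label.
DOMAIN_RULES = {
    "organ": "ANATOMY",
    "organization": "ENTITY",
}


def domain_class(word: str) -> str:
    """Look up the class assigned to a word by the (assumed) domain rules."""
    return DOMAIN_RULES.get(word.lower(), "UNKNOWN")


if __name__ == "__main__":
    for w in ("organ", "organization"):
        print(w, "->", generic_stem(w), "|", domain_class(w))
```

Under these assumptions the generic normalizer maps both "organ" and "organization" to the same stem "organ", erasing a distinction that the domain rule table preserves by assigning each word its own class.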