Rule-based named entity extraction is a natural language processing technique that identifies one or more named entities present in unstructured text data based on one or more predefined rules. Examples of named entities include, but are not limited to, person names, products, organizations, locations, email addresses, vehicles, computer parts, currencies, temporal entities such as dates, times, days, years, months and weeks, and numerical entities such as measurements, percentages and monetary values. The rules are regular expressions formulated by domain experts based on the writing style and terminology of the target domain of the unstructured text data.
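By way of illustration only, the manner in which regular-expression rules identify named entities in unstructured text may be sketched as follows. The rule table `RULES`, the function `extract_entities`, and the three patterns shown are hypothetical and are chosen solely for this example; rules formulated by a domain expert for a given target domain would generally differ.

```python
import re

# Hypothetical rules for illustration: entity type -> regular expression.
# These patterns are assumptions, not rules taken from any actual system.
RULES = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PERCENTAGE": re.compile(r"\b\d+(?:\.\d+)?\s?%"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def extract_entities(text):
    """Return (entity_type, matched_text, start_offset) tuples sorted by offset."""
    found = []
    for entity_type, pattern in RULES.items():
        # Apply every rule to the text and record each match with its position.
        for match in pattern.finditer(text):
            found.append((entity_type, match.group(), match.start()))
    return sorted(found, key=lambda item: item[2])


if __name__ == "__main__":
    sample = "Contact jane.doe@example.com by 12/05/2020 for a 15% discount."
    for entity in extract_entities(sample):
        print(entity)
```

In this sketch each rule is embedded directly in program code, so adding or modifying a rule requires editing and redeploying the program.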
While developing a rule-based named entity extraction system, the rules defined by domain experts are implemented in a high-level programming language such as Java, C++, FORTRAN, or PASCAL by users who possess software coding skills. The users who have software coding skills are hereinafter referred to as technical specialists. The domain experts may not possess software coding skills and may require technical specialists at the time of implementation of the rules. The dependency of domain experts on technical specialists increases the cost of developing a rule-based named entity extraction system due to the involvement of additional resources and manpower. Further, the development time of the extraction system increases due to the additional communication required between domain experts and technical specialists. Furthermore, the technical specialists need to make changes at the source code level of the rule-based named entity extraction system whenever the rules have to be modified or updated. The process of making changes at the source code level to modify rules is arduous and time-consuming. As a result, the productivity and efficiency of the technical specialists are decreased. Also, the cost and effort involved in procuring such resources for different environments make the entire process arduous and infeasible.
Further, regression testing is an essential aspect of developing a rule-based named entity extraction system and is conducted to verify the source code of the system whenever the source code is modified. Presently, regression testing is conducted either with a customized external regression testing tool or by adding custom regression testing code to the source code of the extraction system. However, conducting regression testing through either custom testing code or an external regression testing tool is arduous, time-consuming, and requires the involvement of technical specialists.
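For purposes of illustration, custom regression testing code of the kind described above may be sketched as follows. The rule `DATE_RULE`, the functions `extract_dates` and `run_regression`, and the test cases in `REGRESSION_CASES` are hypothetical and assumed only for this example; they do not represent any particular tool.

```python
import re

# Hypothetical rule under test (an assumption for this example).
DATE_RULE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def extract_dates(text):
    """Return every substring of text matched by the date rule."""
    return DATE_RULE.findall(text)


# Hypothetical regression suite: each input text is paired with the output
# expected from a previously verified version of the rule.
REGRESSION_CASES = [
    ("Invoice issued on 3/4/2019.", ["3/4/2019"]),
    ("Meetings on 12/05/2020 and 01/06/2020.", ["12/05/2020", "01/06/2020"]),
    ("Version 1/2 has no date.", []),
]


def run_regression(extractor, cases):
    """Return (text, expected, actual) for every case whose output changed."""
    failures = []
    for text, expected in cases:
        actual = extractor(text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures


if __name__ == "__main__":
    failed = run_regression(extract_dates, REGRESSION_CASES)
    print("all cases passed" if not failed else failed)
```

Such code must be written and re-run by someone with coding skills each time a rule is modified, which is the dependency on technical specialists noted above.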
In light of the above-mentioned disadvantages, there is a need for advanced rule modeling and regression testing tools that facilitate quick and easy development of a rule-based named entity extraction system. The rule modeling and regression testing tools should enable domain experts who do not possess the requisite software coding skills to develop a rule-based named entity extraction system. Further, the rule modeling and regression testing tools should enhance the productivity and efficiency of the technical specialists.