Various sequence tagging approaches have been proposed over recent decades. Such approaches range from tagging tokens with Part-of-Speech (POS) tags to tagging tokens with supertags that encode complex grammatical relations. Conventional tagging approaches can typically be categorized along two dimensions: (a) domain specificity and (b) complexity of the encoded semantic information.
The task of POS tagging is defined as assigning each token a grammatical tag according to its syntactic function in a sentence. Accordingly, this tagging task is largely domain independent, and no semantic information is encoded. State-of-the-art POS tagging systems include those based on Maximum Entropy models and deep learning architectures, such as long short-term memory (LSTM) networks. However, deploying available POS tagging systems as off-the-shelf solutions on text from a domain other than the one on which they were trained often results in a significant performance drop. To compensate for such decreases in performance, domain adaptation approaches have been proposed to achieve decent performance for POS tagging across different domains.
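The per-token assignment described above, and the drop in accuracy outside the training domain, can be illustrated with a minimal sketch. It assumes a most-frequent-tag baseline rather than the Maximum Entropy or LSTM systems mentioned above, and the function names, the tag labels (DET, NOUN, VERB), and the toy sentences are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def train_baseline_tagger(tagged_sentences):
    """Learn, for each lower-cased token, the tag it carries most often."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for token, tag in sentence:
            counts[token.lower()][tag] += 1
    return {
        token: tag_counts.most_common(1)[0][0]
        for token, tag_counts in counts.items()
    }


def tag(tokens, lexicon, default_tag="NOUN"):
    """Assign one tag per token; unseen tokens fall back to default_tag."""
    return [(token, lexicon.get(token.lower(), default_tag)) for token in tokens]


# Hypothetical toy training data from a single domain.
TRAIN = [
    [("the", "DET"), ("court", "NOUN"), ("granted", "VERB"),
     ("the", "DET"), ("motion", "NOUN")],
    [("the", "DET"), ("judge", "NOUN"), ("denied", "VERB"),
     ("a", "DET"), ("motion", "NOUN")],
]

lexicon = train_baseline_tagger(TRAIN)

# Every token was seen in training, so each receives its learned tag.
in_domain = tag(["the", "judge", "granted", "a", "motion"], lexicon)

# "binds" was never seen, so it falls back to NOUN although it is a verb here.
out_of_domain = tag(["the", "enzyme", "binds", "a", "receptor"], lexicon)
```

In this sketch the out-of-domain sentence is mis-tagged only because its vocabulary was absent from training, which is the kind of gap that domain adaptation approaches aim to close.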
One example of a task whose tags encode complex grammatical relationships is Semantic Role Labeling (SRL). SRL goes beyond syntactic labels by adding semantic information, such as AGENT or RESULT, based on thematic roles. Automatic approaches to SRL have been developed over the last few decades, including recent approaches that adopt neural network architectures.
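To make the difference between syntactic and semantic labels concrete, the following sketch shows one possible way to store both kinds of tag on each token and to group tokens into role spans. The `TaggedToken` class, the `spans_by_role` helper, the PREDICATE label, and the example sentence are hypothetical; only the AGENT and RESULT labels come from the description above, and the annotations are written by hand rather than produced by an SRL system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaggedToken:
    text: str
    pos: str                     # syntactic label (POS tag)
    role: Optional[str] = None   # semantic label (thematic role), if any


# Hand-annotated example sentence; labels are illustrative only.
SENTENCE = [
    TaggedToken("The", "DET", "AGENT"),
    TaggedToken("chef", "NOUN", "AGENT"),
    TaggedToken("baked", "VERB", "PREDICATE"),
    TaggedToken("a", "DET", "RESULT"),
    TaggedToken("cake", "NOUN", "RESULT"),
]


def spans_by_role(tokens):
    """Merge adjacent tokens that share a role into (role, text) spans."""
    spans = []
    previous_role = None
    for token in tokens:
        if token.role is not None:
            if token.role == previous_role:
                role, text = spans[-1]
                spans[-1] = (role, text + " " + token.text)
            else:
                spans.append((token.role, token.text))
        previous_role = token.role
    return spans


role_spans = spans_by_role(SENTENCE)
```

Here the POS tags alone say that "chef" and "cake" are both nouns, while the role labels additionally distinguish the entity performing the action from the entity the action produces.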
Natural language processing (NLP) applications typically utilize various tagging approaches to facilitate interactions between users and machines. For example, NLP applications can provide an interface that allows users to interact with machines in the users' natural language, where the machines rely on tagged data to understand, interpret, and/or predict what the natural language input means and/or what actions the users want the machines to take. As a more specific example, NLP applications can provide an interface between a user and an information retrieval system. In such applications, the information retrieval system retrieves data from its underlying inverted index in response to requests from users. Using NLP, a user's input can be structured according to the user's natural language or way of speaking. To ensure that the information retrieval system retrieves the most appropriate, most relevant, most meaningful, and/or most accurate data, the user's natural language input must be properly construed by the NLP application for the information retrieval system. Thus, improvements to the tagging utilized by NLP applications can improve the ability of the NLP application and the information retrieval system to retrieve the most appropriate, most relevant, and/or most meaningful data in response to the user's natural language input.
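The interplay between tagging and retrieval described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any particular system: the documents, the already-tagged query, the choice to keep only NOUN-tagged tokens as search terms, and the names `build_inverted_index` and `retrieve` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy document collection keyed by document identifier.
DOCUMENTS = {
    1: "court grants motion to dismiss",
    2: "motion for summary judgment denied",
    3: "court schedules hearing",
}


def build_inverted_index(documents):
    """Map each term to the set of document identifiers that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def retrieve(tagged_query, index, content_tags=("NOUN",)):
    """Return documents containing every query token whose tag is in content_tags."""
    terms = [token.lower() for token, tag in tagged_query if tag in content_tags]
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


index = build_inverted_index(DOCUMENTS)

# Natural language query, assumed to have been tagged by an upstream tagger.
tagged_query = [
    ("show", "VERB"), ("me", "PRON"), ("the", "DET"), ("motion", "NOUN"),
    ("from", "ADP"), ("the", "DET"), ("court", "NOUN"),
]

matches = retrieve(tagged_query, index)
```

Because the tags determine which query tokens become search terms, a tagging error (for instance, labeling "motion" as a verb) would change the terms looked up in the index and therefore the documents returned.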