In today's technologically driven society, users and businesses increasingly rely on computing systems to facilitate and provide various types of services. As the reliance on computing systems has increased, so has the need for high-quality natural language processing techniques to ensure that computing systems communicate effectively with users. To that end, developments in natural language processing techniques have made computing systems capable of reading text, processing speech, interpreting text and speech, determining sentiments within text and speech, and determining relationships between words in the speech. Natural language processing techniques include, but are not limited to, statistical natural language processing techniques, machine learning natural language processing techniques, rules-based natural language processing techniques, and algorithmic natural language processing techniques, among other natural language processing techniques. Such techniques may be utilized by computing systems to parse text, perform part-of-speech tagging on the text, identify languages associated with the text, and identify semantic relationships. Certain natural language processing systems are capable of translating text provided in one language into a different language, performing speech-to-text or text-to-speech conversions, generating summaries for text, analyzing sentiments of individuals that have created text, extracting contextual information from analyzed text, determining topics and subject matters associated with text, supplementing analyzed text, and categorizing text based on a variety of criteria.
Various natural language processing systems include part-of-speech taggers. Part-of-speech taggers of natural language processing systems typically comprise software that can read text provided in a certain language and assign a part of speech to each word in the text. For example, part-of-speech taggers may identify and tag nouns, verbs, adjectives, prepositions, and other parts of speech within a particular text. Certain part-of-speech taggers can also identify the subject of a sentence, the object of a sentence, and other types of relationships associated with the words included in the text. In some natural language processing systems, text may be manually annotated with concepts, which may include multiword concepts. The annotated text may then be utilized to train a natural language processing system so that, when different text is parsed and analyzed by the natural language processing system, the natural language processing system may recognize and understand the concept when it appears in the different text.
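By way of illustration only, the word-by-word tagging described above may be sketched as follows. This is a minimal sketch and not a representation of any particular tagger: the `LEXICON` mapping, the tag labels, and the `tag` function are hypothetical names introduced for this example, and a practical tagger would typically rely on a statistical or machine learning model rather than a fixed lookup table.

```python
# Hypothetical toy lexicon mapping lowercase words to part-of-speech tags.
# The entries and tag labels are assumptions made for this illustration.
LEXICON = {
    "the": "DET",
    "a": "DET",
    "user": "NOUN",
    "file": "NOUN",
    "editor": "NOUN",
    "opens": "VERB",
    "new": "ADJ",
    "in": "ADP",
}


def tag(text: str) -> list[tuple[str, str]]:
    """Assign one part-of-speech tag to each whitespace-separated word."""
    tagged = []
    for raw in text.split():
        word = raw.strip(".,;:!?")  # drop surrounding punctuation
        if not word:
            continue
        # Words missing from the lexicon receive the placeholder tag "UNK".
        tagged.append((word, LEXICON.get(word.lower(), "UNK")))
    return tagged


print(tag("The user opens a new file in the editor."))
```

In this sketch, each word is tagged in isolation, so the result depends entirely on the entry stored for that single word.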
While current natural language processing systems provide various advantages and useful functionality, a common problem is that a complex topic, title, or parameter provided in text encountered by a natural language processing system may be described by multiple words or groups of words. These multiple words may confuse a part-of-speech tagger of a natural language processing system, which may incorrectly identify words in the text as being nouns, verbs, or other parts of speech, when, in reality, the identified words should be tagged as different parts of speech based on the intent of the user or device that supplied the text. Additionally, as indicated above, while certain natural language processing systems allow for annotating text with concepts, such annotated text is typically necessary to train the natural language processing systems to recognize such concepts when analyzing other texts. Such training is often time-consuming and computationally expensive, and requires increased use of memory resources, processor resources, and network bandwidth. Furthermore, to maintain accuracy and effectiveness, such training must be performed periodically to reflect the changing usage of concepts within a text, a project, a domain, a language, or languages. While application-specific and domain-specific dictionaries associated with current natural language processing solutions could theoretically be extended to improve their recognition of multiword combinations common in domain-specific applications, such an extension would still suffer from quality, performance, and maintenance issues.
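The mis-tagging problem described above, and the dictionary extension mentioned as a theoretical remedy, may be illustrated with the following minimal sketch. All names here (`WORD_TAGS`, `CONCEPTS`, `tag_words`, `tag_with_concepts`) and the sample sentence are hypothetical and introduced only for this example. The sketch assumes a greedy longest-match lookup, which is merely one possible way to apply a multiword dictionary.

```python
# Hypothetical single-word lexicon; "save" is listed only as a verb.
WORD_TAGS = {
    "click": "VERB",
    "the": "DET",
    "save": "VERB",
    "as": "ADP",
    "button": "NOUN",
}

# Hypothetical domain dictionary of multiword concepts,
# keyed by tuples of lowercase tokens.
CONCEPTS = {("save", "as"): "NOUN"}


def tag_words(tokens):
    """Tag each token independently, ignoring multiword concepts."""
    return [(t, WORD_TAGS.get(t.lower(), "UNK")) for t in tokens]


def tag_with_concepts(tokens):
    """Greedy longest-match: prefer a multiword concept over single words."""
    max_len = max((len(key) for key in CONCEPTS), default=1)
    tagged, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in CONCEPTS:
                tagged.append((" ".join(tokens[i:i + n]), CONCEPTS[key]))
                i += n
                break
        else:
            # No multiword concept starts here; fall back to the word lexicon.
            tagged.append((tokens[i], WORD_TAGS.get(tokens[i].lower(), "UNK")))
            i += 1
    return tagged


tokens = "Click the Save As button".split()
print(tag_words(tokens))          # "Save" is tagged VERB and "As" is tagged ADP
print(tag_with_concepts(tokens))  # "Save As" is tagged as a single NOUN
```

The sketch also hints at the maintenance burden noted above: every multiword title or parameter must be entered into the hypothetical `CONCEPTS` dictionary by hand and kept current as usage changes.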
Based on the foregoing, current natural language processing technologies and processes may be modified and improved so as to provide enhanced functionality and features. Such enhancements and improvements may effectively decrease the effort required to parse and understand text, while simultaneously improving the accuracy of natural language processing systems. Additionally, such enhancements and improvements may provide for optimized annotating capabilities, increased autonomy, improved interactions with users or devices, improved user satisfaction, increased efficiencies, increased access to meaningful data, substantially improved decision-making abilities, increased ease of use, and simplified or reduced maintenance. Furthermore, such enhancements and improvements may reduce processor, memory, and network bandwidth usage. Moreover, such enhancements and improvements may increase a natural language processing system's ability to ascertain the meaning of words in text, determine relationships between the words in the text, and tag parts of speech accurately and effectively.