Natural Language Processing (NLP) is a cognitive science discipline drawing on computational linguistics and artificial intelligence in the study of automated generation and understanding of natural human languages. In general, natural language generation systems convert information from computer databases into human languages, and natural language understanding systems convert audio/text samples of a human language into digital representations that are easier for computer programs to manipulate.
Natural language understanding is an extremely complicated problem with many contributing factors. One is that the grammar of natural languages is syntactically ambiguous: a given sentence often has multiple possible parse trees, and choosing the most appropriate one usually requires semantic and contextual information. Another contributing factor is sentence boundary disambiguation, which refers to the process of deciding where sentences begin and end.
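To illustrate why sentence boundary disambiguation is not trivial, the following minimal Python sketch compares a naive splitter, which treats every terminal punctuation mark as a boundary, with one that consults a small abbreviation list. The abbreviation list and both function names are hypothetical and chosen only for illustration; they are not taken from any particular system.

```python
import re

# Hypothetical abbreviation list, for illustration only.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e.", "etc."}

def naive_split(text):
    """Split after every '.', '!' or '?' that is followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def abbreviation_aware_split(text):
    """Same idea, but never end a sentence on a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        ends_with_terminator = token[-1] in ".!?"
        if ends_with_terminator and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # keep any trailing text that lacks a terminator
        sentences.append(" ".join(current))
    return sentences

text = "Dr. Smith arrived late. He left early."
naive_result = naive_split(text)
aware_result = abbreviation_aware_split(text)
print(naive_result)  # ['Dr.', 'Smith arrived late.', 'He left early.']
print(aware_result)  # ['Dr. Smith arrived late.', 'He left early.']
```

The naive splitter wrongly breaks after "Dr.", while the list-based one does not. A fixed list is itself fragile, since an abbreviation can also end a sentence, which is one reason contextual information is usually needed.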
In the field of natural language processing, the most accurate systems often involve machines capable of supervised learning. With supervised learning, these machines can extract information from natural language text and produce text in which sentence boundaries have been disambiguated. Specifically, a supervised learning machine may perform, via one or more language processing modules, tokenization, part-of-speech lookup, classification by a learning algorithm, and similar tasks. Prior natural language processing systems capable of sentence boundary disambiguation can be time-consuming and costly to develop and require special expertise in linguistics, computational linguistics, and artificial intelligence.
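The three stages named above (tokenization, part-of-speech lookup, and classification by a learning algorithm) can be sketched as a toy pipeline. This is only a sketch under stated assumptions: the text does not say which learning algorithm or features such systems use, so the perceptron, the three features, the tiny lexicon, and the hand-labeled training sentences below are all hypothetical choices made for illustration.

```python
# Hypothetical toy lexicon standing in for a part-of-speech lookup table.
LEXICON = {"dr.": "ABBR", "mr.": "ABBR", "inc.": "ABBR",
           "he": "PRON", "she": "PRON", "the": "DET"}
TERMINATORS = (".", "!", "?")

def tokenize(text):
    """Whitespace tokenization; punctuation stays attached to the word."""
    return text.split()

def lookup(token):
    """Return the lexicon tag for a token, or 'UNK' if it is not listed."""
    return LEXICON.get(token.lower(), "UNK")

def features(tokens, i):
    """Feature vector for a candidate boundary after tokens[i]."""
    nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
    return [
        1.0,                                            # bias term
        1.0 if lookup(tokens[i]) == "ABBR" else 0.0,    # known abbreviation
        1.0 if nxt == "" or nxt[0].isupper() else 0.0,  # next is capitalized or end of text
    ]

def score(weights, x):
    return sum(w * v for w, v in zip(weights, x))

def train_perceptron(examples, epochs=10):
    """Perceptron updates over (feature_vector, label) pairs."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, y in examples:
            predicted = 1 if score(weights, x) > 0 else 0
            for j, v in enumerate(x):
                weights[j] += (y - predicted) * v
    return weights

def split_sentences(text, weights):
    """Close a sentence wherever the classifier accepts a candidate boundary."""
    tokens = tokenize(text)
    sentences, current = [], []
    for i, token in enumerate(tokens):
        current.append(token)
        if token.endswith(TERMINATORS) and score(weights, features(tokens, i)) > 0:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

# Hypothetical hand-labeled data: (text, indices of tokens that end a sentence).
TRAINING = [
    ("Dr. Jones spoke.", {2}),
    ("She works at Acme Inc. in Ohio.", {6}),
    ("The dog barked. The cat ran.", {2, 5}),
]

examples = []
for text, boundaries in TRAINING:
    tokens = tokenize(text)
    for i, token in enumerate(tokens):
        if token.endswith(TERMINATORS):
            examples.append((features(tokens, i), 1 if i in boundaries else 0))

weights = train_perceptron(examples)
result = split_sentences("Mr. Lee called. He left.", weights)
print(result)  # ['Mr. Lee called.', 'He left.']
```

Even this toy version needs a lexicon, a feature design, and labeled examples, which hints at why full systems of this kind demand expertise and development effort.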
As natural languages continue to evolve, so do natural language processing systems. Thus, there is always room for improvement.