Automated or machine processing of natural language has long been an active area of research and development. Older work often focused on developing a semantic framework, with a text-to-semantics front-end processor, so that a computer could read a passage in English (or another language) and build a representation of its contents that was adequate to make inferences or to answer questions about it. For example, upon processing the sentence “John drove his car to work,” the framework would “understand” that John was a person, that he used a machine called a car to go to a place called work, that the car probably traveled over a road and not over water or through the air, and so on. Within certain limited universes, these semantic frameworks can be very powerful (for example, medical expert systems can improve the efficiency and accuracy of doctors' clinical diagnoses), but general reading comprehension remains out of machines' reach.
More recently, statistical natural language processing has delivered some of the most impressive results. In statistical NLP, computers simply examine huge volumes of text to develop rules about what letters, words and phrases go together, without constructing anything that could be considered an understanding of what the text means. Despite the lack of understanding, statistical NLP underlies feats such as real-time machine translation. Machine translations are usually inferior to a human's work, and occasionally computers produce absurd errors or incoherent output, but the approach is still valuable for its high capacity and low expense. In machine learning classifiers, Bayesian and other probabilistic techniques are used to “learn” words, word roots, and word combinations that match with human-classified categories. These classifiers are often used to classify text articles as “positive” or “negative” after learning patterns from human-labeled documents.
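The classifier idea described above can be illustrated with a minimal sketch of a naive Bayes text classifier with add-one smoothing, written in Python using only the standard library. The training sentences, the whitespace tokenizer, and the names `tokenize`, `train`, and `classify` are assumptions made for illustration; they are not taken from any particular system discussed here.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


def train(labeled_docs):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    doc_counts = Counter()              # label -> number of documents
    vocab = set()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest posterior log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior: fraction of training documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical, hand-made "human-labeled" training examples.
TRAINING = [
    ("profits rose and the outlook is strong", "positive"),
    ("record earnings beat expectations", "positive"),
    ("losses widened and the outlook is weak", "negative"),
    ("shares fell after earnings missed expectations", "negative"),
]

model = train(TRAINING)
print(classify("strong profits", model))
print(classify("weak losses", model))
```

The sketch only tallies which words co-occur with which human-assigned label; it builds no representation of what the text means, which is the contrast with the semantic frameworks described earlier.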
Despite the impressive accomplishments in niche applications and recent statistical NLP advances, unsolved natural-language processing problems remain. For example, there is an enormous amount of textual information about financial markets (everything from Securities and Exchange Commission filings and corporate annual reports to newspaper articles and Internet message-board postings), and detailed information about actual market transactions is readily available. It seems that a machine should be able to extract information from the text and produce predictions about market developments, but neither semantic nor statistical methods have made much progress towards that goal.
New methods for machine analysis of natural-language text messages to predict financial market directions may be of great value.