The subject matter discussed in this section should not be assumed to be prior art merely as a result of its mention in this section. Similarly, a problem mentioned in this section or associated with the subject matter provided as background should not be assumed to have been previously recognized in the prior art. The subject matter in this section merely represents different approaches, which in and of themselves can also correspond to implementations of the claimed technology.
Many efforts to exploit linguistic hierarchy in natural language processing tasks like machine translation make use of the output of a self-contained parser system trained from a human-annotated treebank. A second approach aims to jointly learn the task at hand and relevant aspects of linguistic hierarchy, inducing from an unannotated training dataset parse trees that may or may not correspond to treebank annotation practices.
Most deep learning models for natural language processing that aim to make use of linguistic hierarchy integrate an external parser, either to prescribe the recursive structure of the neural network or to provide a supervision signal or training data for a network that predicts its own structure. Some deep learning models take the second approach and treat hierarchical structure as a latent variable, applying inference over graph-based conditional random fields, the straight-through estimator, or policy gradient reinforcement learning to work around the inapplicability of gradient-based learning to problems with discrete latent states.
For the task of machine translation, syntactically-informed models have shown promise both inside and outside the deep learning context, with hierarchical phrase-based models frequently outperforming traditional ones and neural machine translation models augmented with morphosyntactic input features, a tree-structured encoder, and a jointly trained parser each outperforming purely-sequential baselines.
An opportunity arises to accomplish the longstanding goal of natural language processing to take advantage of the hierarchical structure of language without a priori annotation. Improved natural language processing may result.