The principle of diminishing returns is evident in recent advances in machine translation (MT). Each new level of complexity requires significant work for little gain. Examples include Good-Turing smoothing, syntactic-tree reordering, and weighted finite-state machines. Each new strategy provides a small bump in quality, but also raises a question that recalls Ptolemy's circles upon circles: Are we “adding epicycles” to an overly complicated model? Is there a different model that is sufficiently expressive to describe language, yet less complex? Newton's model was better than Ptolemy's not only because it was more accurate, but because it was simpler.
The study of any complex field often begins with a model. Once proposed, the model is applied and tested. For instance, human language can be modeled with a context-free generative grammar. The model can produce any acceptable phrase in most human languages, but it also generates many unacceptable ones. To save the model, tweaks and complications must be added. One may add components to such a model to make it more accurate, but at the cost of making it cumbersome. The same is true of currently leading MT systems, including but not limited to Google Translate and Skype Translator. Some leading systems also employ deep neural networks (deep learning). Such systems have a serious weakness that makes them much more challenging to integrate into future artificial intelligence (AI) systems: they are black boxes. Their inner workings defy human analysis because of the sheer number of connections between the nodes (the computational units within a neural network).
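The overgeneration described above for context-free grammars can be illustrated with a minimal sketch. The toy grammar below is hypothetical and chosen only for illustration; it is not drawn from any particular MT system. Because its rules carry no number-agreement information, it derives ungrammatical sentences alongside grammatical ones.

```python
from itertools import product

# Hypothetical toy context-free grammar (illustrative assumption only).
# Keys are nonterminals; any symbol that is not a key is a terminal word.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["dog"], ["dogs"]],
    "V": [["sees"], ["see"]],
}


def expand(symbol):
    """Return every word sequence derivable from `symbol`."""
    if symbol not in GRAMMAR:
        return [[symbol]]  # terminal: a single-word sequence
    results = []
    for rhs in GRAMMAR[symbol]:
        # Expand each right-hand-side symbol, then combine all choices.
        parts = [expand(s) for s in rhs]
        for combo in product(*parts):
            results.append([word for part in combo for word in part])
    return results


sentences = [" ".join(words) for words in expand("S")]
for sentence in sentences:
    print(sentence)
```

The output includes acceptable sentences such as "the dog sees the dogs" and unacceptable ones such as "the dogs sees the dog". Ruling out the latter requires splitting the rules by number (singular and plural variants of NP and VP), which is the kind of added complication the paragraph above describes.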
Just as Newton sought a simpler theory for the movement of physical bodies, and his theory led to a plurality of sciences and technologies, a simpler theory of human language will lead to a plurality of language technologies, beginning with MT.
Commonly known languages are not representative of the full spectrum of human possibility. Availability bias may lead some researchers to overemphasize linguistic features such as the rules governing part of speech (PoS). Significant effort has been expended on converting one language's PoS rules to another's. Interesting and broadly useful mathematical constructs such as tree transducers have been soundly developed, but they can concentrate focus and work on linguistic features that do not always lead to accurate parsing. Misapplied focus (such as on PoS rules) may reflect how we choose to see language, as opposed to how the human brain generates it. For example, in many languages, words are inflected (take alternate forms) based on things other than part of speech. It is commonly known that inflections can be determined by tense (e.g. present, past), mood (e.g. subjunctive, conditional), person (e.g. first, second), and case (e.g. genitive, accusative). It is less widely known, and more often associated with less-influential languages, that the features affecting inflection can also include animacy (animate or not) and shape (e.g. ball-shaped, rod-shaped, flat). There are a host of other features as well, proposed by various linguists, including but not limited to agency, associated motion, aspect, clusivity, comparison, definiteness, evidentiality, focus, gender, honorifics, mirativity, modality, noun-class, number, polarity, specificity, telicity, topic, transitivity, valency, voice, volition, and even whether or not the subject of a sentence loves the object in the case that the object is a person.
In many cases, the salient problem is that language models have excessive expressive capacity over a space of semantically unimportant features. One strategy to avoid this is to maintain an acute awareness of the diversity among small-population languages. Because they are less influential, they retain features that did not spread to more commonly spoken languages, and those features can be rare and surprising. Nahuatl has transitive nouns and split possessives. Some dialects of Euskera (Basque) have a unique suffix that is used only when the subject of a sentence loves the object of the sentence (and the object is a person). Including small-population languages in a larger study can reveal deeper truths about human language and can keep a researcher from being rigid in places where the model should be flexible.
Any developer of a multilingual MT system will arrive at the same truth of combinatorics: to enable translation between N languages, the MT system must have either N(N−1) directed translators (one for each ordered pair of languages), or alternatively about 2N translators (one into and one out of an interlingua for each language) with a two-step process, resulting in a star schema having the interlingua at the center. Google Translate, for instance, puts a human language at the center, using English as an interlingua. German-to-Spanish translation requires two steps: German→English→Spanish. For some low-data languages (e.g. Catalan), Google chose to use a similar but high-data language as a secondary interlingua, in this case Spanish. Therefore, translating from German to Catalan requires three steps: German→English→Spanish→Catalan. Information is lost in each step because no single human language perfectly encodes all the information of any other language.
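The combinatorics and the multi-step routing described above can be sketched as follows. This sketch counts each translation direction separately (an assumption of the sketch), which gives N(N−1) translators for direct pairs and 2N for a hub built around an artificial interlingua. The `PIVOT_PARENT` table and the function names are hypothetical; the table merely encodes the German, English, Spanish, and Catalan example from the text and is not a description of any real system's internals.

```python
def direct_translator_count(n: int) -> int:
    """Directed translators needed when every ordered pair has its own."""
    return n * (n - 1)


def interlingua_translator_count(n: int) -> int:
    """Directed translators needed with a separate interlingua:
    one into it and one out of it for each of the n languages."""
    return 2 * n


# Hypothetical pivot topology: each language maps to the language it is
# translated through on the way to the hub. The hub itself maps to None.
PIVOT_PARENT = {
    "English": None,
    "German": "English",
    "Spanish": "English",
    "Catalan": "Spanish",  # secondary interlingua for a low-data language
}


def path_to_hub(lang: str) -> list:
    """Return the chain of languages from `lang` up to the hub."""
    path = [lang]
    while PIVOT_PARENT[path[-1]] is not None:
        path.append(PIVOT_PARENT[path[-1]])
    return path


def pivot_route(src: str, dst: str) -> list:
    """Return the sequence of languages a translation passes through."""
    up = path_to_hub(src)
    down = path_to_hub(dst)
    # Trim the shared tail so the route turns around at the closest
    # common pivot instead of always travelling all the way to the hub.
    while len(up) > 1 and len(down) > 1 and up[-2] == down[-2]:
        up.pop()
        down.pop()
    # Descend from the common pivot to the destination.
    return up + down[-2::-1]


if __name__ == "__main__":
    n = 100
    print(direct_translator_count(n), interlingua_translator_count(n))
    route = pivot_route("German", "Catalan")
    print("→".join(route), f"({len(route) - 1} steps)")
```

For 100 languages the direct design needs 9,900 directed translators against 200 for the hub design, but every additional step on a route such as German→English→Spanish→Catalan is another opportunity for information loss.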
There is a need for improved systems and methods for performing machine translation.