Machine translation systems may be divided into rule-based machine translation systems, example-based machine translation systems, and statistics-based machine translation systems. Statistics-based machine translation systems emerged in the 1990s and are the dominant type of machine translation system at present. Because they do not require manually written rules and are applicable to all languages, they are widely used.
The translation quality of a statistics-based machine translation system largely depends on the quality of its corpora. That is, the more data the corpora contain and the higher the quality of that data, the higher the translation quality of the system. At the initial stage of corpus establishment, however, most corpora suffer from data sparseness.
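As a rough illustration of data sparseness, the following minimal sketch measures what fraction of the word pairs (bigrams) in a sentence never occur in a corpus. The function names, the toy sentences, and the choice of bigrams as the unit are assumptions made for this example only; they are not taken from any particular translation system.

```python
from collections import Counter


def bigrams(tokens):
    """Return the list of adjacent token pairs in a tokenized sentence."""
    return list(zip(tokens, tokens[1:]))


def unseen_bigram_rate(corpus, sentence):
    """Fraction of the sentence's bigrams that never occur in the corpus."""
    # Count every bigram observed in the corpus (whitespace tokenization).
    seen = Counter(bg for sent in corpus for bg in bigrams(sent.split()))
    test = bigrams(sentence.split())
    if not test:
        return 0.0
    missing = sum(1 for bg in test if seen[bg] == 0)
    return missing / len(test)


# Toy data, invented for illustration only.
small_corpus = ["the cat sat on the mat"]
larger_corpus = small_corpus + ["the dog sat on the rug", "a cat chased the dog"]
query = "the dog sat on the mat"

small_rate = unseen_bigram_rate(small_corpus, query)
larger_rate = unseen_bigram_rate(larger_corpus, query)
print(small_rate, larger_rate)
```

With the one-sentence corpus, two of the five bigrams in the query are unseen, so a purely statistical model would have no evidence for them. Adding two more sentences covers every bigram in the query, which is the effect that corpus growth is expected to have.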