Statistical machine translation engines use a log-linear framework to combine sub-models and integrate their sub-costs (or scores) into a single cost or score for ranking translation decisions. Such frameworks are sensitive to the weights used in the log-linear combination, which makes the translation engine less adaptable to different genres: the error surface of a translation model is rugged, and the optimization algorithms are fragile and highly sensitive to their starting points (seeds). When such models are adapted, the initial seeds supplied to the optimization algorithm therefore play a key role in whether the optimization succeeds. In existing approaches, such initial seeds are often obtained only by randomly perturbing a seed already provided in a software shipment.
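As an illustration of the weight sensitivity described above, the following minimal sketch scores two candidate translations with a log-linear combination. The feature names (`tm`, `lm`, `wp`), their values, and both weight vectors are hypothetical and chosen only for illustration; they are not taken from any particular translation engine.

```python
def log_linear_score(features, weights):
    """Combine sub-model log-scores into one score: sum_i w_i * h_i."""
    return sum(weights[name] * value for name, value in features.items())


def rank(candidates, weights):
    """Order candidate translations from best to worst combined score."""
    return sorted(
        candidates,
        key=lambda c: log_linear_score(c["features"], weights),
        reverse=True,
    )


# Hypothetical sub-model log-scores: translation model (tm),
# language model (lm), and word penalty (wp).
candidates = [
    {"text": "the house is small", "features": {"tm": -2.0, "lm": -6.0, "wp": -4.0}},
    {"text": "the home is little", "features": {"tm": -1.0, "lm": -9.0, "wp": -4.0}},
]

# Two hypothetical weight vectors that differ only in the language-model weight.
weights_a = {"tm": 1.0, "lm": 0.5, "wp": 0.0}
weights_b = {"tm": 1.0, "lm": 0.1, "wp": 0.0}

print(rank(candidates, weights_a)[0]["text"])  # the house is small
print(rank(candidates, weights_b)[0]["text"])  # the home is little
```

Changing a single weight reverses the ranking of the two candidates, which is the kind of sensitivity that makes the choice of weights, and of the seed from which they are tuned, significant.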
The translation quality of the output text of a machine translation system is typically measured with automatic metrics, including BLEU (Bilingual Evaluation Understudy), TER (Translation Edit Rate), WER (Word Error Rate), METEOR (Metric for Evaluation of Translation with Explicit Ordering), n-gram precisions, and their variants. Statistical models for natural language processing (NLP) rely on initial starting points from which they optimize an objective function given the data. Finding the optimal solution is typically hard (NP-complete), so an optimizer finds only a local optimum, and that optimum is highly dependent on the initial seed. Thus, the quality of the results is improved by finding a good initial seed, and a need exists for a way of finding one.
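The dependence of a local optimum on the initial seed can be sketched with a toy example. The one-dimensional error function below is a hypothetical stand-in for a translation error surface (for example, one minus a quality metric as a function of a single weight), and `local_search`, `perturb`, and `multi_start` are illustrative helper names, not part of any existing optimizer. The perturbation step mirrors the existing approach of randomly perturbing a shipped seed.

```python
import random


def error(w):
    """Toy error surface with two basins: a shallow one near w = 1
    and a deeper one near w = -1. Purely illustrative."""
    return (w * w - 1.0) ** 2 + 0.3 * w


def local_search(error_fn, w, step=0.1, min_step=1e-6):
    """Greedy descent: move to a neighbor only if it strictly lowers
    the error; otherwise halve the step until it becomes negligible."""
    while step > min_step:
        best = min((w, w - step, w + step), key=error_fn)
        if best == w:
            step /= 2.0
        else:
            w = best
    return w


def perturb(seed, n, sigma, rng):
    """Generate n starting points by adding Gaussian noise to a seed."""
    return [seed + rng.gauss(0.0, sigma) for _ in range(n)]


def multi_start(error_fn, seeds):
    """Run the local search from every seed and keep the best result."""
    return min((local_search(error_fn, s) for s in seeds), key=error_fn)


# Two seeds on opposite sides of the barrier end in different optima.
right = local_search(error, 0.5)
left = local_search(error, -0.5)
print(f"seed  0.5 -> w = {right:.3f}, error = {error(right):.3f}")
print(f"seed -0.5 -> w = {left:.3f}, error = {error(left):.3f}")

# Restarting from random perturbations of a (hypothetical) shipped seed.
rng = random.Random(0)
starts = [0.5] + perturb(0.5, 5, 1.0, rng)
best = multi_start(error, starts)
print(f"best of {len(starts)} starts -> w = {best:.3f}, error = {error(best):.3f}")
```

In this sketch the search started at 0.5 settles in the shallower basin, while the search started at -0.5 reaches the deeper one. Random restarts can only match or improve on the result from the original seed, and whether they improve on it depends on whether a perturbed start happens to land in a better basin.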