1. Field of Invention
The present invention relates to a method of parameter adjustment.
2. Description of Related Art
With the rapid development of the Internet and deepening economic globalization, communication between people of different nations is increasingly frequent. Language has become a barrier that restricts free communication between people. As more people seek to communicate with the rest of the world without barriers while using their own national languages, demand for translation services keeps growing.
Manual translation by a human translator, whether written or spoken, can be both time-consuming and costly. The goal of machine translation is to translate automatically from one language into another. Various machine translation methods currently exist, including example-based machine translation and statistical machine translation, of which statistical machine translation is the current mainstream. Given a particular sentence in the source language, statistical machine translation searches for the best possible translation in the target language. Letting f denote the source-language sentence and e the target-language sentence, machine translation seeks the e given by:
$$\hat{e} = \arg\max_{e} P(e \mid f)$$
In other words, among all possible translations, the one with the greatest $P(e \mid f)$ is selected. In the conventional log-linear model, $P(e \mid f)$ is factorized by introducing feature functions and weights:
$$P(e \mid f) = \frac{\exp\left[\sum_{m=1}^{M} \lambda_m h_m(e, f)\right]}{\sum_{e'} \exp\left[\sum_{m=1}^{M} \lambda_m h_m(e', f)\right]}$$
where $\lambda_m$ are the weights and $h_m(e, f)$ are the feature functions. Common feature functions include the language model, the translation model, the reordering model and word penalty terms.
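As a minimal sketch of the two formulas above, assuming a fixed, small list of candidate translations whose feature values $h_m(e, f)$ have already been computed, the log-linear probability is a softmax over the weighted feature sums, and the decision rule only needs the largest weighted sum because the denominator is shared by all candidates. The function names, candidate strings and feature values are hypothetical and chosen for illustration only.

```python
import math


def scores(feature_vectors, weights):
    """Linear score sum_m lambda_m * h_m(e, f) for each candidate translation e."""
    return [sum(w * h for w, h in zip(weights, feats)) for feats in feature_vectors]


def log_linear_probs(feature_vectors, weights):
    """P(e|f) for each candidate under the log-linear model (softmax over candidates)."""
    s = scores(feature_vectors, weights)
    top = max(s)  # subtracting the max avoids overflow and leaves the ratios unchanged
    exps = [math.exp(x - top) for x in s]
    z = sum(exps)
    return [x / z for x in exps]


def decode(candidates, feature_vectors, weights):
    """argmax_e P(e|f): the denominator is shared, so the largest linear score wins."""
    s = scores(feature_vectors, weights)
    best = max(range(len(candidates)), key=lambda k: s[k])
    return candidates[best]


# Hypothetical candidates with features [LM log-prob, TM log-prob, word count].
candidates = ["the house", "house the", "a house"]
features = [[-2.0, -1.0, 2], [-5.0, -1.0, 2], [-2.5, -1.5, 2]]
weights = [1.0, 1.0, -0.5]  # negative weight on word count acts as a word penalty
print(decode(candidates, features, weights))  # the house
print(log_linear_probs(features, weights))
```

In a real decoder the candidate set is a search space explored by the decoder rather than an explicit list; the explicit list here only keeps the arithmetic visible.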
Training a translation system is a process of searching for the optimal values of the parameters $\lambda_m$, $m = 1, \ldots, M$. Many parameter optimization methods have been developed for this purpose, and the most widely used is Minimum Error Rate Training (MERT), whose optimization criterion is:
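$$\hat{\lambda}_1^M = \arg\min_{\lambda_1^M} \left\{ \sum_{s=1}^{S} E\left(r_s, \hat{e}(f_s; \lambda_1^M)\right) \right\}$$
where $S$ is the number of sentences in the training set, $f_s$ is the $s$-th source sentence, $r_s$ is its reference translation, $\hat{e}(f_s; \lambda_1^M)$ is the translation selected under the weights $\lambda_1^M$, and $E$ is the error function. In other words, MERT seeks the parameters $\lambda_m$, $m = 1, \ldots, M$, that minimize the error rate of the translation system on the training set.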
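The MERT criterion can be illustrated with a small sketch, assuming each training sentence comes with a fixed n-best list in which every candidate carries its feature vector and a precomputed error $E(r_s, e)$ against the reference. The objective sums the error of the candidate the model would pick. For the search itself, this sketch substitutes a simple coordinate-wise grid search; it is not the exact line search over the upper envelope of candidate scores used in standard MERT implementations. All names and values are hypothetical.

```python
def pick(nbest, weights):
    """Return the (features, error) entry with the highest linear model score."""
    return max(nbest, key=lambda c: sum(w * h for w, h in zip(weights, c[0])))


def corpus_error(nbest_lists, weights):
    """Sum over sentences s of E(r_s, e_hat(f_s; lambda)), evaluated on n-best lists."""
    return sum(pick(nbest, weights)[1] for nbest in nbest_lists)


def coordinate_search(nbest_lists, init_weights, grid, sweeps=3):
    """Simplified MERT-style tuning: try each grid value for one weight at a time."""
    weights = list(init_weights)
    best_err = corpus_error(nbest_lists, weights)
    for _ in range(sweeps):
        for m in range(len(weights)):
            for value in grid:
                trial = weights[:m] + [value] + weights[m + 1:]
                err = corpus_error(nbest_lists, trial)
                if err < best_err:  # keep only strict improvements
                    best_err, weights = err, trial
    return weights, best_err


# Each n-best entry: (feature vector h(e, f_s), precomputed error E(r_s, e)); toy values.
nbest_lists = [
    [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)],
    [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)],
]
weights, err = coordinate_search(nbest_lists, [1.0, 0.0], grid=[-1.0, 0.0, 1.0])
print(weights, err)
```

Note that the objective depends only on which candidate wins, not on how large its margin is, which is why it cannot be optimized by ordinary gradient methods.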
Because MERT directly optimizes the final translation performance during parameter tuning, it obtains good results. Nevertheless, MERT still has several deficiencies. For example, its optimization criterion contains no regularization term, so the parameters easily overfit. MERT also handles large-scale feature sets poorly. Furthermore, because the MERT objective function is not convex, MERT must be started from multiple initial values to avoid local minima, which increases the computational cost.
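The non-convexity can be seen on a toy example, assuming one source sentence with three candidates and a single weight $\lambda$ varied while the other features are held fixed. Each candidate's score is then a straight line in $\lambda$, the selected candidate changes only where those lines cross, and the error is piecewise constant. In the hypothetical values below, the error is 0.2 for $\lambda < -1$, 1.0 on the plateau between -1 and 1, and 0.5 for $\lambda > 1$; a local search started at $\lambda = 2$ would stay at the local minimum of 0.5, which is why multiple initial values are needed.

```python
def error_profile(candidates, lambdas):
    """Error of the 1-best candidate as one weight lambda varies, others held fixed.

    candidates: list of (intercept, slope, error); the model score of each candidate
    is intercept + lambda * slope, where intercept collects the fixed features.
    """
    profile = []
    for lam in lambdas:
        _, _, err = max(candidates, key=lambda c: c[0] + lam * c[1])
        profile.append(err)
    return profile


# Toy values (hypothetical): three candidates for one source sentence.
candidates = [(0.0, -1.0, 0.2), (1.0, 0.0, 1.0), (0.0, 1.0, 0.5)]
print(error_profile(candidates, [-3.0, -2.0, 0.0, 0.5, 2.0, 3.0]))
# [0.2, 0.2, 1.0, 1.0, 0.5, 0.5]
```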
Besides MERT, online training algorithms provide another approach to parameter tuning; these are based on the maximum margin or on conditional likelihood (CL).
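To make the margin-based idea concrete, the following is a generic sketch of one online, cost-sensitive margin update on a single sentence; it is not a specific published algorithm. It assumes a fixed n-best list with precomputed errors, treats the lowest-error candidate as a hypothetical "oracle", requires the oracle to outscore every other candidate by at least that candidate's extra error, and takes a fixed-size subgradient step when the requirement is violated (algorithms such as MIRA instead choose the step size adaptively). All names are hypothetical.

```python
def score(weights, feats):
    """Linear model score sum_m lambda_m * h_m(e, f)."""
    return sum(w * h for w, h in zip(weights, feats))


def margin_update(weights, nbest, learning_rate=0.1):
    """One online, cost-sensitive margin step on a single sentence.

    nbest: list of (features, error). The lowest-error candidate serves as the
    oracle; the model should score it above every other candidate by at least
    that candidate's extra error (its cost).
    """
    oracle_feats, oracle_err = min(nbest, key=lambda c: c[1])
    # Cost-augmented search: the candidate that most violates the margin.
    pred_feats, pred_err = max(
        nbest, key=lambda c: score(weights, c[0]) + (c[1] - oracle_err)
    )
    cost = pred_err - oracle_err
    hinge = score(weights, pred_feats) - score(weights, oracle_feats) + cost
    if hinge <= 0:
        return list(weights)  # margin already satisfied: no update
    # Subgradient step: move weights toward oracle features, away from the violator.
    return [w + learning_rate * (o - p)
            for w, o, p in zip(weights, oracle_feats, pred_feats)]


nbest = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)]
print(margin_update([1.0, 0.0], nbest))
```

In an online setting this step is applied sentence by sentence, and in practice the n-best list is regenerated by the decoder as the weights change.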
Although training methods based on the maximum margin and on conditional likelihood can solve the parameter training problem to a certain extent, each has its own deficiencies. In the method based on the maximum margin, the objective function is not convex, so the optimization easily falls into local minima and the optimal solution is difficult to obtain. The method based on conditional likelihood avoids the local-minimum problem; however, its objective function does not incorporate a cost function. Therefore the optimization cannot be carried out directly against the translation cost on the training set, and effective model parameters cannot be obtained.
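The conditional-likelihood approach can be sketched as follows, assuming each training sentence provides a candidate list and the index of the candidate treated as correct (which candidate plays that role, the reference or a hypothetical oracle from the n-best list, is an assumption here). The regularized negative log-likelihood of the log-linear model is convex, so plain gradient descent reaches the global minimum. Note that the per-candidate error $E$ never enters the objective; only the target index does, which is the sense in which no cost function is integrated. All names and values are hypothetical.

```python
import math


def nll_and_grad(weights, data, l2=0.0):
    """Regularized negative conditional log-likelihood and its gradient.

    data: list of (feature_vectors, target_index) per sentence; feature_vectors[k][m]
    is h_m(e_k, f) and target_index marks the candidate treated as correct.
    """
    loss = 0.5 * l2 * sum(w * w for w in weights)
    grad = [l2 * w for w in weights]
    for feats, target in data:
        s = [sum(w * h for w, h in zip(weights, f)) for f in feats]
        top = max(s)
        exps = [math.exp(x - top) for x in s]
        z = sum(exps)
        loss -= s[target] - (top + math.log(z))  # -log P(e_target | f)
        probs = [x / z for x in exps]
        for m in range(len(weights)):
            expected = sum(p * f[m] for p, f in zip(probs, feats))
            grad[m] -= feats[target][m] - expected  # observed minus expected feature
    return loss, grad


def train(data, dim, l2=0.1, lr=0.5, steps=200):
    """Plain gradient descent; with l2 > 0 the objective is strictly convex."""
    weights = [0.0] * dim
    for _ in range(steps):
        _, grad = nll_and_grad(weights, data, l2)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


data = [([[1.0, 0.0], [0.0, 1.0]], 1)]  # toy: candidate 1 is the target
w = train(data, dim=2)
print(w, nll_and_grad(w, data, l2=0.1)[0])
```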