Statistical parsing (see: Jelinek et al., “Decision tree parsing using a hidden derivation model”, Proc. ARPA Human Language Technology Workshop, pp. 272-277, 1994; Magerman, “Statistical decision-tree models for parsing”, Proc. Annual Meeting of the Association for Computational Linguistics, pp. 276-283, 1995; Collins, “A new statistical parser based on bigram lexical dependencies”, Proc. Annual Meeting of the Association for Computational Linguistics, pp. 184-191, 1996; Charniak, “Statistical parsing with context-free grammar and word statistics”, Proceedings of the 14th National Conference on Artificial Intelligence, 1997; and Collins, “Three generative, lexicalised models for statistical parsing”, Proc. Annual Meeting of the Association for Computational Linguistics, pp. 16-23, 1998) has recently shown great success; in fact, close to 90% label precision and recall can now be achieved (see Collins, “Three . . . ”, supra). A statistical model is typically constructed by extracting statistics from a large human-annotated corpus. During testing, the statistical model is used to select the parses of input sentences. One issue is that if the test data differ in nature from the training data, the parser will perform worse than it does under a matched condition.
In order to adapt a statistical model to newly-acquired data, various methods have been proposed in the area of language modeling, which range from interpolating a static model with a dynamic-cache model (see: Jelinek et al., “A dynamic language model for speech recognition”, Proc. of the DARPA Workshop on Speech and Natural Language, pp. 293-295, February 1991; Kupiec, “Probabilistic model of short and long distance word dependencies in running text”, Proc. of the DARPA Workshop on Speech and Natural Language, pp. 290-295, February 1989; and Kuhn et al., “A cache-based natural language model for speech recognition”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(6):570-583, 1990) to more sophisticated methods using the Maximum Entropy principle (see: Lau et al., “Adaptive language modeling using the maximum entropy principle”, Proc. of the ARPA Human Language Technology Workshop, pp. 108-113, March 1993; and Rosenfeld, “Adaptive Statistical Language Modeling: A Maximum Entropy Approach”, PhD thesis, School of Computer Science, Carnegie Mellon University, 1994). These methods can be viewed as smoothing the static model given constraints imposed by, or statistics extracted from, the new data. In other developments, transform-based model adaptation (see: Gales et al., “Mean and variance adaptation within the MLLR framework”, Computer Speech and Language, 10:249-264, October 1996; and Leggetter et al., “Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models”, Computer Speech and Language, 9:171-185, October 1995) has proven successful in capturing channel or speaker variations during the testing of a speech recognizer.
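By way of illustration only, the interpolation of a static model with a dynamic-cache model mentioned above can be sketched as a weighted mixture of two word distributions. The sketch below is a simplified unigram example and is not taken from any of the cited works; the function name `interpolated_prob`, the parameter `cache_weight`, and the toy probabilities are all hypothetical and chosen only to make the arithmetic visible.

```python
from collections import Counter


def interpolated_prob(word, static_probs, cache_counts, cache_weight=0.2):
    """Mix a fixed (static) unigram model with a cache built from recent text.

    P(word) = (1 - cache_weight) * P_static(word) + cache_weight * P_cache(word)

    All names and the default weight are illustrative assumptions.
    """
    p_static = static_probs.get(word, 0.0)
    total = sum(cache_counts.values())
    if total == 0:
        # No recent data yet: fall back to the static model alone.
        return p_static
    # Relative frequency of the word in the recently observed text.
    p_cache = cache_counts[word] / total
    return (1.0 - cache_weight) * p_static + cache_weight * p_cache


# Toy static model estimated (hypothetically) from the training corpus.
static_probs = {"the": 0.5, "parser": 0.1, "tree": 0.4}

# Cache of words recently seen in the new material.
cache = Counter(["parser", "parser", "tree", "parser"])

p_parser = interpolated_prob("parser", static_probs, cache, cache_weight=0.2)
```

In this toy setting the word "parser" is frequent in the new material, so its mixed probability rises above its static estimate, while the mixture still sums to one over the vocabulary. In practice the mixing weight would be tuned on held-out data rather than fixed by hand.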
Generally, it has been observed that there is significant performance degradation when a statistical parser is tested on material whose style is different from that of its training material. A straightforward way of improving parsing accuracy is to collect more training data similar to the test material and re-train the parser. However, this approach is not appealing in that collecting and annotating data is labor- and time-intensive.
Accordingly, a need has been recognized in connection with improving the performance of a statistical parser by adjusting or adapting the model parameters such that the adapted model can better capture the underlying regularity of test material.