Conditional random field (CRF) models are commonly used in sequential labeling tasks, such as part-of-speech tagging and information extraction. In an application phase of operation, a CRF model accepts an input sequence x having T tokens, e.g., x = (token1, token2, ..., tokenT). The CRF model determines a series of labels y = (label1, label2, ..., labelT) that are most likely associated with the tokens in the input sequence. For example, a CRF model can assign part-of-speech labels to words of an input sentence.
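By way of illustration only, the application phase can be sketched for a linear-chain CRF using Viterbi decoding, which finds the highest-scoring label sequence for an input sequence. The function name `viterbi_decode`, the score tables `emit` and `trans`, and the toy scores below are hypothetical and are not taken from any particular CRF implementation; a trained model would supply scores derived from its learned parameters.

```python
from typing import Dict, List, Sequence, Tuple

Scores = Dict[Tuple[str, str], float]


def viterbi_decode(
    tokens: Sequence[str],
    labels: Sequence[str],
    emit: Scores,   # hypothetical: emit[(token, label)] -> score
    trans: Scores,  # hypothetical: trans[(prev_label, label)] -> score
) -> List[str]:
    """Return the highest-scoring label sequence for `tokens`.

    Missing table entries are treated as a score of 0.0.
    """
    if not tokens:
        return []

    # best[label] = best score of any labeling of the tokens seen so far
    # that ends in `label`.
    best = {lab: emit.get((tokens[0], lab), 0.0) for lab in labels}
    backpointers: List[Dict[str, str]] = []

    for tok in tokens[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in labels:
            # Pick the previous label that maximizes the running score.
            prev = max(labels, key=lambda p: best[p] + trans.get((p, cur), 0.0))
            new_best[cur] = (
                best[prev] + trans.get((prev, cur), 0.0) + emit.get((tok, cur), 0.0)
            )
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)

    # Trace back from the best final label to recover the full sequence.
    path = [max(labels, key=lambda lab: best[lab])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy scores, invented for illustration.
LABELS = ["DET", "NOUN", "VERB"]
EMIT = {
    ("the", "DET"): 2.0,
    ("dog", "NOUN"): 2.0,
    ("barks", "VERB"): 1.0,
    ("barks", "NOUN"): 1.0,
}
TRANS = {("DET", "NOUN"): 1.0, ("NOUN", "VERB"): 1.0}

print(viterbi_decode(["the", "dog", "barks"], LABELS, EMIT, TRANS))
```

With these toy scores, the token "barks" is equally plausible as a noun or a verb in isolation, and the transition score from NOUN to VERB resolves the ambiguity in favor of VERB.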
In a training phase of operation, one or more analysts may be asked to manually annotate data in a training set with labels. Based on the manually labeled training set, a training module then determines model parameters that maximize an identified training objective. However, in some cases, it may not be feasible to provide a training set that is large enough to produce a CRF model with desired accuracy. There may be additional shortcomings in known CRF training approaches.
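The passage above does not identify a particular training objective. As a non-limiting sketch, one common choice for a linear-chain CRF is the conditional log-likelihood of the manually labeled sequences, i.e., the score of the annotated labeling minus the log of the sum of exponentiated scores over all possible labelings. The helper names (`sequence_score`, `log_partition`, `log_likelihood`) and the toy score tables below are hypothetical, and the optimization step that adjusts the parameters is omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

Scores = Dict[Tuple[str, str], float]


def _logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_score(
    tokens: Sequence[str], tags: Sequence[str], emit: Scores, trans: Scores
) -> float:
    """Unnormalized score of one labeling; missing entries count as 0.0."""
    total = 0.0
    prev = None
    for tok, tag in zip(tokens, tags):
        total += emit.get((tok, tag), 0.0)
        if prev is not None:
            total += trans.get((prev, tag), 0.0)
        prev = tag
    return total


def log_partition(
    tokens: Sequence[str], labels: Sequence[str], emit: Scores, trans: Scores
) -> float:
    """log Z(x) via the forward algorithm in log space (non-empty input)."""
    alpha = {lab: emit.get((tokens[0], lab), 0.0) for lab in labels}
    for tok in tokens[1:]:
        alpha = {
            cur: _logsumexp([alpha[p] + trans.get((p, cur), 0.0) for p in labels])
            + emit.get((tok, cur), 0.0)
            for cur in labels
        }
    return _logsumexp(list(alpha.values()))


def log_likelihood(
    tokens: Sequence[str],
    tags: Sequence[str],
    labels: Sequence[str],
    emit: Scores,
    trans: Scores,
) -> float:
    """log p(tags | tokens) for one annotated training sequence."""
    return sequence_score(tokens, tags, emit, trans) - log_partition(
        tokens, labels, emit, trans
    )


# Toy parameters and one annotated sequence, invented for illustration.
LABELS = ["DET", "NOUN", "VERB"]
EMIT = {("the", "DET"): 2.0, ("dog", "NOUN"): 2.0, ("barks", "VERB"): 1.0}
TRANS = {("DET", "NOUN"): 1.0, ("NOUN", "VERB"): 1.0}
TOKENS = ["the", "dog", "barks"]
GOLD = ["DET", "NOUN", "VERB"]

print(log_likelihood(TOKENS, GOLD, LABELS, EMIT, TRANS))
```

Under this assumed objective, a training module would sum the quantity over every annotated sequence in the training set and adjust the score tables to increase the sum, which illustrates why the size of the manually labeled training set bears directly on the quality of the resulting parameters.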