The problem of predicting a structure hidden behind certain information is called a structured prediction problem. An apparatus (or program) for predicting an output structure with respect to an input structure is called a structured prediction system. The input structure and the output structure are discrete structures that can be expressed as so-called graphs (structures consisting of a set of nodes and a set of edges). The input and output structures can further be expressed as labeled graphs (graphs with labels on their nodes and/or edges). A model used in the structured prediction system is called a structured prediction model. The structured prediction model is used to predict the likeliest output structure with respect to the input structure.
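As a minimal sketch of the labeled graph described above, the following Python fragment stores nodes, edges, and optional labels on either. The class name LabeledGraph and its field names are hypothetical and are introduced here only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LabeledGraph:
    """Hypothetical container: a graph whose nodes and/or edges may carry labels."""
    nodes: list
    edges: list = field(default_factory=list)        # (source, target) pairs
    node_labels: dict = field(default_factory=dict)  # node -> label
    edge_labels: dict = field(default_factory=dict)  # (source, target) -> label

    def __post_init__(self):
        known = set(self.nodes)
        # Every edge must connect two declared nodes.
        for source, target in self.edges:
            if source not in known or target not in known:
                raise ValueError(f"edge ({source}, {target}) uses an unknown node")
        # Labels may only be attached to declared nodes and edges.
        if not set(self.node_labels) <= known:
            raise ValueError("node label attached to an unknown node")
        if not set(self.edge_labels) <= set(self.edges):
            raise ValueError("edge label attached to an unknown edge")


# A three-token chain in which two nodes carry labels (illustrative data).
graph = LabeledGraph(
    nodes=["John", "Smith", "heads"],
    edges=[("John", "Smith"), ("Smith", "heads")],
    node_labels={"John": "PER", "Smith": "PER"},
)
```

Keeping labels in separate dictionaries lets the same container represent an unlabeled input structure and a labeled output structure.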
Structured prediction problems in the real world include the following, for example: (1) the problem of predicting a grammatical or semantic structure from text data; (2) the problem of predicting a protein structure from genetic sequence data; (3) the problem of predicting (recognizing) an object included in image data; and (4) the problem of predicting a network structure from data expressing person-to-person relations or object-to-object relations.
Some real-world problems processed on a computer (such as the problems (1) to (4) listed above) can be formulated as structured prediction problems once they are converted into a form that the computer can easily handle. Examples are shown in FIGS. 1 to 3. In the mathematical expressions used here, the input structure is denoted by x, and the output structure is denoted by y. The input structure x is one element of a set X of all possible inputs, or x∈X. The output structure y is one element of a set Y of all possible outputs, or y∈Y. Since the output structure y depends on the input structure x, y is one element of a set Y(x) of all possible outputs with respect to x, or y∈Y(x). In addition, Y(x)⊂Y.
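The notation above can be illustrated with a small sketch that enumerates Y(x) for a short token sequence and selects the highest-scoring element. The two-label set, the toy scoring function, and the function names are assumptions made for this illustration. Exhaustive enumeration is feasible only for very small inputs, since Y(x) grows exponentially with the length of x.

```python
from itertools import product

LABELS = ("PER", "O")  # hypothetical label set


def candidate_outputs(x):
    """Y(x): every label sequence with the same length as the input x."""
    return product(LABELS, repeat=len(x))


def score(x, y):
    """Toy scoring function (an assumption): one point for each token whose
    label is PER exactly when the token starts with an uppercase letter."""
    return sum(1 for token, label in zip(x, y)
               if (label == "PER") == token[0].isupper())


def predict(x):
    """Return the element of Y(x) with the highest score."""
    return max(candidate_outputs(x), key=lambda y: score(x, y))


x = ("John", "Smith", "heads")
y_hat = predict(x)
```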
FIG. 1 shows a sequence structured prediction problem of extracting a named entity from English-language text. In the figure, a proper noun is given a label indicating the type of the proper noun.
The shown input structure x, “U.N. official John Smith heads for Baghdad on July 4th.” is segmented into eleven tokens (or words). Six tokens “U.N.”, “John”, “Smith”, “Baghdad”, “July”, and “4th” are labeled ORG., PER., PER., LOC., DATE, and DATE respectively: PER. stands for a person name, LOC. stands for a location name, ORG. stands for an organization name, and DATE stands for a date.
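The labeling in FIG. 1 can be written down as two aligned sequences, as in the sketch below. The label "O" for tokens that are not part of a named entity is an assumption, and the label names are written without their trailing periods.

```python
# Input structure x: the eleven tokens of the example sentence.
tokens = ["U.N.", "official", "John", "Smith", "heads", "for",
          "Baghdad", "on", "July", "4th", "."]

# Output structure y: one label per token ("O" is an assumed filler label).
labels = ["ORG", "O", "PER", "PER", "O", "O",
          "LOC", "O", "DATE", "DATE", "O"]

# Keep only the tokens that carry a named-entity label.
entities = [(token, label) for token, label in zip(tokens, labels)
            if label != "O"]
```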
FIG. 2 shows a tree-structure prediction problem for analyzing the dependency structure in English-language text, in which labels indicating grammatical linking relationships are assigned to tokens (or words). The input sequence x, “U.N. official John Smith heads for Baghdad on July 4th.” is tokenized into eleven units. Each token is given a label indicating a grammatical linking relationship: The label given to “U.N.” has a link from “Smith” (“x1←x4”); the label given to “official” has a link from “Smith” (“x2←x4”); the label given to “John” has a link from “Smith” (“x3←x4”); the label given to “Smith” has a link from “heads” (“x4←x5”); the label given to “heads” has no link since “heads” is the head word of this sentence; the label given to “for” has a link from “heads” (“x6→x7”); the label given to “Baghdad” has a link from “for” (“x7→x8”); the label given to “on” has a link from “Baghdad” (“x8→x9”); the label given to “July” has a link from “on” (“x9→x10”); the label given to “4th” has a link from “July” (“x10→x11”); and the label given to “.” has a link from “heads” (“x11←x5”).
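The links described in prose above ("has a link from") can be encoded as one head position per token, as in the following sketch. The 1-based head array, the use of 0 for the head word of the sentence, and the helper name is_tree are assumptions made for illustration. The check confirms that the links form a tree.

```python
tokens = ["U.N.", "official", "John", "Smith", "heads", "for",
          "Baghdad", "on", "July", "4th", "."]

# heads[i] is the 1-based position of the token that links to token i + 1.
# 0 marks the head word of the sentence, which has no incoming link.
heads = [4, 4, 4, 5, 0, 5, 6, 7, 8, 9, 5]


def is_tree(heads):
    """True if there is exactly one head word and every token reaches it."""
    roots = [i for i, head in enumerate(heads, start=1) if head == 0]
    if len(roots) != 1:
        return False
    for start in range(1, len(heads) + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:  # a cycle means the links do not form a tree
                return False
            seen.add(node)
            node = heads[node - 1]
    return True
```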
FIG. 3A shows a sequence structured prediction problem of estimating a gene region from a DNA base sequence. Codons, each consisting of three bases drawn from the four kinds T, C, A, and G, are given labels representing amino acids: The codon “ATG” is labeled “M”, which stands for the amino acid Methionine; the codon “TGA” is labeled “H”, which stands for the amino acid Histidine; the codons between “ATG” and “TGA” are labeled “R”, “D”, “W”, and “Q”; letters before “ATG” and letters after “TGA” are labeled “O” to indicate that there are no corresponding amino acids. The label “M” indicates the start codon of protein translation and the label “H” indicates the stop codon of protein translation.
FIG. 3B shows a problem of predicting a network structure from data expressing person-to-person relations or object-to-object relations. In the shown example, the input structure is a set of pairs, each consisting of a person's name and that person's purchasing history of certain products, and each person is labeled with the names of other persons having the same preference. The shown input structure is: (Smith, (A, B, E)), (Johnson, (F, G, J)), (Williams, (A, C, D)), (Brown, (A, B, C, D, E)), (Jones, (A, C, D)), (Miller, (D, F, G, J)), (Davis, (A, F, G, H, J)). Each node (person's name) is given a label indicating the persons having the same preference: Smith is labeled Brown; Johnson is labeled Miller, Davis; Williams is labeled Brown, Jones; Brown is labeled Smith, Williams, Jones; Jones is labeled Williams, Brown; Miller is labeled Johnson, Davis; and Davis is labeled Johnson, Miller.
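The text does not state how the labels in FIG. 3B are derived. As one possible reading, the sketch below links two persons when they share at least three purchased products. This threshold rule and the names MIN_SHARED and same_preference are assumptions. It is a fixed rule rather than a learned structured prediction model, although on the example data it happens to reproduce the labels listed above.

```python
# Input structure: purchasing history per person, as listed for FIG. 3B.
purchases = {
    "Smith": {"A", "B", "E"},
    "Johnson": {"F", "G", "J"},
    "Williams": {"A", "C", "D"},
    "Brown": {"A", "B", "C", "D", "E"},
    "Jones": {"A", "C", "D"},
    "Miller": {"D", "F", "G", "J"},
    "Davis": {"A", "F", "G", "H", "J"},
}

MIN_SHARED = 3  # assumed threshold, not taken from the source


def same_preference(purchases, min_shared=MIN_SHARED):
    """Label each person with the others who share >= min_shared products."""
    return {
        person: [other for other in purchases
                 if other != person
                 and len(purchases[person] & purchases[other]) >= min_shared]
        for person in purchases
    }


network = same_preference(purchases)
```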
One way to predict a correct output structure for an input structure is to use a structured prediction model built by a machine learning method. Methods for learning the structured prediction models used by structured prediction systems are generally classified into three major groups. A first type of learning uses so-called supervised data, which indicates a correct output structure with respect to an input structure. This method is called supervised learning since the data is used as a supervised signal. The supervised signal is an output structure considered to be ideal for a given input structure. Here, the supervised data is given as a set of combinations of an input structure and a supervised signal (ideal output structure). Supervised data having J samples is expressed as $D_L = \{(x^{(j)}, y^{(j)})\}_{j=1}^{J}$. An advantage of supervised learning based on supervised data is that a high-performance structured prediction model can be learned. A difficulty in predicting (estimating) an output structure is that the output structure y has interdependent relations that can be expressed by a labeled graph. Accordingly, the relations in the entire output structure should be considered when the data is created, and expert knowledge about the task is needed in many cases. The cost of creating the large amount of supervised data required to learn the structured prediction model is extremely high in terms of manpower, time, and expense. The performance of supervised learning depends largely on the amount of supervised data. If a sufficient amount of supervised data cannot be prepared, the performance of the structured prediction model obtained by supervised learning with the supervised data will be low.
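The supervised data described above can be sketched as a list of pairs, each holding an input structure and its ideal output structure. The two samples, the label "O", and the per-token frequency baseline below are assumptions made for illustration. The baseline ignores the interdependence within the output structure, so it is not a structured prediction model. It serves only to show how limited supervised data constrains what can be predicted.

```python
from collections import Counter, defaultdict

# Supervised data with J = 2 samples: (input structure, ideal output) pairs.
# Both samples are hypothetical.
D_L = [
    (("John", "Smith", "heads", "for", "Baghdad"),
     ("PER", "PER", "O", "O", "LOC")),
    (("Smith", "left", "Baghdad"),
     ("PER", "O", "LOC")),
]


def train(data):
    """Toy baseline: remember the most frequent label seen for each token."""
    counts = defaultdict(Counter)
    for x, y in data:
        if len(x) != len(y):
            raise ValueError("each token needs exactly one label")
        for token, label in zip(x, y):
            counts[token][label] += 1
    return {token: c.most_common(1)[0][0] for token, c in counts.items()}


def predict(model, x, default="O"):
    """Label each token independently; unseen tokens get the default label."""
    return tuple(model.get(token, default) for token in x)


model = train(D_L)
prediction = predict(model, ("Smith", "heads", "for", "Tokyo"))
```

The token "Tokyo" never appears in the two samples, so the baseline falls back to the default label for it, which illustrates the dependence on the amount of supervised data.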
A second type of learning is unsupervised learning, which uses only data without a known output structure (hereafter unsupervised data). Unsupervised learning is superior to supervised learning in that there is no need to worry about the cost of creating supervised data. Unsupervised learning, however, requires some type of prior knowledge, such as a hypothesis or a similarity measure between input structures, to provide sufficient prediction performance. If the prior knowledge is not known or is hard to implement on a computer, the structured prediction model obtained from unsupervised learning does not provide sufficient prediction performance. Since it is often hard to implement the prior knowledge on a computer, structured prediction models obtained from unsupervised learning often have lower prediction performance than those obtained from supervised learning.
A third type of learning is semi-supervised learning, which uses both supervised data and unsupervised data. Semi-supervised learning is a method of improving the prediction performance of supervised learning by also using unsupervised data when the amount of supervised data is limited. Therefore, semi-supervised learning has the potential to provide a high-performance structured prediction model at low cost.
One known method of learning a structured prediction model by semi-supervised learning is described in J. Suzuki and H. Isozaki, “Semi-Supervised Sequential Labeling and Segmentation Using Giga-word Scale Unlabeled Data”, Proceedings of ACL-08, 2008, pp. 665-673 (hereafter non-patent literature 1). This method is obtained by extending supervised learning of a structured prediction model called a conditional random field (refer to J. Lafferty, A. McCallum, F. Pereira, “Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data”, Proceedings of 18th International Conf. on Machine Learning, 2001, pp. 282-289), to semi-supervised learning. The structured prediction system using the structured prediction model learned in this method shows very good prediction performance with real data.