Reproduction of selected plant varieties by tissue culture has been a commercial success for many years. The technique has enabled mass production of genetically identical selected ornamental plants, agricultural plants, and forest species. The woody plants in this last group have perhaps posed the greatest challenges. Some success with conifers was achieved in the 1970s using organogenesis techniques wherein a bud, or other organ, was placed on a culture medium where it was ultimately replicated many times. The newly generated buds were placed on a different medium that induced root development. From there, the buds having roots were planted in soil.
While conifer organogenesis was a breakthrough, costs were high due to the large amount of handling needed. There was also some concern about possible genetic modification. It was a decade later before somatic embryogenesis achieved a sufficient success rate so as to become the predominant approach to conifer tissue culture. With somatic embryogenesis, an explant, usually a seed or seed embryo, is placed on an initiation medium where it multiplies into a multitude of genetically identical immature embryos. These can be held in culture for long periods and multiplied to bulk up a particularly desirable clone. Ultimately, the immature embryos are placed on a development medium where they are intended to grow into somatic analogs of mature seed embryos. As used in the present description, a “somatic” embryo is a plant embryo developed by the laboratory culturing of totipotent plant cells or by induced cleavage polyembryogeny, as opposed to a zygotic embryo, which is a plant embryo removed from a seed of the corresponding plant. These embryos are then individually selected and placed on a germination medium for further development. Alternatively, the embryos may be used in artificial seeds, known as manufactured seeds.
There is now a large body of general technical literature and a growing body of patent literature on embryogenesis of plants. Examples of procedures for conifer tissue culture are found in U.S. Pat. Nos. 5,036,007 and 5,236,841 to Gupta et al.; U.S. Pat. No. 5,183,757 to Roberts; U.S. Pat. No. 5,464,769 to Attree et al.; and U.S. Pat. No. 5,563,061 to Gupta. Further, some examples of manufactured seeds can be found in U.S. Pat. No. 5,701,699 to Carlson et al., the disclosure of which is hereby expressly incorporated by reference. Briefly, a typical manufactured seed is formed of a seed coat (or a capsule) fabricated from a variety of materials such as cellulosic materials, filled with a synthetic gametophyte (a germination medium), in which an embryo surrounded by a tube-like restraint is received. After the manufactured seed is planted in the soil, the embryo inside the seed coat develops roots and eventually sheds the restraint along with the seed coat during germination.
One of the more labor-intensive and subjective steps in the embryogenesis procedure is the selective harvesting from the development medium of individual embryos suitable for germination (e.g., suitable for incorporation into manufactured seeds). The embryos may be present in a number of stages of maturity and development. Those that are most likely to successfully germinate into normal plants are preferentially selected using a number of visually evaluated screening criteria. A skilled technician evaluates the morphological features of each embryo embedded in the development medium, such as the embryo's size, shape (e.g., axial symmetry), cotyledon development, surface texture, color, and others, and selects those embryos that exhibit desirable morphological characteristics. This is a highly skilled yet tedious job that is time-consuming and expensive. Further, it poses a major production bottleneck when the ultimate desired output will be in the millions of plants.
It has been proposed to use some form of instrumental image analysis for embryo selection to supplement or replace the visual evaluation and classification described above. For example, International Patent Application No. PCT/US99/12128 (WO 99/63057), explicitly incorporated by reference herein, discloses a method for classifying somatic embryos based on images of embryos or spectral information obtained from embryos. Generally, the method develops a classification model (or a “classifier”) based on the digitized images or NIR (near infrared) spectral data of embryos of known embryo quality (e.g., potential to germinate and grow into normal plants, as validated by actual planting of the embryos and a follow-up study of the same or by the morphological comparison to normal zygotic embryos). A “classifier” is a system that identifies an input by recognizing that the input is a member of one of a number of possible classes. The classifier in this case is thus applied to an image or spectral data of an embryo of unknown quality to classify the embryo according to its embryo quality.
Various classification models, or classifiers, are available, such as Fisher's linear and quadratic discriminant functions, classification trees, k-nearest-neighbors clustering, neural networks, and SIMCA. All of these models have been used successfully in many applications, but have been found to perform below expectations when classifying embryos, either because they are not fast enough or because the data from the embryos do not meet the requirements for these classifiers to work.
Fisher's linear discriminant function essentially rotates the data until it finds the best straight dividing line between groups, assuming that the original data have a Gaussian distribution (i.e., a bell-shaped curve). Fisher's quadratic discriminant function is the same, except that it allows for a curved dividing line. Data from embryos do not follow a Gaussian distribution, and the boundaries between groups are often not straight lines or simple curves, so these two methods do not always work well.
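The linear case described above can be sketched as follows. This is a minimal illustration only, assuming two classes and two features per embryo; the data values and the names `fit_fisher` and `classify` are hypothetical and are not taken from the referenced literature.

```python
def mean(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]

def scatter(points, m):
    # 2x2 within-class scatter matrix: sum of outer products of deviations.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def fit_fisher(class0, class1):
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    # Projection direction w = Sw^-1 (m1 - m0): the "rotation" of the data.
    w = [dot(inv[0], diff), dot(inv[1], diff)]
    # Assumed decision rule: cut halfway between the projected class means.
    threshold = 0.5 * (dot(w, m0) + dot(w, m1))
    return w, threshold

def classify(w, threshold, x):
    return 1 if dot(w, x) > threshold else 0

# Hypothetical feature pairs for two quality classes.
class0 = [(1.0, 2.0), (2.0, 1.0), (2.0, 3.0), (3.0, 2.0)]
class1 = [(6.0, 5.0), (7.0, 4.0), (7.0, 6.0), (8.0, 5.0)]
w, threshold = fit_fisher(class0, class1)
```

The resulting boundary is always a single straight line, which is why the method struggles when the true boundary between groups is more complex.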
Classification trees divide data into many little blocks or categories. At first, all of the data are divided into two blocks, and then each of these blocks is further divided, and so on. Each block is divided in a way that makes the data in each smaller block more homogeneous in the sense that the data points are close together geometrically or the data values are more similar. This method has not worked well for embryo classification when measures of data homogeneity are used, and it fails when probabilities are used because some blocks are not always left with enough data points for the probabilities to be estimated well. Also, this method uses many straight lines to approximate curved boundaries between groups. As a result, the misclassification error rate goes up because of the stair-step nature of the resulting classification boundary.
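The recursive block-splitting described above can be sketched as follows. This is an illustrative sketch only, assuming binary labels, Gini impurity as the homogeneity measure, and a made-up grid of points whose true boundary is diagonal; the function names are hypothetical.

```python
def gini(labels):
    # Impurity of a block of 0/1 labels; 0.0 means perfectly homogeneous.
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)

def best_split(rows):
    # Try every axis-aligned threshold; keep the one that most reduces impurity.
    best, best_score = None, gini([y for _, y in rows])
    n = len(rows)
    for f in range(len(rows[0][0])):
        values = sorted({x[f] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best

def build(rows):
    labels = [y for _, y in rows]
    split = best_split(rows)
    if split is None:  # block is pure or cannot be improved: make a leaf
        return {"label": 1 if 2 * sum(labels) >= len(labels) else 0}
    f, t = split
    left = [(x, y) for x, y in rows if x[f] <= t]
    right = [(x, y) for x, y in rows if x[f] > t]
    return {"feature": f, "threshold": t,
            "left": build(left), "right": build(right)}

def predict(node, x):
    while "label" not in node:
        side = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[side]
    return node["label"]

def count_leaves(node):
    if "label" in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])

# Hypothetical data: a 4x4 grid labeled by a diagonal rule (i + j > 3).
rows = [((float(i), float(j)), 1 if i + j > 3 else 0)
        for i in range(4) for j in range(4)]
tree = build(rows)
```

Every split has the form "feature <= threshold", so the diagonal rule in this example can only be reproduced by several small rectangular blocks, which is the stair-step effect noted above.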
K-nearest-neighbors clustering classifies embryos by finding how much the statistics from a new embryo image differ from those of previous embryo images whose quality is known. The class that holds the majority of the k closest points determines the classification of the new embryo. This is a very simple method but can be very slow in practice because the differences between the statistics from the new embryo and the statistics of every embryo in the library (i.e., the embryos of known quality) must be calculated. Thus, the method is not suitable for rapidly classifying embryos, for example, at the rate of several embryos per second.
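The majority vote among the k closest library entries can be sketched as follows. This is a minimal illustration, assuming Euclidean distance between feature vectors; the library values, the labels "good" and "poor", and the name `knn_classify` are hypothetical.

```python
import math
from collections import Counter

def knn_classify(library, query, k=3):
    # A distance is computed to EVERY library entry; this full scan is the
    # per-embryo cost that makes the method slow for large libraries.
    distances = sorted(
        (math.dist(features, query), label) for features, label in library
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical library of embryos of known quality (two statistics each).
library = [
    ((1.0, 1.0), "good"), ((1.2, 0.8), "good"), ((0.9, 1.3), "good"),
    ((3.0, 3.2), "poor"), ((3.1, 2.9), "poor"), ((2.8, 3.0), "poor"),
]
result_a = knn_classify(library, (1.1, 1.0))
result_b = knn_classify(library, (3.0, 3.0))
```

The work per new embryo grows in proportion to the size of the library, which is the source of the speed limitation described above.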
Neural networks classify embryos by finding many functions that are combined into a single curved boundary that best divides the data into desired groups. The difficulty is in determining how many functions are needed and estimating the coefficients in these functions. Often, a lot of work and time are required to find such a combined model. Classification of a new embryo occurs by passing its statistics to the combined model and calculating its group membership. The difficulty in finding the combined model, as well as the sensitivity of the model to how well the original training data represent all future data, limits the application of this method.
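The way several simple functions combine into one non-linear boundary can be sketched as follows. This is an illustration only: the weights below are hand-picked for the example rather than trained, and choosing such coefficients for real data is exactly the difficult step noted above. All names and values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hand-picked (NOT trained) coefficients: two hidden functions, one output.
HIDDEN = [((20.0, 20.0), -10.0), ((20.0, 20.0), -30.0)]
OUTPUT = ((20.0, -20.0), -10.0)

def unit(weights, bias, inputs):
    # One "function": a weighted sum passed through a smooth step.
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def network_score(x):
    hidden = [unit(w, b, x) for w, b in HIDDEN]
    w, b = OUTPUT
    return unit(w, b, hidden)

def classify(x):
    # Group membership is read off the combined model's output.
    return 1 if network_score(x) > 0.5 else 0
```

With these assumed weights the model assigns class 1 to a band between two parallel lines and class 0 on either side, a boundary that no single straight line can produce.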
SIMCA is a classification method originally developed for classifying chemicals. For each group, principal components are calculated based on statistics. A new embryo is classified by determining which group's principal components best predict the values of the embryo's statistics. It works well, but requires a lot of data preparation. This additional data preparation makes the method too slow in a production environment.
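The per-group principal component idea can be sketched as follows. This is a simplified illustration, assuming two statistics per embryo and a single principal component per group, with a new point assigned to the group whose component leaves the smallest residual; full SIMCA implementations typically add statistical tests on the residuals, which are omitted here. The data and function names are hypothetical.

```python
import math

def fit_class_model(points):
    # Group mean plus the direction of the first principal component,
    # obtained in closed form from the 2x2 scatter matrix.
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))

def residual(model, x):
    # Perpendicular distance from x to the group's principal axis, i.e.
    # the part of x that the group's component fails to predict.
    (mx, my), (ux, uy) = model
    dx, dy = x[0] - mx, x[1] - my
    return abs(dx * uy - dy * ux)

def simca_classify(models, x):
    return min(models, key=lambda name: residual(models[name], x))

# Hypothetical training data: group "A" lies near y = x, group "B" near y = 5.
training = {
    "A": [(0.0, 0.0), (1.0, 1.1), (2.0, 1.9), (3.0, 3.0)],
    "B": [(0.0, 5.0), (1.0, 5.1), (2.0, 4.9), (3.0, 5.0)],
}
models = {name: fit_class_model(pts) for name, pts in training.items()}
```

In this sketch a point such as (10.0, 10.2) is assigned to group "A" because it lies close to that group's principal axis, even though it is far from both group means.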
Additionally, PCT/US99/12128 (WO 99/63057), incorporated above, discloses an embryo classifier using a Lorenz curve and a Bayes optimal classifier, termed a “Lorenz-Bayes” classifier, to be described in detail below. While this method has been successful in rapidly and accurately classifying embryos according to their embryo quality, there is a continuing need to further increase the classification speed and accuracy in order to achieve the mass classification required for mass production of manufactured seeds. The present invention addresses this continuing need.