Genotyping assays configured as duplex reactions are well known in the art. In such a duplex reaction, two dyes having emissions at different wavelengths can each be associated with a probe directed to one of two alleles of a target diploid genomic locus in a biological sample. In such duplex reactions, a discrete set of signals is produced for each of three possible genotypes by combinations of a first dye signal (signal 1) and a second dye signal (signal 2), which yield three discrete sets of signals given as (signal 1, signal 1), (signal 1, signal 2), and (signal 2, signal 2). Such signals may be collected as a data set that may include a plurality of data points, where each data point corresponds to one of the three discrete sets of signals for a sample in a plurality of samples. Such a data set may be stored in a variety of computer readable media, and may be analyzed either dynamically during analysis or post-analysis.
In that regard, the three discrete sets of signals that may be produced for each of three possible genotypes may be displayed in a Cartesian coordinate plot. The axes of such a plot may be displayed as a first dye signal versus a second dye signal, where each discrete set of signals for each sample may be represented as a data point in such a plot. Then, for a plurality of samples representative of a diploid genome, anywhere from one to three clusters of points may occur in such a Cartesian coordinate plot. Often, in such approaches, an angle in the Cartesian plot is determined for each data point, so that the data may be expressed in an angular format. Such data have typically been analyzed in the art by using cluster analysis to define discrete clusters and to assign a genotype based on cluster fit alone.
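By way of illustration only, the angular format described above can be sketched as follows. This is a minimal sketch and not a method taken from the art: the function name `signal_angle`, the sample names, and the signal values are all hypothetical, and the angle is assumed to be measured in degrees from the signal 1 axis.

```python
import math


def signal_angle(signal_1, signal_2):
    """Angle, in degrees, of the data point (signal_1, signal_2).

    Assumption: the angle is measured counterclockwise from the
    signal 1 axis of the Cartesian plot.
    """
    return math.degrees(math.atan2(signal_2, signal_1))


# Hypothetical data points, one (signal 1, signal 2) pair per sample.
samples = {
    "sample_A": (1.90, 0.10),  # mostly signal 1
    "sample_B": (1.00, 1.00),  # both signals present
    "sample_C": (0.10, 1.90),  # mostly signal 2
}

# Express each data point in angular format.
angles = {name: signal_angle(s1, s2) for name, (s1, s2) in samples.items()}

for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
```

In such a sketch, points dominated by signal 1 fall near 0 degrees, points dominated by signal 2 fall near 90 degrees, and points with comparable signals fall in between; a cluster analysis would then operate on these angles.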
Such approaches may fail to accurately assign a genotype to a sample for a variety of reasons. First, the angle configuration of the three angles may differ significantly among genotype assays and, additionally, may vary from run to run for any particular genotype assay. In that regard, the angle information alone is not sufficient to assign a genotype. Second, for a plurality of biological samples analyzed, it is possible to have the data clustered in only one or two clusters. For data in which all three clusters are present, a fit to a model may be more easily achieved, as the angle space is bounded by three possible solutions. However, for data sets obtained from a plurality of biological samples in which only one or two clusters occur, a fit to a model may be more difficult, resulting in incorrect genotype calls being made for at least some samples. For example, a final call in such data sets may depend on the angle of a control sample. In that regard, if the control sample is contaminated, for example, or in any way falsely identified with an incorrect cluster, erroneous calls will be made for every member of that cluster.
There is a need in the art for a robust analysis of genotype data, in which the optimization is well defined and yields a suitable confidence in the final assignment of a genotype to each sample in a data set, where the data set may be represented by a finite number of clusters of data points based on the ploidy state of the genome of an organism.