Single nucleotide polymorphisms (SNPs) have been used extensively for genetic analysis. Fast and reliable hybridization-based SNP assays have been developed (see Wang et al., Science, 280:1077-1082, 1998; Gingeras et al., Genome Research, 8:435-448, 1998; and Halushka et al., Nature Genetics, 22:239-247, 1999; each incorporated herein by reference in its entirety). Computer-implemented methods for discovering polymorphisms and determining genotypes are disclosed in, for example, U.S. Pat. No. 5,858,659, incorporated herein by reference in its entirety for all purposes. However, there is still a need for additional methods for determining genotypes and for displaying the large amount of genetic information obtained from such experiments in a user-friendly, interactive computer application.
Users often require that a genetic segment be of a certain size to be of interest and to be accepted as a true positive, e.g., reflective or indicative of a biologically relevant genetic event. This requirement has driven the development of methods to computationally smooth, join, or otherwise mathematically manipulate genetic data.
The interpretation of gene copy number abnormalities, e.g., gains or losses in the number of copies of a specific gene relative to a normal population reference, may be performed in a qualitative manner using ratios of changes of sequence per chromosome that are aligned with a linear position. To date, these qualitative assessments do not incorporate a measure of typical population variation, e.g., Copy Number Variation (CNV). The software applications disclosed herein are capable of: identifying the start and stop linear positions for each segmental aberration; quantifying the number of interrogating markers, e.g., the number of genetic markers being analyzed, that make up the copy number (CN) aberration; combining the start and stop linear positions into a CN aberration, or segment, size estimate, which includes the density of markers within the region; and estimating the percentage of reported population CNVs within the segment based on external database information. These features allow disease-related copy number aberrations to be distinguished from copy number abnormalities that are normally found in the population but do not cause a phenotype or biological effect.
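By way of illustration only, the per-segment quantities described above might be computed as in the following minimal Python sketch. The function name `summarize_segment`, the inclusive base-pair coordinates, and the tuple layout of the external CNV records are hypothetical and are not taken from the disclosed software. In particular, the sketch assumes that the "percentage of reported population CNVs within the segment" means the fraction of the segment's length overlapped by reported CNV intervals; other readings are possible.

```python
def summarize_segment(chrom, marker_positions, known_cnvs):
    """Summarize one copy number (CN) aberration segment.

    chrom            -- chromosome name of the segment (hypothetical format)
    marker_positions -- linear positions of the interrogating markers
    known_cnvs       -- (chrom, start, end) tuples from an external CNV
                        database; inclusive coordinates are an assumption
    """
    start = min(marker_positions)          # start linear position
    stop = max(marker_positions)           # stop linear position
    size = stop - start + 1                # segment size estimate
    n_markers = len(marker_positions)      # number of interrogating markers
    density = n_markers / size             # markers per base

    # Clip each reported CNV to the segment, then merge overlapping
    # intervals so that shared bases are counted only once.
    clipped = sorted(
        (max(s, start), min(e, stop))
        for c, s, e in known_cnvs
        if c == chrom and e >= start and s <= stop
    )
    covered = 0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end + 1:
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start + 1

    return {
        "start": start,
        "stop": stop,
        "size": size,
        "n_markers": n_markers,
        "marker_density": density,
        "percent_cnv_overlap": 100.0 * covered / size,
    }
```

Under these assumptions, a segment whose length is largely covered by reported population CNVs would be a weaker candidate for a disease-related aberration than one with little or no such overlap.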
The disclosed software addresses these issues by allowing the user to define “CytoRegions” of special interest within the genetic data displayed by the software program of the invention. Within these regions, the user can modify the results by applying an additional data filtering process, which can be of greater or lesser stringency than the data filtering applied to the rest of the genome outside of the user-defined CytoRegions.
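The region-specific filtering described above can be sketched, under stated assumptions, as follows. The types `Segment`, `Filter`, and `CytoRegion`, the function `filter_segments`, and the choice of minimum segment size and minimum marker count as the filter criteria are all hypothetical. The text does not specify which criteria the filters use, nor how a segment that only partially overlaps a CytoRegion is treated; the sketch applies the CytoRegion's filter to any overlapping segment.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    chrom: str
    start: int
    stop: int
    n_markers: int


class Filter(NamedTuple):
    min_size: int      # assumed criterion: minimum segment size in bases
    min_markers: int   # assumed criterion: minimum number of markers


class CytoRegion(NamedTuple):
    chrom: str
    start: int
    stop: int
    filt: Filter       # filter applied to segments overlapping this region


def passes(seg: Segment, filt: Filter) -> bool:
    """Return True if the segment meets both thresholds of the filter."""
    size = seg.stop - seg.start + 1
    return size >= filt.min_size and seg.n_markers >= filt.min_markers


def filter_segments(segments: List[Segment],
                    cyto_regions: List[CytoRegion],
                    genome_filter: Filter) -> List[Segment]:
    """Keep segments that pass the filter in force at their location."""
    kept = []
    for seg in segments:
        # Use the first user-defined CytoRegion that overlaps the segment.
        region = next(
            (r for r in cyto_regions
             if r.chrom == seg.chrom
             and r.start <= seg.stop
             and seg.start <= r.stop),
            None,
        )
        filt = region.filt if region is not None else genome_filter
        if passes(seg, filt):
            kept.append(seg)
    return kept
```

With a less stringent filter assigned to a CytoRegion, a small segment inside that region can be retained even though a segment of the same size elsewhere in the genome would be filtered out.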