Comparative Genomic Hybridization (CGH) allows the comparison of at least two samples of nucleic acids based on simultaneous hybridization to a set of target nucleic acids. The target nucleic acids are typically immobilized, e.g., in metaphase or interphase chromosomes or, more conveniently, in a nucleic acid array. The sample nucleic acids are typically labeled, with a different label for each different sample. In one embodiment, array CGH typically involves the simultaneous hybridization of genomic DNA from two cell populations to an array of elements containing DNA sequences from different locations in the genome. The two genomic DNA samples are differentially labeled, and the ratio of the intensities of the hybridization to an array element is proportional to the relative copy number of sequences in the two genomes that bind to the element. Comparison of ratios among the elements allows detection of variations in relative DNA copy number among the different sequences on the array.
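The ratio comparison described above can be illustrated with a minimal sketch. It assumes two equal-length lists of background-corrected intensities, one per label, with one value per array element. The function names `log2_ratios` and `center_on_median` are hypothetical, and median centering is a common convention assumed here for illustration, not a step stated in the text.

```python
import math
import statistics


def log2_ratios(test_intensities, reference_intensities):
    """Return log2(test / reference) for each array element.

    Elements with a non-positive intensity in either channel yield NaN,
    since no meaningful ratio can be formed for them.
    """
    if len(test_intensities) != len(reference_intensities):
        raise ValueError("both channels must cover the same array elements")
    ratios = []
    for test, ref in zip(test_intensities, reference_intensities):
        if test <= 0 or ref <= 0:
            ratios.append(float("nan"))
        else:
            ratios.append(math.log2(test / ref))
    return ratios


def center_on_median(ratios):
    """Subtract the median of the finite ratios from every ratio.

    Assumption: most elements have unchanged relative copy number, so the
    median ratio is taken as the baseline (log2 ratio of 0).
    """
    finite = [r for r in ratios if not math.isnan(r)]
    if not finite:
        return list(ratios)
    baseline = statistics.median(finite)
    return [r - baseline for r in ratios]


if __name__ == "__main__":
    # Hypothetical intensities for three array elements.
    test = [200.0, 100.0, 50.0]
    reference = [100.0, 100.0, 100.0]
    print(center_on_median(log2_ratios(test, reference)))
```

On a log2 scale, an element whose ratio sits above the baseline suggests a relative gain in the test genome at that locus, and one below the baseline suggests a relative loss.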
The degree of sequence identity between two DNA fragments affects their ability to hybridize, so that hybridization of fragments with significantly different sequences can be strongly discriminated against by choosing appropriate hybridization conditions. For example, arrays designed to detect specific base changes typically use oligonucleotides of about 20 nucleotides in length with a base change in the middle. This is about a 5% sequence difference, and a specific oligonucleotide needs to be designed for each difference to be detected. Specifically designed arrays are used to detect many of these differences at the same time.
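The 5% figure follows from one changed base in a 20-nucleotide oligonucleotide. A short sketch of that arithmetic is given below. The helper names `count_mismatches` and `percent_difference` and the example sequences are hypothetical and serve only to illustrate the calculation.

```python
def count_mismatches(seq_a, seq_b):
    """Count positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def percent_difference(mismatches, length):
    """Express a number of mismatched bases as a percentage of the length."""
    if length <= 0:
        raise ValueError("length must be positive")
    return 100.0 * mismatches / length


if __name__ == "__main__":
    # Hypothetical 20-nucleotide probe and a variant with one base
    # changed in the middle (position 10, G replaced by T).
    probe = "ACGT" * 5
    variant = probe[:10] + "T" + probe[11:]
    mismatches = count_mismatches(probe, variant)
    print(percent_difference(mismatches, len(probe)))
```

The same calculation shows why longer targets are harder to discriminate on sequence alone: a single base change is a smaller fraction of the hybridizing sequence as the length grows.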
In the present invention, array CGH is employed to identify sequence differences between two nucleic acid samples. In particular, sequence differences between the two genomic DNAs on the order of 1 nucleotide every 100 bases, or even fewer (that is, a sequence difference of about 1% or less), can be detected using a generic array made from large genomic (e.g., BAC) clones. Accordingly, one embodiment of the invention provides a rapid method of mapping genomic constituents, such as genes that influence risk of disease. Current mapping procedures are very labor intensive, requiring individual analysis of each locus or development of specific arrays based on known sequence differences.