Comparative Genomic Hybridization (CGH) and location analysis are important applications that allow scientists to improve their understanding of the expression and regulation of genes in biological systems. Both CGH and location analysis entail quantifying or measuring changes in copy number of genomic sequences. CGH is particularly important in the study of developmental biology and the causes of cancer, and offers great potential in the diagnosis of cancer and developmental diseases. Recently, cDNA microarrays have been used for CGH studies. An oligo-array based approach has several substantial advantages over other technologies, in that it allows the designer to position the probes anywhere within the genomic or polynucleotide sequence of interest. The probes can be placed at whatever density is commensurate with the real estate, or area, available on the microarray (in terms of number of features), and the genomic regions of interest can be evaluated by analyzing the hybridization of target sequences to the surface-bound probes. The oligonucleotide probe approach also offers the flexibility of focusing on regions within exons or introns of expressed sequences, or on intergenic regions and regulatory regions for location analysis, as well as any desirable admixture of the aforementioned.
Probes that work well on microarrays for gene expression generally do not work well for CGH arrays and are not appropriate for location analysis arrays. Probes for CGH and location analysis arrays require a different optimization of their properties than probes used for gene expression. Most notably, the labeled target mixture is substantially more complex for CGH and location analysis than for expression analysis, which demands greater specificity of the probes in discriminating against non-specific binding to competing targets. For comparison, the total number of nucleotide bases in the human transcriptome is approximately 10⁸, while the human genome contains over 3×10⁹ bases. Additionally, probes selected for gene expression come from within message sequences that are transcribed as RNA, i.e., exons, while probes for CGH need to be complementary, or nearly so, to contiguous targets selected from within a genome sequence, e.g., introns and/or exons.
With increased target complexity comes increased flexibility in the choice of probes. For example, many methods for gene expression restrict probe design to several hundred bases of the 3′-end of the target (message) sequence, thus limiting the probe designer to a choice of one in about 500-1000 discrete positions where a probe can be started within any given gene (or transcript). However, for CGH probe design, scientists have a much broader region in which to choose a probe for any given gene. This region may include introns as well as exons and is typically hundreds of thousands of bases long, and in some cases even millions of bases in length.
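The difference in design freedom described above can be made concrete by counting the positions at which a probe of a given length could start within a region. The following Python sketch is illustrative only; the function name and the 60-base probe length are assumptions introduced for the example and are not taken from this disclosure.

```python
def candidate_start_positions(region_length: int, probe_length: int) -> int:
    """Count the positions at which a probe of probe_length bases can start
    so that it lies entirely within a region of region_length bases."""
    if region_length < 0 or probe_length <= 0:
        raise ValueError("region_length must be >= 0 and probe_length must be > 0")
    # A probe starting at 0-based position i occupies bases i .. i + probe_length - 1,
    # so the last valid start is region_length - probe_length.
    return max(region_length - probe_length + 1, 0)


# Hypothetical probe length, chosen only for illustration.
PROBE_LENGTH = 60

# Expression-style design window: up to about 1000 bases at the 3' end.
expression_choices = candidate_start_positions(1_000, PROBE_LENGTH)

# CGH-style design window: a gene region of several hundred thousand bases.
cgh_choices = candidate_start_positions(300_000, PROBE_LENGTH)
```

Under these assumed lengths, the expression-style window offers 941 start positions and the CGH-style window offers 299,941, roughly a three-hundred-fold increase in the number of candidates to be screened.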
For location analysis probe design, scientists have a specific region in which to identify and design probes. While the probe designer is constrained to selecting probes within regulatory regions, regions upstream of genes and/or specific locations of interest, the overall number of bases that must be screened is much larger than that in the region analyzed for gene expression probe design.
Despite great interest in CGH technology, methods for evaluating probes for use in this technology, both in silico and empirically, are limited. A rigorous method would be to measure signals (e.g., ratios) from each polynucleotide in controlled experiments with test samples containing known copy numbers for each sequence on the array. For example, a method used by several probe designers for measuring array performance for sets of polynucleotides specific for sequences on the X chromosome is to use a series of cell lines with known, variable numbers of copies of the X chromosome for CGH experiments. These cell lines (the X series) contain intact copies (e.g., 1 to 5) of the X chromosome, permitting a rigorous measure of the relationship between copy number and signal intensities for each X chromosome specific polynucleotide on an array.
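By way of illustration only, the relationship between copy number and signal described above could be summarized for each probe as the slope of the observed log2 ratio against the expected log2 ratio across the X series samples. The following Python sketch is not part of any method described in this document; the function names, the use of log2 ratios, the ordinary least-squares fit and the reference copy number are all assumptions made for the example.

```python
import math
from typing import Sequence


def expected_log2_ratio(test_copies: int, reference_copies: int) -> float:
    """Log2 ratio expected from an ideal probe for the given copy numbers."""
    if test_copies <= 0 or reference_copies <= 0:
        raise ValueError("copy numbers must be positive")
    return math.log2(test_copies / reference_copies)


def least_squares_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys regressed on xs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("copy numbers must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def probe_response_slope(
    copy_numbers: Sequence[int],
    observed_log2_ratios: Sequence[float],
    reference_copies: int,
) -> float:
    """Slope of observed versus expected log2 ratio for a single probe.

    copy_numbers: known X chromosome copies in each test sample.
    observed_log2_ratios: measured log2(test/reference) signal per sample.
    reference_copies: X copies in the reference sample (an assumed input).
    """
    expected = [expected_log2_ratio(c, reference_copies) for c in copy_numbers]
    return least_squares_slope(expected, observed_log2_ratios)
```

Under these assumptions, a probe whose observed log2 ratios match the expected values yields a slope near 1, whereas a probe whose ratios are compressed to half the expected values yields a slope near 0.5.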
However, cell lines containing known, variable numbers of intact copies of chromosomes other than the X chromosome are not readily available. Furthermore, the aberrant X series cell lines are slow growing and can spontaneously vary in ploidy under standard culturing conditions. Such methods are complex and time-consuming and cannot readily be used to assay the relationship between the hybridization signal of polynucleotides on an array and the genomic copy number of sequences from each chromosome in a cell.
Accordingly, a great need exists for methods for designing and evaluating surface-bound CGH probe nucleic acids (i.e., probes), as well as for microarrays comprising probes that have been identified as having properties that make them well suited for CGH and location analysis. This invention meets this, and other, needs.