The past years have seen a dynamic change in the ability of science to comprehend vast amounts of data. Pioneering technologies such as nucleic acid arrays allow scientists to delve into the world of genetics in far greater detail than ever before. Exploration of genomic DNA has long been a dream of the scientific community. Held within the complex structures of genomic DNA lies the potential to identify, diagnose, or treat diseases like cancer, Alzheimer's disease or alcoholism. Answers to the world's food distribution problems may be held within the exploitation of genomic information from plants and animals.
It is estimated that by the spring of 2000 a reference sequence of the entire human genome will be completed, allowing for types of genetic analysis that were never before possible. Novel methods of sample preparation and sample analysis are needed to provide for the fast and cost-effective exploration of complex samples of nucleic acids, particularly genomic DNA.
The present invention provides a flexible and scalable method for analyzing complex samples of nucleic acids, such as genomic DNA. These methods are not limited to any particular type of nucleic acid sample: plant, bacterial, animal (including human) total genome DNA, RNA, cDNA and the like may be analyzed using some or all of the methods disclosed in this invention. The word “DNA” may be used below as an example of a nucleic acid. It is understood that this term includes all nucleic acids, such as DNA and RNA, unless a use below requires a specific type of nucleic acid. This invention provides a powerful tool for analysis of complex nucleic acid samples. From experimental design to isolation of desired fragments and hybridization to an appropriate array, the invention provides for faster, more efficient and less expensive methods of complex nucleic acid analysis.
The present invention provides for novel methods of sample preparation and analysis comprising managing or reducing, in a reproducible manner, the complexity of a nucleic acid sample. The present invention eliminates the need for multiplex PCR, a time intensive and expensive step in most large scale analysis protocols, and for many of the embodiments the step of complexity reduction may be performed entirely in a single tube. The invention further provides for analysis of the sample by hybridization to an array which may be specifically designed to interrogate fragments for particular characteristics, such as, for example, the presence or absence of a polymorphism. The invention further provides for novel methods of using a computer system to model enzymatic reactions in order to determine experimental conditions and/or to design arrays. In a preferred embodiment the invention discloses novel methods of genome-wide polymorphism discovery and genotyping.
In one embodiment of the invention, the step of complexity management of the nucleic acid sample comprises enzymatically cutting the nucleic acid sample into fragments, separating the fragments and selecting a particular fragment pool. Optionally, the selected fragments are then ligated to adaptor sequences containing PCR primer templates.
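The cut, separate and select steps of this embodiment can be illustrated with a minimal in silico sketch. It is an illustration only and not the modeling method of the invention: the function names `digest` and `select_by_size`, the toy sequence and the size window are hypothetical, only the top strand is modeled, and EcoRI (G^AATTC) is used simply as a familiar example enzyme.

```python
def digest(sequence, site, cut_offset):
    """Cut `sequence` at every occurrence of `site` (top strand only).

    `cut_offset` is the number of bases from the start of the recognition
    site to the cut, e.g. 1 for EcoRI (G^AATTC). Hypothetical helper.
    """
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    # Slice between consecutive cut positions, dropping empty pieces.
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def select_by_size(fragments, min_len, max_len):
    """Keep the fragment pool whose lengths fall inside the chosen window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Toy example: two EcoRI sites in a 22-base sequence.
toy = "AAAGAATTCCCCCCGAATTCTT"
fragments = digest(toy, "GAATTC", 1)
pool = select_by_size(fragments, 5, 12)
print(fragments)  # ['AAAG', 'AATTCCCCCCG', 'AATTCTT']
print(pool)       # ['AATTCCCCCCG', 'AATTCTT']
```

Changing the enzyme or the size window changes how many fragments survive selection, which is the sense in which the complexity of the sample is reduced in a reproducible manner.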
In a preferred embodiment, the step of complexity management is performed entirely in a single tube.
In one embodiment of complexity management, a type IIs endonuclease is used to digest the nucleic acid sample and the fragments are selectively ligated to adaptor sequences and then amplified.
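As a hedged illustration of why a type IIs endonuclease permits selective ligation, the sketch below assumes FokI as the example enzyme (recognition site GGATG, cutting 9 bases past the site on the top strand and 13 on the bottom strand, which leaves a 4-base overhang whose sequence is not dictated by the site). Only sites in the forward orientation are scanned, overhangs are reported as read on the top strand, and the helper names are hypothetical.

```python
FOKI_SITE = "GGATG"
TOP_CUT = 9      # bases past the site to the top-strand cut
BOTTOM_CUT = 13  # bases past the site to the bottom-strand cut


def foki_overhangs(sequence):
    """Return (top_cut_position, overhang) for each forward-strand site."""
    sequence = sequence.upper()
    overhangs = []
    start = sequence.find(FOKI_SITE)
    while start != -1:
        end = start + len(FOKI_SITE)
        top, bottom = end + TOP_CUT, end + BOTTOM_CUT
        if bottom <= len(sequence):  # skip sites too close to the end
            overhangs.append((top, sequence[top:bottom]))
        start = sequence.find(FOKI_SITE, start + 1)
    return overhangs


def ligatable(overhangs, targeted_overhang):
    """Cut positions whose overhang matches the one an adaptor targets."""
    return [pos for pos, oh in overhangs if oh == targeted_overhang]


toy = "TTGGATGACGTACGTACCGGTTTT"
found = foki_overhangs(toy)
print(found)                     # [(16, 'CCGG')]
print(ligatable(found, "CCGG"))  # [16]
print(ligatable(found, "AAAA"))  # []
```

Because the overhang varies from site to site, an adaptor designed for one overhang sequence would be expected to ligate to only a subset of the fragments.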
In another embodiment, the method of complexity management utilizes two restriction enzymes with different cutting sites and frequencies and two different adaptor sequences.
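A two-enzyme digest can be sketched in the same spirit. The example below assumes EcoRI (G^AATTC, a six-base cutter) and MseI (T^TAA, a four-base cutter) as the pair, and assumes that the fragments of interest are those with one end produced by each enzyme, since only those could carry both adaptors. That selection rule, the toy sequence and the function names are illustrative assumptions rather than details taken from the text.

```python
import re

# Example enzyme pair: name -> (recognition site, top-strand cut offset).
ENZYMES = {"EcoRI": ("GAATTC", 1), "MseI": ("TTAA", 1)}


def cut_positions(sequence, enzymes):
    """Sorted (position, enzyme_name) pairs for every site in `sequence`."""
    cuts = {}
    for name, (site, offset) in enzymes.items():
        # Lookahead so that overlapping occurrences are also found.
        for match in re.finditer(f"(?={site})", sequence):
            cuts[match.start() + offset] = name
    return sorted(cuts.items())


def double_digest(sequence, enzymes=ENZYMES):
    """Return (fragment, left_enzyme, right_enzyme); None marks a free end."""
    sequence = sequence.upper()
    bounds = [(0, None)] + cut_positions(sequence, enzymes) + [(len(sequence), None)]
    return [
        (sequence[a:b], left, right)
        for (a, left), (b, right) in zip(bounds, bounds[1:])
    ]


def mixed_end_fragments(fragments):
    """Fragments whose two ends were produced by different enzymes."""
    return [f for f in fragments if f[1] and f[2] and f[1] != f[2]]


toy = "CCGAATTCGGGTTAACCCTTAAGG"
for fragment in double_digest(toy):
    print(fragment)
print(mixed_end_fragments(double_digest(toy)))  # [('AATTCGGGT', 'EcoRI', 'MseI')]
```

Pairing a rare cutter with a frequent cutter in this way yields a pool that is smaller than either single digest would give, and the pool size can be tuned by the choice of enzymes.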
In another embodiment of the invention, the step of complexity management comprises performing the Arbitrarily Primed Polymerase Chain Reaction (AP PCR) upon the sample.
In another embodiment of the invention, the step of complexity management comprises removing repeated sequences by denaturing and reannealing the DNA and then removing double stranded duplexes.
In another embodiment of the invention, the step of complexity management comprises hybridizing the DNA sample to a magnetic bead which is bound to an oligonucleotide probe containing a desired sequence. This embodiment may further comprise exposing the hybridized sample to a single strand DNA nuclease to remove the single stranded DNA, ligating an adaptor sequence containing a Class II S restriction enzyme site to the resulting duplexed DNA and digesting the duplex with the appropriate Class II S restriction enzyme to release the magnetic bead. This embodiment may or may not comprise amplification of the isolated DNA sequence. Furthermore, the adaptor sequence may or may not be used as a template for the PCR primer. In this embodiment, the adaptor sequence may or may not contain a SNP identification sequence or tag.
In another embodiment, the method of complexity management comprises exposing the DNA sample to a mismatch binding protein and digesting the sample with a 3′ to 5′ exonuclease and then a single strand DNA nuclease. This embodiment may or may not include the use of a magnetic bead attached to the mismatch binding protein.