Inherited and acquired genetic disorders account for a large percentage of today's health care costs. Early diagnosis of such diseases is not only important for successful treatment, but also contributes to lower overall costs to the public. Costs fall, for example, because many diseases can be averted or treated before patients present chronic symptoms that require expensive procedures and/or hospitalization. A "genetic disorder" refers to any disease whose underlying basis is an abnormality in one or more genes of an afflicted individual. Detecting a genetic disorder can require extensive information processing. A brief background on basic molecular biology is provided here as it relates to the sources of data significant in diagnosing genetic disorders.
It is well established that a long, polymeric molecule known as DNA (deoxyribonucleic acid) is the genetic material that stores and encodes biological information, and through which inherited traits are passed from one generation to the next. An individual whose DNA contains no irregularities in a particular gene responsible for a given trait may be said to possess a wildtype genetic complement (genotype) for that gene, while the presence of irregularities known as mutations in the DNA for the gene indicates a mutated or mutant genotype.
DNA is a linear polymer formed from four types of subunits called nucleotides, each consisting of a nitrogen-containing carbon ring structure (the nucleotide "base") linked to a 5-carbon sugar attached to a phosphate group (see, e.g., Watson, Hopkins, Roberts, Steitz & Weiner, The Molecular Biology of the Gene, 4th ed., 1987, Benjamin/Cummings Publishing, Menlo Park, Calif.). The four subunits, which differ in the particular base structure present, are adenine, guanine, cytosine and thymine (abbreviated, respectively, as A, G, C and T). A segment or region of a DNA molecule may encode a particular protein by specifying the production of a ribonucleic acid (RNA) intermediate molecule known as a messenger RNA (mRNA), which directs the synthesis of the particular protein. Other regions of a DNA molecule may encode other types of functionally important RNA, or may operate by a variety of mechanisms that do not involve the encoding of RNA.
A DNA molecule generally occurs as a double helix formed by one pair of linear strands oriented so that the bases in each strand structurally align and form hydrogen bonds in "complementary" fashion, such that A pairs with T, and G pairs with C. This reversible assembly of double stranded DNA is known as base-pairing. Thus, the sequence of nucleotide bases on one strand dictates what the sequence of bases on the complementary strand must be, in order for the double stranded or "duplex" DNA helix to form. The double helix is a dynamic structure, and transient disruption of complementary base pairing accompanies processes such as expression of genes to produce proteins.
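The pairing rule described above (A with T, G with C) means that one strand fully determines its partner. The following is a minimal Python sketch of that rule, not a definitive implementation. The function names are hypothetical and chosen only for illustration, and the reverse-complement helper relies on the standard convention, not stated in the text above, that the two strands run in opposite directions and sequences are written 5' to 3'.

```python
# Watson-Crick pairing as stated in the text: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base that pairs with each position, in the same order."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand written 5'->3' (assumes antiparallel strands)."""
    return complement(strand)[::-1]


def forms_duplex(strand_a: str, strand_b: str) -> bool:
    """True if strand_b, written 5'->3', pairs with strand_a at every position."""
    return reverse_complement(strand_a) == strand_b.upper()


if __name__ == "__main__":
    print(complement("ATGC"))
    print(reverse_complement("ATGC"))
    print(forms_duplex("ATGC", "GCAT"))
```

Because the mapping is its own inverse, applying `reverse_complement` twice returns the original sequence.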
Proteins, which are biological molecules of enormous importance, include major structural materials of the animal body, enzymes that catalyze numerous biochemical reactions, many hormones, and other biologically significant structural and functional components. Proteins are assembled from one or more large, linear polymeric chains of covalently linked building blocks known as amino acids. The chemical bonds linking amino acids together are called peptide bonds, hence proteins may also be referred to as peptides or polypeptides. There are 20 amino acids that may be strung together in assorted lengths and orders to generate the hundreds of thousands of naturally occurring proteins having distinct amino acid sequences.
DNA molecules do not directly participate in synthesizing proteins. Instead, DNA acts as a permanent blueprint of genetic information within a cell. DNA exists as extraordinarily long strands in the chromosomes present in the cell nucleus, and is also present in the form of a much smaller, circular DNA molecule in subcellular structures responsible for cellular energy production, known as mitochondria. The region of DNA that codes for a sequence of a single polypeptide is called a gene, which must first be copied into an RNA molecule before the polypeptide (i.e., a protein) can be produced.
The structure of RNA closely resembles that of DNA, except that the 5-carbon sugar ribose replaces the deoxyribose of DNA and, instead of the base thymine, RNA contains the structurally related nucleotide base uracil (U). Synthesizing an RNA copy of a portion of DNA is called transcription: a region of double stranded DNA transiently separates into single strands, and RNA nucleotides are incorporated by base-pairing in a manner complementary to the DNA template. This mechanism permits assembly of RNA nucleotides into a linear polymer in a sequence governed by the DNA template coding sequence. The resulting mRNA is translated into protein by cellular machinery that reads the mRNA as a series of three-nucleotide "words" known as codons. Each codon specifies the addition of a particular amino acid to a protein (polypeptide) chain, such that the linear order of codons in the mRNA predicts the linear order of amino acids in the protein. Additionally, certain codons known as stop codons do not specify any amino acid, but instead provide a signal to terminate the synthesis of a polypeptide chain.
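Transcription and codon-by-codon translation can be sketched in a few lines of Python. This is an illustrative simplification under stated assumptions: the template strand is supplied written 3' to 5', so that the resulting mRNA reads 5' to 3' from left to right; translation starts at the first nucleotide rather than searching for a start codon; and the standard genetic code is used, with amino acids given as one-letter codes and stop codons as `*`. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order U, C, A, G.
# "*" marks a stop codon. (Assumption: standard code, one-letter amino acids.)
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Base-pairing used during transcription: RNA carries U where DNA would carry T.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5: str) -> str:
    """Build the mRNA complementary to a DNA template written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())


def translate(mrna: str) -> str:
    """Read successive three-nucleotide codons until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: terminate the polypeptide chain
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("TACGGTACT")
    print(mrna, translate(mrna))
```

For the hypothetical template `TACGGTACT`, the mRNA is `AUGCCAUGA`, which reads as the codons AUG, CCA and UGA: methionine, proline, then a stop.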
Occasionally, a natural or non-natural influence may chemically change a specific location in a DNA sequence in a process called mutation, leading to a change in the nucleotide base situated at a particular position in the DNA molecule. Such a mutation, which may be manifested as a deletion (removal of one or more nucleotides from the DNA chain), an insertion (addition of one or more nucleotides to the DNA chain) or a substitution (switch from one base to another with no net change in chain length, e.g., a C is replaced with a T), will lead to a corresponding change in the RNA transcribed from the DNA. As such, a mutation may result in codon changes that cause a significantly altered protein amino acid sequence to be produced. Thus, it is apparent that heritable alterations in a DNA nucleotide sequence (e.g., in a gene) can result in dramatically abnormal gene products (e.g., proteins). Similarly, mutations in DNA regions that do not encode mRNA (for translation into proteins), but that do encode other functional RNAs, or in DNA regions that function without being transcribed into RNA, are nevertheless passed on to progeny. In any case, detecting a genetic disorder at the level of a DNA mutation may be highly advantageous.
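The three mutation types named above can be illustrated on a short sequence. The Python sketch below is a toy model with hypothetical function names and a made-up nine-base reference sequence. It only splits sequences into three-base codons and counts how many codon positions differ, to show why an insertion or deletion of a single base can disturb every downstream codon while a substitution changes only one.

```python
from itertools import zip_longest


def substitute(seq: str, pos: int, base: str) -> str:
    """Switch the base at pos; chain length is unchanged."""
    return seq[:pos] + base + seq[pos + 1:]


def delete(seq: str, pos: int, count: int = 1) -> str:
    """Remove count bases starting at pos."""
    return seq[:pos] + seq[pos + count:]


def insert(seq: str, pos: int, bases: str) -> str:
    """Add bases immediately before pos."""
    return seq[:pos] + bases + seq[pos:]


def codons(seq: str) -> list:
    """Split a sequence into consecutive three-base words (last may be short)."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def changed_codons(reference: str, mutant: str) -> int:
    """Count codon positions at which the two sequences differ."""
    pairs = zip_longest(codons(reference), codons(mutant))
    return sum(1 for ref, mut in pairs if ref != mut)


if __name__ == "__main__":
    reference = "ATGGCCTGA"  # hypothetical wildtype: ATG GCC TGA
    for name, mutant in [
        ("substitution", substitute(reference, 4, "T")),  # C replaced with T
        ("deletion", delete(reference, 3)),
        ("insertion", insert(reference, 3, "A")),
    ]:
        print(name, codons(mutant), changed_codons(reference, mutant))
```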
A wide variety of methods in molecular biology are known for detecting mutations in nucleic acids such as DNA or RNA. These include, for example, restriction fragment length polymorphism analysis, direct nucleic acid sequencing, polymerase chain reaction, oligonucleotide primer extension assay, ligase chain reaction, and other techniques based on detecting a signal derived from specific base-pairing between an oligonucleotide probe and a complementary region of a nucleic acid target. Such specific base-pairing or duplex formation is known as hybridization, and those familiar with the art are readily able to devise suitable hybridization conditions that permit detecting even a single nucleotide mismatch (e.g., a mutation) between a probe and its target (see, e.g., Ausubel et al., Current Protocols in Molecular Biology, Greene Publishing, 1987; Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press, 1989).
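At the level of sequence alone, the idea of a probe that reveals a single nucleotide mismatch can be sketched as a position-by-position comparison. The Python example below is a deliberately simplified string model, not a description of any particular assay: real hybridization depends on temperature, salt concentration and other conditions that are not modeled here. The sequences, the `offset` parameter and the function names are all hypothetical.

```python
# Watson-Crick pairing: A with T, G with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the sequence that would base-pair with seq, written 5'->3'."""
    return "".join(PAIRS[base] for base in reversed(seq.upper()))


def mismatches(probe: str, target: str, offset: int) -> list:
    """List positions within the probe's binding site where pairing fails.

    The probe is assumed to bind the target region starting at offset.
    """
    site = reverse_complement(probe)  # target sequence a perfect match would have
    window = target[offset:offset + len(site)].upper()
    return [i for i, (expected, actual) in enumerate(zip(site, window))
            if expected != actual]


if __name__ == "__main__":
    wildtype = "GATTACAGGC"  # hypothetical target
    mutant = "GATTGCAGGC"    # same target with one base substituted
    probe = reverse_complement(wildtype[2:8])
    print(mismatches(probe, wildtype, 2))
    print(mismatches(probe, mutant, 2))
```

A probe designed against the wildtype region pairs at every position with the wildtype target and reports one mismatched position against the mutant.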
Recent technological advances in molecular biology and related arts permit rapid and sensitive detection of nucleic acid sequence information (including point mutations) in increasingly large numbers of samples, while requiring smaller quantities of each sample. These advances provide high throughput screening formats capable of generating large volumes of data. Such increases in the ability to generate data, however, have not been matched by suitable methods for processing and analyzing such data.
For example, most nucleic acid detection methods, including methods for detecting point mutations, rely on quantifying a reporter signal generated directly by a labeled molecule incorporated into the assay components, such as a fluorophore, a chromophore, a radionuclide or a detectable enzyme, or indirectly by quantifying an image generated by such reporter signals using image analysis routines (see, e.g., www.dnapro.com). Further, quantitatively analyzing such reporter signals typically requires labor-intensive and tedious data manipulation, often involving serial routines in which various data sets must be organized and calculated manually. Given the dramatic increase in the ability to generate genetic data, there is a need for improved devices and methods capable of flexibly extracting, calibrating, formatting and manipulating raw data derived from multiparameter analysis of nucleic acid samples for detecting mutations.