DNA sequencing, i.e. determining the sequence of nucleotides in a gene or in a segment of DNA, commonly involves several sequential steps aimed at:
Isolating genetic material from biological material.
Amplifying the gene of interest using the polymerase chain reaction (PCR) so that sufficient material of the gene of interest is available for sequence analysis.
Performing sequencing reactions using the principles of Sanger. This step enzymatically generates a large number of differently elongated complementary copies of the gene. By introduction of base-specific elongation terminators, each elongated copy of the gene will terminate with a specific type of nucleotide. Each reaction corresponds to one specific type of nucleotide, Adenine (A), Thymine (T), Guanine (G) or Cytosine (C), i.e. only one type of elongation terminator will be incorporated in each reaction. To enable detection of these elongated gene copies, a fluorescently labelled molecule is introduced into the gene copy during the enzymatic reaction. Thus, all elongated gene copies will be fluorescently labelled to facilitate their detection.
Separating the mixture of differently elongated complementary copies of the gene according to their molecular size using gel electrophoresis.
Sequentially detecting the differently elongated complementary copies of the gene, separated according to molecular size during gel electrophoresis, using a DNA sequencer system, e.g. the DNA sequencer marketed under the trademark ALF by Pharmacia Biotech AB, Uppsala, Sweden. During the electrophoresis, any fluorescent molecules passing a perpendicularly oriented laser beam will be excited, and the fluorescence from each molecule will be detected by light-sensitive detectors, each representing one type of nucleotide in the sample.
Determining the nucleotide sequence by superimposing the signals from four detectors representing the four different nucleotides of the sample gene.
An example of such signals obtained from four such detectors is shown in FIG. 1 of the appended drawing. As is apparent, the diagram in FIG. 1 is divided into a sequence region, containing sequence data, and a run-off region, containing the so-called "run-off peak", which is described in more detail below.
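The final step, determining the sequence by superimposing the four detector signals, can be illustrated with a minimal sketch. It assumes that each detector delivers an equally sampled intensity trace and that a base is called wherever the superimposed signal has a local maximum above a threshold. The function name `call_bases`, the threshold value and the toy traces are hypothetical and chosen for illustration only; they do not describe the software of any particular sequencer.

```python
def call_bases(traces, threshold=0.5):
    """Call a sequence from four detector traces (hypothetical helper).

    traces: dict mapping 'A', 'C', 'G', 'T' to equal-length lists of
    fluorescence intensities sampled at the same time points.
    """
    bases = sorted(traces)
    n = len(traces[bases[0]])

    # Superimpose the four signals: at each time point keep the strongest
    # intensity together with the detector (nucleotide) that produced it.
    envelope = []
    for i in range(n):
        strongest = max(bases, key=lambda b: traces[b][i])
        envelope.append((traces[strongest][i], strongest))

    # A base is called at each local maximum of the superimposed signal
    # that reaches the (assumed) threshold. Shorter copies migrate faster,
    # so earlier peaks correspond to earlier positions in the sequence.
    called = []
    for i in range(1, n - 1):
        value, base = envelope[i]
        if (value >= threshold
                and value > envelope[i - 1][0]
                and value >= envelope[i + 1][0]):
            called.append(base)
    return "".join(called)


# Toy traces: one clean peak per detector, in the order A, C, G, T.
toy_traces = {
    "A": [0, 1, 0, 0, 0, 0, 0, 0, 0],
    "C": [0, 0, 0, 1, 0, 0, 0, 0, 0],
    "G": [0, 0, 0, 0, 0, 1, 0, 0, 0],
    "T": [0, 0, 0, 0, 0, 0, 0, 1, 0],
}
print(call_bases(toy_traces))  # ACGT
```

Because only the strongest detector is kept at each peak, a sketch of this kind reports a single base per position and discards any weaker second signal at the same position.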
Sequence data obtained by processing and sequencing DNA samples from tumour tissue in an automatic sequencer can be used to detect inherited or induced mutations in genes related to the occurrence or progression of the tumour. When a mutation is present, the sample sequence obtained from the sequencer will often consist of a mixture of two superimposed sequence components, namely the wild type component and a mutated component. This could be due either to a mixture of two cell populations in the sample or to a mutation in one of the two copies of the gene, if both are present in the sample. In cases where the mutated sequence component is the predominant component, insertion and deletion mutations as well as point mutations can be readily detected by aligning the sequence data obtained from the sample with the expected wild type sequence using standard alignment algorithms (see e.g. S. Needleman and C. Wunsch, J. Mol. Biol. 48, 444 (1970), and W. R. Pearson and W. Miller, Methods in Enzymology, 210, 575 (1992)). Often, however, the mutated sequence material is mixed with an equally large amount of non-mutated material. In some cases, the non-mutated material will even be predominant. In these cases, ordinary alignment algorithms fail to resolve the mutation.
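For the case in which the mutated component predominates, the alignment approach can be sketched as follows. This is a minimal global alignment in the style of Needleman and Wunsch with a linear gap penalty; the scoring values, the function names `needleman_wunsch` and `list_differences`, and the example sequences are assumptions made for illustration and are not taken from the cited references.

```python
def needleman_wunsch(ref, sample, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences; returns the two gapped strings."""
    rows, cols = len(ref) + 1, len(sample) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i, j):
        # Score for aligning ref[i-1] with sample[j-1].
        return match if ref[i - 1] == sample[j - 1] else mismatch

    # Fill the dynamic-programming table.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = len(ref), len(sample)
    aligned_ref, aligned_sample = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            aligned_ref.append(ref[i - 1])
            aligned_sample.append(sample[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_ref.append(ref[i - 1])
            aligned_sample.append("-")
            i -= 1
        else:
            aligned_ref.append("-")
            aligned_sample.append(sample[j - 1])
            j -= 1
    return "".join(reversed(aligned_ref)), "".join(reversed(aligned_sample))


def list_differences(aligned_ref, aligned_sample):
    """Report differences with 1-based positions in the reference."""
    differences, position = [], 0
    for r, s in zip(aligned_ref, aligned_sample):
        if r != "-":
            position += 1
        if r == s:
            continue
        if r == "-":
            differences.append(("insertion", position, s))
        elif s == "-":
            differences.append(("deletion", position, r))
        else:
            differences.append(("substitution", position, r + ">" + s))
    return differences


wild_type = "ACGTACGT"          # illustrative wild type sequence
point_mutant = "ACGTTCGT"       # position 5 changed from A to T
deletion_mutant = "ACGACGT"     # T at position 4 deleted

print(list_differences(*needleman_wunsch(wild_type, point_mutant)))
print(list_differences(*needleman_wunsch(wild_type, deletion_mutant)))
```

The sketch presupposes that a single base has been called at every position. When wild type and mutated material contribute comparable signals, the called sequence no longer represents either component cleanly, which is the situation in which such an alignment fails to resolve the mutation.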