A nucleic acid sequence, i.e. the order of nucleotides in a nucleic acid, is usually determined by combining conventional electrophoresis with chemical methods that label and identify individual nucleotides. Examples of such methods are the Maxam-Gilbert method and the Sanger method.
The determined nucleic acid sequences may be studied for various reasons. One important application is the analysis of nucleic acid sequences for the possible presence of mutations.
Mutation analysis has many applications. A typical case is to analyze a sample extracted from a group of cells from a tumor in order to identify mutations that indicate the presence of cancerous growth. Other cases include investigations to determine the presence of mutations inherited from the male and/or female parent.
A method to determine mutations includes the steps of:
i) electrophoresis separation of a prepared sample and monitoring, for example, the fluorescence activity of certain labeled components added to the sample and converting these fluorescence activities to electrical signals;
ii) identification of signals as representing nucleotide sequences, for example by using specific software;
iii) alignment of the sample nucleotide sequence with respect to a reference sequence wherein the nucleotide sequence is known and wherein further each nucleotide is associated with a position number, in order to assign proper position numbers to the nucleotides of the sample sequence;
iv) identification of sequence positions where deviations between the sample and the reference sequence occur, where said deviations indicate potential mutations;
v) a close manual examination of raw data for all identified potential mutations and a subsequent classification of the positions investigated as "mutations" or "non mutations".
Step i) above may be performed manually with relatively simple equipment or by highly automated instruments, such as the Pharmacia Biotech ALFexpress equipment (Pharmacia Biotech, Sweden).
The evaluation according to step ii) is also known as "base calling". It is done manually or, preferably, by computerized algorithms, which are often included in the automated equipment used in step i). Such algorithms typically take certain features of the signals, such as local minimum or maximum intensities, as input and provide output in the form of nucleotide sequences, such as "CCTGAAGCTC" (as shown in SEQ ID NO:1), where the letters A, C, G, and T designate the purine base adenine, the pyrimidine base cytosine, the purine base guanine, and the pyrimidine base thymine, respectively.
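As an illustration only, the following minimal Python sketch shows base calling driven by local intensity maxima. It is not the algorithm of any particular instrument. The function name `call_bases`, the per-base trace layout, and the `min_height` threshold are assumptions introduced here, and real base callers also handle peak spacing, overlapping peaks, and baseline drift.

```python
from typing import Dict, List


def call_bases(traces: Dict[str, List[float]], min_height: float = 0.0) -> str:
    """Naive base calling: emit a base wherever the strongest channel peaks.

    `traces` maps each base letter (A, C, G, T) to its intensity samples.
    All four channels are assumed to have the same length (hypothetical layout).
    """
    n = len(next(iter(traces.values())))
    called = []
    for t in range(1, n - 1):
        # Pick the channel with the strongest signal at this time point.
        base = max(traces, key=lambda b: traces[b][t])
        y = traces[base]
        # Local maximum: higher than the previous sample and not lower than
        # the next one, so a flat-topped peak is called only once.
        if y[t] > min_height and y[t - 1] < y[t] >= y[t + 1]:
            called.append(base)
    return "".join(called)


# Synthetic traces with four clean peaks: C, C, T, G.
clean = {
    "A": [0, 0, 0, 0, 0, 0, 0, 0, 0],
    "C": [0, 5, 0, 6, 0, 0, 0, 0, 0],
    "G": [0, 0, 0, 0, 0, 0, 0, 7, 0],
    "T": [0, 0, 0, 0, 0, 4, 0, 0, 0],
}

# The same traces with a small spurious peak in the A channel.
noisy = dict(clean, A=[0, 0, 0, 0, 1, 0, 0, 0, 0])

print(call_bases(clean))                  # CCTG
print(call_bases(noisy))                  # CCATG (a false extra base)
print(call_bases(noisy, min_height=2.0))  # CCTG
```

In this sketch a small disturbance is enough to insert a false base unless a height threshold happens to suppress it, and the same threshold could equally suppress a weak but genuine peak.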
The output is normally presented as printouts or binary files.
However, the raw data signals from the nucleic acid sequencing equipment contain disturbances, for example originating from fluctuations in the properties of the separation media used, e.g. an electrophoresis gel, or anomalies originating from the previous steps of preparing the sample. Such disturbances may cause the algorithms to misinterpret the signals, and consequently to indicate false mutations or to hide a mutation by incorrectly indicating the expected nucleotide.
Methods for reducing such incorrect interpretations have been suggested. For example, Tibbetts et al. disclose in U.S. Pat. Nos. 5,365,455 and 5,502,773 the use of neural networks for automatic nucleic acid sequence determination to significantly reduce the misinterpretation rates, wherein a neural network is fed with information from the neighboring nucleotides in order to achieve a very high base calling accuracy.
Steps iii) and iv) above refer to a simple comparison and correlation between the sample sequence and the reference sequence. As is well known in the art, these steps are well suited for automation.
Step v) is performed manually by a specially trained operator, since reliable automated methods have hitherto not been available.
The conventional manual procedure for classifying a deviating nucleotide position as a true mutation or a false indication, according to step v) above, presents several problems.
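The comparison in steps iii) and iv) can be sketched as follows. This is a simplified illustration under stated assumptions: the sample is taken to be a contiguous fragment of the reference that differs only by substitutions, so an ungapped sliding comparison is enough. The function names `align_offset` and `find_deviations` are hypothetical, and SEQ ID NO:1 is used as the reference purely for demonstration. A practical tool would use a gapped alignment to handle insertions and deletions.

```python
from typing import List, Tuple


def align_offset(sample: str, reference: str) -> int:
    """Return the 0-based reference offset with the fewest mismatches (ungapped)."""
    if len(sample) > len(reference):
        raise ValueError("sample must not be longer than the reference")
    best_offset, best_mismatches = 0, len(sample) + 1
    for offset in range(len(reference) - len(sample) + 1):
        window = reference[offset:offset + len(sample)]
        mismatches = sum(s != r for s, r in zip(sample, window))
        if mismatches < best_mismatches:
            best_offset, best_mismatches = offset, mismatches
    return best_offset


def find_deviations(sample: str, reference: str) -> List[Tuple[int, str, str]]:
    """List (position number, reference base, sample base) for each deviation.

    Position numbers are 1-based and refer to the reference sequence.
    """
    offset = align_offset(sample, reference)
    return [
        (offset + i + 1, reference[offset + i], base)
        for i, base in enumerate(sample)
        if base != reference[offset + i]
    ]


reference = "CCTGAAGCTC"  # SEQ ID NO:1, used here only as an example reference
sample = "TGACGC"         # matches reference positions 3-8 except at position 6

print(find_deviations(sample, reference))  # [(6, 'A', 'C')]
```

Each reported position is only a potential mutation. Whether it is a true mutation or an artifact of the base calling still has to be decided from the raw data.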
The graphs obtained when measuring the fluorescence signals suffer from normal variations due to disturbances in the raw signals, as described above. This and other factors, such as the coexistence of both mutated and non-mutated polynucleotides within a sample, tend to make the raw data ambiguous, and consequently the interpretation becomes difficult.
The interpretation will therefore depend on the skill and experience of the examiner, which means that the decision between "mutation" and "non mutation" may differ between different examiners.
Furthermore, the manual examination is a time-consuming and tedious task. There is therefore a considerable risk that a tired examiner may misinterpret the data.
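To make the ambiguity concrete, the sketch below compares the two strongest channel intensities at a single sequence position. It is a hypothetical illustration, not a method described here: the function `secondary_peak_ratio` and all intensity values are invented for the example.

```python
from typing import Dict


def secondary_peak_ratio(intensities: Dict[str, float]) -> float:
    """Ratio of the second-strongest to the strongest channel at one position."""
    ranked = sorted(intensities.values(), reverse=True)
    if ranked[0] <= 0:
        raise ValueError("no signal at this position")
    return ranked[1] / ranked[0]


# Invented peak heights for three situations at the same position.
pure = {"A": 100.0, "C": 3.0, "G": 2.0, "T": 4.0}      # only non-mutated molecules
mixed = {"A": 55.0, "C": 3.0, "G": 48.0, "T": 4.0}     # mutated and non-mutated coexist
disturbed = {"A": 100.0, "C": 3.0, "G": 20.0, "T": 4.0}  # minor variant or disturbance?

for name, peaks in (("pure", pure), ("mixed", mixed), ("disturbed", disturbed)):
    print(name, round(secondary_peak_ratio(peaks), 2))
```

The first two cases are easy to tell apart, but the third one is not: a secondary peak of that size could come from a small fraction of mutated molecules or from a disturbance in the raw signal, which is the kind of case where examiners may disagree.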