1. Field of the Invention
The present invention relates to an automatic vector unit removing method for automatically removing a part of the vector from a fragment of an object DNA when the fragment is taken out of a proliferated vector in DNA cloning. The DNA cloning process proliferates a fragment of the object DNA by chemically bonding a clone, that is, a fragment of a DNA containing a gene to be proliferated, to a DNA molecule called a vector, and then proliferating the vector in cells such as Escherichia coli.
2. Description of the Related Art
A nucleic acid is formed by nucleotides, each composed of a base, a pentose, and phosphoric acid. A nucleotide is a compound of a nucleoside and phosphoric acid. The phosphoric acid links the nucleosides into a polymer to produce either a deoxyribonucleic acid (DNA) or a ribonucleic acid (RNA).
The bases forming part of the nucleic acid can be a purine or a pyrimidine. The purine can be an adenine A or a guanine G while the pyrimidine can be a cytosine C or a thymine T.
The DNA having the composition called a polynucleotide strand is formed by a strand of the above listed four bases, that is, the adenine A, guanine G, cytosine C, and thymine T, bound in a series. For example, if a DNA is extracted from the chromosome in the cell of a human being and is arranged as a sequence, it can be as long as 1 meter and contain 3 billion bases.
Thus, a DNA has a strand of bases, that is, a base sequence linked in the form of a strand. The strand is normally very long. In genetic engineering, a DNA including various genes is cleaved for gene recombination, and a DNA fragment having specific genetic information is extracted from a number of the cleaved DNA fragments. The extracted DNA fragments, that is, object DNA fragments, normally should be proliferated.
Normally, a unique sequence of object DNA fragments is very small in volume, and the object DNA fragments are combined with a vector to perform the cloning for DNA sequencing.
To attain this, the object DNA fragments are chemically bound to the DNA called a vector, which is normally a circular DNA. The combination of the object DNA fragment and the vector DNA, that is, a recombinant DNA, is integrated into an appropriate cell such as a colibacillus (colon bacillus), and the cell is proliferated to produce a large volume of the recombinant DNA. As a result, the cloning process of generating a large volume of the object DNA fragments is successfully performed.
A vector is commonly a DNA having a double helix structure a specific portion of which is cleaved using some restriction enzymes. The object DNA fragment is integrated into the cleaved portion. Before describing the cleaving of the DNA using the restriction enzyme, the structure of the DNA is explained below.
A DNA has the structure of a base sequence, that is, a sequence of bases bound in the form of a strand. Since the DNA strand is directional, ATGCACGA→ is different from ATGCACGA← (which equals AGCACGTA→).
Both ends of the DNA strand are named. The end provided with a hydroxyl group at the 3′ position of the sugar is called the 3′ end. The other end, that is, the end provided with a phosphate group at the 5′ position of the sugar, is called the 5′ end. When the DNA strand is described, the 5′ end is positioned on the left while the 3′ end is positioned on the right.
A DNA normally exists in a double-stranded state as two complementary and antiparallel base sequences. In the two base sequences, the facing bases have a fixed relationship: the adenine A faces the thymine T, while the guanine G faces the cytosine C. An example of a DNA double-strand is shown as follows:
Strand B is complementary to strand A and is represented as a single strand as follows:
Thus, the DNA carries its genetic meaning with the two complementary base sequences as a pair. A restriction enzyme recognizes a base sequence unique to that enzyme and shears the DNA at the recognized points.
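By way of illustration only, the pairing rule and the directionality described above can be sketched in a few lines of Python. The function names `complement` and `reverse_complement` are hypothetical and do not appear in the original description; the sketch assumes a sequence written 5′ to 3′ using only the letters A, G, C, and T.

```python
# Pairing rule from the text: A faces T, and G faces C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Return the facing bases, position by position (read 3' to 5')."""
    return "".join(COMPLEMENT[base] for base in seq)


def reverse_complement(seq: str) -> str:
    """Return the complementary strand rewritten 5' to 3' as a single strand."""
    return complement(seq)[::-1]


if __name__ == "__main__":
    strand = "ATGCACGA"
    print(strand[::-1])                # same bases read in the opposite direction
    print(reverse_complement(strand))  # complementary strand, written 5' to 3'
```

Reversing a strand and complementing it are distinct operations; only the combination of both yields the partner strand in the conventional left-to-right notation.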
FIG. 1 shows how the DNA base sequence is cleaved using the restriction enzyme. In FIG. 1, the restriction enzyme called HpaI shears the DNA at the same positions on the two strands of the DNA whereas the restriction enzymes EcoRI and Hind III shear double strands at different points on the two strands.
As shown in FIG. 1, a number of restriction enzymes recognize nucleotide sequences formed by 6 pairs of bases. The two nucleotide strands in the recognition area, that is, the site of the restriction enzyme, are arranged in opposite directions. Most restriction enzymes cleave the two strands at different positions, thereby forming uneven ends, that is, cohesive ends. The above described object DNA fragments are integrated into the positions where the DNA is cleaved by the restriction enzymes.
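The cleaving behavior described for FIG. 1 can be sketched as follows. This is a minimal illustration, not part of the described method: the table `ENZYMES` and the functions `digest` and `overhang_length` are hypothetical names, and the sketch handles only the top strand of a linear sequence. The recognition sequences and cut offsets used (EcoRI G^AATTC, HindIII A^AGCTT, HpaI GTT^AAC) are the commonly documented ones.

```python
# Recognition site and cut offset on the top strand (number of bases before the cut).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC, cohesive ends
    "HindIII": ("AAGCTT", 1),  # A^AGCTT, cohesive ends
    "HpaI": ("GTTAAC", 3),     # GTT^AAC, same position on both strands
}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def digest(seq: str, enzyme: str) -> list[str]:
    """Split the top strand of a linear sequence at every site of the enzyme."""
    site, offset = ENZYMES[enzyme]
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def overhang_length(enzyme: str) -> int:
    """Single-stranded overhang left by a palindromic site (0 means even ends)."""
    site, offset = ENZYMES[enzyme]
    return abs(len(site) - 2 * offset)


if __name__ == "__main__":
    print(digest("CCGAATTCGG", "EcoRI"))  # ['CCG', 'AATTCGG']
    print(overhang_length("EcoRI"), overhang_length("HpaI"))  # 4 0
```

Each of the three sites equals its own reverse complement, which is what the text means by the two strands of the site being arranged in opposite directions.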
FIG. 2 shows how to mount the object DNA fragments in the vector. In FIG. 2, the circular plasmid DNA molecule is cleaved by the restriction enzymes to obtain linear plasmid DNA molecules having cohesive ends. A plasmid is contained in, for example, bacteria, and can autonomously proliferate unlike the chromosome DNA. The linear plasmid DNA molecule and the object DNA fragments, that is, one of various DNA fragments obtained by cleaving the chromosome DNA using the restriction enzymes, form base pairs. This is referred to as annealing at cohesive ends, thereby forming a circular DNA.
Thus, the cohesive ends generated by the restriction enzymes are required for the recombinant DNA technology. Actually, any DNA fragments can be bound to a plasmid DNA by cleaving the DNA using the restriction enzymes used in generating the object DNA fragments. The linear plasmid DNA molecule is bound to the object DNA fragments through the DNA ligase for repair of the cleaved portion in a single strand of the double stranded DNA, thereby generating a chromosome-DNA-integrated plasmid DNA molecule.
The generated plasmid DNA molecule can be proliferated in bacteria or yeast. This process is called DNA cloning.
FIG. 3 shows the vector used in the DNA cloning process and the multiple cloning site in the vector. A number of restriction enzyme sites to be cleaved by various restriction enzymes are concentrated in the multiple cloning site.
When an object DNA fragment is taken out of a large amount of the plasmid DNA molecules generated as a result of the DNA cloning process, the nucleotide sequence of the DNA fragment processed in the cloning operation should be correctly determined and the bases in unnecessary portions are deleted to take out a DNA fragment having a correct structure. To determine the nucleotide sequence in the DNA fragment, a DNA sequencer is used to automatically read the DNA base sequence.
To know the sequence of A, G, T, and C in the DNA is to understand the genetic information. The sequence technology for determining the base sequence has advanced with the technologies of other fields, and is closely related to the discovery of the restriction enzymes and nucleic acids, and the development in technology for DNA cloning, nucleic acid chemistry, etc.
Recently, computer technology has been utilized as one of the sequence methods, thereby enabling an enormous volume of data to be input and accumulated. Thus, computers are required in determining the base sequence.
With the DNA sequencer for automatically reading the base sequence of the DNA, the dideoxy method or the Sanger method is used to determine a base sequence. Normally when a part of one of the two complementary DNA strands is used as a primer, which can be a trigger in synthesizing a DNA, the DNA synthesis is stopped when a dideoxynucleotide is integrated, and the DNA fragments with variations in length can be obtained. If the dideoxynucleotide is applied corresponding to each base of G, A, T, and C in the DNA synthetic reaction using a primer, then the DNA fragments with variations in length can be obtained with the growth of strands stopped at the position of each base.
FIG. 4 shows a specific nucleotide, and how to obtain the DNA fragments by cleaving the DNA at the adenine A. In this case, a moderate chemical process of removing a piece of nucleotide, that is, the adenine A, from a DNA strand is performed. Only the left fragments provided with a phosphate group at the 5′ end are radioactive. If these fragments are processed in a gel electrophoresis, the radioactive fragments are detected by the length of the fragment, that is, at the position corresponding to the molecular weight.
With the DNA sequencer, the DNA fragments produced as reaction products in the dideoxy method are fluorescently labeled. The fluorescently labeled DNA fragments, which have strands of varying length, are separated through gel electrophoresis. When laser light irradiates a DNA fragment at a point on the gel during the electrophoresis, the fluorescent dye is excited and emits light. The fluorescent light is detected by a light detector. By detecting the fluorescent light over the time of the electrophoresis, the data of the electrophoresis pattern of the DNA fragments corresponding to each base of the G, A, T, and C can be obtained. The obtained data is analyzed by the computer and converted into base sequence data.
Usually, the output data of the DNA sequencer includes a DNA base sequence itself and the waveform data used in determining the sequence. The waveform data corresponds to the data of the gel electrophoresis pattern. In each waveform of the G, A, T, and C, the position of the peak of the fluorescence intensity of the waveform corresponds to the position of the base.
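The relationship between waveform peaks and base positions can be illustrated with a deliberately simplified sketch. It assumes four idealized, noise-free intensity traces sampled at the same time points, one per base; the function name `call_bases` and the `min_height` parameter are hypothetical, and an actual sequencer performs considerably more signal processing than this.

```python
def call_bases(traces: dict[str, list[float]], min_height: float = 0.0) -> str:
    """Read a base at every local intensity maximum, ordered by peak position."""
    peaks = []
    for base, signal in traces.items():
        for i in range(1, len(signal) - 1):
            is_peak = signal[i - 1] < signal[i] >= signal[i + 1]
            if is_peak and signal[i] > min_height:
                peaks.append((i, base))
    peaks.sort()  # earlier peaks correspond to earlier positions in the sequence
    return "".join(base for _, base in peaks)


if __name__ == "__main__":
    # Hypothetical traces: one peak each for A, C, and G, none for T.
    traces = {
        "A": [0, 5, 0, 0, 0, 0, 0],
        "C": [0, 0, 0, 4, 0, 0, 0],
        "G": [0, 0, 0, 0, 0, 6, 0],
        "T": [0, 0, 0, 0, 0, 0, 0],
    }
    print(call_bases(traces))  # ACG
```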
However, the number of bases in the DNA base sequence is normally large as described above, so the DNA sequencer cannot determine the entire base sequence at once. Therefore, an object DNA whose sequence is to be determined is divided into a plurality of fragments. The base sequence of each fragment is then determined, and the results are joined to each other, thereby determining the entire base sequence.
When an object DNA fragment is generated by the above described method, the sequence result output by the sequencer contains a part of the base sequence of the vector used in the cloning process in addition to the object DNA fragment. It is very important to delete this part of the base sequence, and doing so reliably is a problem in the prior art technology.
The vector unit is a part of the bases in the vector, and it is probably contained in the 5′ end portion and 3′ end portion obtained as sequence results. To generate a correct object DNA fragment, the vector unit should be completely removed. Conventionally, the vector unit has been removed through a homology search, which is a retrieval method for outputting a retrieval result using the base sequence of the vector unit possibly positioned before or after the object DNA fragment base sequence, even if all bases do not completely match. However, this method has the problem that the vector unit cannot be successfully detected because the base sequence of the vector unit may be short, or a mis-sequencing operation, etc. at the 3′ end badly affects the vector-unit detection.
The present invention aims at providing a method and device for retrieving the vector unit mixed in the DNA sequence result and automatically deleting the vector unit from the retrieval result.
A vector unit base sequence removing method according to the invention is used for removing a vector unit base sequence from a DNA base sequence which is obtained as a result of performing a cloning process by integrating an object DNA fragment into a vector, and includes the vector unit base sequence as a part of a base sequence of the vector and the object DNA fragment. The method includes the steps of: generating a retrieval base sequence as a retrieval key for use in retrieving the vector unit base sequence from the DNA base sequence based on the vector, a restriction enzyme used to cleave the vector for the cloning process, and a restriction enzyme used to obtain the object DNA fragment; specifying the vector unit base sequence using the retrieval key; and removing the specified vector unit base sequence to specify the object DNA fragment.
The DNA base sequence may be obtained as an output from a sequencer for determining the DNA base sequence.
The retrieval key may include a forward (leading) retrieval key and a backward (following) retrieval key for respectively identifying areas before and after the object DNA fragment in the DNA base sequence. The forward and backward retrieval keys may indicate the base sequences corresponding to restriction enzyme sites including parts of the vector cleaved by a restriction enzyme for the cloning process and ends of the object DNA fragment.
Base sequences of the forward and backward retrieval keys may be generated by base sequence data of the vector entered in a vector data base, data of a multiple cloning site in the vector, and data of a restriction enzyme site in the multiple cloning site.
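One possible way to derive the forward and backward retrieval keys is sketched below. It rests on an assumption not stated in this summary: that each key is the junction sequence formed when the vector-side part of the vector's restriction site is joined to the fragment-side part of the site used to obtain the object DNA fragment. The table `SITES` and the function `retrieval_keys` are hypothetical; in the described method these data would come from the vector data base, and the two enzymes are assumed to leave compatible cohesive ends.

```python
# Recognition site and top-strand cut offset; BamHI and BglII leave compatible ends.
SITES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "BamHI": ("GGATCC", 1),  # G^GATCC
    "BglII": ("AGATCT", 1),  # A^GATCT
}


def retrieval_keys(vector_enzyme: str, fragment_enzyme: str) -> tuple[str, str]:
    """Return (forward key, backward key) for the two vector/fragment junctions."""
    v_site, v_cut = SITES[vector_enzyme]
    f_site, f_cut = SITES[fragment_enzyme]
    # Junction before the fragment: vector bases up to the cut, then fragment bases.
    forward = v_site[:v_cut] + f_site[f_cut:]
    # Junction after the fragment: fragment bases up to the cut, then vector bases.
    backward = f_site[:f_cut] + v_site[v_cut:]
    return forward, backward


if __name__ == "__main__":
    print(retrieval_keys("EcoRI", "EcoRI"))  # ('GAATTC', 'GAATTC')
    print(retrieval_keys("BamHI", "BglII"))  # ('GGATCT', 'AGATCC')
```

When the same enzyme is used on both the vector and the fragment, both keys reduce to the enzyme's own recognition sequence; with two different enzymes the junction no longer matches either original site, which is why the keys are built from both.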
The method according to the present invention may further include the steps of: performing homology retrieval on condition that a similarity value indicating a matching rate between the retrieval base sequence and the DNA base sequence is equal to or larger than a predetermined value in retrieval using the retrieval key for the DNA base sequence; and obtaining a candidate for a base sequence at a junction between the vector in the DNA base sequence and the object DNA fragment according to a result of the homology retrieval.
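The homology retrieval step can be sketched as a sliding comparison. This is a minimal illustration under stated assumptions: the similarity value is taken to be the fraction of positions at which the key and an equally long window of the DNA base sequence agree, with no insertions or deletions considered, and the names `similarity` and `find_candidates` are hypothetical.

```python
def similarity(key: str, window: str) -> float:
    """Matching rate: fraction of aligned positions carrying the same base."""
    matches = sum(1 for a, b in zip(key, window) if a == b)
    return matches / len(key)


def find_candidates(sequence: str, key: str, min_similarity: float) -> list[int]:
    """Return start positions whose window reaches the predetermined similarity."""
    hits = []
    for start in range(len(sequence) - len(key) + 1):
        window = sequence[start:start + len(key)]
        if similarity(key, window) >= min_similarity:
            hits.append(start)
    return hits


if __name__ == "__main__":
    read = "TTTGAATTCAAAGACTTCGG"
    # An exact match at position 3 and a one-mismatch match at position 12.
    print(find_candidates(read, "GAATTC", 0.8))  # [3, 12]
    print(find_candidates(read, "GAATTC", 1.0))  # [3]
```

Because the key is short, a tolerant threshold can return more than one junction candidate, as the example shows. Narrowing such candidates is what a longer second key is for.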
The method according to the present invention may further include the steps of: generating a second forward retrieval key by adding to the forward retrieval key a portion that should be existing before the multiple cloning site of the vector; performing a second homology retrieval on condition that a second similarity value indicating a matching rate between a base sequence corresponding to the second forward retrieval key and a base sequence including a base sequence at a junction of the DNA base sequence is equal to or larger than a predetermined value; and obtaining as a vector unit candidate for the vector unit base sequence an area specified as a result of the second homology retrieval and an area or areas before the specified area.
The method according to the present invention may further include the steps of: generating a second backward retrieval key by adding to the backward retrieval key a portion that should be existing after the multiple cloning site of the vector; performing a second homology retrieval on condition that a second similarity value indicating a matching rate between a base sequence corresponding to the second backward retrieval key and a base sequence containing the base sequence at the junction of the DNA base sequence is equal to or larger than a predetermined value; and obtaining as a vector unit candidate for the vector unit base sequence an area specified as a result of the second homology retrieval and an area or areas after the specified area.
The vector unit candidate may be removed from the DNA base sequence when the number of the area specified by the second homology retrieval is one.
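The second retrieval on the forward side can be sketched as follows, again only as an illustration. The sketch assumes that the junction candidates are given as start positions of first-key matches, that the second forward retrieval key is the first key preceded by the vector bases expected upstream of it, and that a candidate too close to the start of the read is compared against the correspondingly shortened key. The names `confirm_forward` and `trim_forward` are hypothetical.

```python
def similarity(key: str, window: str) -> float:
    return sum(1 for a, b in zip(key, window) if a == b) / len(key)


def confirm_forward(sequence: str, candidates: list[int], first_key: str,
                    upstream: str, min_similarity: float) -> list[int]:
    """Return the end position of each candidate confirmed by the longer key."""
    second_key = upstream + first_key
    confirmed = []
    for start in candidates:
        begin = max(0, start - len(upstream))
        window = sequence[begin:start + len(first_key)]
        # Use only as much of the second key as the read actually contains.
        key_part = second_key[len(second_key) - len(window):]
        if similarity(key_part, window) >= min_similarity:
            confirmed.append(start + len(first_key))
    return confirmed


def trim_forward(sequence: str, confirmed: list[int]) -> str:
    """Remove the vector unit only when exactly one area was specified."""
    if len(confirmed) != 1:
        return sequence  # none or ambiguous: leave the sequence unchanged
    return sequence[confirmed[0]:]


if __name__ == "__main__":
    read = "ACCCGGGGAATTCATGAAAGACTTCTTT"
    ends = confirm_forward(read, [7, 19], "GAATTC", "CCCGGG", 0.8)
    print(ends)                      # [13]
    print(trim_forward(read, ends))  # ATGAAAGACTTCTTT
```

Leaving the sequence untouched when the number of specified areas is not one is a conservative design choice: it avoids deleting part of the object DNA fragment on an ambiguous match.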
The method according to the present invention may further include the steps of: generating a second forward retrieval key by adding to the forward retrieval key a portion that should be existing before the multiple cloning site of the vector; generating a second backward retrieval key by adding to the backward retrieval key a portion that should be existing after the multiple cloning site of the vector; performing a second homology retrieval on condition that a second similarity value indicating a matching rate between a base sequence corresponding to the second forward retrieval key and a base sequence including a base sequence at a junction of the DNA base sequence is equal to or larger than a predetermined value, and a third similarity value indicating a matching rate between a base sequence corresponding to the second backward retrieval key and a base sequence including the base sequence at a junction of the DNA base sequence is equal to or larger than a predetermined value; obtaining as a forward vector unit candidate for the vector unit base sequence a forward area specified as a result of the second homology retrieval and an area before the forward area; and obtaining as a backward vector unit candidate for the vector unit base sequence a backward area specified as a result of the second homology retrieval and an area after the backward area.
The forward vector unit candidate and the backward vector unit candidate may be removed from the DNA base sequence when there is only one candidate respectively for the specified forward and backward vector units, and the specified forward and backward vector units do not overlap each other.
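The removal condition for the two-sided case can be sketched as a small check. The sketch assumes each specified area is represented as a half-open interval `(start, end)` of positions in the DNA base sequence; the function name `remove_vector_units` and the use of `None` to signal that nothing was removed are hypothetical choices.

```python
from typing import Optional


def remove_vector_units(sequence: str,
                        forward_areas: list[tuple[int, int]],
                        backward_areas: list[tuple[int, int]]) -> Optional[str]:
    """Return the object DNA fragment, or None when removal is not safe."""
    # Exactly one candidate is required on each side.
    if len(forward_areas) != 1 or len(backward_areas) != 1:
        return None
    _, forward_end = forward_areas[0]
    backward_start, _ = backward_areas[0]
    # The forward and backward vector units must not overlap.
    if forward_end > backward_start:
        return None
    # Drop the forward area and everything before it, and the backward
    # area and everything after it.
    return sequence[forward_end:backward_start]


if __name__ == "__main__":
    read = "CCGAATTCATGAAAGAATTCGG"
    print(remove_vector_units(read, [(2, 8)], [(14, 20)]))  # ATGAAA
    print(remove_vector_units(read, [(2, 8)], [(6, 12)]))   # None
```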
A vector unit base sequence removing device according to the invention is for removing a vector unit base sequence from a DNA base sequence which is obtained as a result of performing a cloning process by integrating an object DNA fragment into a vector and includes the vector unit base sequence as a part of a base sequence of the vector and the object DNA fragment. The device includes: a first unit for generating a base sequence as a retrieval key for use in retrieving the vector unit base sequence from the DNA base sequence based on the vector, a first restriction enzyme used to cleave the vector for the cloning process, and a second restriction enzyme used to obtain the object DNA fragment; a second unit for specifying the vector unit base sequence using the retrieval key; and a third unit for removing the specified vector unit base sequence to specify the object DNA fragment.
The device according to the present invention may further include: a vector list storage unit for storing a vector list; and a restriction enzyme list storage unit for storing a restriction enzyme list. The vector is specified in the vector list, and the first and second restriction enzymes are specified in the restriction enzyme list.
The device according to the present invention may further include a display unit. The vector may be specified in the vector list displayed on the display unit, and at least one of the first and second restriction enzymes may be specified in the restriction enzyme list displayed on the display unit.
The device according to the present invention may further include a program storage unit for storing at least one of: a program for generating the retrieval key by controlling the first unit; a program for specifying the vector unit base sequence by controlling the second unit; and a program for removing the vector unit base sequence by controlling the third unit.
The second unit may specify, using the retrieval key, a junction between the vector unit base sequence and the object DNA fragment, and the third unit may specify the object DNA fragment by removing the junction and a portion outside the junction from the DNA base sequence.
The second unit may specify as the junction a portion in the DNA sequence in which a number of bases matching a base sequence of the retrieval key is equal to or larger than a predetermined value.
The second unit may specify using the retrieval key a first junction and a second junction between the vector unit base sequence and the object DNA fragment, and the third unit may specify the object DNA fragment by removing from the DNA base sequence the first junction and a portion outside the first junction and the second junction and a portion outside the second junction.
The retrieval key may include a base sequence corresponding to an end portion of the object DNA fragment and a base sequence corresponding to an end portion of the vector unit base sequence, and may specify a candidate for a junction between the vector unit base sequence and the object DNA fragment.
A second retrieval key indicating a base sequence longer than the retrieval key may be generated, and the junction may be specified among the candidates for the junction using the second retrieval key.
The object DNA fragment may be specified by removing the junction and a portion outside the junction from the DNA base sequence.
A storage medium according to the invention is for embodying a program for performing, by a computer, a function of removing a vector unit base sequence from a DNA base sequence which is obtained as a result of performing a cloning process by integrating an object DNA fragment into a vector and includes the vector unit base sequence as a part of a base sequence of the vector and the object DNA fragment. The program realizes the steps of: generating a retrieval base sequence as a retrieval key for use in retrieving the vector unit base sequence from the DNA base sequence based on the vector, a restriction enzyme used to cleave the vector for the cloning process, and a restriction enzyme used to obtain the object DNA fragment; specifying the vector unit base sequence using the retrieval key; and removing the specified vector unit base sequence to specify the object DNA fragment.
The above described methods and the methods explained in the following embodiment may be realized using computer programs. The methods according to the present invention may be realized using storage media such as diskettes, CD-ROMs, hard disks, mini-disks, RAM, etc.