Progress in the biological sciences has enabled the analysis of chromosome structure at the molecular level. This analysis has shown that the DNA (deoxyribonucleic acid) molecule in a chromosome is a single, continuous, thread-like molecule. DNA contains four types of bases, i.e., adenine (a), guanine (g), cytosine (c), and thymine (t), and the DNA molecules in chromosomes are composed of nucleotides, each consisting of one of these bases bound to a sugar (deoxyribose) and a phosphate group. The DNA molecules in chromosomes encode a variety of information through differences in their nucleotide sequences.
Analysis of DNA has also shown that certain portions of the DNA are structural units that carry genetic information and determine genotype, i.e., genes. Genes that determine the primary structure of products such as proteins, tRNA, and rRNA are specifically referred to as structural genes. The gene-based mechanism of protein synthesis will be explained with reference to FIG. 18.
As shown in FIG. 18, a gene is divided into three regions: a promoter region, a transcription region (the structural gene region, which contains the information specifying the amino acid sequence of the protein), and a termination region. The promoter region controls the start of transcription, the transcription region is the region that is actually transcribed, and the termination region controls the termination of transcription. The site at which transcription begins is called the transcription start site.
Protein synthesis from a gene structured as described above proceeds as follows. First, RNA polymerase (an enzyme that transcribes a DNA sequence) binds to the DNA slightly upstream of the promoter region and then moves toward the promoter region. Once the RNA polymerase has passed the promoter region and moved into the transcription region, it synthesizes messenger RNA (mRNA) corresponding to the DNA sequence of the transcription region. When the RNA polymerase reaches the termination region, transcription of the DNA sequence stops. The messenger RNA then moves from the cell nucleus to the cytoplasm and binds to ribosomes, which synthesize proteins based on the messenger RNA. In this manner, the DNA sequence of a gene is transcribed, and a specific protein is synthesized from the transcript.
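The transcription and translation steps described above can be sketched in a few lines of Python. This is a simplified illustration, not a model of the cellular machinery: it assumes the input is the coding (sense) strand written 5' to 3' (so the mRNA is the same sequence with t replaced by u), ignores the promoter and termination signals and any mRNA processing, starts translation at the first aug codon, and uses only a small excerpt of the standard genetic code. The names `transcribe`, `translate`, and `CODON_TABLE` are illustrative.

```python
# Simplified sketch of transcription (DNA -> mRNA) and translation (mRNA -> protein).
# Assumption: the input is the coding (sense) strand, 5' -> 3', so the mRNA has the
# same sequence with thymine (t) replaced by uracil (u).

# Only a small excerpt of the standard genetic code, enough for the example below.
CODON_TABLE = {
    "aug": "M",  # methionine (start codon)
    "uuu": "F",  # phenylalanine
    "ggc": "G",  # glycine
    "aaa": "K",  # lysine
    "uaa": "*",  # stop
    "uag": "*",  # stop
    "uga": "*",  # stop
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA corresponding to a DNA coding strand."""
    return coding_strand.lower().replace("t", "u")


def translate(mrna: str) -> str:
    """Translate codon by codon from the first aug until a stop codon."""
    start = mrna.find("aug")
    if start == -1:
        return ""  # no start codon, no protein
    protein = []
    # Step through complete codons only (i + 3 never exceeds len(mrna)).
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "?")  # '?' = codon not in excerpt
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ccatgtttggcaaataagg"
mrna = transcribe(dna)
print(mrna)             # ccauguuuggcaaauaagg
print(translate(mrna))  # MFGK
```

Codons that are absent from the excerpt table are reported as `?`; a complete implementation would use the full 64-codon standard table.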
The rate of such protein synthesis depends on the rate at which RNA polymerase transcribes messenger RNA, and this transcription rate is controlled by the promoter sequence of the gene.
Therefore, it is theoretically possible, by manipulating the DNA sequence of the promoter region, to create promoters that yield a high rate of protein synthesis and promoters that yield a low rate of protein synthesis.
In recent years, there have been many attempts to actively control the rate of protein synthesis by artificially altering the DNA sequence of promoter regions. Such control would make it possible to create, for example, promoters with high transcription activity for artificially expressing specific proteins in large quantities. This in turn would enable gene therapy at the molecular level in which, e.g., cancer cells are infected with an adequate amount of a virus (in gene therapy, viruses of reduced toxicity are used as vectors to transport normal genes into cells) and proteins are then specifically expressed from these normal genes (integrated with powerful artificial promoters) in amounts adequate to suppress the cancer.
However, no method has yet been established for designing artificial promoters with high transcription activity, i.e., artificial promoters whose DNA sequences confer the above-noted characteristics. Currently, randomly generated nucleotide sequences (promoter candidates) are placed upstream of structural genes (test genes), and the efficacy of each promoter candidate is evaluated by determining whether or not the test gene is expressed.
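The current screening approach can be outlined as a generate-and-test loop. This is a hypothetical sketch: `random_candidate` produces randomly sequenced promoter candidates, and `is_expressed` is a placeholder for the laboratory assay in which a candidate is placed upstream of a test gene and expression is checked. The toy assay in the usage lines is purely illustrative and has no biological meaning.

```python
import random
from typing import Callable, List

NUCLEOTIDES = "acgt"


def random_candidate(length: int, rng: random.Random) -> str:
    """Return one randomly sequenced promoter candidate of the given length."""
    return "".join(rng.choice(NUCLEOTIDES) for _ in range(length))


def screen(candidates: List[str], is_expressed: Callable[[str], bool]) -> List[str]:
    """Keep the candidates for which the test gene is judged to be expressed.

    `is_expressed` is a hypothetical stand-in for the wet-lab assay; each call
    corresponds to one experiment.
    """
    return [seq for seq in candidates if is_expressed(seq)]


rng = random.Random(0)  # fixed seed so the run is reproducible
candidates = [random_candidate(10, rng) for _ in range(5)]

# Toy placeholder assay (illustrative only, NOT a biological model):
# treat a candidate as "expressed" if it contains the substring "ta".
hits = screen(candidates, lambda seq: "ta" in seq)
print(candidates)
print(hits)
```

The point of the sketch is that every candidate costs one call to the assay, i.e., one experiment, so the number of candidates tested directly determines the experimental effort.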
Because the nucleotide sequences of the artificial promoter candidates used in this experimental approach are determined at random, it is extremely difficult to identify, from a virtually infinite number of possible sequences (4^n distinct sequences of length n, since each position can hold any of the four nucleotides), those nucleotide sequences that are suitable as potentially highly effective artificial promoters.
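The size of the search space follows from simple arithmetic: each of the n positions in a candidate can hold any of four nucleotides, so there are 4^n distinct sequences of length n. The Python sketch below (function names hypothetical) computes this count and, for a small n, confirms it by brute-force enumeration with `itertools.product`.

```python
from itertools import product

NUCLEOTIDES = "acgt"


def number_of_sequences(n: int) -> int:
    """Count the distinct DNA sequences of length n (four choices per position)."""
    return len(NUCLEOTIDES) ** n


def enumerate_sequences(n: int):
    """Yield every sequence of length n; only feasible for very small n."""
    for letters in product(NUCLEOTIDES, repeat=n):
        yield "".join(letters)


# Brute-force check of the formula for a small length.
assert sum(1 for _ in enumerate_sequences(4)) == number_of_sequences(4) == 256

for n in (10, 20, 50, 100):
    print(f"length {n}: {number_of_sequences(n):.3e} possible sequences")
```

For example, 4^10 = 1,048,576 and 4^20 = 1,099,511,627,776, so even a 20-nucleotide candidate region admits over a trillion possible sequences, far more than can be tested experimentally.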
Therefore, it is an object of the present invention to provide methods for selecting artificial promoter candidates that reduce the number of nucleotide sequences that must actually be tested, by selecting in advance, from among a set of nucleotide sequences under consideration, those sequences that are highly likely to function as effective promoters.