Human cells are estimated to express 100,000 kinds of genes and to produce the proteins they encode. Recent progress in molecular biology has revealed that every human protein plays an important role in maintaining life, and that many diseases result from mutations in the amino acid sequence of a protein or from abnormal expression of proteins in the cell. Therefore, acquiring the whole set of human genes and elucidating the structures of the proteins they encode would help to elucidate the causes of many diseases. These genes and proteins are expected to be useful for diagnosing and treating such diseases.
Conventional studies of human proteins start with isolating and purifying a protein that shows a target activity. The purified protein is used to prepare probes such as oligonucleotides or antibodies, with which a cDNA encoding the target protein is screened from a human cDNA library. Purifying the protein and cloning its cDNA, however, require much time and laborious work. In fact, cloning a cDNA encoding a single target protein usually takes several years.
So far, about 2,000 kinds of human genes have been isolated and used to investigate their relationships with diseases or their medical applications. These genes can be used directly as probes in diagnosis, or to express the encoded proteins, which can in turn be used to prepare antibodies useful as diagnostic probes. For this purpose, it is desirable to prepare as many genes as possible and to use them as probes. However, the human genes elucidated so far amount to less than several percent of all human genes. Moreover, since the genes are held by individual researchers, it is difficult to use them together as probes.
Recently, it has been reported that cDNA clones were selected from a cDNA library prepared from human brain, partially sequenced, and used for genome mapping [Adams et al., Science 252:1651-1656, 1991; Adams et al., Nature 355:632-634, 1992]. Since the cDNA library used was prepared by a random primer method, each clone contained only a fragment of a cDNA. Thus, it is impossible to judge which part of the mRNA a given cDNA originated from, or even whether the cDNA encodes a protein.
In fact, the functions of the proteins encoded by most of the reported cDNAs are unclear. Even if a cDNA contains part of a coding region, producing the protein requires complicated steps, such as screening a library for a clone containing the intact coding region and subcloning that coding region into an expression vector. These cDNAs have a further problem: it has been pointed out that they contain some artifacts, so that some of the obtained sequences are not derived from a single species of mRNA (Burglin et al., Nature 357:367, 1992).