Recently, the DNA sequences of various species have been determined at a rapid pace, and “structural genomics” has come to be recognized as an important research area. For a large number of genes selected from the mass of available genomic sequence information, structural genomics aims at the systematic determination of the three-dimensional structures of the proteins encoded by each gene, as well as the comprehensive study of structure-function relationships.
In structural genomics research, a very large number of proteins, ranging from 30,000 to more than 40,000 in the case of human proteins, can be targets of structural analysis. However, it is thought that the three-dimensional structures of the proteins encoded in the human genome consist of one to several thousand folds or domain units, and that combinations of these protein folds and domains account for the variation in protein functions (Chothia, C., et al., J. Mol. Biol., 227, 799-817. (1992); Brenner, S. E., Chothia, C., Hubbard, T. J., Curr. Opin. Struct. Biol., 7, 369-376. (1997)).
Among existing methods of protein synthesis, genetic engineering methods in which full-length genes cloned from cDNA or a genomic library are introduced into living cells such as E. coli have been widely used. However, with these methods it is difficult to obtain proteins that are toxic to the host cells and/or are easily degraded because of instability. It is also difficult to obtain, in soluble form, proteins that aggregate easily in host cells.
In eukaryotes such as humans, most proteins are composed of multiple, relatively small functional domains as a result of evolution by gene duplication (Orengo, C. A., et al., Nucleic Acids Res., 27, 275-279. (1999)). In particular, membrane-bound proteins and the like have partial sequences rich in hydrophobic amino acids, which bind to cell membranes. When these proteins are expressed in intact form in heterologous cells, they are likely to be insoluble, and it is difficult to maintain their intact three-dimensional structures and functions in vivo.
Accordingly, attempts have been made to prevent the formation of incorrect three-dimensional structures and the generation of insoluble aggregates when heterologous genes are over-expressed in microorganisms such as recombinant E. coli. These attempts have focused on expressing proteins from heterologous genes alone or from heterologous genes fused with genes encoding soluble proteins. Such proteins may be expressed in the presence of a chaperone protein, which promotes the formation of three-dimensional structures, at a low temperature, or under specific medium conditions. However, for certain genes, soluble protein products cannot be obtained by any of these methods.
To address this problem, it has been reported that green fluorescent protein (GFP) works as a “folding reporter” when the protein of interest is fused to the N-terminus of GFP (Waldo, G. S., Standish, B. M., Berendzen, J., Terwilliger, T. C., Nature Biotechnology 17, 691-695. (1999)). In that study, formation of the GFP chromophore was directly related to the proper folding of the upstream fusion partner, so that the folding of the fused protein could be assessed through GFP fluorescence. According to this report, whether a protein fused upstream of GFP forms a functional three-dimensional structure can be predicted solely by measuring the fluorescence intensity of the recombinant E. coli, without measuring the function of the fused protein itself. Using these results as indices, mutants whose rates of folding are higher than that of the wild type can be produced, and the mechanisms by which the three-dimensional structure forms can be studied.
In general, however, these specialized approaches, which are geared toward evaluating individual proteins, have not been sufficient as methods for high-throughput analysis of the three-dimensional structures of many proteins or for a systematic understanding of the structures and functions of those proteins.