1. Field of Endeavor
The present invention relates to obtaining information about proteins and, more particularly, to finding three-dimensional (3D) similarities in protein structures.
2. State of Technology
U.S. patent application Ser. No. 2002/0150906 by Derek A. Debe, published Oct. 17, 2002, for a method for determining three-dimensional protein structure from primary protein sequence provides the following state of technology information:
“While the sequencing of the human genome is a landmark achievement in genomics, it also creates the next great challenge, namely to create an accurate structural model of each protein coded by the human genome. Since the experimental determination of all of the protein structures coded would require decades, computational methods for determining three-dimensional protein structures are essential if structural genomics is going to rapidly progress . . . .
Proteins are linear polymers of amino acids. Naturally occurring proteins may contain as many as 20 different types of amino acid residues, each of which contains a distinctive side chain. The particular linear sequence of amino acid residues in a protein defines the primary sequence, or primary structure, of the protein. The primary structure of a protein can be determined with relative ease using known methods . . . .
Proteins fold into a three-dimensional structure. The folding is determined by the sequence of amino acids and by the protein's environment. Examination of the three-dimensional structure of numerous natural proteins has revealed a number of recurring patterns. Patterns known as alpha helices, parallel beta sheets, and anti-parallel beta sheets are commonly observed . . . .
The biological properties of a protein depend directly on its three-dimensional (3D) conformation. The 3D conformation determines the activity of enzymes, the capacity and specificity of binding proteins, and the structural attributes of receptor molecules.
Because the three-dimensional structure of a protein molecule is so significant, it has long been recognized that a means for easily determining a protein's three-dimensional structure from its known amino acid sequence would be highly desirable. However, it has proven extremely difficult to make such a determination without experimental data . . . .
In the past, the three-dimensional structures of proteins have been determined using a number of different experimental methods. Perhaps the most recognized method of determining protein structure involves the use of the technique of x-ray crystallography . . . .
These experimental techniques all suffer from at least one significant shortcoming. Namely, they are labor intensive and therefore slow and expensive. Modern sequencing techniques are creating rapidly growing databases of primary sequences that need to be translated into three dimensional protein structures. Indeed, with more than 500 genomes including the human genome fully sequenced, three dimensional structures have only been determined for about 2% of these sequences. Every day the ratio of predicted three dimensional structures to primary sequences is getting smaller . . . .
In order to more rapidly predict three dimensional structures from primary sequences, biochemists are turning to various computational approaches that permit structure determination to be done with computers and software rather than laborious and intricate laboratory techniques. One of the most promising of these computational approaches compares the similarity of a primary sequence for which the three dimensional structure of the sequence is sought, referred to throughout as a query sequence or a query peptide against one or more primary sequences, usually a database of such sequences, referred to throughout as template sequences or template peptides, for which the three dimensional structures are known. This is one aspect of primary sequence homology modeling . . . .
At a high level, many primary sequence homology modeling methods can be characterized in two steps. In the first step, referred to as the alignment step, the query sequence for which the three dimensional structure is sought, is aligned against one or more template sequences, contained in a database. The three dimensional structures for each of the template sequences are known in whole or in substantial part. After each alignment comparison between the query peptide and a template peptide, the method gives a score. After each comparison has been made in the database, the highest scoring alignment pair reflects the optimally aligned query sequence/template sequence(s). The optimal sequence alignment may be used to generate the most accurate structural determinations regarding the query sequence. Still, a query/template alignment producing a sub-optimal score may be used to generate useful structural information regarding the query sequence . . . .
In the second step, referred to as the modeling step, structural information of the query peptide may be predicted based upon structural information corresponding to the sequence or subsequences aligned in the template sequence. The most common of primary sequence homology methods use sequence homologies to predict the three dimensional structure of a query sequence based on the three dimensional structure of aligned template sequences. Still, other primary sequence homology modeling techniques seek to determine primary sequence homology relationships between one or more query sequences based on the primary sequences of aligned template sequences.”
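By way of illustration only, the alignment step described in the quoted passage can be sketched as follows. The sketch scores a query sequence against each template sequence and reports the highest scoring template. It is not the scoring method of the cited application: the Needleman-Wunsch global alignment, the identity-based match, mismatch, and gap values, and the function names `alignment_score` and `best_template` are all assumptions chosen for brevity. Practical methods typically use amino acid substitution matrices and more elaborate gap penalties.

```python
def alignment_score(query, template, match=1, mismatch=-1, gap=-2):
    """Global alignment score (Needleman-Wunsch, score only, no traceback).

    The match/mismatch/gap values are illustrative assumptions.
    """
    # prev[j] holds the best score aligning the processed query prefix
    # against the first j residues of the template.
    prev = [j * gap for j in range(len(template) + 1)]
    for i in range(1, len(query) + 1):
        curr = [i * gap]
        for j in range(1, len(template) + 1):
            same = query[i - 1] == template[j - 1]
            diag = prev[j - 1] + (match if same else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def best_template(query, templates):
    """Score the query against every template; return (best name, all scores).

    `templates` maps a hypothetical template name to its primary sequence.
    """
    scores = {name: alignment_score(query, seq) for name, seq in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from any real database.
    database = {"template_a": "ACDE", "template_b": "WWWW"}
    name, all_scores = best_template("ACDE", database)
    print(name, all_scores)
```

In such a sketch the sub-optimal scores are retained in the returned dictionary because, as the quoted passage notes, a sub-optimal alignment may still yield useful structural information.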
U.S. patent application Ser. No. 2003/0130797 by Jeffrey Skolnick and Andrzej Kolinski, published Jul. 10, 2003, for protein modeling tools provides the following state of technology information:
“To maximize the utility of such nucleotide sequence information, it must be interpreted. Various tools have been developed to assist in this process. For example, algorithms have been developed to analyze what a particular nucleotide sequence encodes, e.g., a regulatory region, an open reading frame (ORF), particularly for protein sequences, or a non-translated RNA, based on homology with known sequences (which are presumed to have similar structures and related functions). See, e.g., “Frames” (Genetics Computer Group, Madison, Wis.; www.gcg.com), which is used for identifying ORFs. For sequences predicted or determined to be ORFs, it is possible to determine the amino acid sequence of the protein encoded thereby using simple analytical tools well known in the art. For example, see “Translate” (Genetics Computer Group, Madison, Wis.; www.gcg.com). However, to date determination of the primary structure of a protein in and of itself provides little, if any, functional information about the protein or its corresponding gene. Thus, the ability to predict the three-dimensional structure of a protein from its amino acid sequence is of great theoretical and practical importance.”
International Patent Application No. WO 98/48270 by William Goddard et al., for a method of determining three-dimensional protein structure from primary protein sequence, published Oct. 29, 1998, provides the following state of technology information:
“Since the seminal work by C. B. Anfinsen, determining the three-dimensional structure of a protein from its amino acid sequence has been a much sought after goal in structural and computational biology. However, although progress has been made in several fronts such as secondary structure prediction and homology modeling, a general method for ab initio structure prediction, or in other words, a solution to the so-called “protein folding problem,” has eluded investigators.”
International Patent Application No. WO 93/01484 by David Eisenberg et al., for a method to identify protein sequences that fold into a known three-dimensional structure, published Jan. 21, 1993, provides the following state of technology information:
“A computer-assisted method for identifying protein sequences that fold into a known three-dimensional structure. The inventive method attacks the inverse protein folding problem by finding target sequences that are most compatible with profiles representing the structural environments of the residues in known three-dimensional protein structures. The method starts with a known three-dimensional protein structure and determines three key features of each residue's environment within the structure: (1) the total area of the residue's side-chain that is buried by other protein atoms, inaccessible to solvent; (2) the fraction of the side-chain area that is covered by polar atoms (O, N) or water, and (3) the local secondary structure. Based on these parameters, each residue position is categorized into an environment class. In this manner, a three-dimensional protein structure is converted into a one-dimensional environment string, which represents the environment class of each residue in the folded protein structure. A 3D structure profile table is then created containing score values that represent the frequency of finding any of the 20 common amino acids structures at each position of the environment string. These frequencies are determined from a database of known protein structures and aligned sequences. The method determines the most favorable alignment of a target protein sequence to the residue positions defined by the environment string, and determines a “best fit” alignment score, Sij for the target sequence. Each target sequence may then be further characterized by a ZScore, which is the number of standard deviations that Sij for the target sequence is above the mean alignment score for other target sequences of similar length.”
International Patent Application No. WO 93/01484 is incorporated into this application by reference.
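As a minimal sketch of the ZScore described in the quoted passage, the alignment score of a target sequence can be expressed as the number of standard deviations by which it exceeds the mean alignment score of other target sequences of similar length. The function name `z_score`, the use of the population standard deviation, and the numeric values in the usage example are assumptions made for illustration; the cited application is not quoted here as specifying these details.

```python
from statistics import mean, pstdev


def z_score(score, other_scores):
    """Standard deviations by which `score` lies above the mean of `other_scores`.

    `other_scores` are alignment scores of other target sequences of similar
    length. Using the population standard deviation is an assumption.
    """
    if len(other_scores) < 2:
        raise ValueError("need at least two comparison scores")
    mu = mean(other_scores)
    sigma = pstdev(other_scores)
    if sigma == 0:
        raise ValueError("comparison scores have zero spread")
    return (score - mu) / sigma


if __name__ == "__main__":
    # Hypothetical alignment scores, for illustration only.
    print(z_score(16.0, [8.0, 12.0]))
```

A large positive value indicates that the target sequence fits the environment string markedly better than comparable sequences do, which is the sense in which the quoted passage uses the ZScore to characterize a target sequence.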
International Patent Application No. WO 00/11206 by Jeffrey Skolnick et al., for methods and systems for predicting protein function, published Mar. 2, 2000, provides the following state of technology information:
“ . . . methods and systems for predicting the biological function(s) of proteins . . . based on the development of functional site descriptors for discrete protein biological functions. Functional site descriptors are geometric representations of protein functional sites in three-dimensional space, and can also include additional parameters, for example, conformational information. Following their development, one or more functional site descriptors (for one or more different biological functions) are used to probe protein structures to determine if such structures contain the functional sites described by the corresponding functional site descriptors. If so, the protein(s) containing the functional site(s) are predicted to have the corresponding biological function(s) . . . a library of functional site descriptors is used to probe inexact protein structures derived by computational methods from amino acid sequence information to predict the biological function(s) of such sequences and of the gene(s) encoding the same.”
International Patent Application No. WO 00/11206 is incorporated into this application by reference.
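For illustration, one simple way to represent a geometric functional site descriptor of the kind described in the quoted passage is as a list of required residue types together with allowed distance ranges between pairs of those residues. The sketch below probes a structure for such a site by brute-force search. The data layout, the function name `find_site`, the use of one representative coordinate per residue, and all residue names and distances are hypothetical and are not taken from the cited application.

```python
from itertools import permutations
from math import dist


def find_site(structure, residue_types, distance_ranges):
    """Return indices of residues matching the descriptor, or None.

    structure:       list of (residue_type, (x, y, z)) tuples, one
                     representative coordinate per residue (an assumption).
    residue_types:   required residue types, in descriptor order.
    distance_ranges: {(i, j): (low, high)} allowed distance between
                     descriptor positions i and j, in the coordinate units.
    """
    # Brute force over ordered residue selections; adequate only for a
    # small illustration, not for full-size protein structures.
    for combo in permutations(range(len(structure)), len(residue_types)):
        if any(structure[k][0] != t for k, t in zip(combo, residue_types)):
            continue
        if all(
            low <= dist(structure[combo[i]][1], structure[combo[j]][1]) <= high
            for (i, j), (low, high) in distance_ranges.items()
        ):
            return combo
    return None


if __name__ == "__main__":
    # Hypothetical four-residue structure and two-residue descriptor.
    toy_structure = [
        ("CYS", (0.0, 0.0, 0.0)),
        ("GLY", (1.0, 0.0, 0.0)),
        ("CYS", (3.0, 0.0, 0.0)),
        ("HIS", (3.0, 4.0, 0.0)),
    ]
    print(find_site(toy_structure, ["CYS", "HIS"], {(0, 1): (4.5, 5.5)}))
```

Expressing the distances as ranges rather than exact values reflects the statement in the quoted passage that descriptors may be used to probe inexact, computationally derived structures; the width of each range in this sketch is an arbitrary choice.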