1. The Field of the Invention
This invention relates to the rapid identification of protein molecules by the systematic development for each respective type of protein molecule of a set of particular, invariant, readily-detectable distinguishing characteristics, which set of characteristics will for convenience hereinafter be referred to as a fingerprint for the corresponding type of protein molecule. The invention also relates to libraries of different protein molecules and the corresponding fingerprints therefor, as well as to systems used in the identification, or fingerprinting, of protein molecules. The present invention has particular applicability to the identification of protein molecules obtained from biological samples.
2. Background Art
There are approximately 100,000 different types of protein molecules involved in organic processes. Each protein molecule is, however, comprised of various amino acid building blocks from a group of about twenty different amino acids. Amino acids chemically connect end-to-end to form a chain that is referred to as a peptide. The amino acid building blocks in a peptide chain share as a group various of the peripheral atomic constituents of each amino acid. As a result, an amino acid in a peptide chain is not in situ a complete amino acid. Therefore, an amino acid in a peptide chain is referred to as an “amino acid residue.” A peptide chain becomes a true protein molecule only when the constituent amino acid residues have been connected, when certain amino acid residues of the peptide chain have been modified by the addition to or removal of certain types of molecules from the functional chemical groups of these amino acid residues, and when the completed chain of amino acid residues assumes a particular three-dimensional structure determined by the sequence of amino acid residues and chemical modifications thereof.
Protein molecules do not naturally maintain a one-dimensional, linear arrangement. The sequence of the amino acid residues in a protein molecule causes the molecule to assume an often complex, but characteristic three-dimensional shape. A protein molecule that has been forced out of this three-dimensional shape into a one-dimensional, linear arrangement is described as having been “linearized.”
Protein molecules are involved in virtually every biological process. Aberrant or mutant forms of protein molecules disrupt normal biological processes, thereby causing many types of diseases, including some cancers and inherited disorders, such as cystic fibrosis and hemophilia. The ability of a protein molecule to perform its intended function depends, in part, upon the sequence of amino acid residues of the protein molecule, modifications to particular amino acid residues of the protein molecule, and the three-dimensional structure of the protein molecule.
Alterations to the sequence of amino acid residues, to the modifications of particular amino acid residues, or to the three-dimensional structure of a protein molecule can change the way in which a protein molecule participates in biological processes. While many protein molecules and the functions thereof in biological processes are known, scientists continue the arduous task of isolating protein molecules, identifying the chemical composition and structure of each isolated protein molecule, and determining the functions of the protein molecule, as well as the consequences of changes in the structures of the protein molecule.
The sequence of the amino acid residues in a protein molecule, which imparts to the protein molecule a unique identity with a set of unique characteristics, is difficult to detect rapidly and reliably.
The identification of a protein molecule typically involves two steps: (1) purifying the protein molecule; and (2) characterizing the protein molecule.
In isolating or purifying protein molecules, a targeted protein molecule is separated from other, different types of protein molecules. Some current purification techniques are sensitive enough to purify an aberrant form of a protein molecule from normal protein molecules of the same type. Different purification techniques are based on the different characteristics of protein molecules, such as the weight of a protein molecule, the solubility of a protein molecule in water and other solvents, the reactivity of a protein molecule with various reagents, and the pH value at which the protein molecule is electrically neutral. The last is referred to as the isoelectric point of the protein molecule. Due to the large number of different types of protein molecules and because some types of protein molecules have very similar characteristics to other types of protein molecules, extremely sensitive purification processes are often required to isolate one type of protein molecule from others. The sensitivity with which similar types of protein molecules are separated from each other can be enhanced by combining different types of these purification techniques.
In some characterization processes, individual protein molecules are studied. When characterization processes that permit one to study individual protein molecules are employed, a single protein molecule in a sample can be separated or isolated from the other protein molecules in the sample by diluting the sample.
Since many purification techniques separate different types of protein molecules on the basis of the physical or chemical characteristics of the different types of protein molecules, these purification techniques may themselves reveal some information about the identity of a particular type of protein molecule. Once a particular type of protein molecule has been purified, it may be necessary to further characterize the purified protein molecule in order to identify it. This is particularly true when attempting to characterize previously unidentified types of protein molecules, such as aberrant or mutant forms of a protein molecule.
Typically, protein molecules are further characterized by employing techniques that determine the weight of the protein molecule with increased sensitivity over techniques like gel electrophoresis, or by determining the sequence of amino acid residues that make up the protein molecule. One technique that is useful for performing both of these tasks is mass spectrometry.
In order to characterize a type of protein molecule by mass spectrometry, a purified type of protein molecule or a particular segment of a purified type of protein molecule is given a positive or negative charge, or ionized, and made volatile in a mass spectrometer. The ionized, volatilized protein molecules or segments are then analyzed by the mass spectrometer. This produces a mass spectrum of the protein molecule or segment. The mass spectrum provides very precise information about the weight of the protein molecule or segment. Due to the precision with which a mass spectrometer determines the weight of protein molecules and segments of protein molecules, when a protein molecule or segment is analyzed, the information provided by mass spectrometry can be of use in inferring the sequence of amino acid residues in the protein molecule or segment. Mass spectrometers are also sensitive enough to provide information about modifications to particular amino acid residues of a protein molecule or segment. When a series of segments from a certain type of protein molecule is analyzed by mass spectrometry, the information about the sequences of and modifications to the amino acid residues of each segment can be combined to infer the sequence of and modifications to amino acid residues of an entire protein molecule.
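By way of illustration only, the relationship between the weight of a peptide segment and its amino acid residues can be sketched as follows. This is a minimal sketch and not part of the described invention: the names `RESIDUE_MASS`, `WATER_MASS`, `PHOSPHATE_SHIFT`, and `peptide_mass` are hypothetical, and the values are the commonly tabulated monoisotopic residue masses in daltons. The sketch simply sums the residue masses, adds one water for the free ends of the chain, and adds any mass shifts contributed by modifications.

```python
# Commonly tabulated monoisotopic masses (daltons) of the twenty standard
# amino acid residues, keyed by one-letter code.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056       # H and OH on the two free ends of the chain
PHOSPHATE_SHIFT = 79.96633  # example modification: one added phosphate group


def peptide_mass(sequence, modification_masses=()):
    """Return the monoisotopic mass of an uncharged linear peptide.

    `sequence` is a string of one-letter residue codes; an unknown letter
    raises KeyError. `modification_masses` is an iterable of mass shifts,
    one per modification carried by the peptide.
    """
    total = WATER_MASS
    for residue in sequence.upper():
        total += RESIDUE_MASS[residue]
    return total + sum(modification_masses)


if __name__ == "__main__":
    print(round(peptide_mass("GG"), 5))
    print(round(peptide_mass("GS", [PHOSPHATE_SHIFT]), 5))
```

In this sketch leucine (L) and isoleucine (I) have identical masses, which is one reason a measured weight supports an inference about sequence rather than a direct reading of it.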
Due to the sensitivity of mass spectrometry and the resulting ability to infer the sequences of the amino acid residues and modifications thereto of a particular type of protein molecule, the differences of aberrant or mutant forms of protein molecules from a normal protein molecule in amino acid residue sequences and amino acid residue modifications can also be inferred.
Nonetheless, mass spectrometry is a time-consuming process that requires expensive equipment and reagents.
It is thus a broad object of the present invention to increase the speed and efficiency with which protein molecules can be characterized.
It is also an object of the present invention to lend to a protein molecule a characteristic set of ancillary properties that are rapidly and reliably detectable.
It is a further object of the present invention to generate a listing of known protein molecules and their corresponding fingerprints as provided and determined by the method of the present invention.
Achieving the foregoing objects will fulfill further, broader objects of the present invention of improving biochemical research and healthcare.
To achieve the foregoing objects, and in accordance with the invention as embodied and broadly described herein, systems and methods for characterizing protein molecules are provided. Also provided are protein molecules having such tags attached thereto as impart to the protein molecules distinguishing characteristics that are useable as fingerprints.
In one form, a system incorporating teachings of the present invention, which is capable of characterizing a protein molecule, lends to a protein molecule a characteristic set of ancillary properties that is rapidly and reliably detectable. As these ancillary properties are as uniquely identifying of the type of the protein molecule as fingerprints are reflective of the identity of a human being, the characteristic set of ancillary properties of a protein molecule functions as a “fingerprint” of the protein molecule that may be used to rapidly and reliably identify the type of the protein molecule.
A system according to teachings of the present invention has denaturation means for linearizing the protein molecule, labeling means for attaching a tag to each of a first type of amino acid residue of the protein molecule, and detector means for detecting a fingerprint of the tagged protein molecule. The fingerprint of the protein molecule has a first fingerprint constituent imparted to the protein molecule by the tags on each first type of amino acid residue in the protein molecule and a second fingerprint constituent imparted to the protein molecule by each second type of amino acid residue in the protein molecule.
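For illustration only, the two fingerprint constituents can be modeled on a known, linearized sequence of one-letter residue codes. This is a minimal sketch under stated assumptions, not the detection method itself: the names `Fingerprint` and `fingerprint_from_sequence` are hypothetical, lysine (K) is an arbitrary stand-in for the tagged first type of amino acid residue, and tryptophan (W) is an arbitrary stand-in for the second type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fingerprint:
    first_count: int   # number of residues of the tagged (first) type
    second_count: int  # number of residues of the second type
    order: str         # the two types in the order met along the chain


def fingerprint_from_sequence(sequence, first_type="K", second_type="W"):
    """Model a two-constituent fingerprint for a linearized sequence.

    The default residue types are illustrative assumptions only.
    """
    if first_type == second_type:
        raise ValueError("the two residue types must differ")
    # Keep only the two residue types of interest, preserving chain order.
    order = "".join(
        residue for residue in sequence.upper()
        if residue in (first_type, second_type)
    )
    return Fingerprint(order.count(first_type), order.count(second_type), order)


if __name__ == "__main__":
    print(fingerprint_from_sequence("MKWVKW"))
```

Recording the order of the two residue types as well as their counts lets two sequences with the same counts be told apart whenever the residues are arranged differently along the chain.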
A system according to teachings of the present invention may also include isolation means for separating the protein molecule from other protein molecules in a sample, as well as collation means for comparing the fingerprint of a protein molecule of interest to the fingerprints of known protein molecules listed in a library.
An example of the denaturation means is a detergent, such as sodium dodecyl sulfate (hereinafter “SDS”), which gives the entire protein molecule a negative charge and therefore pulls the protein molecule out of its three-dimensional structure. Another example of the denaturation means is β-mercaptoethanol, a chemical that breaks chemical linkages between the sulfur atoms of two amino acid residues.
A protein molecule of interest is separated from the other types of protein molecules present in a sample by way of isolation means for separating the protein molecule. Examples of isolation means that are useful in the systems and methods of the present invention include, without limitation, hydrodynamic focusing apparatus, electrophoretic gels, separation plates with apertures therethrough, and dilution systems for the sample in which the protein molecule of interest is located.
In a first example of the labeling means, a fluorescent dye is attached chemically to the amino acid residues of a specific chosen type in a protein molecule, thereby forming a tag on each amino acid residue of the specific chosen type in the protein molecule. In a second example, the labeling means is a metallic tag precursor that chemically bonds with the amino acid residues of a specific chosen type in a protein molecule to form a tag on each amino acid residue of the specific chosen type in the protein molecule.
Of the twenty or so types of amino acid residues in protein molecules, one type of amino acid residue, known as tryptophan, self-fluoresces when exposed to electromagnetic excitation radiation of a certain range of wavelengths.
When a fluorescent dye is used as the labeling means, an example of the detector means includes electromagnetic excitation radiation of one or more excitation wavelengths or a range of excitation wavelengths that will stimulate the tryptophan amino acid residues of a protein molecule to emit radiation of a first emitted wavelength. The excitation radiation of the detector means will also cause the fluorescent dye to emit radiation of a second emitted wavelength. In this example, the detector means also includes a detector that is sensitive to the wavelengths of radiation emitted from the tryptophan amino acid residues of the protein molecule and from the fluorescent dye.
When the tags attached to each of the specific type of amino acid residue of the protein molecule are metallic, the detector means can include a nuclear magnetic resonance apparatus or other apparatus known in the art to be capable of detecting single metal atoms.
Alternatively, tags can be attached to more than one type of amino acid residue of the protein molecule. The tags on one type of amino acid residue are differentially detected from the tags on one or more other types of amino acid residues to determine different fingerprint constituents of the protein molecule.
According to another aspect of the invention, a listing or database is generated for use with specific protocols to identify protein molecules. This listing or database is referred to herein as a library, and includes the identities of a set of known protein molecules and information about the different fingerprint constituents of each of the known protein molecules of the listing. The different fingerprint constituents are imparted to the protein molecule by the labeling means of the system and detected by way of the detector means of the system. Collation means for comparing the fingerprint of a protein molecule of interest to the fingerprints of the known protein molecules listed in the library are then used to identify the protein molecule of interest. Typically, the function of such a collation means can be performed by a computer processor.
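By way of example only, a collation step performed by a computer processor might be sketched as a lookup of an observed fingerprint in a library. Everything here is a hypothetical illustration: the `LIBRARY` entries are toy values rather than data for real protein molecules, the function name `collate` is an assumption, and a fingerprint is represented as a tuple of the count of the first residue type, the count of the second residue type, and the order in which the two types occur (written with arbitrary letters K and W).

```python
# Toy library: identity of each known protein molecule mapped to a
# fingerprint of (first_count, second_count, order). Illustrative values only.
LIBRARY = {
    "toy_protein_1": (2, 2, "KWKW"),
    "toy_protein_2": (2, 2, "KKWW"),
    "toy_protein_3": (3, 1, "KKWK"),
}


def collate(observed, library, use_order=True):
    """Return the sorted names of library entries matching `observed`.

    With use_order=False only the two counts are compared, which shows how
    a coarser fingerprint can leave more than one candidate.
    """
    def key(fingerprint):
        return fingerprint if use_order else fingerprint[:2]

    return sorted(
        name for name, known in library.items()
        if key(known) == key(observed)
    )


if __name__ == "__main__":
    print(collate((2, 2, "KWKW"), LIBRARY))
    print(collate((2, 2, "KWKW"), LIBRARY, use_order=False))
```

An exact-match lookup is the simplest possible choice; a practical collation means would presumably need to tolerate measurement error, which this sketch does not attempt.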
In yet another aspect, the present invention includes protein molecules that have been labeled with tags to impart fingerprint constituents to the protein molecule. Each fingerprint constituent indicates the number of a particular type of amino acid residue in a protein molecule and the relative locations of different types of amino acid residues in the protein molecule.
The prospect of being able to rapidly and reliably identify a type of protein molecule has utility in a wide range of research and clinical applications, such as, for example, in determining whether or not selected cells of a patient have entered early stages of cancer.
Additional objects and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims.