The term “molecule” is, of course, known in the art. However, it is important to note that the word “molecule” encompasses two concepts. First, a molecule can be the mental picture of a specific molecule as a structure that may be represented by a chemical formula or sequence. Second, a molecule can be the physical thing that is or comprises a specific molecule, i.e., a population of (about 100 or more) molecules. For example, 18 g of water comprises about 6.023×10²³ (Avogadro's number) molecules of H2O.
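The water example is a mole calculation: the number of molecules equals the mass divided by the molar mass (giving moles), multiplied by Avogadro's number. A minimal sketch, assuming the rounded molar mass of 18 g/mol that the example implicitly uses; the function name is illustrative only.

```python
# Avogadro's number as quoted in the text; the exact SI value (since 2019) is 6.02214076e23 per mol.
AVOGADRO_AS_QUOTED = 6.023e23

def molecule_count(mass_g: float, molar_mass_g_per_mol: float,
                   avogadro: float = AVOGADRO_AS_QUOTED) -> float:
    """Number of molecules: (mass / molar mass) moles multiplied by Avogadro's number."""
    if molar_mass_g_per_mol <= 0:
        raise ValueError("molar mass must be positive")
    moles = mass_g / molar_mass_g_per_mol
    return moles * avogadro

# 18 g of water, rounding the molar mass of H2O to 18 g/mol as the text does
print(f"{molecule_count(18.0, 18.0):.3e}")  # 6.023e+23
# The more precise molar mass of H2O (about 18.015 g/mol) gives slightly fewer molecules
print(f"{molecule_count(18.0, 18.015):.3e}")
```

Even a microgram-scale sample therefore contains vastly more than the roughly 100 molecules that the definition above treats as a population.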
Although a molecule in the former sense is devoid of other components, a composition comprising an actual population of a given molecule may also comprise other molecules (e.g., contaminants). Populations of molecules are said to be “pure” or to have been “purified” if one or more types of contaminants have been removed from, or their amounts depleted in, compositions comprising populations of specific molecules.
It should be noted that even a pure population of molecules can be heterogeneous with regard to one or more characteristics. For example, as is described in more detail below, a specific protein may be glycosylated, i.e., have carbohydrates chemically attached to one or more of its amino acid residues. However, in a population of glycosylated proteins, some proteins may be completely glycosylated (all possible glycosylation sites have been glycosylated), whereas others may be only partially glycosylated (i.e., only some of the possible glycosylation sites have been glycosylated and/or not all of the proteins are glycosylated to the same extent). In addition, the glycan structures may not be the same at all of the glycosylation sites on the protein.
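The size of this heterogeneity can be made concrete by enumeration: if each of n sites is either empty or carries one of k glycan structures, there are (k + 1)^n possible glycoforms. A minimal sketch, assuming sites are occupied independently of one another (real glycosylation is often site-specific); the glycan labels and function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical glycan labels for illustration; real glycoproteins can carry many more structures.
GLYCANS = ("glycan_A", "glycan_B")

def glycoforms(n_sites: int, glycans=GLYCANS) -> list[tuple]:
    """Every way of leaving each site empty (None) or attaching one of the given glycans."""
    return list(product((None, *glycans), repeat=n_sites))

def occupancy_class(form: tuple) -> str:
    """Label a glycoform as completely, partially, or not glycosylated."""
    occupied = sum(glycan is not None for glycan in form)
    if occupied == len(form):
        return "complete"
    if occupied == 0:
        return "unglycosylated"
    return "partial"

forms = glycoforms(3)
print(len(forms), Counter(occupancy_class(f) for f in forms))
```

With 3 sites and 2 glycan types this yields (2 + 1)^3 = 27 variants: 8 completely glycosylated, 18 partially glycosylated and 1 unglycosylated. Note that even the 8 "complete" forms differ from one another in which glycan sits at which site.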
The invention provides homogeneous, or nearly homogeneous, populations of a molecule that have advantages over heterogeneous populations of the same molecule. Such molecules are useful as, for example, molecular standards.
Compositions and methods for elucidating the primary structure (i.e., sequence) of an uncharacterized protein or nucleic acid are useful in many applications in fields such as molecular biology, biotechnology, informatics, genomics and proteomics. Molecular standards are used to identify and/or characterize a previously unknown and/or uncharacterized molecule.
A “molecular standard” is a molecule that is used to determine a characteristic of an unknown and/or test molecule in an assay, such as an analytical method. For example, a molecular standard can be a protein that is used as a molecular weight marker in protein gel electrophoresis, as illustrated in the Examples provided herein.

A “set of standards” comprises (i) two or more known molecules differing in at least one detectable characteristic and/or (ii) two or more containers having different concentrations of a known molecule. As a simple example of the latter type of molecular standard, a set of solutions is prepared as serial dilutions of a solution having a known concentration of a known molecule. Although it is customary to include more than one molecule in a molecular weight standard so that the molecular weight of an unknown molecule can be estimated by interpolation, a single reference standard may be useful for direct comparison with an unknown sample. A difference in mobility or another property between the unknown and the standard indicates that the two are not identical in the molecular weight or other property being assessed.
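The serial-dilution set described above follows a simple rule: tube i holds the stock concentration divided by the dilution factor raised to the power i. A minimal sketch; the function name, the 2-fold factor and the units are hypothetical choices for illustration.

```python
def serial_dilution(stock_conc: float, dilution_factor: float, n_tubes: int) -> list[float]:
    """Concentrations in a serial dilution series: tube i holds stock_conc / factor**i."""
    if dilution_factor <= 1:
        raise ValueError("dilution factor must be greater than 1")
    if n_tubes < 1:
        raise ValueError("need at least one tube")
    return [stock_conc / dilution_factor ** i for i in range(n_tubes)]

# Hypothetical 100 ug/mL stock diluted 2-fold into five tubes
print(serial_dilution(100.0, 2, 5))  # [100.0, 50.0, 25.0, 12.5, 6.25]
```

A geometric series like this spans a wide concentration range with few tubes, which suits assays whose response is read on a logarithmic scale.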
The set of standards can be used with an assay whose signal changes in a concentration-dependent way, either to estimate the concentration of the molecule in a test sample or to calibrate the settings of a device or machine that measures a characteristic of the molecule. The signal from the unknown molecule is compared to the signals from the known molecules using a variety of techniques known to those skilled in the art. By way of non-limiting example, proteins and nucleic acids are analyzed using techniques such as electrophoresis, sedimentation, chromatography, and mass spectrometry.
For example, one basic characteristic of a molecule is its molecular weight (MW). Comparison of an uncharacterized molecule to a set of standards of known and different molecular weights (often called a molecular weight “ladder”) allows a determination of the apparent molecular weight of the uncharacterized molecule.
Proteins, nucleic acids and other molecules are commonly electrophoresed or subjected to high pressure liquid chromatography (HPLC) in order to determine basic characteristics thereof. The MW of an uncharacterized protein can be estimated using HPLC; electrophoresis, such as polyacrylamide gel electrophoresis (PAGE), including SDS-PAGE; and other techniques known in the art.
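When HPLC is used to estimate MW, it is typically run in size-exclusion (gel-filtration) mode; that mode is an assumption here, since the text does not name one. A common calibration converts each marker's elution volume V_e into a partition coefficient using the column's void volume V_0 and total volume V_t, then fits log10 of the molecular weight against it:

```latex
K_{av} = \frac{V_e - V_0}{V_t - V_0}, \qquad
\log_{10} M_r \approx a + b\,K_{av} \quad (b < 0)
```

Here a and b are fitted to the markers (for example, an HPLC marker set such as the one listed in this document), and the unknown's M_r is obtained by interpolation. The relation holds only approximately, within the column's fractionation range and for globular proteins whose elution depends mainly on size.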
A variety of protein and nucleic acid molecular weight standards are commercially available. However, these standards may not correspond closely enough in size to the unknown sample protein to allow an accurate estimation of its apparent molecular weight. Moreover, some of the standards give poorly resolved (e.g., “fuzzy”) bands. Some are not useful for blotting techniques (Southern, Northern, Western, etc.) because they do not transfer well to nitrocellulose or PVDF membranes. Others comprise co-migrating contaminants. All of these effects reduce the precision and accuracy of the analytical method.
Protein standards of high MW (e.g., greater than about 180 to about 500 kDa) are problematic. In addition to the potential problems that apply to MW standards in general, high-MW proteins are hard to prepare, whether by recombinant DNA technology or otherwise. Many high-MW standards are formed by cross-linking a single species of protein to obtain a series of multimers having molecular weights that are two-fold, three-fold, etc., the MW of the protein monomer (for example, Sigma sells cross-linked hemoglobin having an apparent molecular weight of 280 kDa). Cross-links are introduced by chemical reactions, and it is often difficult to establish reaction conditions under which multimerization proceeds to the desired degree. Moreover, cross-linking is ordinarily performed using reagents that react with functional groups on a molecule. A protein generally has more than one such reactive functional group, so when protein molecules are cross-linked, a variety of products results. For example, if a protein has as few as 4 reactive sites, 16 different cross-linked entities can be formed, leading to inhomogeneity in the marker.
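One way to arrive at the figure of 16 is to count dimers joined by a single cross-link between one of the 4 reactive sites on one molecule and one of the 4 on another (4 × 4 pairings); this reading is an assumption, since the text does not say how the entities are counted. A minimal sketch with a hypothetical function name:

```python
from itertools import combinations_with_replacement, product

def single_crosslink_dimers(n_sites: int, identical_partners: bool = False) -> list[tuple]:
    """Dimers joined by one cross-link from a site on molecule A to a site on molecule B.

    Sites are numbered 1..n_sites. If identical_partners is True, A and B are the same
    protein, so pairing (i, j) is indistinguishable from (j, i) and only one is kept.
    """
    sites = range(1, n_sites + 1)
    if identical_partners:
        return list(combinations_with_replacement(sites, 2))
    return list(product(sites, repeat=2))

print(len(single_crosslink_dimers(4)))                           # 16
print(len(single_crosslink_dimers(4, identical_partners=True)))  # 10
```

When both partners are the same protein, mirror-image pairings coincide and the count falls to n(n + 1)/2 = 10. Either way, trimers and higher multimers, intramolecular links, and multiply cross-linked products add further species, so the real mixture is larger still.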
Moreover, the degree of homogeneity in a molecular population of a protein is also affected by, among other things, the different types and extents of post-translational and other chemical modifications thereof. The modifications range from amino acid changes through to the addition of macromolecules: lipid, carbohydrate or protein. Chemical modifications such as phosphorylation, alkylation, and deamidation can also occur. Many modified forms of the common amino acids can occur, and these can affect the structure or function of the protein. A major class of modification is glycosylation, which may be N-linked, O-linked, or glycosylphosphatidylinositol (GPI)-linked. Such modifications have roles in protein stability and folding, targeting and recognition. Lipid modification of proteins (e.g., prenylation, myristoylation, GPI-anchoring, etc.) is also common. See Nalivaeva et al., Post-translational Modifications of Proteins: Acetylcholinesterase as a Model System, Proteomics 1:735-747 (2001).
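Each modification shifts the mass of the affected molecules, so partial modification splits one protein into a family of species. A minimal sketch using standard monoisotopic mass changes for phosphorylation, deamidation and methylation (a simple alkylation); the 20 kDa base mass, the site counts and the function name are hypothetical.

```python
from itertools import product

# Monoisotopic mass changes in daltons
MASS_SHIFT_DA = {
    "phosphorylation": 79.966331,  # + HPO3
    "deamidation": 0.984016,       # Asn/Gln -> Asp/Glu
    "methylation": 14.015650,      # + CH2, a simple alkylation
}

def modified_variants(base_mass_da: float, max_counts: dict[str, int]) -> dict[tuple, float]:
    """Mass of every combination of 0..max_counts[mod] copies of each modification."""
    names = list(max_counts)
    variants = {}
    for counts in product(*(range(max_counts[n] + 1) for n in names)):
        mass = base_mass_da + sum(c * MASS_SHIFT_DA[n] for n, c in zip(names, counts))
        variants[tuple(zip(names, counts))] = mass
    return variants

# Hypothetical 20 kDa protein with two phosphorylation sites and one deamidation-prone residue
for mods, mass in modified_variants(20000.0, {"phosphorylation": 2, "deamidation": 1}).items():
    print(dict(mods), f"{mass:.3f} Da")
```

Even this small case gives 3 × 2 = 6 species. A shift of about 1 Da from deamidation is far too small to see on a gel, although it can be resolved by mass spectrometry.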
For proteins, a non-exhaustive list of exemplary protein molecular weight standards (protein molecular weight “ladders”) includes the following:
The pre-stained Broad Range protein molecular weight standard (Bio-Rad Laboratories, Hercules, Calif., Cat. No. 16001-018), which is composed of eight proteins:
Protein                        Molecular Weight (kDa)
Myosin (H-chain)               209
beta-Galactosidase             124
Bovine Serum Albumin           80
Ovalbumin                      49
Carbonic Anhydrase             34
Soybean Trypsin Inhibitor      29
beta-Lactoglobulin             21
Aprotinin                      7.1
Protein Molecular Weight Markers, HPLC:
Protein                        Molecular Weight (kDa)
Glutamate dehydrogenase        290.00
Lactate dehydrogenase          142.00
Enolase                        67.00
Myokinase                      32.00
Cytochrome c                   12.40
High Molecular Weight Protein Standards (Bio-Rad):
Protein                        Molecular Weight (kDa)
Myosin                         200.00
beta-Galactosidase             116.25
Phosphorylase B                97.40
Serum Albumin                  66.20
Ovalbumin                      45.00
Molecular weight markers, ¹⁴C-methylated, for Molecular Weights 14,300-220,000 (Sigma M8932), which is a mixture of six ¹⁴C-methylated proteins:
Protein                        Molecular Weight (kDa)
Myosin                         220.00
Phosphorylase b                97.40
Albumin, Bovine Serum          66.00
Ovalbumin                      46.00
Carbonic Anhydrase             30.00
Lysozyme                       14.30
Molecular weight markers, ¹⁴C-methylated, for Molecular Weights 2,350-30,000 (Sigma M8807), which is a mixture of five ¹⁴C-methylated proteins:
Protein                        Molecular Weight (kDa)
Carbonic Anhydrase             30.00
Soybean Trypsin Inhibitor      21.50
Cytochrome c                   12.50
Aprotinin                      6.50
Insulin (Bovine)               5.74*
* After sample preparation, the bovine insulin will probably migrate as insulin A chain (2.35 kDa) and insulin B chain (3.40 kDa).
PEPPERMINTSTICK™ phosphoprotein molecular weight standards (Molecular Probes, P-33350):
Protein                        Molecular Weight (kDa)
beta-Galactosidase             116.25
Albumin, Bovine Serum          66.20
Ovalbumin                      45.00
Beta-Casein                    23.60
Avidin                         18.00
Lysozyme                       14.40
A need exists for homogeneous populations of molecules having a known value for a molecular characteristic. Such populations can be compared to an uncharacterized molecule in order to estimate or determine the presence or absence of, or value for, a molecular characteristic.
A need exists for sets of molecules (molecular standards) having a known value for a molecular characteristic, such as molecular weight, wherein the value for the molecular characteristic is precise. A particular need exists for protein standards (protein “ladders”) that comprise proteins having a high molecular weight.
Patents and Published Patent Applications of Interest:
U.S. Pat. No. 5,449,758 (Protein Size Marker Ladder).
U.S. Pat. No. 5,580,788 (Use of Immunoglobulin-Binding Artificial Proteins as Molecular Weight Markers).
U.S. Pat. No. 5,714,326 (Method for the Multiplexed Preparation of Nucleic Acid Molecular Weight Markers and Resultant Products).
U.S. Pat. No. 5,578,180 (System for pH-neutral Longlife Precast Electrophoresis Gel).
U.S. Pat. No. 6,514,938 and published U.S. Patent Application US/2002/0115103 (Copolymer 1 Related Polypeptides for use as Molecular Weight Markers and for Therapeutic Use).
Published PCT patent application WO 02/13848 and published U.S. Patent Application US/2002/0155455 (Highly Homogeneous Molecular Markers for Electrophoresis).