Cryptosporidium parvum (C. parvum), an intestinal Apicomplexan parasite, is a significant cause of diarrheal disease worldwide (Griffiths, 1998. Adv Parasitol. 40:37-85). In immunocompetent individuals, the disease is usually self-limiting, but it can be chronic and life-threatening in immunocompromised patients such as those with AIDS.
C. parvum has also been associated with diarrheal disease in children in daycare centers, travelers, animal handlers and hospital personnel. Recently, the parasite has gained notoriety as the causative agent of numerous outbreaks of waterborne diarrheal disease. There is currently no effective, specific therapy approved for disease caused by this parasite.
C. parvum infection is initiated by ingestion of oocysts which, upon exposure to favorable conditions within the host, undergo excystation. Released sporozoites attach to and invade host cells, forming a parasitophorous vacuole where the parasite undergoes further intracellular development through asexual and sexual cycles, eventually leading to formation of new oocysts that are capable of reinitiating the infectious cycle. The ultrastructural aspects of the processes of attachment and invasion and various factors influencing attachment have been characterized (Tzipori et al., 1998. Adv Parasitol. 40:5-36; Hamer et al., 1994. Infect Immun. 62:2208-2213). However, the molecular basis of these host-parasite interactions is not well understood (Ward et al., 1998. Adv Parasitol. 40:151-185).
The present invention is based, in part, on the discovery of a gene, gp40gp15 (SEQ ID NO:1). The gp40gp15 cDNA described below is a 981 nucleotide sequence which encodes a 49 kDa precursor protein (SEQ ID NO:6) of C. parvum. The precursor protein is proteolytically cleaved to yield two glycoproteins, gp40 (SEQ ID NO:2) and gp15 (SEQ ID NO:8). gp40 protein is a 40 kDa glycoprotein which is present in oocysts and sporozoites and is also shed from the parasite during invasion. gp40 protein mediates sporozoite attachment and invasion of host cells and is therefore useful as a target for prevention or therapy of cryptosporidiosis.
Accordingly, in one aspect, the invention features an isolated nucleic acid molecule comprising a nucleotide sequence encoding a gp40 protein or a biologically active portion thereof, as well as nucleic acid fragments suitable as primers or hybridization probes for the detection of a gp40-encoding nucleic acid (e.g., gp40 mRNA). The gp40 nucleotide sequence (nucleotides 1-666 of SEQ ID NO:1; SEQ ID NO:3) encodes a 222 amino acid protein (SEQ ID NO:2). gp40 protein includes a signal sequence of around 30 amino acids (from amino acid 1 to amino acid 30 of SEQ ID NO:2; SEQ ID NO:4) and has a mature protein length of 192 amino acids (amino acids 31-222 of SEQ ID NO:2; SEQ ID NO:5). gp40 protein possesses a polyserine domain (at amino acids 37 to 55 of SEQ ID NO:2) with multiple predicted O-glycosylation sites. The protein also has a hydrophobic stretch of amino acids in its C-terminal region consistent with that required for GPI-anchoring.
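The coordinates quoted above are mutually consistent, which a few lines of Python can confirm. The sketch below only restates numbers given in this paragraph; the constant and function names are illustrative and are not part of the disclosure.

```python
# Coordinates as stated in the text (1-based, inclusive intervals).
GP40_CDS = (1, 666)      # nucleotides of SEQ ID NO:1 (i.e., SEQ ID NO:3)
SIGNAL = (1, 30)         # signal sequence within SEQ ID NO:2 (SEQ ID NO:4)
MATURE = (31, 222)       # mature protein within SEQ ID NO:2 (SEQ ID NO:5)
POLYSERINE = (37, 55)    # polyserine domain within SEQ ID NO:2


def span_length(interval: tuple[int, int]) -> int:
    """Length of a 1-based inclusive interval (start, end)."""
    start, end = interval
    return end - start + 1


cds_nt = span_length(GP40_CDS)          # coding nucleotides
protein_aa = cds_nt // 3                # three nucleotides per codon
signal_aa = span_length(SIGNAL)
mature_aa = span_length(MATURE)
polyserine_aa = span_length(POLYSERINE)

print(cds_nt, protein_aa, signal_aa, mature_aa, polyserine_aa)
```

The signal and mature segments together account for the full 222-residue protein, and the polyserine domain falls entirely within the mature region.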
In one embodiment, an isolated gp40 nucleic acid molecule includes the nucleotide sequence of SEQ ID NO:3, or a complement of this nucleotide sequence. In another embodiment, the isolated nucleic acid molecule of the invention includes a nucleotide sequence which hybridizes, preferably under stringent conditions, to, or has at least about 60-65%, preferably at least about 70-75%, more preferably at least about 80-85%, and even more preferably at least about 90-95%, 96%, 97%, 98% or 99% sequence identity to, the nucleotide sequence shown in SEQ ID NO:3, or a portion thereof. In yet another embodiment, the isolated nucleic acid molecule encodes the amino acid sequence of SEQ ID NO:2 or SEQ ID NO:5. gp40 nucleic acid molecules can encode a protein which possesses at least one of the gp40 activities described herein, e.g., the ability to bind human intestinal epithelial cells.
In another embodiment, the isolated nucleic acid molecule encodes a protein or portion thereof wherein the protein or portion thereof includes an amino acid sequence which is sufficiently homologous to the amino acid sequence of SEQ ID NO:2 or SEQ ID NO:5 such that the protein or portion thereof possesses a gp40 biological activity, e.g., the ability to bind intestinal epithelial cells. The protein or portion thereof encoded by the nucleic acid molecule maintains the ability to play a role in mediating the attachment and invasion of host cells by C. parvum. In yet another embodiment, the protein encoded by the nucleic acid molecule has at least about 60-70%, preferably at least about 80-85%, and more preferably at least about 86%, 88%, 90%, and most preferably at least about 90-95%, 96%, 97%, 98% or 99% sequence identity to the amino acid sequence of SEQ ID NO:2 or SEQ ID NO:5. In one embodiment, the protein is a full length protein which is substantially homologous to the entire amino acid sequence of SEQ ID NO:2.
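To make the percent-identity thresholds concrete, the following minimal sketch computes identity between two sequences that have already been aligned. It assumes one common convention (identical, non-gap columns divided by total alignment columns); this passage does not specify the alignment method or the denominator, so both are assumptions, and the toy sequences are hypothetical rather than taken from any SEQ ID NO.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Gaps are written as '-'. A column counts as a match only when both
    sequences carry the same non-gap character. The denominator is the
    number of alignment columns (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical toy alignment: 6 matching columns out of 7.
print(percent_identity("MKSS-TA", "MKSSATA"))
print(meets_threshold("MKSS-TA", "MKSSATA", 80.0))
```

Other conventions (for example, dividing by the length of the shorter sequence) give different values for the same pair, so a stated threshold is only meaningful alongside the method used to compute it.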
In another embodiment, the isolated nucleic acid molecule encodes a portion of a gp40 protein, e.g., a portion which includes a sequence encoding a polyserine domain with multiple O-glycosylation sites.
In another embodiment, the isolated nucleic acid molecule encodes a gp40 protein, or portion thereof, which has at least about 55%, 65%, 75%, 85% or 95% identity to SEQ ID NO:2 or SEQ ID NO:5, and has one or more of the following activities: 1) it is involved in parasite-host interactions; 2) it interacts, directly or indirectly, with a host cell, e.g., a human intestinal epithelial cell; or 3) it modulates the ability of C. parvum sporozoites to attach and invade a host cell.
In another embodiment, the isolated nucleic acid molecule is at least 15 (30, 50, 100, 200, 300, 400, 500, 600, 700, 800, or 900) nucleotides in length and hybridizes under stringent conditions to a nucleic acid molecule consisting of the nucleotide sequence of SEQ ID NO:3.
Given the disclosure herein of gp40-encoding sequence (e.g., SEQ ID NO:3), antisense nucleic acid molecules (i.e., molecules which are complementary to the gp40 nucleotide sequence) are also provided by the invention.
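Because an antisense molecule is defined here by complementarity to the coding sequence, it can be derived mechanically from that sequence. The sketch below computes the reverse complement of a DNA string; the input shown is a hypothetical six-nucleotide example, not a fragment of SEQ ID NO:3, and the function name is illustrative.

```python
# Watson-Crick pairing for unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence (5' to 3')."""
    dna = dna.upper()
    unexpected = set(dna) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    # Complement each base, then reverse so the result reads 5' to 3'.
    return dna.translate(_COMPLEMENT)[::-1]


# Hypothetical sense-strand fragment and its antisense counterpart.
sense = "ATGGCC"
antisense = reverse_complement(sense)
print(antisense)  # GGCCAT
```

Applying the function twice returns the original sequence, which is a quick sanity check on any antisense design.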
In another embodiment, the encoded gp40 protein differs in amino acid sequence by at least 1 and by as many as 2, 3, 5, 10, 20 or 40 residues from a sequence in SEQ ID NO:2 or SEQ ID NO:5. In one embodiment, the differences are such that the encoded gp40 protein exhibits a gp40 biological activity, e.g., the encoded gp40 protein retains a biological activity of a naturally-occurring gp40, e.g., the gp40 protein of SEQ ID NO:2 or SEQ ID NO:5.
In another embodiment the encoded gp40 protein includes a gp40 sequence described herein as well as other N-terminal and/or C-terminal amino acid sequence.
The invention also features vectors, e.g., recombinant expression vectors, containing the nucleic acid molecules of the invention and host cells into which such vectors have been introduced. In one embodiment, such a host cell is used to produce gp40 protein by culturing the host cell in a suitable medium. The gp40 protein can then be isolated from the medium or the host cell.
In yet another embodiment, the biologically active portion of the gp40 protein includes a domain or motif, preferably a domain or motif which has a gp40 activity. The motif can be, e.g., a short hydrophobic region at the C-terminus, consistent with that required for addition of a GPI anchor; a polyserine domain which has multiple predicted O-glycosylation sites which may be used by the parasite to bind a host cell; a carbohydrate domain; or an N-glycosylation site.
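Two of the motifs named above can be located in a protein sequence by simple pattern matching. The sketch below looks for N-glycosylation sequons using the widely used consensus N-X-S/T (where X is any residue except proline) and for runs of consecutive serines. The minimum run length, the function names, and the toy sequence are assumptions made for illustration; the sequence is not SEQ ID NO:2, and the text does not state how its own predictions were made.

```python
import re


def find_n_glyc_sequons(protein: str) -> list[int]:
    """1-based positions of the Asn in each N-X-[S/T] sequon (X != Pro).

    A lookahead is used so that overlapping sequons are all reported.
    """
    return [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST])", protein)]


def find_polyserine_runs(protein: str, min_len: int = 5) -> list[tuple[int, int]]:
    """1-based inclusive (start, end) of each run of >= min_len serines."""
    pattern = re.compile(r"S{%d,}" % min_len)
    return [(m.start() + 1, m.end()) for m in pattern.finditer(protein)]


# Hypothetical toy sequence: one sequon (N-A-S), an 8-serine run,
# and an N-P-T triplet that is excluded because X is proline.
toy = "MNA" + "S" * 8 + "TNPTQ"

print(find_n_glyc_sequons(toy))    # [2]
print(find_polyserine_runs(toy))   # [(4, 11)]
```

A hit from either function is only a candidate site; whether a residue is actually glycosylated depends on the expression system and cellular context.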
The invention also provides an isolated preparation of a gp40 protein. In one embodiment, the gp40 protein includes the amino acid sequence of SEQ ID NO:2 or SEQ ID NO:5. In another embodiment, the invention pertains to an isolated full length protein which is substantially homologous to the entire amino acid sequence of SEQ ID NO:2 (encoded by SEQ ID NO:3) or the mature amino acid sequence of SEQ ID NO:5. In yet another embodiment, the protein has at least about 60-70%, preferably at least about 80-85%, and more preferably at least about 86%, 88%, 90%, and most preferably at least about 90-95%, 96%, 97%, 98% or 99% sequence identity to the entire amino acid sequence of SEQ ID NO:2 or SEQ ID NO:5. In other embodiments, the isolated gp40 protein includes an amino acid sequence which has at least about 60-70% or more sequence identity to the amino acid sequence of SEQ ID NO:2 or SEQ ID NO:5 and has one or more of the following activities: 1) it is involved in parasite-host interactions; 2) it interacts, directly or indirectly, with a host cell, e.g., a human intestinal epithelial cell; or 3) it modulates the ability of C. parvum sporozoites to attach and invade a host cell.
In yet another embodiment, the gp40 protein differs in amino acid sequence at up to 1, 2, 3, 5, or 10% of the residues from a sequence in SEQ ID NO:2 or SEQ ID NO:5. The differences are such that the gp40 protein exhibits a gp40 biological activity, e.g., the gp40 protein retains a biological activity of a naturally occurring gp40.
In another aspect of the invention, the gp40 protein is a recombinant gp40 protein which differs from gp40 isolated from oocysts of C. parvum in its pattern of glycosylation or other posttranslational modifications.
The gp40 protein, portions or fragments thereof, can be used to prepare anti-gp40 antibodies. Accordingly, the invention also provides an antigenic peptide of gp40 which includes at least 8, 10, 20, 30, 50, 70 or 80 amino acid residues of the amino acid sequence shown in SEQ ID NO:2 or SEQ ID NO:5, and encompasses an epitope of gp40 such that an antibody raised against the peptide forms a specific immune complex with gp40. The invention further provides an antibody, e.g., a monoclonal antibody such as 4E9, or a monoclonal antibody that specifically binds gp40. In another embodiment, the antibody is coupled to a detectable substance. In yet another embodiment, the antibody is incorporated into a pharmaceutical composition comprising the antibody and a pharmaceutically acceptable carrier.
In another aspect, the invention features a method of inhibiting attachment and/or infection of a host by C. parvum by administering to an animal a therapeutically effective amount of a compound which inhibits gp40 expression or activity. The animal can be any mammal, including a human, a monkey, a horse, a pig, a cow or a sheep. The compound can be any molecule that binds to gp40, or to a gp40 target binding molecule, and inhibits the ability of C. parvum to attach and/or infect a host cell, e.g., an epithelial cell. The compound can be a polypeptide selected for binding in, e.g., a phage display or two-hybrid assay; an antibody that is specifically reactive with gp40 or gp40 binding protein; a gp40 antisense molecule; fusions of gp40; a small molecule, e.g., a small molecule which binds to the control region of gp40; or an agent. A compound which modulates gp40 activity can be a compound which decreases gp40 protein activity or gp40 nucleic acid expression. In another embodiment, the method includes administering a nucleic acid which encodes one of the above-described compounds.
In another aspect, the invention features a method of modulating a gp40 activity, in vitro or in vivo. The method includes contacting gp40 with a compound that modulates the activity of gp40. gp40 activity may be modulated by administering: a gp40 antisense molecule; an antibody; a gp40 target binding protein (i.e., a protein that binds gp40), or a gp40 target binding portion thereof, e.g., a polypeptide selected for binding in, e.g., a phage display or two-hybrid assay; or a small molecule, e.g., a small molecule which binds to the control region of gp40. In another embodiment, the method includes administering a nucleic acid which encodes one of the above-described agents. A biological activity of gp40 that can be modulated by the present method includes: 1) modulating an interaction, directly or indirectly, with a gp40 target binding protein, e.g., a gp40 target binding protein on a human intestinal epithelial cell; or 2) inhibiting the attachment of sporozoites of C. parvum to intestinal epithelial cells. In another embodiment, the gp40 is that of a C. parvum sporozoite present within a subject and the agent is administered to the subject.
The invention also features methods for evaluating a subject suspected of having a C. parvum infection. The method includes evaluating, e.g., detecting, the presence of a gp40 gene or gp40 protein in a sample, thereby determining if a subject is infected with C. parvum. In one embodiment, the method includes evaluating, e.g., in a sample of cells from the subject, the presence or absence of gp40, e.g., by contacting the sample with a nucleic acid probe capable of hybridizing to gp40 mRNA, e.g., a labeled probe, or contacting a sample with an antibody capable of binding to gp40 protein, e.g., a labeled antibody; or by detecting the presence of C. parvum by using an ELISA that contains an antibody which specifically binds to gp40 and evaluates the level of C. parvum gp40 in the sera. A patient may be evaluated for the presence of antibodies directed against gp40 by obtaining a biological sample from the patient, e.g., a serum sample, contacting the sample with gp40 protein (or a fragment thereof) and determining if there are antibodies which bind the gp40 protein present in the biological sample.
The invention also features methods for identifying a compound or agent which interacts with a gp40 protein. In one embodiment, the interaction with a gp40 protein can be binding, phosphorylation, or otherwise interacting to form or break a bond, e.g., a covalent or non-covalent bond. A compound can include, for example, a fragment or analog of a gp40 binding polypeptide, e.g., a randomly generated polypeptide which interacts with gp40, or a small molecule. In another embodiment, the method can include the steps of contacting the gp40 protein with the compound or agent under conditions which allow binding of the compound to the gp40 protein to form a complex, and detecting the formation of a complex of the gp40 protein and the compound, in which the ability of the compound to bind to the gp40 protein is indicated by the presence of the compound in the complex. Methods for identifying a compound or agent can be performed, for example, using a cell-free assay or a cell-based assay.
In another aspect, the invention features methods for identifying compounds which modulate gp40 nucleic acid expression. In one embodiment, nucleic acid expression can be evaluated using a nucleic acid probe, e.g., a labeled probe, capable of hybridizing to a gp40 nucleic acid molecule, e.g., gp40 mRNA. gp40 expression can be evaluated, for example, by detecting the production of gp40 protein, e.g., using an antibody, e.g., a labeled antibody, or by determining a cell activity, e.g., using a marker gene, e.g., a lacZ gene, fused to the control region of gp40 and following production of the marker.
A “purified” or “substantially pure” or “isolated” polypeptide, as used herein, means a polypeptide that has been separated from other proteins, lipids, and nucleic acids with which it naturally occurs. Preferably, the polypeptide is also separated from substances, e.g., antibodies or gel matrix, e.g., polyacrylamide, which are used to purify it. Preferably, the polypeptide constitutes at least 10, 20, 50, 70, 80 or 95% dry weight of the purified preparation. Preferably, the preparation contains: sufficient polypeptide to allow protein sequencing; at least 1, 10, or 100 μg of the polypeptide; or at least 1, 10, or 100 mg of the polypeptide.
An “isolated” or “pure nucleic acid”, e.g., a substantially pure DNA, is a nucleic acid which is one or both of: not immediately contiguous with either one or both of the sequences, e.g., coding sequences, with which it is immediately contiguous (i.e., one at the 5′ end and one at the 3′ end) in the naturally-occurring genome of the organism from which the nucleic acid is derived; or which is substantially free of a nucleic acid sequence with which it occurs in the organism from which the nucleic acid is derived. The term includes, for example, a recombinant DNA which is incorporated into a vector, e.g., into an autonomously replicating plasmid or virus, or into the genomic DNA of a prokaryote or eukaryote, or which exists as a separate molecule (e.g., a cDNA or a genomic DNA fragment produced by PCR or restriction endonuclease treatment) independent of other DNA sequences. Substantially pure DNA can also include a recombinant DNA which is part of a hybrid gene encoding sequence. Moreover, an “isolated” nucleic acid molecule, such as a cDNA molecule, can be substantially free of other cellular material, or culture medium when produced by recombinant techniques, or substantially free of chemical precursors or other chemicals when chemically synthesized.
The terms “peptides”, “proteins”, and “polypeptides” are used interchangeably herein.
A “biological activity of gp40” refers to one or more of the following activities: 1) it is involved in parasite-host interactions; 2) it interacts, directly or indirectly, with a host cell, e.g., a human intestinal epithelial cell; or 3) it modulates the ability of C. parvum sporozoites to attach and invade a host cell.
The term “small molecule”, as used herein, includes peptides, peptidomimetics, or non-peptidic compounds, such as organic molecules, having a molecular weight less than 2000, preferably less than 1000.