The invention relates to isolated nucleic acids and polypeptides derived from Staphylococcus epidermidis that are useful as molecular targets for diagnostics, prophylaxis and treatment of pathological conditions, as well as materials and methods for the diagnosis, prevention, and amelioration of pathological conditions resulting from bacterial infection.
Incorporated herein by reference in its entirety is a Sequence Listing, comprising SEQ ID NO: 1 to SEQ ID NO: 5674. The Sequence Listing is contained on a CD-ROM, three copies of which are filed, the Sequence Listing being in a computer-readable ASCII file named “Gtc007.pto”, created on May 15, 2001 and of 9,485,000 bytes in size, in IBM-PC Windows NT v4.0 format.
Staphylococcus epidermidis (S. epidermidis) is a species of staphylococcal bacteria that are Gram-positive, nonmotile, nonpigmented and coagulase-negative cocci, which are mainly found on the skin and mucous membranes of warm-blooded animals. Their large numbers and ubiquitous distribution result in frequent contamination of specimens collected from or through the skin, making these organisms among the most frequently isolated in the clinical laboratory. In the past, S. epidermidis was rarely the cause of significant infections, but with the increasing use of implanted catheters and prosthetic devices, it has emerged as an important agent of hospital-acquired infections and has been recognized as a true pathogen (Lowy and Hammer, 1983, Ann Intern Med, 99: 834-9; Blum and Rodvold, 1987, Clin Pharm, 6: 464-75; Hamory, Parisi et al., 1987, Am J Infect Control, 15: 59-74). S. epidermidis is a major cause of infection of indwelling foreign devices such as orthopedic devices, intravenous catheters, prosthetic heart valves, central nervous system shunts, and peritoneal dialysis catheters (Blum and Rodvold, 1987, Clin Pharm, 6: 464-75; Archer, 1988, J Antimicrob Chemother, 21 Suppl C: 133-8; Lowy and Hammer, 1983, Ann Intern Med, 99: 834-9; Hamory, Parisi et al., 1987, Am J Infect Control, 15: 59-74). In addition, S. epidermidis is a common cause of postoperative wound infections and of bacteremia in immunosuppressed patients, intensive-care unit patients and premature newborns (MacLowry, 1983, Am J Med, 75: 2-6; Eykyn, 1988, Lancet, 1: 100-4). According to a national survey (Centers for Disease Control, 1981:7), S. epidermidis caused 8.9% of primary nosocomial bacteremias.
Treatment of S. epidermidis infections remains difficult because of their occult nature, association with foreign bodies, and frequent resistance to antimicrobial agents. Ordinarily, S. epidermidis is an organism with low virulence, and a break in host defense caused by surgery, catheter placement, prosthesis insertion or immunosuppression is a prerequisite for infection. The presence of a foreign body itself facilitates infection by protecting the organism from elimination by host defenses or antimicrobial therapy (Lowy and Hammer, 1983, Ann Intern Med, 99: 834-9). Furthermore, S. epidermidis, due to its ability to produce extracellular polysaccharide material, or slime, may be uniquely adapted to adhere to smooth surfaces such as plastics or metal. Slime-producing strains of S. epidermidis appear to be more pathogenic than non-slime-producing strains (Christensen, Simpson et al., 1983, Infect Immun, 40: 407-10; Peters and Pulverer, 1984, J Antimicrob Chemother, 14 Suppl D: 67-71; Gallimore, Gagnon et al., 1991, J Infect Dis, 164: 1220-3). This property and many other factors are involved in the pathogenesis of device-associated infections. Despite the increased recognition of S. epidermidis as a pathogen, its infections are difficult to diagnose. Differentiating clinically important from clinically unimportant bacterial isolates of S. epidermidis is difficult because of the high rate of contamination.
Although laboratory isolates of S. epidermidis have generally been susceptible to semisynthetic penicillins (methicillin, nafcillin, oxacillin), cephalosporins, aminoglycosides, vancomycin and rifampin, recent clinical isolates have shown increased resistance. Recent reports (Karchmer, 1985, Am J Med, 78: 116-27; Karchmer, 1991, J Hosp Infect, 18 Suppl A: 355-66) show that 83% of S. epidermidis isolates from patients with prosthetic valve endocarditis are methicillin resistant and 32% are gentamicin resistant as well. Multi-drug resistant staphylococci have emerged in the midst of high-level use of penicillin and aminoglycosides (Centers for Disease Control and Prevention, 1993, MMWR 42:597; and S. Handwerger et al., 1993, Clin Infect Dis 16:750).
The use of antibiotics for therapeutic and prophylactic purposes promotes the selection of resistant organisms and the spread of antibiotic resistance genes among bacteria. Previous studies have shown that virtually all staphylococci carry some antibiotic resistance genes on naturally occurring extrachromosomal mobile genetic elements, such as plasmids. Survey and analysis of plasmids in clinical isolates of S. epidermidis have shown that more than 80% of isolates carry plasmids, and in several cases more than one plasmid (Archer et al., 1982, Infect Immun, 35:627-632; Kloos et al., 1981, Can J Microbiol, 27:271-278; Moller, 1988, J Hosp Infect 12:19-27). Though the most important form of resistance has been the inactivation of antibiotics, particularly penicillins and cephalosporins, recent clinical isolates have resistance to one or more of the following antibiotics: methicillin, tetracycline, erythromycin, gentamicin, kanamycin and chloramphenicol. In fact, due to the widespread occurrence of plasmids and their involvement in antibiotic resistance, plasmid profiling has been used as an epidemiological tool to study nosocomial infections. This invention relates to isolated nucleic acids and polypeptides derived from S. epidermidis plasmids that are useful as molecular targets for diagnosis, prophylaxis and treatment of pathological conditions, as well as materials and methods for the diagnosis, prevention, and amelioration of pathological conditions resulting from bacterial infection.
These concerns point to the need for diagnostic tools and therapeutics aimed at proper identification of strain and eradication of virulence. The design of vaccines that will limit the spread of infection and halt transfer of resistance factors is very desirable.
The present invention fulfills the need for diagnostic tools and therapeutics by providing bacterial-specific compositions and methods for detecting, treating, and preventing bacterial infection, in particular S. epidermidis infection.
The present invention encompasses isolated nucleic acids and polypeptides derived from S. epidermidis that are useful as reagents for diagnosis of bacterial disease, components of effective antibacterial vaccines, and/or as targets for antibacterial drugs including anti-S. epidermidis drugs. They can also be used to detect the presence of S. epidermidis and other Staphylococcus species in a sample, and in screening compounds for the ability to interfere with the S. epidermidis life cycle or to inhibit S. epidermidis infection. They also have use as biocontrol agents for plants.
More specifically, this invention features compositions of nucleic acids corresponding to entire coding sequences of S. epidermidis proteins, including surface or secreted proteins or parts thereof, nucleic acids capable of binding mRNA from S. epidermidis proteins to block protein translation, and methods for producing S. epidermidis proteins or parts thereof using peptide synthesis and recombinant DNA techniques. This invention also features antibodies and nucleic acids useful as probes to detect S. epidermidis infection. In addition, vaccine compositions and methods for the protection or treatment of infection by S. epidermidis are within the scope of this invention.
The nucleotide sequences provided in SEQ ID NO: 1-SEQ ID NO: 2837, a fragment thereof, or a nucleotide sequence at least 99.5% identical to SEQ ID NO: 1-SEQ ID NO: 2837 may be “provided” in a variety of media to facilitate use thereof. As used herein, “provided” refers to a manufacture, other than an isolated nucleic acid molecule, which contains a nucleotide sequence of the present invention, i.e., the nucleotide sequence provided in SEQ ID NO: 1-SEQ ID NO: 2837, a fragment thereof, or a nucleotide sequence at least 99.5% identical to a sequence contained within SEQ ID NO: 1-SEQ ID NO: 2837. Uses for and methods for providing nucleotide sequences in a variety of media are well known in the art (see e.g., EPO Publication No. EP 0 756 006).
In one application of this embodiment, a nucleotide sequence of the present invention can be recorded on computer readable media. As used herein, “computer readable media” refers to any media which can be read and accessed directly by a computer. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage media, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media. A person skilled in the art can readily appreciate how any of the presently known computer readable media can be used to create a manufacture comprising computer readable media having recorded thereon a nucleotide sequence of the present invention.
As used herein, “recorded” refers to a process for storing information on computer readable media. A person skilled in the art can readily adopt any of the presently known methods for recording information on computer readable media to generate manufactures comprising the nucleotide sequence information of the present invention.
A variety of data storage structures are available to a person skilled in the art for creating a computer readable media having recorded thereon a nucleotide sequence of the present invention. The choice of the data storage structure will generally be based on the means chosen to access the stored information. In addition, a variety of data processor programs and formats can be used to store the nucleotide sequence information of the present invention on computer readable media. The sequence information can be represented in a word processing text file, formatted in commercially-available software such as WordPerfect and Microsoft Word, or represented in the form of an ASCII file, stored in a database application, such as DB2, Sybase, Oracle, or the like. A person skilled in the art can readily adapt any number of data processor structuring formats (e.g. text file or database) in order to obtain computer readable media having recorded thereon the nucleotide sequence information of the present invention.
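By way of illustration only, the following minimal Python sketch shows two of the storage options mentioned above: an ASCII text representation and a database table. The FASTA-style layout, the function names, the table schema, and the use of SQLite (standing in for a database application such as DB2, Sybase, or Oracle) are assumptions made for this example and are not specified in the text; the records are placeholders, not sequences from the Sequence Listing.

```python
import sqlite3

# Hypothetical placeholder records; NOT sequences from the Sequence Listing.
EXAMPLE_RECORDS = [
    ("example_1", "ATGGCTAAAGGTTAA"),
    ("example_2", "ATGCCCTGA"),
]


def to_fasta(records, width=60):
    """Render (identifier, sequence) pairs as FASTA-style ASCII text."""
    lines = []
    for ident, seq in records:
        lines.append(f">{ident}")
        # Wrap each sequence at a fixed line width.
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


def store_in_database(records, path=":memory:"):
    """Store the same records in a relational table and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequences ("
        "seq_id TEXT PRIMARY KEY, residues TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO sequences VALUES (?, ?)", records)
    conn.commit()
    return conn
```

Either representation carries the same information; as the text notes, the choice between them would generally depend on how the stored sequences are to be accessed.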
By providing the nucleotide sequence of SEQ ID NO: 1-SEQ ID NO: 2837, a fragment thereof, or a nucleotide sequence at least 99.5% identical to SEQ ID NO: 1-SEQ ID NO: 2837 in computer readable form, a person skilled in the art can routinely access the coding sequence information for a variety of purposes. Computer software is publicly available which allows a person skilled in the art to access sequence information provided in a computer readable medium. Examples of such computer software include programs of the “Staden Package”, “DNA Star”, “MacVector”, GCG “Wisconsin Package” (Genetics Computer Group, Madison, Wis.) and “NCBI Toolbox” (National Center For Biotechnology Information).
Computer algorithms enable the identification of S. epidermidis open reading frames (ORFs) within SEQ ID NO: 1-SEQ ID NO: 2837 which contain homology to ORFs or proteins from other organisms. Examples of such similarity-search algorithms include the BLAST [Altschul et al., J. Mol. Biol. 215:403-410 (1990)] and Smith-Waterman [Smith and Waterman (1981) Advances in Applied Mathematics, 2:482-489] search algorithms. These algorithms are utilized on computer systems as exemplified below. The ORFs so identified represent protein encoding fragments within the S. epidermidis genome and are useful in producing commercially important proteins such as enzymes used in fermentation reactions and in the production of commercially useful metabolites.
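As a minimal sketch of the Smith-Waterman approach named above, the following Python function computes the best local alignment score between two sequences by dynamic programming. The scoring values (match +2, mismatch -1, linear gap penalty -2) and the function name are assumptions chosen for illustration; production tools use substitution matrices and affine gap penalties, and this sketch does not reproduce any particular package.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Uses the Smith-Waterman recurrence with a linear gap penalty and keeps
    only two rows of the scoring matrix in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, which is what makes the
            # alignment local rather than global.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

A higher score indicates a longer or better-conserved shared region; a score of zero means no residues could be aligned under the chosen scoring scheme.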
The present invention further provides systems, particularly computer-based systems, which contain the sequence information described herein. Such systems are designed to identify commercially important fragments of the S. epidermidis genome. As used herein, “a computer-based system” refers to the hardware means, software means, and data storage means used to analyze the nucleotide sequence information of the present invention. The minimum hardware means of the computer-based systems of the present invention comprises a central processing unit (CPU), input means, output means, and data storage means. A person skilled in the art can readily appreciate that any one of the currently available computer-based systems is suitable for use in the present invention. The computer-based systems of the present invention comprise a data storage means having stored therein a nucleotide sequence of the present invention and the necessary hardware means and software means for supporting and implementing a search means. As used herein, “data storage means” refers to memory which can store nucleotide sequence information of the present invention, or a memory access means which can access manufactures having recorded thereon the nucleotide sequence information of the present invention.
As used herein, “search means” refers to one or more programs which are implemented on the computer-based system to compare a target sequence or target structural motif with the sequence information stored within the data storage means. Search means are used to identify fragments or regions of the S. epidermidis genome which are similar to, or “match”, a particular target sequence or target motif. A variety of algorithms are known in the art and have been disclosed publicly, and a variety of commercially available software packages for conducting homology-based similarity searches are available and can be used in the computer-based systems of the present invention. Examples of such software include, but are not limited to, FASTA (GCG Wisconsin Package), Bic_SW (Compugen Bioccelerator), BLASTN2, BLASTP2, BLASTX2 (NCBI) and Motifs (GCG). A person skilled in the art can readily recognize that any one of the available algorithms or implementing software packages for conducting homology searches can be adapted for use in the present computer-based systems.
As used herein, a “target sequence” can be any DNA or amino acid sequence of six or more nucleotides or two or more amino acids. A person skilled in the art can readily recognize that the longer a target sequence is, the less likely a target sequence will be present as a random occurrence in the database. The most preferred sequence length of a target sequence is from about 10 to 100 amino acids or from about 30 to 300 nucleotide residues. However, it is well recognized that many genes are longer than 500 amino acids, or 1.5 kb in length, and that commercially important fragments of the S. epidermidis genome, such as sequence fragments involved in gene expression and protein processing, will often be shorter than 30 nucleotides.
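The relationship between target length and chance occurrence can be sketched numerically. The Python example below applies the minimum lengths stated above and estimates the expected number of exact chance matches under a simplifying assumption that is not part of the text: residues are independent and equally frequent, and only one strand is searched. The function names and any database size supplied by a caller are hypothetical.

```python
# Minimum target lengths as stated in the text.
MIN_TARGET_LENGTH = {"nucleotide": 6, "amino_acid": 2}


def is_valid_target(sequence, kind):
    """Check a candidate target against the stated minimum length."""
    return len(sequence) >= MIN_TARGET_LENGTH[kind]


def expected_random_hits(target_length, database_length, alphabet_size=4):
    """Expected number of exact chance matches of a target in a database.

    Assumes independent, uniformly distributed residues (an idealization):
    each of the (database_length - target_length + 1) start positions
    matches with probability (1 / alphabet_size) ** target_length.
    """
    if target_length < 1 or database_length < target_length:
        return 0.0
    positions = database_length - target_length + 1
    return positions * (1.0 / alphabet_size) ** target_length
```

Under this model each additional nucleotide divides the expected number of chance matches by four, which is why a 30-nucleotide target is far less likely than a 6-nucleotide target to occur at random in a database of a given size.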
As used herein, “a target structural motif,” or “target motif,” refers to any rationally selected sequence or combination of sequences in which the sequence(s) are chosen based on a specific functional domain or three-dimensional configuration which is formed upon the folding of the target polypeptide. There are a variety of target motifs known in the art. Protein target motifs include, but are not limited to, enzymatic active sites, membrane-spanning regions, and signal sequences. Nucleic acid target motifs include, but are not limited to, promoter sequences, hairpin structures and inducible expression elements (protein binding sequences).
A variety of structural formats for the input and output means can be used to input and output the information in the computer-based systems of the present invention. A preferred format for an output means ranks fragments of the S. epidermidis genome possessing varying degrees of homology to the target sequence or target motif. Such presentation provides a person skilled in the art with a ranking of sequences which contain various amounts of the target sequence or target motif and identifies the degree of homology contained in the identified fragment.
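A ranked output of the kind described above might be sketched as follows. This Python example scores each stored fragment by its best ungapped window match to the target and sorts the fragments from most to least similar. The scoring method, the function names, and the fragment names used by a caller are assumptions for illustration; the search programs named in the text use more sophisticated alignment and statistics.

```python
def best_window_identity(target, fragment):
    """Best percent identity of target against any ungapped window of fragment."""
    n = len(target)
    if n == 0 or len(fragment) < n:
        return 0.0
    best = 0
    for start in range(len(fragment) - n + 1):
        window = fragment[start:start + n]
        matches = sum(1 for x, y in zip(target, window) if x == y)
        best = max(best, matches)
    return 100.0 * best / n


def rank_fragments(target, fragments):
    """Return (name, percent identity) pairs, most similar fragment first.

    `fragments` maps a fragment name to its sequence. Ties are broken by
    name so that the ordering is deterministic.
    """
    scored = [
        (name, best_window_identity(target, seq))
        for name, seq in fragments.items()
    ]
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

Presenting the score alongside each fragment name gives the reader both the ordering and the degree of similarity for every reported fragment.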
A variety of comparing means can be used to compare a target sequence or target motif with the data storage means to identify sequence fragments of the S. epidermidis genome. In the present examples, software implementing the BLASTP2 and Bic_SW algorithms (Altschul et al., J. Mol. Biol. 215:403-410 (1990); Compugen Bioccelerator) was used to identify open reading frames within the S. epidermidis genome. A person skilled in the art can readily recognize that any one of the publicly available homology search programs can be used as the search means for the computer-based systems of the present invention.
The invention features S. epidermidis polypeptides, preferably a substantially pure preparation of a S. epidermidis polypeptide, or a recombinant S. epidermidis polypeptide. In preferred embodiments: the polypeptide has biological activity; the polypeptide has an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 98%, or 99% identical to an amino acid sequence of the invention contained in the Sequence Listing, preferably it has about 65% sequence identity with an amino acid sequence of the invention contained in the Sequence Listing, and most preferably it has about 92% to about 99% sequence identity with an amino acid sequence of the invention contained in the Sequence Listing; the polypeptide has an amino acid sequence essentially the same as an amino acid sequence of the invention contained in the Sequence Listing; the polypeptide is at least 5, 10, 20, 50, 100, or 150 amino acid residues in length; the polypeptide includes at least 5, preferably at least 10, more preferably at least 20, more preferably at least 50, 100, or 150 contiguous amino acid residues of the invention contained in the Sequence Listing. In yet another preferred embodiment, the amino acid sequence which differs in sequence identity by about 7% to about 8% from the S. epidermidis amino acid sequences of the invention contained in the Sequence Listing is also encompassed by the invention.
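The percent identity thresholds recited above can be made concrete with a short calculation. The Python sketch below computes identity over the columns of a pre-computed pairwise alignment, with `-` marking gaps, and reports which of the recited thresholds a value meets. Counting gap columns in the denominator is one of several conventions in use and is an assumption here, as are the function names; the text does not state which convention applies.

```python
# Identity thresholds recited in the text, in percent.
THRESHOLDS = (60, 70, 80, 90, 95, 98, 99)


def percent_identity(aligned_a, aligned_b):
    """Percent identity over the columns of a pairwise alignment.

    Both inputs must be the same length, with '-' marking gaps. Columns in
    which both sequences have a gap are ignored; a column with a gap in
    only one sequence counts as a non-identical column.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y) for x, y in zip(aligned_a, aligned_b)
        if not (x == "-" and y == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def thresholds_met(identity):
    """List the recited thresholds that a given percent identity satisfies."""
    return [t for t in THRESHOLDS if identity >= t]
```

For example, two ten-residue sequences differing at a single position are 90% identical and so meet the 60%, 70%, 80%, and 90% thresholds but not the 95% threshold.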
In preferred embodiments: the S. epidermidis polypeptide is encoded by a nucleic acid of the invention contained in the Sequence Listing, or by a nucleic acid having at least 60%, 70%, 80%, 90%, 95%, 98%, or 99% homology with a nucleic acid of the invention contained in the Sequence Listing.
In a preferred embodiment, the subject S. epidermidis polypeptide differs in amino acid sequence at 1, 2, 3, 5, 10 or more residues from a sequence of the invention contained in the Sequence Listing. The differences, however, are such that the S. epidermidis polypeptide exhibits a S. epidermidis biological activity, e.g., the S. epidermidis polypeptide retains a biological activity of a naturally occurring S. epidermidis enzyme.
In preferred embodiments, the polypeptide includes all or a fragment of an amino acid sequence of the invention contained in the Sequence Listing; fused, in reading frame, to additional amino acid residues, preferably to residues encoded by genomic DNA 5′ or 3′ to the genomic DNA which encodes a sequence of the invention contained in the Sequence Listing.
In yet other preferred embodiments, the S. epidermidis polypeptide is a recombinant fusion protein having a first S. epidermidis polypeptide portion and a second polypeptide portion, e.g., a second polypeptide portion having an amino acid sequence unrelated to S. epidermidis. The second polypeptide portion can be, e.g., any of glutathione-S-transferase, a DNA binding domain, or a polymerase activating domain. In a preferred embodiment, the fusion protein can be used in a two-hybrid assay.
Polypeptides of the invention include those which arise as a result of alternative transcription events, alternative RNA splicing events, and alternative translational and posttranslational events.
In a preferred embodiment, the encoded S. epidermidis polypeptide differs (e.g., by amino acid substitution, addition or deletion of at least one amino acid residue) in amino acid sequence at 1, 2, 3, 5, 10 or more residues, from a sequence of the invention contained in the Sequence Listing. The differences, however, are such that the encoded S. epidermidis polypeptide exhibits a S. epidermidis biological activity, e.g., the encoded S. epidermidis enzyme retains a biological activity of a naturally occurring S. epidermidis enzyme.
In preferred embodiments, the encoded polypeptide includes all or a fragment of an amino acid sequence of the invention contained in the Sequence Listing; fused, in reading frame, to additional amino acid residues, preferably to residues encoded by genomic DNA 5′ or 3′ to the genomic DNA which encodes a sequence of the invention contained in the Sequence Listing.
The S. epidermidis strain, from which the nucleotide sequences have been sequenced, was deposited on Jul. 10, 1997 in the American Type Culture Collection (ATCC #55998) as strain 18972.
Included in the invention are: allelic variations; natural mutants; induced mutants; proteins encoded by DNA that hybridizes under high or low stringency conditions to a nucleic acid which encodes a polypeptide of the invention contained in the Sequence Listing (for definitions of high and low stringency see Current Protocols in Molecular Biology, John Wiley and Sons, New York, 1989, 6.3.1-6.3.6, hereby incorporated by reference); and polypeptides specifically bound by antisera to S. epidermidis polypeptides, especially by antisera to an active site or binding domain of a S. epidermidis polypeptide. The invention also includes fragments, preferably biologically active fragments. These and other polypeptides are also referred to herein as S. epidermidis polypeptide analogs or variants.
The invention further provides nucleic acids, e.g., RNA or DNA, encoding a polypeptide of the invention. This includes double stranded nucleic acids as well as coding and antisense single strands.
In preferred embodiments, the subject S. epidermidis nucleic acid will include a transcriptional regulatory sequence, e.g. at least one of a transcriptional promoter or transcriptional enhancer sequence, operably linked to the S. epidermidis gene sequence, e.g., to render the S. epidermidis gene sequence suitable for expression in a recombinant host cell.
In yet a further preferred embodiment, the nucleic acid which encodes a S. epidermidis polypeptide of the invention hybridizes under stringent conditions to a nucleic acid probe corresponding to at least 8 consecutive nucleotides of the invention contained in the Sequence Listing; more preferably to at least 12 consecutive nucleotides of the invention contained in the Sequence Listing; more preferably to at least 20 consecutive nucleotides of the invention contained in the Sequence Listing; more preferably to at least 40 consecutive nucleotides of the invention contained in the Sequence Listing.
In another aspect, the invention provides a substantially pure nucleic acid having a nucleotide sequence which encodes a S. epidermidis polypeptide. In preferred embodiments: the encoded polypeptide has biological activity; the encoded polypeptide has an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 98%, or 99% homologous to an amino acid sequence of the invention contained in the Sequence Listing; the encoded polypeptide has an amino acid sequence essentially the same as an amino acid sequence of the invention contained in the Sequence Listing; the encoded polypeptide is at least 5, 10, 20, 50, 100, or 150 amino acids in length; the encoded polypeptide comprises at least 5, preferably at least 10, more preferably at least 20, more preferably at least 50, 100, or 150 contiguous amino acids of the invention contained in the Sequence Listing.
In another aspect, the invention encompasses: a vector including a nucleic acid which encodes a S. epidermidis polypeptide or a S. epidermidis polypeptide variant as described herein; a host cell transfected with the vector; and a method of producing a recombinant S. epidermidis polypeptide or S. epidermidis polypeptide variant, including culturing the cell, e.g., in a cell culture medium, and isolating the S. epidermidis polypeptide or S. epidermidis polypeptide variant, e.g., from the cell or from the cell culture medium.
One embodiment of the invention is directed to substantially isolated nucleic acids. Nucleic acids of the invention include sequences comprising at least about 8 nucleotides in length, more preferably at least about 12 nucleotides in length, even more preferably at least about 15-20 nucleotides in length, that correspond to a subsequence of any one of SEQ ID NO: 1-SEQ ID NO: 2837 or complements thereof. Alternatively, the nucleic acids comprise sequences contained within any ORF (open reading frame), including a complete protein-coding sequence, of which any of SEQ ID NO: 1-SEQ ID NO: 2837 forms a part. The invention encompasses sequence-conservative variants and function-conservative variants of these sequences. The nucleic acids may be DNA, RNA, DNA/RNA duplexes, peptide nucleic acid (PNA), or derivatives thereof.
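The correspondence described above, between a short nucleic acid and a subsequence of a reference sequence or its complement, can be checked mechanically. The Python sketch below tests whether a candidate of at least a minimum length occurs exactly in a reference or in its reverse complement. Treating "complement" as the reverse complement read 5′ to 3′, requiring an exact match, and the function names are all assumptions made for this example.

```python
# Translation table for complementing DNA bases, preserving case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def corresponds_to_subsequence(candidate, reference, min_length=8):
    """True if candidate is long enough and occurs in reference or its
    reverse complement as an exact, contiguous match (case-insensitive)."""
    candidate = candidate.upper()
    reference = reference.upper()
    if len(candidate) < min_length:
        return False
    return (
        candidate in reference
        or candidate in reverse_complement(reference)
    )
```

The default `min_length=8` mirrors the shortest length recited above; passing 12, 15, or 20 applies the more preferred lengths.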
In another aspect, the invention features a purified recombinant nucleic acid having at least 50%, 60%, 70%, 80%, 90%, 95%, 98%, or 99% homology with a sequence of the invention contained in the Sequence Listing.
The invention also encompasses recombinant DNA (including DNA cloning and expression vectors) comprising these S. epidermidis-derived sequences; host cells comprising such DNA, including fungal, bacterial, yeast, plant, insect, and mammalian host cells; and methods for producing expression products comprising RNA and polypeptides encoded by the S. epidermidis sequences. These methods are carried out by incubating a host cell comprising a S. epidermidis-derived nucleic acid sequence under conditions in which the sequence is expressed. The host cell may be native or recombinant. The polypeptides can be obtained by (a) harvesting the incubated cells to produce a cell fraction and a medium fraction; and (b) recovering the S. epidermidis polypeptide from the cell fraction, the medium fraction, or both. The polypeptides can also be made by in vitro translation.
In another aspect, the invention features nucleic acids capable of binding mRNA of S. epidermidis. Such a nucleic acid is capable of acting as an antisense nucleic acid to control the translation of mRNA of S. epidermidis. A further aspect features a nucleic acid which is capable of binding specifically to a S. epidermidis nucleic acid. These nucleic acids are also referred to herein as complements and have utility as probes and as capture reagents.
In another aspect, the invention features an expression system comprising an open reading frame corresponding to S. epidermidis nucleic acid. The nucleic acid further comprises a control sequence compatible with an intended host. The expression system is useful for making polypeptides corresponding to S. epidermidis nucleic acid.
In another aspect, the invention encompasses: a vector including a nucleic acid which encodes a S. epidermidis polypeptide or a S. epidermidis polypeptide variant as described herein; a host cell transfected with the vector; and a method of producing a recombinant S. epidermidis polypeptide or S. epidermidis polypeptide variant, including culturing the cell, e.g., in a cell culture medium, and isolating the S. epidermidis polypeptide or S. epidermidis polypeptide variant, e.g., from the cell or from the cell culture medium.
Yet another embodiment of the invention encompasses reagents for detecting bacterial infection, including S. epidermidis infection, which comprise at least one S. epidermidis-derived nucleic acid defined by any one of SEQ ID NO: 1-SEQ ID NO: 2837, or sequence-conservative or function-conservative variants thereof. Alternatively, the diagnostic reagents comprise polypeptide sequences that are contained within any open reading frames (ORFs), including complete protein-coding sequences, contained within any of SEQ ID NO: 1-SEQ ID NO: 2837, or polypeptide sequences contained within any of SEQ ID NO: 2838-SEQ ID NO: 5674, or polypeptides of which any of the above sequences forms a part, or antibodies directed against any of the above peptide sequences or function-conservative variants and/or fragments thereof.
The invention further provides antibodies, preferably monoclonal antibodies, which specifically bind to the polypeptides of the invention. Methods are also provided for producing antibodies in a host animal. The methods of the invention comprise immunizing an animal with at least one S. epidermidis-derived immunogenic component, wherein the immunogenic component comprises one or more of the polypeptides encoded by any one of SEQ ID NO: 1-SEQ ID NO: 2837 or sequence-conservative or function-conservative variants thereof; or polypeptides that are contained within any ORFs, including complete protein-coding sequences, of which any of SEQ ID NO: 1-SEQ ID NO: 2837 forms a part; or polypeptide sequences contained within any of SEQ ID NO: 2838-SEQ ID NO: 5674, or polypeptides of which any of SEQ ID NO: 2838-SEQ ID NO: 5674 forms a part. Host animals include any warm blooded animal, including without limitation mammals and birds. Such antibodies have utility as reagents for immunoassays to evaluate the abundance and distribution of S. epidermidis-specific antigens.
In yet another aspect, the invention provides diagnostic methods for detecting S. epidermidis antigenic components or anti-S. epidermidis antibodies in a sample. S. epidermidis antigenic components are detected by a process comprising: (i) contacting a sample suspected to contain a bacterial antigenic component with a bacterial-specific antibody, under conditions in which a stable antigen-antibody complex can form between the antibody and bacterial antigenic components in the sample; and (ii) detecting any antigen-antibody complex formed in step (i), wherein detection of an antigen-antibody complex indicates the presence of at least one bacterial antigenic component in the sample. In different embodiments of this method, the antibodies used are directed against a sequence encoded by any of SEQ ID NO: 1-SEQ ID NO: 2837 or sequence-conservative or function-conservative variants thereof, or against a polypeptide sequence contained in any of SEQ ID NO: 2838-SEQ ID NO: 5674 or function-conservative variants thereof.
In yet another aspect, the invention provides a method for detecting antibacterial-specific antibodies in a sample, which comprises: (i) contacting a sample suspected to contain antibacterial-specific antibodies with a S. epidermidis antigenic component, under conditions in which a stable antigen-antibody complex can form between the S. epidermidis antigenic component and antibacterial antibodies in the sample; and (ii) detecting any antigen-antibody complex formed in step (i), wherein detection of an antigen-antibody complex indicates the presence of antibacterial antibodies in the sample. In different embodiments of this method, the antigenic component is encoded by a sequence contained in any of SEQ ID NO: 1-SEQ ID NO: 2837 or sequence-conservative and function-conservative variants thereof, or is a polypeptide sequence contained in any of SEQ ID NO: 2838-SEQ ID NO: 5674 or function-conservative variants thereof.
In another aspect, the invention features a method of generating vaccines for immunizing an individual against S. epidermidis. The method includes: immunizing a subject with a S. epidermidis polypeptide, e.g., a surface or secreted polypeptide, or a combination of such peptides or active portion(s) thereof, and a pharmaceutically acceptable carrier. Such vaccines have therapeutic and prophylactic utilities.
In another aspect, the invention features a method of evaluating a compound, e.g., a polypeptide, e.g., a fragment of a host cell polypeptide, for the ability to bind a S. epidermidis polypeptide. The method includes: contacting the compound with a S. epidermidis polypeptide and determining if the compound binds or otherwise interacts with the S. epidermidis polypeptide. Compounds which bind S. epidermidis polypeptides are candidates as activators or inhibitors of the bacterial life cycle. These assays can be performed in vitro or in vivo.
In another aspect, the invention features a method of evaluating a compound, e.g., a polypeptide, e.g., a fragment of a host cell polypeptide, for the ability to bind a S. epidermidis nucleic acid, e.g., DNA or RNA. The method includes: contacting the compound with a S. epidermidis nucleic acid and determining if the compound binds or otherwise interacts with the S. epidermidis nucleic acid. Compounds which bind S. epidermidis nucleic acids are candidates as activators or inhibitors of the bacterial life cycle. These assays can be performed in vitro or in vivo.
A particularly preferred embodiment of the invention is directed to a method of screening test compounds for anti-bacterial activity, which method comprises: selecting as a target a bacterial specific sequence, which sequence is essential to the viability of a bacterial species; contacting a test compound with said target sequence; and selecting those test compounds which bind to said target sequence as potential anti-bacterial candidates. In one embodiment, the target sequence selected is specific to a single species, or even a single strain, i.e., the S. epidermidis 18972. In a second embodiment, the target sequence is common to at least two species of bacteria. In a third embodiment, the target sequence is common to a family of bacteria. The target sequence may be a nucleic acid sequence or a polypeptide sequence. Methods employing sequences common to more than one species of microorganism may be used to screen candidates for broad spectrum anti-bacterial activity.
The invention also provides methods for preventing or treating disease caused by certain bacteria, including S. epidermidis, which are carried out by administering to an animal in need of such treatment, in particular a warm-blooded vertebrate, including but not limited to birds and mammals, a compound that specifically inhibits or interferes with the function of a bacterial polypeptide or nucleic acid. In a particularly preferred embodiment, the mammal to be treated is human.
The sequences of the present invention include the specific nucleic acid and amino acid sequences set forth in the Sequence Listing that forms a part of the present specification, and which are designated SEQ ID NO: 1-SEQ ID NO: 5674. Use of the terms “SEQ ID NO: 1-SEQ ID NO: 2837,” “SEQ ID NO: 2838-SEQ ID NO: 5674,” and “the sequences depicted in Table 2”, etc., is intended, for convenience, to refer to each individual SEQ ID NO individually, and is not intended to refer to the genus of these sequences. In other words, it is a shorthand for listing all of these sequences individually. The invention encompasses each sequence individually, as well as any combination thereof.
Definitions
“Nucleic acid” or “polynucleotide” as used herein refers to purine- and pyrimidine-containing polymers of any length, either polyribonucleotides or polydeoxyribonucleotides or mixed polyribo-polydeoxyribo nucleotides. This includes single- and double-stranded molecules, i.e., DNA-DNA, DNA-RNA and RNA-RNA hybrids, as well as “protein nucleic acids” (PNA) formed by conjugating bases to an amino acid backbone. This also includes nucleic acids containing modified bases.
A nucleic acid or polypeptide sequence that is “derived from” a designated sequence refers to a sequence that corresponds to a region of the designated sequence. For nucleic acid sequences, this encompasses sequences that are homologous or complementary to the sequence, as well as “sequence-conservative variants” and “function-conservative variants.” For polypeptide sequences, this encompasses “function-conservative variants.” Sequence-conservative variants are those in which a change of one or more nucleotides in a given codon position results in no alteration in the amino acid encoded at that position. Function-conservative variants are those in which a given amino acid residue in a polypeptide has been changed without altering the overall conformation and function of the native polypeptide, including, but not limited to, replacement of an amino acid with one having similar physico-chemical properties (such as, for example, acidic, basic, hydrophobic, and the like). “Function-conservative” variants also include any polypeptides that have the ability to elicit antibodies specific to a designated polypeptide.
An “S. epidermidis-derived” nucleic acid or polypeptide sequence may or may not be present in other bacterial species, and may or may not be present in all S. epidermidis strains. This term is intended to refer to the source from which the sequence was originally isolated. Thus, a S. epidermidis-derived polypeptide, as used herein, may be used, e.g., as a target to screen for a broad spectrum antibacterial agent, to search for homologous proteins in other species of bacteria or in eukaryotic organisms such as fungi and humans, etc.
A purified or isolated polypeptide or a substantially pure preparation of a polypeptide are used interchangeably herein and, as used herein, mean a polypeptide that has been separated from other proteins, lipids, and nucleic acids with which it naturally occurs. Preferably, the polypeptide is also separated from substances, e.g., antibodies or gel matrix, e.g., polyacrylamide, which are used to purify it. Preferably, the polypeptide constitutes at least 10, 20, 50, 70, 80 or 95% dry weight of the purified preparation. Preferably, the preparation contains: sufficient polypeptide to allow protein sequencing; at least 1, 10, or 100 mg of the polypeptide.
A purified preparation of cells refers to, in the case of plant or animal cells, an in vitro preparation of cells and not an entire intact plant or animal. In the case of cultured cells or microbial cells, it consists of a preparation of at least 10% and more preferably 50% of the subject cells.
A purified or isolated or a substantially pure nucleic acid, e.g., a substantially pure DNA, (are terms used interchangeably herein) is a nucleic acid which is one or both of the following: not immediately contiguous with both of the coding sequences with which it is immediately contiguous (i.e., one at the 5′ end and one at the 3′ end) in the naturally-occurring genome of the organism from which the nucleic acid is derived; or which is substantially free of a nucleic acid with which it occurs in the organism from which the nucleic acid is derived. The term includes, for example, a recombinant DNA which is incorporated into a vector, e.g., into an autonomously replicating plasmid or virus, or into the genomic DNA of a prokaryote or eukaryote, or which exists as a separate molecule (e.g., a cDNA or a genomic DNA fragment produced by PCR or restriction endonuclease treatment) independent of other DNA sequences. Substantially pure DNA also includes a recombinant DNA which is part of a hybrid gene encoding additional S. epidermidis DNA sequence.
A “contig” as used herein is a nucleic acid representing a continuous stretch of genomic sequence of an organism.
An “open reading frame”, also referred to herein as ORF, is a region of nucleic acid which encodes a polypeptide. This region may represent a portion of a coding sequence or a total sequence and can be determined from a stop to stop codon or from a start to stop codon.
As used herein, a “coding sequence” is a nucleic acid which is transcribed into messenger RNA and/or translated into a polypeptide when placed under the control of appropriate regulatory sequences. The boundaries of the coding sequence are determined by a translation start codon at the five prime terminus and a translation stop codon at the three prime terminus. A coding sequence can include but is not limited to messenger RNA, synthetic DNA, and recombinant nucleic acid sequences.
A “complement” of a nucleic acid as used herein refers to an anti-parallel or antisense sequence that participates in Watson-Crick base-pairing with the original sequence.
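By way of illustration only, the anti-parallel complement of a DNA sequence can be computed as in the following minimal sketch. The function name `reverse_complement` and the pairing table are illustrative assumptions and are not part of the disclosed methods; ambiguity codes and RNA bases are not handled.

```python
# Watson-Crick pairing for the four DNA bases (illustrative; no ambiguity codes).
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the anti-parallel (reverse) complement of a DNA sequence."""
    # Read the strand backwards and swap each base for its pairing partner.
    return "".join(PAIRS[base] for base in reversed(seq.upper()))


if __name__ == "__main__":
    print(reverse_complement("ATTGCC"))
```

Applying the function twice returns the original sequence, which is a convenient sanity check.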
A “gene product” is a protein or structural RNA which is specifically encoded by a gene.
As used herein, the term “probe” refers to a nucleic acid, peptide or other chemical entity which specifically binds to a molecule of interest. Probes are often associated with or capable of associating with a label. A label is a chemical moiety capable of detection. Typical labels comprise dyes, radioisotopes, luminescent and chemiluminescent moieties, fluorophores, enzymes, precipitating agents, amplification sequences, and the like. Similarly, a nucleic acid, peptide or other chemical entity which specifically binds to a molecule of interest and immobilizes such molecule is referred to herein as a “capture ligand”. Capture ligands are typically associated with or capable of associating with a support such as nitro-cellulose, glass, nylon membranes, beads, particles and the like. The specificity of hybridization is dependent on conditions such as the base pair composition of the nucleotides, and the temperature and salt concentration of the reaction. These conditions are readily discernible to one of ordinary skill in the art using routine experimentation.
“Homologous” refers to the sequence similarity or sequence identity between two polypeptides or between two nucleic acid molecules. When a position in both of the two compared sequences is occupied by the same base or amino acid monomer subunit, e.g., if a position in each of two DNA molecules is occupied by adenine, then the molecules are homologous at that position. The percent of homology between two sequences is a function of the number of matching or homologous positions shared by the two sequences divided by the number of positions compared × 100. For example, if 6 of 10 of the positions in two sequences are matched or homologous then the two sequences are 60% homologous. By way of example, the DNA sequences ATTGCC and TATGGC share 50% homology. Generally, a comparison is made when two sequences are aligned to give maximum homology.
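The percent homology calculation defined above can be sketched as follows. This is a minimal illustration that assumes the two sequences have already been aligned to equal length without gaps; the function name `percent_homology` is hypothetical.

```python
def percent_homology(seq_a: str, seq_b: str) -> float:
    """Matching positions divided by positions compared, times 100.

    Assumes the sequences are pre-aligned and of equal, non-zero length.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    # The example pair from the text: positions 3, 4 and 6 match.
    print(percent_homology("ATTGCC", "TATGGC"))
```

For the example pair given in the text, three of the six compared positions match, which gives the stated 50%.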
Nucleic acids are hybridizable to each other when at least one strand of a nucleic acid can anneal to the other nucleic acid under defined stringency conditions. Stringency of hybridization is determined by: (a) the temperature at which hybridization and/or washing is performed; and (b) the ionic strength and polarity of the hybridization and washing solutions. Hybridization requires that the two nucleic acids contain complementary sequences; depending on the stringency of hybridization, however, mismatches may be tolerated. Typically, hybridization of two sequences at high stringency (such as, for example, in a solution of 0.5×SSC, at 65° C.) requires that the sequences be essentially completely homologous. Conditions of intermediate stringency (such as, for example, 2×SSC at 65° C.) and low stringency (such as, for example, 2×SSC at 55° C.), require correspondingly less overall complementarity between the hybridizing sequences. (1×SSC is 0.15 M NaCl, 0.015 M Na citrate).
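As a worked illustration of the example conditions above, the salt concentrations corresponding to a given SSC dilution follow directly from the stated composition of 1×SSC. The sketch below is illustrative only; the names `SSC_1X`, `STRINGENCY_EXAMPLES` and `ssc_molarity` are assumptions introduced here.

```python
# Composition of 1x SSC as stated in the text, in mol/L.
SSC_1X = {"NaCl": 0.15, "Na citrate": 0.015}

# Example conditions from the text: (SSC fold concentration, temperature in deg C).
STRINGENCY_EXAMPLES = {
    "high": (0.5, 65),
    "intermediate": (2.0, 65),
    "low": (2.0, 55),
}


def ssc_molarity(fold: float) -> dict:
    """Return the molar concentration of each salt in fold-x SSC."""
    return {salt: molar * fold for salt, molar in SSC_1X.items()}


if __name__ == "__main__":
    for name, (fold, temp_c) in STRINGENCY_EXAMPLES.items():
        print(name, fold, temp_c, ssc_molarity(fold))
```

Lower salt at the same temperature (0.5×SSC versus 2×SSC at 65° C.) corresponds to the higher stringency example, and lower temperature at the same salt (55° C. versus 65° C. in 2×SSC) corresponds to the lower stringency example.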
The terms peptides, proteins, and polypeptides are used interchangeably herein.
As used herein, the term “surface protein” refers to all surface accessible proteins, e.g. inner and outer membrane proteins, proteins adhering to the cell wall, and secreted proteins.
A polypeptide has S. epidermidis biological activity if it has one, two and preferably more of the following properties: (1) if when expressed in the course of a S. epidermidis infection, it can promote, or mediate the attachment of S. epidermidis to a cell; (2) it has an enzymatic activity, structural or regulatory function characteristic of a S. epidermidis protein; (3) or the gene which encodes it can rescue a lethal mutation in a S. epidermidis gene. A polypeptide has biological activity if it is an antagonist, agonist, or super-agonist of a polypeptide having one of the above-listed properties.
A biologically active fragment or analog is one having an in vivo or in vitro activity which is characteristic of the S. epidermidis polypeptides of the invention contained in the Sequence Listing, or of other naturally occurring S. epidermidis polypeptides, e.g., one or more of the biological activities described herein. Especially preferred are fragments which exist in vivo, e.g., fragments which arise from post-transcriptional processing or which arise from translation of alternatively spliced RNAs. Fragments include those expressed in native or endogenous cells as well as those made in expression systems, e.g., in CHO (Chinese Hamster Ovary) cells. Because peptides such as S. epidermidis polypeptides often exhibit a range of physiological properties and because such properties may be attributable to different portions of the molecule, a useful S. epidermidis fragment or S. epidermidis analog is one which exhibits a biological activity in any biological assay for S. epidermidis activity. Most preferably the fragment or analog possesses 10%, preferably 40%, more preferably 60%, 70%, 80% or 90% or greater of the activity of S. epidermidis, in any in vivo or in vitro assay.
Analogs can differ from naturally occurring S. epidermidis polypeptides in amino acid sequence or in ways that do not involve sequence, or both. Non-sequence modifications include changes in acetylation, methylation, phosphorylation, carboxylation, or glycosylation. Preferred analogs include S. epidermidis polypeptides (or biologically active fragments thereof) whose sequences differ from the wild-type sequence by one or more conservative amino acid substitutions or by one or more non-conservative amino acid substitutions, deletions, or insertions which do not substantially diminish the biological activity of the S. epidermidis polypeptide. Conservative substitutions typically include the substitution of one amino acid for another with similar characteristics, e.g., substitutions within the following groups: valine, glycine; glycine, alanine; valine, isoleucine, leucine; aspartic acid, glutamic acid; asparagine, glutamine; serine, threonine; lysine, arginine; and phenylalanine, tyrosine. Other conservative substitutions can be made in view of the table below.
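The substitution groups named in the preceding paragraph can be encoded as a simple lookup, as in the following minimal sketch. One-letter amino acid codes are used, and the names `CONSERVATIVE_GROUPS` and `is_conservative` are illustrative assumptions; only the groups listed in the text are included.

```python
# Groups listed in the text, in one-letter amino acid codes.
CONSERVATIVE_GROUPS = [
    {"V", "G"},        # valine, glycine
    {"G", "A"},        # glycine, alanine
    {"V", "I", "L"},   # valine, isoleucine, leucine
    {"D", "E"},        # aspartic acid, glutamic acid
    {"N", "Q"},        # asparagine, glutamine
    {"S", "T"},        # serine, threonine
    {"K", "R"},        # lysine, arginine
    {"F", "Y"},        # phenylalanine, tyrosine
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if the two different residues share at least one listed group."""
    a, b = original.upper(), replacement.upper()
    return a != b and any(a in g and b in g for g in CONSERVATIVE_GROUPS)


if __name__ == "__main__":
    print(is_conservative("D", "E"), is_conservative("D", "K"))
```

Because the groups are checked individually, membership is not transitive: glycine pairs with valine and with alanine, but glycine to leucine is not treated as conservative under this listing.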
Other analogs within the invention are those with modifications which increase peptide stability; such analogs may contain, for example, one or more non-peptide bonds (which replace the peptide bonds) in the peptide sequence. Also included are: analogs that include residues other than naturally occurring L-amino acids, e.g., D-amino acids or non-naturally occurring or synthetic amino acids, e.g., β or γ amino acids; and cyclic analogs.
As used herein, the term “fragment”, as applied to a S. epidermidis analog, will ordinarily be at least about 20 residues, more typically at least about 40 residues, preferably at least about 60 residues in length. Fragments of S. epidermidis polypeptides can be generated by methods known to those skilled in the art. The ability of a Staphylococcus fragment to exhibit a biological activity of S. epidermidis polypeptide can be assessed by methods known to those skilled in the art as described herein. Also included are S. epidermidis polypeptides containing residues that are not required for biological activity of the peptide or that result from alternative mRNA splicing or alternative protein processing events.
An “immunogenic component” as used herein is a moiety, such as a S. epidermidis polypeptide, analog or fragment thereof, that is capable of eliciting a humoral and/or cellular immune response in a host animal.
An “antigenic component” as used herein is a moiety, such as a S. epidermidis polypeptide, analog or fragment thereof, that is capable of binding to a specific antibody with sufficiently high affinity to form a detectable antigen-antibody complex.
The term “antibody” as used herein is intended to include fragments thereof which are specifically reactive with S. epidermidis polypeptides.
As used herein, the term “cell-specific promoter” means a DNA sequence that serves as a promoter, i.e., regulates expression of a selected DNA sequence operably linked to the promoter, and which effects expression of the selected DNA sequence in specific cells of a tissue. The term also covers so-called “leaky” promoters, which regulate expression of a selected DNA primarily in one tissue, but cause expression in other tissues as well.
Misexpression, as used herein, refers to a non-wild type pattern of gene expression. It includes: expression at non-wild type levels, i.e., over or under expression; a pattern of expression that differs from wild type in terms of the time or stage at which the gene is expressed, e.g., increased or decreased expression (as compared with wild type) at a predetermined developmental period or stage; a pattern of expression that differs from wild type in terms of increased expression (as compared with wild type) in a predetermined cell type or tissue type; a pattern of expression that differs from wild type in terms of the splicing size, amino acid sequence, post-translational modification, or biological activity of the expressed polypeptide; a pattern of expression that differs from wild type in terms of the effect of an environmental stimulus or extracellular stimulus on expression of the gene, e.g., a pattern of increased or decreased expression (as compared with wild type) in the presence of an increase or decrease in the strength of the stimulus.
As used herein, “host cells” and other such terms denoting microorganisms or higher eukaryotic cell lines cultured as unicellular entities refers to cells which can become or have been used as recipients for a recombinant vector or other transfer DNA, and include the progeny of the original cell which has been transfected. It is understood by individuals skilled in the art that the progeny of a single parental cell may not necessarily be completely identical in genomic or total DNA complement to the original parent, due to accidental or deliberate mutation.
As used herein, the term “control sequence” refers to a nucleic acid having a base sequence which is recognized by the host organism to effect the expression of encoded sequences to which they are ligated. The nature of such control sequences differs depending upon the host organism; in prokaryotes, such control sequences generally include a promoter, ribosomal binding site, terminators, and in some cases operators; in eukaryotes, generally such control sequences include promoters, terminators and in some instances, enhancers. The term control sequence is intended to include at a minimum, all components whose presence is necessary for expression, and may also include additional components whose presence is advantageous, for example, leader sequences.
As used herein, the term “operably linked” refers to sequences joined or ligated to function in their intended manner. For example, a control sequence is operably linked to coding sequence by ligation in such a way that expression of the coding sequence is achieved under conditions compatible with the control sequence and host cell.
The “metabolism” of a substance, as used herein, means any aspect of the expression, function, action, or regulation of the substance. The metabolism of a substance includes modifications, e.g., covalent or non-covalent modifications of the substance. The metabolism of a substance includes modifications, e.g., covalent or non-covalent modification, the substance induces in other substances. The metabolism of a substance also includes changes in the distribution of the substance. The metabolism of a substance includes changes the substance induces in the distribution of other substances.
A “sample” as used herein refers to a biological sample, such as, for example, tissue or fluid isolated from an individual (including without limitation plasma, serum, cerebrospinal fluid, lymph, tears, saliva and tissue sections) or from in vitro cell culture constituents, as well as samples from the environment.
Technical and scientific terms used herein have the meanings commonly understood by one of ordinary skill in the art to which the present invention pertains, unless otherwise defined. Reference is made herein to various methodologies known to those of skill in the art. Publications and other materials setting forth such known methodologies to which reference is made are incorporated herein by reference in their entireties as though set forth in full. The practice of the invention will employ, unless otherwise indicated, conventional techniques of chemistry, molecular biology, microbiology, recombinant DNA, and immunology, which are within the skill of the art. Such techniques are explained fully in the literature. See e.g., Sambrook, Fritsch, and Maniatis, Molecular Cloning: A Laboratory Manual, 2nd ed. (1989); DNA Cloning, Volumes I and II (D. N. Glover ed., 1985); Oligonucleotide Synthesis (M. J. Gait ed., 1984); Nucleic Acid Hybridization (B. D. Hames and S. J. Higgins eds., 1984); the series, Methods in Enzymology (Academic Press, Inc.), particularly Vol. 154 and Vol. 155 (Wu and Grossman, eds.); PCR-A Practical Approach (McPherson, Quirke, and Taylor, eds., 1991); Immunology, 2d Edition, 1989, Roitt et al., C. V. Mosby Company, New York; Advanced Immunology, 2d Edition, 1991, Male et al., Gower Medical Publishing, New York; Transcription and Translation, 1984 (Hames and Higgins eds.); Animal Cell Culture, 1986 (R. I. Freshney ed.); Immobilized Cells and Enzymes, 1986 (IRL Press); Perbal, 1984, A Practical Guide to Molecular Cloning; and Gene Transfer Vectors for Mammalian Cells, 1987 (J. H. Miller and M. P. Calos eds., Cold Spring Harbor Laboratory).
Any suitable materials and/or methods known to those of skill can be utilized in carrying out the present invention; however, preferred materials and/or methods are described. Materials, reagents and the like to which reference is made in the following description and examples are obtainable from commercial sources, unless otherwise noted.
S. epidermidis Genomic Sequence
This invention provides nucleotide sequences of the genome of S. epidermidis which thus comprises a DNA sequence library of S. epidermidis genomic DNA. The detailed description that follows provides nucleotide sequences of S. epidermidis, and also describes how the sequences were obtained and how ORFs and protein-coding sequences were identified. Also described are methods of using the disclosed S. epidermidis sequences in methods including diagnostic and therapeutic applications. Furthermore, the library can be used as a database for identification and comparison of medically important sequences in this and other strains of S. epidermidis. 
To determine the genomic sequence of S. epidermidis, DNA from strain 18972 of S. epidermidis was isolated after Zymolyase digestion, sodium dodecyl sulfate lysis, potassium acetate precipitation, phenol:chloroform extraction and ethanol precipitation (Soll, D. R., T. Srikantha and S. R. Lockhart: Characterizing Developmentally Regulated Genes in S. epidermidis. In Microbial Genome Methods. K. W. Adolph, editor. CRC Press. New York. p 17-37.). DNA was sheared hydrodynamically using an HPLC (Oefner et al., 1996) to an insert size of 2000-3000 bp. After size fractionation by gel electrophoresis the fragments were blunt-ended, ligated to adapter oligonucleotides and cloned into the pGTC (Thomann) vector to construct a “shotgun” subclone library.
DNA sequencing was achieved using established ABI sequencing methods on ABI377 automated DNA sequencers. The cloning and sequencing procedures are described in more detail in the Exemplification.
Individual sequence reads were assembled using PHRAP (P. Green, Abstracts of DOE Human Genome Program Contractor-Grantee Workshop V, January 1996, p. 157). The average contig length was about 3-4 kb.
All subsequent steps were based on sequencing by ABI377 automated DNA sequencing methods.
A variety of approaches are used to order the contigs so as to obtain a continuous sequence representing the entire S. epidermidis genome. Synthetic oligonucleotides are designed that are complementary to sequences at the end of each contig. These oligonucleotides may be hybridized to libraries of S. epidermidis genomic DNA in, for example, lambda phage vectors or plasmid vectors to identify clones that contain sequences corresponding to the junctional regions between individual contigs. Such clones are then used to isolate template DNA and the same oligonucleotides are used as primers in polymerase chain reaction (PCR) to amplify junctional fragments, the nucleotide sequence of which is then determined.
The S. epidermidis sequences were analyzed for the presence of open reading frames (ORFs) comprising at least 180 nucleotides. Because the ORFs were identified from stop-to-stop codon reads, it should be understood that these ORFs may not correspond to the ORF of a naturally-occurring S. epidermidis polypeptide. These ORFs may contain start codons which indicate the initiation of protein synthesis of a naturally-occurring S. epidermidis polypeptide. Such start codons within the ORFs provided herein were identified by those of ordinary skill in the relevant art, and the resulting ORF and the encoded S. epidermidis polypeptide are within the scope of this invention. For example, within the ORFs a codon such as AUG or GUG (encoding methionine or valine) which is part of the initiation signal for protein synthesis was identified and the portion of an ORF corresponding to a naturally-occurring S. epidermidis polypeptide was recognized. The predicted coding regions were defined by evaluating the coding potential of such sequences with the program GENEMARK™ (Borodovsky and McIninch, 1993, Comp. 17:123).
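The stop-to-stop ORF scan described above can be sketched as follows. This is a simplified illustration and not the procedure actually used: it scans only the three forward reading frames of a DNA string (the reverse strand would be handled by scanning the reverse complement), it uses the standard stop codons TAA, TAG and TGA, and the function names `stop_to_stop_orfs` and `first_start_codon` are assumptions. It does not evaluate coding potential as GENEMARK does.

```python
STOPS = {"TAA", "TAG", "TGA"}
STARTS = {"ATG", "GTG"}  # DNA equivalents of AUG and GUG


def stop_to_stop_orfs(seq: str, min_len: int = 180):
    """Return (frame, start, end) for stop-free stretches of >= min_len nt.

    Coordinates are 0-based; `end` is exclusive and excludes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        # Index just past the last complete codon in this frame.
        last = frame + ((len(seq) - frame) // 3) * 3
        orf_start = frame
        for pos in range(frame, last, 3):
            if seq[pos:pos + 3] in STOPS:
                if pos - orf_start >= min_len:
                    orfs.append((frame, orf_start, pos))
                orf_start = pos + 3
        # A stretch that runs to the end of the sequence without a stop.
        if last - orf_start >= min_len:
            orfs.append((frame, orf_start, last))
    return orfs


def first_start_codon(seq: str, orf):
    """Return the position of the first in-frame ATG/GTG in the ORF, or None."""
    _, start, end = orf
    for pos in range(start, end, 3):
        if seq[pos:pos + 3].upper() in STARTS:
            return pos
    return None
```

An ORF found this way may begin upstream of the true coding region; locating the first in-frame start codon narrows it to a candidate start-to-stop sequence.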
Each predicted ORF amino acid sequence was compared with all sequences found in current GENBANK, SWISS-PROT, and PIR databases using the BLAST algorithm. BLAST identifies local alignments between the ORF sequence and the sequence in the databank and reports the probability that each alignment would occur by chance (Altschul et al., 1990, J. Mol. Biol. 215:403-410). Homologous ORFs (probabilities less than 10⁻⁵ by chance) and ORFs that are probably non-homologous (probabilities greater than 10⁻⁵ by chance) but have good codon usage were identified. Both homologous sequences and non-homologous sequences with good codon usage are likely to encode proteins and are encompassed by the invention.
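The two-way classification described above can be expressed as a short decision rule. The sketch below is illustrative only: the record type `OrfHit`, its field names, and the "unclassified" fallback for ORFs meeting neither criterion are assumptions introduced here, and the codon usage assessment is taken as a precomputed flag rather than calculated.

```python
from dataclasses import dataclass
from typing import Optional

CHANCE_THRESHOLD = 1e-5  # probability cutoff stated in the text


@dataclass
class OrfHit:
    orf_id: str
    best_probability: Optional[float]  # None when no database match was found
    good_codon_usage: bool             # assumed to be assessed separately


def classify(hit: OrfHit) -> str:
    """Label an ORF using the probability cutoff and the codon usage flag."""
    if hit.best_probability is not None and hit.best_probability < CHANCE_THRESHOLD:
        return "homologous"
    if hit.good_codon_usage:
        return "non-homologous, good codon usage"
    return "unclassified"  # assumption: neither criterion is met


if __name__ == "__main__":
    print(classify(OrfHit("orf_example", 1e-12, False)))
```

The first two labels correspond to the two classes the text describes as likely to encode proteins.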
S. epidermidis Nucleic Acids
The present invention provides a library of S. epidermidis-derived nucleic acid sequences. The libraries provide probes, primers, and markers for use in epidemiological studies. The present invention also provides a library of S. epidermidis-derived nucleic acid sequences which comprise or encode targets for therapeutic drugs.
The nucleic acids of this invention may be obtained directly from the DNA of the above referenced S. epidermidis strain by using the polymerase chain reaction (PCR). See “PCR, A Practical Approach” (McPherson, Quirke, and Taylor, eds., IRL Press, Oxford, UK, 1991) for details about the PCR. High fidelity PCR is used to ensure a faithful DNA copy prior to expression. In addition, the authenticity of amplified products is verified by conventional sequencing methods. Clones carrying the desired sequences described in this invention may also be obtained by screening the libraries by means of the PCR or by hybridization of synthetic oligonucleotide probes to filter lifts of the library colonies or plaques as known in the art (see, e.g., Sambrook et al., Molecular Cloning, A Laboratory Manual 2nd edition, 1989, Cold Spring Harbor Press, N.Y.).
It is also possible to obtain nucleic acids encoding S. epidermidis polypeptides from a cDNA library in accordance with protocols herein described. A cDNA encoding a S. epidermidis polypeptide can be obtained by isolating total mRNA from an appropriate strain. Double stranded cDNAs can then be prepared from the total mRNA. Subsequently, the cDNAs can be inserted into a suitable plasmid or viral (e.g., bacteriophage) vector using any one of a number of known techniques. Genes encoding S. epidermidis polypeptides can also be cloned using established polymerase chain reaction techniques in accordance with the nucleotide sequence information provided by the invention. The nucleic acids of the invention can be DNA or RNA. Preferred nucleic acids of the invention are contained in the Sequence Listing.
The nucleic acids of the invention can also be chemically synthesized using standard techniques. Various methods of chemically synthesizing polydeoxynucleotides are known, including solid-phase synthesis which, like peptide synthesis, has been fully automated in commercially available DNA synthesizers (See e.g., Itakura et al. U.S. Pat. No. 4,598,049; Caruthers et al. U.S. Pat. No. 4,458,066; and Itakura U.S. Pat. Nos. 4,401,796 and 4,373,071, incorporated by reference herein).
In another example, DNA can be chemically synthesized using, e.g., the phosphoramidite solid support method of Matteucci et al., 1981, J. Am. Chem. Soc. 103:3185, the method of Yoo et al., 1989, J. Biol. Chem. 264:17078, or other well known methods. This can be done by sequentially linking a series of oligonucleotide cassettes comprising pairs of synthetic oligonucleotides, as described below.
Nucleic acids isolated or synthesized in accordance with features of the present invention are useful, by way of example, without limitation, as probes, primers, capture ligands, antisense genes and for developing expression systems for the synthesis of proteins and peptides corresponding to such sequences. As probes, primers, capture ligands and antisense agents, the nucleic acid normally consists of all or part (approximately twenty or more nucleotides for specificity as well as the ability to form stable hybridization products) of the nucleic acids of the invention contained in the Sequence Listing. These uses are described in further detail below.
Probes
A nucleic acid isolated or synthesized in accordance with the sequence of the invention contained in the Sequence Listing can be used as a probe to specifically detect S. epidermidis. With the sequence information set forth in the present application, sequences of twenty or more nucleotides are identified which provide the desired inclusivity and exclusivity with respect to S. epidermidis, and extraneous nucleic acids likely to be encountered during hybridization conditions. More preferably, the sequence will comprise at least twenty to thirty nucleotides to convey stability to the hybridization product formed between the probe and the intended target molecules.
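As a rough illustration of screening for exclusivity, candidate probe sequences can be enumerated as subsequences of the target that do not occur in a set of extraneous sequences. The sketch below is a simplification under stated assumptions: it tests exact matches on one strand only, whereas actual hybridization tolerates mismatches depending on stringency, and the function name `candidate_probes` is hypothetical.

```python
def candidate_probes(target: str, non_targets, k: int = 20):
    """Return (position, k-mer) pairs from target that are absent from non_targets.

    Exact-match screening only; it is a coarse first filter, not a
    prediction of hybridization behavior.
    """
    # Collect every k-mer that occurs in any extraneous sequence.
    background = set()
    for seq in non_targets:
        seq = seq.upper()
        background.update(seq[i:i + k] for i in range(len(seq) - k + 1))

    target = target.upper()
    seen = set()
    probes = []
    for i in range(len(target) - k + 1):
        kmer = target[i:i + k]
        if kmer not in background and kmer not in seen:
            seen.add(kmer)
            probes.append((i, kmer))
    return probes
```

In practice the surviving candidates would still need to be checked experimentally under the intended hybridization conditions.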
Sequences larger than 1000 nucleotides in length are difficult to synthesize but can be generated by recombinant DNA techniques. Individuals skilled in the art will readily recognize that the nucleic acids, for use as probes, can be provided with a label to facilitate detection of a hybridization product.
Nucleic acid isolated or synthesized in accordance with the sequence of the invention contained in the Sequence Listing can also be useful as probes to detect homologous regions (especially homologous genes) of other Staphylococcus species using appropriate stringency hybridization conditions as described herein.
Capture Ligand
For use as a capture ligand, the nucleic acid, selected in the manner described above with respect to probes, can be readily associated with a support. The manner in which nucleic acid is associated with supports is well known. Nucleic acid having twenty or more nucleotides in a sequence of the invention contained in the Sequence Listing has utility to separate S. epidermidis nucleic acid of one strain from the nucleic acid of another strain as well as from other organisms. Nucleic acid having twenty or more nucleotides in a sequence of the invention contained in the Sequence Listing can also have utility to separate other Staphylococcus species from each other and from other organisms. Preferably, the sequence will comprise at least twenty nucleotides to convey stability to the hybridization product formed between the probe and the intended target molecules. Sequences larger than 1000 nucleotides in length are difficult to synthesize but can be generated by recombinant DNA techniques.
Primers
Nucleic acid isolated or synthesized in accordance with the sequences described herein has utility as primers for the amplification of S. epidermidis nucleic acid. These nucleic acids may also have utility as primers for the amplification of nucleic acids in other Staphylococcus species. With respect to polymerase chain reaction (PCR) techniques, nucleic acid sequences of ≥10-15 nucleotides of the invention contained in the Sequence Listing have utility in conjunction with suitable enzymes and reagents to create copies of S. epidermidis nucleic acid. More preferably, the sequence will comprise twenty or more nucleotides to convey stability to the hybridization product formed between the primer and the intended target molecules. Binding conditions of primers greater than 100 nucleotides are more difficult to control to obtain specificity. High fidelity PCR can be used to ensure a faithful DNA copy prior to expression. In addition, amplified products can be checked by conventional sequencing methods.
The copies can be used in diagnostic assays to detect specific sequences, including genes from S. epidermidis and/or other Staphylococcus species. The copies can also be incorporated into cloning and expression vectors to generate polypeptides corresponding to the nucleic acid synthesized by PCR, as is described in greater detail herein.
The nucleic acids of the present invention find use as templates for the recombinant production of S. epidermidis-derived peptides or polypeptides.
Antisense
Nucleic acid or nucleic acid-hybridizing derivatives isolated or synthesized in accordance with the sequences described herein have utility as antisense agents to prevent the expression of S. epidermidis genes. These sequences also have utility as antisense agents to prevent expression of genes of other Staphylococcus species.
In one embodiment, nucleic acid or derivatives corresponding to S. epidermidis nucleic acids are loaded into a suitable carrier such as a liposome or bacteriophage for introduction into bacterial cells. For example, a nucleic acid having twenty or more nucleotides is capable of binding to bacterial nucleic acid or bacterial messenger RNA. Preferably, the antisense nucleic acid comprises 20 or more nucleotides to provide the necessary stability of a hybridization product of non-naturally occurring nucleic acid and bacterial nucleic acid and/or bacterial messenger RNA. Nucleic acid having a sequence greater than 1000 nucleotides in length is difficult to synthesize but can be generated by recombinant DNA techniques. Methods for loading antisense nucleic acid into liposomes are known in the art, as exemplified by U.S. Pat. No. 4,241,046 issued Dec. 23, 1980 to Papahadjopoulos et al.
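As a non-limiting illustration, an antisense oligonucleotide of 20 or more nucleotides is simply the reverse complement of the chosen region of the target messenger RNA. The following Python sketch derives such a DNA oligonucleotide; the function name and its parameters are hypothetical, and no claim is made that any particular region yields an effective antisense agent.

```python
# Illustrative sketch only; names are hypothetical.
# Map each mRNA (or DNA) base to the complementary DNA base.
_TO_DNA_COMPLEMENT = str.maketrans("ACGUT", "TGCAA")


def antisense_oligo(mrna, start, length=20):
    """Return the DNA oligonucleotide (5'->3') complementary to mrna[start:start+length]."""
    region = mrna.upper()[start:start + length]
    if len(region) < length:
        raise ValueError("requested region extends past the end of the sequence")
    # Complement every base, then reverse so the result reads 5'->3'.
    return region.translate(_TO_DNA_COMPLEMENT)[::-1]
```

For example, the call antisense_oligo("AUGGCU", 0, length=6) gives "AGCCAT", which pairs base for base with the mRNA segment 5'-AUGGCU-3'.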
The present invention encompasses isolated polypeptides and nucleic acids derived from S. epidermidis that are useful as reagents for diagnosis of bacterial infection, components of effective anti-bacterial vaccines, and/or as targets for anti-bacterial drugs, including anti-S. epidermidis drugs.
Expression of S. epidermidis Nucleic Acids
Table 2, which is appended herewith and which forms part of the present specification, provides a list of open reading frames (ORFs) in both strands and a putative identification of the particular function of a polypeptide which is encoded by each ORF, based on the homology match (determined by the BLAST algorithm) of the predicted polypeptide with known proteins encoded by ORFs in other organisms. An ORF is a region of nucleic acid which encodes a polypeptide. This region may represent a portion of a coding sequence or a total sequence and was determined from stop to stop codons. The first column contains a designation for the contig from which each ORF was identified (numbered arbitrarily). Each contig represents a continuous stretch of the genomic sequence of the organism. The second column lists the ORF designation. The third and fourth columns list the SEQ ID numbers for the nucleic acid and amino acid sequences corresponding to each ORF, respectively. The fifth and sixth columns list the length of the nucleic acid and the length of the amino acid, respectively. The nucleotide sequence corresponding to each ORF designation begins at the first nucleotide immediately following a stop codon and ends at the nucleotide immediately preceding the next downstream stop codon in the same reading frame. It will be recognized by one skilled in the art that the natural translation initiation sites will correspond to ATG, GTG, or TTG codons located within the ORFs. The natural initiation sites depend not only on the sequence of a start codon but also on the context of the DNA sequence adjacent to the start codon. Usually, a recognizable ribosome binding site is found within 20 nucleotides upstream from the initiation codon. 
In some cases where genes are translationally coupled and coordinately expressed together in "operons", ribosome binding sites are not present, but the initiation codon of a downstream gene may occur very close to, or overlap, the stop codon of an upstream gene in the same operon. The correct start codons can generally be identified without undue experimentation because only a few codons need be tested. It is recognized that the translational machinery in bacteria initiates all polypeptide chains with the amino acid methionine, regardless of the sequence of the start codon. In some cases, polypeptides are post-translationally modified, resulting in an N-terminal amino acid other than methionine in vivo.
The seventh and eighth columns provide metrics for assessing the likelihood of the homology match (determined by the BLASTP2 algorithm), as is known in the art, to the genes indicated in the eleventh column when the designated ORF was compared against a non-redundant comprehensive protein database. Specifically, the seventh column represents the "Blast Score" for the match (a higher score is a better match), and the eighth column represents the "P-value" for the match (the probability that such a match could have occurred by chance; the lower the value, the more likely the match is valid). If a BLASTP2 score of less than 46 was obtained, no value is reported in the table for the "P-value". The ninth column, "Subject Taxonomy," provides the name of the organism that was identified as having the closest homology match. The tenth column, "Subject Name," provides, where available, either a public database accession number or our own sequence name. The eleventh column provides, where available, the Swissprot accession number (SP), the locus name (LN), the Organism (OR), Source of variant (SR), E.C. number (EC), the gene name (GN), the product name (PN), the Function Description (FN), Left End (LE), Right End (RE), Coding Direction (DI), and the description (DE) or notes (NT) for each ORF. Information that is not preceded by a code designation in the eleventh column represents a description of the ORF. This information allows one of ordinary skill in the art to determine a potential use for each identified coding sequence and, as a result, allows use of the polypeptides of the present invention for commercial and industrial purposes.
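For illustration only, the stop-to-stop definition of an ORF used for Table 2 can be sketched in Python as follows. Each ORF begins at the first nucleotide after a stop codon and ends at the nucleotide preceding the next downstream stop codon in the same reading frame, and candidate initiation sites are the ATG, GTG, or TTG codons inside it. The function names, the min_codons parameter, and the decision to report only ORFs bounded by a stop codon on both sides are assumptions of this sketch; it is not the procedure actually used to build Table 2.

```python
# Illustrative sketch only; names and simplifications are hypothetical.
STOPS = {"TAA", "TAG", "TGA"}
STARTS = {"ATG", "GTG", "TTG"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(_COMPLEMENT)[::-1]


def stop_to_stop_orfs(seq, min_codons=1):
    """Return (strand, frame, begin, end, start_offsets) for each stop-to-stop ORF.

    `begin` and `end` are 0-based, end-exclusive offsets on the strand that was
    scanned; for strand "-" they refer to the reverse-complemented sequence.
    `start_offsets` lists in-frame offsets, relative to `begin`, of candidate
    initiation codons.
    """
    results = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            after_prev_stop = None  # offset just past the previous in-frame stop
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] not in STOPS:
                    continue
                if after_prev_stop is not None and (i - after_prev_stop) // 3 >= min_codons:
                    orf = s[after_prev_stop:i]
                    starts = [j for j in range(0, len(orf), 3) if orf[j:j + 3] in STARTS]
                    results.append((strand, frame, after_prev_stop, i, starts))
                after_prev_stop = i + 3
    return results
```

Choosing among the candidate start offsets would still require the contextual evidence discussed above, such as a ribosome binding site upstream of the initiation codon.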
Using the information provided in SEQ ID NO: 1-SEQ ID NO: 2837 and in Table 2 together with routine cloning and sequencing methods, one of ordinary skill in the art will be able to clone and sequence all the nucleic acid fragments of interest including open reading frames (ORFs) encoding a large variety of proteins of S. epidermidis.
Nucleic acid isolated or synthesized in accordance with the sequences described herein has utility to generate polypeptides. The nucleic acid of the invention exemplified in SEQ ID NO: 1-SEQ ID NO: 2837 and in Table 2 or fragments of said nucleic acid encoding active portions of S. epidermidis polypeptides can be cloned into suitable vectors or used to isolate nucleic acid. The isolated nucleic acid is combined with suitable DNA linkers and cloned into a suitable vector.
The function of a specific gene or operon can be ascertained by expression in a bacterial strain under conditions where the activity of the gene product(s) specified by the gene or operon in question can be specifically measured. Alternatively, a gene product may be produced in large quantities in an expressing strain for use as an antigen, an industrial reagent, for structural studies, etc. This expression can be accomplished in a mutant strain which lacks the activity of the gene to be tested, or in a strain that does not produce the same gene product(s). Suitable strains include, but are not limited to, eucaryotic species such as the yeast Saccharomyces cerevisiae, Methanobacterium strains or other Archaea, and Eubacteria such as E. coli, B. subtilis, S. aureus, S. pneumoniae or Pseudomonas putida. In some cases the expression host will utilize the natural S. epidermidis promoter whereas in others, it will be necessary to drive the gene with a promoter sequence derived from the expressing organism (e.g., an E. coli beta-galactosidase promoter for expression in E. coli).
To express a gene product using the natural S. epidermidis promoter, a procedure such as the following can be used. A restriction fragment containing the gene of interest, together with its associated natural promoter element and regulatory sequences (identified using the DNA sequence data) is cloned into an appropriate recombinant plasmid containing an origin of replication that functions in the host organism and an appropriate selectable marker. This can be accomplished by a number of procedures known to those skilled in the art. It is most preferably done by cutting the plasmid and the fragment to be cloned with the same restriction enzyme to produce compatible ends that can be ligated to join the two pieces together. The recombinant plasmid is introduced into the host organism by, for example, electroporation and cells containing the recombinant plasmid are identified by selection for the marker on the plasmid. Expression of the desired gene product is detected using an assay specific for that gene product.
In the case of a gene that requires a different promoter, the body of the gene (coding sequence) is specifically excised and cloned into an appropriate expression plasmid. This subcloning can be done by several methods, but is most easily accomplished by PCR amplification of a specific fragment and ligation into an expression plasmid after treating the PCR product with a restriction enzyme or exonuclease to create suitable ends for cloning.
A suitable host cell for expression of a gene can be any procaryotic or eucaryotic cell. Suitable methods for transforming host cells can be found in Sambrook et al. (Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring Harbor Laboratory Press (1989)), and other laboratory textbooks.
For example, a host cell transfected with a nucleic acid vector directing expression of a nucleotide sequence encoding a S. epidermidis polypeptide can be cultured under appropriate conditions to allow expression of the polypeptide to occur. Suitable media for cell culture are well known in the art. Polypeptides of the invention can be isolated from cell culture medium, host cells, or both using techniques known in the art for purifying proteins including ion-exchange chromatography, gel filtration chromatography, ultrafiltration, electrophoresis, and immunoaffinity purification with antibodies specific for such polypeptides. Additionally, in many situations, polypeptides can be produced by chemical cleavage of a native protein (e.g., tryptic digestion) and the cleavage products can then be purified by standard techniques.
In the case of membrane bound proteins, these can be isolated from a host cell by contacting a membrane-associated protein fraction with a detergent forming a solubilized complex, where the membrane-associated protein is no longer entirely embedded in the membrane fraction and is solubilized at least to an extent which allows it to be chromatographically isolated from the membrane fraction. Chromatographic techniques which can be used in the final purification step are known in the art and include hydrophobic interaction, lectin affinity, ion exchange, dye affinity and immunoaffinity.
One strategy to maximize recombinant S. epidermidis peptide expression in E. coli is to express the protein in a host bacterium with an impaired capacity to proteolytically cleave the recombinant protein (Gottesman, S., Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990) 119-128). Another strategy would be to alter the nucleic acid encoding a S. epidermidis peptide to be inserted into an expression vector so that the individual codons for each amino acid would be those preferentially utilized in highly expressed E. coli proteins (Wada et al., (1992) Nuc. Acids Res. 20:2111-2118). Such alteration of nucleic acids of the invention can be carried out by standard DNA synthesis techniques.
The present invention provides a library of S. epidermidis-derived nucleic acid sequences. The libraries provide probes, primers, and markers which can be used in epidemiological studies. The present invention also provides a library of S. epidermidis-derived nucleic acid sequences which comprise or encode targets for therapeutic drugs.
Nucleic acids comprising any of the sequences disclosed herein or sub-sequences thereof can be prepared by standard methods using the nucleic acid sequence information provided in SEQ ID NO: 1-SEQ ID NO: 2837. For example, DNA can be chemically synthesized using, e.g., the phosphoramidite solid support method of Matteucci et al., 1981, J. Am. Chem. Soc. 103:3185, the method of Yoo et al., 1989, J. Biol. Chem. 264:17078, or other well-known methods. This can be done by sequentially linking a series of oligonucleotide cassettes comprising pairs of synthetic oligonucleotides, as described below.
Of course, due to the degeneracy of the genetic code, many different nucleotide sequences can encode polypeptides having the amino acid sequences defined by SEQ ID NO: 2838-SEQ ID NO: 5674 or sub-sequences thereof. The codons can be selected for optimal expression in prokaryotic or eukaryotic systems. Such degenerate variants are also encompassed by this invention.
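To give a sense of scale, and by way of illustration only, the number of distinct nucleotide sequences that encode a given amino acid sequence is the product of the number of codons available for each residue. The Python sketch below computes that product from the codon counts of the standard genetic code; the function and table names are hypothetical.

```python
# Illustrative sketch only; names are hypothetical.
# Number of sense codons per amino acid in the standard genetic code.
CODONS_PER_RESIDUE = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encoding_sequences(peptide):
    """Return how many distinct coding sequences encode `peptide` (stop codon excluded)."""
    total = 1
    for residue in peptide.upper():
        total *= CODONS_PER_RESIDUE[residue]
    return total
```

Even a short peptide such as Met-Leu-Ser is encoded by 1 x 6 x 6 = 36 different nucleotide sequences, which is why codon choice can be tuned to the expression host without changing the encoded polypeptide.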
Insertion of nucleic acids (typically DNAs) encoding the polypeptides of the invention into a vector is easily accomplished when the termini of both the DNAs and the vector comprise compatible restriction sites. If this cannot be done, it may be necessary to modify the termini of the DNAs and/or vector by digesting back single-stranded DNA overhangs generated by restriction endonuclease cleavage to produce blunt ends, or to achieve the same result by filling in the single-stranded termini with an appropriate DNA polymerase.
Alternatively, any site desired may be produced, e.g., by ligating nucleotide sequences (linkers) onto the termini. Such linkers may comprise specific oligonucleotide sequences that define desired restriction sites. Restriction sites can also be generated by the use of the polymerase chain reaction (PCR). See, e.g., Saiki et al., 1988, Science 239:48. The cleaved vector and the DNA fragments may also be modified if required by homopolymeric tailing.
The nucleic acids of the invention may be isolated directly from cells. Alternatively, the polymerase chain reaction (PCR) method can be used to produce the nucleic acids of the invention, using either chemically synthesized strands or genomic material as templates. Primers used for PCR can be synthesized using the sequence information provided herein and can further be designed to introduce appropriate new restriction sites, if desirable, to facilitate incorporation into a given vector for recombinant expression.
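By way of a non-limiting sketch, PCR primers that introduce new restriction sites can be composed by placing the recognition sequence 5' of the region that anneals to the template. The Python code below builds such a primer pair. The function name, the annealing length, and the short 5' "clamp" of extra bases are assumptions of this sketch (a few extra bases are commonly added because many enzymes cut poorly at the very end of a fragment); only the EcoRI, BamHI, and HindIII recognition sequences are established facts.

```python
# Illustrative sketch only; names and defaults are hypothetical.
RECOGNITION_SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tailed_primers(template, fwd_enzyme, rev_enzyme, anneal_len=20, clamp="GC"):
    """Return (forward, reverse) primers, 5'->3', carrying the requested sites."""
    template = template.upper()
    if len(template) < anneal_len:
        raise ValueError("template is shorter than the annealing length")
    for enzyme in (fwd_enzyme, rev_enzyme):
        # These sites are palindromic, so checking one strand is sufficient.
        if RECOGNITION_SITES[enzyme] in template:
            raise ValueError(f"{enzyme} site occurs inside the template")
    forward = clamp + RECOGNITION_SITES[fwd_enzyme] + template[:anneal_len]
    reverse = clamp + RECOGNITION_SITES[rev_enzyme] + reverse_complement(template[-anneal_len:])
    return forward, reverse
```

The internal-site check matters because a recognition sequence inside the amplified fragment would be cut along with the engineered termini when the product is digested for cloning.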
The nucleic acids of the present invention may be flanked by natural S. epidermidis regulatory sequences, or may be associated with heterologous sequences, including promoters, enhancers, response elements, signal sequences, polyadenylation sequences, introns, 5′- and 3′-noncoding regions, and the like. The nucleic acids may also be modified by many means known in the art. Non-limiting examples of such modifications include methylation, "caps", substitution of one or more of the naturally occurring nucleotides with an analog, internucleotide modifications such as, for example, those with uncharged linkages (e.g., methyl phosphonates, phosphotriesters, phosphoroamidates, carbamates, etc.) and with charged linkages (e.g., phosphorothioates, phosphorodithioates, etc.). Nucleic acids may contain one or more additional covalently linked moieties, such as, for example, proteins (e.g., nucleases, toxins, antibodies, signal peptides, poly-L-lysine, etc.), intercalators (e.g., acridine, psoralen, etc.), chelators (e.g., metals, radioactive metals, iron, oxidative metals, etc.), and alkylators. PNAs are also included. The nucleic acid may be derivatized by formation of a methyl or ethyl phosphotriester or an alkyl phosphoramidate linkage. Furthermore, the nucleic acid sequences of the present invention may also be modified with a label capable of providing a detectable signal, either directly or indirectly. Exemplary labels include radioisotopes, fluorescent molecules, biotin, and the like.
The invention also provides nucleic acid vectors comprising the disclosed S. epidermidis-derived sequences or derivatives or fragments thereof. A large number of vectors, including plasmid and bacterial vectors, have been described for replication and/or expression in a variety of eukaryotic and prokaryotic hosts, and may be used for cloning or protein expression.
The encoded S. epidermidis polypeptides may be expressed by using many known vectors, such as pUC plasmids, pET plasmids (Novagen, Inc., Madison, Wis.), or pRSET or pREP (Invitrogen, San Diego, Calif.), and many appropriate host cells, using methods disclosed or cited herein or otherwise known to those skilled in the relevant art. The particular choice of vector/host is not critical to the practice of the invention.
Recombinant cloning vectors will often include one or more replication systems for cloning or expression, one or more markers for selection in the host, e.g. antibiotic resistance, and one or more expression cassettes. The inserted S. epidermidis coding sequences may be synthesized by standard methods, isolated from natural sources, or prepared as hybrids, etc. Ligation of the S. epidermidis coding sequences to transcriptional regulatory elements and/or to other amino acid coding sequences may be achieved by known methods. Suitable host cells may be transformed/transfected/infected as appropriate by any suitable method including electroporation, CaCl2 mediated DNA uptake, bacterial infection, microinjection, microprojectile, or other established methods.
Appropriate host cells include bacteria, archaebacteria, fungi, especially yeast, and plant and animal cells, especially mammalian cells. Of particular interest are S. epidermidis, E. coli, B. subtilis, Saccharomyces cerevisiae, Saccharomyces carlsbergensis, Schizosaccharomyces pombe, Sf9 cells, C129 cells, 293 cells, Neurospora, and CHO cells, COS cells, HeLa cells, and immortalized mammalian myeloid and lymphoid cell lines. Preferred replication systems include M13, ColE1, SV40, baculovirus, lambda, adenovirus, and the like. A large number of transcription initiation and termination regulatory regions have been isolated and shown to be effective in the transcription and translation of heterologous proteins in the various hosts. Examples of these regions, methods of isolation, manner of manipulation, etc. are known in the art. Under appropriate expression conditions, host cells can be used as a source of recombinantly produced S. epidermidis-derived peptides and polypeptides.
Advantageously, vectors may also include a transcription regulatory element (i.e., a promoter) operably linked to the S. epidermidis portion. The promoter may optionally contain operator portions and/or ribosome binding sites. Non-limiting examples of bacterial promoters compatible with E. coli include: β-lactamase (penicillinase) promoter; lactose promoter; tryptophan (trp) promoter; araBAD (arabinose) operon promoter; lambda-derived PL promoter and N gene ribosome binding site; and the hybrid tac promoter derived from sequences of the trp and lac UV5 promoters. Non-limiting examples of yeast promoters include 3-phosphoglycerate kinase promoter, glyceraldehyde-3-phosphate dehydrogenase (GAPDH) promoter, galactokinase (GAL1) promoter, galactoepimerase promoter, and alcohol dehydrogenase (ADH) promoter. Suitable promoters for mammalian cells include without limitation viral promoters such as that from Simian Virus 40 (SV40), Rous sarcoma virus (RSV), adenovirus (ADV), and bovine papilloma virus (BPV). Mammalian cells may also require terminator sequences, polyA addition sequences and enhancer sequences to increase expression. Sequences which cause amplification of the gene may also be desirable. Furthermore, sequences that facilitate secretion of the recombinant product from cells, including, but not limited to, bacteria, yeast, and animal cells, such as secretory signal sequences and/or prohormone pro region sequences, may also be included. These sequences are well described in the art.
Nucleic acids encoding wild-type or variant S. epidermidis-derived polypeptides may also be introduced into cells by recombination events. For example, such a sequence can be introduced into a cell, and thereby effect homologous recombination at the site of an endogenous gene or a sequence with substantial identity to the gene. Other recombination-based methods such as nonhomologous recombinations or deletion of endogenous genes by homologous recombination may also be used.
Identification and Use of S. epidermidis Nucleic Acid Sequences
The disclosed S. epidermidis polypeptide and nucleic acid sequences, or other sequences that are contained within ORFs, including complete protein-coding sequences, of which any of the disclosed S. epidermidis-specific sequences forms a part, are useful as target components for diagnosis and/or treatment of S. epidermidis-caused infection.
It will be understood that the sequence of an entire protein-coding sequence of which each disclosed nucleic acid sequence forms a part can be isolated and identified based on each disclosed sequence. This can be achieved, for example, by using an isolated nucleic acid encoding the disclosed sequence, or fragments thereof, to prime a sequencing reaction with genomic S. epidermidis DNA as template; this is followed by sequencing the amplified product. The isolated nucleic acid encoding the disclosed sequence, or fragments thereof, can also be hybridized to S. epidermidis genomic libraries to identify clones containing additional complete segments of the protein-coding sequence of which the shorter sequence forms a part. Then, the entire protein-coding sequence, or fragments thereof, or nucleic acids encoding all or part of the sequence, or sequence-conservative or function-conservative variants thereof, may be employed in practicing the present invention.
Preferred sequences are those that are useful in diagnostic and/or therapeutic applications. Diagnostic applications include without limitation nucleic-acid-based and antibody-based methods for detecting bacterial infection. Therapeutic applications include without limitation vaccines, passive immunotherapy, and drug treatments directed against gene products that are both unique to bacteria and essential for growth and/or replication of bacteria.
Identification of Nucleic Acids Encoding Vaccine Components and Targets for Agents Effective Against S. epidermidis 
The disclosed S. epidermidis genome sequence includes segments that direct the synthesis of ribonucleic acids and polypeptides, as well as origins of replication, promoters, other types of regulatory sequences, and intergenic nucleic acids. The invention encompasses nucleic acids encoding immunogenic components of vaccines and targets for agents effective against S. epidermidis. Identification of said immunogenic components involves the determination of the function of the disclosed sequences, which can be achieved using a variety of approaches. Non-limiting examples of these approaches are described briefly below.
Homology to Known Sequences
Computer-assisted comparison of the disclosed S. epidermidis sequences with previously reported sequences present in publicly available databases is useful for identifying functional S. epidermidis nucleic acid and polypeptide sequences. It will be understood that protein-coding sequences, for example, may be compared as a whole, and that a high degree of sequence homology between two proteins (such as, for example, greater than 80-90%) at the amino acid level indicates that the two proteins also possess some degree of functional homology, such as, for example, among enzymes involved in metabolism, DNA synthesis, or cell wall synthesis, and proteins involved in transport, cell division, etc. In addition, many structural features of particular protein classes have been identified and correlate with specific consensus sequences, such as, for example, binding domains for nucleotides, DNA, metal ions, and other small molecules; sites for covalent modifications such as phosphorylation, acylation, and the like; sites of protein:protein interactions, etc. These consensus sequences may be quite short and thus may represent only a fraction of the entire protein-coding sequence. Identification of such a feature in a S. epidermidis sequence is therefore useful in determining the function of the encoded protein and identifying useful targets of antibacterial drugs.
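As an illustration only, the degree of homology at the amino acid level between two already aligned protein sequences can be expressed as a percent identity. The Python sketch below counts identical residues over the aligned columns and skips any column containing a gap character; the function name and the gap-handling convention are assumptions of this sketch, and other conventions (for example, dividing by the full alignment length) give different values.

```python
# Illustrative sketch only; the name and the gap convention are hypothetical.
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return percent identity over ungapped columns of two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # columns with a gap are not counted
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```

Under this convention, two five-residue sequences differing at one position score 80.0, the lower end of the 80-90% range mentioned above.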
Of particular relevance to the present invention are structural features that are common to secretory, transmembrane, and surface proteins, including secretion signal peptides and hydrophobic transmembrane domains. S. epidermidis proteins identified as containing putative signal sequences and/or transmembrane domains are useful as immunogenic components of vaccines.
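One common way to flag putative hydrophobic transmembrane domains, offered here purely as an illustrative sketch and not as the method used in connection with this disclosure, is a sliding-window average of the Kyte-Doolittle hydropathy scale. The Python code below uses a 19-residue window and a cut-off of 1.6, which are conventional defaults for that scale; the function name and return format are hypothetical.

```python
# Illustrative sketch only; the technique (Kyte-Doolittle hydropathy) is an
# assumption of this example and is not named in the specification.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """Return (offset, mean hydropathy) for each window at or above `threshold`."""
    protein = protein.upper()
    hits = []
    for i in range(len(protein) - window + 1):
        mean = sum(KYTE_DOOLITTLE[aa] for aa in protein[i:i + window]) / window
        if mean >= threshold:
            hits.append((i, round(mean, 2)))
    return hits
```

A polypeptide with one or more qualifying windows is only a candidate membrane or secreted protein; dedicated signal-peptide and topology predictors, and ultimately experiment, would be needed to confirm localization.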
Targets for therapeutic drugs according to the invention include, but are not limited to, polypeptides of the invention, whether unique to S. epidermidis or not, that are essential for growth and/or viability of S. epidermidis under at least one growth condition. Polypeptides essential for growth and/or viability can be determined by examining the effect of deleting and/or disrupting the genes, i.e., by so-called gene "knockout". Alternatively, genetic footprinting can be used (Smith et al., 1995, Proc. Natl. Acad. Sci. USA 92:6479-6483; Published International Application WO 94/26933; U.S. Pat. No. 5,612,180). Still other methods for assessing essentiality include the ability to isolate conditional lethal mutations in the specific gene (e.g., temperature sensitive mutations). Other useful targets for therapeutic drugs, which include polypeptides that are not essential for growth or viability per se but lead to loss of viability of the cell, can be used to target therapeutic agents to cells.
Strain-specific Sequences
Because of the evolutionary relationship between different S. epidermidis strains, it is believed that the presently disclosed S. epidermidis sequences are useful for identifying, and/or discriminating between, previously known and new S. epidermidis strains. It is believed that other S. epidermidis strains will exhibit at least 70% sequence homology with the presently disclosed sequence. Systematic and routine analyses of DNA sequences derived from samples containing S. epidermidis strains, and comparison with the present sequence, allow for the identification of sequences that can be used to discriminate between strains, as well as those that are common to all S. epidermidis strains. In one embodiment, the invention provides nucleic acids, including probes, and peptide and polypeptide sequences that discriminate between different strains of S. epidermidis. Strain-specific components can also be identified functionally by their ability to elicit or react with antibodies that selectively recognize one or more S. epidermidis strains.
In another embodiment, the invention provides nucleic acids, including probes, and peptide and polypeptide sequences that are common to all S. epidermidis strains but are not found in other bacterial species.
S. epidermidis Polypeptides
This invention encompasses isolated S. epidermidis polypeptides encoded by the disclosed S. epidermidis genomic sequences, including the polypeptides of the invention contained in the Sequence Listing. Polypeptides of the invention are preferably at least 5 amino acid residues in length. Using the DNA sequence information provided herein, the amino acid sequences of the polypeptides encompassed by the invention can be deduced using methods well-known in the art. It will be understood that the sequence of an entire nucleic acid encoding a S. epidermidis polypeptide can be isolated and identified based on an ORF that encodes only a fragment of the cognate protein-coding region. This can be achieved, for example, by using the isolated nucleic acid encoding the ORF, or fragments thereof, to prime a polymerase chain reaction with genomic S. epidermidis DNA as template; this is followed by sequencing the amplified product.
The polypeptides of the present invention, including function-conservative variants of the disclosed ORFs, may be isolated from wild-type or mutant S. epidermidis cells, or from heterologous organisms or cells (including, but not limited to, bacteria, fungi, insect, plant, and mammalian cells) including S. epidermidis into which a S. epidermidis-derived protein-coding sequence has been introduced and expressed. Furthermore, the polypeptides may be part of recombinant fusion proteins.
S. epidermidis polypeptides of the invention can be chemically synthesized using commercially available automated procedures such as those referenced herein, including, without limitation, exclusive solid phase synthesis, partial solid phase methods, fragment condensation or classical solution synthesis. The polypeptides are preferably prepared by solid phase peptide synthesis as described by Merrifield, 1963, J. Am. Chem. Soc. 85:2149. The synthesis is carried out with amino acids that are protected at the alpha-amino terminus. Trifunctional amino acids with labile side-chains are also protected with suitable groups to prevent undesired chemical reactions from occurring during the assembly of the polypeptides. The alpha-amino protecting group is selectively removed to allow subsequent reaction to take place at the amino-terminus. The conditions for the removal of the alpha-amino protecting group do not remove the side-chain protecting groups.
Methods for polypeptide purification are well-known in the art, including, without limitation, preparative disc-gel electrophoresis, isoelectric focusing, HPLC, reversed-phase HPLC, gel filtration, ion exchange and partition chromatography, and countercurrent distribution. For some purposes, it is preferable to produce the polypeptide in a recombinant system in which the S. epidermidis protein contains an additional sequence tag that facilitates purification, such as, but not limited to, a polyhistidine sequence. The polypeptide can then be purified from a crude lysate of the host cell by chromatography on an appropriate solid-phase matrix. Alternatively, antibodies produced against a S. epidermidis protein or against peptides derived therefrom can be used as purification reagents. Other purification methods are possible.
The present invention also encompasses derivatives and homologues of S. epidermidis-encoded polypeptides. For some purposes, nucleic acid sequences encoding the peptides may be altered by substitutions, additions, or deletions that provide for functionally equivalent molecules, i.e., function-conservative variants. For example, one or more amino acid residues within the sequence can be substituted by another amino acid of similar properties, such as, for example, positively charged amino acids (arginine, lysine, and histidine); negatively charged amino acids (aspartate and glutamate); polar neutral amino acids; and non-polar amino acids.
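By way of non-limiting illustration, a check of whether a substitution stays within one of the four classes named above can be sketched as follows. The membership of the positively and negatively charged classes is taken from the text; the membership of the polar neutral and non-polar classes is not given in the text and is an assumption based on one common classification. The names GROUPS, group_of and is_conservative are hypothetical.

```python
# One-letter amino acid codes grouped by side-chain property.
GROUPS = {
    "positive": set("RKH"),          # arginine, lysine, histidine (from the text)
    "negative": set("DE"),           # aspartate, glutamate (from the text)
    "polar_neutral": set("STNQCY"),  # assumed membership
    "nonpolar": set("AVLIMFWPG"),    # assumed membership
}


def group_of(residue):
    """Return the name of the class that contains the given residue."""
    code = residue.upper()
    for name, members in GROUPS.items():
        if code in members:
            return name
    raise ValueError("unrecognized amino acid code: %r" % residue)


def is_conservative(original, substitute):
    """True if both residues fall in the same class."""
    return group_of(original) == group_of(substitute)


lys_to_arg = is_conservative("K", "R")  # both positively charged
asp_to_lys = is_conservative("D", "K")  # negative to positive
```

Class membership alone does not establish that a variant is function-conservative; under these assumptions it serves only as a first screen before functional testing.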
The isolated polypeptides may be modified by, for example, phosphorylation, sulfation, acylation, or other protein modifications. They may also be modified with a label capable of providing a detectable signal, either directly or indirectly, including, but not limited to, radioisotopes and fluorescent compounds.
To identify S. epidermidis-derived polypeptides for use in the present invention, essentially the complete genomic sequence of a Staphylococcus epidermidis isolate was analyzed. While, in very rare instances, a nucleic acid sequencing error may be revealed, resolving a rare sequencing error is well within the art, and such an occurrence will not prevent one skilled in the art from practicing the invention.
Also encompassed are any S. epidermidis polypeptide sequences that are contained within the open reading frames (ORFs), including complete protein-coding sequences, of which any of SEQ ID NO: 2838-SEQ ID NO: 5674 forms a part. Table 2, which is appended herewith and which forms part of the present specification, provides a putative identification of the particular function of a polypeptide which is encoded by each ORF, based on the homology match (determined by the BLAST algorithm) of the predicted polypeptide with known proteins encoded by ORFs in other organisms. As a result, one skilled in the art can use the polypeptides of the present invention for commercial and industrial purposes consistent with the type of putative identification of the polypeptide.
The present invention provides a library of S. epidermidis-derived polypeptide sequences, and a corresponding library of nucleic acid sequences encoding the polypeptides, wherein the polypeptides themselves, or polypeptides contained within ORFs of which they form a part, comprise sequences that are contemplated for use as components of vaccines. Non-limiting examples of such sequences are listed by SEQ ID NO in Table 2, which is appended herewith and which forms part of the present specification.
The present invention also provides a library of S. epidermidis-derived polypeptide sequences, and a corresponding library of nucleic acid sequences encoding the polypeptides, wherein the polypeptides themselves, or polypeptides contained within ORFs of which they form a part, comprise sequences lacking homology to any known prokaryotic or eukaryotic sequences. Such libraries provide probes, primers, and markers which can be used to diagnose S. epidermidis infection, including use as markers in epidemiological studies. Non-limiting examples of such sequences are listed by SEQ ID NO in Table 2, which is appended.
The present invention also provides a library of S. epidermidis-derived polypeptide sequences, and a corresponding library of nucleic acid sequences encoding the polypeptides, wherein the polypeptides themselves, or polypeptides contained within ORFs of which they form a part, comprise targets for therapeutic drugs.