Thus, in a previous patent application (PCT/FR98/01442; which corresponds to U.S. patent application Ser. No. 09/446,024), the applicant screened a cDNA library using a Ppol-MSRV probe (SEQ ID NO. 18) and detected overlapping clones which allowed it to reconstruct a putative genomic RNA of 7582 nucleotides. This genomic RNA has an R-U5-gag-pol-env-U3-R structure. A “blastn” interrogation over several databases using the reconstructed genome made it possible to show that there is a considerable amount of related genomic (DNA) sequences in the human genome, which are found on several chromosomes. Thus, the applicant demonstrated the existence of partial structures of the retroviral type in the human genome and envisaged their potential role in the development of autoimmune diseases, in unsuccessful pregnancy or pathological conditions of pregnancy.
Autoimmune diseases which may be mentioned by way of example are multiple sclerosis, rheumatoid arthritis, lupus erythematosus disseminatus, insulin-dependent diabetes and/or pathologies which are associated with them.
The isolation and sequencing of overlapping cDNA fragments and the identification of genomic (DNA) clones corresponding to the isolated DNA clones, described in the applicant's above-mentioned PCT and corresponding U.S. patent applications, are incorporated herein by way of reference.
Isolation and sequencing of overlapping cDNA fragments:
The information regarding the organization of the novel family of endogenous retroviruses named, by the applicant, HERV-W was obtained by testing a placenta cDNA library (Clontech cat#HL5014a) with the Ppol-MSRV (SEQ ID NO. 18) and Penv-C15 (SEQ ID NO. 19) probes and then carrying out a “gene walking” technique using the novel sequences obtained. The experiments were carried out with reference to the recommendations of the supplier of the library. PCR amplifications on DNA were also used in order to understand this organization.
The following clones were selected and sequenced:
    Clone cl.6A2 (SEQ ID NO. 20): 5′ untranslated region of HERV-W and a portion of gag.
    Clone cl.6A1 (SEQ ID NO. 21): gag and a portion of pol.
    Clone cl.7A16 (SEQ ID NO. 22): 3′ region of pol.
    Clone cl.Pi22 (SEQ ID NO. 23): 3′ region of pol and start of env.
    Clone cl.24.4 (SEQ ID NO. 24): spliced RNA comprising a portion of the 5′ untranslated region of HERV-W, the end of pol and the 5′ region of env.
    Clone cl.C4C5 (SEQ ID NO. 25): end of env and 3′ untranslated region of HERV-W.
    Clone cl.PH74 (SEQ ID NO. 26): subgenomic RNA: 5′ untranslated region of HERV-W, end of pol, env, and 3′ untranslated region of HERV-W.
    Clone cl.PH7 (SEQ ID NO. 27): multispliced RNA: 5′ untranslated region of HERV-W, end of env and 3′ untranslated region of HERV-W.
    Clone cl.Pi5T (SEQ ID NO. 28): partial pol gene and U3-R region.
    Clone cl.44.4 (SEQ ID NO. 29): R-U5 region, gag gene and partial pol gene.
A total sequence model for HERV-W was produced with the aid of these clones, by carrying out sequence alignments. These alignments revealed the spliced RNAs as well as the potential splice donor and acceptor sites. The LTR, gag, pol and env entities were defined by studying similarity with existing retroviruses.
The putative genetic organization of HERV-W in the RNA form is as follows (SEQ ID NO. 30):    gene 1..7582.
Location of the clones on the reconstructed genomic RNA sequence:
    cl.6A2 (1321 bp) 1-1325;
    cl.PH74 (535+2229=2764 bp) 72-606 and 5353-7582;
    cl.24.4 (491+1457=1948 bp) 115-606 and 5353-6810;
    cl.44.4 (2372 bp) 115-2496;
    cl.PH7 (369+297=666 bp) 237-606 and 7017-7313;
    cl.6A1 (2938 bp) 586-3559;
    cl.Pi5T (2785+566=3351 bp) 2747-5557 and 7017-7582;
    cl.7A16 (1422 bp) 2908-4337;
    cl.Pi22 (317+1689=2006 bp) 3957-4273 and 4476-6168;
    cl.C4C5 (1116 bp) 6467-7582.
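By way of illustration only, the clone coordinates listed above can be checked for full coverage of the 7582-nucleotide reconstructed sequence. The following is a minimal sketch, not part of the original procedure; the names CLONE_SEGMENTS and merge_intervals are hypothetical, and the positions are assumed to be 1-based and inclusive.

```python
# Illustrative sketch: do the listed clone segments tile the whole
# reconstructed genomic RNA (positions 1..7582)?
# Assumption: coordinates are 1-based and inclusive, as written in the listing.

GENOME_LENGTH = 7582

# (start, end) segments per clone, copied from the listing above.
CLONE_SEGMENTS = {
    "cl.6A2": [(1, 1325)],
    "cl.PH74": [(72, 606), (5353, 7582)],
    "cl.24.4": [(115, 606), (5353, 6810)],
    "cl.44.4": [(115, 2496)],
    "cl.PH7": [(237, 606), (7017, 7313)],
    "cl.6A1": [(586, 3559)],
    "cl.Pi5T": [(2747, 5557), (7017, 7582)],
    "cl.7A16": [(2908, 4337)],
    "cl.Pi22": [(3957, 4273), (4476, 6168)],
    "cl.C4C5": [(6467, 7582)],
}


def merge_intervals(intervals):
    """Merge overlapping or directly adjacent inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Extends (or lies within) the previous merged interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


all_segments = [seg for segs in CLONE_SEGMENTS.values() for seg in segs]
covered = merge_intervals(all_segments)
print(covered)  # [(1, 7582)]
```

Under these assumptions the segments merge into a single interval spanning positions 1 to 7582, which is consistent with the reconstruction of the full sequence from overlapping clones.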
5′LTR   1..120        /note=“R of 5′LTR (5′ end uncertain)”
        121..575      /note=“U5 of 5′LTR”
misc.   579..596      /note=“PBS, primer binding site, for tRNA-W”
misc.   606           /note=“splice junction (splice donor site ATCCAAAGTG-GTGAGTAATA (SEQ ID NO: 32) and splice acceptor site CTTTTTTCAG-ATGGGAAACG (SEQ ID NO: 33), clone RG083M05, GenBank accession AC000064)”
misc.   5353          /note=“splice acceptor site for ORF1 (env)”
misc.   5560          /note=“splice donor site”
ORF     5581..7194    /note=“ORF1 env 538 AA” /product=“envelope”
misc.   7017          /note=“splice acceptor site for ORF2 and ORF3”
ORF     7039..7194    /note=“ORF2 52 AA”
ORF     7112..7255    /note=“ORF3 48 AA”
misc.   7244..7254    /note=“PPT, polypurine tract”
3′LTR   7256..7582    /note=“U3-R of 3′ LTR (U3-R junction undetermined)”
misc.   7563..7569    /note=“polyadenylation signal”
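As an illustrative consistency check on the feature coordinates above, the length of each annotated ORF in nucleotides can be divided by three and compared with the stated number of amino acids (AA). This is a sketch only; the names ORFS and orf_codons are hypothetical, and the coordinates are assumed to be 1-based and inclusive.

```python
# Illustrative sketch: compare ORF coordinate spans with the stated AA counts.
# Assumption: coordinates are 1-based and inclusive.

# name -> (start, end, stated number of amino acids), from the feature listing.
ORFS = {
    "ORF1 (env)": (5581, 7194, 538),
    "ORF2": (7039, 7194, 52),
    "ORF3": (7112, 7255, 48),
}


def orf_codons(start, end):
    """Return the number of codons spanned by an inclusive coordinate range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"range {start}..{end} is not a multiple of 3")
    return length // 3


for name, (start, end, stated_aa) in ORFS.items():
    codons = orf_codons(start, end)
    status = "matches" if codons == stated_aa else "differs from"
    print(f"{name}: {end - start + 1} nt = {codons} codons, {status} stated {stated_aa} AA")
```

For the three ORFs listed, the coordinate spans (1614, 156 and 144 nucleotides) correspond to 538, 52 and 48 codons respectively, in agreement with the stated sizes.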
Identification of genomic (DNA) clones corresponding to the isolated DNA clones:
A “blastn” interrogation over several databases, using the reconstructed genome, showed that there is a considerable amount of related sequences in the human genome. Approximately 400 sequences were identified in GenBank and more than 200 sequences in the EST bank, most of them in the antisense orientation. The 4 most significant sequences in terms of size and similarity are the sequences of the following genomic (DNA) clones:
    Human clone RG083M05 (gb AC000064), the chromosomal location of which is 7q21-7q22,
    Human clone BAC378 (gb U85196, gb AE000660) corresponding to the alpha/delta locus of the T-cell receptor, located at 14q11-12,
    Human cosmid Q11M15 (gb AF045450) corresponding to region 21q22.3 of chromosome 21,
    Cosmid U134E6 (embl Z83850) on chromosome Xq22.
The location of the aligned regions is indicated for each of the clones, with the chromosome to which each belongs given between square brackets (FIG. 3 of the above-mentioned PCT and corresponding U.S. application, which corresponds to FIG. 1 herein). The percentage similarity (without the large deletions) between the 4 sequences and the reconstructed genomic RNA is indicated, as well as the presence of repeat sequences at each end of the genome and the size of the longest open reading frames (ORFs). Repeat sequences were found at the ends of 3 of these clones. The reconstructed sequence is entirely contained within clone RG083M05 (9.6 Kb) and exhibits 96% similarity. However, clone RG083M05 has a 2 Kb insertion located immediately downstream of the 5′ untranslated region (5′ UTR). This insertion is also found in two other genomic clones which have a 2.3 Kb deletion immediately upstream of the 3′ untranslated region (3′ UTR). No clone contained the three functional gag, pol and env open reading frames (ORFs). Clone RG083M05 shows a 538 amino acid (AA) ORF corresponding to a whole envelope. Cosmid Q11M15 contains two major contiguous ORFs of 413 AA (frame 0) and 305 AA (frame +1) corresponding to a truncated pol polyprotein.