The approximately 9.6 kb single-stranded RNA genome of HCV comprises 5′- and 3′-non-coding regions (NCRs) and, between these NCRs, a single long open reading frame of about 9 kb encoding an HCV polyprotein of about 3000 amino acids.
HCV polypeptides are produced by translation of the open reading frame and cotranslational proteolytic processing. Structural proteins are derived from the amino-terminal one-fourth of the coding region and include the capsid or Core protein (about 21 kDa), the E1 envelope glycoprotein (about 35 kDa), the E2 envelope glycoprotein (about 70 kDa, previously called NS1), and p7 (about 7 kDa). The E2 protein can occur with or without a C-terminal fusion of the p7 protein (Shimotohno et al. 1995). Recently, an alternative open reading frame was found in the Core region that encodes and expresses a protein of about 17 kDa called the F (Frameshift) protein (Xu et al. 2001; Ou & Xu in U.S. Patent Application Publication No. US2002/0076415). In the same region, ORFs for other 14-17 kDa ARFPs (Alternative Reading Frame Proteins), A1 to A4, were discovered, and antibodies to at least A1, A2 and A3 were detected in sera of chronically infected patients (Walewski et al. 2001). The non-structural HCV proteins are derived from the remainder of the HCV coding region and include NS2 (about 23 kDa), NS3 (about 70 kDa), NS4A (about 8 kDa), NS4B (about 27 kDa), NS5A (about 58 kDa) and NS5B (about 68 kDa) (Grakoui et al. 1993).
The sequences of cDNA clones covering the complete genome of several prototype isolates have been determined and include complete prototype genomes of the HCV genotypes 1a (e.g., GenBank accession number AF009606), 1b (e.g., GenBank accession number AB016785), 1c (e.g., GenBank accession number D14853), 2a (e.g., GenBank accession number AB047639), 2b (e.g., GenBank accession number AB030907), 2c (e.g., GenBank accession number D50409), 2k (e.g., GenBank accession number AB031663), 3a (e.g., GenBank accession number AF046866), 3b (e.g., GenBank accession number D49374), 4a (e.g., GenBank accession number Y11604), 5a (e.g., GenBank accession number AF064490), 6a (e.g., GenBank accession number Y12083), 6b (e.g., GenBank accession number D84262), 7b (e.g., GenBank accession number D84263), 8b (e.g., GenBank accession number D84264), 9a (e.g., GenBank accession number D84265), 10a (e.g., GenBank accession number D63821) and 11a (e.g., GenBank accession number D63822).
At present, 11 genotypes of HCV are known, which can be classified into 6 clades (or major lineages). Thus, HCV genotypes 1, 2, 4 and 5 are identified as clades 1, 2, 4 and 5, respectively; HCV genotypes 3 and 10 belong to clade 3; and HCV genotypes 6, 7, 8, 9 and 11 are members of clade 6 (Robertson et al. 1998). The current classification system is based on a threefold hierarchy. Based on the percentage of mutual homology between sequences, it distinguishes between:
HCV isolates belonging to different types;
HCV isolates belonging to the same type but to a different subtype; and
HCV isolates belonging to the same subtype (Maertens and Stuyver 1997).
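The genotype-to-clade assignments given above can be written out as a small lookup table. The following Python sketch is illustrative only: the names `GENOTYPE_TO_CLADE`, `parse_designation`, `clade_of` and `genotypes_in_clade` are hypothetical, and the assumption that a designation such as "1b" is a genotype number followed by an optional lowercase subtype letter is inferred from the designations used in this text. The sketch does not implement the homology-based classification itself, for which no thresholds are given here.

```python
import re

# Genotype -> clade assignments exactly as stated in the text
# (Robertson et al. 1998): 11 genotypes grouped into 6 clades.
GENOTYPE_TO_CLADE = {
    1: 1, 2: 2, 4: 4, 5: 5,          # genotypes 1, 2, 4, 5 are clades 1, 2, 4, 5
    3: 3, 10: 3,                     # genotypes 3 and 10 belong to clade 3
    6: 6, 7: 6, 8: 6, 9: 6, 11: 6,   # genotypes 6, 7, 8, 9 and 11 belong to clade 6
}

# Assumed designation format: genotype number plus optional subtype letter(s),
# e.g. "1b", "10a", "6".
_DESIGNATION = re.compile(r"(\d+)([a-z]*)")


def parse_designation(designation: str) -> tuple[int, str]:
    """Split a designation such as '1b' into (genotype, subtype)."""
    match = _DESIGNATION.fullmatch(designation.strip())
    if match is None:
        raise ValueError(f"unrecognized designation: {designation!r}")
    return int(match.group(1)), match.group(2)


def clade_of(designation: str) -> int:
    """Return the clade for a genotype or subtype designation."""
    genotype, _subtype = parse_designation(designation)
    if genotype not in GENOTYPE_TO_CLADE:
        raise KeyError(f"genotype {genotype} is not in the table")
    return GENOTYPE_TO_CLADE[genotype]


def genotypes_in_clade(clade: int) -> list[int]:
    """Return the genotypes assigned to a clade, in ascending order."""
    return sorted(g for g, c in GENOTYPE_TO_CLADE.items() if c == clade)
```

For example, `clade_of("10a")` looks up genotype 10 and `genotypes_in_clade(6)` lists the genotypes grouped under clade 6. Two designations with the same genotype number but different subtype letters correspond to the second level of the hierarchy (same type, different subtype).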
Nucleic acid and amino acid sequences of HCV genotypes 1 to 11 have been disclosed not only in public databases but also in, e.g., International Patent Publications WO94/12670, WO94/25601, and WO96/13590. A further new HCV genotype different from genotypes 6-9 and 11 but belonging to clade 6 has recently been disclosed in International Patent Publication WO 03/020970. Further new subtypes of HCV genotype 2 (clade 2) and a new subtype of HCV genotype 1 (clade 1) have been described by Candotti et al. (2003).