In the first few hours after entry into a host cell, retroviruses direct the reverse transcription of the RNA genome into DNA, and then the insertion of that DNA into the host genome to form the integrated provirus (Goff, 1992; Weiss et al., 1984). The integration reaction is essential for the successful expression of the viral DNA to give rise to progeny virus, and is responsible for the ability of the virus to persist in the infected cell. The reaction is a highly efficient and orderly process. Specific inverted repeat sequences at the termini of the linear viral DNA, required in cis, are joined to the host DNA. The reaction is associated with specific alterations at the junctions: a small number of base pairs, usually two, are lost from each of the termini of the unintegrated viral DNA, and a small number of base pairs initially present only once at the target site are duplicated so as to flank the integrated provirus.
A single virally encoded enzyme, integrase (IN), is required for the establishment of the integrated provirus. This enzyme is encoded by the 3' portion of the pol gene (Schwartzberg et al., 1984) and is packaged inside the virion particle in the course of virion assembly. During the early stages of infection, the protein remains associated with the viral nucleic acid in a nucleoprotein complex (Farnet and Haseltine, 1991) and performs two specific reactions: first, the 3' termini of the viral DNA are cleaved to produce recessed 3'OH ends, and second, the two newly generated 3' termini are joined to the 5' phosphates on each strand of the target sequence in a concerted strand transfer reaction (Fujiwara and Mizuuchi, 1988). At each terminus, only one strand of the viral DNA is joined to the target DNA, so that each viral 3' end is linked to a different target strand. The positions of attack by the two 3'OH ends on the two target DNA strands are staggered, such that the initial product contains gaps; host repair enzymes are thought to be responsible for removing unpaired bases, filling in gaps, and ligating the second strand. These repair steps result in the formation of the target site duplication flanking the provirus.
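The net sequence outcome of the steps described above (loss of terminal base pairs from the viral DNA, staggered joining, and gap repair) can be illustrated with a minimal sketch. It tracks only one strand of each duplex, and the function name `integrate`, the parameter names, the 4-bp stagger, and all sequences are hypothetical choices made for illustration; they are not taken from the cited work.

```python
def integrate(viral_dna: str, target: str, position: int,
              stagger: int = 4, trimmed: int = 2) -> str:
    """Return the top strand of the target after integration and gap repair.

    Simplified model (assumptions, not a description of any real enzyme assay):
    - `trimmed` base pairs are lost from each terminus of the viral DNA;
    - the two viral 3' ends attack the target strands `stagger` bp apart,
      starting at `position`;
    - gap repair copies the `stagger` bp between the attack sites, so that
      stretch ends up on both sides of the provirus.
    """
    if stagger < 0 or trimmed < 0:
        raise ValueError("stagger and trimmed must be non-negative")
    if len(viral_dna) <= 2 * trimmed:
        raise ValueError("viral DNA is too short to trim both termini")
    if not 0 <= position <= len(target) - stagger:
        raise ValueError("integration site does not fit inside the target")

    # Base pairs lost from each end of the unintegrated viral DNA.
    provirus = viral_dna[trimmed:len(viral_dna) - trimmed]
    # Target bases lying between the staggered attack positions.
    duplication = target[position:position + stagger]

    return (target[:position] + duplication + provirus
            + duplication + target[position + stagger:])


# Made-up sequences: a short "viral" DNA with inverted-repeat ends,
# and a target whose central ACGT marks the integration site.
viral = "AATGAAAGCCCCCTTTCATT"
target = "GGGGGACGTCCCCC"
product = integrate(viral, target, position=5, stagger=4)
print(product)
```

In this toy model the product is longer than the target by the length of the trimmed viral DNA plus the length of the duplicated target stretch, which mirrors the two junction alterations described in the text.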
It is possible that some host proteins are directly involved in promoting the integration reactions occurring after viral infection. Although recombinant integrase preparations can carry out all the steps known to be required for processing and joining the viral DNA (Bushman and Craigie, 1991; Bushman et al., 1990; Craigie et al., 1990; Katz et al., 1990), some aspects of the reaction are not fully recapitulated in vitro. For example, the isolated proteins show only very low specific activity for both cutting and joining of DNA (Bushman et al., 1990; Craigie et al., 1990). Furthermore, joining reactions carried out with oligonucleotide substrates for some viruses result in the transfer of only one 3'OH to the target DNA, yielding a Y structure, rather than the concerted transfer of two 3'OH termini to the target (Bushman et al., 1990). These inadequacies of the in vitro systems may reflect problems with proper oligomerization of the IN protein, or with the absence of stimulatory cofactors. For some viruses, host proteins might be responsible for stimulation of the overall reaction in vivo, and, especially, for the concerted integration of the two termini at a single locus.
Integration of retroviral DNA occurs on many chromosomes and with no apparent local sequence specificity (Dhar et al., 1980; Hughes et al., 1978; Shimotohno and Temin, 1980; Shoemaker et al., 1981). Several studies, however, suggest that there may be preferred sites for integration. Proviral DNAs established by infection, rather than by transfection with cloned DNAs, seem to be more highly and consistently transcribed, implying that integration sites are selected from transcriptionally active areas of the genome (Hwang and Gilboa, 1984). A significant bias for insertion into open chromatin has been detected, with high-frequency insertion near DNase hypersensitive sites (Rohdewohld et al., 1987; Vijaya et al., 1986) and into transcriptionally active regions (Scherdin et al., 1990). In addition, there may be a small number of "hot spots", or preferred sites, which are frequently targeted (Shih et al., 1988). Measurements of the frequency of insertional inactivation of particular genes have given fewer events than predicted, suggesting that there may be "cold spots" as well (King et al., 1985; Varmus et al., 1981). In vitro studies of integration into SV40 minichromosomes showed that the origin region and linker regions between the nucleosomes tended to exclude insertions, while nucleosomal regions were efficiently targeted; phasing of the insertions in the chromatin could be observed, with a 10-bp periodicity (Pryciak et al., 1991). These results suggest that the presence of DNA binding proteins and histones on DNA can significantly perturb the target choice.
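The comparison of observed insertional inactivation events with a "predicted" number presupposes some null model of target choice. A minimal sketch of one such model, assuming uniformly random integration over the genome, is given below. The function names and all numerical values (genome size, target size, number of integration events, observed count) are hypothetical and chosen only to make the arithmetic concrete; they are not data from the cited studies.

```python
def expected_hits(n_integrations: int, target_bp: int, genome_bp: int) -> float:
    """Expected number of insertions in a target region, assuming every
    base pair of the genome is equally likely to receive an insertion."""
    if genome_bp <= 0 or not 0 <= target_bp <= genome_bp:
        raise ValueError("need 0 <= target_bp <= genome_bp and genome_bp > 0")
    if n_integrations < 0:
        raise ValueError("n_integrations must be non-negative")
    return n_integrations * target_bp / genome_bp


def observed_over_expected(observed: int, n_integrations: int,
                           target_bp: int, genome_bp: int) -> float:
    """Ratio of observed to expected hits; a value below 1 would be
    consistent with a 'cold spot', a value above 1 with a 'hot spot'."""
    expected = expected_hits(n_integrations, target_bp, genome_bp)
    if expected == 0:
        raise ValueError("expected number of hits is zero")
    return observed / expected


# Hypothetical numbers for illustration only.
GENOME_BP = 3_000_000_000   # assumed genome size
TARGET_BP = 30_000          # assumed size of the gene being scored
N_EVENTS = 1_000_000        # assumed number of independent integrations

expected = expected_hits(N_EVENTS, TARGET_BP, GENOME_BP)
ratio = observed_over_expected(2, N_EVENTS, TARGET_BP, GENOME_BP)
print(expected, ratio)
```

Under this uniform model the expectation depends only on the fraction of the genome occupied by the target, so any consistent departure from it points to an influence other than target size on site selection.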
Many of the features of retroviral integration are similar to those associated with transposition of eukaryotic and prokaryotic mobile elements. Analogous studies in various retrotransposon systems also suggest that target sites for integration are non-random. The Ty elements in yeast have been shown to exhibit significant target site biases; Ty1 insertions tend to cluster near the 5' end of some target genes (Natsoulis et al., 1989) and within 400 bp of tRNA genes (Ji et al., 1993), and Ty3 insertions are highly restricted to specific positions relative to polymerase III promoters (Chalker and Sandmeyer, 1990; Chalker and Sandmeyer, 1992). In these cases the integration events are not thought to be affected by the sequence itself or by transcriptional activity, but rather are more likely to be profoundly restricted by host chromosomal proteins; potential candidates for these targeting proteins are the TFIIIB or TFIIIC transcription factors bound to the promoter (Sandmeyer et al., 1990).
The identification of host proteins that might target proviral integration, stimulate integration activity, or affect the incoming retroviral DNA in other ways would provide an important lead into new areas of research. In an attempt to find such proteins, the yeast two-hybrid system (Fields et al., U.S. Pat. No. 5,283,173) has been used to screen a cDNA library for proteins that interact with HIV-1 IN. The search resulted in the recovery of a single novel gene, termed ini-1 for integrase interactor 1. The predicted amino acid sequence of the Ini-1 protein shows an unexpected sequence similarity to SNF5, a yeast transcriptional activator required for the high-level expression of many genes (Laurent et al., 1990). The product of the ini-1 gene may serve as an internal receptor for HIV-1 IN, and may be responsible for targeting integration to active regions of the chromosome.