VDJ recombination, also known as somatic recombination, is a mechanism of genetic recombination that occurs in the early stages of immunoglobulin (Ig) and T cell receptor (TCR) production in the immune system. In vertebrates, VDJ recombination combines Variable (V), Diversity (D) and Joining (J) gene segments in a nearly random fashion. Because of this randomness in the choice of gene segments, it can encode a diverse range of proteins to match antigens from bacteria, viruses, parasites, pollens and dysfunctional cells such as tumor cells.
VDJ recombination of the mouse immunoglobulin heavy chain locus is illustrated in FIG. 1. This is a large 3 Mb locus consisting of approximately 195 variable (V) genes, 10 diversity (D) genes and 4 joining (J) genes. These are the segments that participate in VDJ recombination. There are also 8 constant genes which, as their name suggests, do not undergo VDJ recombination. The first event in the VDJ recombination of this locus is the rearrangement of one of the D genes to one of the J genes. Following this, one of the V genes is joined to this DJ rearrangement to form the functional VDJ rearranged gene, which then codes for the variable segment of the heavy chain protein. Both of these steps are catalysed by recombinase enzymes called Rags, which excise the intervening DNA. An analogous arrangement exists in the human genome, which instead comprises 95 variable (V) genes, 20 diversity (D) genes and 6 joining (J) genes.
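As a rough illustration of the combinatorial diversity that these segment counts allow, the following sketch multiplies the numbers of V, D and J genes given above. It assumes that every segment is functional and can combine with every other segment, and it ignores any further diversity generated at the junctions; the function name is illustrative only.

```python
def vdj_combinations(n_v: int, n_d: int, n_j: int) -> int:
    """Count the V-D-J choices when one segment of each type is used.

    Simplifying assumption: all segments are functional and combine freely.
    """
    return n_v * n_d * n_j


# Segment counts as stated in the text above.
mouse_igh = vdj_combinations(n_v=195, n_d=10, n_j=4)
human_igh = vdj_combinations(n_v=95, n_d=20, n_j=6)

print(f"mouse Igh: {mouse_igh} possible V-D-J combinations")
print(f"human IgH: {human_igh} possible V-D-J combinations")
```

Under these assumptions, segment choice alone yields several thousand distinct heavy chain rearrangements at each locus.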
This recombination process takes place in a stepwise fashion in progenitor B cells to produce the diversity required for the antibody repertoire. There is, however, a further requirement: specificity, such that each B cell produces only one antibody. This specificity is fundamental to the function of the immune system and is achieved by a process called allelic exclusion, in which functional rearrangement of one allele signals, via a currently unclear mechanism, to prevent further recombination of the second allele.
The existing methodology uses PCR-based approaches to identify VDJ recombination products. These approaches use pairs of primers. One primer binds to one of the four J genes, which are common to all VDJ recombination products, or to a sequence immediately downstream of a J gene. It is used in combination with a primer or primers specific for the V gene component of the VDJ recombination product.
There are a number of weaknesses with the existing methodology. For example, there are numerous V gene families (16 in the mouse Igh), and to ensure specificity of detection, different V gene primers must be designed for each family. This introduces a bias in quantitative comparative analysis, since the amplification of each V gene family depends on the efficiency of the primer designed for that family, and differences in efficiency between primers make the comparison inaccurate.
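The effect of primer efficiency can be illustrated with the standard exponential model of PCR, in which the amount of product after n cycles is the starting amount multiplied by (1 + E)^n, where E is the per-cycle efficiency (between 0 and 1). The sketch below is a minimal illustration only: the two efficiencies, the cycle number and the family labels are hypothetical values chosen for the example, not measurements.

```python
def pcr_product(initial_copies: float, efficiency: float, cycles: int) -> float:
    """Idealised exponential PCR model: N = N0 * (1 + E) ** n."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# Hypothetical scenario: two V gene families present at equal abundance,
# amplified with primers of slightly different efficiency.
CYCLES = 30
family_a = pcr_product(initial_copies=1000, efficiency=0.95, cycles=CYCLES)
family_b = pcr_product(initial_copies=1000, efficiency=0.85, cycles=CYCLES)

apparent_ratio = family_a / family_b
print(f"apparent A:B ratio after {CYCLES} cycles: {apparent_ratio:.2f} (true ratio 1.00)")
```

In this idealised model, a difference of ten percentage points in efficiency makes two equally abundant families appear to differ several-fold after 30 cycles, which is the kind of distortion described above.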
Even within a V gene family, the members have slightly different sequences. Thus, unless a primer can be designed that matches the sequence of every V gene member 100%, comparative amplification of V genes within a family will also be biased. For larger V gene families, which account for the majority of V genes, it is virtually impossible to design a single V gene primer that detects all family members. The only way to circumvent this is to design primers to subsets of V genes within a family, but this introduces a further bias, due to the different amplification efficiencies of different PCR primers. The combination of these two problems means that current methods cannot provide an unbiased and complete analysis of the VDJ recombination products in a sample.
The current PCR-based methods also have a problem with scale. The usual step after PCR amplification is to clone and sequence the PCR products. As an example, there are almost 200 V genes in the mouse Igh. The most frequent aim is to determine how often these are used relative to each other in the immunoglobulin repertoire. To detect each different V gene once, even assuming that they were recombined with equal efficiency and detected by PCR with equal efficiency (neither of which is the case), at least 200 clones would have to be sequenced. To actually determine the relative usage of V genes in a population in which they are used at frequencies that can differ by orders of magnitude, tens of thousands of clones would have to be generated and sequenced. This is currently prohibitive in terms of both cost and labour.
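The scale of the sequencing effort can be estimated with two elementary probability calculations, sketched below. The first is the classical coupon-collector expectation for observing every one of n equally likely V genes at least once; the second is the number of clones needed to observe, with a given confidence, a V gene used at a given frequency. The frequency of 1 in 10,000 and the 95% confidence level are hypothetical values chosen purely for illustration, and both calculations assume that clones are sampled independently.

```python
import math


def expected_clones_to_see_all(n_genes: int) -> float:
    """Coupon-collector expectation, n * H_n, for n equally likely genes."""
    return n_genes * sum(1.0 / k for k in range(1, n_genes + 1))


def clones_to_detect(frequency: float, confidence: float = 0.95) -> int:
    """Smallest n for which 1 - (1 - frequency) ** n >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - frequency))


# Idealised case from the text: 200 V genes, all used and detected equally.
print(f"expected clones to see all 200 V genes: {expected_clones_to_see_all(200):.0f}")

# Hypothetical rare V gene used in 1 of every 10,000 rearrangements.
print(f"clones to detect a 1e-4 gene at 95% confidence: {clones_to_detect(1e-4)}")
```

Even in the idealised equal-usage case, the expected number of clones is several times the number of V genes, and detecting a rarely used V gene just once already requires tens of thousands of clones under these assumptions.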
Some attempts have been made to overcome the problem of scale by incorporating next-generation sequencing approaches into the methodology. Although several of these have been described recently, they all continue to use PCR primers for the V gene families as the starting point for detection of VDJ recombination products, and incorporate next-generation sequencing only subsequently, as a method of ‘cloning and sequencing’ large numbers of PCR products. Thus the inherent biases due to PCR primer efficiency remain.
There is therefore a great need to provide improved methods of identifying VDJ recombination products which overcome one or more of the aforementioned problems.