Molecular interactions are critical for maintaining cellular functions. These interactions can be classified into 3 broad categories. The first is protein complex formation via covalent bonds such as the heavy chain and light chain of an immunoglobulin. The second is protein-protein association via non-covalent bonds exemplified by heterodimer formation of HER-2 and EGFR (epidermal growth factor receptor). The third is an association between 2 molecules involved in a cellular pathway such as a cytokine receptor and a caspase in an apoptosis pathway.
Antigen binding proteins involved in the immune response are present in mammals as large polyclonal repertoires representing a broad diversity of binding specificities. This diversity is generated by rearrangement of gene sequences encoding variable regions of these binding proteins. Such variable region binding proteins include soluble and membrane-bound forms of the B cell receptor (also known as immunoglobulins or antibodies) and the membrane-bound T cell receptors (TCR). With respect to immunoglobulins, their affinity is enhanced subsequent to recognition of an antigen by a B cell antigen receptor, through a process termed affinity maturation which involves cycles of somatic hypermutation of these variable genes.
Known approaches for isolating antibodies with a desired binding specificity most often involve either the generation of hybridoma cells from immunized hosts followed by screening for specific clones, or the generation of combinatorial expression libraries in E. coli composed of immunoglobulin variable domains, which are subsequently screened using techniques such as phage display.
There are several limitations in the use of the hybridoma technology. The generation time for a specific hybridoma can be long (5-15 months). Functional screens are possible only after clone selection and culture. Furthermore, if a hybridoma is desired for producing human antibodies, typically for therapeutic purposes, then alternative strategies must be sought because of the absence of human myeloma lines suitable as fusion partners for human B lymphocytes. Heterohybridomas, i.e., fusion of human B cells with mouse myeloma cells, were attempted but they are extremely unstable and thus rarely lead to suitable cell lines for production purposes. Human B cells immortalized through infection with Epstein-Barr virus exhibit similar instability.
Use of combinatorial libraries and phage display allows for generation of large repertoires of antibody clones with a potential diversity in excess of 10^10. From this repertoire, selection for binding to a specific target can be performed thereby generating a sub-library. This sub-library can be used to generate either polyclonal or monoclonal antibodies. The variable region encoding sequences (for example heavy chain variable region and light chain variable region encoding sequences) which are to constitute the library can be amplified from lymphocytes, plasma cells, hybridomas or any other immunoglobulin expressing population of cells. Current technologies for generating combinatorial libraries involve separate isolation of the variable region encoding sequences. Thus, the original pairing of, for example, immunoglobulin heavy chain variable region and light chain variable region encoding sequences is lost. Said sequences are randomly paired and the original combinations of these variable sequences only occur by chance.
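The effect of random pairing on the recovery of original combinations can be illustrated with a short calculation. The sketch below is a simplified model, not a description of any particular library: it assumes that each of n cells contributes exactly one distinct heavy chain and one distinct light chain variable region encoding sequence, and that the two are paired uniformly at random. The function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def cognate_fraction(n_cells: int) -> Fraction:
    """Fraction of all possible VH/VL combinations that are original pairs.

    Assumption (illustrative only): every cell contributes one distinct VH
    and one distinct VL sequence, and pairing is uniformly random.
    """
    if n_cells < 1:
        raise ValueError("n_cells must be at least 1")
    total_combinations = n_cells * n_cells  # every VH with every VL
    original_pairs = n_cells                # one original pair per cell
    return Fraction(original_pairs, total_combinations)


def expected_cognate_clones(n_cells: int, library_size: int) -> Fraction:
    """Expected number of original pairs among randomly paired clones."""
    return library_size * cognate_fraction(n_cells)


if __name__ == "__main__":
    # With 1000 source cells, one combination in 1000 is an original pair.
    print(cognate_fraction(1000))
    print(expected_cognate_clones(1000, 10**6))
```

Under these assumptions the fraction of original pairs falls as 1/n, which is why the original combinations are expected to occur only by chance in a randomly paired library derived from many cells.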
Thus, in order to isolate variable region encoding sequences responsible for a desired binding specificity, a considerable amount of screening is necessary. This is typically performed in combination with methods for enrichment of clones exhibiting a desired specificity, such as ribosome display or phage display. Even then, the diversity achieved might not be sufficiently large to isolate variable region encoding sequence pairs giving rise to binding proteins of similar high affinity as those found in the original cells. Further, the enrichment procedures normally used to screen combinatorial libraries introduce a strong bias, e.g. toward polypeptides with particularly low toxicity in E. coli, efficient folding, slow off-rates, or other system-dependent parameters, thus reducing the diversity of the library even further. In addition, clones derived from such combinatorial libraries will be more prone to producing binding proteins with cross-reactivity against self-antigens because they, in contrast to original pairs (hereafter called cognate pairs), have never been subjected to in vivo negative selection against self-antigens as pairs, such as is the case for B- and T-lymphocyte receptors during particular stages of their development. Therefore, the identification of cognate pairs of variable region encoding sequences is a desirable approach. Moreover, the frequency of clones exhibiting a desired binding specificity is expected to be considerably higher within a library of cognate pairs than in a conventional combinatorial library, particularly if the starting cells are derived from a donor with a high frequency of cells encoding specific binding pairs, e.g., immune or immunized donors.
In order to generate cognate pair libraries, linkage of the variable region encoding sequences derived from the same cell is required. At present, 3 different approaches that achieve cognate pairing of variable region encoding sequences have been described.
In-cell PCR is an approach where a population of cells is fixed and permeabilized, followed by in-cell linkage of heavy chain variable region and light chain variable region encoding sequences from immunoglobulins. This linkage can be performed either by overlap extension RT-PCR or by recombination. The amplification process, as described in these publications, is a three to four step process consisting of i) reverse transcription utilizing constant region primers generating immunoglobulin cDNA, ii) PCR amplification of the heavy and light chain variable region encoding sequences utilizing primer sets containing either overlap-extension design or recombination sites, iii) optional linkage by recombination, and iv) nested PCR of the products generating restriction sites for cloning. Since the cells are permeabilized there is a considerable risk that amplification products might leak out of the cells, thereby introducing scrambling of the heavy chain variable region and light chain variable region encoding sequences, resulting in the loss of cognate pairing. Therefore, the procedure includes washing steps after each reaction which makes the process laborious and reduces the efficiency of the reactions.
More generally, the in-cell PCR is extremely inefficient, resulting in low sensitivity. Accordingly, the in-cell PCR linkage technique has never found widespread usage, and the original studies have never been reliably repeated in a way that can verify that the linkage actually occurs within the cell. This, however, is absolutely crucial to avoid scrambling the heavy chain variable region and light chain variable region encoding sequences and thereby disrupting the cognate pairs.
Single-cell PCR is a different approach to achieve cognate pairing of heavy chain variable region and light chain variable region encoding sequences. In these publications, a population of immunoglobulin expressing cells is distributed by diluting to a density of one cell per reaction, thereby eliminating scrambling of heavy chain variable region and light chain variable region encoding sequences during the cloning process. The process described is a three to four step procedure consisting of i) reverse transcription utilizing oligo-dT-, random hexamer- or constant region primers generating cDNA, ii) fractionating the cDNA product into several tubes and performing PCR amplification on the individual variable chain encoding sequences (in separate tubes) with primer sets containing restriction sites for cloning, iii) nested PCR of the products generating restriction sites for cloning (optional) and iv) linking the heavy chain variable region and light chain variable region encoding sequences from the separate tubes by cloning them into an appropriate vector, which is itself a multi-step process.
In humans, there are two types of light chains: lambda (λ) and kappa (κ). This means that with the cDNA generated from every single cell, at least three separate PCR reactions must be performed, followed by analysis and cloning of the appropriate bands into a single vector to achieve cognate pairing. Thus, the single-cell PCR approach as described requires a large number of reactions to generate a library of cognate pairs. Although a cognate pair library does not need to be as large as a combinatorial library in order to obtain binding proteins representing a broad diversity of binding specificities, it would still be a laborious task to generate a library of, for example, 10^4 to 10^5 clones by the described single-cell PCR approach. Further, the large number of steps greatly increases the risk of contamination and human error.
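The scale of this workload can be estimated with simple arithmetic. The following sketch is illustrative only: it takes the minimum of three PCR reactions per cell stated above (one for the heavy chain and one each for the kappa and lambda light chains) and assumes, optimistically, that every processed cell yields one clone. The function name and its parameters are hypothetical.

```python
def min_pcr_reactions(n_clones: int, reactions_per_cell: int = 3) -> int:
    """Lower bound on the PCR reactions needed for a cognate pair library.

    Assumptions (illustrative only): each clone derives from one cell, every
    cell succeeds, and each cell needs at least `reactions_per_cell` separate
    reactions (heavy chain, kappa light chain, lambda light chain). Reverse
    transcription, nested PCR and cloning steps are not counted.
    """
    if n_clones < 0 or reactions_per_cell < 1:
        raise ValueError("invalid input")
    return n_clones * reactions_per_cell


if __name__ == "__main__":
    for n_clones in (10**4, 10**5):
        print(n_clones, "clones need at least",
              min_pcr_reactions(n_clones), "PCR reactions")
```

Because failed reactions and the optional nested PCR step are ignored, the figures produced by this sketch are a lower bound, and the practical number of manipulations would be higher.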
Symphogen's (Symphogen A/S, Lyngby, Denmark) approach is to use multiplex overlap-extension RT-PCR (reverse transcription-polymerase chain reaction) to identify the natural pairing of antibody heavy and light chains by isolating single B cells in the wells of a 96-well microtiter plate. However, the process is labor-intensive, and verification of the structure of individual antibodies is inefficient since each pair needs to be tested separately by expressing the corresponding single chain Fv in E. coli.
PCR primer based DNA barcoding has been described in the literature. There have been studies utilizing oligonucleotide tags to label DNA molecules during sample preparation such that after sequencing by a massively parallel technology, one is able to digitally sort DNA sequences originating from different samples or a positive control. However, each tagging PCR primer must be individually synthesized. As such, when a large number of samples are processed, this practice will become prohibitively expensive. Consequently, the technique is only suitable for a limited number of samples.
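The digital sorting step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any cited study: it assumes the tag occupies a fixed number of bases at the start of each read and must match exactly, and the barcode sequences, sample names and function name are all hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_len):
    """Sort sequencing reads into per-sample bins by their leading tag.

    Assumptions (illustrative only): the tag is the first `barcode_len`
    bases of each read and is matched exactly, with no error correction.
    Reads whose tag is not in the table are kept whole under "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_len], read[barcode_len:]
        sample = barcode_to_sample.get(tag)
        if sample is None:
            bins["unassigned"].append(read)
        else:
            bins[sample].append(insert)  # store the read without its tag
    return dict(bins)


if __name__ == "__main__":
    # Hypothetical 4-base tags; each entry stands for one tagging primer.
    barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
    reads = ["ACGTTTGACC", "TGCAGGGAAA", "NNNNCCCC"]
    print(demultiplex(reads, barcodes, barcode_len=4))
```

Each entry in the barcode table corresponds to a tagging primer that has to be synthesized individually, so the number of primers grows in step with the number of samples, which is the cost limitation noted above.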