Immunoglobulins (Igs) expressed by B-cells, also referred to herein as B-cell receptors (BCRs), are proteins consisting of four polypeptide chains, two heavy chains (H chains) from the IGH locus and two light chains (L chains) from either the IGK (kappa) or the IGL (lambda) locus, forming an H2L2 structure. Both H and L chains contain complementarity determining regions (CDRs) involved in antigen recognition, and a constant domain. The H chains of Igs are initially expressed as membrane-bound isoforms using either the IgM or IgD constant region isotype, but after antigen recognition the H chain constant region can class switch to several additional isotypes, including IgG, IgE and IgA. The diversity of naïve Igs within an individual is mainly determined by the hypervariable CDRs. The CDR3 domain of IGH chains is created by the combinatorial joining of the VH, DH, and JH gene segments. Hypervariable domain sequence diversity is further increased by independent addition and deletion of nucleotides at the VH-DH, DH-JH, and VH-JH junctions during the process of IG gene rearrangement. Ig sequence diversity is further augmented by somatic hypermutation (SHM) throughout the rearranged IG gene after a naïve B-cell initially recognizes an antigen. The process of SHM is not restricted to CDR3, and therefore can introduce changes to the germline sequence in framework regions, CDR1 and CDR2, as well as in the somatically rearranged CDR3.
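By way of non-limiting illustration, the combinatorial joining and junctional addition and deletion described above can be sketched in a few lines of Python. This is a toy model only: the segment names and sequences are invented for illustration and are not germline sequences, the deletion and insertion limits are arbitrary assumptions, and the helper names (`trim`, `random_insert`, `rearrange`) are hypothetical.

```python
import random

# Hypothetical toy segments, invented for illustration only.
# Real germline V, D and J segments are longer and far more numerous.
V_SEGMENTS = {"V_a": "ACGTACGTTGCA", "V_b": "ACGTTCGATGCA"}
D_SEGMENTS = {"D_a": "GGTACCGGTA", "D_b": "CCATGGCCAT"}
J_SEGMENTS = {"J_a": "TTGACTACTGG", "J_b": "TTCGACCCTGG"}


def trim(seq, max_del, rng, from_end):
    """Delete between 0 and max_del nucleotides from one end of seq."""
    n = rng.randint(0, min(max_del, len(seq)))
    if n == 0:
        return seq
    return seq[:-n] if from_end else seq[n:]


def random_insert(max_ins, rng):
    """Return between 0 and max_ins random nucleotides."""
    length = rng.randint(0, max_ins)
    return "".join(rng.choice("ACGT") for _ in range(length))


def rearrange(rng, max_del=3, max_ins=4):
    """Join one V, one D and one J segment with junctional changes."""
    v_name = rng.choice(sorted(V_SEGMENTS))
    d_name = rng.choice(sorted(D_SEGMENTS))
    j_name = rng.choice(sorted(J_SEGMENTS))

    # Deletion at the 3' end of V, both ends of D, and the 5' end of J.
    v = trim(V_SEGMENTS[v_name], max_del, rng, from_end=True)
    d = trim(D_SEGMENTS[d_name], max_del, rng, from_end=False)
    d = trim(d, max_del, rng, from_end=True)
    j = trim(J_SEGMENTS[j_name], max_del, rng, from_end=False)

    # Independent random additions at the V-D and D-J junctions.
    vd_addition = random_insert(max_ins, rng)
    dj_addition = random_insert(max_ins, rng)

    return (v_name, d_name, j_name), v + vd_addition + d + dj_addition + j


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(rearrange(rng))
```

Even with only two toy segments of each type (eight V-D-J combinations), repeated calls to `rearrange` yield many more than eight distinct sequences, because the junctional changes are applied independently of segment choice. The sketch does not model somatic hypermutation.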
As the adaptive immune system functions in part by clonal expansion of cells expressing unique BCRs, accurately measuring the changes in total abundance of each clone is important to understanding the dynamics of an adaptive immune response. Utilizing advances in high-throughput sequencing, a new field of molecular immunology has recently emerged to profile the vast BCR repertoires. Compositions and methods for the sequencing of rearranged adaptive immune receptor gene sequences and for adaptive immune receptor clonotype determination are described, for example, in Robins et al., 2009 Blood 114:4099; Robins et al., 2010 Sci. Translat. Med. 2:47ra64; Robins et al., 2011 J. Immunol. Meth. doi:10.1016/j.jim.2011.09.001; Sherwood et al., 2011 Sci. Translat. Med. 3:90ra61; U.S. Patent Application Nos. 61/550,311 and 61/569,118; US Patent Application Publication Nos. US 2012-0058902 and US 2010-0330571; and International PCT Publication Nos. WO 2010/151416, WO 2011/106738, and WO 2012/027503, all of which are herein incorporated by reference.
Sequencing of the BCR repertoire involves complex DNA samples in which accurate determination of the multiple distinct sequences contained therein is hindered by technical limitations on the ability to quantify a plurality of molecular species simultaneously using multiplexed amplification and high-throughput sequencing. In addition, it is difficult using existing methodologies to quantitatively sequence DNA or RNA encoding both chains of a BCR heterodimer in a manner that permits determination that both chains originated from the same lymphoid cell.
One or more factors can give rise to artifacts that skew sequencing data outputs, compromising the ability to obtain reliable quantitative data from sequencing strategies that are based on multiplexed amplification of a highly diverse collection of IG gene templates. These artifacts often result from unequal use of diverse primers during the multiplexed amplification step. Such biased utilization of one or more oligonucleotide primers in a multiplexed reaction that uses diverse amplification templates may arise as a function of one or more of differences in the nucleotide base composition of templates and/or oligonucleotide primers, differences in template and/or primer length, the particular polymerase that is used, the amplification reaction temperatures (e.g., annealing, elongation and/or denaturation temperatures), and/or other factors (e.g., Kanagawa, 2003 J. Biosci. Bioeng. 96:317; Day et al., 1996 Hum. Mol. Genet. 5:2039; Ogino et al., 2002 J. Mol. Diagnost. 4:185; Barnard et al., 1998 Biotechniques 25:684; Aird et al., 2011 Genome Biol. 12:R18).
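The effect of unequal primer utilization on quantitative output can be illustrated with a simple numerical sketch. The example below assumes an idealized exponential amplification model in which each template is multiplied by (1 + efficiency) at every cycle; the template labels, copy numbers, efficiencies, and cycle count are hypothetical values chosen for illustration, and the model ignores plateau effects, sampling noise, and sequencing error.

```python
def amplify(initial_copies, efficiencies, cycles):
    """Idealized PCR: copies after n cycles = N0 * (1 + efficiency) ** n.

    The per-template efficiency (0.0 to 1.0) stands in for how well the
    matching primer performs in the multiplexed reaction.
    """
    return {
        template: initial_copies[template] * (1.0 + efficiencies[template]) ** cycles
        for template in initial_copies
    }


def frequencies(counts):
    """Convert absolute counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {template: count / total for template, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical input: three templates present in equal amounts.
    start = {"template_A": 100, "template_B": 100, "template_C": 100}
    # Hypothetical per-template amplification efficiencies.
    efficiency = {"template_A": 0.95, "template_B": 0.90, "template_C": 0.85}

    true_freq = frequencies(start)
    observed_freq = frequencies(amplify(start, efficiency, cycles=25))

    for template in start:
        print(template, round(true_freq[template], 3), round(observed_freq[template], 3))
```

Under these assumed values, templates that begin at equal abundance end up at markedly different observed frequencies: after 25 cycles the most efficiently amplified template is over-represented more than threefold relative to the least efficiently amplified one, even though the per-cycle efficiencies differ by only ten percentage points.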
The identification of paired light and heavy chains from a single B-cell is only one half of the equation regarding immuno-surveillance of antigens/epitopes that are recognized by the adaptive immune system. Without the ability to identify which B-cell receptors in the diverse BCR repertoire bind to corresponding epitopes/antigens, the sequenced BCR profile does not permit direct correlations to be drawn between the presence of a specific BCR sequence and the presence of a corresponding epitope/antigen of a pathogen or cancer.
A BCR-specific epitope display library or a BCR-specific antigen display library is the result of introducing B-cells with an extracellular BCR into a solution comprising a genetic conveyance of random or specific antigens to which the BCRs may bind, such that the BCR heterodimers can be linked to a specific antigen, thus allowing for the correlation of specific BCR sequences to specific antigens. Methods of utilizing phage display for serological profiling are described in Xu et al., 2015 Science 348(6239):aaa0698.
Conventional techniques have focused on determining antigen specificity using antibodies (soluble forms of BCRs), but have not been able to directly assess BCR specificity to antigens. Current methods are not able to identify antigen-specific BCRs simultaneously and on a large scale. The antigen specificity of rare B-cells is also difficult to determine using current techniques.
Clearly there remains a need for identifying antigen-specific BCRs in a high-throughput and accurate manner. In particular, there exists a need for (1) improved compositions and methods that will permit accurate quantification of adaptive immune receptor-encoding DNA and RNA sequence diversity in complex samples, in a manner that avoids skewed results, for example, from amplification bias, and in a manner that permits determination of the coding sequences for both chains of a BCR heterodimer that originate from the same lymphoid cell; and (2) compositions and methods for matching the heterodimers to a corresponding epitope/antigen binding partner to identify BCRs that bind a particular epitope or antigen of interest. The presently described embodiments address this need and provide other related advantages.