Variable genes such as immunoglobulin (Ig) and T cell receptor (TCR) genes are formed from rearrangement of V(D)J gene segments with P/N nucleotide additions between the junctions. A fully functional Ig or TCR protein is formed by association of two genes—heavy and light chain genes for Ig, alpha and beta genes for an αβTCR and gamma and delta genes for a γδTCR. This combinatorial approach results in an extremely large variety of different possible sequences.
This repertoire allows the immune system to respond to novel immunological insults that the organism has not yet encountered. Immunoglobulin genes also undergo somatic hypermutation, which further increases the repertoire size.
Correspondingly, any nucleic acid analysis of variable genes that allows for expression of the native Ig or TCR protein to investigate its functional properties requires not only sequencing individual B cells (for Ig genes) or T cells (for TCR genes), but also native pairing of the two genes that make up the protein. This can be done by single cell cloning and Sanger sequencing, but that process is slow and laborious (see, e.g., Wrammert et al., Nature, 2008, 453:667-671).
Methods have been developed for high-throughput sequencing of natively paired genes, and they fall into two approaches. The first approach is to attach a unique nucleic acid barcode identifier to nucleic acids from a cell; pairing is then achieved by bioinformatically linking genes that share the same barcode and therefore originate from the same cell (PCT/US2012/000221). The second approach is to physically link nucleic acids from the two genes together (see, e.g., U.S. Pat. No. 7,749,697).
The first approach is superior as it allows pairing for multiple genes (such as B or T cell co-expressed genes that identify specific T cell or B cell subsets), while the second approach is limited to physically linking a few nucleic acids. To date, experimental data exists only for cases in which no more than two nucleic acids have been physically linked.
Associating nucleic acids unambiguously to a single cell (the first approach) rather than associating them with each other via linking (the second approach) has advantages. When nucleic acids are associated with each other, it can be difficult to distinguish PCR and sequencing errors from true biological variation. Assumptions have to be made about the accuracy of the sequencing platform, and reads are arbitrarily assigned to different sequences based on a percentage similarity cutoff, e.g., all reads with >95% similarity are assigned to a sequence and any differences between them are assumed to be due to sequencing errors. Such a cutoff cannot distinguish between sequences that are very similar to one another (see Zhu et al., Frontiers in Microbiology, 2012, 3:315).
Furthermore, assumptions about how many cells share an identical sequence are made using the relative frequency of reads assigned to the sequence. This is an approximate measure and is affected by PCR amplification biases, as is well known in the field. Therefore, associating Ig or TCR nucleic acids with each other can only give an approximate, but not true representation of the repertoire sequenced (see Zhu et al., Frontiers in Microbiology, 2012, 3:315).
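The limitation of a similarity cutoff can be illustrated with a minimal sketch. The greedy clustering scheme, the function names and the 40-nucleotide toy sequences below are assumptions chosen for illustration and are not taken from the cited references; real pipelines use alignment-based identity rather than position-by-position comparison of equal-length reads.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(reads, cutoff=0.95):
    """Assign each read to the first cluster whose representative it
    matches with identity above the cutoff; otherwise start a new cluster."""
    representatives = []
    clusters = []
    for read in reads:
        for i, rep in enumerate(representatives):
            if identity(read, rep) > cutoff:
                clusters[i].append(read)
                break
        else:
            # No existing representative was similar enough.
            representatives.append(read)
            clusters.append([read])
    return clusters


# Two hypothetical clones that differ at a single position (39/40 identity).
clone_a = "ACGT" * 10
clone_b = clone_a[:20] + "T" + clone_a[21:]

clusters = greedy_cluster([clone_a, clone_b, clone_a], cutoff=0.95)
print(len(clusters))  # the true one-base difference is absorbed as "error"
```

With a 95% cutoff, the two clones collapse into one cluster, so a genuine single-nucleotide difference (such as one arising from somatic hypermutation) would be indistinguishable from a sequencing error.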
However, associating nucleic acids to single cells using nucleic acid barcodes allows for unambiguous differentiation between similar or even identical sequences from single B or T cells as each read can be assigned to a cell.
Furthermore, by building a consensus sequence from all reads associated with a cell, very accurate and almost completely error-free sequences can be obtained, yielding an accurate representation of the repertoire sequenced. This is also generalizable to analysis of all nucleic acids in a cell.
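Barcode-based grouping followed by consensus building can be sketched as follows. This is a simplified illustration under stated assumptions: reads are represented as hypothetical (barcode, sequence) pairs, all reads for a cell are already aligned and of equal length, and the consensus is a plain per-position majority vote. None of these details are specified in the text above.

```python
from collections import Counter, defaultdict


def consensus(reads):
    """Majority base at each position across equal-length, pre-aligned reads."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


def consensus_by_barcode(tagged_reads):
    """Group (barcode, sequence) pairs by barcode, then build one
    consensus sequence per barcode, i.e. per cell."""
    groups = defaultdict(list)
    for barcode, seq in tagged_reads:
        groups[barcode].append(seq)
    return {barcode: consensus(seqs) for barcode, seqs in groups.items()}


# Hypothetical reads: two cells whose true sequences differ by one base,
# each with one read carrying a sequencing error.
tagged_reads = [
    ("BC1", "ACGTACGT"),
    ("BC1", "ACGTACGT"),
    ("BC1", "ACGAACGT"),  # error at position 3
    ("BC2", "ACGTACCT"),
    ("BC2", "ACGTACCT"),
    ("BC2", "ACGTACCA"),  # error at position 7
]

result = consensus_by_barcode(tagged_reads)
print(result)
```

Because reads are first separated by barcode, the one-base difference between the two cells is preserved, while the isolated errors within each cell are voted out.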
Still, technical difficulties in delivering unique barcodes to each single cell remain. The current best technology to attach nucleic acid barcodes to variable genes has unique barcodes in aqueous solution, and each barcode must be kept in a separate storage container even before the reaction that attaches barcodes to variable gene nucleic acids (PCT/US2012/000221); otherwise, the nucleic acid barcodes will be mixed before use. This creates a logistical difficulty in barcoding many thousands of cells, due to the large number of containers required to hold the individual barcodes.
The requirement for a large number of storage containers also makes this approach incompatible with any sort of approach where a unique barcode cannot be individually pipetted into each individual reaction container (which will also contain a single cell). An example is nanoliter-sized reaction containers such as a nanowell approach, where it is impractical to pipette a unique barcode individually to each nanowell as there are thousands to hundreds of thousands of nanowells.
This is also infeasible in a nanodroplet approach, in which droplets are made using a water-in-oil emulsion, as hundreds of thousands of nanodroplets are generated with only a few aqueous streams (see, e.g., products by Dolomite Microfluidics or Raindance Technologies), and it is not possible to have unique barcodes in individual storage containers before delivering them to the nanodroplets.
One method to deliver unique barcodes to individual reaction containers is by using limiting dilution to deposit a unique barcode into the majority of reaction containers. One may perform limiting dilution of barcodes attached to manipulable objects, such as beads, each of which has multiple copies of one particular barcode attached, or one may perform limiting dilution of barcodes in solution. Upon diluting such beads, multiple copies of one particular nucleic acid barcode are present in a reaction container, whereas upon diluting barcodes in solution, only a single copy of a particular nucleic acid barcode is present in a reaction container.
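The effect of limiting dilution on container occupancy can be estimated with a short calculation. The sketch below assumes that the number of barcodes (or barcode-bearing beads) per container follows a Poisson distribution with mean `lam`; that statistical model, the function names and the example value of 0.1 are assumptions for illustration and are not stated in the text above.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k items in a container under Poisson loading."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def occupancy(lam):
    """Fractions of containers that are empty, singly loaded, or multiply
    loaded, plus the share of occupied containers holding exactly one item."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    multiple = 1.0 - empty - single
    return {
        "empty": empty,
        "single": single,
        "multiple": multiple,
        "single_given_occupied": single / (1.0 - empty),
    }


# Hypothetical loading: on average 0.1 barcodes (or beads) per container.
for name, value in occupancy(0.1).items():
    print(f"{name}: {value:.4f}")
```

Under this model, lowering the mean loading raises the share of occupied containers that hold exactly one barcode, at the cost of leaving more containers empty.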
Moreover, addition of a nucleic acid barcode to the sample-derived nucleic acids of interest present in a reaction container will be more complete if the introduced barcode is amplified, to ensure that it is present in sufficient quantity in the reaction container. For example, a typical mammalian cell contains roughly 400,000 copies of mRNA. To maximize the efficiency of the overall single-cell analysis, as many of these mRNA copies as possible should be barcoded. Therefore, at a minimum, roughly as many copies of a particular nucleic acid barcode as there are mRNA copies need to be present in the reaction container. Limiting dilution of barcodes in solution leads to just a single copy of a particular barcode in the reaction container, while dilution of small (e.g., 1-2 μm in diameter) beads bearing barcodes would be expected to provide at most tens of thousands of copies. Thus, in either case, amplification of the barcode is important to generate enough copies of a particular nucleic acid barcode in a reaction container that the barcode is successfully added to the greatest number of sample-derived nucleic acids. However, beads are expected to provide significantly more starting material for amplification and therefore significantly better barcode amplification. Also, a sufficiently large bead may carry hundreds of thousands of nucleic acid barcode molecules. In this case, cleavage of the nucleic acid barcodes from the bead may by itself generate sufficient quantities of a particular nucleic acid barcode in a reaction container.
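The copy-number argument above can be made concrete with a back-of-the-envelope calculation. The sketch assumes an idealized amplification in which the barcode copy number exactly doubles each cycle, which real reactions only approximate. The function name is hypothetical, and the starting value of 10,000 copies is an illustrative figure for a small bead, chosen to be consistent with the "tens of thousands" upper bound given above.

```python
import math

MRNA_COPIES_PER_CELL = 400_000  # rough figure for a typical mammalian cell


def doublings_needed(start_copies, target_copies=MRNA_COPIES_PER_CELL):
    """Minimum number of perfect doublings to reach at least target_copies."""
    if start_copies <= 0:
        raise ValueError("start_copies must be positive")
    if start_copies >= target_copies:
        return 0
    return math.ceil(math.log2(target_copies / start_copies))


# Illustrative starting quantities (the bead values are assumptions):
scenarios = {
    "single barcode in solution": 1,
    "small bead": 10_000,
    "large bead": 400_000,
}
for label, start in scenarios.items():
    print(f"{label}: {doublings_needed(start)} doublings")
```

Under these idealized assumptions, a single barcode molecule needs many more doublings than a bead-delivered barcode to match the mRNA copy number, and a sufficiently loaded bead needs none.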
Furthermore, if the nucleic acids are attached to a solid surface, they will not be as free to move about in comparison to nucleic acids in solution. Solid phase kinetics are much slower than aqueous phase kinetics for nucleic acid complementary base pairing, and may result in much less efficient addition of barcodes to nucleic acids of interest. Preferably, nucleic acid barcodes should exist in the aqueous phase before participating in the barcoding reaction.
The current invention improves upon a previous invention (PCT/US2012/000221) for attaching unique barcodes to each sample, where each sample is usually a single cell, but is generalizable to any type of sample. The current invention enables delivery of unique barcodes to any type of reaction container, is suitable for nanoliter-sized reaction containers, and does not require keeping unique nucleic acid barcodes in separate storage containers. It is amenable to, but does not require, manually pipetting a unique barcode into each reaction container. It delivers one or more copies of a unique barcode or unique barcode set into each reaction container, and the barcode is attached to nucleic acids of interest in a reaction that occurs in the aqueous phase with rapid aqueous phase kinetics. As the reaction attaches barcodes to all nucleic acids of interest in a cell, i.e., all reverse transcribed RNA in a cell, the current invention enables single cell transcriptomics analysis and is not limited to associating immunoglobulin variable genes with specific samples. Furthermore, the amplification reaction can occur at a sufficiently low temperature that it is compatible with mesophilic enzymes (which are otherwise inactivated at high temperatures) used to add barcodes to nucleic acids of interest.