More particularly, the present invention relates to a method for determining and/or analysing a transcriptome or genome and especially the global transcriptome or genome, of a tissue sample. In particular the method relates to a quantitative and/or qualitative method for analysing the distribution, location or expression of genomic sequences in a tissue sample wherein the spatial expression or distribution or location pattern within the tissue sample is retained. Thus, the new method provides a process for performing “spatial transcriptomics” or “spatial genomics”, which enables the user to determine simultaneously the expression pattern, or the location/distribution pattern of the genes expressed or genes or genomic loci present in a tissue sample.
The invention is particularly based on array technology coupled with high throughput DNA sequencing technologies, which allows the nucleic acid molecules (e.g. RNA or DNA molecules) in the tissue sample, particularly mRNA or DNA, to be captured and labelled with a positional tag. This step is followed by synthesis of DNA molecules which are sequenced and analysed to determine which genes are expressed in any and all parts of the tissue sample. Advantageously, the individual, separate and specific transcriptome of each cell in the tissue sample may be obtained at the same time. Hence, the methods of the invention may be said to provide highly parallel comprehensive transcriptome signatures from individual cells within a tissue sample without losing spatial information within said investigated tissue sample. The invention also provides an array for performing the method of the invention and methods for making the arrays of the invention.
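By way of illustration only, the principle of relating sequenced molecules back to their position via a positional tag may be sketched as follows. This is a minimal sketch under stated assumptions and not the implementation of the invention: the tag sequences, the fixed tag length, the (x, y) coordinates, the gene names and the function name `spatial_counts` are all hypothetical, and the assignment of each read to a gene is assumed to have been performed beforehand.

```python
from collections import Counter, defaultdict

# Hypothetical lookup: positional tag sequence -> (x, y) position on the array.
TAG_TO_POSITION = {
    "ACGTAC": (0, 0),
    "TTGCAA": (0, 1),
    "GGATCC": (1, 0),
}

TAG_LENGTH = 6  # assumed fixed tag length, for this sketch only


def spatial_counts(reads):
    """Count reads per gene at each array position.

    `reads` is an iterable of (sequence, gene) pairs. Each sequence is assumed
    to begin with the positional tag; `gene` is the gene to which the rest of
    the read was assigned (that assignment is outside the scope of this sketch).
    """
    counts = defaultdict(Counter)
    for sequence, gene in reads:
        position = TAG_TO_POSITION.get(sequence[:TAG_LENGTH])
        if position is None:
            continue  # unrecognised tag: discard the read rather than guess
        counts[position][gene] += 1
    return counts


# Illustrative reads (invented data): two tagged for (0, 0), one for (0, 1),
# and one carrying an unrecognised tag.
reads = [
    ("ACGTACAAAA", "GeneA"),
    ("ACGTACCCCC", "GeneA"),
    ("TTGCAAGGGG", "GeneB"),
    ("NNNNNNTTTT", "GeneA"),
]
result = spatial_counts(reads)
```

In this sketch each position retains its own gene counts, so the expression information is not averaged across the sample and the spatial pattern is preserved.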
The human body comprises over 100 trillion cells and is organized into more than 250 different organs and tissues. The development and organization of complex organs, such as the brain, are far from understood and there is a need to dissect the expression of genes in such tissues using quantitative methods to investigate and determine the genes that control the development and function of such tissues. The organs are in themselves a mixture of differentiated cells that enable all bodily functions, such as nutrient transport, defence etc., to be coordinated and maintained. Consequently, cell function is dependent on the position of the cell within a particular tissue structure and the interactions it shares with other cells within that tissue, both directly and indirectly. Hence, there is a need to disentangle how these interactions influence each cell within a tissue at the transcriptional level.
Recent findings by deep RNA sequencing have demonstrated that a majority of the transcripts can be detected in a human cell line and that a large fraction (75%) of the human protein-coding genes are expressed in most tissues. Similarly, a detailed study of 1% of the human genome showed that chromosomes are ubiquitously transcribed and that the majority of all bases are included in primary transcripts. The transcription machinery can therefore be described as promiscuous at a global level.
It is well-known that transcripts are merely a proxy for protein abundance, because the rates of RNA translation, degradation etc. will influence the amount of protein produced from any one transcript. In this respect, a recent antibody-based analysis of human organs and tissues suggests that tissue specificity is achieved by precise regulation of protein levels in space and time, and that different tissues in the body acquire their unique characteristics by controlling not which proteins are expressed but how much of each is produced.
However, subsequent global studies comparing transcriptome and proteome correlations have demonstrated that the majority of all genes are expressed. Interestingly, a high correlation was shown between changes in RNA and protein levels for individual gene products, which is indicative of the biological usefulness of studying the transcriptome in individual cells in the context of the functional role of proteins.
Indeed, analysis of the histology and expression pattern in tissues is a cornerstone in biomedical research and diagnostics. Histology, utilizing different staining techniques, first established the basic structural organization of healthy organs and the changes that take place in common pathologies more than a century ago. Developments in this field resulted in the possibility of studying protein distribution by immunohistochemistry and gene expression by in situ hybridization.
However, the parallel development of increasingly advanced histological and gene expression techniques has resulted in the separation of imaging and transcriptome analysis and, until the methods of the present invention, there has not been any feasible method available for global transcriptome analysis with spatial resolution.
As an alternative, or in addition, to in situ techniques, methods have been developed for the in vitro analysis of proteins and nucleic acids, i.e. by extracting molecules from whole tissue samples, single cell types, or even single cells, and quantifying specific molecules in said extracts, e.g. by ELISA, qPCR etc.
Recent developments in the analysis of gene expression have resulted in the possibility of assessing the complete transcriptome of tissues using microarrays or RNA sequencing, and such developments have been instrumental in our understanding of biological processes and for diagnostics. However, transcriptome analysis typically is performed on mRNA extracted from whole tissues (or even whole organisms), and methods for collecting smaller tissue areas or individual cells for transcriptome analysis are typically labour intensive, costly and have low precision.
Hence, the majority of gene expression studies based on microarrays or next generation sequencing of RNA use a representative sample containing many cells. Thus the results represent the average expression levels of the investigated genes. The separation of cells that are phenotypically different has been used in some cases together with the global gene expression platforms (Tang F et al, Nat. Protoc. 2010; 5: 516-35; Wang D & Bodovitz S, Trends Biotechnol. 2010; 28:281-90) and resulted in very precise information about cell-to-cell variations. However, high throughput methods to study transcriptional activity with high resolution in intact tissues have not, until now, been available.
Thus, existing techniques for the analysis of gene expression patterns provide spatial transcriptional information only for one or a handful of genes at a time or offer transcriptional information for all of the genes in a sample at the cost of losing positional information. Hence, it is evident that methods to determine simultaneously, separately and specifically the transcriptome of each cell in a sample are required, i.e. to enable global gene expression analysis in tissue samples that yields transcriptomic information with spatial resolution, and the present invention addresses this need.
The novel approach of the methods and products of the present invention utilizes now well established array and sequencing technology to yield transcriptional information for all of the genes in a sample, whilst retaining the positional information for each transcript. It will be evident to the person of skill in the art that this represents a milestone in the life sciences. The new technology opens a new field of so-called “spatial transcriptomics”, which is likely to have profound consequences for our understanding of tissue development and tissue and cellular function in all multicellular organisms. It will be apparent that such techniques will be particularly useful in our understanding of the cause and progress of disease states and in developing effective treatments for such diseases, e.g. cancer. The methods of the invention will also find uses in the diagnosis of numerous medical conditions.
Whilst initially conceived with the aim of transcriptome analysis in mind, as described in detail below, the principles and methods of the present invention may be applied also to the analysis of DNA and hence for genomic analyses also (“spatial genomics”). Accordingly, at its broadest the invention pertains to the detection and/or analysis of nucleic acid in general.
Array technology, particularly microarrays, arose from research at Stanford University where small amounts of DNA oligonucleotides were successfully attached to a glass surface in an ordered arrangement, a so-called “array”, and used to monitor the transcription of 45 genes (Schena M et al, Science. 1995; 270: 368-9, 371).
Since then, researchers around the world have published more than 30,000 papers using microarray technology. Multiple types of microarray have been developed for various applications, e.g. to detect single nucleotide polymorphisms (SNPs) or to genotype or re-sequence mutant genomes, and an important use of microarray technology has been for the investigation of gene expression. Indeed, the gene expression microarray was created as a means to analyze the level of expressed genetic material in a particular sample, with the real gain being the possibility to compare expression levels of many genes simultaneously. Several commercial microarray platforms are available for these types of experiments but it has also been possible to create custom made gene expression arrays.
Whilst the use of microarrays in gene expression studies is now commonplace, it is evident that new and more comprehensive so-called “next-generation DNA sequencing” (NGS) technologies are starting to replace DNA microarrays for many applications, e.g. in-depth transcriptome analysis.
The development of NGS technologies for ultra-fast genome sequencing represents a milestone in the life sciences (Petterson E et al, Genomics. 2009; 93: 105-11). These new technologies have dramatically decreased the cost of DNA sequencing and enabled the determination of the genome of higher organisms at an unprecedented rate, including those of specific individuals (Wade C M et al, Science. 2009; 326: 865-7; Rubin J et al, Nature 2010; 464: 587-91). The new advances in high-throughput genomics have reshaped the biological research landscape and in addition to complete characterization of genomes it is possible also to study the full transcriptome in a digital and quantitative fashion. The bioinformatics tools to visualize and integrate these comprehensive sets of data have also been significantly improved during recent years.
However, it has surprisingly been found that a unique combination of histological, microarray and NGS techniques can yield comprehensive transcriptional or genomic information from multiple cells in a tissue sample which information is characterised by a two-dimensional spatial resolution. Thus, at one extreme the methods of the present invention can be used to analyse the expression of a single gene in a single cell in a sample, whilst retaining the cell within its context in the tissue sample. At the other extreme, and in a preferred aspect of the invention, the methods can be used to determine the expression of every gene in each and every cell, or substantially all cells, in a sample simultaneously, i.e. the global spatial expression pattern of a tissue sample. It will be apparent that the methods of the invention also enable intermediate analyses to be performed.