Gene regulation is a complex process in mammalian cell development and function.
In part, such complex regulation depends on proteins and the transcription machinery. The other parts depend on the accessibility of the blueprint and substrate of gene transcription, i.e., the DNA. Most gene regulation at DNA level is controlled at so-called gene promoters and similar regulatory DNA regions that serve as regulatory elements, including binding sites for transcription modulators (enhancers) and conserved non-coding regions, and the like.
One particular level of control upon (Greek: epi, ἐπί) the DNA in general, and regulatory elements in particular, is their epigenetic characterization. Here chromatin plays an important role; the term describes a complex combination of DNA, RNA, and protein that makes up the chromosomes. The major components of chromatin are DNA and histone proteins, although many other chromosomal proteins have prominent roles. The functions of chromatin are to package DNA into a relatively small and compact structure, to strengthen the DNA to allow mitosis and meiosis, and to serve as a mechanism to control gene expression and DNA replication. Chromatin structure is also influenced and governed by chemical modifications such as methylation or hydroxymethylation (DNA and proteins) and acetylation (proteins), and non-histone modifications. The major building blocks of chromatin are the nucleosomes, and the location of the nucleosomes is determined by the chromatin structure/modification and is also epigenetically inherited.
The scientific community widely acknowledges that “open”, accessible chromatin corresponds to active genes or an enablement for gene expression. Conversely, densely packed and inaccessible chromatin is widely known as deactivated, transcription locked chromatin. The respective epigenetic control is long-lasting and sustainable compared with the so-called immediate control, such as executed by transcription factors. As far as reporting of activity states is concerned, chromatin state has gained interest as a level of gene control that determines more stable gene regulation than other detectable levels.
While chromatin structure is considered a very stable mechanism of gene regulation, it is nevertheless inducible. It can therefore change, or be changed, during cellular differentiation and similar processes. However, in contrast to changes in the transcription machinery, induced chromatin changes appear to be long-lasting. Furthermore, open and closed chromatin states can have long-ranging effects in the genome, e.g. the chromatin status of an enhancer can have an effect on a gene that is not directly adjacent.
Even though almost all cells in an individual contain the exact same complement of DNA code, higher organisms must impose and maintain different patterns of gene expression in the various tissue types. On the one hand, some gene regulation must be transitory, depending on the current state of the cell and changes in external stimuli. Persistent regulation, on the other hand, is a primary role of epigenetics—heritable regulatory patterns that do not alter the basic genetic coding of the DNA. Chromatin structure serves as the stable memory for cells and performs a crucial role in maintaining the long-term identity of various cell types.
The immune system plays a pivotal role in the development of tumours. However, understanding of the role of T-lymphocytes as the major cellular component of the adaptive effector immune response, and of their counterpart, the T-lymphocyte subset of regulatory T cells, within the tumour microenvironment remains fragmentary. This is equally true for most cellular components of both innate and adaptive immunity, including natural killer cells. In part this deficiency is due to the lack of reliable technical solutions for specific cell identification and quantification in solid tissue.
To date, quantitative characterization of immune cells including T-lymphocytes, Tregs and NK cells is mainly performed by mRNA analysis, immunohistochemistry (IHC) and/or FACS analysis. However, all three technologies have limitations when applied in a truly quantitative manner. RNA analysis cannot be related to cell numbers, since it determines an overall amount of a certain transcript in a sample. Both IHC and FACS analysis depend on a threshold of proteins displayed by the cell before a cell is regarded positive for a certain marker. FACS analysis is additionally problematic for solid tissues, since an experimental prerequisite is the dissociation of the tissue into a single cell suspension. Despite attempts to automate cell counting by IHC, it is not a reliable method for quantification (Taylor, C. R. and R. M. Levenson, Quantification of immunohistochemistry—issues concerning methods, utility and semiquantitative assessment II. Histopathology, 2006. 49(4): p. 411-24). It is widely acknowledged that these technological difficulties associated with measurements of immune cells in the tumour microenvironment constrain precise studies, so that the actual immunological status and the prognostic meaning of immune cell infiltration in the tumour environment remain largely unclear.
US 2009-042184 describes methods for diagnosis and monitoring the efficacy of treatment of a cancer comprising detecting an enhanced degree of chromatin modification within Chromosome 2 of the human genome from about map position 2q14.1 to about map position 2q14.3 in a sample derived from a subject. The methods also include detecting a modulated level of expression of a gene within the region of about 2q14.1 to 2q14.3 of Chromosome 2, wherein the gene may be selected from the group consisting of DEAD box polypeptide 18 (DDX18), translin (TSN), v-ral simian leukemia viral oncogene homolog B (RALB), secretin receptor (SCTR), engrailed homolog 1 (EN1), macrophage receptor with collagenous structure (MARCO), protein tyrosine phosphatase non-receptor type 4 (PTPN4), insulin induced gene 2 (INSIG2), inhibin beta B (INHBB), GLI-Kruppel family member 2 (GLI2), FLJ10996, STEAP3, diazepam binding inhibitor (DBI), MGC10993, erythrocyte membrane protein band 4.1 like 5 (EPB41L5), FLJ14816, or transcription factor CP2-like 1 (TFCP2L1).
US 2007-196843 (which is herewith incorporated by reference) describes methods for identifying and monitoring epigenetic modifications, such as imprinted genes, using microarray based technology. Specifically, the detection of imprinted genes by the presence of overlapping closed and open chromatin markers is described. Disclosed is a method for detecting the loss of imprinting on a genome-wide scale, which is indicative of a variety of medical conditions. Diagnostic assays and chromatin structure markers for identifying gene imprinting and loss thereof are also disclosed.
It is an object of the present invention to provide an improved method to identify and quantify cell types, in particular immune cells. Other objects of the present invention will become apparent to the person of skill upon studying the more detailed description of the present invention as follows.
According to a first aspect thereof, the present invention solves the above object by providing a method for identifying a specific type and/or state of a mammalian cell in a sample obtained from a mammal, comprising a) analyzing the relative amount of accessible chromatin in regions that are specific for a cell-type and/or cellular state in the genome of said cell, b) comparing said relative amount of accessible chromatin in said regions with the relative amount of accessible chromatin in regions in the genome of said cell that are unspecific for a cell-type and/or cellular state, and c) deducing the specific type and/or state of said mammalian cell in said sample based on said comparison.
Preferred is a method for identifying a specific type and/or state of a mammalian cell in a sample obtained from a mammal wherein said method optionally comprises b′) a normalizing of the relative amount of said regions that are specific for a cell-type and/or cellular state and said regions in the genome of said cell that are unspecific for a cell-type and/or cellular state using a control plasmid as described herein.
The normalization using the accessible chromatin in a region or regions in the genome of said cell that are unspecific for a cell-type and/or cellular state (e.g. using the GAPDH-region as described herein) is the main difference and advantage of the present invention when compared with other common methods. Thus, no differential methylation is analyzed, but the “same chromatin DNA” is used. Furthermore, the standardization according to the present invention is independent of the cell-type as analyzed, which allows for an analysis of several cellular types in one sample (as also described below), depending on the choice of cell-types to be analyzed. This approach is not possible in other systems, as these analyze only one type of gene (e.g. foxp3 tpg is detected and compared with foxp3 cpg). In contrast, the present invention measures (for example) cd3, foxp3, gnly, 1588nk 1586grc, and/or ccr6 tpg against the “internal standard” Gapdh tpg in order to obtain a relative amount. Furthermore, in a preferred embodiment, the method also allows the number of copies of the genes as analyzed to be measured using a control plasmid containing the regions to be analyzed as described herein.
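The normalization against an internal standard can be illustrated with a minimal sketch. It assumes that copy numbers of the accessible (TpG) variant of each region have already been measured, for example by quantitative PCR calibrated with a control plasmid. The function names, the dictionary layout and the example values are hypothetical and serve only to show the arithmetic of relating a cell-type specific region to the GAPDH region.

```python
def relative_accessible_fraction(specific_tpg_copies, reference_tpg_copies):
    """Ratio of accessible copies of a cell-type specific region to
    accessible copies of the cell-type unspecific reference region."""
    if reference_tpg_copies <= 0:
        raise ValueError("reference copy number must be positive")
    return specific_tpg_copies / reference_tpg_copies


def profile_sample(tpg_copies, reference="GAPDH"):
    """Normalize every measured region against the same internal standard.

    tpg_copies maps a region name to its measured accessible (TpG) copies;
    the reference region itself is left out of the returned profile.
    """
    reference_copies = tpg_copies[reference]
    return {
        region: relative_accessible_fraction(copies, reference_copies)
        for region, copies in tpg_copies.items()
        if region != reference
    }


if __name__ == "__main__":
    # Hypothetical copy numbers for one sample (illustrative values only).
    sample = {"GAPDH": 1000.0, "CD3": 250.0, "FOXP3": 20.0}
    for region, fraction in profile_sample(sample).items():
        print(f"{region}: {fraction:.1%} of the GAPDH signal")
```

Because every region is divided by the same reference value, several cell-type specific regions can be profiled from one sample in a single pass. The sketch does not address gene dosage effects such as X-linked loci.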
The DNA blueprint must provide information for each and every function that a cell can possibly fulfill. Therefore, the information for a brain cell is necessarily also present in a liver cell. However, neither during development nor in adulthood is there a need for a liver cell to express brain specific genes. Therefore, brain specific genes are permanently switched off in the liver.
As the prime determinator of this long term regulation, epigenetic changes and chromatin fully enable or prohibit stable gene expression by adopting either of the following two chromatin states:
A) Full and permanent inactivation (even of a whole chromosome), as observed with the Barr bodies of the X chromosome. X-chromosomal inactivation is, however, a prominent exception to the general rule for genes that are switched off, since it switches off only one allele, such as, for example, in the case of the genes foxp3 and CD3.
B) Full and permanent activation of genes is exemplified by housekeeping genes. These genes are permanently required for survival of a cell, for example, but not limited to, genes involved in glucose metabolism, the citric acid cycle, the urea cycle, the HMG-CoA-reductase-pathway and others. A living cell usually cannot afford to switch those cycles off, unless the cell is dying or goes into lethal resting.
For further differentiation of cells in the body, there are genes that are switched on only in certain cells at certain parts of the body (organs, tissues) and/or at defined time points and periods. These cells differentiate, and by doing so, they allow the expression of certain genes, while placing others into the large group of fully disabled genes. In principle, these (more) cell type specific genes also segregate into two groups:
i) Genes that are important switches or play an important role during embryonic development and are not required in adulthood, including genes like Oct-3 or Nanog and others, and
ii) Genes with an important role during terminal differentiation to a very specific cell type and/or with an expression pattern characteristic for this cell type that are switched off during development and only enabled for expression upon terminal differentiation to a very specific cell type.
In view of the above, the present invention is based on the finding that every distinct cell type contains certain DNA fragments (i.e. chromosomal regions and/or genes) that have an accessible chromatin region specific only to this cell type, certain DNA sections that have accessible chromatin which they share with one or more groups of cells, and certain DNA fragments that have accessible chromatin which they share with all other living cells in the body. Therein, accessible chromatin in living cells can be defined by one or more of the following properties: i) DNA that contains cytosines which are accessible to conversion by bisulfite (this property is preferred, since the accessibility of DNA to the modification by bisulfite is a stable property that is retained even after purification of DNA and can thus be employed at any time in analytical systems that use isolated cellular DNA); ii) accessibility to DNAse I or similar enzymes (of note, this accessibility can be measured only as long as chromatin structure is still maintained, and thus isolation of DNA destroys this property); and iii) precipitation by ChIP; or other suitable methods known in the art.
Preferred is a method according to the present invention, wherein said identifying further comprises a relative quantification of said specific cell type and/or state based on said comparison. Further preferred is the method according to the present invention, further comprising a step of determining a specific cell-type and/or cellular state comprising measuring the relative amount of accessible chromatin in the genome of a cell having a known specific cell-type and/or cellular state prior to step a).
The present invention establishes an analytical method and system that identifies and quantifies in one preferred embodiment all possible cells, i.e., all cell types, and in a second embodiment preferably all immune cells, and in a third embodiment the specific immune cells as described herein, by measuring the relative amount of accessible, i.e., active chromatin in cell type and cell status specific regions versus accessible chromatin being accessible in all cell types.
Further preferred is a method according to the present invention, further comprising generating a knowledge base comprising information on the relative amount of accessible chromatin in the genome of cells having a known specific cell-type and/or cellular state. Said knowledge base can be a diagnostic computer, and can be fully or largely automated, such as a robot. Furthermore, the database can be centralized in order to collect information about accessible chromatin in certain cell types or cellular states.
In a further preferred embodiment of the method according to the invention, the analysis comprises measuring the relative amount of accessible chromatin with an assay comprising DNAse I digestion, ChIP chromatin immunoprecipitation microarray (e.g., ChIP), quantitative PCR analysis, selective precipitation or conversion of cytosines with bisulfite, or combinations thereof. In yet another preferred embodiment of the present invention, in order to diagnostically determine the location of nucleosomes, a DNAse I hypersensitivity assay is used as described herein.
Other examples for methods which can be used in the context of the present invention in order to identify areas of open chromatin are as follows. In certain embodiments, a molecule which is capable of binding to an accessible region, but does not necessarily cleave or covalently modify DNA in the accessible region, can be used to identify and isolate accessible regions.
Suitable molecules include, for example, minor groove binders (e.g., U.S. Pat. Nos. 5,998,140 and 6,090,947), and triplex-forming oligonucleotides (TFOs, U.S. Pat. Nos. 5,176,996 and 5,422,251). The molecule is contacted with cellular chromatin, the chromatin is optionally deproteinized, then fragmented, and fragments comprising the bound molecule are isolated, for example, by affinity techniques. Use of a TFO comprising poly-inosine (poly-I) will lead to minimal sequence specificity of triplex formation, thereby maximizing the probability of interaction with the greatest possible number of accessible sequences.
In a variation of the aforementioned methods, TFOs with covalently-attached modifying groups are used. See, for example, U.S. Pat. No. 5,935,830. In this case, covalent modification of DNA occurs in the vicinity of the triplex-forming sequence. After optional deproteinization and fragmentation of treated chromatin, marked fragments are purified by, for example, affinity selection.
In another embodiment, cellular chromatin is contacted with a non-sequence-specific DNA-binding protein. The protein is optionally cross-linked to the chromatin. The chromatin is then fragmented, and the mixture of fragments is subjected to immuno-precipitation using an antibody directed against the non-sequence-specific DNA-binding protein. Fragments in the immuno-precipitate are enriched for accessible regions of cellular chromatin. Suitable non-sequence-specific DNA-binding proteins for use in this method include, but are not limited to, prokaryotic histone-like proteins such as the bacteriophage SP01 protein TF1 and prokaryotic HU/DBPII proteins. Greene et al. (1984) Proc. Natl. Acad. Sci. USA 81:7031-7035; Rouviere-Yaniv et al. (1977) Cold Spring Harbor Symp. Quant. Biol. 42:439-447; Kimura et al. (1983) J. Biol. Chem. 258:4007-4011; Tanaka et al. (1984) Nature 310:376-381. Additional non-sequence-specific DNA-binding proteins include, but are not limited to, proteins containing poly-arginine motifs and sequence-specific DNA-binding proteins that have been mutated so as to retain DNA-binding ability but lose their sequence specificity. An example of such a protein (in this case, a mutated restriction enzyme) is provided by Rice et al. (2000) Nucleic Acids Res. 28:3143-3150.
In yet another embodiment, a plurality of sequence-specific DNA binding proteins is used to identify accessible regions of cellular chromatin. For example, a mixture of sequence-specific DNA binding proteins of differing binding specificities is contacted with cellular chromatin, chromatin is fragmented and the mixture of fragments is immuno-precipitated using an antibody that recognizes a common epitope on the DNA binding proteins. The resulting immuno-precipitate is enriched in accessible sites corresponding to the collection of DNA binding sites recognized by the mixture of proteins. Depending on the completeness of sequences recognized by the mixture of proteins, the accessible immuno-precipitated sequences will be a subset or a complete representation of accessible sites.
In addition, synthetic DNA-binding proteins can be designed in which non-sequence-specific DNA-binding interactions (such as, for example, phosphate contacts) are maximized, while sequence-specific interactions (such as, for example, base contacts) are minimized. Certain zinc finger DNA-binding domains obtained by bacterial two-hybrid selection have a low degree of sequence specificity and can be useful in the aforementioned methods. Joung et al. (2000) Proc. Natl. Acad. Sci. USA 97:7382-7387; see esp. the “Group III” fingers described therein.
As further examples, in selective/limited digestion methods (as also mentioned above), the limited nuclease digestion approach generally involves treating nuclei or chromatin under controlled reaction conditions with a chemical and/or enzymatic probe such that small fragments of DNA are generated from accessible regions. The selective and limited digestion required can be achieved by controlling certain digestion parameters. Specifically, one typically limits the concentration of the probe to very low levels. The duration of the reaction and/or the temperature at which the reaction is conducted can also be regulated to control the extent of digestion to desired levels. More specifically, relatively short reaction times, low temperatures and low concentrations of probe can be utilized. Any of a variety of nucleases can be used to conduct the limited digestion. Both non-sequence-specific endonucleases such as, for example, DNase I, S1 nuclease, and mung bean nuclease, and sequence-specific nucleases such as, for example, restriction enzymes, can be used.
A variety of different chemical probes can be utilized to cleave DNA in accessible regions. Specific examples of suitable chemical probes include, but are not limited to, hydroxyl radicals and methidiumpropyl-EDTA Fe(II) (MPE). Chemical cleavage in accessible regions can also be accomplished by treatment of cellular chromatin with reagents such as dimethyl sulfate, hydrazine, potassium permanganate, and osmium tetroxide, followed by exposure to alkaline conditions (e.g., 1 M piperidine). See, for example, Tullius et al. (1987) Meth. Enzymology, Vol. 155, (J. Ableson and M. Simon, eds.) Academic Press, San Diego, pp. 537-558; Cartwright et al. (1983) Proc. Natl. Acad. Sci. USA 80:3213-3217; Hertzberg et al. (1984) Biochemistry 23:3934-3945; Wellinger et al. in Methods in Molecular Biology, Vol. 119 (P. Becker, ed.) Humana Press, Totowa, N.J., pp. 161-173; and Maxam et al. (1980) Meth. Enzymology, Vol. 65, (L. Grossman and K. Moldave, eds.) Academic Press, New York, pp. 499-560. When using chemical probes, reaction conditions are adjusted so as to favor the generation of, on average, two sites of reaction per accessible region, thereby releasing relatively short DNA fragments from the accessible regions.
As with the previously-described methods, the resulting small fragments generated by the digestion process can be purified by size (e.g., gel electrophoresis, sedimentation, gel filtration), preferential solubility, or by procedures which result in the separation of naked nucleic acid (i.e., nucleic acids lacking histones) from bulk chromatin, thereby allowing the small fragments to be isolated and/or cloned, and/or subsequently analyzed by, for example, nucleotide sequencing.
In one embodiment of this method, nuclei are treated with low concentrations of DNase I; DNA is then purified from the nuclei and subjected to gel electrophoresis. The gel is blotted and the blot is probed with a short, labeled fragment corresponding to a known mapped DNase I hypersensitive site located, for example, in the promoter of a housekeeping gene. Examples of such genes (and associated hypersensitive sites) include, but are not limited to, those in the genes encoding rDNA, glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and core histones (e.g., H2A, H2B, H3, H4). Alternatively, a DNA fragment size fraction is isolated from the gel, slot-blotted and probed with a hypersensitive site probe and a probe located several kilobases (kb) away from the hypersensitive site. Preferential hybridization of the hypersensitive site probe to the size fraction is indicative that the fraction is enriched in accessible region sequences. A size fraction enriched in accessible region sequences can be cloned, using standard procedures, to generate a library of accessible region sequences.
As used herein, the term “accessible chromatin” refers to a region of chromatin in which the DNA base cytosine is generally—and independent of the nucleotide context—accessible to conversion by bisulfite to uracil. Such regions are preferably also defined by the fact that the unpurified DNA is at least 10-fold more sensitive to the action of an endonuclease, e.g., DNAse I, than surrounding regions. Because opening of the chromatin is a prerequisite to transcription activity, DNAse I sensitivity provides a measure of the transcriptional potentiation of a chromatin region; greater DNAse sensitivity generally corresponds to greater transcription activity. DNAse hypersensitivity assays are described by Weintraub and Groudine, 1976, Science 193: 848-856, incorporated herein by reference. “Highly transcribed” or “highly expressed” regions or genes are regions of open chromatin structure that are transcribed. Recently researchers have found that regions that are rich in genes tend to be in open chromatin structures, whereas regions poor in genes tend to be in compact chromatin. However, open chromatin can contain inactive genes and compact chromatin can contain active genes. (See Bickmore, et al (2004) Chromatin Architecture of the Human Genome: Gene-Rich Domains Are Enriched in Open Chromatin Fibers, Cell, Vol 118, 555-566, 3.)
Non-completely bisulfite-converted DNA is DNA that retains cytosine residues in its sequence after the treatment with bisulfite instead of having them converted to uracil at these positions. Due to the chemical properties of the DNA, it is acknowledged in the state of the art that this incomplete conversion occurs only in the context of CpG dinucleotides and is due to restricted availability of this base caused by epigenetic modifications, such as methylation, hydroxymethylation or other covalent modifications of the cytosine base, or by structural changes.
Sodium bisulfite treatment, or treatment with similar salts, of genomic DNA was performed according to Olek et al. (Olek, A., Oswald, J., Walter, J. (1996) Nucleic Acids Res. 24, 5064-5066) with minor modifications, resulting in the conversion of chemically unmodified or “accessible” cytosine to uracil, whereas covalently modified cytosines are inert or “inaccessible” to bisulfite salts and remain unchanged in this reaction. Therefore, only for unmodified and “accessible” cytosines in the DNA (a term that is used here congruently with accessible chromatin regions) is the uracil that is generated by bisulfite or similar reactions replaced by thymine in a subsequent PCR amplification, due to the typical base pairing of DNA and RNA. Thus, detection of a “C” in sequencing reactions reflects the “inaccessibility” of chromatin and thus the modification of the genomic DNA in that DNA section. Hence it reflects closed and inaccessible chromatin. In contrast, detection of a “T” at the same site reflects the absence of stable modifications at the genomic cytosine and corresponds to accessible chromatin at that site. Most references attribute this inaccessibility to methyl modifications at the C5 position of the cytosine, although surprising new research has shown that cytosine “methylation” is clearly not the only cytosine modification (see e.g. BMC Genomics 2006, 7:98) that protects DNA integrity and renders chromatin inaccessible. It is, however, clear that the absence of cytosine residues in bisulfite converted DNA is a definitive indication of a fully accessible, functionally fully active and chemically unmodified cytosine residue, regardless of what may have been the original modification at that site.
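The read-out logic of the bisulfite reaction can be sketched in a few lines. The sketch below is a simplified model that considers only one strand and assumes complete conversion of every unmodified cytosine; the function names and the example sequence are hypothetical. It shows how a “T” at a genomic cytosine position is called accessible and a retained “C” is called inaccessible.

```python
def bisulfite_pcr_read(genomic, modified_positions):
    """Simulate bisulfite treatment followed by PCR on one strand.

    Unmodified cytosines are converted to uracil and read as T after PCR;
    cytosines at modified_positions (0-based indices) stay C.
    """
    modified = set(modified_positions)
    read = []
    for index, base in enumerate(genomic.upper()):
        if base == "C" and index not in modified:
            read.append("T")
        else:
            read.append(base)
    return "".join(read)


def call_cytosines(genomic, read):
    """Classify every genomic cytosine from the base seen in the read."""
    calls = {}
    for index, (g, r) in enumerate(zip(genomic.upper(), read.upper())):
        if g != "C":
            continue
        if r == "T":
            calls[index] = "accessible"
        elif r == "C":
            calls[index] = "inaccessible"
        else:
            calls[index] = "ambiguous"
    return calls


if __name__ == "__main__":
    genomic = "ACGTCCGA"  # hypothetical sequence
    read = bisulfite_pcr_read(genomic, modified_positions=[5])
    print(read)
    print(call_cytosines(genomic, read))
```

The classification needs only the reference sequence and the sequenced read, which matches the point made above that this property survives DNA purification.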
The dinucleotide CpG is severely underrepresented in mammalian genomes relative to its expected statistical occurrence frequency of 6.25%. In addition, the bulk of CpG residues in the genome are methylated (with the modification occurring at the 5-position of the cytosine base). As a consequence of these two phenomena, total human genomic DNA is remarkably resistant to, for example, the restriction endonuclease Hpa II, whose recognition sequence is CCGG, and whose activity is blocked by methylation of the second cytosine in the target site.
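The figure of 6.25% follows from assuming that all four bases are equally frequent and independent, so that any given dinucleotide is expected at 1/4 × 1/4 = 1/16 of positions. The minimal sketch below reproduces this value and, as an additional illustration not taken from the text, computes the commonly used observed/expected CpG ratio for a sequence using its actual C and G counts. The function name and test sequences are hypothetical.

```python
# Expected CpG frequency if A, C, G and T were equally frequent and independent.
UNIFORM_CPG_EXPECTATION = 0.25 * 0.25  # 0.0625, i.e. 6.25 %


def cpg_observed_expected(sequence):
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G).

    Values well below 1 indicate CpG depletion; values near or above 1
    are typical of CpG-rich stretches.
    """
    seq = sequence.upper()
    length = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    cpg_count = sum(1 for i in range(length - 1) if seq[i:i + 2] == "CG")
    return cpg_count * length / (c_count * g_count)


if __name__ == "__main__":
    print(f"uniform expectation: {UNIFORM_CPG_EXPECTATION:.2%}")
    print(cpg_observed_expected("CGCGCGCG"))  # CpG-rich toy sequence
    print(cpg_observed_expected("CCAATTGG"))  # toy sequence without CpG
```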
An important exception to the overall paucity of demethylated Hpa II sites in the genome are exceptionally CpG-rich sequences (so-called “CpG islands”) that occur in the vicinity of transcriptional start sites, and which are accessible to the transcription machinery (i.e., open chromatin and no cytosine modification) in the promoters of active genes (Jones et al. (1999) Nature Genet. 21:163-167). Aberrant hypermethylation of such promoter-associated CpG islands is a well-established characteristic of the genome of malignant cells (Robertson et al. (2000) Carcinogenesis 21:461-467).
Accordingly, another option for generating accessible regions relies on the observation that, whereas most CpG dinucleotides in the eukaryotic genome are methylated at the C5 position of the C residue, CpG dinucleotides within the CpG islands of active genes are unmethylated. See, for example, Bird (1992) Cell 70:5-8; and Robertson et al. (2000) Carcinogenesis 21:461-467. Indeed, methylation of CpG is one mechanism by which eukaryotic gene expression is repressed. Accordingly, digestion of cellular DNA with a methylation-sensitive restriction enzyme (i.e., one that does not cleave methylated DNA), especially one with the dinucleotide CpG in its recognition sequence, such as, for example, Hpa II, generates small fragments from unmethylated CpG island DNA. For example, upon the complete digestion of genomic DNA with Hpa II, the overwhelming majority of DNA will remain >3 kb in size, whereas DNA fragments of approximately 100-200 bp will be derived only from demethylated, CpG-rich sequences, i.e., the CpG islands of active genes. Such small fragments are enriched in regulatory regions that are active in the cell from which the DNA was derived. They can be purified, for example, by differential solubility or size selection, cloned to generate a library, and their nucleotide sequences determined and placed in one or more databases. Arrays comprising such sequences can be constructed.
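The effect of a methylation-sensitive digest can be illustrated in silico. The following minimal sketch assumes a single DNA strand given as a string, treats Hpa II as cutting within CCGG between the two cytosines, and blocks the cut when the internal cytosine is listed as methylated. The function names, the toy sequence and the size threshold are hypothetical.

```python
def hpaii_fragments(sequence, methylated_positions=()):
    """Cut at every CCGG site (C^CGG) whose internal cytosine is unmethylated.

    methylated_positions holds 0-based indices of methylated cytosines.
    Returns the resulting fragments in order.
    """
    seq = sequence.upper()
    methylated = set(methylated_positions)
    cut_points = []
    for start in range(len(seq) - 3):
        if seq[start:start + 4] == "CCGG" and (start + 1) not in methylated:
            cut_points.append(start + 1)  # cut between the first and second C
    fragments = []
    previous = 0
    for cut in cut_points:
        fragments.append(seq[previous:cut])
        previous = cut
    fragments.append(seq[previous:])
    return fragments


def small_fragments(fragments, max_length):
    """Keep fragments at or below max_length, mimicking a size selection."""
    return [fragment for fragment in fragments if len(fragment) <= max_length]


if __name__ == "__main__":
    toy = "AACCGGTTTTCCGGAA"  # hypothetical sequence with two CCGG sites
    print(hpaii_fragments(toy))                             # both sites cut
    print(hpaii_fragments(toy, methylated_positions=[11]))  # second site blocked
```

With the second site methylated, the fragment spanning it stays long, which is the basis for separating unmethylated CpG island DNA by size.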
Digestion with methylation-sensitive enzymes, optionally in the presence of one or more additional nucleases, can be conducted in whole cells, in isolated nuclei, with bulk chromatin or with naked DNA obtained after stripping proteins from chromatin. In all instances, relatively small fragments are excised and these can be separated from the bulk chromatin or the longer DNA fragments corresponding to regions containing methylated CpG dinucleotides. The small fragments including non-methylated CpG islands can be isolated from the larger fragments using various size-based purification techniques (e.g., gel electrophoresis, sedimentation and size-exclusion columns) or differential solubility (e.g., polyethyleneimine, spermine, spermidine), for example. As indicated above, a variety of methylation-sensitive restriction enzymes are commercially available, including, but not limited to, Dpn II, Mbo I, Hpa II and Cla I. Each of the foregoing is available from commercial suppliers such as, for example, New England BioLabs, Inc., Beverly, Mass.
In another embodiment, enrichment of regulatory sequences is accomplished by digestion of deproteinized genomic DNA with agents that selectively cleave AT-rich DNA. Examples of such agents include, but are not limited to, restriction enzymes having recognition sequences consisting solely of A and T residues, and single strand-specific nucleases, such as S1 and mung bean nuclease, used at elevated temperatures. Examples of suitable restriction enzymes include, but are not limited to, Mse I, Tsp509 I, Ase I, Dra I, Pac I, Psi I, Ssp I and Swa I. Such enzymes are available commercially, for example, from New England Biolabs, Beverly, Mass. Because of the concentration of GC-rich sequences within CpG islands (see, above), large fragments resulting from such digestion generally comprise CpG island regulatory sequences, especially when a restriction enzyme with a four-nucleotide recognition sequence consisting entirely of A and T residues (e.g., Mse I, Tsp509 I), is used as a digestion agent. Such large fragments can be separated, based on their size, from the smaller fragments generated from cleavage at regions rich in AT sequences. In certain cases, digestion with multiple enzymes recognizing AT-rich sequences provides greater enrichment for regulatory sequences.
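The complementary strategy, in which AT-rich DNA is cut and the large GC-rich fragments are retained, can be sketched in the same way. The example assumes Mse I cutting within TTAA after the first T; the function names, the toy sequence and the length threshold are hypothetical and far smaller than real fragment sizes.

```python
def msei_fragments(sequence):
    """Cut a single-stranded sequence string at every TTAA site (T^TAA)."""
    seq = sequence.upper()
    fragments = []
    previous = 0
    for start in range(len(seq) - 3):
        if seq[start:start + 4] == "TTAA":
            cut = start + 1  # cut after the first T of the site
            fragments.append(seq[previous:cut])
            previous = cut
    fragments.append(seq[previous:])
    return fragments


def gc_content(fragment):
    """Fraction of G and C bases in a fragment (0.0 for an empty string)."""
    if not fragment:
        return 0.0
    return (fragment.count("G") + fragment.count("C")) / len(fragment)


def large_fragments(fragments, min_length):
    """Keep fragments of at least min_length, mimicking a size selection."""
    return [fragment for fragment in fragments if len(fragment) >= min_length]


if __name__ == "__main__":
    toy = "GCGCGCGCGCTTAAATTTAAAT"  # GC-rich block followed by AT-rich DNA
    for fragment in large_fragments(msei_fragments(toy), min_length=10):
        print(fragment, f"GC content {gc_content(fragment):.0%}")
```

In this toy case only the GC-rich block survives the size selection, illustrating why large fragments from such digests tend to carry CpG island sequences.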
Alternatively, or in addition to a size selection, large, CpG island-containing fragments generated by these methods can be subjected to an affinity selection to separate methylated from unmethylated large fragments. Separation can be achieved, for example, by selective binding to a protein containing a methylated DNA binding domain (Hendrich et al. (1998) Mol. Cell. Biol. 18:6538-6547; Bird et al. (1999) Cell 99:451-454) and/or to antibodies to methylated cytosine. Unmethylated large fragments are likely to comprise regulatory sequences involved in gene activation in the cell from which the DNA was derived. As with other embodiments, polynucleotides obtained by the aforementioned methods can be cloned to generate a library of regulatory sequences and/or the regulatory sequences can be immobilized on an array.
Regardless of the particular strategy employed to purify the unmethylated CpG islands from other fragments, the isolated fragments can be cloned to generate a library of regulatory sequences. The nucleotide sequences of the members of the library can be determined, optionally placed in one or more databases, and compared to a genome database to map these regulatory regions on the genome.
In yet a further preferred embodiment of the method according to the invention, the said conversion of cytosines further comprises the analysis of chemically unmodified DNA, in particular loss of gene imprinting in at least one region of the chromosome. In order to determine chemically unmodified CpG positions, any known method to identify and discriminate unmodified DNA can be used, as described herein and in the respective literature. In a preferred embodiment of the method according to the present invention, the analysis of the accessibility status comprises a method selected from methylation specific enzymatic digests, bisulphite sequencing, analysis selected from promoter modification analysis, CpG island analysis, MSP, HeavyMethyl, MethyLight, Ms-SNuPE or any other method relying on a detection of amplified DNA. These methods are well known to the person of skill, and can be found in the respective literature.
The term “region” in the context of methylation analysis refers to a part of the chromosome containing CpG positions, in particular in parts that are relevant for the regulation of the expression of genes, such as in promoters and other regulatory elements. In particular, promoter regions as well as exon intron borders can be considered as respective regions. These regions at the same time are a preferred subgroup of regions to analyze the chromatin structure. In a further preferred embodiment of the method according to the invention, the regions that are analyzed that are specific for a cell-type and/or cellular state in the genome of said cell are selected from regions comprising a gene selected from FOXP3, GNLY, CD3, platelet glycoprotein IX (GP9); low affinity immunoglobulin epsilon Fc-receptor (FCER2); protein S100-P (S100 calcium-binding protein P); homeodomain-interacting protein kinase 3 (HIPK3); transmembrane 4 L6 family member 19 (TM4SF19); CD160 antigen precursor (Natural killer cell receptor BY55) (CD160); LIM domain-binding protein 2 (LDB2); CD19; CD3; CD8; CD15; CD14; CD56; and CD4. In a further preferred embodiment of the method according to the invention, the regions that are analyzed that are specific for a cell-type and/or cellular state in the genome of said cell are selected from the amplicons for the genes as analyzed as described herein (SEQ ID Nos. 16 to 22), as well as the respective ROIs (regions of interest, SEQ ID Nos. 23 to 29) which include the amplicon sequences.
As other examples, lung-specific gene products include the lung surfactant proteins SP-A, SP-B, SP-C, SP-D, and Clara cell secretory protein (CCSP). Other genes for certain cell-types are known to the person of skill, and can be identified using, e.g., gene expression profiling, preferably using chip assay technology, or are listed in respective databases, such as the LSPD (Liver Specific Gene Promoter Database, http://rulai.cshl.org/LSPD/index.html) or described in the literature, such as in Abbas et al. (Abbas A R, Baldwin D, Ma Y, Ouyang W, Gurney A, Martin F, Fong S, van Lookeren Campagne M, Godowski P, Williams P M, Chan A C, Clark H F. Immune response in silico (IRIS): immune-specific genes identified from a compendium of microarray expression data. Genes Immun. 2005 June; 6(4):319-31.). The data can be preferably entered into a knowledge base, as described above.
Whereas previous analyses focused on the measurement of the methylation status, i.e., the difference of copy numbers found for methylated to unmethylated DNA in a particular DNA region, the most preferred embodiment of the present invention is independent from any modification occurring in genomic DNA. By determining fully accessible chromatin and DNA, methylation or any other modification becomes irrelevant, since only those DNA copies are accessed that are fully unmodified. Within this fully unmodified DNA, the present invention analyzes cell type specifically accessible genes and/or regions and compares them to genes and/or regions that are accessible in a larger, preferably arbitrarily selected, subgroup of cells, which may consist of two, three or all cell types in an organism (e.g., accessible FOXP3 compared with accessible CD3, compared with accessible GAPDH). Based on such a system, the relative quantification of cell numbers of different cell subtypes is possible. Such direct multiplexable inter-subtype comparison is only possible with the system according to the present invention, and is not possible when differential methylation or other techniques are used.
As stated above, the present invention uses the accessibility of DNA as a prerequisite for the activity of genes. The present invention demonstrates this using GAPDH as a housekeeping gene (HKG) and a model for the general and permanent accessibility of the DNA in the promoter of HKGs. This permanent accessibility of the DNA in the promoter does not exist in cell type specific genes. Thus, a measurement, determination and analysis of the ratio of the number of specific open (accessible) genes to the number of open HKGs serves as a quantifier for the relative cell number of specific cells or specific cellular statuses. Preferably, fully bisulfite converted DNA can be used for this purpose, by comparing the amount of fully bisulfite converted DNA in a cell specific gene locus with a cell unspecific (or a pre-selected subgroup of these genes, such as, for example, the CD3 subgroup or the like) gene locus, thus reflecting the specific cells or specific cellular states, and—in addition—preferably also cell counts and quality.
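The ratio described above can be written down as a short calculation. This is a minimal sketch under stated assumptions: the function name relative_cell_fraction and the copies-per-cell parameters are hypothetical, and the default assumes that both loci are present in the same number of copies per cell. A locus with a different copy number per cell (for example an X-linked gene) would need the corresponding correction.

```python
def relative_cell_fraction(specific_copies, housekeeping_copies,
                           specific_copies_per_cell=2,
                           housekeeping_copies_per_cell=2):
    """Estimate the fraction of all cells in which the specific locus is accessible.

    specific_copies      -- measured copies of the fully converted (accessible)
                            cell-type-specific locus
    housekeeping_copies  -- measured copies of the fully converted housekeeping
                            locus (e.g., GAPDH), taken as a proxy for all cells
    The copies-per-cell arguments are assumptions made for this illustration.
    """
    if housekeeping_copies <= 0:
        raise ValueError("housekeeping copy number must be positive")
    specific_cells = specific_copies / specific_copies_per_cell
    all_cells = housekeeping_copies / housekeeping_copies_per_cell
    return specific_cells / all_cells

# Hypothetical measurement: 1500 accessible CD3 copies against 6000 GAPDH copies.
fraction = relative_cell_fraction(1500, 6000)  # 0.25, i.e. 25% of all cells
```

Because both copy numbers are read against the same equimolar standard, the ratio is dimensionless and can be compared between samples.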
A housekeeping gene (HKG) is typically a constitutively expressed gene that is transcribed at a relatively constant level and in all living cells. The housekeeping gene's products are typically needed for maintenance of the (any) cell. Expression of housekeeping genes is used as an internal standard in (reverse transcription) quantitative polymerase chain reactions since it is generally assumed that their expression is unaffected by experimental conditions. Some common housekeeping genes utilized include: Actin, GAPDH, aldolase, hexokinase, cyclophilin. Housekeeping genes possess CpG-rich islands at the promoter region that are defined by their chromatin accessibility (and transcriptional activity) in all cell types, whereas cell-specific genes are shut down and their chromatin is inaccessible to the transcription machinery in all tissues except the tissue where the gene product is required and, hence, the gene is expressed. These chromatin accessibility patterns obviously correlate with gene expression. Therefore, in a further preferred embodiment of the method according to the invention, said regions that are unspecific for a cell-type and/or cellular state are selected from regions comprising a housekeeping gene, such as GAPDH.
In a preferred example, both types of genes (i.e., those with a cell type specifically accessible/inaccessible chromatin and one or more of a generally accessible chromatin structure) in their accessible version are subjected to conversion by bisulfite, amplified and cloned into a plasmid. Further preferred is a plasmid that contains constructs of all genes in question in their fully accessible states, in a form equivalent to their fully bisulfite converted amplificates; a recombinant control plasmid is described in the examples and Figures below. In a preferred aspect, quantification for a real time PCR assay is achieved by providing such a standardizing plasmid, whose amount is determined from its optical density by absorption measurement on a NanoDrop instrument, or by alternative methods such as UVette analysis or the Qubit system (Invitrogen). Based on this measurement, the concentration of the plasmid is determined and a standard series is prepared by serial dilution of the measured plasmid. By this means (i.e., when a single plasmid contains all tested genes), a standard is prepared and determined (provided) that is exactly equimolar for all genes on the plasmid. While this absolute equimolarity is a preferred embodiment, and the present inventors propose to use this standardization system for all samples, an analysis is also envisaged with a similar system in which various different standards are employed, which might be on different plasmids or might not consist of plasmid or DNA standards at all. Most preferred, said system comprises suitable regions to be amplified of CD3, FOXP3, GNLY, CCR6, CAMTA1, GP9, FCER2, S-100-P, HIPK-3, TM4SF19; CD160 antigen precursor, LDB2, CD19, and GAPDH.
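The standardization by a measured and serially diluted plasmid can be sketched as follows. This is an illustration under stated assumptions rather than the inventors' protocol: the average mass of 650 g/mol per base pair is a commonly used approximation, the plasmid length, dilution factor and Ct values are hypothetical, and all function names are invented for this example. Because all amplicons sit on one plasmid, the same copy numbers apply to every gene on it.

```python
import math

AVOGADRO = 6.022e23        # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one double-stranded base pair

def plasmid_copies_per_ul(conc_ng_per_ul, plasmid_length_bp):
    """Convert a measured DNA concentration into plasmid copies per microlitre."""
    grams_per_ul = conc_ng_per_ul * 1e-9
    moles_per_ul = grams_per_ul / (plasmid_length_bp * BP_MASS_G_PER_MOL)
    return moles_per_ul * AVOGADRO

def dilution_series(stock_copies_per_ul, dilution_factor=10, steps=6):
    """Copy numbers of a serial dilution, starting with the undiluted stock."""
    return [stock_copies_per_ul / dilution_factor ** i for i in range(steps)]

def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def copies_from_ct(ct, slope, intercept):
    """Read the copy number of an unknown sample off the standard curve."""
    return 10 ** ((ct - intercept) / slope)

# Hypothetical standard: four ten-fold dilutions with invented Ct values.
standards = dilution_series(1e6, dilution_factor=10, steps=4)
slope, intercept = fit_standard_curve(standards, [20.08, 23.40, 26.72, 30.04])
unknown_copies = copies_from_ct(25.0, slope, intercept)
```

A separate curve would be fitted for each amplicon on the plasmid, all against the same dilution series, so that copy numbers of different genes are directly comparable.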
She et al (in: She X, Rohl C A, Castle J C, Kulkarni A V, Johnson J M, Chen R. Definition, conservation and epigenetics of housekeeping and tissue-enriched genes. BMC Genomics. 2009 Jun. 17; 10:269) describe housekeeping genes (HKG) as constitutively expressed in all tissues while tissue-enriched genes (TEG) are expressed at a much higher level in a single tissue type than in others. HKGs serve as valuable experimental controls in gene and protein expression experiments, while TEGs tend to represent distinct physiological processes and are frequently candidates for biomarkers or drug targets. The genomic features of these two groups of genes expressed in opposing patterns may shed light on the mechanisms by which cells maintain basic and tissue-specific functions. Gene expression profiles of 42 normal human tissues on custom high-density microarrays were generated to systematically identify 1,522 HKGs and 975 TEGs and compile a small subset of 20 housekeeping genes which are highly expressed in all tissues with lower variance than many commonly used HKGs.
In another aspect of the method according to the present invention, the biological sample is selected from the group consisting of blood or fractions thereof, saliva, buccal, tears, semen, urine, sweat, fecal material, skin and hair. Alternatively, fixed samples on, for example, histological slides, can also be used.
In yet another preferred aspect of the method according to the present invention, the cell type is selected from an immune cell, such as a CD19+ B cell, CD3+ CD8+ cytotoxic T cell, CD15+ granulocyte, CD14+ monocyte, CD56+ natural killer cell, CD4+ helper T cell; kidney cell; bone cell; neuronal cell; blood cell; lung cell; colon cell; and a precursor of any of these, excluding human embryonic stem cells.
In the context of the present invention, a “cell status” shall mean the biological activity of the cell in its life cycle, such as cell division, apoptosis, resting stage, chromosomal replication, production of enzymes and/or secretion of factors, and the like.
In yet another preferred aspect of the method according to the present invention, the method further comprises a diagnosis of a predisposition to a disease or a disease based on said identification. This may be achieved by determining the amount of cells with a specific cell identifier compared with the amount of all cells.
Treating a disease includes inhibiting or preventing the partial or full development or progression of a disease (e.g., ovarian cancer and/or breast cancer), for example in a person who is known to have a predisposition to a disease. An example of a person with a known predisposition is someone having a history of cancer in his or her family, or who has been exposed to factors that predispose the subject to a condition, such as exposure to radiation. Furthermore, treating a disease refers to a therapeutic intervention that ameliorates at least one sign or symptom of a disease or pathological condition, or interferes with a pathophysiological process, after the disease or pathological condition has begun to develop. By way of example, in cancer a treatment can be selected from chemotherapy, radiotherapy, or surgical removal of the affected tissue and/or surrounding area, and combinations of the given treatment options. Another aspect of the present invention relates to a method for diagnosing or prognosing development or progression of a disease, such as cancer, in a subject, comprising a method according to the present invention, and diagnosing or prognosing development or progression of said disease based on said determinations. In a preferred embodiment, this includes the determination of the immune status of an individual, for which the present inventors assume that this method provides the best known means. The knowledge of the immune status may also aid in predicting the treatment as well as the general prognosis of a patient.
The results of the comparisons as above can also be used to diagnose or provide a prognosis of progression of a disease, such as cancer, in a subject. The patterns of expression can also be used to screen for therapeutic agents for the treatment of a disease, such as cancer, or for monitoring response to therapy in a subject, by looking for a return of the patterns of expression of the disease, such as cancer, toward a non-tumor tissue pattern.
Preferred is a method according to the present invention, wherein the disease is selected from the group consisting of immune diseases or conditions, cancer, birth defects, mental retardation, obesity, neurological disease, diabetes, and gestational diabetes. Preferably, said cancer is selected from the group consisting of colorectal cancer, esophageal cancer, stomach cancer, leukemia/lymphoma, lung cancer, prostate cancer, uterine cancer, breast cancer, skin cancer, endocrine cancer, urinary cancer, pancreatic cancer, other gastrointestinal cancer, ovarian cancer, cervical cancer, head cancer, neck cancer, and adenomas.
Yet another aspect of the invention relates to a method for monitoring the effect of a drug on the relative amount of a specific type and/or the state of a mammalian cell in a sample obtained from a mammal, comprising performing the method according to the present invention in a mammal treated with said drug, and comparing the relative amount of said specific type and/or the state of said mammalian cell with an untreated sample. The presence and change of numbers, quantities or ratios of cells with an open chromatin at particular marker genes as discussed herein (either increase or decrease) is indicative for a response and/or likelihood of a response of said patient to said treatment. No change in the cell specifically accessible gene copies usually indicates no effect of the therapy as chosen. Monitoring or predicting can also be combined with other methods, such as, for example, CA125 blood tests and/or CT scans or ultrasound scans that are known in the art.
Preferred is a method according to the present invention, wherein the biological sample is selected from the group consisting of blood or fractions thereof, saliva, buccal, tears, semen, urine, sweat, faecal material, skin and hair as also mentioned above. Furthermore, the sample can comprise any type of cells from healthy or diseased solid tissue, such as but not limited to, heart, liver, brain, kidney, bladder, muscle, cartilage, bone, colon, stomach, breast, pancreas, and/or thyroid. Preferably, the cell type is selected from an immune cell, such as a CD19+ B cell, CD3+ CD8+ cytotoxic T cell, CD15+ granulocyte, CD14+ monocyte, CD56+ natural killer cell, CD4+ helper T cell; kidney cell; bone cell; neuronal cell; blood cell; and a precursor of any of these, excluding totipotent human embryonic stem cells.
Again, preferred is a method according to the present invention, wherein the disease is selected from the group consisting of immune diseases or conditions, cancer, birth defects, mental retardation, obesity, neurological disease, diabetes, and gestational diabetes. Preferably, said cancer is selected from the group consisting of colorectal cancer, esophageal cancer, stomach cancer, leukemia/lymphoma, lung cancer, prostate cancer, uterine cancer, breast cancer, skin cancer, endocrine cancer, urinary cancer, pancreatic cancer, other gastrointestinal cancer, ovarian cancer, cervical cancer, head cancer, neck cancer, and adenomas.
Still another aspect then relates to a method for determining the presence of a cell in a biological sample, comprising the step of determining the chromatin structure of at least one of the genes selected from the group of GP9, FCER2, S100P, HIPK3, TM4SF19, CD160, and LDB2, whereby a profile (or panel) is generated that is specific for the cellular characteristics of a selected cell.
Thus, in another aspect thereof, the present invention refers to specific marker genes as mentioned above whose chromatin structure as determined using, for example, real time PCR analysis of completely bisulfite converted, totally cytosine free DNA, was surprisingly found to be indicative for certain cell types and statuses. In one aspect of the present invention, these genes can therefore be used as preferred examples to characterize cell types and statuses in biological samples that contain a mixture of, for example, cells of unknown type and/or status.
Preferred is a method according to the present invention, wherein the chromatin structure of at least two of said genes, preferably one cell type specific and one generally accessible, is determined. Preferred is further a method according to the present invention, wherein the chromatin structure of all of said genes is determined. The genes that form the basis of the present invention are preferably to be used to form a “gene panel”, i.e. a collection comprising their respective informative chromatin structure, for a particular cell type or a group (set) of cell types, respectively. The formation of gene panels allows for a quick and specific analysis which is indicative for particular cell types and statuses. The gene panel(s) as described and employed in this invention can furthermore be used with surprisingly high efficiency for the determination of the presence of a cell in a biological sample. In addition, the use of multiple CpG sites from a diverse array of genes allows for a high degree of sensitivity and specificity in comparison to single gene diagnostic and detection tools.
In another aspect thereof, the method according to the present invention further comprises a specific selection of the genes to be determined based on the quality of the chromatin structure analysis for a selected cell type. That is, while the broad panel with the fourteen genes may be employed for distinguishing a group of cell types with a high resolution and quantifying accuracy, it is possible to use only a fraction of the panel for the identification and quantification of a smaller group of cell types. In cases where it is known that not all cell types may be present or relevant for the analysis, a selected panel consisting of fewer markers than all fourteen may be used. For example, in a case where a differentiation experiment is performed with stem cells that produce only a subset of cell types within the respective differentiation pathway, there is demand only for the analysis of some cell types and their differential chromatin structure patterns. In this case, a mini-marker panel may be generated for this purpose, with fewer than the full set of fourteen marker regions, to achieve both identification and quantification of the investigated population. In the most extreme case, such a mini panel may, depending on the question addressed, consist of no more than one marker region. This minimal scenario is feasible in two situations. On the one hand, such a situation could occur when only two cell types are possibly present in a heterogeneous cell population. In this case, a single marker region that has a consistent and distinctive chromatin structure between the two cell types is capable of distinguishing them. Alternatively, such a minimal panel is applicable in a more heterogeneous mixture (i.e., more than two cell types are possibly present), when the single investigated region is known to be exclusively accessible in one particular cell type, while it is in the opposite state in all other cells.
In this latter case and when the addressed question only refers to the quantification and analysis of the presence of this particular one cell type, a single region from the large panel of regions is eligible.
It is also an alternative embodiment of the invention to base the determination of the presence of a cell type in a biological sample on a combination of different regions as described above. To increase the likelihood of a correct determination, it is preferred that the distinctive chromatin structure of several selected additional regions is investigated. It is preferred that in such a “mini panel” comprising one or more regions, additional regions are used up to a number of seven. The preferred number of regions to be added to such a mini panel would be one or more out of the regions of the genes as described above. Especially preferred would be a combined analysis of up to two regions as described, in order to distinguish between cell types and statuses with a sufficiently high level of quality of said analysis. Other preferred combinations would comprise the chromatin structure analysis of 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, and 14 regions. Wherever in the following the invention is described specifically for a particular cell type that is detected, it is meant to also include a combination of one region of the above panel with one or more of the regions as indicated.
Certain markers demonstrated regions of specific genes that were particularly distinctive in some cell types relative to the other cell types tested. For example, with respect to regions, in particular open chromatin regions:
as an example for 2 regions, the regions of GP9 and FCER2 are most accessible in CD19+ B cells (BCL05),
as an example for 2 regions, the regions of S100P and HIPK3 are most accessible in CD15+ granulocytes (GRC01),
as an example for 1 region, the region of TM4SF19 is most accessible in CD14+ monocytes (MOC02), and
as an example for 2 regions, the regions of CD160 and LDB2 are most accessible in CD56+ natural killer cells (NKC02) (see FIG. 5).
As a preferred example, the regions of the two genes S100P and HIPK3 thus can be used in order to distinguish between CD15+ granulocytes and the other immune cells as described above. Similarly, CD160 thus can be used in order to distinguish between CD56+ natural killer cells and the other immune cells as described above (see FIG. 5).
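The assignment of marker regions to cell types given above lends itself to a simple lookup from which a mini panel can be assembled. The following is only a sketch: the dictionary reproduces the marker-to-cell-type assignments stated in the text, whereas the function names and the choice to append GAPDH as the generally accessible reference region are assumptions made for illustration.

```python
# Marker regions reported above as most accessible in the given cell type.
MARKER_PANEL = {
    "CD19+ B cell": ["GP9", "FCER2"],
    "CD15+ granulocyte": ["S100P", "HIPK3"],
    "CD14+ monocyte": ["TM4SF19"],
    "CD56+ natural killer cell": ["CD160", "LDB2"],
}

def mini_panel(cell_types, reference="GAPDH"):
    """Collect the marker regions for the cell types of interest.

    A generally accessible reference region is appended so that the panel
    holds at least one cell type specific and one unspecific region.
    """
    regions = []
    for cell_type in cell_types:
        regions.extend(MARKER_PANEL[cell_type])
    return regions + [reference]

def cell_types_for_region(region):
    """Return the cell types in which a marker region is listed as accessible."""
    return [ct for ct, regions in MARKER_PANEL.items() if region in regions]

# Mini panel for a question that concerns granulocytes only.
panel = mini_panel(["CD15+ granulocyte"])
```

For a question limited to one cell type, the resulting panel is the smallest one that still allows normalization against a housekeeping region.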
The present invention may further be described in certain embodiments as a kit for use in determining the presence or status of a cell in a biological sample through testing of a biological sample. A representative kit may comprise one or more nucleic acid segments as described above that selectively hybridize to the DNA of at least one of the above regions, and a container for each of the one or more nucleic acid segments. In certain embodiments the nucleic acid segments may be combined in a single vial. In further embodiments, the nucleic acid segments may also include a pair of primers for amplifying the target region. Such kits may also include any buffers, solutions, solvents, enzymes, nucleotides, or other components for hybridization, amplification or detection reactions. Preferred kit components further include reagents for methylation analysis, reverse transcription-PCR, in situ hybridization, Northern analysis, restriction polymorphism analysis (RPA), DNAse I digestion, ChIP Chip®, chromatin immunoprecipitation microarray, quantitative PCR analysis, selective precipitation or conversion of cytosines with bisulfite, or combinations thereof. Preferred kits may also include any other components for determining the chromatin structure of at least one of the genes selected from the group of CD3, FOXP3, GNLY, CCR6, CAMTA1, GP9, FCER2, S-100-P, HIPK-3, TM4SF19; CD160 antigen precursor, LDB2, CD19 and GAPDH.
The kits according to the present invention may also contain: 1. Chemicals (bisulfite, etc.) for processing the cell samples; 2. Procedure protocols; 3. Oligonucleotide probes, amplicons, blockers or extension primers according to the present invention that will detect marker regions relevant to a particular cell type or status. The oligonucleotides would be constructed to generate a signal on a commonly available detection platform, such as Real Time-PCR (RT-PCR) or Single Base Extension (SBE). Each signal indicates the level of methylation at a particular target site in the sample. As an alternative, probes according to the described nucleic acids could be produced for usage on a chip; 4. A bioinformatic tool to process the results. Such software might, e.g., normalize the signals from the raw data, contain a result matrix for interpretation of the read-out, or implement various algorithms that calculate, for example, cell type proportions or potency predictions.
Another preferred aspect of the present invention relates to a kit for identifying a specific type and/or state of a mammalian cell in a sample obtained from a mammal comprising materials for performing a method according to the present invention. In one preferred embodiment according to the present invention, the kit comprises a) a bisulfite reagent, and b) materials for the methylation analysis of CpG positions in the regions of interest. The person of skill will furthermore be able to select materials for specific subsets of CpG positions in order to minimise the number of sites to be analyzed. The kit can be a diagnostic kit.
The data as generated in the context of the present invention show that DNA accessibility to bisulfite conversion in the CD3 gene defines CD3 expressing T-lymphocytes, as much as Foxp3 accessibility defines Tregs (Huehn, J., J. K. Polansky, and A. Hamann, Epigenetic control of FOXP3 expression: the key to a stable regulatory T-cell lineage? Nat Rev Immunol, 2009. 9(2): p. 83-9), since the observed correlation coefficient of above 0.8 equals the specificity of the Foxp3 TSDR to Tregs. Remaining result variability between FACS and epigenetic measurements may be attributed in part to systematic variations, including lacking precision in defining the general leukocyte count by flow cytometry. Since flow cytometric cell counting was performed about 6 hours after blood draw and epigenetic analyses one year later, the inventors propose that retrospective analyses of frozen blood samples become feasible using epigenetic immuno-phenotyping. Expression of granulysin is a functional characteristic of cytotoxic T and NK cells, which has been reported to be impaired during cancer development (Kishi, A., et al., Differential expression of granulysin and perforin by NK cells in cancer patients and correlation of impaired granulysin expression with progression of cancer. Cancer Immunol Immunother, 2002. 50(11): p. 604-14) and has been suggested to correlate with the prognosis of tumour patients (Galon, J., et al., Type, density, and location of immune cells within human colorectal tumors predict clinical outcome. Science, 2006. 313(5795): p. 1960-4). The data according to the invention show that accessibility to bisulfite conversion of the granulysin gene is an intrinsic property of CD3−CD56+ NK and CD3+CD56+ NKT cells that correlates with the amount of NK and NKT cells in whole blood as detected by flow cytometric analysis.
Thus, and despite the fact that epigenetic accessibility of the granulysin gene is also shared by a fraction of cytotoxic T cells, this epigenetic marker appears to be a good indicator for NK cell-led immune reactions. In order to further enable epigenetic immune phenotyping, the inventors developed a general standard system for whole cell counting. As with mRNA and certain protein technologies, the inventors used a housekeeping gene, namely the regulatory region of GAPDH, as normalizing standard. High conservation, putative functional non-redundancy and structural similarity to the cell specific epigenetic marker systems (i.e., presence of potentially differential bisulfite convertibility) were the criteria for this selection. The inventors show that GAPDH is fully accessible in all living cells, and can therefore be used to determine the overall cell count. To warrant equimolarity of the standardisation process, the inventors have cloned the TpG variants of FOXP3, CD3, GNLY and GAPDH on a single plasmid. Accordingly, quantification becomes fully comparable, regardless of which cell type is to be quantified. The inventors thus provide an integral standard system with which all cell specific parameters can be normalized equally and thus also related to each other.
When the inventors compared the infiltration of immune cells between healthy and tumour tissues, a dramatic increase of Treg numbers was found for all three unrelated tumour entities. For both lung and colorectal cancers, about a doubling of Tregs is observed, and in ovarian tissue the increase is more than 10-fold. In lung and colorectal tumours, the inventors also found a statistically significant reduction of the overall T-lymphocyte counts compared to healthy tissue. Both the decrease of CD3 cells and the increase of Tregs are in agreement with the pathological inability of the body to appropriately counter tumour development. This view is supported by the fact that the relative amount of Tregs within the CD3 cell compartment behaves similarly for all three tumour entities, despite the vastly different cell counts in ovarian compared to lung and colorectal tissue. The observed average increase of the relative Treg to overall T cell ratio in lung, colon and ovarian cancer compared to the respective healthy tissues is 2.5, 2.8 and 5.4 fold, respectively. For the matched pair analysis, possible only in bronchial and colorectal tissues, the inventors find that approximately 95% and 81% of the measured pairs (43 out of 46 pairs and 39 out of 48 pairs, respectively) show an increase in the tumour compared to the healthy adjacent tissue, the proportion being higher in the bronchial than in the colorectal cancer samples. Since sample quality of colorectal tissue was comparably low, one can argue that the difference in the matched pair analysis stems from variances in tissue quality rather than from a biological phenomenon. Thus, the data suggest a remarkable disturbance and a consistent shift of the Treg ratios in all three tumour entities. Taken together with the accepted high specificity of the FOXP3 TSDR assay for the measurement of Tregs, the inventors consider this imbalance an essential determinant rather than a bystander effect of the development of solid tumours.
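The quantities compared in this paragraph, i.e., the Treg to overall T cell ratio, its fold change between tumour and healthy tissue, and the share of matched pairs showing an increase, reduce to simple arithmetic. The sketch below uses invented function names and invented example values; it illustrates the calculation only and does not reproduce the inventors' data.

```python
def treg_to_cd3_ratio(foxp3_copies, cd3_copies):
    """Relative amount of Tregs within the CD3 cell compartment."""
    if cd3_copies <= 0:
        raise ValueError("CD3 copy number must be positive")
    return foxp3_copies / cd3_copies

def fold_change(tumour_ratio, healthy_ratio):
    """Fold increase of the ratio in tumour over healthy tissue."""
    return tumour_ratio / healthy_ratio

def fraction_of_pairs_increased(pairs):
    """Share of matched (healthy, tumour) pairs in which the tumour value is higher."""
    pairs = list(pairs)
    increased = sum(1 for healthy, tumour in pairs if tumour > healthy)
    return increased / len(pairs)

# Hypothetical matched pairs of Treg/CD3 ratios: (healthy, tumour).
example_pairs = [(0.02, 0.05), (0.03, 0.04), (0.05, 0.04), (0.01, 0.03)]
share = fraction_of_pairs_increased(example_pairs)  # 3 of 4 pairs increase
```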
Thus, the data suggest that the ratio of the tolerogenic-to-effector immune system may also be a strong candidate for a targeted anti-tumour strategy, whereby it is important to consider that the known surface markers may not be the ideal candidates since they deplete both activated effector and regulatory T cells. It is further observed that tissue infiltration of all three measured cell subpopulations is significantly higher in lung and colon than in ovarian tissue. Overall, this is true for both healthy and tumour tissue. However, while healthy ovarian tissue has extremely low lymphocyte counts (approximately 15× less Treg, 7.5× less CD3 cells, and 10× less granulysin positive cells when compared to bronchial cancers), in tumour tissues the difference is significantly lower (3× less Treg, 2.8× less CD3 cells and 3.5× less granulysin positive cells, when compared to bronchial cancers). The low lymphocyte count in healthy ovaries is likely due to significantly lower vascularization compared to lung and colon, which are highly vascularized already in the healthy state. The observed adaptation within the tumour tissue suggests that vascularisation in ovaries significantly changes tissue composition, while this is not true for either lung or colorectal tissue, which overall appear to have a stable vascularisation. The relative decrease of CD3 and granulysin positive cells further supports the view that the immune system is repelled during tumourigenesis and only a very limited adaptive and innate immune response is mounted, with the increasing immune cell count in ovarian cancer credited to vascularization rather than an increased immune response. The data corroborate various previous reports that indicated that an increased number of CD3 positive cells is advantageous to effectively counteract tumours and thus leads to a better prognosis when tested in a univariate analysis. The inventors also show that this trend is independent of the particular tumour entity.
Thus, the data support the notion that an enhancement of the adaptive immune response appears to be a useful anti-tumour approach. Despite the dramatic shift of the Treg proportion and the Treg to CD3 ratio in tumour tissues when compared to healthy tissue of the same organ, the inventors did not see an inverse relation between Treg count and survival as had been reported previously. Instead, in two studies with 110 ovarian and 86 colorectal cancer patients, a statistically non-significant but clear trend towards a survival benefit was found for tumour patients with higher Treg levels compared to patients with lower Treg counts. This trend is more pronounced for ovarian cancer, where both the Treg count and the Treg to CD3 cell ratio point to a better survival for patients with higher Tregs. The data for colorectal cancer are more ambiguous: only a slight trend is observed for the overall Treg count, and no trend at all is observed for the ratio of Treg to CD3.
Combining the different datasets uncovers an intrinsic difficulty for the previously reported correlation: the inventors observe a direct linear correlation between the Treg number and the overall T-lymphocyte count in all healthy and tumorous tissues. Although this direct correlation varies in strength, ranging from 0.325 for colorectal cancer to 0.76 for ovarian cancer, it is statistically significant for all healthy and diseased tissues. Therefore, it is not plausible to assume that an increasing CD3 level together with, at the same time, decreasing Tregs leads to an improved outcome, at least not if tested univariately as had been done in previous studies. However, the differing results between the present and the previous data also require functional and technical explanations. In this respect, epigenetic counting is usually a relative and three-dimensional measure, whereas conventional cell counting determines absolute cell numbers regardless of the overall cell density in tissues. In this case, the previous data may reflect higher Treg counts that are due to a higher tumour cell density. Also, a higher density of Tregs may occur in particular areas of the tumour, a phenomenon that epigenetic studies cannot pick up. Biologically, the previous Treg counts were obtained by counting cells that express Foxp3. It is now known that the specificity of Foxp3 expression for Tregs is limited, since activated effector T cells also express Foxp3, while these cells do not have an accessible FOXP3 TSDR. It is thus feasible that a low number of activated effector T cells correlates with an improved outcome.
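The direct linear correlation between Treg and overall T-lymphocyte counts mentioned above can be illustrated with a minimal sketch. It is assumed here that the correlation is expressed as a Pearson coefficient, which the text does not state explicitly; the function name and the sample values are hypothetical and are not the measured data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-sample values: overall T-lymphocyte (CD3) and Treg measurements
cd3_values = [4.0, 6.5, 9.0, 12.0, 15.5]
treg_values = [0.2, 0.5, 0.4, 0.9, 1.0]

print(f"r = {pearson_r(cd3_values, treg_values):.3f}")
```

A positive coefficient means that samples with more CD3 cells also tend to contain more Tregs, which is the reason why the two parameters cannot be treated as independent predictors in a univariate test.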
Taken together, the inventors believe that epigenetic immuno-phenotyping according to the present invention is a novel method that may add new comparability and ease to immuno-phenotyping in blood and, possibly more importantly, in solid tissues. Due to its relative robustness, it may also be the better method for clinical routine measurements, where the requirements of good laboratory practice are difficult to meet. However, before this technology can readily be applied to address immunological questions in solid tissues and blood, a major and orchestrated effort is imperative to discover novel and specific immune cell markers that are equivalent to CD3 or FOXP3 and can be standardized as shown for GAPDH according to the present invention, in order to provide a widely comparable standard system. Biologically, the data presented here firmly establish an outstanding role of CD3 cells and, in particular, Tregs in tumour development. Despite the doubtful prognostic significance of Tregs, the present inventors argue that the dramatic Treg increase in all tumours is not an unfortunate bystander effect, but instead a prerequisite for successful tumour establishment. The inventors thus suggest that Tregs ought to become a prime target for anti-tumour strategies, although such drugs should probably be designed to act not necessarily systemically but rather specifically in the tumour microenvironment. All current data support an important role of CD3 cells in anti-tumour responses. However, the inventors regard the association of Treg counts with CD3 cell counts as a severe impediment to the efficacy of T cell based drugs, since an unselected homing of CD3 cells without an inhibition of Treg homing threatens to always co-attract the activator and its suppressor alike.
In the context of the present invention, it could thus further be shown that promoters, promoter-proximal regions or coding sequences of particular genes show a cell type- or state-specific chromatin structure. The inventors identified gene regions that behave in principle similarly to what is stated above for T cell differentiation and the immune status. In particular, fourteen gene regions have been discovered that are sensitive indicators of cell type or status depending on their degree of chromatin accessibility, as exemplified by their accessibility to the conversion of cytosine to uracil during bisulfite treatment. This accessibility translates into an epigenetic phenotype of these regions that is found to be variable between cell types but consistent within cell types. The regions have been validated on multiple samples of each cell type and will serve as marker regions that can be specifically targeted in assays to determine cell identity, status, purity and potency. Furthermore, they are indicators of essential changes in cellular character and thus are useful in tissue engineering and for monitoring compounds that effect such changes. The inventors describe a series of genetic regions as described herein whose chromatin accessibility and epigenetic (e.g. methylation) patterns are statistically correlated with the type or status of a cell. As such, these regions represent particularly sensitive and useful marker regions for distinguishing cell types.
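The readout principle described above, namely that accessible cytosines are converted to uracil during bisulfite treatment whereas protected cytosines remain unchanged, can be illustrated with a minimal in-silico sketch. The function names, the toy sequence and the example reads are hypothetical; the sketch assumes that uracil is read as thymine after amplification and it does not reproduce any specific assay of the invention.

```python
def bisulfite_convert(sequence, protected_positions=frozenset()):
    """Simulate bisulfite conversion of one DNA strand.

    Cytosines at 0-based indices in `protected_positions` stay 'C';
    all other cytosines are converted (C -> U, read as 'T' after amplification).
    """
    converted = []
    for index, base in enumerate(sequence.upper()):
        if base == "C" and index not in protected_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def converted_fraction(reads, position):
    """Share of reads that carry 'T' (converted cytosine) at a given position."""
    if not reads:
        raise ValueError("at least one read is required")
    converted = sum(1 for read in reads if read[position] == "T")
    return converted / len(reads)


# Toy sequence with cytosines at positions 1 and 4 (hypothetical)
region = "ACGTCG"
accessible_copy = bisulfite_convert(region)                         # "ATGTTG"
protected_copy = bisulfite_convert(region, protected_positions={1})  # "ACGTTG"

# Hypothetical mixture: three accessible copies and one protected copy
reads = [accessible_copy, protected_copy, accessible_copy, accessible_copy]
print(f"converted at position 1: {converted_fraction(reads, 1):.0%}")
```

In such a mixture, the share of converted copies at a marker position serves as a relative measure of the proportion of cells in which the region is accessible.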
SEQ ID No 1 shows the DNA sequence of the insert as cloned into the plasmid as used in the examples.
SEQ ID No 2 to 15 show sequences of the primers as used in the experiments that are specific for bisulfite-converted DNA.
SEQ ID No 16 to 22 show sequences of the amplicons as analyzed in FIG. 5.
SEQ ID No 23 to 29 show sequences of the regions of interest (ROIs) which can also be analyzed in the context of the invention. The ROIs correspond to AMP1583, AMP1584, AMP1588, AMP1589, AMP1594, AMP1599, and AMP1601, respectively.