The leading preventable cause of death and disability in the United States is the chronic use of tobacco products, in particular, cigarettes. In addition to lung cancers, tobacco use plays important direct and indirect roles in the etiology of a wide range of other cancers, including those of the upper aerodigestive tract (e.g., oral cavity, pharynx, larynx, and esophagus), kidney, stomach, bladder, pancreas, uterine cervix, and blood (e.g., certain leukemias). Exposure to tobacco carcinogens and toxins is also a major cause of other diseases of the pulmonary system (e.g., bronchitis, emphysema, chronic obstructive pulmonary disease), the cardiovascular system (e.g., stroke, atherosclerosis, and myocardial infarction), and the female reproductive system (e.g., increased risk of miscarriage, premature delivery, low birth weight, stillbirth, and infant death). While numerous studies have elucidated some of the chemical and biological properties of cigarette smoke that enable it to induce this range of pathologies in the smoker, little is known about the nature and temporal association of the molecular events that drive specific stages in the multi-step processes that result in clinically evident disease. This is due, in part, to the limited number of individual tobacco constituents, such as benzo[a]pyrene, that have been assessed for genetic impact, and to the fact that few studies have attempted to address the synergistic effects on gene expression of the thousands of individual compounds that constitute the various classes of carcinogens in the vapor and particulate phases of tobacco smoke.
Cigarette smoke is primarily a mixture of gases (e.g., nitrogen, oxygen, and carbon dioxide) and suspended particulate material that consists of a wide variety of condensed organic compounds (e.g., ‘tar’). This particulate phase contains the majority of the compounds (at least 60) for which there is sufficient evidence of carcinogenic potential in animals or humans. Presumably, the inherent chemical complexity of cigarette smoke results in an equally complex biological response involving a number of signaling pathways and checkpoints that respond to the direct and indirect stress on the genome in exposed tissues.
There are many available approaches to analyze gene expression after cells are exposed to toxicants. Analysis of gene expression after exposure to cigarette smoke is nontrivial, however, due to the complexity and size of the data sets and the fact that technical variation can be introduced at different stages of analysis. Establishing well-specified and carefully validated procedures for standardization and normalization of the data from individual specimens is therefore very important. For example, selection criteria based on the ratio of measured expression levels fail to account for intra-group variation (e.g., normal biologic variance) and can result in false positive selections. (See Dozmorov et al., J Gerontol A Biol Sci Med Sci 57: B99-108, 2002; Kerr et al., J Comput Biol 7: 819-837, 2000, each of which is expressly incorporated by reference in its entirety).
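The limitation of ratio-based selection noted above can be illustrated with a minimal sketch. The expression values below are hypothetical and invented purely for illustration, and Welch's t statistic is used here only as one familiar variance-aware measure; it is not asserted to be the method of the cited references. Both genes show the same two-fold ratio, but only one of them changes by more than its own within-group scatter.

```python
from math import sqrt
from statistics import mean, variance


def fold_change(control, treated):
    """Ratio of group means (treated / control); ignores within-group scatter."""
    return mean(treated) / mean(control)


def welch_t(control, treated):
    """Welch's t statistic: difference in means scaled by within-group variance."""
    se = sqrt(variance(control) / len(control) + variance(treated) / len(treated))
    return (mean(treated) - mean(control)) / se


# Hypothetical replicate measurements (arbitrary units), not data from the source.
stable_control, stable_treated = [100, 102, 98, 100], [200, 203, 197, 200]
noisy_control, noisy_treated = [40, 160, 30, 170], [90, 310, 60, 340]

for name, ctrl, trt in [
    ("stable gene", stable_control, stable_treated),
    ("noisy gene", noisy_control, noisy_treated),
]:
    print(f"{name}: ratio={fold_change(ctrl, trt):.2f}, t={welch_t(ctrl, trt):.2f}")
```

A ratio cutoff alone would select both genes, whereas the variance-scaled statistic separates the reproducible change from one that is comparable in size to ordinary replicate-to-replicate variation.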
Many available statistical methods also do not adequately address the trade-off between sensitivity and specificity. For example, the common practice of using a permissive threshold for significance (e.g., p<0.05) can result in a large number of false positive selections. This is especially problematic for high-density array analysis, as the number of false positive selections expected to occur by chance may limit the ability to perform higher order analyses, such as those required to identify molecular pathways that contribute to disease or disease sub-phenotyping, which require the accurate prediction of groups of differentially expressed genes. Attempts to increase stringency by applying a stricter significance threshold (i.e., a smaller p-value cutoff) can also be problematic, as this causes a compensatory decrease in sensitivity and a resultant increase in false negative selections. The use of large numbers of replicates can improve the analysis; however, this approach is expensive and labor intensive.
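The scale of the false positive problem can be made concrete with simple arithmetic. The sketch below assumes, for illustration only, an array of 20,000 probes on which no gene is truly differentially expressed; the array size and the function name are hypothetical and do not come from the source. Under that assumption, the expected number of chance selections is the number of tests multiplied by the p-value cutoff.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of chance selections when every tested gene is truly unchanged."""
    return n_tests * alpha


# Hypothetical array size, chosen only to illustrate the arithmetic.
n_probes = 20_000

for cutoff in (0.05, 0.01, 0.001):
    count = expected_false_positives(n_probes, cutoff)
    print(f"p < {cutoff}: about {count:.0f} selections expected by chance")
```

Tightening the cutoff reduces the chance selections in direct proportion, but, as noted above, genes with small genuine changes are then more likely to be missed.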
Hypervariable (HV) analysis, which uses statistically robust delimiters for defining biologically relevant changes in gene expression, can also be used to analyze cells after exposure to a toxicant. (See Dozmorov et al., Physiol Genomics 12: 239-250, 2003; and Glynne et al., Curr Opin Immunol 12: 210-214, 2000, each of which is expressly incorporated by reference in its entirety). Hypervariable analysis is predicated on the observation that a biologically relevant stimulus will alter gene expression such that homeostasis of the transcriptome is disrupted. Accordingly, such a stimulus will modulate the mRNA levels of affected genes such that their expression variance over time exceeds the variance observed in the majority of genes in an unstimulated state. Using HV analysis, relatively small but biologically relevant changes in gene expression can be identified. Despite current advances in gene expression analysis, there remains a need to identify the genetic events and molecular pathways induced by exposure to tobacco (e.g., cigarette) smoke and tobacco (e.g., cigarette) smoke condensates.