Lung cancer is the leading cause of cancer death worldwide, and non-small cell lung cancer (NSCLC) accounts for nearly 80% of cases (1). Based on cell morphology, adenocarcinoma and squamous cell carcinoma are the most common types of NSCLC (2). Although the clinical courses of these tumors are similar, adenocarcinomas are characterized by a peripheral location in the lung and often have activating mutations in the K-ras oncogene (3, 4). In contrast, squamous cell carcinomas are usually centrally located and more frequently carry p53 gene mutations (5). Furthermore, the etiology of squamous cell carcinoma is closely associated with tobacco smoking, whereas the cause of adenocarcinoma remains unclear (6, 7). Although many molecular changes associated with NSCLC have been reported (8, 9), the global gene expression pattern associated with these two most common types of lung cancer has not been described. Understanding gene expression patterns in these major tumor types may uncover novel markers for disease detection as well as potential targets for rational therapy of lung cancer.
Several technologies are currently used for gene expression profiling in human cancer (10). SAGE (11) is an open system that rapidly identifies any expressed transcript in a tissue of interest, including transcripts that have not previously been identified. This highly quantitative method can accurately determine the level of expression of each transcript. Comparing SAGE profiles between a tumor and the corresponding normal tissue can readily identify genes differentially expressed between the two populations. Using this method, novel transcripts and molecular pathways have been discovered (12-14). In contrast, cDNA arrays represent a closed system that analyzes relative expression levels of previously known genes or transcripts (15, 16). Because many thousands of genes can be placed on a single membrane or slide for rapid screening, recent studies have used arrays to generate molecular profiles of several human cancers (17-20).
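The comparison of SAGE profiles between tumor and normal tissue can be illustrated with a minimal sketch. It assumes that tag counts are first rescaled to a common library size and then compared as a ratio. The tag names, the counts, the scaling constant, and the pseudocount are all hypothetical choices made for illustration and are not taken from the study.

```python
def normalize(counts, scale=100_000):
    """Rescale raw tag counts so that every library sums to `scale` tags."""
    total = sum(counts.values())
    return {tag: n * scale / total for tag, n in counts.items()}


def fold_changes(tumor, normal, pseudocount=1.0):
    """Return the tumor/normal abundance ratio for every tag.

    The pseudocount is an assumption of this sketch; it avoids division
    by zero for tags that are absent from one of the two libraries.
    """
    t, n = normalize(tumor), normalize(normal)
    tags = set(t) | set(n)
    return {
        tag: (t.get(tag, 0.0) + pseudocount) / (n.get(tag, 0.0) + pseudocount)
        for tag in tags
    }


# Made-up tag counts for two libraries (placeholder tag names, not real SAGE tags)
tumor_library = {"TAG_A": 120, "TAG_B": 5, "TAG_C": 875}
normal_library = {"TAG_A": 10, "TAG_B": 50, "TAG_C": 940}

ratios = fold_changes(tumor_library, normal_library)

# List tags from most tumor-enriched to most normal-enriched
for tag, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{tag}\t{ratio:.2f}")
```

In this toy example TAG_A is enriched in the tumor library and TAG_B in the normal library, while TAG_C is essentially unchanged. A real analysis would also need a statistical test suited to count data.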
Hierarchical clustering is a systematic method widely used in cDNA array data analysis, where the differences between the expression patterns of many genes are generally within a few fold (21). We reasoned that because SAGE is highly quantitative, hierarchical clustering might be used to organize gene expression data generated by SAGE from just a few tissue libraries. To test this, we used SAGE tags from two libraries for each of primary adenocarcinomas, primary squamous cell carcinomas, normal lung small airway epithelial cells (SAEC), and normal bronchial/tracheal epithelial (NHBE) cells, together with a lung adenocarcinoma cell line. SAGE tags showing the highest abundance were subjected to clustering analysis. Although each library was derived from a different individual, normal and tumor samples clustered in two separate branches while tissues of different cell types clustered together. Furthermore, SAGE tags clustered into biologically meaningful groups, revealing the important molecular characteristics of these two most common NSCLC subtypes.
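The idea of clustering libraries by their tag abundances can be sketched as follows. This is only an illustration under stated assumptions: it uses average-linkage agglomerative clustering with a distance of one minus the Pearson correlation on log-transformed counts, which may differ from the metric and linkage actually used in the study. The library names and tag abundances are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def leaves(tree):
    """Flatten a nested merge tree into its list of library names."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for branch in tree for leaf in leaves(branch)]


def cluster(profiles):
    """Average-linkage agglomerative clustering of libraries.

    `profiles` maps a library name to its vector of tag abundances.
    Returns the merge tree as nested 2-tuples of library names.
    """
    # Log-transform so that highly abundant tags do not dominate (an assumption)
    logged = {k: [math.log2(v + 1) for v in vec] for k, vec in profiles.items()}
    names = list(logged)
    # Pairwise distance between libraries: 1 - Pearson correlation
    dist = {
        frozenset((a, b)): 1 - pearson(logged[a], logged[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
    # Each cluster is keyed by its tree and stores its member libraries
    clusters = {name: [name] for name in names}
    while len(clusters) > 1:
        keys = list(clusters)
        best = None
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                pairs = [(p, q) for p in clusters[a] for q in clusters[b]]
                d = sum(dist[frozenset(pq)] for pq in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the two closest clusters into a new tree node
        clusters[(a, b)] = clusters.pop(a) + clusters.pop(b)
    return next(iter(clusters))


# Made-up abundances of five tags in four hypothetical libraries
profiles = {
    "tumor_1": [50, 40, 5, 3, 20],
    "tumor_2": [45, 42, 6, 2, 22],
    "normal_1": [4, 6, 60, 55, 18],
    "normal_2": [5, 3, 58, 61, 21],
}

tree = cluster(profiles)
print(tree)
```

With these toy profiles the two tumor libraries and the two normal libraries fall into separate top-level branches. Transposing the input, so that each vector holds one tag's abundance across libraries, would cluster tags instead of libraries in the same way.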