In recent years, fluorescence in situ hybridization (FISH) has emerged as one of the most significant new developments in the analysis of human chromosomes. FISH offers numerous advantages over conventional cytogenetic techniques because it allows detection of numerical chromosome abnormalities during normal cell interphase.
Historically, FISH and other in situ hybridization results played a primary role in mapping genes on human chromosomes. Results from these experiments were collected and compiled in databases, and this information proved useful during the annotation phase of the Human Genome Project (HGP). Now that the HGP is complete, investigators rarely use in situ hybridization simply to identify the chromosomal location of a human gene. (In species for which the genome has not been sequenced, however, FISH and related in situ hybridization methods continue to provide important data for mapping the positions of genes on chromosomes.) Currently, human FISH applications are principally directed toward clinical diagnoses. Biomedical applications include identification of genetic bases for developmental disabilities, diagnosis and prognosis of diseases such as cancer, identification of pathogens, deduction of evolutionary relationships, and microbial ecology.
The basic elements of FISH are a DNA probe and a target sequence. Before hybridization, the DNA probe is labeled by one of several methods, such as nick translation, random primed labeling, or PCR. Two labeling strategies are commonly used: indirect labeling and direct labeling. For indirect labeling, probes are labeled with modified nucleotides that contain a hapten, whereas direct labeling uses nucleotides that have been directly modified to contain a fluorophore. The labeled probe and the target DNA are denatured, and combining the denatured probe and target allows complementary DNA sequences to anneal. If the probe has been labeled indirectly, an extra step using an enzymatic or immunological detection system is required to visualize the nonfluorescent hapten. Whereas FISH is faster with directly labeled probes, indirect labeling offers the advantage of signal amplification through several layers of antibodies, and it may therefore produce a signal that is brighter relative to background levels.
An important application of FISH is dot counting, i.e., the enumeration of signals (dots) within the nuclei, where the dots in the image represent the inspected chromosomes and, more particularly, the locations at which hybridization with one or more labeled probes has occurred. Dot counting is used for diagnosing numerical chromosomal aberrations, for example, in haematopoietic neoplasia, solid tumors and prenatal diagnosis.
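By way of illustration only, the enumeration of dots within a single nucleus can be sketched as counting connected groups of bright pixels. The sketch below assumes a single-channel intensity image given as a nested list, a fixed intensity threshold, and 4-connectivity; the function name `count_dots`, the threshold value, and the sample data are hypothetical and are not taken from any of the works discussed herein.

```python
from collections import deque


def count_dots(image, threshold):
    """Count 4-connected groups of pixels whose intensity exceeds threshold.

    Assumption: each connected group of above-threshold pixels is one dot.
    Real FISH images would need noise handling and split or merged signals
    to be resolved, which this sketch does not attempt.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    dots = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # A new, unvisited bright pixel starts a new dot.
            dots += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            # Breadth-first flood fill marks every pixel of this dot.
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return dots


# Hypothetical 5 x 6 nucleus region with two separate bright spots.
nucleus = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 8, 0, 0, 0],
    [0, 7, 0, 0, 0, 0],
    [0, 0, 0, 0, 6, 0],
    [0, 0, 0, 0, 0, 0],
]
print(count_dots(nucleus, threshold=5))  # prints 2
```

A fixed threshold is used here only for brevity; the count changes with the threshold, which is one reason automated enumeration is sensitive to image quality.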
A major limitation of the FISH technique for dot counting is the need to examine large numbers of cells. Large numbers are required for an accurate estimate of the distribution of chromosomes over the cell population, especially in applications involving a relatively low frequency of abnormal cells. Visual evaluation of large numbers of cells and enumeration of hybridization signals by a trained cytogeneticist is tedious, time-consuming and expensive. Ideally, the analysis could be expedited by automating the procedure. Unfortunately, many obstacles must be overcome before automated analysis of FISH images can be implemented on a widespread basis. Because signals are distributed in three dimensions within the nucleus, valid signals can be missed. In addition, cells are often poorly defined, boundaries between cells are unclear, cells commonly overlap, and cells are frequently only partially visible in an image.
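The dependence of the required cell count on the frequency of abnormal cells can be illustrated with a short calculation. The sketch below assumes that cells are sampled independently, so that the number of abnormal cells follows a binomial distribution, and uses the normal approximation for the margin of error; both function names and the example values (a 1% abnormal fraction, 95% confidence) are hypothetical and serve only to show the order of magnitude involved.

```python
import math


def cells_needed_to_see_one(abnormal_fraction, confidence=0.95):
    """Smallest n such that at least one abnormal cell is observed with the
    given probability, assuming independent sampling of cells."""
    if not 0 < abnormal_fraction < 1 or not 0 < confidence < 1:
        raise ValueError("abnormal_fraction and confidence must lie in (0, 1)")
    # P(no abnormal cell in n cells) = (1 - p)^n must not exceed 1 - confidence.
    return math.ceil(math.log(1 - confidence) / math.log(1 - abnormal_fraction))


def cells_needed_for_margin(abnormal_fraction, margin, z=1.96):
    """Smallest n for which the normal-approximation margin of error of the
    estimated abnormal fraction does not exceed `margin` (z = 1.96 for ~95%)."""
    if not 0 < abnormal_fraction < 1 or margin <= 0:
        raise ValueError("abnormal_fraction must lie in (0, 1) and margin must be > 0")
    p = abnormal_fraction
    return math.ceil(z * z * p * (1 - p) / margin ** 2)


# With 1% abnormal cells, about 300 cells are needed merely to observe one
# abnormal cell with 95% probability, and over 1500 cells are needed to
# estimate the fraction to within half a percentage point.
print(cells_needed_to_see_one(0.01))
print(cells_needed_for_margin(0.01, margin=0.005))
```

Under these assumptions the required number of cells grows roughly in inverse proportion to the abnormal fraction, which is consistent with the burden of manual evaluation described above.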
B. Lerner et al. proposed a neural network for distinguishing between real signals and artifacts resulting from noise and out-of-focus objects within FISH images (“Feature representation and signal classification in fluorescence in-situ hybridization image analysis”, IEEE Trans. on Systems, Man and Cybernetics A, 31, pp. 655-665 (2001)). The neural network is trained to discriminate between in-focus and out-of-focus images. Images that contain no artifacts are the in-focus images selected for dot count proportion estimation. This approach emphasizes classification of real signals and artifacts. It does not address the problems of further analysis, such as separating overlapping nuclei or dot counting. David and Lerner have applied support vector machines (SVMs) to the analysis of FISH signals (“Signal discrimination using a support vector machine for genetic syndrome diagnosis”, in 17th International Conf. on Pattern Recognition (ICPR2004), 23-26 Aug. 2004, Cambridge, UK). In this work, the authors used SVM classification to separate real signals from artifacts and red signals from green signals. Overlap among the simple binary classes presented a problem that significantly reduced accuracy. An extension of the prior study was reported in “Support vector machine-based image classification for genetic syndrome diagnosis”, Pattern Recognition Letters 26 (2005) pp. 1029-1038. As before, overlap of classes resulted in rejection of patterns that may have been important for accurate classification. The reduction of the available patterns for classification leads to a small and heavily imbalanced database. This imbalance produces biased training in which majority classes dominate the decision boundaries, further reducing the prediction accuracy of the overall method. Accordingly, the need remains for a method and system for accurate automated computer classification of FISH data.
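The effect of a heavily imbalanced database on a classifier can be illustrated with a deliberately extreme case. The sketch below does not reproduce any of the cited methods; it assumes a hypothetical database of 95 majority-class and 5 minority-class patterns and a stand-in "classifier" that always predicts the majority class, which is the limiting case of a decision boundary dominated by the majority class.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most frequent label (stand-in for a fully biased classifier)."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of patterns whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, label):
    """Fraction of patterns of the given class that were predicted correctly."""
    predicted_for_class = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(p == label for p in predicted_for_class) / len(predicted_for_class)


# Hypothetical imbalanced database: 95 majority and 5 minority patterns.
labels = ["majority"] * 95 + ["minority"] * 5
predictions = [majority_baseline(labels)] * len(labels)

print(accuracy(labels, predictions))            # prints 0.95
print(recall(labels, predictions, "minority"))  # prints 0.0
```

Overall accuracy remains high in this example while every minority-class pattern is misclassified, which is why overall accuracy alone can conceal the bias described above.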