Glioblastoma is the most common and most aggressive malignant brain tumor in humans, accounting for 52% of all brain tumor cases and 20% of all intracranial tumors. Meningioma, on the other hand, although benign, accounts for more than 35% of primary brain tumors in the United States and occurs in approximately 7 of every 100,000 people, with an approximate 5-year survival timeline for diagnosed patients. Optimal surgical resection depends primarily on accurate detection of tumor tissue during the resection procedure.
Recently, Confocal Laser Endomicroscopy (CLE) has emerged as a promising in-vivo imaging technology that allows real-time examination of body tissues on a scale that was previously only possible on histologic slices. Neurosurgeons can now use CLE as a surgical guidance tool for brain tumors. However, when performed manually, examination of CLE images can be highly time-consuming and error-prone. Thus, there has been an increasing demand for employing computer vision techniques for brain tumor tissue typing and pathology in the CLE probing process.
Tissues affected by Glioblastoma and Meningioma are usually characterized by sharp granular patterns and smooth homogeneous patterns, respectively. However, the low resolution of current CLE imaging systems, coupled with the presence of both kinds of patterns in the probing area, makes tumor classification extremely challenging for common image classification algorithms. Besides the great variability between images from the same tumor class, the differences between the two classes of tumors are not clearly evident when both granular and homogeneous patterns are present in the image.
Because CLE technology is itself at a nascent stage, only a handful of research efforts address automatic analysis of imagery under this modality. Most prior works in this direction adapt a generic image classification technique based on bag-of-visual-words to perform this task. In this technique, images containing different tumors are first collected, and low-level features (characteristic properties of image patches) are extracted from them as part of the training step. From all images in the training set, representative features (also known as visual words) are then obtained using a vocabulary learning method, usually either unsupervised clustering or a supervised dictionary learning technique. After that, each of the collected training images is represented in a unified manner as a bag, or collection, of visual words in the vocabulary. A classifier is then trained on the unified representation of each image. Given an unlabeled image, features are extracted and the image is in turn represented in terms of the already learned visual words. Finally, the representation is input to the trained classifier, which predicts the label of the given image based on its similarity to previously observed training images.
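The bag-of-visual-words steps described above can be illustrated with a minimal sketch. It is not the method of any particular prior work: the per-patch feature (standard deviation and mean absolute horizontal gradient), the k-means vocabulary with farthest-point initialization, the nearest-centroid classifier, and the synthetic noise images standing in for granular and homogeneous CLE frames are all assumptions chosen for brevity, and every function name is hypothetical.

```python
import numpy as np

def patch_features(image, patch=8):
    """Low-level features per non-overlapping patch: [std, mean |horizontal gradient|].
    This descriptor is an illustrative assumption, not one named in the text."""
    h, w = image.shape
    feats = []
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            p = image[r:r + patch, c:c + patch]
            feats.append([p.std(), np.abs(np.diff(p, axis=1)).mean()])
    return np.array(feats)

def sq_dists(a, b):
    """Squared Euclidean distance between every row of a and every row of b."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)

def learn_vocabulary(features, k, iters=20):
    """Unsupervised vocabulary learning: k-means with farthest-point initialization."""
    centers = [features[0]]
    for _ in range(k - 1):
        d = np.min([((features - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(features[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = sq_dists(features, centers).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster is empty
                centers[j] = features[labels == j].mean(axis=0)
    return centers

def encode(image, vocabulary):
    """Represent an image as a normalized histogram (bag) of visual words."""
    words = sq_dists(patch_features(image), vocabulary).argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

def train_classifier(histograms, labels):
    """Nearest-centroid classifier: one mean histogram per class."""
    labels = np.array(labels)
    classes = np.unique(labels)
    centroids = np.array([histograms[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(image, vocabulary, model):
    """Label an unseen image by the closest class centroid in histogram space."""
    classes, centroids = model
    h = encode(image, vocabulary)
    return str(classes[((centroids - h) ** 2).sum(axis=1).argmin()])

def synthetic_image(kind, rng, size=32):
    """Synthetic stand-ins: uniform noise (granular) or near-constant (homogeneous)."""
    if kind == "granular":
        return rng.random((size, size))
    return 0.5 + 0.01 * rng.standard_normal((size, size))

# Training: collect images, extract features, learn the vocabulary, fit the classifier.
rng = np.random.default_rng(0)
train_images = ([synthetic_image("granular", rng) for _ in range(6)]
                + [synthetic_image("homogeneous", rng) for _ in range(6)])
train_labels = ["glioblastoma"] * 6 + ["meningioma"] * 6

all_feats = np.vstack([patch_features(im) for im in train_images])
vocabulary = learn_vocabulary(all_feats, k=4)
histograms = np.array([encode(im, vocabulary) for im in train_images])
model = train_classifier(histograms, train_labels)

# Inference: encode an unlabeled image with the learned words and classify it.
print(predict(synthetic_image("granular", rng), vocabulary, model))
print(predict(synthetic_image("homogeneous", rng), vocabulary, model))
```

In this toy setting the two classes separate easily because each synthetic image contains only one kind of pattern. Real CLE frames that mix granular and homogeneous regions would yield overlapping histograms, which is the difficulty the surrounding text describes.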
While conventional cell classification procedures provide adequate results in some cases, they are limited in other applications. For example, because different kinds of brain tumors are characterized by different textures, it is practically impossible to use one universal feature space that is discriminative for every tumor class. It is therefore desirable to provide a way to combine feature spaces so that the salient attributes of different tumor classes are captured more effectively.