The present disclosure is related to an image classifying apparatus, an image classifying method, and an image classifying program that classifies three dimensional images into a plurality of classes, by employing a neural network in which a plurality of processing layers are connected in a hierarchical manner.
Recently, high quality three dimensional images having high resolution are being employed for image diagnosis, accompanying advancements in medical technology (multiple detector CT (Computed Tomography), for example). Here, a three dimensional image is constituted by a great number of two dimensional images, and the amount of data is large. As a result, there are cases in which some time is required for a physician to find a desired portion for observation to perform diagnosis. Therefore, the visibility of organs as a whole or lesions is increased, to improve the efficiency of diagnosis, by specifying organs of interest, and then employing a method such as the MIP (Maximum Intensity Projection) method and the MinIP (Minimum Intensity Projection) method to extract organs of interest from a three dimensional image that includes the organs of interest and to perform MIP display. Alternatively, the visibility of organs as a whole or lesions is increased, by performing VR (Volume Rendering) display of a three dimensional image, to improve the efficiency of diagnosis.
In addition, in the case that a three dimensional image is VR displayed, structures of interest, such as organs, tissues, and structures, are extracted, and a color (R, G, and B) and an opacity is set for the signal value of each pixel, according to the signal value at each pixel position within the three dimensional image of the extracted structures. In this case, color templates, in which colors and opacities are set according to portions of interest, may be prepared, and a desired color template may be selected according to the portion. If such a configuration is adopted, a portion of interest can be visualized within a VR (Volume Rendering) image.
In addition, it is necessary to detect a structure within a three dimensional image, in order to extract the structure from the three dimensional image. Here, a calculation processing apparatus that executes calculations employing a neural network, which is constructed by hierarchically connecting a plurality of processing layers for classifying pixels of interest within an image into a plurality of classes, has been proposed. Particularly, a CNN (Convolutional Neural Network) has been proposed for use in calculation processing apparatuses that classify two dimensional images into a plurality of classes (refer to Japanese Unexamined Patent Publication Nos. 2015-215837 and 2016-006626).
In a convolutional neural network, a plurality of pieces of different calculation result data which are obtained with respect to input data by a previous level, that is, feature extraction result data, undergo a convoluting calculation process employing various kernels in a convoluting layer. Feature data obtained thereby are further pooled by a pooling layer, to decrease the amount of feature data. Then, the pooled processing result data undergo further calculation processes by subsequent processing layers, to improve the recognition rates of features, and the input data can be classified into a plurality of classes.
For example in a convolutional neural network that classifies two dimensional images into a plurality of classes, a convoluting process that employs various types of kernels is administered on input images at a convoluting layer, and a feature map constituted by feature data obtained by the convoluting process is pooled at a pooling layer. Further calculations are performed on a feature map which is obtained by pooling in subsequent processing layers following the pooling layer, to classify pixels of a processing target within the input images. Here, pooling has the effects of decreasing the amount of data, absorbing differences in geometric data within a target region, and obtaining the features of the target region in a robust matter. Specifically, calculating the maximum value, the minimum value, or the mean value of four pixels within 2 by 2 pixel regions within the feature map obtained by the convoluting process may be the pooling process.
Applying such a convolutional neural network to the aforementioned three dimensional images to classify the three dimensional images into a plurality of classes may be considered. For example, a convolutional neural network may learn to classify pixels of a processing target within a three dimensional image, which is employed as input, into a class that belongs to a structure of interest and a class that belongs to structures other than the structure of interest, when detecting the structure of interest within the three dimensional image. Thereby, it will become possible to accurately classify target pixels within the input three dimensional image as those of a structure of interest and those of other structures.