An image segmentation system is a necessary component of many image analysis and processing systems. If an image is described as pixels arranged in a matrix, the function of an image segmentation system is to classify those pixels, with the number of categories set as needed. For example, software that recognizes human faces often needs to segment the face first, distinguishing the pixels belonging to the face region (foreground area) from the pixels belonging to the non-face region (background area). And software that analyzes natural landscape photographs often needs to segment the image into different regions such as sky, mountains, rivers, and animals.
Image segmentation systems are used not only in everyday life, but also in many important application areas, including the maritime, military, meteorological, aerospace, and medical fields. In the medical field, for example, a cardiovascular disease diagnosis system first segments the vascular tissue, and a lung disease diagnosis system first segments the lung trachea, pulmonary blood vessels, and potential lung nodules. Accurate segmentation facilitates three-dimensional model reconstruction and visualization to assist physicians in their judgment, and is the fundamental guarantee of accuracy for the subsequent quantitative analysis of important clinical parameters such as size, shape, and pixel statistics. As another example, in the aerospace field, a sky-image analysis system first segments the image to distinguish the areas of stars, planets, and galaxies from the background area, and an analysis system for atmospheric satellite remote sensing images needs to segment clouds, land, waters, and other areas. Regardless of the application, accuracy is an important indicator in the design of these segmentation systems, and another important indicator is speed.
In order to obtain higher accuracy, newly developed segmentation methods are equipped with data-driven methods based on machine learning. In such a system, the developer delivers pre-annotated segmented images together with the corresponding original images into the system as training samples; the system fits a statistical model to the data to find the underlying rules, and then completes the segmentation of test images based on the learned rules.
Among the many machine learning methods, neural network (i.e., deep learning) methods have been applied in more and more image processing algorithms in recent years because of their excellent performance. Among them, convolution-based neural network (referred to as “convolutional neural network”) methods are particularly prominent. A neural network is a special computing network structure consisting of multiple layers of computing units, wherein the numerical values of the computing units in an upper layer are weighted and summed, and then passed to the next layer through a non-linear activation function. FIG. 1(a) shows a fully connected neural network with a three-layer structure. FIG. 1(b) shows a convolutional neural network; unlike in a fully connected neural network, the connections in a convolutional neural network are relatively sparse: each computing unit is connected only with the computing units spatially adjacent to it in the upper layer, and the connection weights (a1, a2, a3, b1, b2, b3) are shared among different computing units. The parameters to be trained for a convolutional neural network are thus significantly fewer than for a fully connected neural network, and training is much less difficult. At the same time, such a structure also suits the needs of image processing. In traditional image processing methods, convolutional operations are often used to extract features such as edges and average brightness; as shown in FIG. 1(d), a specific convolution kernel is employed for detecting edges. Convolutional neural networks use a similar principle, as shown in FIG. 1(c). The difference is that a convolution kernel of a convolutional neural network is obtained through training by means of machine learning, and by superposing multiple layers of convolutional operations it can describe image features such as circles, polygons, and even irregular shapes.
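The edge-detection role of a fixed convolution kernel described above can be sketched as follows. This is a minimal pure-Python illustration; the image and the Prewitt-like kernel values are illustrative, not taken from the disclosure.

```python
# Minimal 2-D "valid" convolution (pure Python), illustrating how a fixed
# kernel extracts edge features, as in traditional image processing.

def conv2d(image, kernel):
    """Valid cross-correlation of a 2-D list-of-lists image with a kernel."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            s = 0
            for di in range(kh):
                for dj in range(kw):
                    s += image[i + di][j + dj] * kernel[di][dj]
            row.append(s)
        out.append(row)
    return out

# A 5x6 image with a vertical brightness edge between columns 2 and 3.
image = [[0, 0, 0, 9, 9, 9] for _ in range(5)]

# Hand-crafted vertical-edge kernel (Prewitt-like). In a convolutional
# neural network, the kernel values would instead be learned from data.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

response = conv2d(image, kernel)
# The response is strongest where the kernel straddles the brightness edge.
print(response[0])  # → [0, 27, 27, 0]
```

A learned kernel would play the same role, with its values adjusted by training rather than chosen by hand.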
Convolutional neural networks are widely applied to image classification tasks. As shown in FIG. 1(e), such a network mostly consists of two parts: in the first part, the image passes through a multi-layer convolutional network and maximum down-sampling operations to extract features; in the second part, the extracted features are used to generate the final classification result via a fully connected layer. To apply such networks to pixel-level image processing tasks, the general method sets the target pixel as the center, extracts a fixed-size image from the surrounding area of that center, and then classifies the fixed-size image. However, this method has significant drawbacks: the input image must be of a specific size due to the presence of the fully connected layer; the amount of computation required to perform a separate calculation for every pixel is extremely large, and the same convolutional operations are repeated wherever the surrounding regions of adjacent pixels overlap; in addition, since classification is performed for surrounding regions of a fixed size, the convolutional neural network is usually used for recognition of a region (e.g., a human face region), rather than segmentation at the pixel level.
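The redundancy of the patch-wise approach can be illustrated with a back-of-the-envelope calculation; the image and patch sizes below are hypothetical.

```python
# Back-of-the-envelope comparison (illustrative sizes): classifying every
# pixel of a 512x512 image via a 32x32 patch around each pixel re-reads
# each pixel roughly patch_area times, while a single whole-image pass
# reads each pixel once per convolution layer.

H, W = 512, 512          # image size (hypothetical)
P = 32                   # side length of the patch around each pixel

pixels_read_per_patch = P * P
total_patch_reads = H * W * pixels_read_per_patch   # patch-wise approach
whole_image_reads = H * W                           # one full-image pass

redundancy = total_patch_reads // whole_image_reads
print(redundancy)  # → 1024: each pixel is processed ~1024 times instead of once
```

This is the overlapping-region redundancy that fully convolutional networks eliminate by processing the whole image in one pass.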
Recently, fully convolutional neural networks have also been applied to image segmentation tasks. As shown in FIG. 1(f), in a fully convolutional neural network for a segmentation system, the entire original image is directly input to the network for convolution and down-sampling operations to extract features. To ensure that the finally outputted segmented image is the same size as the input image, de-convolutional operations and/or up-sampling operations are added in the downstream part of the network. When the final output is generated, a convolution kernel of size 1 (convolutional layer 4) is used in place of the fully connected layer. Different from traditional machine learning methods, which require manual intervention in pre-processing, feature extraction, and post-processing and require manually selecting a variety of segmentation parameters such as thresholds, the fully convolutional neural network is an end-to-end solution: the input is the original image, and the output is a segmented image. Once the structure of the neural network is determined, all the rest of the process is optimized automatically by computation, without the need for further manual intervention.
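The size-1 convolution kernel that replaces the fully connected layer can be understood as applying the same per-pixel linear classifier at every spatial position, so the output retains the spatial size of its input. The following is a minimal pure-Python sketch; the feature-map sizes, channel counts, weights, and biases are assumed for illustration only.

```python
# A 1x1 convolution: the same linear map over channels is applied at every
# spatial position, so the output keeps the feature map's spatial size.

def conv1x1(features, weights, bias):
    """features: [C][H][W] feature maps; weights: [K][C]; bias: [K].
    Returns [K][H][W] per-pixel class scores."""
    C = len(features)
    H, W = len(features[0]), len(features[0][0])
    K = len(weights)
    return [[[bias[k] + sum(weights[k][c] * features[c][i][j]
                            for c in range(C))
              for j in range(W)]
             for i in range(H)]
            for k in range(K)]

# Two 3x4 feature maps (C=2), mapped to K=2 class scores per pixel.
features = [[[1.0] * 4 for _ in range(3)],
            [[2.0] * 4 for _ in range(3)]]
weights = [[0.5, 0.25],   # class 0 weights over the 2 channels
           [-1.0, 1.0]]   # class 1 weights
bias = [0.0, 0.0]

scores = conv1x1(features, weights, bias)
# Output spatial size equals input spatial size: 2 classes x 3 x 4.
print(len(scores), len(scores[0]), len(scores[0][0]))  # → 2 3 4
```

Because no fully connected layer fixes the input dimensions, the same trained kernel works on feature maps of any height and width.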
A fully convolutional neural network has at least the following advantages over conventional convolutional neural networks: (1) a highly generalizable model: the same system may be used for different segmentation tasks by adjusting the training samples and re-training; (2) high computational efficiency: the redundant computing operations in overlapping areas of the conventional convolutional neural network are eliminated; (3) flexible image size: unlike traditional deep learning methods, the fully convolutional neural network does not require fully connected layers, so a fixed-size input image is unnecessary; (4) a short development cycle.
However, a fully convolutional neural network is computationally complex. Because of the large number of convolution calculations, the memory required over the whole calculation process and the amount of computation grow rapidly as the image size increases. For example, for the processing of a normal-size, thin-slice three-dimensional CT image, even with a top-end graphics card accelerator (GPU), the operation time is often still tens of minutes or even hours. This greatly limits the practical application of such methods in a variety of fields, including medical imaging (especially three-dimensional imaging), that have strict requirements on operation time and/or limited computational resources.
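A rough estimate shows why memory becomes a bottleneck for three-dimensional images; the volume dimensions, channel count, and data type below are assumed for illustration, not taken from the disclosure.

```python
# Rough memory estimate (illustrative numbers): storing the activations of
# even a single convolutional layer for a thin-slice CT volume can require
# on the order of ten gigabytes.

voxels = 512 * 512 * 300       # hypothetical thin-slice CT volume
channels = 32                  # feature maps in one layer (assumed)
bytes_per_value = 4            # float32

layer_bytes = voxels * channels * bytes_per_value
gib = layer_bytes / 2**30
print(layer_bytes, round(gib, 1))  # one layer's activations, in bytes and GiB
```

Multiplying this by the number of layers (and again for the gradients stored during training) makes clear why whole-volume processing strains even high-end GPUs.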
In a large proportion of images, the objects to be segmented are distributed relatively sparsely. This disclosure provides a method and system based on an optimized fully convolutional neural network, which can complete the image segmentation task in a quick, efficient, and accurate manner.