Advances in computed tomography (CT) allow early detection of cancer, in particular lung cancer, which is one of the most common cancers. As a result, there is increased focus on regular low-dose CT screening to detect the disease early and thereby improve the chances that subsequent treatment succeeds. This increased focus leads to an increased workload for professionals, such as radiologists, who have to analyze the CT scans.
To cope with the increased workload, computer-aided detection (CADe) and computer-aided diagnosis (CADx) systems are being developed. Hereafter both types of systems will be referred to as CAD systems. CAD systems can detect lesions (e.g. nodules) and subsequently classify them as malignant or benign. A classification need not be binary; it can also include a stage of the cancer. Usually, a classification is accompanied by a confidence value calculated by the CAD system.
CAD systems typically follow a number of general steps. In an optional first step, the input imaging data is segmented, for example to distinguish lung tissue from the background signal. Then, regions of interest are identified, for example all lung tissue containing nodule-like forms. It is also possible to simply examine every data point, without pre-selecting regions of interest. For a selected data point, a number of input values is calculated, the so-called feature vector. This feature vector is used as input to a decision function, which projects the feature vector to a classification.
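The general steps above can be sketched as a small pipeline. This is only an illustrative sketch under stated assumptions, not the method of any particular CAD system: the function names, the threshold-based segmentation, the three neighbourhood statistics used as features, and the logistic decision function with its weights are all hypothetical choices made for the example.

```python
import numpy as np

def segment(volume, threshold):
    """Assumed segmentation step: voxels above a threshold count as foreground."""
    return volume > threshold

def regions_of_interest(mask, stride=1):
    """Assumed selection step: take every stride-th foreground voxel as a data point."""
    return np.argwhere(mask)[::stride]

def feature_vector(volume, point, radius=1):
    """Feature vector for one data point: mean, std and max of its neighbourhood."""
    lo = np.maximum(point - radius, 0)   # clip the patch at the volume border
    hi = point + radius + 1
    patch = volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
    return np.array([patch.mean(), patch.std(), patch.max()])

def decision_function(features, weights, bias):
    """Assumed logistic decision function: feature vector -> (label, confidence)."""
    p = 1.0 / (1.0 + np.exp(-(features @ weights + bias)))
    label = "malignant" if p >= 0.5 else "benign"
    return label, float(max(p, 1.0 - p))

def classify_volume(volume, threshold, weights, bias, stride=1):
    """Run segmentation, selection, feature extraction and classification in order."""
    mask = segment(volume, threshold)
    results = []
    for point in regions_of_interest(mask, stride):
        label, confidence = decision_function(
            feature_vector(volume, point), weights, bias
        )
        results.append((tuple(int(c) for c in point), label, confidence))
    return results
```

Skipping the pre-selection of regions of interest, as the text allows, would correspond to passing a mask that is true everywhere, so that every data point is examined.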
Hereafter the term “model” will be used to indicate a computational framework for performing one or more of a segmentation and a classification of imaging data. The segmentation, the identification of regions of interest, and/or the classification may involve the use of a machine learning (ML) algorithm. The model comprises at least one decision function, which may be based on a machine learning algorithm and which projects the input to an output. For example, a decision function may project a feature vector to a classification outcome. Where the term machine learning is used, this also includes further developments such as deep (machine) learning and hierarchical learning.
An example of a suitable model is the convolutional neural network (CNN), which is primarily used in computer vision. For two-dimensional (2D) images, 2D CNNs have been widely used in many applications. The principles of 2D CNNs can, however, also be extended to process three-dimensional (3D) images such as the medical imaging data mentioned earlier.
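To illustrate how the 2D principle carries over to 3D, the following minimal sketch applies one and the same sliding-window operation to a 2D image and to a 3D volume. It assumes NumPy 1.20 or later for `sliding_window_view`; the function name `conv_valid` is hypothetical, and the operation shown is a single unpadded, stride-one filter (the cross-correlation commonly called "convolution" in CNNs), not a complete network.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_valid(image, kernel):
    """Slide a kernel over an array of the same dimensionality (no padding, stride 1)."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    if image.ndim != kernel.ndim:
        raise ValueError("image and kernel must have the same number of dimensions")
    # windows has shape (output shape) + (kernel shape)
    windows = sliding_window_view(image, kernel.shape)
    # multiply each window by the kernel and sum over the trailing kernel axes
    kernel_axes = tuple(range(image.ndim, 2 * image.ndim))
    return (windows * kernel).sum(axis=kernel_axes)

# The same function handles a 2D image and a 3D volume.
out_2d = conv_valid(np.ones((5, 5)), np.ones((3, 3)))
out_3d = conv_valid(np.ones((4, 5, 6)), np.ones((3, 3, 3)))
```

The only difference between the two calls is the number of axes of the input and the kernel, which is the sense in which the 2D approach extends to volumetric data.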
Whichever type of model is used, suitable training data needs to be available to train the model. In many applications, there is not enough training data available, or the available data is not fully representative of the problem field. For example, training data for nodule detection may contain too few samples of a particular type of nodule, leading to a trained model that is not capable of reliably detecting that type of nodule.
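A simple way to spot this kind of gap before training is to count the samples per nodule type. The sketch below is only an illustration: the function name, the nodule type labels and the minimum count are hypothetical, and what counts as "enough" samples depends on the model and the application.

```python
from collections import Counter

def underrepresented_types(labels, expected_types, min_count):
    """Return the expected types that have fewer than min_count training samples."""
    counts = Counter(labels)  # types that never occur count as zero
    return [t for t in expected_types if counts[t] < min_count]

# Hypothetical label list: one entry per annotated nodule in the training set.
labels = ["solid"] * 50 + ["part-solid"] * 12 + ["ground-glass"] * 3
expected = ["solid", "part-solid", "ground-glass", "calcified"]
missing = underrepresented_types(labels, expected, min_count=10)
```

Types absent from the data altogether are reported as well, since they are the most extreme case of unrepresentative training data.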