Advances in computed tomography (CT) allow early detection of cancer, in particular lung cancer, which is one of the most common cancers. As a result, there is increased focus on using regular low-dose CT screenings to detect the disease early, thereby improving the chances that subsequent treatment succeeds. This increased focus leads to an increased workload for professionals, such as radiologists, who have to analyze the CT scans.
To cope with the increased workload, computer-aided detection (CADe) and computer-aided diagnosis (CADx) systems are being developed. Hereafter both types of systems will be referred to as CAD systems. CAD systems can detect lesions (e.g. nodules) and subsequently classify them as malignant or benign. A classification need not be binary; it can also include a stage of the cancer. Usually, a classification is accompanied by a confidence value calculated by the CAD system.
CAD systems typically follow a number of general steps. First, the input imaging data is segmented, for example to distinguish lung tissue from the background signal. Then, regions of interest are identified, for example all lung tissue containing nodule-like shapes. For each region of interest, a number of input values, the so-called feature vector, is calculated. This feature vector is used as input to a decision function, which projects the feature vector to a classification.
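By way of illustration only, the general steps described above can be sketched as a toy pipeline. The sketch assumes a small two-dimensional grid of intensity values in place of real (three-dimensional) CT data, uses simple thresholding for segmentation, treats each connected foreground region as a region of interest, uses area and mean intensity as a hypothetical two-element feature vector, and uses a linear decision function whose weights are illustrative and not clinically meaningful. All function names are hypothetical.

```python
from collections import deque


def segment(image, threshold):
    """Step 1 (segmentation): mark pixels brighter than `threshold` as foreground."""
    return [[value > threshold for value in row] for row in image]


def find_regions(mask):
    """Step 2 (regions of interest): collect 4-connected foreground components."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # Breadth-first flood fill starting from an unvisited foreground pixel.
                queue = deque([(r, c)])
                seen[r][c] = True
                pixels = []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                regions.append(pixels)
    return regions


def feature_vector(image, pixels):
    """Step 3 (features): hypothetical feature vector of [area, mean intensity]."""
    area = len(pixels)
    mean_intensity = sum(image[y][x] for y, x in pixels) / area
    return [float(area), mean_intensity]


def decision_function(features, weights, bias):
    """Step 4 (classification): project the feature vector to a label via a linear score."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return "malignant" if score > 0 else "benign"


def cad_pipeline(image, threshold, weights, bias):
    """Run all four steps and return one label per region of interest."""
    mask = segment(image, threshold)
    return [decision_function(feature_vector(image, region), weights, bias)
            for region in find_regions(mask)]
```

For example, on a 4-by-5 grid containing a 2-by-2 bright blob and a single isolated bright pixel, `cad_pipeline(image, 1.0, [1.0, 0.0], -3.0)` returns `["malignant", "benign"]`, because only the first region's area exceeds the hypothetical cutoff of three pixels. In a practical CAD system each step would typically be far more elaborate, and the decision function would usually be learned from data rather than set by hand.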
Hereafter the term “model” will be used to indicate a computational framework for performing one or more of a segmentation and a classification of imaging data. The segmentation, the identification of regions of interest, and/or the classification may involve the use of a machine learning (ML) algorithm. The model comprises at least one decision function, optionally based on a machine learning algorithm, that projects the input to an output. For example, a decision function may project a feature vector to a classification outcome.
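As a further illustration, a decision function based on a machine learning algorithm can be sketched as a logistic regression fitted by batch gradient descent. This is only one of many possible choices and is not presented as the method used by any particular CAD system. The resulting decision function returns both a classification and a confidence value. The toy training data, learning rate, number of epochs and the 0.5 cutoff are assumptions, the features are assumed to be already standardized, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function mapping a real score to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_score(weights, bias, x):
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train_logistic(features, labels, lr=0.1, epochs=1000):
    """Fit weights and bias by batch gradient descent on the cross-entropy loss."""
    n, d = len(features), len(features[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            # Prediction error: predicted probability minus the 0/1 label.
            error = sigmoid(linear_score(weights, bias, x)) - y
            for j in range(d):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def make_decision_function(weights, bias):
    """Return a decision function mapping a feature vector to (label, confidence)."""
    def decide(x):
        p = sigmoid(linear_score(weights, bias, x))  # estimated P(malignant)
        return ("malignant", p) if p >= 0.5 else ("benign", 1.0 - p)
    return decide


if __name__ == "__main__":
    # Hypothetical, already standardized training data: 1 = malignant, 0 = benign.
    X = [[-2.0, -1.0], [-1.0, -2.0], [2.0, 1.0], [1.0, 2.0]]
    y = [0, 0, 1, 1]
    decide = make_decision_function(*train_logistic(X, y))
    print(decide([2.0, 1.0]))
    print(decide([-2.0, -1.0]))
```

Trained on these four toy feature vectors, the decision function labels `[2.0, 1.0]` as malignant and `[-2.0, -1.0]` as benign, each with a confidence above 0.5. How well such a fitted decision function generalizes depends directly on the amount and representativeness of the training data.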
A problem with CAD systems based on an ML algorithm is that the system can only function well if sufficient data is available for training. In many cases, academics working on such systems lack sufficient training data. This hampers the deployment and acceptance of CAD systems.