Many thousands of women die needlessly each year from breast cancer, a cancer from which there is theoretically a high probability of survival if detected sufficiently early. If the presence of cancerous tissue is missed in a sample, then, by the time the next test is undertaken, the cancer may have progressed and the chance of survival significantly reduced. The importance of detecting cancerous tissue in the samples can therefore not be over-emphasised.
A typical national breast screening programme uses mammography for the early detection of impalpable lesions. Once a lesion indicative of breast cancer is detected, tissue samples are taken and examined by a trained histopathologist to establish a diagnosis and prognosis. This is a time-consuming, labour-intensive and expensive process. Qualification to perform such examination is not easy to obtain and requires frequent review. The examination itself requires the interpretation of colour images by eye, a highly subjective process characterised by considerable variations in both inter- and intra-observer analysis, i.e. variances in observation may occur for the same sample between different histopathologists, and for the same histopathologist at different times. For example, studies have shown that two different histopathologists examining the same ten samples may give different opinions on three of them, a disagreement rate of 30%. This problem is exacerbated by the complexity of some samples, especially in marginal cases where there may not be a definitive conclusion. If sufficient trained staff are not available, the pressure to complete the analysis increases, potentially leading to erroneous assessments and delays in diagnosis.
These problems mean that there are practical limitations on the extent and effectiveness of screening for breast cancer with the consequence that some women are not being correctly identified as having the disease and, on some occasions, this failure may result in premature death. Conversely, others are being incorrectly diagnosed with breast cancer and are therefore undergoing potentially traumatic treatment unnecessarily.
It is thus an aim of the invention to provide a method of image analysis which can be embodied in a robust, objective and cost-effective tool to assist in the diagnosis and prognosis of breast cancer, although as previously indicated the invention may also find application in other fields.
To aid in the understanding of this aim reference is made to the accompanying FIG. 1 which is a simplified representation of the kinds of objects which typically appear in a histological slide of breast tissue. Tubule formations are present comprising ducts such as indicated at 1 surrounded by epithelial layers 2. The ducts appear as small, bright regions of various shapes while the epithelial cells appear substantially more textured and darker. Fat cells such as indicated at 3 appear of similar intensity to the ducts 1 but are generally substantially larger. Elongate regions of similar intensity to the ducts 1 and fat cells 3 may also be present, such as indicated at 4, and are characteristic of tears in the tissue or cracks due to shrinkage. The remainder of the slide comprises “background” tissue 5 which generally appears darker than the ducts 1, fat cells 3 and tears/cracks 4 but lighter and more uniform in texture than the epithelial cells 2. Healthy tissue should contain a significant number of tubule formations comprising ducts usually having a boundary of two epithelial cells. In cancerous tissue the tubules tend to break down and epithelial cells proliferate so the area ratio between these structures in any given sample can be used as an indication of the presence and severity of cancer. More particularly, histopathologists conventionally make a subjective assessment of a metric M, given by:
$$M = \frac{T}{D + E} \qquad (1)$$
where T is the surface area in the slide covered by tubule formations (the ducts plus boundary of two epithelial cells), D is the surface area covered by ducts and E is the surface area covered by all epithelial cells (including those in T), and relate their assessment of the value of this metric to a grade of cancer using thresholds typically as follows:
TABLE 1
Histopathologist thresholds for cancer severity

Metric value      Cancer grade
≥75%              Grade 1
≥10%, <75%        Grade 2
<10%              Grade 3

where Grade 1 is the least serious and Grade 3 is the most serious.
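By way of illustration only, the relationship between equation (1) and the thresholds of Table 1 can be sketched as follows. The function names `tubule_metric` and `cancer_grade` are hypothetical and do not appear in the source; the areas are assumed to be supplied in any consistent unit (for example, pixel counts), and the metric is handled as a fraction rather than a percentage.

```python
def tubule_metric(t_area, d_area, e_area):
    """Return M = T / (D + E) as in equation (1).

    t_area: area covered by tubule formations (T)
    d_area: area covered by ducts (D)
    e_area: area covered by all epithelial cells (E)
    """
    denominator = d_area + e_area
    if denominator <= 0:
        raise ValueError("D + E must be positive")
    return t_area / denominator


def cancer_grade(m):
    """Map the metric M (a fraction) to a grade using the Table 1 thresholds."""
    if m >= 0.75:
        return 1  # least serious
    if m >= 0.10:
        return 2
    return 3      # most serious


# Hypothetical areas, chosen only to exercise the arithmetic.
m = tubule_metric(t_area=30, d_area=10, e_area=40)
grade = cancer_grade(m)
```

With these hypothetical areas M is 30/50 = 0.6, which falls in the band of at least 10% and below 75%, corresponding to Grade 2.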
If an objective assessment of the same or a similar metric is to be achieved through an automated method of image analysis it is necessary to distinguish inter alia those objects in an image which comprise epithelial cells and in one aspect the invention accordingly resides in a method for the automated analysis of a digital image comprising an array of pixels which includes the steps of: generating a property co-occurrence matrix (PCM) from some or all of said pixels, using the properties of local mean and local standard deviation of intensity in neighbourhoods of the selected pixels; and segmenting the image by labelling the selected pixels as belonging to specified classes consequent upon analysis of said PCM.
The invention also resides in apparatus for the automated analysis of a digital image comprising means to perform the foregoing method and in a computer program product comprising a computer readable medium having thereon computer program code means adapted to cause a computer to execute the foregoing method and in a computer program comprising instructions so to do.
Property co-occurrence matrices (PCMs) are described e.g. in Electronics and Communication Engineering Journal, pp 71-83, Vol 5, No 2, 1993 (Co-occurrence Matrices for Image Analysis, J F Haddon and J F Boyce), and are an extension or generalisation of the standard grey level co-occurrence matrices described e.g. in IEEE Trans. Syst., Man, Cybern., Vol SMC-3, pp 610-621, 1973 (Textural Features for Image Classification, R M Haralick, K Shanmugam and I Dinstein). They are multidimensional histograms in which each element is the frequency with which selected properties co-occur. By generating a PCM using the properties of local mean and local standard deviation of intensity in neighbourhoods of image pixels, analysis of such a PCM can thus distinguish pixels contributing to regions of, say, relatively low local mean and relatively high local standard deviation (such as the dark, textured regions representing epithelial cells in the preferred implementation of this aspect of the invention) and pixels contributing to regions of, say, relatively high local mean and relatively low local standard deviation (such as the lighter, more uniform regions representing “background” tissue in the preferred implementation of this aspect of the invention), or to regions of other combinations of those properties in other applications of the invention.
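As a minimal sketch of how a PCM of this kind might be accumulated, the following computes the local mean and local standard deviation of intensity around each pixel and counts how often each pair of (binned) values co-occurs. It is not the preferred implementation: the function names, the square neighbourhood clipped at the image border, the 8-bit intensity range and the number of bins are all assumptions made for illustration, and the subsequent analysis of the PCM to label pixels is not shown.

```python
import math


def local_stats(image, radius=1):
    """Local mean and standard deviation of intensity for every pixel.

    Assumption: the neighbourhood is a (2*radius+1)-wide square,
    clipped where it would extend beyond the image border.
    """
    rows, cols = len(image), len(image[0])
    means = [[0.0] * cols for _ in range(rows)]
    stds = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            values = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            mu = sum(values) / len(values)
            variance = sum((v - mu) ** 2 for v in values) / len(values)
            means[r][c] = mu
            stds[r][c] = math.sqrt(variance)
    return means, stds


def build_pcm(means, stds, bins=8, max_mean=255.0, max_std=127.5):
    """Two-dimensional histogram of (local mean, local standard deviation).

    pcm[i][j] is the number of pixels whose local mean falls in bin i
    and whose local standard deviation falls in bin j. Assumption:
    8-bit intensities, so the standard deviation cannot exceed 127.5.
    """
    pcm = [[0] * bins for _ in range(bins)]
    for mean_row, std_row in zip(means, stds):
        for mu, sd in zip(mean_row, std_row):
            i = min(int(mu / max_mean * bins), bins - 1)
            j = min(int(sd / max_std * bins), bins - 1)
            pcm[i][j] += 1
    return pcm


# A uniform bright patch: high local mean, zero local standard deviation.
uniform = [[200] * 4 for _ in range(4)]
pcm_uniform = build_pcm(*local_stats(uniform))

# A checkerboard: strongly textured, so high local standard deviation.
checker = [[255 * ((r + c) % 2) for c in range(4)] for r in range(4)]
pcm_checker = build_pcm(*local_stats(checker))
```

In this sketch the uniform patch places all of its pixels in a single element of the PCM at a high mean and the lowest standard deviation bin, while the checkerboard populates only the highest standard deviation bin. Separation of this kind in the PCM is what allows uniform regions to be distinguished from textured ones.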
These and other aspects of the invention will now be more particularly described, by way of example, with reference to the accompanying drawings and in the context of an automated system for grading cancer on the basis of tubule formations in digital images of histological slides of potential carcinomas of the breast.