1. Field of the Invention:
The invention relates generally to a method and system for computerized assessment of breast cancer risk.
Breast cancer risk assessment provides an opportunity to devise appropriate surveillance plans that may include enhanced screening for women at increased risk of breast cancer. Computerized analysis of mammographic parenchymal patterns may provide an objective and quantitative characterization and classification of these patterns, which may be associated with breast cancer risk. Computerized assessment of breast cancer risk based on the analysis of mammograms alone or combined with epidemiologic risk factors (for example, age) may serve as an alternative to existing clinical methods, which are costly and/or information-dependent, in predicting breast cancer risk.
2. Discussion of the Background:
The breast is composed primarily of two components, fibroglandular tissue and fatty tissue. The average breast consists of 50% fibroglandular tissue and 50% fat. Fibroglandular tissue is a mixture of fibrous connective tissue and the glandular epithelial cells that line the ducts of the breast (the parenchyma). The major breast diseases develop from the terminal ductal lobular units of the breast, and arise predominantly from the epithelial cells that line the ducts; however, the fibrous or connective tissue can also be involved. It is thought by most experts that malignant breast disease develops through a process that starts with epithelial hyperplasia, i.e., an increase in the number of epithelial cells. Epithelial hyperplasia can progress to atypical hyperplasia in which the epithelial cells not only increase in number, but also change in a way that is not normal for these cells. The process, at this stage, is believed to be reversible. Once a certain criterion level of atypia is reached, the diagnosis of carcinoma-in-situ can be made, in which there is no invasion of malignant cells outside of the duct. The process of malignant transformation is considered irreversible at this stage. In the last phase of development, the cancer cells break out of the ductal walls and invade the surrounding stromal tissue, and at this point the disease is called infiltrating or invasive carcinoma. Most (80%-85%) breast carcinomas can be seen on a mammogram as a mass, a cluster of tiny calcifications, or a combination of both. Other mammographic abnormalities are of lesser specificity and prevalence than masses and/or calcifications, and include skin or nipple changes, abnormalities in the axilla, asymmetric density, and architectural distortion.
Early detection of breast cancer can improve survival rates. The overall five-year survival rate for women diagnosed with breast cancer is 84%, but when found at a small, localized stage, the 5-year survival rate is 97% [1]. Studies show that use of screening mammography can reduce lesion size and stage at detection, improving the prognosis for survival. Currently, mammography is a well-established imaging technique for early detection of breast cancer. Annual screening mammography is recommended by the American Cancer Society for all women over the age of 40 [1].
Clinical acquisition of x-ray mammograms is a rather complicated procedure and requires specific techniques in order to obtain high quality images. Attenuation differences between various structures within the breast contribute to image contrast. Due to the similar composition of breast structures and the physical manifestations of breast carcinoma, screen-film mammographic imaging must be substantially different from general radiographic imaging. Low-energy x-rays are required to enhance the ability to differentiate between normal tissues and carcinoma. The radiological appearance of the breast varies between individuals because of variations in the relative amounts of fatty and fibroglandular tissue. Since fat has a lower effective atomic number than that of fibroglandular tissue, there is less x-ray attenuation in fatty tissue than in fibroglandular tissue. Fat appears dark (i.e., higher optical density) on a mammogram, while fibroglandular tissue appears light (i.e., lower optical density) on a mammogram. Regions of brightness associated with fibroglandular tissue are normally referred to as "mammographic density".
Screening mammography typically includes two standard radiographic projections, medio-lateral oblique (MLO) and cranio-caudal (CC), that are taken of each breast (right and left) for a total of four images. The purpose of these two views is to completely image the breasts and, if any lesions are present, allow localization and preliminary characterization.
As the best method for early detection of breast cancer, annual screening mammography has been recommended for women over 40 years of age [1]. Mammographic surveillance for women under age 40 years who are at very high risk of developing breast cancer, however, still remains an issue, since the benefit of screening women in this age group has not been proven. Women at high risk of developing breast cancer tend to develop breast cancer at a younger age [2]. Identification and close follow-up of these high-risk women may provide an opportunity for early breast cancer detection. Thus, computerized methods that are capable of assessing breast cancer risk may allow women and their physicians to devise an individualized surveillance plan that may include enhanced screening for women at high risk for early detection of breast cancer. These plans may lead to improvements in the overall efficacy of screening mammography for early detection of breast cancer. Further, knowledge of which women are at high risk of developing breast cancer has important implications in the study of breast cancer.
There are two widely used methods to measure risk: relative risk and absolute risk [16]. Relative risk is defined as the ratio of age-specific breast cancer incidence rate among women with specific risk factors to the incidence rate among women without known risk factors. Relative risk estimates are useful for measuring the relative magnitude of effect of a given risk factor as a population risk. However, relative risk estimates do not directly approximate the underlying probability of a diagnosis of breast cancer for an individual over time.
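The definition of relative risk above can be illustrated with a minimal sketch. The function name and the incidence rates used below are hypothetical and are chosen only for illustration; they are not taken from any of the cited studies.

```python
def relative_risk(incidence_exposed: float, incidence_reference: float) -> float:
    """Ratio of two age-specific incidence rates.

    Both rates must refer to the same age group and use the same units
    (for example, cases per 100,000 woman-years).
    """
    if incidence_reference <= 0:
        raise ValueError("reference incidence rate must be positive")
    return incidence_exposed / incidence_reference


# Hypothetical rates per 100,000 woman-years, for illustration only.
print(relative_risk(incidence_exposed=300.0, incidence_reference=150.0))  # 2.0
```

A relative risk of 2.0 in this example means only that the incidence rate is doubled in the group with the risk factor; it does not by itself state the probability that any one woman will be diagnosed.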
Absolute risk (or cumulative risk) is defined as the probability that a woman with given risk factors and given age will develop breast cancer over a defined time period. Absolute risk estimates give women a realistic and individualized estimate of the chance of developing cancer over various time horizons. An assessment of cumulative risk over different periods of time can help a woman understand the extent of her risk and therefore, can be useful in helping the woman and her doctor define an acceptable surveillance plan for the future.
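One simplified way to turn age-specific incidence rates into an absolute (cumulative) risk over a time horizon is sketched below. This is an illustrative assumption and is not the computation used by the Gail or Claus models: it treats each annual rate as a constant hazard, scales the baseline rates by a single relative risk, and ignores competing causes of death. The function name and all numerical values are hypothetical.

```python
import math


def cumulative_risk(annual_hazards, relative_risk=1.0):
    """Probability of a diagnosis over the years covered by annual_hazards.

    annual_hazards: baseline incidence rate for each successive year,
                    expressed as a fraction per woman-year.
    relative_risk:  multiplier applied to every baseline rate (assumption).
    Competing mortality is ignored in this sketch.
    """
    if relative_risk < 0 or any(h < 0 for h in annual_hazards):
        raise ValueError("hazards and relative risk must be non-negative")
    total_hazard = relative_risk * sum(annual_hazards)
    return 1.0 - math.exp(-total_hazard)


# Hypothetical baseline of 0.2% per year over a 10-year horizon.
baseline = [0.002] * 10
print(cumulative_risk(baseline))                     # average-risk woman
print(cumulative_risk(baseline, relative_risk=2.0))  # woman with doubled hazard
```

The same function can be evaluated over different horizons (for example, 10 years or up to age 79) to compare short-term and lifetime estimates for one individual.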
For decades, it has been known that all breast cancers are genetic, i.e., the development of breast cancer is the result of alteration of chromosomal DNA through mutation or damage with the resultant loss of normal growth regulation [5]. Sporadic breast cancer results from somatic changes that are specific to the tumor cells, i.e., the epithelial cells of the breast, which are not found in other cells of the patient. Recent molecular studies demonstrate that breast cancer may be inherited [2,6,7]. In a landmark article published in 1990, King et al. used genetic linkage analysis to identify a gene named BRCA1 (breast cancer 1), which was found to be responsible for the breast cancer diagnosed in women who inherited a mutated form of the BRCA1 gene in all cells (germline mutation) at birth. Since then, four other genes responsible for breast cancer, including the BRCA2 (breast cancer 2) gene, have been identified [8]. In general, hereditary breast cancer appears earlier than purely sporadic breast cancer, because among women with inherited susceptibility, one of the cancer-causing mutations is present from birth. Thus, fewer somatic mutations specific to breast cancer cells need to occur.
It is estimated that women who inherit a mutated form of the BRCA1 gene have as much as a 20% risk of developing breast cancer by age 40 years, a 33%-73% risk of developing breast cancer by age 50 years, and a 56%-87% risk of developing breast cancer by age 70 years [9,10], which is up to about 8 times higher than the lifetime risk for the general population. The recent isolation of BRCA1 and BRCA2, and the acknowledgment that additional breast cancer susceptibility genes may exist, provide a molecular basis for counseling some high-risk women.
Although the evidence of familial aggregation of breast cancer suggests that there is an important hereditary component, there are many families in which breast cancer (familial breast cancer) has appeared more than once purely by chance and not as the result of inherited susceptibility. Studies show that truly hereditary breast cancers account for only 5%-10% of all breast cancers [11,12], and most breast cancers occur sporadically and are likely the result of random events on the cellular level. In addition to age, many factors have been identified to be related to breast cancer risk. However, the basic mechanism underlying the association between breast cancer and these risk factors is not well understood. It has been recognized for some time that varying levels of endogenous and exogenous estrogens are associated with the risk of developing breast cancer. Higher levels of endogenous hormones, in particular estrogens, are an important factor in the etiology of breast cancer [13,14].
Risk factors for breast cancer can be classified broadly as being of either personal or environmental origin. Personal risk factors include aspects of individual biological histories, such as family history of breast cancer, reproductive history, menopausal status, and breast disease history. Environmental risk factors are exogenous influences, such as diet and exposure to environmental carcinogens. Table 1 lists factors that were identified on the basis of large epidemiologic studies and that have a strong or well-established association with breast cancer [3].
TABLE 1
Selected factors for breast cancer risk.

Factor                                        Comparison group                     Approximate relative risk
--------------------------------------------  -----------------------------------  -------------------------
Age
  40-44                                       Age 25-29                            16
  50-54                                                                            28
  60-64                                                                            44
  70-74                                                                            56
Western country                               Japan                                5
Family history of breast cancer
  One affected first-degree relative          No affected first-degree relative    1.4-3
  Two or more affected first-degree relatives                                      4-6
  Early age (30 yrs old) of onset in          Age 50                               2.6
    affected relative
Reproductive history
  Age at menarche, 11                         Age 16                               1.3
  Age at first live birth
    20-24                                     <20                                  1.3
    25-29                                                                          1.6
    >30, nulliparous                                                               1.9
  Age at menopause
    After 55                                  Age 45-55                            1.5
    Before 45                                                                      0.7
Evidence of breast pathology
  Any benign disease                          No biopsy or aspiration              1.5
  Proliferative disease                                                            2
  Atypical hyperplasia                                                             2-4
History of cancer in contralateral breast     No history of cancer                 5
Percent dense parenchyma on mammography
  5%-24.9%                                    <5% dense regions                    1.7
  25%-44.9%                                                                        2.5
  45%-64.9%                                                                        3.8
  >65%                                                                             4.3
Exposures
  Radiation, 100 rads                         No special exposure                  3
  Alcohol, two drinks/day                     Nondrinker                           1.7
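As an illustration of how the percent-density rows of Table 1 might be applied, the following sketch maps a measured percent density to the approximate relative risk listed for its category. The function and variable names are hypothetical. Two boundary choices are assumptions of this sketch, since Table 1 does not specify them: a value of exactly 65% is assigned to the highest category, and values between 24.9% and 25% (and similar gaps) are assigned to the lower category.

```python
import bisect

# Lower bounds (percent dense area) of the Table 1 density categories.
_LOWER_BOUNDS = [5.0, 25.0, 45.0, 65.0]
# Approximate relative risks from Table 1; the first entry is the
# <5% reference group, which has a relative risk of 1 by definition.
_RELATIVE_RISKS = [1.0, 1.7, 2.5, 3.8, 4.3]


def density_relative_risk(percent_dense: float) -> float:
    """Look up the Table 1 relative risk for a percent-density value."""
    if not 0.0 <= percent_dense <= 100.0:
        raise ValueError("percent_dense must lie between 0 and 100")
    category = bisect.bisect_right(_LOWER_BOUNDS, percent_dense)
    return _RELATIVE_RISKS[category]


print(density_relative_risk(30.0))  # 2.5
```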
Among these risk factors, age has been identified as the single most important risk factor for the development of breast cancer in women. The incidence of breast cancer increases with age. Studies show that diagnosis of breast cancer is rare before age 25 years [15]. The incidence of breast cancer increases rapidly between the ages of 25 and 44. Near the age of menopause, the rate of increase in incidence for successive age groups is slower compared with the observations in premenopausal women. In addition to age, risk factors such as family history of breast cancer, personal history of breast cancer, biopsy-confirmed benign proliferative breast disease, and age at first live birth and at menarche have been identified and have been used in clinical risk prediction models [3,4,16] to estimate an individual's risk of developing breast cancer.
Increased mammographic density is another factor that has been found to be associated with an increased risk of breast cancer. It has been shown in several studies that women with increased mammographic parenchymal density are at a four- to six-fold higher risk over women with primarily fatty breasts [17-22]. At present, the reason for this increased risk is unclear. One possibility is that increased density reflects a larger amount of tissue at risk for developing breast cancer. Since most breast cancers develop from the epithelial cells that line the ducts of the breast, having more of this tissue as reflected by increased mammographic density may increase the chances of developing breast cancer.
In the inventors' study, the Gail and the Claus models were used to estimate individual risk over a woman's lifetime (up to 79 years old) and during the next 10 years of her lifetime, which are referred to as the lifetime risk and the 10-year risk of developing breast cancer.
The Gail model [25] was developed based on case-control studies involving 2,852 white women with incident breast cancer and 3,146 white controls selected from the Breast Cancer Detection Demonstration Project (BCDDP) population data. The risk factors used in the Gail model are age, age at menarche, age at first live birth, number of previous breast biopsies, number of first-degree relatives with breast cancer and history of biopsy with hyperplasia [3,25]. These risk factors are broadly consistent with those selected from other large population-based studies [3]. Because the Gail model was developed from a database which includes only white women who tend to return for annual mammographic screening [3], it is anticipated that this model would overpredict risk in younger, unscreened women since the BCDDP population had a higher prevalence of women with adverse risk factors than the general population [3,25].
The Claus model [4] was derived from the Cancer and Steroid Hormone (CASH) Study, which was a multicenter, population-based, case-control study. The data consists of 4730 patients with histologically confirmed breast cancer, age 20-54 years, and 4688 control subjects. The control subjects were frequency-matched to patients according to geographic region and 5-year categories of age. The aim of the study conducted by Claus et al. differs from that of Gail et al. in that Claus et al. intended to address the issue of risk calculation solely for a subset of women who are at potentially high risk for breast cancer, i.e., women with a family history of breast cancer. For these women, it appears that the number and the type of relatives affected with breast cancer as well as the ages at onset of any affected relative may be the most important risk factors, more so than risk factors such as age at first live birth or age at menopause that are used in the Gail model. Claus et al. found in their data that risk of individuals increased as "age at onset" of their affected relatives decreased [4]. On the other hand, Gail et al. did not find, in their data, that age at onset was helpful in the prediction of risk once the number of relatives affected was considered [3,25].
Because the risk factors used in the Gail model are more consistent with those selected from other studies, the Gail model could be validated on other large databases. Validation studies [27,28] have shown that the Gail model predicts risk most accurately in women who undergo yearly mammographic screening and overpredicts risk for women who do not undergo yearly mammographic screening. Another validation study, which involved 109,413 women from the Nurses' Health Study, showed that the correlation coefficient between observed risk from the database and predicted risk from the Gail model was 0.67 [28]. These validation studies demonstrated that, for accurate estimation, the Gail and Claus models should be applied only to a population similar to those from which the models were derived.
With the increasing awareness of breast cancer risk and the benefit of screening mammography, more women in all risk categories are seeking information regarding their individual breast cancer risk. The need exists for primary care clinicians to be able to assess an individual's risk of developing breast cancer and offer an appropriate surveillance program for each individual [23,24]. Identification and close surveillance of women who are at high risk of developing breast cancer may provide an opportunity for early cancer detection.
Breast cancer risk assessment is an emerging service which includes determination of risk, recommendations for surveillance, and counseling for women at elevated risk. Currently, several prediction models based on large epidemiologic studies [16] have been developed to predict risk using known risk factors such as a woman's age, her family and personal histories of breast cancer, and gynecological information. Among them, the Gail model and the Claus model are the most commonly used for prediction of an individual's breast cancer risk [23]. These models are used by clinicians for counseling women who are seeking information regarding their individual breast cancer risk. The Gail model was used to identify women at high risk for the entry to the Tamoxifen Prevention Trial. Recently, Offit and Brown [16] reviewed four major models of risk prediction and provided a comparison of the different models. Since each of these models was derived with a different study design and used different factors to calculate risk, risk estimates for a given individual obtained from each of the models differed slightly. It was anticipated and confirmed that these models, which use a few selected risk factors, only predict risk accurately for the populations similar to those from which the models were developed [3,4,25-28]. Clinicians have been instructed to select models carefully since each of these models was designed based on a particular population. Further, the risk predicted from these models must be justified according to clinical observations since information such as a positive result from a DNA test for the BRCA1/BRCA2-mutation supersedes routine projections from a model [23,26].
Nevertheless, the models provide an epidemiologic basis for risk prediction and serve as guidelines for counseling patients until more refined predictions based on molecular characterization or other methods become available.
Over the past twenty years, the association of breast cancer risk with mammographic parenchymal patterns has been investigated. In 1976, Wolfe first described an association between risk for breast cancer and different mammographic patterns [86]. He described four patterns of breast parenchyma (N1, P1, P2, and DY) associated with different risk levels of developing breast cancer. An N1 (lowest risk) pattern indicates a breast composed entirely of fat tissue. P1 (high risk) and P2 (high risk) patterns refer to increasing ductal prominence (a P1 pattern consists of ducts occupying less than 25% of the breast and a P2 pattern consists of ducts occupying more than 25% of the breast). A DY pattern (highest risk) refers to a breast which is largely occupied by diffuse or nodular densities. Many investigators have used Wolfe patterns to classify the mammographic appearance of breast parenchyma for risk prediction [30]. Others have used qualitative or quantitative estimates of the proportion of the breast area that mammographically appears dense (percent density) to assess the associated breast cancer risk.
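The percent density measure mentioned above can be sketched as follows. This is a minimal illustration rather than the method of any cited investigator: it assumes a digitized mammogram in which brighter pixels correspond to denser tissue, a precomputed mask of the breast region, and a single gray-level threshold separating dense from fatty tissue. The function name, the threshold, and the pixel values are hypothetical.

```python
def percent_density(image, breast_mask, threshold):
    """Percentage of breast-region pixels at or above a gray-level threshold.

    image:       2D list of gray levels (brighter is assumed to be denser).
    breast_mask: 2D list of booleans, True inside the breast region.
    threshold:   gray level at or above which a pixel is counted as dense.
    """
    breast_pixels = 0
    dense_pixels = 0
    for image_row, mask_row in zip(image, breast_mask):
        for value, inside in zip(image_row, mask_row):
            if inside:
                breast_pixels += 1
                if value >= threshold:
                    dense_pixels += 1
    if breast_pixels == 0:
        raise ValueError("breast mask contains no pixels")
    return 100.0 * dense_pixels / breast_pixels


# Hypothetical 2x2 image in which two of the four breast pixels are bright.
image = [[10, 200], [220, 30]]
mask = [[True, True], [True, True]]
print(percent_density(image, mask, threshold=128))  # 50.0
```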
Since Wolfe's work, interest in the possible association of mammographic parenchymal patterns with breast cancer has varied [31-33]. Wolfe's initial reports were landmark studies in this field. However, the results provoked various criticisms, for example, possible bias in the results due to the "masking" effect. Studies showed that breast cancer was most easily detected by mammography in fatty breasts and was most difficult to detect in breasts with dense parenchyma; thus, more cancers were missed by mammography in women with dense breasts [34]. The "masking effect" hypothesis [31] held that the observed greater risk of breast cancer in women with dense breasts arose because cancers missed in dense breasts at the initial classification declared themselves on subsequent follow-up.
Several groups [20,35] have conducted experiments to examine the masking hypothesis. Whitehead et al. [35] examined the masking hypothesis by using data from the Breast Cancer Detection and Demonstration Project (BCDDP). They found that the masking of cancer did occur in breasts with dense parenchyma; however, their results showed that the effect of the masking on estimation of breast cancer risk was small. They concluded that women with dense breasts have two disadvantages: 1) they were at increased risk of developing breast cancer, and 2) cancers occurring in dense breast parenchyma were more difficult to detect.
During the time of this controversy, many investigators studied the relationship between the mammographic patterns and breast cancer risk using the Wolfe method or percent density methods. Considerable variations were observed in reported results. In 1992, Warner et al. [30] carried out a meta-analysis using 35 publications to examine the effect of different methods on the assessment of breast cancer risk. They grouped the studies according to their designs and methods used, and determined the magnitude of the risk of breast cancer associated with mammographic density for the studies in each group. They found that the estimated relative risk of developing breast cancer depended on the methods that were used to classify mammographic patterns and ranged from 0.53 to 5.19. Based on the meta-analysis, they concluded that women with dense breasts have an increased risk of breast cancer relative to those with fatty breasts.
While visual assessment of mammographic patterns has remained controversial due to the subjective nature of human assessment [36], computer vision methods can yield objective measures of breast density patterns. Computerized techniques have been investigated to quantitatively evaluate mammographic parenchyma and identify women that are at risk of developing breast cancer. Computerized density analysis of mammographic images has been investigated by various investigators including Magnin et al. [37], Caldwell et al. [38], Taylor et al. [39], Tahoces et al. [40], and Byng et al. [41,42].
Magnin et al. [37] tried to classify mammograms into four categories (Wolfe patterns) using texture parameters extracted from co-occurrence matrices, the spatial gray level dependence method (SGLDM), and the gray level difference method (GLDM). They reported that their result was inconclusive because a limited number of cases (27 mammograms) were used and the quality of the images used in the study was poor [37]. Caldwell et al. [38] used fractal dimension analysis to classify mammograms into the four patterns described by Wolfe, yielding 84% agreement with the classification by radiologists. Tahoces et al. [40] investigated the ability of linear discriminant analysis to quantify Wolfe patterns by merging texture measures obtained from a Fourier transform method, local contrast analysis, and gray-level distribution. Their results showed that agreement (22%-77%) between radiologists and the computer classification varied depending on the Wolfe pattern. Taylor et al. [39] used a local skewness measure to separate fatty and dense breasts, yielding 85% classification accuracy for 106 mammograms. Byng et al. [41,43] investigated a semi-automated interactive thresholding technique based on visual assessment and computerized texture analysis (a local skewness measure and fractal dimension analysis) to quantify the percent density of breasts. Their results showed that computerized assessment of mammographic density using the texture measures (R=-0.60) correlated well with the visual assessment (subjective classification) of the projected area of mammographically dense tissue. Furthermore, they showed that increased mammographic density was associated with an increased relative risk by a factor of 2 to 4. Their results also showed that the relative risk estimates obtained using the two computer-extracted texture measures were not as strong as those from their subjective mammographic classification method.
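A local skewness measure of the general kind referred to above can be sketched as follows. This is an illustrative assumption and not the implementation of Taylor et al. or Byng et al.: it computes the skewness (third standardized moment) of the gray levels within non-overlapping square windows and averages the result over the image. The function names, the window size, and the pixel values are hypothetical.

```python
def skewness(values):
    """Third standardized moment of a sequence of gray levels."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # a uniform window has no asymmetry
    return m3 / m2 ** 1.5


def local_skewness(image, window):
    """Mean skewness over non-overlapping window x window tiles of a 2D list."""
    rows, cols = len(image), len(image[0])
    scores = []
    for r in range(0, rows - window + 1, window):
        for c in range(0, cols - window + 1, window):
            tile = [image[i][j]
                    for i in range(r, r + window)
                    for j in range(c, c + window)]
            scores.append(skewness(tile))
    if not scores:
        raise ValueError("window is larger than the image")
    return sum(scores) / len(scores)


print(skewness([1, 2, 3]))  # 0.0 for a symmetric distribution
```

A histogram with a long tail toward bright gray levels gives a positive value, and a long tail toward dark gray levels gives a negative value; how the sign relates to breast density depends on the imaging and digitization conventions and is not asserted here.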
Development of a computerized method to automatically extract features that characterize mammographic parenchymal patterns and relate to breast cancer risk would potentially benefit women seeking information regarding their individual breast cancer risk. In this study, 14 computer-extracted texture measures were used to characterize mammographic parenchymal patterns. Selected texture measures were then related to breast cancer risk via two different approaches. First, these measures were used to differentiate mammographic parenchymal patterns of BRCA1/BRCA2-mutation carriers from those of women who are at low risk of developing breast cancer. Second, they were used to predict breast cancer risk as determined from the Gail or Claus model.