The present invention relates to a method for predicting the future occurrence of medical conditions that have not yet occurred or that are clinically occult.
Neural networks are well known and have been used to implement computational methods that learn to distinguish between objects or classes of events. The networks are first trained by presentation of known data about objects or classes of events, and then are applied to distinguish between unknown objects or classes of events. While neural networks have been applied in medicine to diagnose diseases based on existing symptoms, and to prescribe treatments for the diagnosed diseases, to date, there has been no application of such networks to predict future occurrence of disease which is clinically occult or which has not yet occurred, or to predict the relapse of disease that has presumably been cured.
Such prognostication is important in all branches of medicine. For example, it is useful in the field of oncology in order to improve the prediction of prognosis of patients so that appropriate therapy can be selected. This goal is of particular importance in the selection of treatment of breast cancer patients who are presumably rendered disease free after the removal of the primary tumor within the breast, and who have no pathological evidence of axillary lymph node involvement. Most of these patients will have been surgically cured, but a substantial minority will relapse.
Several recent studies suggest that certain breast cancer patients without axillary lymph node involvement can benefit from adjuvant chemotherapy or hormonal therapy. However, not all individual patients actually benefit from this therapy, and a majority of these patients receive therapy that is not necessary.
Prior efforts to predict breast cancer prognosis use a number of biochemical, molecular biologic and biophysical input variables that describe the cells in a tumor. When such multiple input variables are available, various combinations of the input variables are typically assessed using multivariate analysis. Multivariate analysis is a powerful tool, but it is often unable to effectively analyze outcome based on a highly non-linear input variable. It is at a particular disadvantage in analyzing interactions between several non-linear variables (where, for example, multiple peaks and troughs of recurrence probability may exist). These limitations are most pronounced when a given input variable is assigned to one of two input states (as is commonly done in clinical medicine), with an optimum threshold or cut-point between the two states being determined by maximizing a likelihood function using regression analysis. Defining a single cut-point between two states of an input variable effectively ignores important non-linearities in that variable. In addition, multivariate analysis can miss cross-correlation effects between input variables.
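The cut-point limitation described above can be illustrated with a minimal sketch. The risk function, the variable range and the names `relapse_risk` and `piecewise_error` are hypothetical and chosen only for illustration; they do not come from any study. The sketch assumes a risk that is high at both extremes of an input variable and low in between, then measures how well the best single cut-point can represent it.

```python
def relapse_risk(x):
    """Hypothetical non-monotone risk: high at both extremes, low in the middle."""
    return 0.8 if (x < 0.25 or x > 0.75) else 0.1

# Hypothetical input variable sampled on a uniform grid: 0.00, 0.01, ..., 0.99.
xs = [i / 100 for i in range(100)]

def piecewise_error(cuts):
    """Mean squared error when risk is modeled as one constant per interval
    between consecutive cut-points (one cut-point gives the two-state model)."""
    edges = [float("-inf")] + sorted(cuts) + [float("inf")]
    sse = 0.0
    for lo, hi in zip(edges, edges[1:]):
        group = [relapse_risk(x) for x in xs if lo <= x < hi]
        if group:
            mean = sum(group) / len(group)
            sse += sum((r - mean) ** 2 for r in group)
    return sse / len(xs)

# Best achievable two-state model: try every grid value as the single cut-point.
best_cut = min(xs[1:], key=lambda c: piecewise_error([c]))
single_cut_error = piecewise_error([best_cut])

# A three-state model with cut-points at the true risk boundaries.
two_cut_error = piecewise_error([0.25, 0.76])

print(best_cut, single_cut_error, two_cut_error)
```

Under these assumptions, even the optimal single cut-point leaves a substantial residual error, because one of the two states must mix high-risk and low-risk patients, whereas a model allowed two cut-points represents the risk essentially exactly.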
Other clinically occult diseases which have known multiple risk factors, for example, coronary heart disease or diabetes, would also benefit from improved prognostication methods.
Other methods have also proven to be important tools in the prediction of prognosis in breast cancer and other tumor types. Such methods, known as DNA cytophotometry, process images of cells or cell components to quantitatively estimate a number of nuclear and cellular parameters. Of particular interest is DNA flow cytometry. The basis of DNA flow cytometry is the measurement of the level of DNA in individual cells. The technique results in DNA histograms indicating the number of cells having different levels of DNA. Conventionally, DNA histograms obtained through flow cytometry are interpreted as having cells in three basic regions: cells in the G0/G1 phase of the cell cycle, before replication of DNA; cells in the S-phase, which are actively replicating DNA; and cells in the G2/M phase of the cell cycle, after DNA replication but before cell division.
Tumor cells are conventionally interpreted as diploid if they have a G0/G1 peak with a DNA content equal to that of normal cells, if there are no other peaks in the histogram that reach an arbitrary cut-off percentage of counts (usually 10%) of that peak value, and if the G0/G1 peak in the histogram is narrow enough to be considered to represent cells of one population. S-phase counts of a DNA histogram lie in the region between the G0/G1 peak and the G2/M peak.
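The conventional three-region reading of a DNA histogram can be sketched as follows. This is a deliberately naive illustration, not a clinical procedure: the function name `s_phase_fraction`, the `half_width` parameter, the synthetic histogram and the assumption that the G2/M peak sits at twice the G0/G1 channel are all hypothetical. The sketch makes no correction for peak tails or cell debris.

```python
def s_phase_fraction(hist, half_width=2):
    """Naive S-phase fraction from a DNA histogram.

    hist is a list of event counts indexed by DNA-content channel.
    Assumptions (illustrative only): the tallest peak is G0/G1, the G2/M
    peak lies at twice the G0/G1 channel, and each peak occupies
    `half_width` channels on either side of its center.
    """
    g1 = max(range(len(hist)), key=hist.__getitem__)   # G0/G1 peak channel
    g2 = min(2 * g1, len(hist) - 1)                    # assumed G2/M peak channel
    # Channels strictly between the two peak windows are counted as S-phase.
    s_counts = sum(hist[g1 + half_width + 1 : g2 - half_width])
    total = sum(hist)
    return s_counts / total if total else 0.0

# Synthetic 64-channel histogram: G0/G1 peak at channel 20, G2/M peak at 40.
hist = [0] * 64
hist[19:22] = [100, 600, 100]        # G0/G1 peak
for channel in range(23, 38):
    hist[channel] = 4                # flat S-phase plateau
hist[39:42] = [20, 100, 20]          # G2/M peak

fraction = s_phase_fraction(hist)
print(fraction)
```

Because real peaks have tails that overlap the S-phase region, a simple count of this kind would generally misestimate the S-phase events, which is the motivation for the more elaborate subtraction formulae.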
Several complex mathematical formulae have been developed to count the number of S-phase events while subtracting out events due to the tails of the G0/G1 and G2/M peaks, and while subtracting out the effects of contaminating cell debris. A particularly sophisticated method known as SFIT uses second-degree polynomials to perform this subtraction. These mathematical formulae are particularly complex for aneuploid histograms, where they often have to deal with cell kinetics from cell populations that are both diploid and aneuploid. All of these mathematical approaches are, however, based on a mechanistic view of cells being in either the G0/G1, S or G2/M phases of the cell cycle.
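The text does not give the SFIT formulation, so the following is only a generic sketch of the underlying idea of fitting a second-degree polynomial to channel counts by least squares. It should not be read as the SFIT algorithm. The function name `fit_quadratic` and the sample data are hypothetical, and channel positions are expressed as offsets from the first fitted channel to keep the arithmetic well conditioned.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2 via the 3x3 normal equations.

    Returns the coefficients [a, b, c]. Requires at least three distinct xs.
    """
    s = [sum(x ** k for x in xs) for k in range(5)]                   # sums of x^0 .. x^4
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]   # sums of y * x^k
    # Augmented matrix of the normal equations.
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

# Hypothetical counts in 15 channels lying between two peaks,
# generated from a known quadratic so the fit can be checked.
offsets = list(range(15))
counts = [3 + 0.4 * x - 0.02 * x ** 2 for x in offsets]

a, b, c = fit_quadratic(offsets, counts)
fitted_total = sum(a + b * x + c * x ** 2 for x in offsets)
print(a, b, c, fitted_total)
```

In a real analysis the fitted curve would be combined with models of the peak tails and debris; this sketch shows only the polynomial fitting step.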
As such, present techniques for analyzing DNA histograms resulting from flow cytometry ignore other patterns occurring in the DNA histograms which correlate with the risk of cancer relapse.