It would be desirable to predict the onset of future health problems for an individual reliably and far enough in advance to improve the chances of preventing those problems, rather than waiting for a disease to appear and then treating its symptoms. At present, the overwhelming fraction of medical research funding is directed toward improving the diagnosis and treatment of disease rather than toward discovering preventive measures that could reduce the risk of disease long before any of its typically observed symptoms are evident. Although the emphasis on treatment may have led to enormous advances in the medical sciences, in the form of many sophisticated techniques and methods for diagnosing existing diseases and for treating them after diagnosis, such advances continue to drive ever-increasing treatment costs. These costs can have staggering financial consequences for individuals as well as for society as a whole, and they have led to increasing public pressure to find ways of reducing medical costs.
Thus, in addition to the benefit to be gained by an individual who could be informed of the high risk of the onset of disease far enough in advance so that effective preventive steps could be taken, substantial reductions in overall medical costs might be realized by entire communities and/or countries.
Until now, two of the problems inherent in attempting to assess or predict an individual's future health have been: (a) such predictions are imprecise because they are based on data obtained from relatively small study samples, consisting of a few hundred or even a few thousand subjects, and (b) the predictions require extrapolation from the mean (and other parameters) of that sample to individual persons. Such extrapolations are highly problematic with respect to reliably estimating the risk of a specific individual, even within a group at high risk for a specific disease. This is true, in part, because the statistical procedures that are typically used are designed to make inferences about population means, not about individual members of the population.
To obtain quantitative predictions, an "individual's future health" must be designated as the occurrence of a specific event within a specified timeframe. Two examples are: (a) occurrence of a myocardial infarction within the succeeding five years, and (b) the individual's death within the next year. Predictions of such events are necessarily probabilistic in nature.
Two types of probability are important in this context. The a priori probability of an event is the probability of the event, before the fact of the event's occurrence or non-occurrence. The post hoc probability of an event is the probability of the event after the event is realized, i.e., after the event's occurrence or non-occurrence. Clearly the post hoc probability of an event is 1 if the event occurred and 0 if the event did not occur. The distinction between the a priori probability and post hoc probability is worthy of note.
The a priori probability of an event occurring in the subsequent year, or other time interval, can be important information. Knowledge of the probability of an event can modify behavior or, put another way, the actions one takes (behavior) can depend on the a priori probability of an event. This principle is made self-evident by considering two extreme cases. One would almost surely behave differently (take different actions) depending on whether one were informed that one's probability of death in the coming year is (a) 0.9999 or (b) 0.0001.
The a priori probability of an event depends upon the information available at the time the probability is evaluated. To illustrate the point, consider the following hypothetical "game."
A living person will be selected at random from all U.S. residents and followed for a period of one year. At the end of the year the person's vital status (alive or dead) will be ascertained. The "event" is "the person died during the year." At the end of the year the event either occurred (person died) or did not occur (person survived), with post hoc probabilities of 1 and 0, respectively. Before the person is selected, the U.S. mortality statistics can be used to estimate the a priori probability that the person will die in the year. This probability is computed as p=d/N, where N is the total number of persons in the at risk group (here, all the persons in the U.S. population who were alive at the beginning of the year) and d is the total number of deaths among the at risk group. For example, the approximate data from calendar year 1993 are d=2,268,000 and N=257,932,000, so the a priori probability of the event is approximately p=0.0088. [Data from Microsoft Bookshelf 1995 Almanac, article entitled, "Vital Statistics, Annual Report for the Year 1993 (Provisional Statistics), Deaths." and Vital Statistics of the United States, published by the National Center for Health Statistics.] In this game, the a priori probability of the event is based upon very little information, namely that the person is a member of the at risk group, consisting of all persons who are alive and U.S. residents at the time of selection.
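The calculation p=d/N can be sketched as follows. This is a minimal illustration, assuming a closed at risk group with no losses to follow-up; the function name `a_priori_probability` is hypothetical, and the inputs are the approximate 1993 figures quoted above.

```python
def a_priori_probability(events: int, at_risk: int) -> float:
    """Crude a priori probability p = d / N for a closed at risk group (hypothetical helper)."""
    if at_risk <= 0:
        raise ValueError("the at risk group must be non-empty")
    if not 0 <= events <= at_risk:
        raise ValueError("events must lie between 0 and the size of the at risk group")
    return events / at_risk

# Approximate 1993 U.S. figures quoted in the text.
d_1993 = 2_268_000      # deaths during the year
N_1993 = 257_932_000    # persons alive at the beginning of the year
p = a_priori_probability(d_1993, N_1993)
print(f"p = {p:.4f}")  # p = 0.0088
```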
Additional information about the at risk group, from which the subject is selected at random, implies additional information about the subject and modification of the a priori probability of the event. For example, continuing the "game" above, based on 1993 data:
If the at risk group were the group of U.S. males, i.e., if the subject is known, prior to selection, to be a male, the a priori probability of the event is approximately p=0.0093, which is about 6% higher than the case where gender is unknown or unspecified.
If the at risk group were the group of U.S. males aged 75-84, i.e., if the subject is known, prior to selection, to be a male in the age interval 75-84, the a priori probability of the event is approximately p=0.0772, or about 8.3 times as high as for males where age is unknown or unspecified.
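The comparisons in these examples are simple ratios of a priori probabilities. The sketch below reproduces them from the rounded values given in the text, so the ratios carry that rounding; the dictionary keys and the helper name `relative_risk` are illustrative assumptions.

```python
# A priori probabilities of death within one year, 1993 (rounded values from the text).
risk_by_group = {
    "all U.S. residents": 0.0088,
    "U.S. males": 0.0093,
    "U.S. males aged 75-84": 0.0772,
}

def relative_risk(p_subgroup: float, p_reference: float) -> float:
    """Ratio of a subgroup's a priori probability to a reference group's (hypothetical helper)."""
    return p_subgroup / p_reference

male_vs_all = relative_risk(risk_by_group["U.S. males"], risk_by_group["all U.S. residents"])
old_vs_male = relative_risk(risk_by_group["U.S. males aged 75-84"], risk_by_group["U.S. males"])

print(f"males vs. all residents: {male_vs_all:.3f} (about {(male_vs_all - 1) * 100:.0f}% higher)")
print(f"males aged 75-84 vs. all males: {old_vs_male:.1f} times as high")
```

Each added piece of information (gender, then age) narrows the at risk group, and the ratio shows how much that narrowing shifts the a priori probability.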
These examples illustrate the general principle that the a priori probability of an event depends upon the information available at the time the probability is evaluated. The most accurate estimate of an a priori probability is typically the one based on all of the available information.
A very accurate estimate of an a priori probability does not guarantee a specific outcome; that is, the a priori probability for a specific individual may not be very close to the post hoc probability. Consider the extreme case cited above, where the a priori probability of death of a specific individual in the succeeding year is 0.0001. Although survival is highly probable, it is not guaranteed: of all individuals in this "game," approximately 9,999 of each 10,000 will survive the year and have a post hoc probability of 0 (which is close to the a priori probability, 0.0001) and approximately 1 of each 10,000 will die and have a post hoc probability of 1, which is very different from the a priori probability. To further elucidate this principle, consider a fair coin toss in which the a priori probability of "heads" is exactly 0.5. The post hoc probability of "heads" is either 0 or 1, neither of which is very close to 0.5. Thus, the a priori probability for one individual should not be considered an approximation of the post hoc probability for that individual. However, if a very large number of individuals "play the game," the mean of the post hoc probabilities, which is also the proportion of individuals for whom the event occurs, will be very close to the a priori probability.
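The point that no individual's post hoc probability is close to a moderate a priori probability, while the mean over many individuals is, can be illustrated by simulation. This is a minimal sketch, assuming independent individuals who share the same a priori probability; the function `simulate_post_hoc` and the chosen sample size and seed are hypothetical.

```python
import random

def simulate_post_hoc(p: float, n: int, seed: int = 0) -> list[int]:
    """Simulate n independent individuals with a priori probability p.

    Each returned value is that individual's post hoc probability:
    1 if the event occurred, 0 if it did not.
    """
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]

for p in (0.5, 0.0001):  # the fair coin and the extreme low-risk case from the text
    outcomes = simulate_post_hoc(p, n=1_000_000)
    # Every individual value is 0 or 1, yet their mean (the proportion
    # of individuals for whom the event occurred) lies close to p.
    mean_post_hoc = sum(outcomes) / len(outcomes)
    print(f"a priori p = {p}: mean of post hoc probabilities = {mean_post_hoc:.5f}")
```

With a large number of players the mean settles near p, in line with the law of large numbers; for the rare event (p=0.0001) roughly 100 of the 1,000,000 simulated individuals experience the event.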
In some cases a person can change an a priori probability by "moving" to a group with a different a priori probability. For example, epidemiologists have shown that a middle-aged male U.S. resident with a high total cholesterol level, including a high low-density lipoprotein level, has a higher a priori probability of death from myocardial infarction in the succeeding five years than a comparable person with a much lower cholesterol level. Clinical trial research has shown that if the high-cholesterol person can reduce his cholesterol level substantially, i.e., "move" to a much lower cholesterol "group," he substantially reduces his a priori probability of death from myocardial infarction in the succeeding five years.
In succeeding paragraphs and sections the word risk will be used in place of the phrase "a priori probability of a specified event within a specified timeframe." This corresponds to the statistical definition of risk as expected loss, where the loss function takes value 1 if the event occurs and 0 if the event does not occur.
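The equivalence between risk as expected loss and the a priori probability follows directly: with loss 1 for the event and 0 otherwise, the expected loss is 1·p + 0·(1−p) = p. The sketch below states this for a general binary loss; the function name `expected_loss` and its parameter names are illustrative assumptions, not terminology from the text.

```python
def expected_loss(p_event: float, loss_if_event: float = 1.0, loss_if_no_event: float = 0.0) -> float:
    """Expected loss for a binary event: p * L(event) + (1 - p) * L(no event)."""
    if not 0.0 <= p_event <= 1.0:
        raise ValueError("p_event must be a probability in [0, 1]")
    return p_event * loss_if_event + (1.0 - p_event) * loss_if_no_event

# With the 0-1 loss (1 if the event occurs, 0 otherwise), risk equals the a priori probability.
p_death_1993 = 0.0088
risk = expected_loss(p_death_1993)
print(risk)  # 0.0088
```

Defaulting to the 0-1 loss matches the definition used in the text; other loss values would weight the event differently but are not used here.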
The foregoing comments illustrate the principle that differing levels of information lead to differing a priori probabilities. The risk for a person about whom much is known (i.e., a member of a small subpopulation with many known characteristics) may be very different from the risk for a large subpopulation with few known characteristics. However, yet another problem confounds the ability of traditional scientific research studies on populations to ascertain the risk of disease for individuals. This problem results from a commonly over-simplified understanding of the causation of disease, particularly the causation of chronic degenerative diseases such as cancers, cardiovascular diseases, diabetes, etc. That is, there is a tendency to believe, for a variety of reasons, that such diseases can be controlled through a single constituent or a single pharmaceutical compound, or can be clinically indicated by a single constituent. For example, it has been suggested that breast cancer can be controlled by a modest reduction of fat intake, that colon cancer can be controlled by adding specific dietary fiber components, that heart disease is clinically indicated by elevated blood cholesterol, and that stomach cancer can be clinically indicated by low blood levels of vitamin C. These over-simplified views too often prove inadequate for identifying causation, particularly for an individual person. There are too many confounding variables to take into consideration, to say nothing of the great difficulty of extrapolating population data to individuals within the population. Testing and investigating single constituents, among a milieu of thousands if not millions of possible constituent causes, is fraught with great uncertainty, especially when attempting to extrapolate the resulting data to the estimation of disease risks for individuals.
These dual difficulties, (a) of extrapolating data for experimental populations of individuals to a randomly selected individual and (b) of relying on single indicators or causes of disease occurrence, seriously compromise estimation of future disease risk for a randomly selected individual. If an individual's risk for a specific disease could be determined more reliably, it would then be possible to provide that information to the individual, who could then make more informed decisions about his or her personal behavior. In essence, much more reliable methods of predicting future health could become an unusually powerful means for individuals to internalize their own health situation and, thus, to take more effective control of their own well-being.
Moreover, for those individuals identified as being at high risk for a particular disease, such as heart disease, because they may fall within several categories each of which is highly correlated with that disease, the currently available methodology typically does not allow one to predict quantitatively when the disease will strike or become fatal for a specific individual with sufficient reliability or level of confidence to motivate that individual, in general, to take effective steps far enough in advance to significantly reduce that risk. It would, therefore, be desirable to have an effective general-purpose tool that would not only reliably predict onset of a specific health problem within a specified time period but would also be useful for monitoring the preventive measures taken on the basis of such predictions.