The invention relates to computer data retrieval systems, and more particularly to methods for performing temporal logic queries using statistical reliability criteria.
In analyzing data about certain real-world problems, it is useful to view the data as temporal data. Temporal data typically describe how an observed quantity changes over time or when certain conditions occur. Temporal data are useful for answering questions such as "how often," "how long until," or "is this getting better with time."
Examples of temporal data, and possible questions that could be posed against that temporal data, include the following:
Dates of heart attacks (does treatment X reduce the risk of fatal heart attacks?)
Monthly blood pressure readings (does treatment X reduce high blood pressure?)
Periods of hospitalization (how long are patients with condition X staying in the hospital?)
Arrest records (does community policing have a favorable effect? for how long?)
Point-of-sale information for a retail store (which ad campaigns improved sales?)
Temporal data may be organized into "chronologies," which are sequences of related temporal data. For "period chronologies," each datum has a start- and end-time. For "event chronologies," each datum has an "event" time. Each datum usually has associated information. The sequence of hospital stays for a patient would be a period chronology, and its associated data might include information regarding diagnosis, facility, and admitting physician. The dates of heart attacks for a patient would be an event chronology, and its associated data might include a severity score.
Temporal data may also contain "eternal" data. These are time-invariant, per-patient temporal elements: eternal data hold true for all observed time. For example, gender, county of birth, date of birth, date of death, index date, censoring date, etc. are stored as eternal facts. These eternal facts have proven useful in temporal logic manipulations, and also provide a way of packaging some of the "critical" data, such as an index or right-censor date, as well as computationally useful data, such as date-of-birth, which is used to convert date-of-event into age-at-event. Such temporal objects thus include: period chronologies, event chronologies and eternal facts. These are all segregated by patient. Additional patient-independent objects may also be provided.
Temporal logic is a useful way of manipulating temporal data. Two periods can have a relationship to each other. For example, two periods can have a containment relationship, where period A completely contains period B, or an overlap relationship, where period A and period B partially overlap each other. Temporal logic extends these notions by providing operations that combine temporal data. For example, if it is desirable to know which patients had heart attacks while being treated for diabetes, this can involve an operation that combines temporal information about heart attacks with temporal information about periods of treatment for diabetes.
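The containment and overlap relationships described above can be sketched as simple predicates. The following is a minimal illustration; the function names `contains` and `overlaps` are ours and do not belong to any particular system:

```python
from datetime import date

def contains(a_start, a_end, b_start, b_end):
    """True if period A completely contains period B."""
    return a_start <= b_start and b_end <= a_end

def overlaps(a_start, a_end, b_start, b_end):
    """True if periods A and B share at least one instant."""
    return a_start <= b_end and b_start <= a_end

# Example: a six-month treatment period vs. a two-week hospitalization.
print(contains(date(2001, 1, 1), date(2001, 6, 30),
               date(2001, 2, 1), date(2001, 2, 14)))   # True
print(overlaps(date(2001, 1, 1), date(2001, 3, 1),
               date(2001, 2, 15), date(2001, 4, 1)))   # True
```

Operations that combine chronologies, such as intersection, can be built from pairwise applications of predicates like these.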
Table 1, below, shows some representative operations and their application.
Operations can be cascaded. For example, "patients who were not on beta blockers in the 30 days before a heart attack" can be identified by:
converting the heart-attack event to a 30-day period ("convert event to time period"),
finding the periods when the patient was not on beta blockers ("invert time period"), and
intersecting the two resulting chronologies ("intersect time periods").
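The three cascaded steps above can be sketched as follows. This is an illustrative sketch only: the helper names, the tuple representation of periods, and the assumption that period lists are sorted and non-overlapping are ours, not features of any particular system:

```python
from datetime import date, timedelta

def event_to_period(event, days_before):
    """Convert an event date to the period covering the preceding N days."""
    return (event - timedelta(days=days_before), event)

def invert(periods, window_start, window_end):
    """Periods within the observation window NOT covered by the input
    periods. Assumes the input is sorted and non-overlapping."""
    gaps, cursor = [], window_start
    for start, end in periods:
        if cursor < start:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < window_end:
        gaps.append((cursor, window_end))
    return gaps

def intersect(a, b):
    """Pairwise intersection of two chronologies of periods."""
    out = []
    for s1, e1 in a:
        for s2, e2 in b:
            s, e = max(s1, s2), min(e1, e2)
            if s < e:
                out.append((s, e))
    return out

# Heart attack on March 31; beta-blocker use March 1-10;
# observation window is the whole of 2001.
window = (date(2001, 1, 1), date(2001, 12, 31))
attack_period = [event_to_period(date(2001, 3, 31), 30)]
off_beta = invert([(date(2001, 3, 1), date(2001, 3, 10))], *window)
# The nonempty intersection (March 10-31) shows this patient was off
# beta blockers during part of the 30 days before the attack.
print(intersect(attack_period, off_beta))
```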
Temporal data introduce a rich variety of complexities into analysis. Causality is often associated with temporal order. Many scientific studies are based on the paradigm "intervene, then observe what happens afterward." Similarly, with temporal data, assertions can often be made about duration: "the patient was pain-free for 4.5 hours."
Implicit in analyzing temporal data are the concepts of truncation and censoring: data is often gathered from only a limited period of time. Knowledge of what happened before observation begins is limited, and what happens after the data-gathering period ends is typically unknown entirely. This phenomenon is called "truncation" of data, referring to the period of observation for the aggregate of units (e.g. subjects, participants, or patients). "Censoring" refers to starts and stops of observation at the level of the individual unit. Thus, individual units may have varying lengths of observation, which can profoundly alter the techniques appropriate for statistically valid uses of temporal data.
Consider a study comparing two methods for preventing heart attacks in high-risk patients. The patients are treated and then monitored for a finite period of time, such as four months. The data would be left-truncated before treatment and right-truncated after the four months. At the end of the study, it is desired to compare the average time-to-heart-attack for the two treatment groups. During the four months, some patients will have reported a heart attack. Assuming they have had continuous observation since treatment began, their contribution to the average time-to-heart-attack for their treatment group is obvious. However, special considerations may have to be given to the patients who may not have been under observation for the entire four-month duration. Perhaps some patients died, dropped out of the study or migrated from the area. Their survival time during the four months must be measured, but it's often not clear how their observation time should be incorporated into the comparison of average time-to-heart-attack. Using four months for each of them is plausible but may underestimate the true average time-to-heart-attack, just as assuming that they will never have an attack overestimates the time-to-heart-attack. This lack of data throughout the aggregate observational duration is termed "censoring." Specifically, right-censoring occurs when patients are observed from the left-truncation date, but are not monitored through the entire period up to the right-truncation date. When individual units are observed until the aggregate right-truncation date (such as the official end of the study), their right-censoring and right-truncation dates are identical, and in practice synonymous.
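The per-patient observation times in this example can be sketched as follows. This is a minimal sketch under stated assumptions: the helper name `observation_days` and the rule of taking the earliest of event, censoring, and truncation dates are ours, used only to illustrate the computation the statistical routines require:

```python
from datetime import date

def observation_days(index_date, right_truncation,
                     event_date=None, censor_date=None):
    """Days of observation from the index date to the earliest of:
    the event, the patient's right-censoring date, or the study's
    right-truncation date."""
    candidates = [right_truncation]
    if event_date is not None:
        candidates.append(event_date)
    if censor_date is not None:
        candidates.append(censor_date)
    return (min(candidates) - index_date).days

# A four-month study running January 1 through May 1:
index, end = date(2001, 1, 1), date(2001, 5, 1)
print(observation_days(index, end, event_date=date(2001, 2, 15)))  # 45: had an attack
print(observation_days(index, end, censor_date=date(2001, 4, 1)))  # 90: dropped out
print(observation_days(index, end))                                # 120: observed fully
```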
A similar situation can occur when the time after the beginning of aggregate observation is relevant to an analysis, but the observed or collected information about the past is subject to a "begin date" cutoff. This is called "left censoring," and may occur, for example, when a patient is a late entry into an ongoing study.
The heart attack study example also introduces the notion of an index date, which is the date that a subject formally entered the study. The time after the index date is often called the "longitudinal follow-up period"; as noted, this time period can be limited by right-censoring, and is ultimately limited by right-truncation for the aggregate.
Information pertaining to the period before the index date is often called the "baseline period." The baseline period usually extends only a limited time into the past, for various reasons such as: lack of data, limitations of recall, or because the earlier data is not informative.
Critical data, such as the date data discussed above, depend on factors such as the design of the study or data set, or on the individual data subject (e.g. patient). The left- and right-truncation dates, for example, depend on the data or study design. Each individual in a study cohort may have different index, right-, and left-censor dates, yielding differing durations of longitudinal follow-up.
Statisticians have developed methods for the analysis of temporal data, which is often referred to as "analysis of longitudinal data." For example, statistical programs such as SAS and SPLUS incorporate routines for analyzing longitudinal data (e.g. survival analysis or Kaplan-Meier methods). The routines use information such as the right-censor date, but they also require that the user has correctly computed each unit's observation time. Moreover, these systems do not offer any automated facilities for use of critical dates without prior manual computation of the critical dates. These systems also do not offer any automated facilities for integrating temporal logic, nor for developing computations based on temporal logic, which is particularly difficult to formulate in a manual process.
Relational database systems often have a data type for dates. The 1992 SQL standard defines temporal data types, such as INTERVAL. These data types are potentially useful as building blocks for a temporal logic system. For example, Snodgrass, R., Developing time-oriented database applications in SQL. Morgan Kaufmann, California 2000, describes an approach where SQL fragments can be used to express basic temporal operations, such as intersection of periods. Also, Nigrin, Daniel J, and Kohane, Isaac S, Temporal expressiveness in querying a time-stamp-based clinical database, J American Medical Informatics Association, 2000; 7(2); 152-163, describes an implementation of temporal logic using a relational database system. However, these database system references do not extend the temporal operations to incorporate the statistical concepts necessary for correct analysis of longitudinal data. Without automatic incorporation of the appropriate statistical concepts, the expressivity of temporal logic can be used to make statistically inappropriate uses of temporal data.
In order to properly analyze temporal data using these statistical analysis methods, it is useful to take into consideration statistical reliability criteria, such as the index date, left- and right-truncation dates, left- and right-censoring dates and other such critical data. It is also useful to ensure that the data before and after the index date (respectively "baseline" and "follow-up") is regarded separately.
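The separation of baseline from follow-up data can be sketched as a clipping step that discards data outside the censoring window and splits each period at the index date. The helper name `split_at_index` and the tuple representation are illustrative assumptions, not part of any particular system:

```python
from datetime import date

def split_at_index(periods, left_censor, index_date, right_censor):
    """Clip periods to the observation window [left_censor, right_censor],
    then split them into baseline (before the index date) and
    follow-up (on or after it)."""
    baseline, follow_up = [], []
    for start, end in periods:
        s, e = max(start, left_censor), min(end, right_censor)
        if s >= e:
            continue  # entirely outside the observation window
        if s < index_date:
            baseline.append((s, min(e, index_date)))
        if e > index_date:
            follow_up.append((max(s, index_date), e))
    return baseline, follow_up

periods = [(date(2000, 11, 1), date(2001, 2, 1)),   # straddles the index date
           (date(2001, 6, 1), date(2001, 8, 1))]    # follow-up only
base, follow = split_at_index(periods, date(2000, 1, 1),
                              date(2001, 1, 1), date(2001, 12, 31))
# The straddling period is divided at the index date, so its baseline
# portion and follow-up portion are regarded separately.
print(base)
print(follow)
```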
In an aspect of an embodiment of the invention, critical data from a data source is automatically merged with a temporal data request, to produce a statistically reliable temporal data query.
In another aspect of an embodiment of the invention, the data source is annotated with metadata, the metadata identifying critical data within the data source, the critical data embodying a statistical reliability criterion.
In another aspect of an embodiment of the invention, baseline information contained in the data source is summarized.
Other and further aspects of the invention will become apparent in view of the following drawings and detailed description of preferred embodiments.