Virtually all medical test data are subject to variability and bias introduced by the human clinicians responsible for generating these data. Exemplary of this heretofore unsolved problem are medical test data generated in connection with fetal screenings for the risk of birth defects.
First-trimester screenings (usually performed between 11 and 13 weeks of pregnancy) are often used to test for risk of Down syndrome, trisomy 18, and elevated risk of other chromosomal defects, congenital heart disease, and other genetic and congenital disorders. These screenings generally comprise a blood serum analysis component and an ultrasound component. The blood analysis, conducted by a laboratory, measures maternal blood levels of several analytes, commonly free-beta human chorionic gonadotropin (hCG) and pregnancy-associated plasma protein A (PAPP-A). For instance, levels of PAPP-A tend to be decreased, and levels of hCG increased, with Down syndrome. The ultrasound component, conducted by a physician or technician, involves measuring nuchal translucency (“NT”), which is the thickness of the fluid space in the tissue at the back of the fetus's neck. Increased NT is generally associated with Down syndrome, other chromosomal abnormalities, and several other genetic and congenital disorders.
The risk for Down syndrome and other defects is calculated based upon the combined results of the blood serum analysis and ultrasound components. For each of the NT and blood serum components of the screening, there is a certain likelihood ratio (“LR”) associated with the results. The LR is a historically derived ratio representing the number of healthy to abnormal fetuses for a given result. With the NT measurement, for instance, there is, for a given fetal crown-rump length (“CRL”), an LR for each NT measurement within that CRL. The LR is multiplied by an a priori, or background, risk factor based on maternal age (the risk of birth defects is documented to increase with the age of the mother) and gestational age to yield an adjusted risk specific to the patient (“patient-adjusted risk”). Again with respect to the NT measurement specifically, it is generally the case that the smaller the NT measurement, the lower the adjusted risk. Conversely, larger NT measurements generally equate to a higher adjusted risk.
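The multiplication described above can be illustrated with a minimal Python sketch. It is not the calculation used by any particular screening program: the function name, the numeric values, and the choice to multiply the NT and serum LRs together (which treats the two components as independent) are assumptions made purely for illustration. The sketch also assumes each LR is expressed so that a larger value means a higher risk, consistent with the statement that larger NT measurements yield a higher adjusted risk.

```python
from typing import Iterable


def patient_adjusted_risk(background_risk: float,
                          likelihood_ratios: Iterable[float]) -> float:
    """Multiply an a priori (background) risk by one or more LRs.

    background_risk: hypothetical a priori risk based on maternal and
        gestational age, expressed as a probability (e.g. 1/250 = 0.004).
    likelihood_ratios: hypothetical LRs for the screening components
        (e.g. one for NT, one for serum), assumed independent here.
    """
    if not 0.0 <= background_risk <= 1.0:
        raise ValueError("background_risk must be a probability in [0, 1]")
    risk = background_risk
    for lr in likelihood_ratios:
        if lr < 0.0:
            raise ValueError("likelihood ratios must be non-negative")
        risk *= lr
    # A direct product can exceed 1 for very large LRs; cap it so the
    # result remains a valid probability (an illustrative simplification).
    return min(risk, 1.0)


# Illustrative values only: a 1-in-250 background risk combined with a
# small-NT LR of 0.5 and then with a large-NT LR of 4.0.
low = patient_adjusted_risk(1 / 250, [0.5])
high = patient_adjusted_risk(1 / 250, [4.0])
print(f"small NT: {low:.4f}, large NT: {high:.4f}")
```

With these made-up inputs, the smaller LR halves the background risk and the larger LR quadruples it, which mirrors the relationship between NT size and adjusted risk described in the text.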
In laboratory medicine, it is routine to subject data to constant quality control. Clinicians understand that every time they send a blood test to the lab, the lab is regularly double-checking its results against known controls. This has not been the case with NT measurements, which in the United States are generally neither certified nor quality controlled. Unfortunately, this translates into patients being falsely reassured that their pregnancies are normal when they may actually be having a baby with Down syndrome and/or one or more other serious birth defects or, conversely, being falsely warned that the pregnancy is abnormal when, in fact, this is not the case.
As the use of NT screening has increased, human-introduced variability has had a profound negative impact upon overall performance, creating considerable controversy over how to account for biases in these measurements, e.g., individual vs. national curves.
What is more, these differences in measurements are costly. It has been shown that a 3% improvement in NT screening performance in the United States could produce an annual cost savings of $100 million for combined first-trimester screening.
To address this problem, the Fetal Medicine Foundation (“FMF”) established a training process requiring a written test, submission of images for grading, and continuing, periodic recertification. This has been adopted without much opposition in much of the world outside of the United States. In the United States, the Society for Maternal-Fetal Medicine set up the Nuchal Translucency Quality Review program (“NTQR”) to oversee training and review of U.S. physicians. NTQR provides an internet-based program for educating NT screening professionals, testing their proficiency, and reviewing the quality of their work. NTQR monitors the quality of participating members, and members whose quality is found to be deficient as compared to the prevailing standard are identified for remediation. Nevertheless, while remediation may improve future NT screening results, it does not resolve the problem that patients are currently being given false data from screenings known to be of deficient quality.