The background description includes information that may be useful in understanding the present invention. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed invention, or that any publication specifically or implicitly referenced is prior art.
With the advent of the numerous “-omics” sciences (for example, genomics, proteomics, glycomics, immunomics, or brainomics), larger amounts of data are available than ever before, making analysis and even detection of relevant information overwhelming. For example, the amount of genomic data when sequenced to a statistically significant degree can easily exceed several terabytes of information, rendering any meaningful non-automated analysis impossible. To overcome this problem, automated systems can be used to identify anomalies by comparing data with reference thresholds. While such automated systems will identify outliers such as false positives and false negatives, the identified outliers are, in most cases, still too numerous for one expert to review. For example, within genomics, one mutation may be an indicator of a disease-causing genotype or it may be a silent mutation, which is relatively common.
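By way of illustration only, the comparison of data with reference thresholds mentioned above can be sketched as follows. The names `ReferenceRange` and `flag_anomalies`, and the analyte names and ranges used in the test data, are hypothetical and are not drawn from any referenced system; the sketch assumes each measurement has a simple lower and upper reference bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceRange:
    """Hypothetical lower and upper reference bounds for one measurement."""
    low: float
    high: float


def flag_anomalies(measurements, reference):
    """Return the names of measurements that fall outside their reference range.

    Measurements without a reference entry are skipped rather than flagged,
    which is an assumption made for this sketch.
    """
    flagged = []
    for name, value in measurements.items():
        ref = reference.get(name)
        if ref is None:
            continue  # no threshold available, so nothing to compare against
        if value < ref.low or value > ref.high:
            flagged.append(name)
    return flagged
```

A filter of this kind flags every out-of-range value, significant or not, which is why its output can still be too large for a single expert to review.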
To reduce the quantity of relevant information, an at least partially automated system can focus on single diseases or disorders to arrive at a dataset manageable for clinicians. For example, moles on the skin can be benign or malignant and can be imaged by a patient as described in U.S. patent application publication 2012/0008838. Here, a user registers and provides images of their skin to a system that then automatically analyzes the images for characteristics of melanoma. A confidence value is generated, and if the value exceeds 50%, then the user can receive a recommendation to consult a physician or a referral to one or more specialists in the user's geographic location. While such a system provides a relatively robust analysis and expert follow-up, various drawbacks still remain. Most significantly, the diagnostic scope of such systems is limited to specific diseases, and within such diseases to cases where the most determinative characteristics are already known.
In another example of partially automated analysis (see, e.g., U.S. patent application publication 2004/0122790), a dataset is analyzed via a computer-assisted data operating algorithm to generate a result dataset identifying a feature of interest. Changes in the result dataset are then monitored based on input from a human expert. In one embodiment, the algorithm includes accessing image data derived from a medical imaging system, and supplemental data from an integrated knowledge base including clinical and non-clinical data from a plurality of controllable and prescribable resources. Although this method improves data analysis by integrating data from multiple sources, human input, a limiting resource, is still required to refine the analytical algorithms. Still further, and as already noted above, such systems are typically restricted to a limited set of conditions and findings.
Automated analysis is also known for non-imaging uses, as discussed, for example, in U.S. patent application publication 2008/0091471. The '471 system assesses the immunological status of individuals in a patient population by establishing a database comprising a plurality of records of information each representative of the immune status of an individual in the population, processing the information in the database to find trends or patterns relating to the immune status of individuals in said patient population, and using the trends or patterns as part of a health care related decision-making process. Correlations are then generated between variables or fields in the database, and for each correlation a hypothesis is generated that may explain that correlation. Additional steps can include automatically refuting or supporting each hypothesis, or stating that there is insufficient data to analyze it, by further processing of the database, and reporting the correlations, their associated hypotheses, and the determination to a user. While the '471 analysis advantageously improves discovery of patterns in relatively large datasets, various difficulties still remain. One difficulty is that the analysis is generally limited to immunologic analysis. Another difficulty is that the correlations and hypotheses are simply reported to a user; the system lacks a component for matching each report to a specific user who is qualified to take action in a timely manner.
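The step of generating correlations between fields of a database can be sketched, under stated assumptions, as a pairwise scan over numeric fields. The '471 publication is not quoted here as using any particular statistic; the Pearson coefficient, the function names `pearson` and `correlated_field_pairs`, and the cutoff `min_abs_r` are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant field carries no correlation signal
    return cov / sqrt(var_x * var_y)


def correlated_field_pairs(records, min_abs_r=0.8):
    """Return (field_a, field_b, r) for every field pair with |r| >= min_abs_r.

    `records` is a non-empty list of dicts that share the same numeric fields.
    The 0.8 cutoff is an arbitrary illustrative value.
    """
    fields = sorted(records[0].keys())
    results = []
    for a, b in combinations(fields, 2):
        r = pearson([rec[a] for rec in records], [rec[b] for rec in records])
        if abs(r) >= min_abs_r:
            results.append((a, b, r))
    return results
```

Each returned pair would still need a hypothesis and, as noted above, a qualified recipient; the scan by itself addresses neither.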
Likewise, a method of assessing an individual's genotype correlations was disclosed in U.S. patent application publication 2010/0293130 that generates a genomic profile for an individual from a sample, determines the individual's genotype correlations with phenotypes by comparing the individual's genomic profile to a current database of human genotype correlations with phenotypes, and reports the results. Although this method provides an individual or a health care manager with information such as the individual's susceptibility to various diseases, this method lacks a discovery component, where the individual's genetic information becomes part of a basis for discovery of new traits. Moreover, single known genotypes may be silent or have a distinct phenotype, depending on other factors present in the patient. Such otherwise silent changes are not detected by the '130 system.
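The comparison of a genomic profile against a database of known genotype correlations can be pictured, in a deliberately simplified form, as a keyed lookup. The function name `known_phenotype_associations`, the dictionary layout, and the placeholder variant and phenotype labels used in the test data are hypothetical and do not reflect the data model of the '130 publication.

```python
def known_phenotype_associations(genomic_profile, correlation_db):
    """Look up each (variant, genotype) pair of a profile in a correlation table.

    genomic_profile: dict mapping a variant identifier to the observed genotype.
    correlation_db: dict mapping (variant, genotype) to a phenotype description.
    Returns a list of (variant, genotype, phenotype) for pairs found in the table.
    """
    hits = []
    for variant, genotype in genomic_profile.items():
        phenotype = correlation_db.get((variant, genotype))
        if phenotype is not None:
            hits.append((variant, genotype, phenotype))
    return hits
```

Because each variant is evaluated in isolation and only against entries already in the table, a lookup of this kind cannot surface new associations or genotypes whose effect depends on other factors in the patient.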
All publications herein are incorporated by reference to the same extent as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference. Where a definition or use of a term in an incorporated reference is inconsistent or contrary to the definition of that term provided herein, the definition of that term provided herein applies and the definition of that term in the reference does not apply.
Thus, there is still a need for systems and methods that automatically validate previously detected anomalies as significant and connect experts with the validated findings for further action or analysis. Moreover, there is also a need for systems and methods that maximize the utility of experts, a limited resource, by filtering out false positives, false negatives, and outliers.