The present invention relates to techniques for filling important gaps in data, particularly in health-related data, including electronic health record (EHR) data and patient reported outcomes (PRO) data, but also in a wide range of data types, such as omic data, personal data, demographic data, and so forth. The invention also relates to platforms and approaches to aggregating data from contributing members of a community, compensating or motivating contributing members, and utilizing the data to aid both members and a larger community.
The value of big data has become generally accepted, creating an industry drive to aggregate different types of data in order to use machine learning and other tools to drive discovery. At scale, even weak data can be valuable in uncovering links between our health, our genomic information, and our daily habits, such as diet, exercise, drinking, smoking, hours of sleep, etc. It is therefore valuable to collect an individual's data from health institutions and through direct surveys and interviews. Additionally, it is now possible to collect implicit data through grocery store loyalty-program tracking of purchase records, credit card records, online search habits, etc.
In all cases, even though the data is valuable at an aggregate level, a significant amount of information is missing. EHR and PRO data are two examples. EHRs are typically filled out online by doctors while meeting with patients. There are at least two types of challenges with these records. First, EHRs are designed primarily as billing systems for medical institutions. They record patient information such as test results, symptoms, and doctors' observations and recommendations, including treatment prescriptions. They do not typically record outcomes after treatment or whether patients followed the prescribed treatment regimen. Second, doctors and other health professionals do not always use digital entry fields to record information. They often simply write information in unstructured comment fields. Additionally, the nomenclature and ontology used when inputting data are not standardized. Deciphering and interpreting these free-form comments is a major challenge.
In the case of PRO records, patient outcomes are often never reported in any system. There is currently no method for identifying even when a PRO should be sought. Additionally, PROs might be required periodically over a long period of time. Some diseases and corresponding treatments may last months or years, or may persist over a lifetime. The longitudinal information that would come from PROs is valuable in determining the efficacy of treatments and in stratifying diseases diagnosed on the basis of symptoms, in order to determine the underlying molecular basis for the disease.
To date, solutions for filling in the gaps and acquiring the missing information have all relied on humans to review the data, identify missing information, and manually attempt to fill the gaps. People can, for instance, read EHR records, including free-form fields, and then populate the digital fields accordingly. Additionally, surveyors call individuals or conduct in-person interviews to identify PROs, and then input the corresponding information into digital database fields. In all cases the solution requires human intervention, and it is therefore not scalable, in terms of either labor hours or labor dollars, to databases of tens of thousands to millions of individuals.
Beyond EHR and PRO data, many other data types may be extremely useful in piecing together an overall picture of the condition, state, or health of an individual and of groups of individuals, as well as for assessing possible pathways for maintaining health, avoiding or treating disease, recognizing and developing treatments, and so forth. But such pathways are hindered by missing or inaccurate data, and by the typical "siloing" of data by separate sources, institutions, and so forth, often with no ill intent and many times with patient or individual confidentiality in mind. At the same time, social media platforms may well share data, but they too typically silo the data for their own purposes, quite often without any control by the individuals involved and with little or no quality control.
There is a need for improved technologies for data gathering and quality control that enable scalable data aggregation and use. There is a particular need for such technologies that may enhance the control of contributing individuals over their data and motivate their participation, while protecting their privacy.