The present invention relates generally to the field of active learning, and more particularly to predicting patients' risk of certain diseases using electronic health records (EHR) along with active learning with relative similarities.
Active learning has been extensively studied and successfully applied to solve real-world problems. Active learning methods typically pose absolute questions. The key idea of active learning is that a machine learning algorithm can achieve higher accuracy with fewer training labels if it is allowed to choose the data from which it learns. Active learning extends machine learning by allowing a learning algorithm to query an oracle (e.g., a human annotator) that already understands the problem for the labels of currently unlabeled instances. Though enormous progress has been made in the active learning field in recent years, traditional active learning assumes that the questions prompted by a machine can be confidently answered by human experts, which may not be the case in many real-world applications.
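The query loop described above can be illustrated with a minimal sketch. The source does not specify a query strategy, so the sketch assumes a common one, uncertainty sampling, applied to a toy one-dimensional threshold classifier. The names `fit_threshold`, `active_learn`, and `hypothetical_oracle`, as well as the true boundary of 0.6, are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple

Labeled = List[Tuple[float, int]]


def fit_threshold(labeled: Labeled) -> float:
    """Toy model: place the decision threshold midway between the largest
    labeled negative and the smallest labeled positive instance."""
    negatives = [x for x, y in labeled if y == 0]
    positives = [x for x, y in labeled if y == 1]
    low = max(negatives) if negatives else 0.0
    high = min(positives) if positives else 1.0
    return (low + high) / 2.0


def active_learn(pool: List[float],
                 oracle: Callable[[float], int],
                 budget: int) -> Tuple[float, Labeled]:
    """Pool-based active learning with uncertainty sampling (an assumed
    strategy): each round, ask the oracle an absolute question about the
    unlabeled instance closest to the current decision threshold."""
    unlabeled = list(pool)
    labeled: Labeled = []
    for _ in range(budget):
        if not unlabeled:
            break
        threshold = fit_threshold(labeled)
        # The instance nearest the threshold is the one the model is least sure of.
        query = min(unlabeled, key=lambda x: abs(x - threshold))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))
    return fit_threshold(labeled), labeled


def hypothetical_oracle(x: float) -> int:
    """Stand-in for a human annotator who always answers correctly.
    The true boundary of 0.6 is an assumption made for this example."""
    return 1 if x >= 0.6 else 0


pool = [i / 100 for i in range(101)]  # 101 unlabeled instances in [0, 1]
threshold, labeled = active_learn(pool, hypothetical_oracle, budget=10)
```

In this sketch the learner labels only 10 of the 101 pool instances, yet the learned threshold separates the whole pool correctly. That outcome relies on the oracle answering every absolute question correctly, which is the assumption that may fail in practice.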
In a medical application where the goal is to predict patients' risk of certain diseases using EHR, the absolute questions may take the form of, “Will this patient suffer from Alzheimer's later in his/her life?” or, “Are these two patients similar or not?” Because they demand extensive domain knowledge, such absolute questions are usually difficult to answer, even for experienced medical experts. In addition, the performance of active learning methods that rely on absolute questions is less stable, since incorrect answers occur often and can be detrimental to the risk prediction model.