The present embodiments relate to machine learning and application of machine-learnt algorithms. In particular, computer assisted medical decision support incorporates a medical ontology.
Ontologies and machine learning are two technologies for domain-specific knowledge extraction that are actively used in knowledge-based systems. Ontologies result from a knowledge elicitation process in which knowledge engineers capture the knowledge of an expert; data is not necessarily involved in this process. Machine learning, by contrast, is data-driven. The search for patterns is usually automatic and may not involve substantial interaction with the expert. The aim of the two technologies is generally the same: the extraction of useful knowledge.
By establishing an explicit formal specification of the concepts in a particular domain and the relations among them, ontologies provide the basis for reusing and integrating valuable domain knowledge within applications. Medical ontologies provide information associated with one or more diseases and numerous medically relevant concepts (e.g., laboratory and diagnostic procedures; physiologic, biologic, genetic, molecular functions; organs and body parts; diseases, symptoms, and medical findings; and others). Different relationships between concepts are reflected by the medical ontology. For example, different names for the same disease are provided in an “IS A” type relationship. Related morphologies (e.g., inflammation) and body location are other types of relationships in the medical ontology. Medical ontologies may also contain various terms associated with a medical concept, each representing the same (or a similar) meaning for the concept.
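By way of illustration only, the typed relationships described above may be sketched as subject-relation-object triples. The relation names (`IS_A`, `ASSOCIATED_MORPHOLOGY`, `FINDING_SITE`), the concept entries, and the helper functions `build_index` and `ancestors` are assumptions chosen for this sketch and are not taken from any particular medical ontology.

```python
from collections import defaultdict

# Hypothetical toy ontology; entries are illustrative, not an actual ontology extract.
TRIPLES = [
    ("viral hepatitis", "IS_A", "hepatitis"),
    ("hepatitis", "IS_A", "liver disease"),
    ("hepatitis", "ASSOCIATED_MORPHOLOGY", "inflammation"),
    ("hepatitis", "FINDING_SITE", "liver"),
]


def build_index(triples):
    """Map each subject to its relations, and each relation to a set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, relation, obj in triples:
        index[subject][relation].add(obj)
    return index


def ancestors(index, concept):
    """Collect every concept reachable by following IS_A edges upward."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in index[current]["IS_A"]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


index = build_index(TRIPLES)
```

In this sketch, `ancestors(index, "viral hepatitis")` follows only the `IS_A` relation, while other relation types such as `FINDING_SITE` are read directly from the index.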
Medical ontologies provide information for computer assisted medical decision support. Computer assisted medical decision support systems may be deterministic. For example, a rule-based system alerts clinicians to drug-drug interaction. The rules are determined manually from the medical ontology.
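A deterministic, rule-based alert of the kind mentioned above may be sketched as a lookup over manually authored rules. The drug names, rule messages, and the function `interaction_alerts` are hypothetical placeholders introduced for this sketch; they do not represent clinical data or any particular system.

```python
from itertools import combinations

# Hypothetical hand-written rule table; placeholders only, not clinical data.
INTERACTION_RULES = {
    frozenset({"drug_a", "drug_b"}): "illustrative interaction rule 1",
    frozenset({"drug_b", "drug_c"}): "illustrative interaction rule 2",
}


def interaction_alerts(prescribed):
    """Return ((drug, drug), message) for every prescribed pair matching a rule."""
    alerts = []
    # Sort and de-duplicate so each unordered pair is checked exactly once.
    for first, second in combinations(sorted(set(prescribed)), 2):
        message = INTERACTION_RULES.get(frozenset({first, second}))
        if message is not None:
            alerts.append(((first, second), message))
    return alerts
```

Because each rule is keyed by an unordered pair, the order in which the drugs are prescribed does not affect which alerts are raised.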
Machine learning algorithms are applied in order to extract useful knowledge in different problem domains by searching for interesting patterns (dependencies) in large volumes of data. The principle of instance (patient) similarity is the basis for many machine learning algorithms. The main assumption in supervised, unsupervised and semi-supervised machine learning algorithms is that the instances of the same class (cluster) are more similar to each other than the instances of different classes (clusters).
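The similarity assumption may be illustrated with a minimal nearest-neighbor sketch, in which a new instance receives the class label of the most similar known instance. The choice of Euclidean distance and the function names are assumptions made for illustration; many other similarity measures may be used.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_neighbor_label(query, instances):
    """Return the label of the instance closest to the query.

    instances: list of (feature_vector, label) pairs.
    """
    _, label = min(instances, key=lambda item: euclidean(query, item[0]))
    return label


# Hypothetical toy instances (patients) with two numeric features each.
known = [((1.0, 1.0), "A"), ((1.2, 0.9), "A"), ((5.0, 5.0), "B")]
```

With these toy values, a query near (5, 5) is assigned class "B" and a query near (1, 1) is assigned class "A", reflecting the assumption that instances of the same class lie closer together.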
Traditional machine learning algorithms are not able to incorporate background domain knowledge, but instead work with a sequence of instances, where each instance is represented by a single feature (attribute) vector describing the instance.
Existing uses of ontologies in data mining focus on homogeneously represented cases and rely on taxonomic distance and ontologies with “is_a” relations. These techniques are not particularly suitable for mining complex medical data. The so-called knowledge-intensive similarity measures focus on creating a customised distance function for each particular feature, rather than on the total aggregated distance (similarity). One-level feature grouping has also been proposed, either building a separate model for each semantic group (ensemble learning) or aggregating partial distances calculated within each group.
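The aggregation of partial distances over a one-level feature grouping may be sketched as follows. The group names, feature names, use of Euclidean distance within each group, and the weighted-mean aggregation are all assumptions made for this sketch; the proposals referred to above may differ in each of these choices.

```python
import math

# Hypothetical one-level grouping of feature names into semantic groups.
FEATURE_GROUPS = {
    "laboratory": ["lab_1", "lab_2"],
    "symptoms": ["symptom_1", "symptom_2", "symptom_3"],
}


def partial_distance(x, y, features):
    """Euclidean distance restricted to the features of one group."""
    return math.sqrt(sum((x[f] - y[f]) ** 2 for f in features))


def aggregated_distance(x, y, groups=FEATURE_GROUPS, weights=None):
    """Weighted mean of the per-group partial distances (equal weights by default)."""
    weights = weights or {name: 1.0 for name in groups}
    total = sum(
        weights[name] * partial_distance(x, y, features)
        for name, features in groups.items()
    )
    return total / sum(weights.values())
```

Computing the distance per group before aggregating keeps a group with many features from dominating the total merely because of its size, and the optional weights allow one semantic group to be emphasised over another.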