Databases often contain different entries for similar data, so a query frequently fails to return relevant data. Take, for example, a medical records database, where the proliferation of medical terms is a major obstacle to sharing medical information among different stakeholders (e.g., hospitals, clinicians, pharmaceutical companies, etc.). Different clinicians within a hospital often use distinct terms to refer to the same diagnosis, while symptoms are often recorded in a patient's record at varying levels of granularity. For example, one clinician might describe a patient's diagnosis using the term “Pineoblastoma”, while another might use the synonymous term “PNET of Pineal Gland”. Consequently, a query for records containing “Pineoblastoma” typically returns only the record that includes “Pineoblastoma” and not the record that includes “PNET of Pineal Gland”. Likewise, a generic term such as “Brain Neoplasm” might be recorded instead of the more specific “Pineoblastoma” (the latter term is said to be a hyponym of the former). A query for records containing “Pineoblastoma” would therefore typically not return the record containing “Brain Neoplasm”, even though “Brain Neoplasm” encompasses “Pineoblastoma”.
As can be seen, the same data in a database can be represented using different terms, and a query usually returns only the records that exactly match the query terms, even though other records are also relevant to the query. Such an incomplete query result does not provide all of the relevant information to the user and can cause critical information to be missed.
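To make the failure concrete, the following is a minimal Python sketch of an exact-match lookup over three toy records that all concern the same diagnosis. The record ids, the `exact_term_query` helper, and the relevance labels are hypothetical illustrations, not part of any particular database system; a real system would typically match indexed fields rather than substrings of free text.

```python
# Hypothetical toy records; ids and contents are illustrative only.
RECORDS = {
    "rec-1": "Diagnosis: Pineoblastoma",
    "rec-2": "Diagnosis: PNET of Pineal Gland",  # synonym of Pineoblastoma
    "rec-3": "Diagnosis: Brain Neoplasm",        # hypernym of Pineoblastoma
}

# Hypothetical ground truth: every record is relevant to a Pineoblastoma query.
RELEVANT = {"rec-1", "rec-2", "rec-3"}


def exact_term_query(records, term):
    """Return ids of records whose text contains the query term verbatim (case-insensitive)."""
    needle = term.lower()
    return [rid for rid, text in records.items() if needle in text.lower()]


returned = exact_term_query(RECORDS, "Pineoblastoma")
missed = sorted(RELEVANT - set(returned))

print(returned)  # ['rec-1']
print(missed)    # ['rec-2', 'rec-3']
```

Because the lookup compares only surface strings, the synonym record and the more general record are both missed, even though a user searching for “Pineoblastoma” would likely want to see them.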