1. Field of the Invention
The present invention generally relates to relational databases. In particular, the present invention relates to a system for database-based semantic query answering in which individual data in the database is enriched using ontological knowledge, and to a corresponding method, so that database-based semantic query answering can be performed efficiently.
2. Description of the Related Art
With the widespread use of databases, efficiently retrieving the data required by a database user has become an urgent problem. In particular, as EMRs (Electronic Medical Records) are widely used, efficient retrieval of clinical documents according to a user's requirements has become an urgent need.
The IHE XDS (Cross Enterprise Document Sharing) provides an architecture for managing the sharing and retrieval of clinical documents between healthcare enterprises. In the XDS, the query of clinical documents is restricted to the metadata provided during the submission of the documents, such as the submission time and patient ID. However, many of the user's query requirements focus on the contents of the clinical documents, for example, finding patients with some clinical observations who are eligible for a clinical trial.
In general, keyword-based search is used for content-based retrieval of clinical documents. Compared with formal query languages, such as SQL (Structured Query Language) in database systems and the query languages of logic systems, keyword-based search has two drawbacks: (1) the keywords cannot fully capture the user's requirements; and (2) the completeness of the results cannot be guaranteed.
The Health Level 7 Clinical Document Architecture (CDA) defines a widely adopted standard for representing electronic medical records. In addition to the hierarchical structure of documents, CDA also specifies the semantic meaning of the document content to avoid ambiguity in information exchange. A key characteristic of CDA is the frequent use of ontological (terminological) references, such as references to SNOMED-CT (Systematized Nomenclature of Medicine-Clinical Terms), which is a well-known ontology in the healthcare domain. Fragments of CDA documents are associated with the ontological concepts defined in SNOMED-CT, whose expressivity is that of the Description Logic Language EL+ [1]. For example, the following CDA document fragment states an observation of finger injury for a patient:
<Observation>
    <code code="ASSERTION" codeSystem="2.16.840.1.113883.5.4"/>
    <value xsi:type="CD" code="52011008" codeSystem="2.16.840.1.113883.6.96" codeSystemName="SNOMED-CT" displayName="Finger injury"></value>
</Observation>
This document fragment includes an ontological reference to the concept of “Finger injury” originally defined in SNOMED-CT as follows:
Finger injury
    is-a            Disorder
    finding-site    Finger

The concept “Finger injury” is a sub-concept of “Disorder”, and each instance of “Finger injury” has a finding site that is an instance of “Finger”. In SNOMED-CT, the body structure “Finger” is also defined with respect to the role “partOf”: “Finger” is defined as part of “Hand” (that is, “Finger” partOf “Hand”), and “Hand” is defined as part of “Upper Limb”. Moreover, the role “partOf” is defined as transitive, which means that if a is part of b (a partOf b) and b is part of c (b partOf c), then a is part of c (a partOf c).
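The transitivity of “partOf” described above can be illustrated with a minimal Python sketch. The function name `transitive_closure` and the string labels are hypothetical and chosen only for illustration; they are not taken from SNOMED-CT or from any particular reasoner.

```python
def transitive_closure(pairs):
    """Return the transitive closure of a binary relation given as (a, b) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Whenever (a, b) and (b, d) are both present, (a, d) must be added.
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Asserted partOf facts from the example above (labels are illustrative).
part_of = {("Finger", "Hand"), ("Hand", "UpperLimb")}
part_of_closure = transitive_closure(part_of)
```

Under these assumptions, `part_of_closure` contains the two asserted pairs plus the derived pair `("Finger", "UpperLimb")`.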
The ontological references in CDA documents are the key enabler for semantic querying of the CDA documents, because the CDA documents can be interpreted as fact assertions about the ontology. For example, the above CDA fragment can be interpreted as a clinical document having an observation that is an instance of the concept “Finger injury”. These assertions can be represented as the following RDF (Resource Description Framework) triples:
ex:CDA_doc_1 rdf:type ex:CDADocument .
ex:CDA_doc_1 ex:hasObservation ex:obs_1 .
ex:obs_1 rdf:type sct:FingerInjury .
A sample query over CDA documents is shown below as an example; it asks which documents have observations of disorders whose finding site is “Finger”:
Q(x) :- ex:CDADocument(x), ex:hasObservation(x, y), sct:Disorder(y), sct:findingSite(y, z), sct:Finger(z).
The RDF triples of the above document contain only an assertion about “FingerInjury”, without any reference to “findingSite”. Therefore, direct data retrieval does not return the above CDA document as a result, and documents that only implicitly describe a “findingSite” at “Finger” cannot be retrieved.
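The failure of direct retrieval can be reproduced with a small Python sketch that matches each atom of the sample query against the asserted triples only. The function `answer_query` and the tuple-based triple store are assumptions made for illustration, not part of any described system.

```python
RDF_TYPE = "rdf:type"

# The three asserted triples of the example document.
triples = {
    ("ex:CDA_doc_1", RDF_TYPE, "ex:CDADocument"),
    ("ex:CDA_doc_1", "ex:hasObservation", "ex:obs_1"),
    ("ex:obs_1", RDF_TYPE, "sct:FingerInjury"),
}


def answer_query(triples):
    """Evaluate Q(x) by matching every query atom against the given triples only."""
    answers = set()
    for x, p1, c1 in triples:
        if p1 != RDF_TYPE or c1 != "ex:CDADocument":
            continue
        for s2, p2, y in triples:
            if s2 != x or p2 != "ex:hasObservation":
                continue
            # sct:Disorder(y) must be present as an explicit triple.
            if (y, RDF_TYPE, "sct:Disorder") not in triples:
                continue
            for s3, p3, z in triples:
                if (s3 == y and p3 == "sct:findingSite"
                        and (z, RDF_TYPE, "sct:Finger") in triples):
                    answers.add(x)
    return answers


direct_answers = answer_query(triples)
```

Here `direct_answers` is empty: no triple states that `ex:obs_1` is a `sct:Disorder` or has a `sct:findingSite`, so the query atoms cannot be matched without ontology reasoning.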
Query answering on healthcare data is critical. Currently, healthcare data is widely annotated with healthcare ontologies, such as SNOMED-CT and Gene ontologies, whose expressivity is that of the Description Logic Language EL+, and thus query answering on healthcare data should leverage ontology reasoning to provide sound and complete answers. In the example above, SNOMED-CT ontology reasoning can derive the assertion about the finding site for the above CDA document. However, this approach requires ontology reasoning on each CDA document. Because healthcare ontologies and data are often large-scale, the approach may generate a huge amount of reasoning results, which may degrade the performance of the query answering system and make it unable to handle queries efficiently.
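As a rough illustration of per-document reasoning and of the growth in stored results, the following Python sketch applies the two “Finger injury” axioms stated earlier to the example triples in a single pass. It is not a full EL+ reasoner; the names `SUBCLASS_OF`, `EXISTENTIAL`, and `materialize`, as well as the naming scheme for the anonymous finding-site individual, are hypothetical.

```python
RDF_TYPE = "rdf:type"

asserted = {
    ("ex:CDA_doc_1", RDF_TYPE, "ex:CDADocument"),
    ("ex:CDA_doc_1", "ex:hasObservation", "ex:obs_1"),
    ("ex:obs_1", RDF_TYPE, "sct:FingerInjury"),
}

# "Finger injury" is-a "Disorder".
SUBCLASS_OF = {"sct:FingerInjury": "sct:Disorder"}
# Each "Finger injury" has a finding site that is a "Finger".
EXISTENTIAL = {"sct:FingerInjury": ("sct:findingSite", "sct:Finger")}


def materialize(triples):
    """Single-pass application of the two axioms (illustrative, not complete)."""
    inferred = set(triples)
    for s, p, o in triples:
        if p != RDF_TYPE:
            continue
        if o in SUBCLASS_OF:
            inferred.add((s, RDF_TYPE, SUBCLASS_OF[o]))
        if o in EXISTENTIAL:
            role, filler = EXISTENTIAL[o]
            # Assumed naming scheme for the implied, unnamed finding site.
            witness = f"_:site_of_{s}"
            inferred.add((s, role, witness))
            inferred.add((witness, RDF_TYPE, filler))
    return inferred


materialized = materialize(asserted)
```

In this sketch the single document grows from three asserted triples to six after reasoning, including the derived finding-site assertion; repeating such derivations for every document is the source of the volume concern noted above.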
Similarly, in other fields with large-scale ontologies and data, a similar problem exists: producing complete reasoning results for the ontologies while still processing query answering efficiently.