1. Field of the Invention
The present invention relates to a data management technology, and in particular, the present invention relates to a method and system for caching semantic data in terminology services.
2. Description of Related Art
A terminology service returns specific contents from a terminology knowledge set in a specific professional domain as required by a user. Semantics-based applications, for example semantic query, need a terminology server to provide semantic support, i.e., the terminology server provides definitions of concepts and definitions of the relationships between the concepts to a client. Standard terminologies provide a common understanding of the domain knowledge and facilitate efficient information processing and knowledge sharing across different parties; thus, a terminology server typically holds a large amount of data. The client retrieves terminology data from the terminology server in accordance with a request regarding a terminology in an application, in order to support execution of semantic applications. To achieve better performance and scalability of semantic applications, it is necessary to use a local client memory device to cache the terminology data, so that the terminology data received from the terminology server is available for possible later use.
However, the cache space for keeping terminology data in the client memory device is limited, and it is impossible for a user to cache all terminology data held by the terminology server in the client memory device. To respond rapidly to requests for terminology data access and to minimize the resource consumption caused by network interaction with the terminology server, it is essential to determine proper cache policies, which mainly include: what data is to be retrieved from the terminology server for caching in response to a terminology request, and how that data is to be cached.
Traditional cache technologies, for example page caching and tuple caching, cannot satisfy the requirements of accessing terminology services. Page caching and tuple caching use statically defined physical units with fixed lengths, and the requested data items are specified directly using a list of physical page or tuple identifiers. In a cache for terminology services, by contrast, it is necessary to dynamically manage data items that are defined by terminology-based semantic relationships, because terminology data comprises the following components: concepts, a concept being an abstract and universal idea or notion of an object, wherein a term is a representation of a concept and a concept can be indicated by different terms; properties of a concept, which are attributes, characteristics, parameters, and the like of the concept per se, for example a term for naming the concept and other attributes of the concept; and relationships, which represent correlations among concepts. Usually, after a user requests information about a concept in the terminology data, the user will then request information about another concept that is correlated to the first concept through a dependency relationship. It is therefore hard to reflect the semantic relationships of the terminology data when caching terminology data in accordance with the traditional caching technology, and it is impossible to respond effectively to requests from semantic applications; the consequences include a decreased hit rate for the cached data in the system and a high level of redundancy in the data cached at the client.
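The three components of terminology data named above (concepts, properties, and relationships) can be pictured with a minimal Python sketch. The class names, field names, and sample values below are hypothetical and chosen only for illustration; they are not taken from any particular terminology server.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Concept:
    """A concept, the terms that can indicate it, and its other properties."""
    concept_id: str                      # hypothetical identifier scheme
    terms: List[str] = field(default_factory=list)
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class Relationship:
    """A named correlation r(source, target) between two concepts."""
    name: str
    source: str  # concept_id of the first concept
    target: str  # concept_id of the second concept


# Illustrative data only: one concept indicated by two different terms.
aspirin = Concept(
    concept_id="C1",
    terms=["aspirin", "acetylsalicylic acid"],
    properties={"preferred_term": "aspirin"},
)
analgesic = Concept(concept_id="C2", terms=["analgesic"])

# A relationship correlating the two concepts.
is_a = Relationship("is a kind of", aspirin.concept_id, analgesic.concept_id)
```

Unlike a fixed-length page or tuple, the unit a client would want to cache here is a concept together with whichever relationships lead to the concepts likely to be requested next.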
Further, the relationships among terminology concepts include non-transitive relationships and transitive relationships. A transitive relationship may be defined as follows: for any concepts A, B, and C and a particular relationship r, if r(A, B) and r(B, C) are known and r(A, C) also holds, then the relationship r is a transitive relationship; for example, “is a kind of” and “is a part of” represent transitive relationships between correlated concepts. A non-transitive relationship may be defined as follows: for any concepts A, B, and C and a particular relationship r, if r(A, B) and r(B, C) are known and r(A, C) does not hold, then the relationship r is a non-transitive relationship; for example, “interacts” represents a non-transitive relationship between correlated concepts.
It should be understood that a transitive relationship can be expressed specifically as: if concept A → concept B and concept B → concept C, then concept A → concept C. The current caching mechanisms cannot embody the transitive relationship present among concepts that are not directly correlated, i.e., the relationship between concept A and concept C. It is therefore impossible to respond effectively to a request involving a transitive relationship, for example when the client needs to retrieve information about all concepts correlated to a given concept through a transitive relationship.
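As an illustration of the kind of request described above, the following minimal Python sketch collects every concept correlated to a given concept through a transitive relationship, starting from directly stated pairs only. The function name, the edge-list representation, and the sample concept labels are assumptions made for this example; the sketch does not represent any existing caching mechanism.

```python
from collections import defaultdict, deque


def reachable_concepts(pairs, start):
    """Return all concepts reachable from `start` by following r one or more times.

    `pairs` is an iterable of (a, b) tuples, each meaning r(a, b) is directly stated.
    """
    successors = defaultdict(set)
    for a, b in pairs:
        successors[a].add(b)

    seen = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in successors[current]:
            if nxt not in seen:  # visit each concept once, which also guards against cycles
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Directly stated "is a kind of" pairs (hypothetical labels).
IS_A_KIND_OF = [("A", "B"), ("B", "C"), ("C", "D"), ("X", "Y")]

result = reachable_concepts(IS_A_KIND_OF, "A")
# result contains B, C, and D; C and D are correlated to A only through transitivity.
```

A cache that stores only the directly stated pairs it happened to receive cannot answer such a request locally unless every intermediate pair is also present, which is the limitation described above.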
Thus, as required by terminology services, a new solution is needed for caching, at the client, terminology data from the terminology server.