1. Field of the Invention
The present invention relates to data handling in a data processing system. More particularly, the present invention relates to a system and method for metadata searching.
2. Description of Related Art
With the development of data warehouse applications, service-oriented architecture (SOA), and similar technologies, metadata in an information system, namely data about the data, is becoming increasingly important. In an enterprise scenario in particular, various kinds of metadata are stored and managed in different repositories. For example, WebSphere Service Registry and Repository (WSRR) stores metadata information on services; WebSphere Business Glossary (WBG) manages general glossaries and classification information for enterprises and IT users; and WebSphere DataStage is used to develop and store metadata for ETL (extract, transform, load) jobs. Without an effective metadata search method, metadata administrators and users may be overwhelmed by the large amount of metadata in the enterprise: it may be difficult to find important metadata information, and users may create redundant metadata. Therefore, a metadata search engine and a metadata search method are indispensable for successful metadata management.
There are various types of metadata resources, for example, ComplexTypeDefinition in an XSD document, Service descriptions in a WSDL document, BusinessTerm definitions, BusinessCategory definitions, and the like. Each metadata resource has attributes, e.g., a label and an annotation describing it. Different metadata resources may be associated with each other: for example, a BusinessTerm may classify a Service, and a BusinessCategory may contain a BusinessTerm. Such association information can be regarded as the structural information of the metadata.
If each metadata resource is treated as a node and each relationship between metadata resources as an edge, the metadata resources are linked into a metadata graph. Metadata search then becomes the problem of finding the relevant metadata resources in this graph. Because the number and variety of metadata resources are usually very large, it is very difficult for metadata administrators and users to find the desired metadata information in the graph. FIG. 1 schematically shows various metadata resources and their complex relationships in an exemplary enterprise scenario that includes a design and development phase and a runtime phase.
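To make the graph view concrete, the following is a minimal Python sketch, not part of the described system, of a metadata graph stored as labeled, directed edges. The class name MetadataGraph, its methods, and the resource names AddressTerm, CustomerCategory, and AddressService are hypothetical; only the relationship labels contain and classify come from the examples above.

```python
from collections import defaultdict


class MetadataGraph:
    """Hypothetical sketch: metadata resources as nodes, relationships as labeled, directed edges."""

    def __init__(self):
        # node id -> resource type (e.g., "Service", "BusinessTerm")
        self.types = {}
        # node id -> list of (relationship label, target node id)
        self.edges = defaultdict(list)

    def add_resource(self, node_id, resource_type):
        self.types[node_id] = resource_type

    def relate(self, source, label, target):
        # Record a directed relationship such as BusinessTerm --classify--> Service.
        self.edges[source].append((label, target))

    def neighbors(self, node_id):
        # .get avoids inserting empty entries for unknown nodes into the defaultdict.
        return list(self.edges.get(node_id, []))


# Hypothetical resources, invented for illustration only.
g = MetadataGraph()
g.add_resource("AddressTerm", "BusinessTerm")
g.add_resource("CustomerCategory", "BusinessCategory")
g.add_resource("AddressService", "Service")
g.relate("CustomerCategory", "contain", "AddressTerm")
g.relate("AddressTerm", "classify", "AddressService")
print(g.neighbors("AddressTerm"))  # [('classify', 'AddressService')]
```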
Several prior-art tools perform metadata search. The methods they adopt fall into two categories. The first category is keyword-based search engines. The descriptions of metadata resources can be published as HTML pages, so that an existing Web search engine that works on keywords in the contents of HTML pages can be used to perform metadata search. FIG. 2 shows an exemplary HTML page for the metadata resource BusinessTerm AddressNumber. IBM OmniFind Yahoo! Edition provides a crawler and a simple search engine for Web sites. OmniFind can be configured to acquire all the HTML pages for the metadata resources, and its search engine can then be used to search the metadata.
For example, a search using the keywords “street address”, which appear in the page of FIG. 2, may return the metadata resource BusinessTerm AddressNumber. Such a conventional keyword search does not require the user to know the structure of the metadata and is therefore simple to operate. However, because it uses only small text segments of the metadata, and ignores both the structural information within a metadata resource and the structural and semantic information among different metadata resources, it often fails to find useful or relevant metadata information.
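The keyword-based approach can be sketched as a simple inverted index over the text of each metadata resource's page. This is a minimal illustration under assumed inputs, not the behavior of OmniFind or any other product: the functions tokenize, build_index, and keyword_search and the page texts are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lower-case alphanumeric tokens; a deliberately simple tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each token to the set of resource names whose page text contains it."""
    index = defaultdict(set)
    for resource, text in pages.items():
        for token in tokenize(text):
            index[token].add(resource)
    return index


def keyword_search(index, query):
    """Return resources whose pages contain every query keyword (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical page texts, invented for illustration only.
pages = {
    "BusinessTerm AddressNumber": "The number portion of a street address.",
    "BusinessTerm PostalCode": "Code assigned to a postal delivery area.",
}
index = build_index(pages)
print(keyword_search(index, "street address"))  # {'BusinessTerm AddressNumber'}
```

Because only the tokens of each page are indexed, nothing in this index records that a BusinessTerm classifies a Service or that a BusinessCategory contains a BusinessTerm, which is the limitation described above.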
The second category is query-based search engines. If the user knows the structure of the metadata, the target metadata resources can be found by issuing a structure-based query. For example, if the Resource Description Framework (RDF) format is used to represent a metadata graph, a SPARQL query can be used to retrieve the metadata resources. FIG. 3 shows an exemplary metadata graph. To find, in this metadata graph, a Service that uses the ComplexTypeDefinition D and is classified by the BusinessTerm T, the following SPARQL query may be constructed and issued:
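PREFIX : <http://example.org/metadata#>  # hypothetical namespace for the relationship names
SELECT ?x WHERE { ?x :implement ?y . ?y :interfaceOperation ?z . ?z :interfaceMessageReference ?w . ?w :use ?v . ?v :name "D" . ?u :classify ?x . ?u :name "T" }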
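The following minimal Python sketch evaluates the same triple patterns by naive backtracking over an in-memory list of (subject, predicate, object) triples; it is a toy stand-in for a SPARQL engine, not a description of one. The node identifiers (svc1, iface1, and so on) and the matcher function match are hypothetical, and the triples are invented so that only svc1 satisfies the full path.

```python
def is_var(term):
    # Terms starting with "?" are query variables; everything else is a constant.
    return term.startswith("?")


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns (a basic graph pattern)."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    for triple in triples:
        new = dict(binding)
        ok = True
        for pat, val in zip(patterns[0], triple):
            if is_var(pat):
                # Bind the variable, or check consistency with an earlier binding.
                if new.setdefault(pat, val) != val:
                    ok = False
                    break
            elif pat != val:
                ok = False
                break
        if ok:
            yield from match(patterns[1:], triples, new)


# Hypothetical metadata graph, invented for illustration only.
triples = [
    ("svc1", "implement", "iface1"),
    ("iface1", "interfaceOperation", "op1"),
    ("op1", "interfaceMessageReference", "msg1"),
    ("msg1", "use", "ctd1"),
    ("ctd1", "name", "D"),
    ("term1", "classify", "svc1"),
    ("term1", "name", "T"),
    ("svc2", "implement", "iface2"),  # svc2 has no path to D
    ("term1", "classify", "svc2"),
]

# The structure-based query: the user must spell out every hop of the path.
query = [
    ("?x", "implement", "?y"),
    ("?y", "interfaceOperation", "?z"),
    ("?z", "interfaceMessageReference", "?w"),
    ("?w", "use", "?v"),
    ("?v", "name", "D"),
    ("?u", "classify", "?x"),
    ("?u", "name", "T"),
]
print(sorted({b["?x"] for b in match(query, triples)}))  # ['svc1']

# A query that skips the intermediate hops finds nothing.
shortcut = [("?x", "use", "?v"), ("?v", "name", "D"), ("?u", "classify", "?x")]
print(list(match(shortcut, triples)))  # []
```

The shortcut query, which omits the implement, interfaceOperation, and interfaceMessageReference hops, returns no bindings even though svc1 is indirectly related to D, illustrating why the user must know the exact path.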
Such a query-based search method has the disadvantage that the user must know and specify the exact path from one metadata resource to another. Since users of search engines generally have no clear idea of the structure of the data they intend to find, it is difficult for them to construct such a query.