1. Field
This application relates to data searching and, more particularly, to a method and apparatus for document matching.
2. Description of the Related Art
Data communication networks may include various network elements, such as routers and switches, configured to facilitate the flow of data through the network. Networks may also include other network elements, such as computers and printers, configured to receive and transmit data over the network. Network elements may have different levels of intelligence, depending on the particular manner in which the network element is intended to operate on the network. For example, a network element may be relatively intelligent and configured to execute software to enable particular applications to be run on the network element, so that the network element can provide services on the network. Alternatively, the network element may be provided with less intelligence and configured to perform a particular service on the network. An example of a less intelligent network element may be a printer connected to the network and configured to provide print services on the network. Optionally, different network elements may work together to collectively provide services on the network.
As networks have developed, it has become possible to provide a greater variety of services on the networks. Network services are a class of services that are published, discovered, and executed, as well as operated and managed, all through the network. The services can be implemented as one or more network elements (for example, a printer to provide a printing service), as software running on the network elements (for example, a hotel reservation service), or as a combination of the two (using the above examples, this may be a service to reserve a hotel and a service to print out the confirmation of the reservation).
To enable a service provider to provide network services, the service provider needs to have a way to operate and manage the services, and a way for consumers to discover and execute these services. Conventionally, a matching system has been used to match service offerings with customer requests. For example, a service provider may describe the available services and store the descriptions of the service offerings in a database. At a later time, when a customer would like to obtain services or the service provider would like to change the service offerings, a request may be created and compared against the service offerings in the database to locate the available service offerings. One common way to describe network services is to use a markup language such as XML (eXtensible Markup Language). An XML document may be used to represent network services, applications, and network elements. One reason for the increasing use of XML is that XML provides a flexible manner to describe the services, yet is able to maintain a hierarchical structure. If the service offerings have been described using XML, when a user would like to obtain network services, the user will need to generate an XML document (request) describing the desired services. The request will then be matched against the available service offerings by comparing the XML request document against the XML documents representing the available services, which are stored in a database system. If a match is found, the service may be provided to the user.
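The request-against-database flow described above may be sketched as follows. This is a minimal illustration only, not the matching method of any particular system. The element names, the offering names, the in-memory dictionary standing in for the database, and the choice to compare canonical (C14N 2.0) forms are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical service offerings, as a provider might store them in a database.
# The element and attribute names are invented for this illustration.
OFFERINGS = {
    "print-basic": "<service type='print'><format>A4</format><color>no</color></service>",
    "hotel-reservation": "<service type='reservation'><kind>hotel</kind></service>",
}


def canonical(xml_text):
    """Return a canonical form so that quoting and indentation are ignored."""
    # ET.canonicalize requires Python 3.8 or later.
    return ET.canonicalize(xml_text, strip_text=True)


def find_matches(request_xml, offerings):
    """Return the names of stored offerings whose document equals the request."""
    wanted = canonical(request_xml)
    return [name for name, doc in offerings.items() if canonical(doc) == wanted]


# A request written with different quoting and indentation than the stored copy.
REQUEST = """
<service type="print">
    <format>A4</format>
    <color>no</color>
</service>
"""

print(find_matches(REQUEST, OFFERINGS))  # ['print-basic']
```

Comparing canonical forms is only one possible way to perform the comparison. It treats the order of child elements as significant, which may or may not be desired in a given system.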
Documents created using a markup language such as XML are defined not only by their content, but also by the way in which the document is configured. For example, the document may contain particular relationships between data elements. Thus, to find a matching document, the matching system must look not only for documents that have the same content, but also for documents that have the same structure. Stated another way, two XML documents may be considered to match each other only if they have the same data and the same structural relationships between the pieces of data contained in the document.
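The distinction between matching on content and matching on structure may be illustrated with a short sketch. The function name same_document and the sample documents are hypothetical. The sketch further assumes that child order is significant and that whitespace surrounding text is not, neither of which is specified above.

```python
import xml.etree.ElementTree as ET


def same_document(a, b):
    """Return True if two elements carry the same data in the same structure."""
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if (a.text or "").strip() != (b.text or "").strip():
        return False
    children_a, children_b = list(a), list(b)
    if len(children_a) != len(children_b):
        return False
    # Children are compared pairwise and in order, so nesting and order both count.
    return all(same_document(x, y) for x, y in zip(children_a, children_b))


# All three documents contain the same data items: "Grand" and "Lobby".
flat = ET.fromstring("<trip><hotel>Grand</hotel><printer>Lobby</printer></trip>")
spaced = ET.fromstring("<trip> <hotel>Grand</hotel> <printer>Lobby</printer> </trip>")
nested = ET.fromstring("<trip><hotel>Grand<printer>Lobby</printer></hotel></trip>")

print(same_document(flat, spaced))  # True: same data, same structure
print(same_document(flat, nested))  # False: same data, different structure
```

In the second comparison a keyword search for "Grand" and "Lobby" would find both documents, yet the documents do not match because the printer element is nested under the hotel element in one and is a sibling of it in the other.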
Several technologies have been developed to perform service matching, mainly in the area of service discovery. Examples of such technologies include UPnP (Universal Plug and Play), SLP (Service Location Protocol), Konark, Jini, Salutation, Bluetooth SDP (Service Discovery Protocol), and UDDI (Universal Description, Discovery and Integration), which is defined in the service-oriented architecture (SOA). A summary of the matching techniques used in these technologies is shown in Table 1.
TABLE 1

| Technology | Target | Using XML to describe service | Keyword or attribute based matching approaches |
|---|---|---|---|
| UPnP | Network devices | Using XML to describe device features and capabilities | Using SDP, matching is based on 4 attributes: Service type URI, unique service name (USN) URI, Expiration and location |
| Konark | Wireless ad hoc devices and software services | Using XML to enable services to explain their characteristics | Matching is based on some attributes included in two messages. Service discovery message: Path or keyword, Port. Advertisement message: Service name, Path, Type, URL and TTL |
| Jini | Network devices. Requires device to run Java or execute JVM | Not using XML | Lookup is based on Service ID, Type, Attributes |
| Bluetooth SDP | Specific to only Bluetooth devices | Not using XML | Searching by Service class, Attributes. Browsing |
| SLP | Solely for IP-based network | Not using XML | String-based querying for service attributes. Query operator (AND/OR) is more powerful than Jini and UPnP, which can be done only against equality |
| Salutation | Network devices | Not using XML | Capability exchanges. Similar to Jini lookup |
| UDDI | SOA for web service | WSDL (using XML to describe web services) | Keyword-based searching. Version 3 extended to support single-step complex queries and wildcard queries |

As shown in Table 1, existing matching approaches generally do not handle XML, and those that do are generally based on keywords or attributes rather than a document-based approach. Further, the services that are able to be described using these technologies are generally related to network devices rather than complex network services. Although these other technologies exist and some of them use XML to describe services, none of them uses a document-based matching approach. Accordingly, it would be advantageous to provide a method and apparatus for document matching.