1. Field of the Invention
Embodiments of the present invention generally relate to maintaining distributed knowledge bases. More specifically, embodiments of the present invention relate to a system and method for maintaining distributed knowledge bases, topologically distributed as overlay trees, by use of REST web services.
2. Description of the Related Art
There is a huge amount of data in enterprises that can be harvested into knowledge bases using well-defined semantic web technologies, including Resource Description Framework (“RDF”) and ontologies. In most cases, these knowledge bases will be managed by different organizations without any central coordination. These knowledge bases will also be highly dynamic, as each organization changes them frequently and independently. To address these issues, there is a need for a method to transform the dynamic and complex knowledge bases into reusable knowledge services that can be easily integrated with other enterprise applications.
As the Web is becoming a communication and collaboration platform, there is an acute need for an infrastructure to disseminate real-time events over the Web. However, such infrastructure is still seriously lacking certain capabilities, because conventional distributed event-based systems are not designed for the Web.
An architectural style that underlies the Web is REpresentational State Transfer (“REST”). A web service that is compatible with REST is said to be “RESTful.” Event-based distributed systems using REST services have been studied. Recursive REST service composition frameworks have been developed for supporting semantic-based proactive search in the enterprise. However, those systems and frameworks do not provide Virtual Knowledge Bases (“VKBs”) from which clients can select knowledge, nor do they synchronize updates to knowledge bases.
Systems to store large RDF triple stores in networked computers have been studied. However, these systems are designed to support SPARQL queries, not knowledge virtualization and synchronization for REST services.
As used herein, the term “MapReduce” refers to a programming model for scalable, parallel processing of large data sets, as known to persons of skill in the art. In particular, one implementation of MapReduce may be described at least in part by U.S. Pat. No. 7,650,331 to Dean et al., the entire content of which is hereby incorporated by reference. As used herein, the term “Hadoop” refers to an open-source implementation of MapReduce, as known to persons of skill in the art. Hadoop is a software system that supports data-intensive distributed applications. Hadoop enables applications to work with thousands of nodes and petabytes of data. However, MapReduce is a programming model, not a REST service framework. Also, MapReduce does not provide knowledge virtualization or synchronization for the map and reduce functions, because the data used by MapReduce is not controlled by it.
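By way of non-limiting illustration only, the map and reduce functions of the MapReduce programming model may be sketched as follows. The sketch runs in a single process and is not an implementation of Hadoop or of the system of Dean et al. The names map_fn, reduce_fn, run_mapreduce and TRIPLES, and the sample triples, are hypothetical and are introduced solely for this example. The driver operates only on the records handed to it, which illustrates why the model by itself does not virtualize or synchronize the underlying data.

```python
from collections import defaultdict


def map_fn(triple):
    """Map step: emit one (predicate, 1) pair for an RDF-style triple."""
    _subject, predicate, _obj = triple
    yield predicate, 1


def reduce_fn(key, values):
    """Reduce step: sum all values that were emitted for one key."""
    return key, sum(values)


def run_mapreduce(records, mapper, reducer):
    """Single-process driver (assumption: no distribution, no fault tolerance).

    It groups mapper output by key, then applies the reducer to each group.
    The records are supplied by the caller; the driver does not own them.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


# Hypothetical sample data: (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksFor", "acme"),
]

# Count how many triples use each predicate.
COUNTS = run_mapreduce(TRIPLES, map_fn, reduce_fn)
```

In this sketch, COUNTS maps each predicate to the number of triples that use it. If another party changes the source knowledge base after TRIPLES has been read, the computation has no means of learning of the change.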
Message-Passing Interface (“MPI”) is a message-passing library interface specification. MPI may provide portability and ease of use in a distributed memory communication environment in which the higher level routines and/or abstractions are built upon lower level message-passing routines. Scalability may be enhanced by providing vendors with a clearly defined base set of routines that they can implement efficiently, or for which they can provide hardware support. Interior nodes in a two-way overlay tree can be regarded as massively parallel computers, and MPI can be used for communication and synchronization between nodes. However, MPI does not follow the REST architectural style; therefore, at least some of the benefits of REST are not available.
The VKB can be managed by parallel and distributed database systems. However, such an approach introduces database coupling into a distributed system that is not based on REST services. Database coupling refers to the servers used by a distributed database system and, in particular, to the additional layer of communication between those servers that the distributed database system requires.
Some Peer to Peer (“P2P”) networks allow a node to join or leave the network at random. Some systems use structured P2P technologies to support distributed databases and RDF stores. However, these systems are based on Distributed Hash Table (“DHT”) techniques that partition the nodes and the data into the same key space, such that the partition of the data and the topology of the nodes are not completely independent. Furthermore, the P2P protocols are not based on REST.
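By way of non-limiting illustration only, the dependence between data placement and node topology in a DHT may be sketched as follows. The sketch assumes a simple successor rule on a hash ring, as used in some structured P2P designs, and is not the protocol of any particular system. The names key_of and owner, and the node and data identifiers, are hypothetical.

```python
import hashlib
from bisect import bisect_left
from typing import Sequence


def key_of(name: str) -> int:
    """Hash a node name or a data key into the one shared key space."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def owner(data_key: str, node_names: Sequence[str]) -> str:
    """Return the node responsible for data_key.

    Assumed rule: the owner is the first node whose position on the ring is
    at or after the position of the data key, wrapping around at the end.
    """
    ring = sorted((key_of(name), name) for name in node_names)
    positions = [position for position, _ in ring]
    index = bisect_left(positions, key_of(data_key)) % len(ring)
    return ring[index][1]


# Hypothetical nodes and one hypothetical data key.
NODES = ["node-a", "node-b", "node-c"]
FIRST_OWNER = owner("triple-42", NODES)

# When the owning node leaves, the data must move to another node.
REMAINING = [name for name in NODES if name != FIRST_OWNER]
SECOND_OWNER = owner("triple-42", REMAINING)
```

Because node names and data keys are hashed by the same function, the node that stores a given item is determined by which nodes are currently present. A change in the node topology therefore forces a change in the data partition.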
Thus, there is a need for a system and method to maintain distributed databases, for example, in enterprise systems and networks, using REST web services.