A trend in the information, communication, and automation industries is toward increasingly distributed solutions. Recent examples of this trend are the proposal for networked sensors and the suggestion that large groups of such data sources could form large distributed information systems called “networks of data sources.” In the article “Next Century Challenges: Mobile Networking for Smart Dust” (published in MobiComm 1999), authors Kahn et al. discuss an example of a distributed network of data sources in the form of a network of sensors.
The primary idea of a network of data sources is that individual data sources, or perhaps small groups of data sources, would be connected to computer networks, using standard communications protocols, such as the Internet Protocol (IP). Other devices on the network would then be able to access the data provided by the data sources, either individually or in aggregate depending on the application. In the most ambitious proposals, wireless networks of data sources define their topologies dynamically as they are deployed, and continuously redefine their links and routing schemes to account for new and failing nodes and optimal power management. Rudimentary forms of networks of data sources are already being used in some industrial process control systems, and future applications for networks of data sources are widely predicted in many domains.
One difficulty in the current art regarding networks of data sources is how to manage data from the data sources. This can be contrasted with the current art in the world of information technology (IT), in which data management techniques are rich. In the IT world, techniques such as data aggregation, type-based data access, query optimization, transaction management, data filtering, data mining, and many others are well established tools. Thus, for data that is resident in memory, on disk, or in well-structured distributed databases, data management is a well-understood art. However, networks of data sources present difficulties for data management, as the data sent by a potential multitude of data sources can be overwhelming to the network or data management system used. Moreover, it is likely that data sources in the network continually provide information that will change or be updated frequently.
Prototype networks of data sources use protocols from the computer network world, based on TCP and UDP, which were developed for communication rather than data management. Industrial systems use different but similar protocols, such as Modbus. These protocols offer mechanisms to deliver messages in a point-to-point manner or to broadcast messages to all members of a network, but they are often inadequate for efficient data management in networks of data sources. As a result, programmers and system managers must often create complicated, extensive data management programs and systems from scratch in order to implement the more sophisticated data management techniques that have long been available in IT but by and large have not been available for networks of data sources.
One approach to limited data management, such as described for the field of industrial process control in U.S. Pat. No. 5,301,339 issued to Boasson, is to provide a system for communication among subsystems in which requests are fulfilled based on the type of the data. In such systems, although data of a certain type may be requested, there is no mechanism for constraining the desired values of the requested data. Such a mechanism is critical to implement traditional database queries such as, for example, asking for all temperature sensors reading temperatures higher than 100 degrees F. The system described by Boasson lacks any database view of the data to the data processing subsystems, any relational or object schema describing the data, and any facility to request data in traditional query language. Thus, the system of Boasson provides only limited data management capabilities far below the capabilities of traditional databases.
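The distinction drawn above between a type-based request and a value-constrained query can be illustrated with a minimal Python sketch. The `Reading` record, the sample readings, and both function names are hypothetical and chosen only for illustration; they are not taken from Boasson or from any other cited system.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """Hypothetical record produced by one data source."""
    source_id: str
    data_type: str
    value: float


# Illustrative sample data only; temperatures are in degrees F.
READINGS = [
    Reading("s1", "temperature", 72.0),
    Reading("s2", "temperature", 104.5),
    Reading("s3", "pressure", 14.7),
    Reading("s4", "temperature", 131.0),
]


def request_by_type(readings, data_type):
    """Type-based access: every reading of the requested type is returned."""
    return [r for r in readings if r.data_type == data_type]


def query_with_constraint(readings, data_type, predicate):
    """Value-constrained access: the type must match and the value must satisfy the predicate."""
    return [r for r in readings if r.data_type == data_type and predicate(r.value)]


by_type = request_by_type(READINGS, "temperature")
hot = query_with_constraint(READINGS, "temperature", lambda v: v > 100.0)
```

With type-based access alone, the requester receives all three temperature readings and must filter them itself; the constrained form returns only the readings above 100 degrees F.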
Another approach to data management, such as the real-time communication system for industrial processes that is described in U.S. Pat. No. 5,093,782 issued to Muraski et al., is one in which applications view remote data sources as parts of a database.
In these types of systems, subsystems that access and use remote data sources, such as controllers, maintain a database of these remote values. The systems are primarily concerned with updating these databases. One drawback of these schemes is that each controller must maintain its own database, which must have predefined references to each data source. For each subsystem that needs to access data, the address of all relevant data sources must be hard-coded. The database can also be a centralized failure point. A further drawback of these systems is that data must be continuously polled from data sources to maintain the correctness of the local database. This continuous polling from various data sources puts a heavy traffic burden on the network.
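The traffic burden of continuous polling can be estimated with simple arithmetic. The following Python sketch assumes, purely for illustration, that each poll consists of one request message and one response message; the function name and the numbers used are hypothetical and do not describe any particular system.

```python
def polling_messages(num_controllers, sources_per_controller, polls_per_second, seconds):
    """Count messages generated when every controller polls each of its sources.

    Assumption: one poll = one request + one response, hence the factor of 2.
    """
    return 2 * num_controllers * sources_per_controller * polls_per_second * seconds


# Hypothetical figures: 10 controllers, each polling 50 sources once per second for one minute.
total = polling_messages(num_controllers=10, sources_per_controller=50,
                         polls_per_second=1, seconds=60)
```

Under these assumed figures the network carries 60,000 messages per minute regardless of whether any of the polled values has changed.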
Although some potentially useful techniques to add richer data management tools to networks of data sources can be found in the distributed database community, the conventional work in distributed databases has been inadequate or incomplete for use with networks of data sources.
Some of this distributed database work has been designed to allow the structured contents of databases to be spread over a computer network and then accessed as a single database. Unfortunately, the bulk of this work, such as that described in the textbook “Principles of Distributed Database Systems” (M. Ozsu, P. Valduriez, Prentice-Hall, Inc., Upper Saddle River, N.J., 1999), incorporated herein by reference, has concentrated on the problems associated with dividing up a known database (called database fragmentation) in such a way as to optimize later accesses to this data. These techniques are not particularly useful in the network of data sources world, because each data source provides only the data that it produces and thus there is no possibility to choose how the data is to be fragmented.
Another distributed database approach, such as described in U.S. Pat. No. 4,635,189 issued to Kendall and in U.S. Pat. No. 5,179,660 issued to Devany et al., provides mechanisms for dividing traditional database queries among distributed databases across which the data has not been carefully fragmented ahead of time. However, these types of systems are not suitable for networks of data sources for several reasons. First, because this approach pertains to more traditional distributed databases, it simply assumes that the network location of any particular piece of data is known ahead of time. Such an assumption does not hold in many dynamic applications of networks of data sources, in which data sources may be added at any time. For example, consider a network of data sources, formed by multiple automobiles each equipped with a data source and a wireless link, that is used for traffic monitoring. If a new automobile having a data source enters the highway and thus joins the network, the data that it provides desirably should be included in the query results, but the conventional systems of Kendall and Devany et al. provide no mechanism for dynamically including in the query results the data from data sources newly added to the network. Second, the systems of Kendall and Devany et al. assume that significant processing capabilities are resident at each network node. This assumption does not hold for many networks of data sources, where a data source, an analog-to-digital converter, and a network interface may constitute the entire network node. A similar approach is taken by Bonnet et al. (see P. Bonnet, J. Gehrke, T. Mayr, P. Seshadri, “Query Processing in a Device Database System,” Cornell Technical Report TR99-1775, October 1999), who attempt to address networks of data sources specifically, but whose approach has the same limitations.
Accordingly, it is seen that a system and methods providing more efficient, sophisticated query capabilities or techniques are desirable for useful data management of networks of data sources. Further, it is desirable to have a more economical way of providing a network of data sources with data management that requires neither high processing capability at each network node nor the implementation of complicated programming in order to achieve the data management techniques found in traditional database query languages. It is also desirable that such a system for networks of data sources be capable of richer data queries and communication without placing a heavy traffic burden on the network.
The above discussed problems and disadvantages are overcome by the present invention. The present invention allows traditional information technology data management techniques to be applied directly within networks of data sources. More specifically, the present invention allows a program, running on a device logically connected to a network which also logically connects the networked data sources, to issue a traditional database query onto the network and to receive back from the network the result of that query as it applies to the data produced by those data sources.
According to a specific embodiment, the invention provides a method for information management of a distributed data sources network database that includes multiple nodes. The nodes include a querying node and multiple data sources. The method includes steps of providing a schema for the distributed data sources network database, entering a query in a database language at the querying node in the network, decomposing the query into at least one network message, and transmitting the network message only to data sources relevant to the query. The method further includes steps of receiving the network message at the data sources relevant to the query, sending a reply message to the network message when the query is met, and providing a query result in the database language at the querying node from the reply message.
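The sequence of steps recited above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the `SCHEMA` layout, the restricted SQL-like query form, the dictionary message format, and the names `DataSource`, `decompose`, and `run_query` are all hypothetical, and the network is simulated by a plain list in which relevance is decided by matching table names.

```python
import operator
import re
from dataclasses import dataclass

# Hypothetical schema: one table per kind of data source.
SCHEMA = {"temperature": ["source_id", "value"], "pressure": ["source_id", "value"]}
OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}


@dataclass
class DataSource:
    source_id: str
    table: str
    value: float
    received: int = 0  # number of network messages this node has received

    def handle(self, message):
        """Reply only when the local value meets the query condition."""
        self.received += 1
        if OPS[message["op"]](self.value, message["operand"]):
            return {"source_id": self.source_id, "value": self.value}
        return None


def decompose(query):
    """Turn a restricted query string into one network message (assumed format)."""
    m = re.fullmatch(r"SELECT \* FROM (\w+) WHERE value ([<>=]) (\d+(?:\.\d+)?)",
                     query.strip())
    if m is None or m.group(1) not in SCHEMA:
        raise ValueError("unsupported query: " + query)
    table, op, operand = m.groups()
    return {"table": table, "op": op, "operand": float(operand)}


def run_query(query, network):
    """Querying node: decompose, send only to relevant sources, collect replies."""
    message = decompose(query)
    relevant = [s for s in network if s.table == message["table"]]
    replies = [s.handle(message) for s in relevant]
    return [r for r in replies if r is not None]


# Illustrative network of four simulated data sources.
network = [
    DataSource("s1", "temperature", 72.0),
    DataSource("s2", "temperature", 104.5),
    DataSource("s3", "pressure", 14.7),
    DataSource("s4", "temperature", 131.0),
]
rows = run_query("SELECT * FROM temperature WHERE value > 100", network)
```

In this sketch the pressure source never receives the message, and the temperature source reading 72.0 receives it but sends no reply, so only the two rows that satisfy the condition travel back to the querying node.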
According to another specific embodiment, the present invention provides a system for information management of a distributed data sources network database. The system includes a network, multiple data sources coupled to the network, and at least one querying node coupled to the network. The data sources are capable of providing information according to a schema for the distributed data sources network database. The querying node is capable of receiving a query in a database language and decomposing the query into at least one network message that is transmitted over the network only to data sources relevant to the query. The data sources relevant to the query send a reply message over the network in response to the network message when the query is met, and the querying node provides a query result in the database language from the reply message.
These and other specific embodiments of the present invention and the features and advantages of the invention will be described in more detail below with reference to the accompanying drawings.