1. Field of the Invention
This invention relates to the field of data management and data processing and describes a method of integrating application or service components, and sources and consumers of data records, normally over data or computer networks. The sources of data might be software applications, hardware devices, sensors, data streaming into files or databases, transaction records, or transaction logs. The data might relate to a wide range of industries, for example stock market data in financial services; service health, status or usage records in telecommunications; or plant operating status in industrial automation. In particular, the invention relates to ways and means of managing, viewing and processing streams of data records flowing between the elements making up information processing systems, and applications thereof. The concept of a relational data stream management system (RDSMS) is described, and the invention is a specific method for performing data processing using a specific distributed RDSMS approach.
2. Description of the Related Art
With the advent of the Internet, there are many new ways for designers of computer information systems to connect, integrate and manage the components of the information systems and computer applications.
A number of university research projects (see web link http://www-db.stanford.edu/sdt/) focus mainly on extending databases to allow for stream processing, treating the relations of relational database management systems (RDBMS) as infinite tables.
The work published so far focuses either on extending relational databases (or other databases) to add stream capabilities (such as the STREAM project at Stanford University, http://www-db.stanford.edu/stream/, which is not distributed and does not support the manageability or plug-in capabilities described below), or on devising ways of improving query performance and scheduling and addressing the theoretical resource management challenges (predicting how much processing can be performed within given memory and other computing resources). There are also some papers that examine, from a mainly theoretical perspective, a few monitoring-style applications.
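To make the idea of treating a relation as an infinite table concrete, the following minimal sketch applies a relational-style selection and a sliding-window aggregate to an unbounded record stream. It is an illustration only and is not taken from the cited projects; the names trade_stream, select_where and moving_average, the record fields, and the window size are all hypothetical.

```python
from collections import deque
from itertools import count, islice
from typing import Callable, Dict, Iterator

Record = Dict[str, object]


def trade_stream() -> Iterator[Record]:
    """Hypothetical unbounded source: yields one record per tick and never ends."""
    for i in count():
        yield {"symbol": "ACME" if i % 2 == 0 else "INIT", "price": 100 + i}


def select_where(stream: Iterator[Record],
                 predicate: Callable[[Record], bool]) -> Iterator[Record]:
    """Relational selection (a WHERE clause) evaluated record by record."""
    return (r for r in stream if predicate(r))


def moving_average(stream: Iterator[Record], field: str,
                   window: int) -> Iterator[float]:
    """Aggregate over a sliding window of the most recent `window` records.

    A window is needed because an aggregate over an infinite table would
    otherwise never produce a result.
    """
    buf = deque(maxlen=window)
    for r in stream:
        buf.append(r[field])
        yield sum(buf) / len(buf)


# A continuous query: results are produced as records arrive.
acme = select_where(trade_stream(), lambda r: r["symbol"] == "ACME")
# The stream never ends, so only the first four results are taken here.
averages = list(islice(moving_average(acme, "price", 3), 4))
```

The query is lazy: no record is read from the source until a result is requested, which is what allows the same relational operators to run over a stream that has no end.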
This invention differs from the existing published work in a number of important ways. First, the focus here is on an invention, method or means for managing a distributed collection of relational stream processing nodes that work together as a single system to create a complete Distributed Data Stream Management System (“DDSMS”). This DDSMS operates as a single, manageable, extensible infrastructure, processing and managing multiple streams of records along with multiple views of those record streams, including their routing across the network of stream processor nodes. It differs from other systems described by providing a novel combination of facilities. These include an SQL interface (SQL is supported by most relational database systems today) and operation as a single system managed and configured from a central configuration server. The single system itself comprises a dynamically extensible set of interoperating stream processing nodes, each of which supports a plug-in capability for dynamically extending the capabilities of the stream processing engines. Each node has not only input and output interfaces to support streams, but also a control interface and a configuration interface. These support dynamic external control and management of nodes and allow nodes to control the behavior of one another and interoperate with one another, with the goal of behaving and appearing like a seamlessly integrated single complete system. The system manages multiple sources and destinations for the streams, and covers specific business and technical applications thereof. Rather than concentrating on the design, method or means for a specific relational stream processing node, this invention focuses on how to design a whole DDSMS comprising a set of such or similar nodes with specific capabilities that are configured and work together as a seamless complete system.
In comparison with systems such as Aurora (see the web reference link given earlier), this approach differs in its treatment of the distributed nodes as a seamless single system with a central configuration and management service, its support for plug-in extensibility to allow specialization of the system for specific application domains, and its inclusion of control and configuration interfaces for each processing node.
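The arrangement described above, with nodes that each expose input, output, control and configuration interfaces plus a plug-in capability, all managed from a central configuration server, can be sketched as follows. This is a minimal in-process illustration under stated assumptions, not the claimed implementation: the class names StreamNode and ConfigServer, the method names, the "pause" and "resume" commands, and the topology format are hypothetical, and the SQL interface and network transport are omitted.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, object]
# A plug-in operator transforms a record, or returns None to drop it.
Operator = Callable[[Record], Optional[Record]]


class StreamNode:
    """Hypothetical stream processing node with four interfaces and plug-ins."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.plugins: Dict[str, Operator] = {}
        self.pipeline: List[str] = []
        self.downstream: List["StreamNode"] = []
        self.running = True
        self.delivered: List[Record] = []  # records leaving the system here

    def register_plugin(self, name: str, op: Operator) -> None:
        """Plug-in capability: extend the node with a new operator at run time."""
        self.plugins[name] = op

    def configure(self, pipeline: List[str],
                  downstream: List["StreamNode"]) -> None:
        """Configuration interface: set the operator chain and the routing."""
        unknown = [p for p in pipeline if p not in self.plugins]
        if unknown:
            raise KeyError(f"{self.name}: unknown plug-ins {unknown}")
        self.pipeline = list(pipeline)
        self.downstream = list(downstream)

    def control(self, command: str) -> None:
        """Control interface: assumed commands are 'pause' and 'resume'."""
        if command not in ("pause", "resume"):
            raise ValueError(f"unsupported command: {command}")
        self.running = command == "resume"

    def on_record(self, record: Record) -> None:
        """Input interface: run one record through the configured pipeline."""
        if not self.running:
            return
        current: Optional[Record] = record
        for name in self.pipeline:
            current = self.plugins[name](current)
            if current is None:
                return  # dropped by a plug-in
        self.emit(current)

    def emit(self, record: Record) -> None:
        """Output interface: forward downstream, or deliver if this is a sink."""
        if not self.downstream:
            self.delivered.append(record)
        for node in self.downstream:
            node.on_record(record)


class ConfigServer:
    """Hypothetical central server that configures and controls every node."""

    def __init__(self) -> None:
        self.nodes: Dict[str, StreamNode] = {}

    def add_node(self, node: StreamNode) -> None:
        self.nodes[node.name] = node

    def deploy(self, topology: Dict[str, Tuple[List[str], List[str]]]) -> None:
        """Apply {node name: (pipeline, downstream node names)} to all nodes."""
        for name, (pipeline, targets) in topology.items():
            self.nodes[name].configure(pipeline,
                                       [self.nodes[t] for t in targets])

    def broadcast(self, command: str) -> None:
        for node in self.nodes.values():
            node.control(command)


server = ConfigServer()
edge, core = StreamNode("edge"), StreamNode("core")
edge.register_plugin("only_errors",
                     lambda r: r if r["status"] == "error" else None)
core.register_plugin("tag", lambda r: {**r, "seen_by": "core"})
server.add_node(edge)
server.add_node(core)
server.deploy({"edge": (["only_errors"], ["core"]), "core": (["tag"], [])})

edge.on_record({"id": 1, "status": "ok"})     # dropped by the edge plug-in
edge.on_record({"id": 2, "status": "error"})  # routed from edge to core
server.broadcast("pause")
edge.on_record({"id": 3, "status": "error"})  # ignored while paused
```

Keeping routing and operator selection in the configuration interface, rather than in the node code, is what lets the central server reshape the whole network without changing any individual node.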
Finally, the invention includes a short list of applications of this DDSMS which offer novel solutions to existing problems, and offer compelling business value and clear advantages over existing approaches and solutions.