1. Field of the Invention
The present invention relates generally to the analysis and transformation of information in data processing environments and, more particularly, to systems and methodologies for analyzing, filtering, enhancing and processing streams of data, including live time-ordered data streams, stored time-ordered data streams, and unordered tables of data.
2. Description of the Background Art
Stream processing is concerned with filtering, enriching and enhancing continuous data streams. In order to detect opportunities and threats as early as possible, stream processing systems often need to analyze complex, fast-moving, heterogeneous streams of data in real time. In many cases, they also need to be able to rapidly process historical streams. For example, the ability to rapidly process and analyze historical streams of data is useful in refining trading strategies in the financial services sector. Stream processing software and systems need to run continuously, providing analytics, statistics, filtering, inference, deduction, connection, pattern matching, tracking and tracing.
Stream processing is complementary to databases, data warehousing, data mining, and search engines. The emphasis in stream processing is on continuous time-based information, and on continuous time-critical analysis. In stream processing, one is often looking to pinpoint a rare and important opportunity or threat, without drowning in the relentless flow of data that is typically received by most users and organizations.
Processing of real-time and historical data streams is a critical component of information technology solutions in many application areas, including the following:
Applications in which continuous live data streams from people, sensors, systems and networks are automatically monitored, filtered, analyzed, enhanced and enriched in real time.
Systems where continuous analytics on massive volumes of real-time data enable businesses and financial services organizations to intelligently discover and immediately respond to opportunities and threats, manage risk, ensure compliance, and deliver the best possible personalized customer experience at all times.
Solutions wherein continuous inference on semantic graphs allows intelligent real-time discovery of deep and important connections within live data streams.
Applications involving continuous real-time analysis of streams of data from networked wireless sensors, such as those that will enable a new intelligent, secure, optimized and highly energy-efficient infrastructure—a new generation of smart buildings, homes, factories, utilities, energy networks, IT systems, data centers, networks and other equipment, each with always-on energy saving, intrusion prevention, and predictive maintenance capabilities.
Solutions utilizing continuous real-time intelligent tracking and tracing (GPS, RFID) enabling powerful new location-based services to be launched, theft and counterfeiting to be reduced, and transportation and distribution to be optimized.
The following is a list of specific application areas that may involve stream processing, although it is by no means a complete list:
Business (Dynamic Pricing, Mobile Advertising, Customer Experience Management, Supply Chain, Logistics, Marketing Intelligence, Personalized Advertising, Risk Management, Compliance, Counterfeit Prevention).
Web and Telecommunications (Marketplaces, Online Games, Social Networks, Personalized Newsfeeds, Semantic Web, Virtual Worlds, Location-Based Services, Fraud Prevention).
Government (Homeland Security, Intelligence, Defense, Compliance).
Financial Services, Banking and Insurance (Algorithmic Trading, Risk Management, Compliance Tracking, Live Oversight, Fraud Prevention, News Services).
Infrastructure (Datacenter Monitoring, Network Monitoring, Telecommunications, Energy Grids, Traffic Management, Transportation and Distribution).
Healthcare (Patient Data, Pharmaceuticals, Patient Monitoring).
Machine-to-Machine Computing (Sensors, Smart Buildings, Remote Monitoring, Predictive Maintenance, Intrusion Prevention, Location Tracking, RFID, Process Control and Optimization, System and Network Monitoring, Environmental Monitoring).
High Performance Computing (Research, Supercomputing, Experimental Scientific Data, Bioinformatics, Modeling, Simulation).
Spreadsheets such as Microsoft Office Excel are powerful and widely used tools that help individuals and organizations analyze information to make more informed decisions. Using tools such as Excel, users can share and manage analysis tasks and insights gleaned from analysis with coworkers, customers, and partners worldwide. This makes spreadsheets an important productivity tool that offers a capability highly complementary to that provided by stream processing systems. Existing spreadsheet applications, however, are not integrated with stream processing systems.
IBM's System S Research Project is developing a prototype aimed at providing the “middleware” required to coordinate a wide range of distributed stream processing applications. The System S research project aims to produce a stream processing framework that is general-purpose. System S assumes that there are many user-developed stream processing components in use across the Internet, and the main goal of the System S Stream Processing Core is to provide middleware coordination software that can tie these numerous components together in useful ways. However, System S does not provide a solution that can be used to build a scalable stream processing architecture. In a recent paper (IBM InfoSphere Streams: Based on the IBM Research System S Stream Computing System, IBM Corporation, Mar. 2009), IBM outlines how the SPADE toolkit of System S can be used to build workflow assemblies without needing to understand the lower-level stream-specific operations. This middleware toolkit can be used through an Eclipse-based Workflow Development Tool Environment that includes an Integrated Development Environment (IDE) of the kind familiar to professional software developers, but which would be quite unfamiliar to spreadsheet developers. System S does not provide any kind of system or methodology for data processing that combines stream processing and spreadsheet computation, nor does it provide any means of developing stream processing systems, or controlling stream processing systems, from within a spreadsheet. System S also does not provide a system or methodology for data processing wherein the stream processing systems are cloud computing services, and where the stream processing can be carried out across one or more distinct cloud computing architectures, with all design, control and coordination of the multiple stream processing computations from within a spreadsheet.
What is needed is a solution that combines the functionality of a spreadsheet with stream processing systems in a seamless fashion. Such a solution is desirable in that it would provide users with an easy-to-use, yet powerful tool for processing data streams. Ideally, such a solution would provide connectivity to one or more stream processing systems spread across local and/or cloud-based architectures so as to allow for processing large volumes of data in parallel. The present invention provides a solution for these and other needs.