Real-time data refers to any digital or analog information that should be processed and/or transmitted within a certain amount of time after that data is originally created. The time elapsed from the moment that the data is created until it is processed and/or transmitted is known as latency. The maximum latency allowable for any particular real-time application is application-dependent. Applications where the maximum latency is a strict requirement can be referred to as “hard” real-time applications, while applications where the maximum latency is not a strict requirement can be referred to as “soft” real-time applications. Soft real-time applications need only satisfy an application-dependent, often subjective, measure of “fast enough”. Non-real-time data is data that is not required to satisfy any particular latency requirement.
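The definitions of latency and maximum allowable latency above can be sketched as follows. This is a minimal illustration only: the names `Sample`, `latency_seconds` and `meets_deadline`, and the 50 ms limit, are hypothetical and are not taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """A data item with the times it was created and processed (seconds)."""
    created_at: float
    processed_at: float


def latency_seconds(sample: Sample) -> float:
    # Latency: time elapsed from creation until processing/transmission.
    return sample.processed_at - sample.created_at


def meets_deadline(sample: Sample, max_latency: float) -> bool:
    # For a hard real-time application a False result is a failure;
    # for a soft real-time application it is only a degradation.
    return latency_seconds(sample) <= max_latency


# Hypothetical sample processed 30 ms after creation, with a 50 ms limit.
sample = Sample(created_at=10.000, processed_at=10.030)
print(meets_deadline(sample, max_latency=0.050))
```

The same check applies to both kinds of application; what differs is how the application treats a missed limit.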
The term “data” may refer to hard real-time, soft real-time or non-real-time data. “Real-time data” may refer to hard real-time or soft real-time data.
Real-time data is typically generated by a physical process or a computer program external to the computer system that processes the data. For example, real-time data may include: information from an industrial process control system such as motor status, fluid tank level, valve position, conveyor speed and so on; prices, volumes, etc. for financial instruments such as stocks; user interface events such as an indication that a user has clicked on a button on a computer display; data entry by a human operator; and computer operating system status changes. Virtually any information that is changing over time can be treated as real-time data.
An originator of data may be described as a “data source”. For example, data may originate as a physical process, measured electrically, and converted to a digital representation, or data may originate in a digital representation. Generally, data is made available in a digital computer as a digital representation, following zero or more steps to convert the data into a digital representation. A data source may comprise all of the components and steps necessary to convert the data to a digital form accessible by a computer program.
Analogous to a data source is a “data sink”. A data sink consumes, or uses, data. Some examples of data sinks are: actuators in a process control system; trade processing software in a stock trading system; a user interface application; a database or other data storage system.
Many data sources are also data sinks. Accordingly, a data source may comprise a data source, a data sink, or both simultaneously. For example, when data is transmitted to a data source, the data source may also act as a data sink.
In computer applications, data is commonly managed by a “server”. The server can act as either a data source or a data sink, or both together, allowing “client” applications to interact with the data that the server manages.
Generally, a client application must initiate a connection with a server in order to interact with data. That connection can be “short-lived”, where the connection exists only for the duration of a single or few interactions with the data, or “long-lived”, where the connection persists for many interactions with the data, and possibly for the duration of the client application's lifetime. Long-lived connections are also referred to as “persistent” connections.
Data sources provide data in one or more “data formats” that define the digital representation of the data. The data format may conform to a published standard or be particular to the data source. Similarly, data sinks may require data in a published standard format or in a format particular to the data sink.
Data sources provide access to data through one or more “transmission protocols”. A transmission protocol specifies the mechanism by which data are transferred from a data source to a data sink. A transmission protocol may conform to a published standard or be particular to the data source. A data source may combine data formats and transmission protocols such that not all supported data formats can be transmitted via all supported transmission protocols. Generally, a “protocol” or “data protocol” refers to the combination of a particular data format transmitted via a particular transmission protocol.
A data sink must support at least one data protocol offered by a data source in order to use the data generated by the data source. Since a large number of data protocols exist, it is impractical for all data sources and data sinks to support all data protocols. As a result, client applications that make use of data are usually created only to support the most necessary protocols for their primary purpose. Similarly, data sources generally support only those protocols that are necessary for their primary purpose. So, for example, there is no way to directly connect a web browser that supports the HTTP protocol to a spreadsheet application that supports the DDE protocol.
A protocol conversion step must be performed to convert data from a protocol supported by a data source into a protocol supported by a data sink in order for the data sink to make use of the data offered by the data source. This conversion step can be performed by a “middleware” application. A primary purpose of a middleware application may be to facilitate communication between a data source and a data sink, usually by converting data from one protocol to another such that data sources and data sinks can interact indirectly when they share no protocol in common.
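The conversion step performed by a middleware application might be sketched as below. Both formats are assumptions made for illustration: a hypothetical source that emits `tag=value` text lines and a hypothetical sink that expects JSON objects. The sketch covers only the data format; a real middleware application would also bridge the two transmission protocols.

```python
import json


def parse_source_message(line: str) -> tuple[str, float]:
    # Hypothetical source format: "tag=value", e.g. "tank_level=73.5".
    tag, _, raw_value = line.partition("=")
    return tag.strip(), float(raw_value)


def format_for_sink(tag: str, value: float) -> str:
    # Hypothetical sink format: a JSON object with "tag" and "value" keys.
    return json.dumps({"tag": tag, "value": value})


def convert(line: str) -> str:
    # The middleware step: decode one format, re-encode in the other.
    tag, value = parse_source_message(line)
    return format_for_sink(tag, value)


print(convert("tank_level=73.5"))
```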
A data source may transfer data to a data sink using at least two methods:
1. On demand: the data source passively waits for a data sink to request some or all of the data available in the data source. When the data sink makes a request for data, the source responds with a result indicating the current state of the requested data. If the data sink needs to be informed of changes to the data, the data sink must repeat the request in order for the data source to respond with the updated data. This repeated request for the same data by the data sink is known as “polling”. A data sink may create either a short-lived connection to the data source for each new request, or a persistent connection over which many repeated requests are transmitted.
2. By subscription: the data sink creates a persistent connection to the data source, and subscribes to some or all of the data available from the data source. The data source transmits any changes to the data via the persistent connection as those changes occur. The data source will continue to send changes to the data until the data sink specifies otherwise or the connection is closed.
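The two transfer methods can be contrasted with a small in-memory sketch. The `DataSource` class and its `read`, `subscribe` and `write` methods are hypothetical names chosen for this illustration; no network connection is modelled.

```python
from typing import Callable, Dict, List


class DataSource:
    """Hypothetical in-memory data source supporting both transfer methods."""

    def __init__(self) -> None:
        self._values: Dict[str, float] = {}
        self._subscribers: Dict[str, List[Callable[[str, float], None]]] = {}

    def read(self, tag: str) -> float:
        # On demand: reply with the current state of the requested item.
        return self._values[tag]

    def subscribe(self, tag: str, callback: Callable[[str, float], None]) -> None:
        # By subscription: remember the sink so changes can be pushed to it.
        self._subscribers.setdefault(tag, []).append(callback)

    def write(self, tag: str, value: float) -> None:
        changed = self._values.get(tag) != value
        self._values[tag] = value
        if changed:
            for callback in self._subscribers.get(tag, []):
                callback(tag, value)


source = DataSource()
received: List[float] = []
source.subscribe("valve_position", lambda tag, value: received.append(value))

polled: List[float] = []
for value in (10.0, 10.0, 25.0):
    source.write("valve_position", value)
    polled.append(source.read("valve_position"))  # one poll per cycle

print(polled)    # every poll returns a value, changed or not
print(received)  # only changes are pushed to the subscriber
```

In this sketch the polling sink receives three values, one of them redundant, while the subscribing sink receives only the two actual changes.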
It is understood that data transfer methods such as shared memory, message queues and mailboxes are variations on either the demand or subscription methods. It is also understood that the terms data transfer, data propagation, or data transmission all refer to the movement of data within a system, and these terms may be used interchangeably, as they relate to the specific data transfer method. It is further understood that these methods are independent of the underlying transmission protocol.
Computer applications dealing with real-time data must be reliable, responsive and easily connected to their data sources. This has meant that real-time data processing applications have historically been created as stand-alone applications connected directly or indirectly to the data source. This stand-alone architecture has also allowed the applications to take full advantage of the graphical capabilities of the computer to provide rich dynamic visualization of the real-time data. By contrast, applications based on web browser technology have proven unsuitable in terms of data connectivity and graphical speed. The HTTP protocol is intended as a request-response communication method where each request-response pair requires a web client (typically a web browser) to open a new socket to a web server, perform the communication and then shut down the socket. This paradigm works well for communication that is infrequent and not particularly time-sensitive. The HTTP protocol further limits the types of transactions to data retrieval from the web server or data submission to the web server, but not both in the same transaction. Methodologies such as AJAX that are based on this model are expected to make relatively few transactions and tend to scale to higher speeds very poorly. The computational and networking costs of establishing and closing connections for each transaction act as a limit to the speed of such systems.
Consequently, widespread real-time data processing, as well as display in a web browser, has been unavailable. Some developer efforts have provided access to data driven displays using ActiveX components in a web browser, but these components are generally poorly supported by modern browsers and subject to limitations due to the security risks that they represent.
Efforts have been made to display changing data in a web browser using the built-in Javascript engine of the browser. This is generally achieved using a methodology called AJAX (Asynchronous Javascript and XML), where the web browser polls periodically for new data and then updates its display accordingly. This polling mechanism is highly inefficient, and suitable only for relatively small data sets or for relatively slow-moving data. Lowering the polling rate to conserve CPU or network bandwidth has the effect of raising data latency, which is unacceptable for real-time applications.
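The trade-off between polling rate and latency can be made concrete with a rough calculation. The function `polling_cost` is a hypothetical helper, and the model is deliberately simplified: it ignores network transit time and server processing time, and counts only the delay introduced by the polling interval itself.

```python
def polling_cost(interval_s: float) -> tuple[float, float]:
    """Return (requests per second, worst-case added latency in seconds)."""
    requests_per_second = 1.0 / interval_s
    # A change occurring just after a poll waits almost a full interval
    # before the next poll can observe it.
    worst_case_latency_s = interval_s
    return requests_per_second, worst_case_latency_s


for interval in (0.1, 1.0, 5.0):
    rate, latency = polling_cost(interval)
    print(f"interval={interval}s requests/s={rate:g} worst-case latency={latency}s")
```

Under these assumptions, every reduction in request rate is paid for with a proportional increase in worst-case latency.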
Efforts to improve on AJAX, through a mechanism called Streaming AJAX, take advantage of a side-effect of the browser page loading mechanism to cause a browser page to grow incrementally by adding Javascript commands to the page over time. Each Javascript command executes as it arrives, giving the impression of a continuous data stream. The web browser is effectively fooled into thinking that it is loading a very large web page over a slow network connection. This method has several drawbacks, including the fact that the web browser's memory and CPU usage can grow continuously over time due to the ever-larger page that is being transmitted. Holding an HTTP connection open to collect multiple asynchronous messages from a specially designed web server like this effectively makes the short-lived HTTP connection into a long-lived streaming connection. This allows much faster updates from the server to the client, as new data can be transmitted from the server asynchronously and does not require the client to open and close a connection for each new message. However, it does nothing to speed up the communication from the client to the server. Effectively it creates a fast uni-directional channel from the server to the client, while still retaining the negative performance characteristics of HTTP when communicating from the client to the server.
Both AJAX and streaming AJAX methods suffer from a lack of quality display options within the web browser. Web browsers are generally designed for the display of static pages and web “forms”, and do not offer high-speed or high quality graphic presentation options. Efforts to improve upon graphical display options have tended to be incompatible among web browsers, and generally very slow to execute.
All data transmission solutions based on built-in web browser capability are primarily targeted at receiving data in the web browser. The communication of data is uni-directional, in that the connection that receives data from a server cannot also be used to transmit data to the server. If the web browser needs to transmit data back to the server, it must do so by opening a new connection, transmitting an HTTP request, and then closing the connection again. Consequently, solutions such as Streaming AJAX are very slow to transmit data back to the data server, because of the large overheads and latencies incurred by having to emit a new HTTP request for every data transmission.
Some efforts at web-based data visualization attempt to improve the user experience by presenting slow-moving (high latency) data as if it were faster. This is achieved by displaying interpolated data in the web browser at higher frequency than the data is actually arriving. For example, a circular gauge representing a speedometer might receive the values 1 and 100, separated in time by 5 seconds. The web page could then draw the gauge dial 5 times per second, changing the value by approximately 4 each time. This would give the viewer an impression of a smoothly changing speed, even though the underlying data delivery contains no such information. That is, such a display of interpolated data can be entirely misleading to the viewer. This kind of interpolation obscures the true behaviour of the underlying data, and is usually unacceptable in real-time applications such as process control and stock-market trading.
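The speedometer example can be worked through numerically. The function name `interpolated_frames` is hypothetical, and linear interpolation is assumed since the text does not name a particular scheme.

```python
def interpolated_frames(start: float, end: float, seconds: float,
                        frames_per_second: int) -> list[float]:
    """Linearly interpolate the values drawn between two received samples."""
    steps = round(seconds * frames_per_second)
    step = (end - start) / steps
    return [start + step * i for i in range(1, steps + 1)]


# Two received values, 1 and 100, arriving 5 seconds apart, drawn at 5 Hz.
frames = interpolated_frames(1, 100, seconds=5, frames_per_second=5)
print(len(frames))                        # number of values drawn
print(round(frames[1] - frames[0], 2))    # change per drawn frame
```

Over the 5 seconds the display draws 25 values in steps of 99/25 = 3.96, yet only the last of them corresponds to a value that was actually received.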
Rich Internet Application (“RIA”) frameworks such as Adobe Flash™ and Microsoft Silverlight™ offer improved platforms for both data processing and graphical display within a web browser. These RIA frameworks also support direct TCP/IP communications within the RIA. Surprisingly, the combination of these features makes it possible to process and display real-time information in a web browser. This processing and display capability has not been translated into real-time data systems due to a perception in the software industry that RIAs are suited primarily to video, advertising and games.
A common alternative to HTTP is to provide a secondary communication socket for high-speed data alongside the HTTP communication channel. Effectively, the web client communicates via HTTP for the presentation information, and via a separate dedicated socket for high-speed bi-directional data communication. This solves the speed issue, but introduces other issues:
1. A separate communication socket requires a separate TCP port to be open on the server. This means another opening in the corporate firewall, which IT departments commonly resist.
2. Rich Internet Application (RIA) frameworks, such as Flash or Silverlight, commonly implement limits on socket communication that require yet another well-known port to be open to act as an access policy server. This introduces a further opening in the corporate firewall, further limiting the usefulness of the technique.
3. An RIA framework operating within a browser (e.g., Silverlight) may not implement its own SSL layer, relying instead on the browser's HTTPS implementation for encryption. In such a case, a dedicated socket implemented by an RIA will not be secure.
4. Dedicated sockets will not pass through web proxies.
The advent of high-speed or real-time data processing over the Internet has created a need for long-lived high-speed socket communication. This need has driven the RIA implementers to offer such sockets, but with the limitations described above. There remains an unmet need for long-lived bi-directional socket communication over HTTP or, more preferably, HTTPS to a web server.
The HTML5 specification includes a draft specification called WebSockets, which is intended to provide two-way communication between a client and server using an HTTP-mediated socket. Although WebSockets are not universally supported at this time, they provide the possibility of creating bi-directional connections through forward and reverse web proxies. The current invention enables real-time data connectivity through WebSockets, providing successful connectivity even in instances where the data source or end user is isolated from the Internet via proxy servers and is unable to make a connection via an arbitrary TCP/IP port. This significantly broadens the set of network topologies on which the current invention may be usefully implemented while allowing an additional potential level of security on the client networks.
The present invention is suitable to augment industrial Supervisory Control And Data Acquisition (“SCADA”) systems. SCADA systems comprise data collection hardware such as sensors and other devices, communication networks, central processing systems, and display units to allow plant operators and engineers to view the data in their industrial processes in real time. SCADA systems often comprise interfaces that support a supervisory level of coordination and control, such as uploading new recipes to a candy-making machine, changing global settings on a wind turbine, or acknowledging a high pressure alarm for a boiler.
SCADA systems have evolved over time. The first generation systems were “monolithic”, running on individual computers, connecting to field devices directly. The second generation allowed “distributed” processing, using multiple computers communicating with each other over a local-area network (“LAN”) and communicating with the field devices over proprietary control networks. The current, “networked”, generation uses personal computers and open standards such as TCP/IP and open protocols for local-area networking. Thus it is now possible to access SCADA systems and data from the Internet, although there are fundamental questions about security that are limiting the broad adoption of such capabilities.
Networked SCADA systems are designed using a client/server model. A server (device or software application) contains a collection of data items. These data items are made available to a client (device or software application) upon request by the client. The implicit assumption is that the server is the authoritative source of the data values, and has a-priori knowledge of which data values it will supply. The client is non-authoritative, and determines which data items it may use by querying the server. For clarity, the authoritative source of data has the responsibility to determine which data items it will contain and make available to its clients, and the data values held in the authoritative source are presumed to be correct and current. The client cannot determine which data items exist, and may only affect the values and/or properties of the data items defined within the server.
Importantly, the server is simultaneously the authoritative data source and also a listener for incoming connections from the client. In a networked system, this means that any client that uses the data must be able to initiate a connection to the server. In a SCADA system, this would mean, for example, that an operator workstation (acting as a client) must be able to make a connection to the SCADA server. This in turn requires that the SCADA server be reachable via the network from the client's location. In the case of an Internet-based or cloud-based system, this means that the SCADA server must be reachable from the Internet, posing an unacceptable security risk. For clarity, the terms “cloud” and “Internet” may be used interchangeably throughout this disclosure.
When the topic of cloud computing is raised among process control engineers, there are many justifiable concerns about security. SCADA and other manufacturing and control systems often support high-value production lines, where any interference or foul play could cost thousands or millions of dollars. Although recently some shop floors have begun to make their process data available to the rest of the company on corporate LANs, there is strong resistance to opening ports in plant firewalls to allow incoming connections from the Internet.
On the other hand, cloud systems generally require Internet access, typically using a web browser HMI (“Human Machine Interface”) or RIA or other kind of client to connect to a server on the process side. Until the present invention, this meant that a port had to be opened in the factory firewall to allow the web browser to connect. This is a security risk that few plant engineers are willing to take. The primary source of security exploits is firewalls permitting inbound connections. Unless these are removed, the plant is exposed to attack.
Due to the mission-critical nature of SCADA systems, engineers and managers responsible for industrial processes are reluctant to expose them directly to the Internet, preferring to run them behind secure firewalls to keep intruders and hackers at bay. Compounding the problem is that the architecture of most installed industrial systems was not developed with the Internet in mind.
Spreadsheet applications are computer software applications commonly used to analyze and generate information. A spreadsheet application (“SSAPP”) presents data as a multi-dimensional table of data, where each data item is presented as a cell within that table. Each cell can contain a value or a formula that references other cells to produce a computation from their values. A cell may also contain formatting that determines how that cell is displayed to a user, where the formatting could also be a formula that computes the formatting based on the result of a computation. Examples of a spreadsheet application include Microsoft Excel™ (Microsoft Corp.), Google Sheets™ (Google Inc.) and Open Office™ Calc (Apache Software Foundation). SSAPPs are typically used in scenarios where decision-making and analytical logic is encoded in the cells and then presented to a user. Users can then modify certain cell values to have others recalculate, thereby updating the content of the spreadsheet to reflect the newly entered information. SSAPPs are frequently used to produce dashboards of non-real-time information. In some environments, SSAPPs are used to collect real-time information to be used as part of the cell computations (e.g., stock trading applications).
Spreadsheet applications commonly provide mechanisms to share information among spreadsheets and other applications running on a single computer. For example, Excel has two inter-process communication mechanisms: DDE (Dynamic Data Exchange) and RTD (Real-Time Data). These communication mechanisms provide a means to analyse data originating from real-time data sources such as SCADA systems. However, these communication mechanisms have limitations. DDE, for example, is a simple data exchange protocol based on Windows messages. It allows Excel to share information in real time with other applications using a simple (tag, value) representation. DDE is a client/server architecture where the server transmits data to the client based on subscriptions configured by the client. In the context of data from SCADA and/or networked real-time systems, this architecture suffers from a number of limitations, both by design and, in particular, by implementation in Excel, for example:
1. DDE is not a networked protocol. Excel and the communicating application must run on the same computer.
2. DDE does not transmit time stamp, data quality or data type information. This limits the data's usefulness when interacting with time sensitive data, with data whose type is not known a-priori and with data where quality information is important.
3. DDE bindings in Excel consume the formula of the cell(s) into which the DDE data is bound. This exposes the DDE binding to accidental deletion. In addition, it makes it impossible to bind the same cell both to receive data and to transmit it. The act of modifying the cell value deletes the DDE binding.
4. Since a DDE binding consumes the cell formula, any Excel cell can be a participant in at most one DDE binding. That is, a cell cannot be updated from multiple sources.
5. When Excel is acting as a DDE server, modifying any DDE-bound cell in an Excel spreadsheet causes all DDE bindings on the sheet to re-transmit their values even if they have not changed. This may not be important when communicating with applications on the same computer, but it would generate unacceptable network bandwidth utilization if DDE were extended to a network.
6. DDE values are transmitted by subscription, meaning that for Excel to emit data to another application that application must have subscribed to a cell or cells in Excel. When the cell content changes (a data change event), Excel emits the new cell value. This means that configuration for the data communication must happen independently in two locations: in Excel for incoming data, and in the receiving application for outgoing data. Although Excel provides a means via scripting to “push” data via DDE to an application, this is challenging for a user to configure and produces “blocking” behaviour in Excel, which is highly undesirable.
7. There is no mechanism in Excel's implementation of DDE to reconnect to an application if the connection is lost. This produces a system that is not robust. The start-up order of Excel and the other application is crucial, resulting in a fragile system.
8. There is no mechanism in Excel to re-try a subscription for a DDE binding that did not exist at the time that the binding was originally attempted. When Excel starts it may attempt to subscribe to DDE items from another application. If the other application is not yet configured to satisfy that request, Excel will never re-try the request. The result is that some, but not all, of the DDE bindings in the Excel spreadsheet may be “dead”.
9. DDE contains no data inspection mechanism. A DDE client cannot determine which tags are available in the DDE server.
10. DDE can present an attack vector for malware, so there are risks associated with leaving DDE features enabled. Disabling DDE features can mitigate the risk, but doing so can break functionality and prevent data from updating.
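To illustrate what is lost with a bare (tag, value) representation, the sketch below contrasts a data point that carries a time stamp, quality and typed value with the pair that would remain. The `DataPoint` and `Quality` types and the `as_tag_value_pair` helper are hypothetical, and rendering the value as text is a simplification rather than a description of DDE's actual encoding.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class Quality(Enum):
    GOOD = "good"
    UNCERTAIN = "uncertain"
    BAD = "bad"


@dataclass(frozen=True)
class DataPoint:
    tag: str
    value: Union[int, float, str, bool]  # the type travels with the value
    timestamp: float                     # seconds since the epoch
    quality: Quality


def as_tag_value_pair(point: DataPoint) -> tuple[str, str]:
    # A bare (tag, value) pair: the time stamp, quality and type are
    # dropped, so the receiver cannot tell how old or how reliable it is.
    return point.tag, str(point.value)


point = DataPoint("boiler_pressure", 12.7, 1_700_000_000.0, Quality.UNCERTAIN)
print(as_tag_value_pair(point))
```

A receiver of the pair alone has no way to learn that the reading was marked uncertain at the source.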
More recently, Excel has been modified to support RTD, a communication protocol based on COM (Component Object Model). Networking is supported in RTD using DCOM (Distributed COM). RTD is a client/server architecture where the client makes “read” calls to the server when it wants new data. Data is not transmitted by subscription. It uses a (tag, value) representation of the data. Although RTD provides some advantages over DDE, it still suffers from several limitations, particularly as it is implemented in Excel, for example:
1. DCOM networking is difficult to configure. This commonly results in insecure network configuration in an effort to make the connection succeed.
2. DCOM networking does not support network proxies.
3. DCOM networking is a blocking protocol. If the application to which Excel is connected is slow to respond, Excel will freeze waiting for a response. If the network communication is slow, Excel will similarly freeze.
4. RTD does not support cell ranges. Only individual values can be transmitted via RTD. RTD can transmit more than one value in a single message, but this does not substitute for cell ranges.
5. RTD is uni-directional. Excel cannot transmit data via RTD. In order to retrieve data from Excel, another application must use an alternate protocol, namely DDE, to subscribe to that data. This means that although RTD can operate on a network it cannot transmit data bi-directionally.
6. RTD provides a notification mechanism to let Excel know that data changes have occurred. Excel must then re-read all data values that it is interested in, to determine which specific data values have changed. This may be practical for communication within the same computer or on a LAN, but it is impractical on limited bandwidth connections or on connections where bandwidth usage carries a significant cost.
7. RTD bindings in Excel consume the formula of the cell(s) into which the RTD data is bound. This exposes the RTD binding to accidental deletion. In addition, it makes it impossible to bind the same cell both to receive data and to transmit it. The act of modifying the cell value deletes the RTD binding.
8. Since an RTD binding consumes the cell formula, any Excel cell can be a participant in at most one RTD binding. That is, a cell cannot be updated from multiple sources.
9. RTD does not transmit time stamp or data quality. This limits the data's usefulness when interacting with time sensitive data and data where quality information is important.
10. RTD contains no data inspection mechanism. An RTD client cannot determine which data tags are available in the RTD server.
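The bandwidth concern raised for the notify-then-re-read pattern can be illustrated with a simple count of values transferred. Both function names and the figures of 1000 items and 3 changes are hypothetical, and the model counts values only, ignoring message framing and any batching.

```python
def values_transferred_reread(total_items: int, changed_items: int) -> int:
    # Notification followed by a full re-read: every item of interest is
    # read again whenever at least one of them has changed.
    return total_items if changed_items > 0 else 0


def values_transferred_delta(total_items: int, changed_items: int) -> int:
    # Change-only transfer: only the changed items cross the connection.
    return changed_items


total, changed = 1000, 3
print(values_transferred_reread(total, changed))
print(values_transferred_delta(total, changed))
```

Under these assumptions the full re-read moves 1000 values to convey 3 changes, which is the cost that becomes significant on limited or metered connections.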
Accordingly, there is a need for an improved network communication means that overcomes the limitations of both DDE and RTD, so that real-time data can be exchanged using a spreadsheet application, such as Excel, bidirectionally and in a more robust manner.
There are some existing attempts at sharing Excel data via a cloud service. They generally consist of an add-in to Excel that implements a web service interface such that information from the spreadsheet can be periodically published to an external server and polled from that server based on user interaction or a timer (e.g., iPushPull, https://www.ipushpull.com). This type of application fails in the same ways as an AJAX application: it demands trade-offs between latency, volume and server resources. Such a system is inappropriate for high-volume low-latency applications like control systems and financial trading and analysis. These applications are further limited by a dependence upon the cloud service for their operation.
There are other applications that attempt to mimic Excel with a web-based application that allows users to collaborate, with each user viewing a copy of the same spreadsheet in a browser window. The browsers transmit and receive updates to the sheet via polling using AJAX (e.g., Google Sheets, https://docs.google.com/spreadsheets). Although these applications include web-based interfaces and cloud storage, they rely on polling technology that is inappropriate for high-volume low-latency applications. In addition, these applications require a user to export the document to another format in order to see the data in a different SSAPP, such as Excel, thereby breaking the real-time linkage to the exported spreadsheet.
There are applications that attempt to bridge the gap between web-based spreadsheets (like Google Sheets) and desktop spreadsheet applications (like Excel) by providing automated file format translation through a shared network storage location (e.g., Syncplicity). These applications simply automate an import and export step that would otherwise be performed manually. They do not address real-time data sharing.
None of the existing technologies provides a mechanism for high-speed, low-latency bidirectional communication between Excel and a cloud service, between Excel and an industrial control system, or between two or more Excel worksheets.