This invention relates generally to systems having standby processors and in particular to computerized systems that have one or more standby processors for greater reliability in the event of a failure, and is more particularly directed toward computerized systems with standby processors that routinely update data relating to specific time intervals and have a need to preserve this data onto a redundant processor in the event that the currently active processor fails.
Some computerized systems, such as telecommunications systems, are required to provide high reliability service. Service reliability can be improved by having redundant processors in which one or more active processors are backed up by a standby (spare) processor.
Typically, any of the processors can serve in an active or standby role at any given time, but there is at least one on standby. If an active processor fails, or is deliberately removed from service (as, for example, when the circuit pack containing the active processor is pulled from the frame, interrupting its electrical connections to the remainder of the system), a standby processor immediately takes over and becomes an active processor. In duplexed systems, there is exactly one spare for each active processor. Exactly one of the pair of processors can be active at a time while the other acts as a spare.
Even with spare processors, service can be interrupted during the time it takes for the spare processor to come on-line. To minimize this time interval, the spare processor is typically initialized and running in a standby mode so that a cold start (i.e., processor boot) does not need to be performed during switchover to active status. The rapidity with which a spare processor can come on-line can also be affected by the need to preserve dynamic data. That is, the active processor may have dynamic (i.e., transient) data for in-progress activities, such as live phone calls in a telecommunications system, that can be lost during switchover. Thus, while service may resume quickly, in-progress activities may be prematurely terminated and have to be restarted. In the case of a telecommunication system, a phone connection may be lost and the subscriber would have to hang up and redial.
There are different approaches in the art for preventing the aforementioned problem. Typically, there is a communications link between the active processor and its spare. This can enable the spare processor to receive data on an ongoing basis during steady-state (normal or "sunny day") processing so that it may be better prepared to assume in-progress tasks should a switchover take place. This link can be used for a newly installed spare to request initialization data from its active counterpart. In theory, this data may enable the spare processor to take over activities from the active processor more gracefully. In practice, however, this method of routinely conveying data to the spare is often uneconomical in terms of CPU and I/O usage on the active processor for activities that generate large amounts of data, or where the data changes frequently and must constantly be updated.
A more economical solution for duplexed systems is to have, in addition to a communication link, "mirrored" RAM (random access memory) across the processors with specialized hardware support. Mirrored RAM provides RAM on each processor. When data is written into the mirrored RAM on one of the processors, the specialized hardware duplicates the write on the other processor's mirrored RAM. The active processor can simply write data into the mirrored RAM without any of the overhead of sending messages. Thus, only a small performance penalty is incurred.
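The write duplication described above can be pictured with a small software model. This is only an illustrative sketch: real mirrored RAM performs the duplication in hardware, and the class name `MirroredRam` and its attributes are hypothetical names chosen for this example.

```python
class MirroredRam:
    """Toy model of mirrored RAM: one buffer per processor, every write duplicated."""

    def __init__(self, size: int) -> None:
        self.local = bytearray(size)   # mirrored RAM on the active processor
        self.remote = bytearray(size)  # mirrored RAM on the standby processor

    def write(self, offset: int, data: bytes) -> None:
        end = offset + len(data)
        if offset < 0 or end > len(self.local):
            raise ValueError("write outside the mirrored region")
        self.local[offset:end] = data
        # Assumption: in a real system the mirroring hardware performs this
        # second write; no message is sent by the application software.
        self.remote[offset:end] = data


ram = MirroredRam(16)
ram.write(4, b"\x01\x02")
```

After the write, both buffers hold the same bytes, which is the property a newly active processor relies on after a switchover.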
It may be the case that only the active processor can read from or write to the mirrored RAM, while the standby processor does not have access. This helps to keep hardware cost and complexity down by eliminating problems associated with coordinating the activities of two processors attempting to access the same memory. When a switchover takes place, the formerly standby/newly active processor then has access to the mirrored RAM and can resume the activity of the active processor, while the formerly active/newly standby processor no longer has access. In some situations, there may not be any noticeable disruption in service.
Generally, from a system design standpoint, the mirrored RAM cannot be considered a substitute for ordinary RAM since it is much more expensive. Besides cost, another problem associated with mirrored RAM is that a software process or task does not have an area of mirrored RAM in its addressable space. Therefore, the mirrored RAM is a resource that must be managed. Partitions are allocated to certain applications and record layouts are defined, somewhat analogous to how a database might be set up. Application software checks out records from the mirrored RAM, modifies them, and writes them back.
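The check-out, modify, and write-back cycle might look like the following sketch. The record layout (two unsigned 32-bit counters), the `Partition` class, and its method names are assumptions made for illustration; the text does not specify any particular layout or interface.

```python
import struct

# Hypothetical record layout: (call_originations, sample_count), little-endian.
RECORD_LAYOUT = struct.Struct("<II")


class Partition:
    """A partition of (simulated) mirrored RAM holding fixed-size records."""

    def __init__(self, record_count: int) -> None:
        self._ram = bytearray(RECORD_LAYOUT.size * record_count)

    def check_out(self, index: int) -> list:
        # Copy the record out of the partition into ordinary memory.
        return list(RECORD_LAYOUT.unpack_from(self._ram, index * RECORD_LAYOUT.size))

    def write_back(self, index: int, record: list) -> None:
        # Copy the modified record back into the partition.
        RECORD_LAYOUT.pack_into(self._ram, index * RECORD_LAYOUT.size, *record)


partition = Partition(record_count=4)
record = partition.check_out(2)   # check out
record[0] += 1                    # modify in ordinary RAM
partition.write_back(2, record)   # write back
```

Because the application works on a copy, the partition only changes when `write_back` is called, which mirrors the managed-resource behavior described above.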
A software implementation will usually make use of ordinary RAM for its operations, but in addition will copy certain key data into the mirrored RAM during steady-state processing. Only data needed for the standby processor to resume a task would be stored in the mirrored RAM. Some software applications may collect data associated with a particular time period, such as traffic measurement statistics in a telecommunications system. At the end of the time period, the application must detect that the time period has elapsed, and the data collected may be sent elsewhere for processing or storage. Alternatively, it may be put into a log and kept for a certain period of time for retrieval on demand within that time period, after which the data are lost.
There can be various kinds of data being collected by such an application. For example, the application may increment a count related to an event, such as call originations in a telecommunications system. The application may also actively, on a periodic basis, obtain information about something such as system activity. An example would be for it to take a periodic sample of system activity of some kind over the time interval and put the information into a usable form. An example of such periodic sampling in a telephone system is obtaining the number of currently active phone calls at 100-second intervals and summarizing the hour's activity based on these periodically acquired counts. Another possibility is generating statistics internally, within the application. Whatever the specifics of the case, it must be considered how to preserve this data during a switchover.
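The two kinds of data mentioned above, an event-driven count and periodic samples, can be sketched as follows. With a 100-second sampling period, one hour yields 3600 / 100 = 36 samples. The class `HourlyMeasurements`, its field names, and the choice of mean and peak as the summary are assumptions for illustration only; the sample values are made up.

```python
from dataclasses import dataclass, field

SAMPLE_PERIOD_S = 100   # sampling period from the example in the text
INTERVAL_S = 3600       # one hour


@dataclass
class HourlyMeasurements:
    call_originations: int = 0                               # event-driven count
    active_call_samples: list = field(default_factory=list)  # periodic samples

    def on_call_origination(self) -> None:
        self.call_originations += 1

    def on_sample(self, active_calls: int) -> None:
        self.active_call_samples.append(active_calls)

    def summary(self) -> dict:
        n = len(self.active_call_samples)
        return {
            "call_originations": self.call_originations,
            "samples": n,
            "mean_active_calls": sum(self.active_call_samples) / n if n else 0.0,
            "peak_active_calls": max(self.active_call_samples, default=0),
        }


m = HourlyMeasurements()
for i in range(INTERVAL_S // SAMPLE_PERIOD_S):   # 36 samples in the hour
    m.on_sample(10 + i % 4)                      # made-up sample values
for _ in range(7):
    m.on_call_origination()
hourly = m.summary()
```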
Accordingly, a need arises for a technique that preserves critical operational data when a system's primary processor is replaced by a standby processor. The technique should be economical in terms of system cost and complexity, and should minimize data loss during the switchover task.
These needs and others are satisfied by the method of the present invention, in which data are retained during switchover from an active processor to a standby processor in a system having redundant processors. The method comprises the steps of performing periodic data collection as a first independent task executing on the active processor, and performing memory operations as a second independent task executing on the active processor. According to one aspect of the invention, the method further includes the step of performing data transfer operations as part of the second independent task.
In one form of the invention, the step of performing periodic data collection as a first independent task further comprises the steps of waiting for expiration of a period timer, acquiring at least one designated data element, and transmitting the data element to the second independent task. The step of performing memory operations may further include writing collected data elements to both a first memory partition associated with the active processor, and a second memory partition associated with the standby processor. The step of performing data transfer operations may comprise transmitting collected data elements to the standby processor over a dedicated communication link.
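A minimal sketch of the two independent tasks is given below, using threads and a queue as stand-ins. The thread and queue mechanism, the function names, the use of plain lists for the two memory partitions, and the `None` end-of-run marker are all assumptions for illustration; the text does not prescribe how the tasks are scheduled or how they communicate.

```python
import queue
import threading
import time


def collector(out_q, acquire, period_s, count):
    """First task: wait for the period timer, acquire an element, pass it on."""
    for _ in range(count):
        time.sleep(period_s)      # stand-in for expiration of the period timer
        out_q.put(acquire())      # transmit the element to the second task
    out_q.put(None)               # hypothetical end-of-run marker for the demo


def memory_task(in_q, active_partition, standby_partition):
    """Second task: write each element to both memory partitions."""
    while True:
        element = in_q.get()
        if element is None:
            break
        active_partition.append(element)    # partition for the active processor
        standby_partition.append(element)   # partition for the standby processor


def run_demo(count=5, period_s=0.01):
    q = queue.Queue()
    active, standby = [], []
    readings = iter(range(100, 100 + count))   # made-up data elements
    t1 = threading.Thread(target=collector,
                          args=(q, lambda: next(readings), period_s, count))
    t2 = threading.Thread(target=memory_task, args=(q, active, standby))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return active, standby


active, standby = run_demo()
```

Separating the tasks this way keeps the timing of data collection independent of how long the memory writes take, since the collector only enqueues and returns to waiting on its timer.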
In accordance with another aspect of the invention, the method further includes the step of performing end-of-interval processing on the collected data. End-of-interval processing may include performing statistical evaluation of the collected data upon expiration of a predetermined interval. The predetermined interval is preferably greater than the period between collection of successive data elements.
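End-of-interval processing might be sketched as an accumulator that evaluates the collected elements once enough of them have arrived to fill the interval. The class `IntervalAccumulator`, the choice of mean, minimum, and maximum as the statistical evaluation, and the strict check that the interval exceeds the collection period (which the text describes only as preferable) are assumptions of this sketch.

```python
import statistics


class IntervalAccumulator:
    """Collects elements and produces a report when the interval is full."""

    def __init__(self, period_s: int, interval_s: int) -> None:
        # Assumption: this sketch enforces the preferred arrangement in which
        # the interval is longer than the period between collections.
        if interval_s <= period_s:
            raise ValueError("interval should exceed the collection period")
        self.samples_per_interval = interval_s // period_s
        self.samples = []

    def add(self, value):
        """Record one element; return a report at end of interval, else None."""
        self.samples.append(value)
        if len(self.samples) < self.samples_per_interval:
            return None
        report = {
            "mean": statistics.mean(self.samples),
            "min": min(self.samples),
            "max": max(self.samples),
        }
        self.samples = []   # start the next interval
        return report


acc = IntervalAccumulator(period_s=100, interval_s=300)   # 3 elements per interval
reports = [acc.add(v) for v in (4, 6, 8, 1)]
```

Here the third element completes the first interval and triggers the evaluation, while the fourth element begins the next interval.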