Multi-partition systems for network applications are often implemented through the use of networking System-on-Chip (SoC) devices composed of multi-core clusters and a networking sub-module, with multi-partition software running on the multi-core clusters. In the field of such multi-partition systems, there is a class of system that offers high availability for cases where a partition fails. The high availability property is typically achieved for a particular (primary) partition through the use of a secondary partition which during normal operation is put into a standby state. Upon detection of a failure condition within the primary partition, the secondary partition may be brought out of its standby state, and operation switched from the failed primary partition to the secondary partition. Detection of a ‘failure condition’ is usually implemented by a watchdog mechanism, whereby upon a watchdog timer expiring as a result of the partition failing to reset the watchdog timer, a failure condition is deemed to have occurred.
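By way of illustration only, the watchdog mechanism described above may be sketched as a countdown timer that a healthy partition periodically resets. The class name `Watchdog`, the helper `ticks_until_failure_detected`, and the tick-based clock are hypothetical and are introduced purely for this sketch; they do not describe any particular device.

```python
class Watchdog:
    """Countdown watchdog driven by an explicit tick (hypothetical model)."""

    def __init__(self, timeout_ticks: int) -> None:
        if timeout_ticks <= 0:
            raise ValueError("timeout_ticks must be positive")
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks

    def kick(self) -> None:
        """Called by a healthy partition to reset the timer."""
        self.remaining = self.timeout_ticks

    def tick(self) -> bool:
        """Advance one tick; return True once the timer has expired."""
        if self.remaining > 0:
            self.remaining -= 1
        return self.remaining == 0


def ticks_until_failure_detected(timeout_ticks, last_kick_tick, horizon):
    """Simulate a partition that kicks the watchdog on every tick up to and
    including last_kick_tick and then hangs. Return the tick at which the
    failure condition is deemed to have occurred, or None if the timer does
    not expire within the simulated horizon."""
    wd = Watchdog(timeout_ticks)
    for now in range(horizon):
        if now <= last_kick_tick:
            wd.kick()  # partition is still alive and resets the timer
        if wd.tick():
            return now  # timer expired: failure condition detected
    return None
```

In this sketch the gap between the last reset and the detected expiry is governed by the chosen timeout, which is why the detection time contributes directly to the period during which traffic is not served.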
FIG. 1 schematically illustrates operating states of a conventional multi-partition networking device 100. The multi-partition networking device 100 comprises a first (primary) partition 110 running on a first set of hardware resources, illustrated generally at 115, and a second (secondary) partition 120 running on a second set of hardware resources, illustrated generally at 125. The multi-partition networking device 100 is arranged to operate in a first, normal operating state 102, whereby the first set of hardware resources 115 are in an active state (i.e. powered up and functional) and the first partition 110 is arranged to process inbound network traffic, for example received via network sub-module 130. In this first, normal operating state the second set of hardware resources 125 are in a standby state (e.g. in a powered down mode) to minimise power consumption of the multi-partition networking device 100. Upon the detection of a failure condition 140, the multi-partition networking device 100 is arranged to transition to a second, failover operating state 104, whereby the second set of hardware resources 125 are transitioned from a standby state to an active state (e.g. powered up and brought into an operational condition), and processing of inbound network traffic is transferred to the second partition 120. The first set of hardware resources 115 may then be transitioned into a standby state, for example powered down to minimise the power consumption of the multi-partition networking device 100. The multi-partition networking device 100 may be transitioned back to the first, normal operating state upon a resume condition 145 being detected.
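The two operating states of FIG. 1 and the transitions between them may be sketched as a simple state machine. This is a minimal model under stated assumptions: the class and method names are hypothetical, the primary resources are always placed in standby after failover (the description says only that they may be), and the secondary resources are returned to standby on resume.

```python
from enum import Enum


class ResourceState(Enum):
    ACTIVE = "active"    # powered up and functional
    STANDBY = "standby"  # e.g. powered down


class OperatingState(Enum):
    NORMAL = "normal"      # first, normal operating state 102
    FAILOVER = "failover"  # second, failover operating state 104


class MultiPartitionDevice:
    """Hypothetical model of the device 100 of FIG. 1."""

    def __init__(self) -> None:
        self.state = OperatingState.NORMAL
        self.primary = ResourceState.ACTIVE     # hardware resources 115
        self.secondary = ResourceState.STANDBY  # hardware resources 125

    @property
    def serving_partition(self) -> str:
        """Partition currently processing inbound network traffic."""
        return "primary" if self.state is OperatingState.NORMAL else "secondary"

    def on_failure_condition(self) -> None:
        """Failure condition 140: switch traffic to the secondary partition."""
        if self.state is not OperatingState.NORMAL:
            return  # already failed over; nothing to do
        self.secondary = ResourceState.ACTIVE   # bring out of standby
        self.state = OperatingState.FAILOVER    # transfer traffic processing
        self.primary = ResourceState.STANDBY    # assumption: always powered down

    def on_resume_condition(self) -> None:
        """Resume condition 145: return to the normal operating state."""
        if self.state is not OperatingState.FAILOVER:
            return
        self.primary = ResourceState.ACTIVE
        self.state = OperatingState.NORMAL
        self.secondary = ResourceState.STANDBY  # assumption: back to standby
```

The model deliberately omits timing; in a real device each transition out of standby takes a non-zero time during which inbound traffic must be buffered.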
In many networking systems, the requirement for the high availability system is to prevent packet loss in the case of a partition failure, and specifically to ensure that the switch from the primary partition to the secondary partition does not incur any loss of networking traffic. During the period from the time when a failure occurs within the primary partition to the time when the secondary partition takes over responsibility for processing network traffic, received network traffic is not being served, and received data packets must be stored within a buffer pool (e.g. within the networking sub-module 130). This period of time when network traffic is not being served includes:
(i) the time taken to detect the failure condition within the primary partition; and
(ii) the time taken to bring the secondary partition out of the standby state and into an operational condition.
The longer this non-serving period is, and the higher the rate of traffic served by the system, the larger the volume of data packets that are required to be held within the buffer pool, and thus the greater the required size of the buffer pool needed to store the incoming data packets in order to avoid loss of networking traffic.
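The relationship between the non-serving period, the traffic rate and the required buffer pool size may be illustrated with a short calculation. This is a sketch only: the function name, the use of a constant worst-case line rate, and the example figures are assumptions chosen for illustration and are not taken from any particular device.

```python
def required_buffer_bytes(line_rate_bps: int, detect_us: int, wake_us: int) -> int:
    """Worst-case bytes that must be buffered while traffic is not served.

    line_rate_bps: assumed constant inbound rate, in bits per second
    detect_us:     time taken to detect the failure condition, in microseconds
    wake_us:       time taken to bring the secondary partition out of standby
                   and into an operational condition, in microseconds
    """
    if min(line_rate_bps, detect_us, wake_us) < 0:
        raise ValueError("arguments must be non-negative")
    non_serving_us = detect_us + wake_us
    bits = line_rate_bps * non_serving_us  # bit-microseconds per second
    # Convert to bytes (divide by 8) and to seconds (divide by 1_000_000),
    # rounding up with integer ceiling division to avoid float error.
    return -(-bits // (8 * 1_000_000))


# Illustrative figures only: 1 Gbit/s, 1 ms detection, 4 ms wake-up.
example = required_buffer_bytes(1_000_000_000, 1_000, 4_000)
print(example)  # 625000
```

Under these illustrative figures, halving either the line rate or the total non-serving period halves the required buffer, which is the linear dependence described above.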
The time taken to bring the secondary partition out of standby and into an operational condition typically includes:
(i) the time taken to bring the secondary partition out of deep sleep (i.e. to power up);
(ii) the time taken to resume the relevant context; and
(iii) the time taken to reach a ‘hot’ state in which the local register values etc. are set correctly.
The time taken to bring the secondary partition out of standby and into an operational condition may be minimised by maintaining the secondary partition in a fully powered-up state. However, this significantly increases the power consumption of the overall system. As such, it is desirable for secondary partitions to remain powered down when not in use to minimise power consumption.
Consequently, in a conventional multi-partition system in which the secondary partition is powered down during normal operation, there can be a significant time lapse between the primary partition failing and operation being switched over to the secondary partition, requiring a large buffer to be implemented in order to prevent loss of network traffic. However, increasing the size of the buffer pool significantly increases its cost, power consumption and die area. There is therefore a requirement to minimise the required size of the buffer pool in which received data packets are stored, and thus a requirement to minimise the period of time when network traffic is not being served.