Multi-node computer systems are often partitioned into domains, with each domain functioning as an independent machine with its own address space. Partitioning allows the resources of a computer system to be allocated efficiently to different tasks. Domains in a partitioned computer system may dynamically share resources. When a fatal failure occurs during packet processing in a domain, that processing cannot continue, and a shared resource entry is left in an intermediate state. To reset and restart the failing domain, the shared resource must be reset entirely. This in turn requires resetting all other domains, even those that are running without failure.
One solution for error containment and recovery in a partitioned system is to use a dedicated resource for each domain, so that if a failure occurs within a domain, non-failing domains are not affected. However, this approach requires a larger amount of resources than using a shared resource, because the total amount of the resource has to accommodate the maximum requirements of all the domains in the system.
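The sizing argument above can be illustrated with a small sketch. It assumes a hypothetical demand trace in which each entry records how many resource entries each domain needs at one point in time; the domain names, the numbers, and the function names are illustrative only and do not come from any particular system. With dedicated resources, each domain is provisioned for its own peak, so the total is the sum of the per-domain peaks. With a shared resource, the sketch assumes the pool only needs to cover the largest combined demand observed at any one time.

```python
def per_domain_peaks(trace):
    """Return the highest demand observed for each domain across the trace."""
    peaks = {}
    for step in trace:
        for domain, demand in step.items():
            peaks[domain] = max(peaks.get(domain, 0), demand)
    return peaks


def dedicated_capacity(trace):
    """Dedicated sizing: every domain gets enough for its own peak."""
    return sum(per_domain_peaks(trace).values())


def shared_capacity(trace):
    """Shared sizing (assumption): cover the largest combined demand at any step."""
    return max(sum(step.values()) for step in trace)


# Hypothetical demand trace: entries needed by domains A, B, C at three points in time.
trace = [
    {"A": 8, "B": 1, "C": 2},
    {"A": 2, "B": 7, "C": 1},
    {"A": 1, "B": 2, "C": 6},
]

dedicated = dedicated_capacity(trace)  # 8 + 7 + 6
shared = shared_capacity(trace)        # max(11, 10, 9)
print(f"dedicated: {dedicated} entries, shared: {shared} entries")
```

In this illustrative trace the domains peak at different times, so the dedicated arrangement needs 21 entries while a shared pool needs 11. The gap narrows when the domains tend to peak simultaneously.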
Therefore, it is desirable to provide a mechanism that allows the system to contain an error within a failed domain so that non-failed domains remain unaffected.