Inter-processor communication (IPC) is often used to exchange data between processes running on different processor cores (e.g., heterogeneous processor cores). These processes may result from programs running on a central processing unit (CPU) or system-on-a-chip (SoC) using different processing cores or threads. Even though the processes may be running separately, they may need to exchange information with one another.
IPC is performed, according to communication protocols, using interfaces of the processing cores or units upon which the processes are running. At times, the communication protocols are such that the processing cores or units communicate in a peer-to-peer manner. The communication protocols used for IPC often include read and write operations to facilitate the exchange of information.
An issue arises when one of the peers in IPC with another peer is going to be reset or disabled. A peer may be disabled, for example, when it is being powered down. When this occurs, it is difficult for one peer to know that the other peer is being reset or is disabled. In such a case, the first peer may be waiting on information from the second peer and may end up waiting endlessly for a response, leading either to a hang of the first peer or to a heuristic timeout by the first peer (which in some cases may be wrong). In addition, the second peer may be re-initialized after having lost its context while the first peer keeps sending messages under the false assumption that the second peer is in a different state (e.g., the first peer believes that a session is in progress, whereas the second peer has lost any notion of that session). The same mismatch in state may happen in the other direction as well. Finally, messages in many cases are not atomic (e.g., a message may be made up of a series of register writes). If the first peer sends a non-atomic message and the second peer goes through reset in the middle of it, the reset may clear the second peer's message registers while the first peer continues to populate them, so that the second peer receives a corrupted message. These problems may be avoided where the first peer is able to use read operations to read information from the second peer about its state and can determine, based on the response to the read operations, whether the second peer is unavailable or has gone through reset, particularly in cases where a timeout or retries would cause the first peer to conclude that the other peer is not available.
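The corrupted-message case may be illustrated with a minimal sketch in C. It is only a software model under stated assumptions: the four-word message, the doorbell flag, and the names `mailbox_t`, `peer_reset`, `send_message`, and `message_intact` are hypothetical and are not taken from any particular hardware interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MSG_WORDS 4      /* hypothetical message length, in 32-bit registers */
#define NO_RESET  (-1)   /* sentinel: the second peer is not reset during the send */

/* Hypothetical model of the second peer's message registers. */
typedef struct {
    uint32_t regs[MSG_WORDS];
    bool     doorbell;   /* set by the sender after the last word is written */
} mailbox_t;

/* Assumption: a reset of the second peer clears its message registers. */
static void peer_reset(mailbox_t *mb)
{
    memset(mb, 0, sizeof(*mb));
}

/*
 * The first peer writes the message one register at a time (non-atomic).
 * If reset_before_word is a valid index, the second peer is reset just
 * before that word is written, while the sender carries on unaware.
 */
static void send_message(mailbox_t *mb, const uint32_t msg[MSG_WORDS],
                         int reset_before_word)
{
    assert(reset_before_word < MSG_WORDS);
    for (int i = 0; i < MSG_WORDS; i++) {
        if (i == reset_before_word)
            peer_reset(mb);
        mb->regs[i] = msg[i];
    }
    mb->doorbell = true;
}

/* True only if the doorbell is set and every word matches what was sent. */
static bool message_intact(const mailbox_t *mb, const uint32_t msg[MSG_WORDS])
{
    return mb->doorbell && memcmp(mb->regs, msg, sizeof(mb->regs)) == 0;
}
```

In this model, sending the words `{0x11, 0x22, 0x33, 0x44}` with `reset_before_word` set to `2` leaves the registers holding `{0, 0, 0x33, 0x44}` with the doorbell set, so the second peer is told a message is ready although half of it was cleared by the reset.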
While communicating reset information between peers is easier when read operations are allowed, read operations present other, undesirable issues. For example, when a core of a peer is being reset or is becoming disabled (e.g., due to entering a low power state), a read directed at that core can fail due to, for instance, the isolation of the core. Also, a communication protocol that includes reads may be more difficult to migrate across different underlying buses and/or fabrics.
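The read-based approach, and the way it breaks down when the core is isolated, may be sketched as follows. This is a software model only, under stated assumptions: the `boot_count` register, the `isolated` flag, and the names `peer_model_t`, `bus_read_boot_count`, and `check_peer` are hypothetical, and an actual bus may signal a failed read in a different way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { PEER_OK, PEER_WAS_RESET, PEER_UNREACHABLE } peer_status_t;

/* Hypothetical model of the second peer as seen over the bus. */
typedef struct {
    bool     isolated;    /* true while the core is in reset or powered down */
    uint32_t boot_count;  /* hypothetical register, bumped on each re-initialization */
} peer_model_t;

/* Models a bus read; returns false when the read fails because the core is isolated. */
static bool bus_read_boot_count(const peer_model_t *peer, uint32_t *out)
{
    if (peer->isolated)
        return false;
    *out = peer->boot_count;
    return true;
}

/*
 * The first peer reads the second peer's state and compares it with the
 * value it last observed. A changed value means the second peer went
 * through reset; a failed read leaves the first peer without an answer.
 */
static peer_status_t check_peer(const peer_model_t *peer, uint32_t *last_seen)
{
    uint32_t now;

    assert(peer && last_seen);
    if (!bus_read_boot_count(peer, &now))
        return PEER_UNREACHABLE;
    if (now != *last_seen) {
        *last_seen = now;
        return PEER_WAS_RESET;
    }
    return PEER_OK;
}
```

The sketch shows both sides of the trade-off: a successful read lets the first peer detect a reset directly, but the read itself is the operation that fails while the second peer's core is isolated.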
Thus, it may be desirable to use a communication protocol between peers that does not include read operations. However, in situations in which peers communicate only through write operations, the problem of not knowing that the other peer has gone into reset or has otherwise become unavailable remains.