Currently, many electronic systems comprise one or more processors linked to a main memory by a bus or other type of interconnect. The main memory organizes the information being stored, such as instructions and/or data, into blocks. Each “block” is separately addressable and may be a fixed number of bytes in size. Information is typically moved about such a multiprocessor system in units of blocks.
In theory, processors within a multiprocessor system are adapted to retrieve one or more blocks of information from the main memory, perform operations on that information, and eventually return the results to main memory. However, retrieving information from main memory can take a significant amount of time, especially given the high operating speeds of modern processors.
To reduce such latencies, modern processors rely on one or more cache memories (hereinafter referred to as “caches”). A cache is a small, fast memory module that is placed in close proximity to a processor and is used to store information that the processor is currently using or is likely to use in the near future.
Because more than one processor of the multiprocessor system may request a copy of the same block of information, cache coherency protocols have been developed to ensure that no processor relies on a “stale” block, namely a block of information that is incorrect because another processor has modified or updated it. One type of cache coherency protocol is a distributed directory-based protocol, in which the multiprocessor system includes directories that each store protocol state information pertaining to a range of blocks of memory. Examples of protocol state information include the “Shared” (S), “Invalid” (I) and “Modified” (M) state values of the standard MESI protocol.
A common technique employed in many multiprocessor systems that use a broadcast protocol is not to maintain directory entries for blocks in the Shared (S) state. Rather, only entries for blocks in the Modified (M) state are maintained. If a processor requires exclusive access to a cache block (e.g., for a STORE operation) and the block has no directory entry, the STORE protocol sends an invalidation message (INVAL) to all of the processors on a ring interconnect to ensure that any shared copies of the cache block are placed in the Invalid (I) state.
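The invalidate-on-store technique above can be illustrated with a small model. This is a minimal sketch under stated assumptions: the `State`, `Directory` and `store_without_entry` names are hypothetical, only the M/S/I states named in the text are modeled, and message timing on the ring is ignored. It shows why a directory that tracks only Modified blocks must broadcast the invalidation: it has no record of which caches hold Shared copies.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    SHARED = "S"
    INVALID = "I"


class Directory:
    """Hypothetical directory that keeps entries only for Modified blocks.

    Shared copies are deliberately not tracked, mirroring the technique
    described above.
    """

    def __init__(self):
        self.owner = {}  # BID -> NID of the node holding the block in M

    def lookup(self, bid):
        return self.owner.get(bid)


def store_without_entry(directory, caches, bid, requester):
    """Grant exclusive (M) access to block `bid` when it has no directory entry.

    `caches` maps NID -> {BID: State}. Because the directory does not know
    which caches hold Shared copies, the invalidation goes to every cache.
    """
    if directory.lookup(bid) is not None:
        raise NotImplementedError("owned-block case is not covered by this sketch")
    for lines in caches.values():
        # Broadcast INVAL: any copy of the block becomes Invalid.
        if bid in lines:
            lines[bid] = State.INVALID
    # The requester now holds the only valid copy, and the directory records it.
    caches.setdefault(requester, {})[bid] = State.MODIFIED
    directory.owner[bid] = requester


# Illustrative scenario: nodes 1 and 2 share block 0x40; node 0 wants to STORE.
caches = {0: {}, 1: {0x40: State.SHARED}, 2: {0x40: State.SHARED}}
d = Directory()
store_without_entry(d, caches, 0x40, requester=0)
```

After the call, nodes 1 and 2 hold the block in the I state, node 0 holds it in M, and the directory has gained an entry for it.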
For instance, as shown in FIG. 1, in this STORE flow, if a CPU/cache complex 100 wants to store information into a cache block that it does not currently own, it transmits a WRITE REQUEST message 110. WRITE REQUEST message 110 is represented by “WRITEREQ[BID, NID],” where “BID” 112 is an identifier of the requested cache block and “NID” 114 is a network identifier of CPU/cache complex 100, which is requesting Modified (exclusive) access to the cache block.
The WRITEREQ message 110 is received by a “Home node” 120, namely the node having access to a directory that records the state of the requested cache block. Home node 120, which is determined to have a specific protocol engine (PE) such as “PE5,” looks up its directory for an entry associated with the requested cache block. If no directory entry exists for the cache block, meaning that the block is not exclusively owned, an Invalidate message 130 (INVAL[BID, ALL]) is sent out on the interconnect to all of the nodes to invalidate the block if it is present in their caches. This is necessary because any of the caching agents may hold the block in the ‘S’ state.
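The home-node flow just described, together with the WRITEACK that grants ownership (described next), can be sketched as a function that maps an incoming message to the messages the home node emits. This is an illustrative model, not the protocol engine itself: the `Msg` type and `home_node_handle` function are hypothetical, messages use the [BID, NID] shape from FIG. 1 with "ALL" as the broadcast target, and the waiting period between INVAL and WRITEACK is not modeled. The case where the block already has a directory entry is not described in this passage, so the sketch refuses it rather than guessing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Msg:
    kind: str    # "WRITEREQ", "INVAL" or "WRITEACK"
    bid: int     # BID: identifier of the cache block
    nid: object  # NID of the requesting agent, or "ALL" for a broadcast


def home_node_handle(entries, req):
    """Messages a home node emits for a WRITEREQ, following the flow above.

    `entries` is the home node's directory: BID -> owner NID, holding only
    exclusively owned (Modified) blocks.
    """
    if req.kind != "WRITEREQ":
        raise ValueError("expected a WRITEREQ message")
    if req.bid in entries:
        # Not described in this passage, so the sketch leaves it out.
        raise NotImplementedError("block already has a directory entry")
    out = [
        Msg("INVAL", req.bid, "ALL"),       # INVAL[BID, ALL]: reach every node
        Msg("WRITEACK", req.bid, req.nid),  # WRITEACK[BID, NID]: grant ownership
    ]
    entries[req.bid] = req.nid  # the requester now owns the block in M
    return out


directory = {}
msgs = home_node_handle(directory, Msg("WRITEREQ", 0x40, 100))
```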
Invalidate message 130 takes a known, fixed duration to be observed by all of the nodes on a broadcast-style interconnect such as a ring. This fixed duration equals N*T+(N+M−1)*T′ cycles, where “N” is the number of CPU/cache complexes, “T” is the time (in cycles) taken to process each message excluding decode time, “M” is the number of PE/directory nodes, and “T′” is the time for “decode only” operations.
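The fixed invalidation window can be evaluated directly. A minimal sketch, assuming integer cycle counts; the function name `inval_window` and the parameter name `t_dec` (for T′) are hypothetical, and the values in the example are illustrative only.

```python
def inval_window(n, m, t, t_dec):
    """Cycles until every node on the ring has observed the INVAL: N*T + (N+M-1)*T'.

    n: number of CPU/cache complexes (N)
    m: number of PE/directory nodes (M)
    t: per-message processing time excluding decode (T)
    t_dec: decode-only time (T')
    """
    return n * t + (n + m - 1) * t_dec


# Illustrative values only: 8 CPU/cache complexes, 4 directory nodes, T = 3, T' = 1.
print(inval_window(n=8, m=4, t=3, t_dec=1))  # 35
```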
After expiration of N*T+(N+M−1)*T′ cycles, Home Node 120 issues a Write Acknowledgement (WRITEACK[BID,NID]) message 140 to grant ownership of the block to CPU/cache complex 100 identified by “NID”. This takes k*T′ cycles, where “k” is the hop distance from the Home Node to the requesting agent. Hence, the complete transaction time is given by equation (1), and the worst-case and best-case delays from observation of the Invalidate message to actual ownership by CPU/cache complex 100 are given by equations (2) and (3), respectively.

$$N \cdot T + (N+M-1) \cdot T' + k \cdot T' \tag{1}$$

$$(N+M-k-1) \cdot T + (N+M-k-1) \cdot T' + k \cdot T' \quad \text{(worst case)} \tag{2}$$

$$(N+M-k-1) \cdot T' + k \cdot T' \quad \text{(best case)} \tag{3}$$
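Equations (1) through (3) translate directly into functions of the hop distance k. This is a sketch assuming integer cycle counts; the function names and the `t_dec` parameter (for T′) are hypothetical, and the example values are illustrative only.

```python
def complete_transaction(n, m, k, t, t_dec):
    # Equation (1): INVAL window N*T + (N+M-1)*T' plus k hops for the WRITEACK.
    return n * t + (n + m - 1) * t_dec + k * t_dec


def rfo_worst(n, m, k, t, t_dec):
    # Equation (2): worst-case delay from INVAL observation to ownership.
    return (n + m - k - 1) * t + (n + m - k - 1) * t_dec + k * t_dec


def rfo_best(n, m, k, t, t_dec):
    # Equation (3): best-case delay from INVAL observation to ownership.
    return (n + m - k - 1) * t_dec + k * t_dec


n, m, t, t_dec = 8, 4, 3, 1  # illustrative values only
for k in range(1, n + 1):
    print(k, complete_transaction(n, m, k, t, t_dec),
          rfo_worst(n, m, k, t, t_dec), rfo_best(n, m, k, t, t_dec))
```

Algebraically, equation (3) reduces to (N+M−1)·T′, so in this model the best-case delay does not depend on k.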
An average-case analysis of these times, assuming “k” varies from 1 to N, yields the time periods given in equations (4) and (5) below.
The average time for the complete transaction (measured from issue of the INVAL) is computed as shown in equation (4):
$$N \cdot T + (N+M-1) \cdot T' + \frac{1}{N} \sum_{k=1}^{N+M-1} k\,T' = N \cdot T + (N+M-1) \cdot T' + \frac{1}{N} \cdot 0.5 \cdot (N+M)(N+M-1) \cdot T' \tag{4}$$
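The closed form in equation (4) rests on the identity that k summed from 1 to N+M−1 equals (N+M)(N+M−1)/2. The sketch below checks that both sides agree using exact rational arithmetic from Python's standard `fractions` module. The function names and the `t_dec` parameter (for T′) are hypothetical, and the summation range follows equation (4) as written.

```python
from fractions import Fraction


def avg_transaction_by_sum(n, m, t, t_dec):
    """Left-hand side of equation (4), summing k*T' for k = 1..N+M-1."""
    hops = sum(k * t_dec for k in range(1, n + m))
    return n * t + (n + m - 1) * t_dec + Fraction(hops, n)


def avg_transaction_closed(n, m, t, t_dec):
    """Right-hand side of equation (4)."""
    return n * t + (n + m - 1) * t_dec + Fraction((n + m) * (n + m - 1) * t_dec, 2 * n)


# Both forms agree for a range of small system sizes.
for n in range(1, 17):
    for m in range(1, 9):
        assert avg_transaction_by_sum(n, m, 3, 1) == avg_transaction_closed(n, m, 3, 1)

print(avg_transaction_closed(8, 4, 3, 1))  # 173/4
```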
The average time for a Read For Ownership (RFO), measured from observation of the INVAL to ownership, is computed as shown in equation (5):
$$\frac{1}{N}\left[\sum_{k=1}^{N+M-1} (N+M-1-k)\,T' + \sum_{k=0}^{N-1} k\,T + \sum_{k=1}^{N+M-1} k\,T'\right] = \frac{1}{N}\left[(N+M-1)^2\,T' + 0.5\,(N-1)\,N\,T\right] \tag{5}$$
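The three sums on the left-hand side of equation (5) have simple closed forms. The first and third sums together give (N+M−2)(N+M−1)/2 + (N+M−1)(N+M)/2 = (N+M−1)², multiplied by T′, and the middle sum gives N(N−1)/2, multiplied by T. The overall average is therefore ((N+M−1)²T′ + 0.5(N−1)NT)/N. The sketch below checks this numerically with exact rationals. The function names and the `t_dec` parameter (for T′) are hypothetical.

```python
from fractions import Fraction


def avg_rfo_by_sum(n, m, t, t_dec):
    """Left-hand side of equation (5), evaluated term by term."""
    s1 = sum((n + m - 1 - k) * t_dec for k in range(1, n + m))  # first sum, k = 1..N+M-1
    s2 = sum(k * t for k in range(0, n))                         # second sum, k = 0..N-1
    s3 = sum(k * t_dec for k in range(1, n + m))                 # third sum, k = 1..N+M-1
    return Fraction(s1 + s2 + s3, n)


def avg_rfo_closed(n, m, t, t_dec):
    """(1/N) * [(N+M-1)^2 * T' + 0.5 * (N-1) * N * T], kept exact."""
    return Fraction(2 * (n + m - 1) ** 2 * t_dec + (n - 1) * n * t, 2 * n)


for n in range(1, 17):
    for m in range(1, 9):
        assert avg_rfo_by_sum(n, m, 3, 1) == avg_rfo_closed(n, m, 3, 1)

print(avg_rfo_closed(8, 4, 3, 1))  # 205/8
```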
In addition, consider barrier synchronization in which all N processors participate and follow a total schedule. The cache block representing the barrier is assumed not to be present initially in any of the private caches; it follows the state transitions I→M (ownership)→S (test for barrier), and then repeatedly S→I→S until all processors have reached the “barrier” point. The approximate total cost of this barrier synchronization is given in equation (6):
$$\text{Total Cost} = N \cdot N \cdot T + N \cdot (N+M-1) \cdot T' + \sum_{k=1}^{N+M-1} k \cdot T'$$

or, equivalently,

$$\text{Total Cost} = N \cdot N \cdot T + N \cdot (N+M-1) \cdot T' + \frac{(N+M-1)(N+M) \cdot T'}{2}$$

$$\text{Total Cost} = N \cdot N \cdot T + 0.5\,(N+M-1)(3N+M) \cdot T' \tag{6}$$
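The last step of equation (6) factors (N+M−1)·T′ out of the second and third terms: N + (N+M)/2 = (3N+M)/2. The sketch below checks the first and final forms against each other, reading the summand of the first form as k·T′, which the second form requires. The function names and the `t_dec` parameter (for T′) are hypothetical, and the example values are illustrative only.

```python
from fractions import Fraction


def barrier_cost_by_sum(n, m, t, t_dec):
    """First form of equation (6), with the summand read as k * T'."""
    return n * n * t + n * (n + m - 1) * t_dec + sum(k * t_dec for k in range(1, n + m))


def barrier_cost_closed(n, m, t, t_dec):
    """Final form of equation (6): N*N*T + 0.5*(N+M-1)*(3N+M)*T'."""
    return n * n * t + Fraction((n + m - 1) * (3 * n + m) * t_dec, 2)


for n in range(1, 17):
    for m in range(1, 9):
        assert barrier_cost_by_sum(n, m, 3, 1) == barrier_cost_closed(n, m, 3, 1)

print(barrier_cost_closed(8, 4, 3, 1))  # 346
```

For a fixed number of directory nodes M, both terms grow quadratically in N, so the barrier cost scales with the square of the number of participating processors.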