The present invention relates generally to parallel processing systems and methods and, more particularly, to systems and methods for parallel processing which optimize the usage of barrier instructions.
Microprocessors, including general purpose microprocessors and digital signal processors (DSPs), are ubiquitous in today's society. Many different types of products incorporate microprocessors, including personal computers, toys and cars, to name just a few. As microprocessors have grown in complexity, so too have the sophistication of software programs and the manner in which such programs and processors interact. Parallel processing refers to, for example, techniques for executing different parts of a software program at the same time using two or more microprocessors operating in parallel. Using parallel processing, a program can be executed more quickly than if it were executed on a single processor of the same type and/or more cheaply than if it were executed on a single processor having greater processing capabilities than the individual processors operating in parallel. Other forms of parallel processing involve running the same program on multiple processors against different data sets (single program, multiple data, or SPMD).
When a program is developed for execution on parallel processors, one issue confronted by system designers and programmers is synchronizing the execution of the different parts of the program among the processors. For example, a result obtained by processor A in executing its part of the program may be needed by processor B to execute its part of the program. A barrier can be added to the program to ensure that each of the parallel processors or threads stops at the barrier until all other processors/threads have reached that point in the program. This creates a synchronization point for the program. Other kinds of synchronization are also available. For example, in point-to-point synchronization, a first processor can be made to wait until a second processor indicates that the first processor may proceed. Barriers are often used to create synchronization points because (1) they are conceptually simple, (2) they are the only synchronization method provided as a built-in primitive by some languages (e.g., UPC), and/or (3) correct point-to-point synchronizations are not easy to write. Because of their simplicity, barriers are often used in cases where other synchronization techniques, e.g., point-to-point synchronization, would be sufficient. However, barriers hinder program performance because they cause one or more processors to sit idle at the barrier until the slowest processor catches up.
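By way of illustration only, the difference between a barrier and a point-to-point synchronization can be sketched as follows. This is a minimal sketch that assumes Python threads stand in for the parallel processors. The function names `run_with_barrier` and `run_point_to_point`, the worker count, and the values computed are hypothetical and are not taken from any particular system or from the invention described herein.

```python
import threading


def run_with_barrier(num_workers=4):
    """Barrier: every worker stops until all workers reach the same point."""
    barrier = threading.Barrier(num_workers)
    partial = [0] * num_workers   # results produced before the barrier
    totals = [0] * num_workers    # results computed after the barrier

    def worker(rank):
        partial[rank] = rank + 1      # phase 1: produce a local result
        barrier.wait()                # wait here until all workers arrive
        totals[rank] = sum(partial)   # phase 2: read every worker's result

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals


def run_point_to_point():
    """Point-to-point: only processor B waits, and only for processor A."""
    ready = threading.Event()     # A sets this flag to let B proceed
    shared = {}
    out = {}

    def processor_a():
        shared["value"] = 42      # hypothetical result needed by B
        ready.set()               # signal that B may now proceed

    def processor_b():
        ready.wait()              # block until A signals
        out["result"] = shared["value"] + 1

    b = threading.Thread(target=processor_b)
    a = threading.Thread(target=processor_a)
    b.start()                     # B starts first and must wait for A
    a.start()
    a.join()
    b.join()
    return out["result"]


if __name__ == "__main__":
    print(run_with_barrier())
    print(run_point_to_point())
```

In the barrier version, every worker is delayed until the slowest worker arrives, even a worker that needs no other worker's result. In the point-to-point version, only the consumer waits, and only for the one producer on which it depends.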
To better understand the issues associated with barriers, consider the following: