Embodiments of the present invention relate to data processing and more particularly to optimizing software applications.
Advances in processor technology have spawned the development of network processors (NPs), which are designed specifically to meet the requirements of next-generation network equipment. In order to address the unique challenges of network processing at high speeds (e.g., the latency for a single external memory access in an NP is usually larger than the worst-case service time), modern network processors generally have a highly parallel multi-processor architecture. For instance, some network processors process packets in a microengine cluster that includes multiple microengines. Each microengine is a programmable processor to perform various activities, for example, packet processing. To process packets at high speeds, multiple microengines may run in parallel, each supporting multiple hardware threads.
Some processors such as network processors may include highly parallel architectures. Accordingly, such processors may be well-suited for use with applications that take advantage of parallelism. For example, a network application for packet processing may include the functions of receiving a packet, routing table look-up, and enqueueing of the packet. Such an application may be parallelized through pipelining and multi-threading transformations.
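By way of illustration only, the pipelining transformation described above may be sketched in software as follows. The sketch assumes one thread per stage connected by FIFO queues; the routing table contents, the packet fields, and all function names are hypothetical and are not drawn from any particular network processor or embodiment.

```python
import queue
import threading
from collections import defaultdict

SENTINEL = None  # marks the end of the packet stream

# Hypothetical routing table: two-octet destination prefix -> output port.
ROUTES = {"10.0": 1, "192.168": 2}
DEFAULT_PORT = 0


def receive(raw_packets, out_q):
    """Stage 1: receive each packet and hand it to the next stage."""
    for dst, payload in raw_packets:
        out_q.put({"dst": dst, "payload": payload})
    out_q.put(SENTINEL)


def lookup(in_q, out_q):
    """Stage 2: routing table look-up on the destination prefix."""
    while (pkt := in_q.get()) is not SENTINEL:
        prefix = pkt["dst"].rsplit(".", 2)[0]  # first two octets
        pkt["port"] = ROUTES.get(prefix, DEFAULT_PORT)
        out_q.put(pkt)
    out_q.put(SENTINEL)


def enqueue(in_q, port_queues):
    """Stage 3: enqueue each packet on the queue of its output port."""
    while (pkt := in_q.get()) is not SENTINEL:
        port_queues[pkt["port"]].append(pkt["payload"])


def run_pipeline(raw_packets):
    """Run the three stages concurrently, one thread per stage."""
    q12, q23 = queue.Queue(), queue.Queue()
    port_queues = defaultdict(list)
    stages = [
        threading.Thread(target=receive, args=(raw_packets, q12)),
        threading.Thread(target=lookup, args=(q12, q23)),
        threading.Thread(target=enqueue, args=(q23, port_queues)),
    ]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return dict(port_queues)
```

In this sketch each stage may begin work on the next packet as soon as it has passed the current packet downstream, so the three functions overlap in time. Because each stage is a single thread reading from a FIFO queue, packet order is preserved within each output port.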
In implementing such a transformation, data is typically transmitted from stage to stage via a global resource such as one or more data transmission channels. However, not all data transmitted from stage to stage may be necessary. As a result, unnecessary overhead is incurred in transmitting data that are not needed. Moreover, when a transformation is multi-threaded, each thread must synchronize around certain critical areas of the processing, such as data transmission. Accordingly, excessive data transmission can negatively impact performance of multi-threaded applications.
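By way of illustration only, the overhead described above may be sketched as follows. The sketch assumes a shared channel guarded by a lock, so that every transmission is a critical section, and it counts transmitted fields as a rough stand-in for data volume. The `Channel` class, the packet fields, and the `needed` parameter are hypothetical names introduced solely for this example.

```python
import threading


class Channel:
    """Hypothetical shared channel; every send is a critical section."""

    def __init__(self):
        self._lock = threading.Lock()
        self.records = []
        self.fields_sent = 0  # rough stand-in for transmitted data volume

    def send(self, record):
        # The lock is held for as long as it takes to transmit the record,
        # so larger records keep other threads waiting longer.
        with self._lock:
            self.records.append(record)
            self.fields_sent += len(record)


def stage(packets, channel, needed=None):
    """Transmit each packet, or only the fields named in `needed`."""
    for pkt in packets:
        record = pkt if needed is None else {k: pkt[k] for k in needed}
        channel.send(record)


def run(needed=None, n_threads=4, per_thread=100):
    """Run several threads that share one channel; return fields sent."""
    channel = Channel()
    packet = {"dst": "10.0.0.5", "length": 64, "checksum": 0, "ttl": 8}
    packets = [packet] * per_thread
    threads = [
        threading.Thread(target=stage, args=(packets, channel, needed))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return channel.fields_sent
```

Under these assumptions, `run()` transmits all four fields of every packet, whereas `run(needed=("dst",))` transmits only the field that a subsequent stage is assumed to read, reducing the amount of data moved inside the critical section to one quarter.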