Parallel data processing engines are powerful and efficient means of processing large volumes of data, for example, in data integration and data warehousing scenarios. The data processing applications executed by these engines are typically made up of a complex system of processes and/or threads, referred to as "operators", working in parallel to perform all of the required data manipulations. Data is passed from one operator to another via record buffers. Each operator reads the data to be processed from its input buffer and writes the data it has processed to its output buffer. These buffers are shared with the previous and subsequent operators as their output and input buffers, respectively.

The overall throughput of the application is generally determined by the slowest operator in the set, as its rates of consumption and production have a ripple effect throughout the application: a slow operator creates a bottleneck that can stall the entire flow. It is difficult, however, to determine where a bottleneck occurs, and equally difficult to determine whether multiple bottlenecks exist and where each of them is located.
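The operator-and-buffer architecture described above can be sketched as a simple producer-consumer pipeline. The following is a minimal, hypothetical Python illustration (the names `operator_loop` and `run_pipeline` are illustrative, not from the source): each operator runs in its own thread, reads records from a bounded input buffer, and writes results to a bounded output buffer shared with the next operator. Because the buffers are bounded, a slow stage exerts back-pressure on its upstream operators, which is exactly the ripple effect that makes one slow operator the bottleneck for the whole application.

```python
import threading
from queue import Queue

def operator_loop(fn, in_buf, out_buf):
    # One operator: consume from its input buffer, process,
    # and produce into its output buffer until end-of-stream.
    while True:
        record = in_buf.get()
        if record is None:          # sentinel marking end of stream
            out_buf.put(None)       # propagate to the next operator
            break
        out_buf.put(fn(record))

def run_pipeline(records, stage_fns, buffer_size=4):
    """Run records through a chain of operators connected by
    bounded record buffers (Queue objects). Returns the output."""
    # One buffer between each pair of adjacent operators,
    # plus a source buffer and a sink buffer.
    buffers = [Queue(maxsize=buffer_size) for _ in range(len(stage_fns) + 1)]
    threads = [
        threading.Thread(target=operator_loop,
                         args=(fn, buffers[i], buffers[i + 1]))
        for i, fn in enumerate(stage_fns)
    ]
    for t in threads:
        t.start()
    for record in records:          # feed the first operator
        buffers[0].put(record)
    buffers[0].put(None)
    out = []
    while (record := buffers[-1].get()) is not None:
        out.append(record)          # drain the last buffer
    for t in threads:
        t.join()
    return out
```

In this sketch, inserting a `time.sleep` into any one stage function would throttle the whole pipeline once the bounded buffers around it fill up, while leaving no obvious external indication of which stage is at fault, mirroring the diagnosis problem the text raises.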