Processing of data, such as video data or image data, may be performed for a variety of reasons. For example, video and/or image data may be compressed to save bandwidth during transmission or to save memory space during storage. In other examples, processing of video/image data may include reducing a noise component therein, or performing any of scaling/de-scaling, color conversion, de-interlacing, and composition/decomposition of the video/image data. Typically, a data processor includes a high level framework and a low level framework for performing such data processing. Generally, the high level framework includes a host processor that runs on a high level operating system, such as the LINUX operating system, whereas the low level framework includes a plurality of slave processors and hardware accelerators that run on a low level operating system, such as the Basic Input/Output System (BIOS) operating system.

In a non-tunneled data processing architecture, the completion of a processing stage by a slave processor/hardware accelerator is signaled by the low level framework to the high level framework. The high level framework then notifies the next slave processor/hardware accelerator in the low level framework to perform the next processing stage. If each processing stage processes F frames/second, each stage needs 1/F seconds per frame, so with N processing stages the total latency of the data processing pipeline is N/F seconds. To reduce this total latency, signals corresponding to output data produced after processing only a part of a data frame (for example, a data sub-frame) are sent to the next processing stage, which allows the next stage of the data processing pipeline to start processing earlier. If a particular processing stage produces S data sub-frames, the latency of that processing stage drops from 1/F seconds to 1/(F*S) seconds.
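As a rough illustration of the latency arithmetic above, the following Python sketch computes the frame-level pipeline latency N/F and the per-stage latency 1/(F*S) with exact fractions. The function names and the example values of F, N and S are hypothetical. The sub-frame total simply sums N per-stage latencies of 1/(F*S), mirroring the simplified model of the text; it is an assumption of this sketch and ignores, for example, the time the last stage needs to finish the whole frame.

```python
from fractions import Fraction


def frame_level_latency(fps: int, num_stages: int) -> Fraction:
    """Pipeline latency (s) when every stage hands off only whole frames: N/F."""
    return Fraction(num_stages, fps)


def stage_latency(fps: int, sub_frames: int = 1) -> Fraction:
    """Delay (s) before the next stage can start when a stage signals after
    each of its S sub-frames: 1/(F*S). S=1 gives the frame-level value 1/F."""
    return Fraction(1, fps * sub_frames)


def sub_frame_latency(fps: int, num_stages: int, sub_frames: int) -> Fraction:
    """Simplified pipeline latency (s) if every stage signals per sub-frame,
    modeled (by assumption) as the sum of N per-stage delays: N/(F*S)."""
    return num_stages * stage_latency(fps, sub_frames)


if __name__ == "__main__":
    F, N, S = 30, 4, 8  # illustrative values, not taken from the text
    print(frame_level_latency(F, N), float(frame_level_latency(F, N)))
    print(sub_frame_latency(F, N, S), float(sub_frame_latency(F, N, S)))
```

With F = 30 frames/second, N = 4 stages and S = 8 sub-frames, the frame-level latency is 4/30 = 2/15 s (about 133 ms), whereas the sub-frame model gives 4/(30*8) = 1/60 s (about 16.7 ms). Fractions are used so the results can be compared exactly rather than through floating-point rounding.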
However, such an approach increases the number of signals that pass through the high level framework, which proportionately increases the processing cycles consumed on the host processor and, in turn, degrades the performance of the data processor.
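The cost described above can be made concrete with a toy model of non-tunneled hand-offs, in which every completion signal from a stage is relayed through the host processor. The sketch below is hypothetical throughout: `HostProcessor`, `relay`, `process_frame` and the fixed `cycles_per_signal` are illustrative assumptions, not part of any real framework. It only shows that, per frame, the host handles N signals when stages hand off whole frames and N*S signals when they hand off S sub-frames each.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostProcessor:
    """Hypothetical model of the host processor in the high level framework.

    Every completion signal from a slave processor/hardware accelerator is
    relayed here, and each one costs an assumed, fixed number of host cycles.
    """
    cycles_per_signal: int = 1000  # illustrative assumption, not a measured value
    signals_handled: int = 0

    def relay(self, stage: int, num_stages: int) -> Optional[int]:
        """Handle the completion signal of `stage`; return the stage to notify next."""
        self.signals_handled += 1
        next_stage = stage + 1
        return next_stage if next_stage < num_stages else None

    def cycles_spent(self) -> int:
        return self.signals_handled * self.cycles_per_signal


def process_frame(host: HostProcessor, num_stages: int, sub_frames: int) -> None:
    """Push one frame, split into `sub_frames` parts, through `num_stages` stages."""
    for _ in range(sub_frames):
        stage: Optional[int] = 0
        while stage is not None:
            # The stage finishes its part and signals the host, which then
            # notifies the next stage (non-tunneled hand-off).
            stage = host.relay(stage, num_stages)


if __name__ == "__main__":
    for s in (1, 8):
        host = HostProcessor()
        process_frame(host, num_stages=4, sub_frames=s)
        print(s, host.signals_handled, host.cycles_spent())
```

For a 4-stage pipeline, whole-frame hand-offs cost the host 4 signals per frame, while 8 sub-frames per stage cost 32, so under the fixed per-signal cost assumed here the host cycles grow by the same factor S that reduces the per-stage latency.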