Video processing, video coding, and graphics applications are markets that have grown substantially over the last few years. The technologies are combined in many applications and are widely used. Video data bandwidth usage is high, especially since the video resolutions supported by televisions and personal computer monitors keep increasing. For example, 1080 progressive (1080P) resolution is now available in most new televisions. The bandwidth associated with a simple display of 1080P video is about 3 gigabits per second. Digital signal processors performing video coding, video processing or graphics applications are sensitive to memory bandwidth criteria. The memory bandwidth criteria, rather than processing power, limit the performance of many systems. Therefore, memory bandwidth optimization is useful in enabling such applications.
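The figure of about 3 gigabits per second can be checked with a short calculation. The following is a minimal sketch, assuming a 1920x1080 frame, 24 bits per pixel and 60 frames per second; none of these parameters is stated in the text, and the function name is hypothetical.

```python
def display_bandwidth_bps(width, height, bits_per_pixel, frames_per_second):
    """Raw bits per second needed to display uncompressed video."""
    return width * height * bits_per_pixel * frames_per_second


# Assumed parameters (not given in the text): 1920x1080, 24 bpp, 60 fps.
bandwidth_1080p = display_bandwidth_bps(1920, 1080, 24, 60)
print(f"{bandwidth_1080p / 1e9:.2f} Gbit/s")  # prints "2.99 Gbit/s"
```

Under different assumptions (for example, 30 frames per second or a subsampled chroma format) the result would be proportionally lower.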
Many video processing techniques utilize several copies of a frame at several locations within a memory. Three-dimensional (3D) graphics applications also perform texture mapping over 3D scenes by considering the resolution from which to extract the current level of detail specified after a 3D warping. Furthermore, Scalable Video Coding (SVC) uses multi-resolution representations of the video. The multi-resolution representations enable both error-resilient transmission of the video and an ability to personalize the video experience according to the edge device capabilities and the type of service (e.g., standard or prime services).
Referring to FIG. 1, a block diagram of a conventional method 10 for creating multi-destination copies is shown. In the method 10, a frame stored at a location 12 is read directly from a memory 14 to two or more locations 16a-16b in another memory 18 using two independent transfers 20a-20b. The transfers 20a-20b are controlled by a direct memory access engine 22. Problems with the method 10 are that the bandwidth cost for the memory 14 is high, the total transfer is typically slow, and a bottleneck is created for the application relying on the frames in the memory 18. In the method 10, the bandwidth involved is two frame reads from the memory 14 and two frame writes into the memory 18.
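The bandwidth accounting of the method 10 can be illustrated with a toy model. The following is a minimal sketch, not an implementation of the direct memory access engine 22: the class `CountingMemory` and the function `copy_method_10` are hypothetical names, and each memory is assumed to count only whole-frame reads and writes.

```python
class CountingMemory:
    """Toy memory that counts whole-frame reads and writes (illustrative only)."""

    def __init__(self):
        self.frames = {}        # location label -> frame bytes
        self.frame_reads = 0
        self.frame_writes = 0

    def read(self, location):
        self.frame_reads += 1
        return self.frames[location]

    def write(self, location, frame):
        self.frame_writes += 1
        self.frames[location] = bytes(frame)


def copy_method_10(source, source_location, destination, destination_locations):
    """One independent transfer per destination; each transfer re-reads the source."""
    for location in destination_locations:
        frame = source.read(source_location)   # transfer 20a, then transfer 20b
        destination.write(location, frame)


memory_14 = CountingMemory()
memory_18 = CountingMemory()
memory_14.frames["12"] = bytes(16)  # stand-in frame, placed directly so it is not counted

copy_method_10(memory_14, "12", memory_18, ["16a", "16b"])
print(memory_14.frame_reads, memory_18.frame_writes)  # prints "2 2"
```

The counters reproduce the two frame reads from the memory 14 and the two frame writes into the memory 18 described above.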
Referring to FIG. 2, a block diagram of another conventional method 30 for creating multi-destination copies is shown. In the method 30, the frame at the location 12 is read from the memory 14 to the location 16a using the transfer 20a. The direct memory access engine 22 then copies the frame from the location 16a to the location 16b in another transfer 32. The absence of the transfer 20b decreases the bandwidth consumption of the memory 14 compared with the method 10. However, the method 30 still causes some issues. In particular, congestion is created in the memory 18, especially if both copies of the frame in the memory 18 are to be accessed temporally proximate to each other. A synchronization issue is also created because the transfer 20a writes to the location 16a while the transfer 32 tries to read from the location 16a. Furthermore, the internal bandwidth consumption of the memory 18 is increased due to the added read from the location 16a at the start of the transfer 32.
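The trade-off made by the method 30 can be illustrated with a toy model of the memory traffic. The following is a minimal sketch under stated assumptions: the class `CountingMemory` and the function `copy_method_30` are hypothetical names, only whole-frame reads and writes are counted, and the two transfers are run strictly one after the other, so the synchronization issue between the transfer 20a and the transfer 32 is not modeled.

```python
class CountingMemory:
    """Toy memory that counts whole-frame reads and writes (illustrative only)."""

    def __init__(self):
        self.frames = {}        # location label -> frame bytes
        self.frame_reads = 0
        self.frame_writes = 0

    def read(self, location):
        self.frame_reads += 1
        return self.frames[location]

    def write(self, location, frame):
        self.frame_writes += 1
        self.frames[location] = bytes(frame)


def copy_method_30(source, source_location, destination, first_location, second_location):
    """Read the source once, then duplicate the frame inside the destination memory."""
    frame = source.read(source_location)       # transfer 20a
    destination.write(first_location, frame)
    frame = destination.read(first_location)   # transfer 32: added read from 16a
    destination.write(second_location, frame)


memory_14 = CountingMemory()
memory_18 = CountingMemory()
memory_14.frames["12"] = bytes(16)  # stand-in frame, placed directly so it is not counted

copy_method_30(memory_14, "12", memory_18, "16a", "16b")
print(memory_14.frame_reads, memory_18.frame_reads, memory_18.frame_writes)  # prints "1 1 2"
```

In this model the memory 14 sees a single frame read, while the memory 18 carries one frame read in addition to the two frame writes, which corresponds to the increased internal bandwidth noted above.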