The technology described herein relates to data processing systems, and in particular to the synchronization of the operation of hardware units in data processing systems.
In a data processing system, a “producer” processing unit may produce (generate) a data output that is then to be used (e.g. processed) by one or more other “consumer” processing units of the data processing system. An example of this would be in a multimedia subsystem where, for example, a video decoder may decode encoded video data representing a sequence of video frames to be displayed, with one or more other processing units, such as a graphics processing unit, then processing the decoded video frames in a desired manner, before those video frames are provided to a display for display.
FIG. 1 shows schematically an exemplary data processing system which includes a multimedia subsystem. As shown in FIG. 1, the data processing system 200 comprises a multimedia subsystem in the form of a system-on-chip (SOC) 202. The system generally also comprises off-chip (main) memory 216, a display device 218 and a video camera 220.
The multimedia subsystem SOC 202 comprises a central processing unit (CPU) 204, a graphics processing unit (GPU) 206, a video processor 208, a display controller (display processor) 210, an interconnect 212 and a memory controller 214.
As shown in FIG. 1, the CPU 204, GPU 206, video processor 208, and display controller 210 communicate with each other via the interconnect 212 and with the memory 216 via the interconnect 212 and the memory controller 214. The display controller 210 also communicates with the display device 218. The video camera 220 also communicates with the multimedia subsystem SOC 202 via the interconnect 212.
In a data processing system as shown in FIG. 1, the video processor 208 may, for example, be operable to decode encoded video data that has been stored in the memory 216, and to then store the decoded video data in the memory 216 for subsequent processing by, for example, the GPU 206. The GPU 206 may correspondingly store the processed video data in the memory 216 for subsequent use by the display controller 210 in providing it to the display device 218 for display. In this case therefore, the video processor 208 will be acting as a producer processing unit producing, e.g., frames of decoded video data for consumption by the GPU 206, with the GPU 206 correspondingly acting as a producer processing unit to provide processed video frames for consumption (use) by the display controller 210.
In arrangements such as that illustrated in FIG. 1, a “producer” processing unit will typically store the data that it is producing in an appropriate memory that is shared with (also accessible to) the consumer processing units that are to use the data, with the consumer processing units then reading the data from the memory for use.
An important aspect of such operation is to synchronize the reading of the data from the memory by the consumer processing units with the writing of the data to the memory by the producer processing unit. For example, the consumer processing units must be controlled to avoid trying to read data from memory before the data is stored in the memory by the producer processing unit.
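By way of illustration only, the read-after-write ordering requirement described above may be sketched as follows. This is a minimal sketch and not an implementation of any particular system: the names `SharedBuffer`, `producer`, `consumer` and `run_once` are hypothetical, and threads together with a `threading.Event` are assumed here merely to stand in for the processing units and for whatever synchronization mechanism is actually used.

```python
import threading


class SharedBuffer:
    """Hypothetical stand-in for a region of memory shared by the units."""

    def __init__(self):
        self.data = None
        self.ready = threading.Event()  # set once the data is valid


def producer(buf, frame):
    buf.data = frame   # write the data output to the shared memory first
    buf.ready.set()    # only then indicate that the data may be read


def consumer(buf, results):
    buf.ready.wait()   # do not read until the producer has finished writing
    results.append(buf.data)


def run_once(frame):
    buf = SharedBuffer()
    results = []
    c = threading.Thread(target=consumer, args=(buf, results))
    p = threading.Thread(target=producer, args=(buf, frame))
    c.start()          # the consumer starts first, but still has to wait
    p.start()
    p.join()
    c.join()
    return results
```

In this sketch the consumer is deliberately started before the producer, so that the wait on the "ready" indication is what guarantees the consumer reads valid data.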
The synchronization of such operation may be provided by use of software “fences” to synchronize the operation of the different processing units. Such software fences are usually controlled by, and operate under the control of, respective drivers for the processing units (that are running on a central processing unit of the overall data processing system).
For example, in the case of a multimedia subsystem as discussed above in relation to FIG. 1, when the video processor 208 finishes its decoding of a video frame and has stored the decoded video frame in the memory 216, it may signal an interrupt to the video processor driver executing on the CPU 204, with the video processor driver recognising that interrupt as indicating that the production of the video frame has been completed, and accordingly communicating that event to the driver of the consumer processing unit (e.g. the driver for the GPU 206) that is to use the decoded video frame. The driver for the consumer processing unit (e.g. the GPU 206) will receive that message and then trigger that processing unit to process (use) the decoded video frame that is now present in the memory 216.
Correspondingly, once the GPU 206 has finished processing the decoded video frame and has stored the processed video frame in the memory 216, it will signal an interrupt to the GPU driver on the CPU 204 indicating that the complete frame has been rendered, with the GPU driver then recognising that event and correspondingly signalling (e.g.) the driver for the display controller (display processor) 210, to cause the display controller driver to then trigger the display controller 210 to process the rendered frame from the GPU 206 for display.
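By way of illustration only, the driver-mediated hand-off described above (video processor, then GPU, then display controller) may be modelled as a simple chain of drivers. This is a sketch under stated assumptions: the `Driver` class and the `simulate_pipeline` function are hypothetical names, the interrupt is modelled as a direct method call, and the work of each hardware unit is reduced to an entry in an event list.

```python
class Driver:
    """Hypothetical model of a driver running on the CPU for one unit."""

    def __init__(self, unit, events, next_driver=None):
        self.unit = unit
        self.events = events
        self.next_driver = next_driver  # driver of the consumer unit, if any

    def trigger(self, frame):
        # The driver triggers its hardware unit to process the whole frame.
        self.events.append(f"{self.unit} hardware processes {frame}")
        if self.next_driver is not None:
            # On completion the hardware unit signals an interrupt
            # (modelled here as a plain method call).
            self.on_interrupt(frame)

    def on_interrupt(self, frame):
        # The driver recognises the interrupt as "frame complete" and
        # communicates that event to the consumer unit's driver.
        self.events.append(f"{self.unit} driver handles completion interrupt")
        self.next_driver.trigger(frame)


def simulate_pipeline(frame):
    events = []
    display = Driver("display controller", events)
    gpu = Driver("GPU", events, next_driver=display)
    video = Driver("video processor", events, next_driver=gpu)
    video.trigger(frame)
    return events
```

Calling `simulate_pipeline("frame 0")` yields five events in strict sequence, which reflects that each consumer unit in this model is only triggered after the preceding unit has completed the entire frame and the CPU-side drivers have exchanged the completion event.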
FIGS. 2 and 3 illustrate this operation.
FIG. 2 shows an exemplary multimedia subsystem stack corresponding to the data processing system and multimedia subsystem of FIG. 1.
As shown in FIG. 2, an application 30 executing, e.g., on the CPU 204 of the multimedia subsystem 202 will interact, via appropriate APIs 32 and corresponding drivers 33, with the appropriate multimedia subsystem hardware units 31 (comprising, e.g., the graphics processing unit 206, the video processor 208, the display processor 210 and the memory 216). As discussed above, as part of this operation, the communication and control of the hardware processing units 31 will be synchronized by means of software “fences” 34 that are enforced and implemented by the respective drivers 33 for the hardware units 31.
FIG. 3 illustrates this for the case of the graphics processor 206 drawing a frame that will then be used by the display processor 210 to display the frame on the display device 218.
As shown in FIG. 3, this operation will first comprise, after system boot up, initialisation of the graphics processor 206 (step 40), and correspondingly initialisation of the display processor 210 (step 41). The driver 35 for the graphics processor 206 will then prepare the appropriate commands and data for causing the graphics processor to draw the desired frame (step 42). As part of this operation, the GPU driver 35 will set a “fence” to identify and signal the completion of the frame (step 43).
Correspondingly, the display processor driver 36 will prepare the appropriate buffers 37 and wait on the “fence” generated by the GPU driver 35 (the display processor driver will not send the command to the display processor hardware until the “fence” on which it is waiting is signalled). The display processor driver may also set its own “fence” for synchronization if using the same buffer.
The graphics processor driver 35 will then issue the appropriate commands and data to the graphics processor hardware 206 which will then draw the frame (step 46) and write the frame into the appropriate buffer 47 in memory.
When the graphics processor hardware 206 finishes drawing the frame, it will signal an interrupt to the graphics processor driver (step 48). The graphics processor driver will accordingly signal the “completion” fence (step 49) to the Android synchronization service (step 50) (which controls the synchronization “fences”), which will then signal that the graphics processor hardware fence has been completed (step 51).
It will correspondingly be signalled to the display processor driver 36 that the graphics processor completion fence has been signalled, and in response to that, the display processor driver 36 will trigger the display processor hardware 210 to read the completed frame from the buffer 47 and display it on the display (steps 52 and 53).
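By way of illustration only, the fence-based sequence of FIG. 3 may be sketched as follows. This is a simplified, single-threaded model under stated assumptions: the `Fence` class and the `run_frame` function are hypothetical and do not correspond to any actual driver or operating system interface, and the role of the synchronization service is reduced to invoking the callbacks registered on the fence.

```python
class Fence:
    """Hypothetical software fence: waiters run once it is signalled."""

    def __init__(self):
        self.signalled = False
        self._waiters = []

    def wait(self, callback):
        # Register work to be released when the fence is signalled.
        if self.signalled:
            callback()
        else:
            self._waiters.append(callback)

    def signal(self):
        # Mark the fence as complete and release everything waiting on it.
        self.signalled = True
        waiters, self._waiters = self._waiters, []
        for callback in waiters:
            callback()


def run_frame():
    events = []
    buffer = {}

    # The GPU driver sets a completion fence for the frame (step 43).
    completion_fence = Fence()

    # The display processor driver waits on that fence; it does not
    # command the display hardware until the fence is signalled.
    def display_frame():
        events.append(f"display reads {buffer['frame']}")

    completion_fence.wait(display_frame)
    events.append("display driver waiting on fence")

    # The GPU hardware draws the frame into the buffer (step 46).
    buffer["frame"] = "frame 0"
    events.append("gpu wrote frame 0")

    # Interrupt, then the fence is signalled (steps 48 to 51), which
    # releases the display driver (steps 52 and 53).
    completion_fence.signal()
    return events
```

In this model the display read always appears after the GPU write in the returned event list, because the read is only released by signalling the completion fence.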
Arrangements of the type illustrated in FIGS. 1, 2 and 3 typically have relatively long latencies. For example, in the multimedia subsystem example described above, the display (consumer) processing unit will only access the completed data output from the graphics (producer) processing unit once the entire output (e.g. frame) has been completed. This will then lead to a latency of one or more frames (depending upon how many producing and consuming units are in the overall processing pipeline) between the initial generation of the, e.g. frame, and its display.
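By way of illustration only, the scale of this latency may be estimated with a simple model. The model is an assumption made here for the purposes of illustration and is not taken from any measurement: it supposes that each hand-off of a complete frame from a producer to a consumer adds one full frame period, and the function name `pipeline_latency_ms` is hypothetical.

```python
def pipeline_latency_ms(num_handoffs, frame_rate_hz):
    """Estimate latency, assuming one frame period per whole-frame hand-off."""
    frame_period_ms = 1000.0 / frame_rate_hz
    return num_handoffs * frame_period_ms
```

Under this assumption, a pipeline with two whole-frame hand-offs (e.g. video processor to GPU, and GPU to display controller) at a frame rate of 60 Hz would give `pipeline_latency_ms(2, 60)`, i.e. two frame periods, or approximately 33.3 ms.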
Such latency can be a problem, particularly in the case of lower powered, and mobile, devices. For example, longer latencies can degrade the user experience, especially in gaming and virtual reality (VR) use cases.
The Applicants accordingly believe that there remains scope for improved synchronization and handling of data outputs that are being shared between producing and consuming processing units in data processing systems.
Like reference numerals are used for like features in the drawings (where appropriate).