1. Field of the Invention
The present invention relates generally to an electronic system having a plurality of execution units requiring access to a shared resource, and more specifically to a method and apparatus for managing access to a resource that is shared by a plurality of execution units.
2. Description of the Prior Art
Semaphores are commonly used to enforce mutual exclusion rules for controlling access to shared resources in systems wherein a plurality of execution units, such as processing units and hardware engines, each require access to a shared resource. The shared resource is typically a memory space for storing information which may include a single bit of data, a byte, or a large data structure. The shared resources could also be the processing resources of a processing unit.
Computer operating systems (OS) commonly use semaphores in supervising software processes executed by a central processing unit (CPU), the processes functioning asynchronously and concurrently. Problems arise in an OS when concurrent processes attempt to use shared resources in an improperly synchronized manner. For example, it is generally necessary to prevent one process from writing to a shared resource while another process is reading from it. Semaphores generally serve as control flags for accessing the shared resource.
Another type of system for which semaphores are commonly used is a multiprocessing system which is a computer system having two or more processing units each executing an associated sequence of instructions, and each requiring access to a shared resource. A dedicated processing unit provides yet another example of a system for which semaphores have been used. A dedicated processing unit, such as a graphics processing unit, typically includes a plurality of execution units such as graphics engines, audio engines, video engines, and the like wherein groups of the execution units share resources.
FIG. 1A shows a block diagram generally illustrating a first example of a conventional semaphore based resource sharing system at 10, the system 10 including a plurality of processes 12 each requiring access to a shared resource 14. The shared resource 14, which is typically a portion of memory space, may be a system memory, a cache memory unit, a random access memory unit (RAM), a buffer, or a single register unit.
In the depicted example, a first process 12 designated PROCESS_1, and a second process 12 designated PROCESS_2 each require access to the shared resource 14. The processes 12 communicate with each other in accordance with a set of mutual exclusion rules in order to access the shared resource 14 asynchronously. For example, consider that while PROCESS_2 is writing data to the shared resource 14 as illustrated by a line 16, PROCESS_1 needs to read data as illustrated by a line 18. If PROCESS_2 begins writing before PROCESS_1 is done reading, PROCESS_1 is likely to receive corrupted data.
The general rule for accessing the shared resource 14 in the system 10 is that only one of the processes 12 may write to the shared resource 14 at a time. Also, no process may read from the shared resource 14 while another process is writing. In accordance with one well-known resource sharing method, a semaphore 20 is used to indicate ownership of the resource 14 at an instant of time. The semaphore 20 is typically implemented as a semaphore value stored in memory (not shown) that can be accessed by each of the processes 12. In the case of a computer system, the OS usually defines the rules for accessing the shared resources. Each type of OS may have a different set of rules for accessing a shared resource using a semaphore.
In the depicted example, PROCESS_2 may claim ownership of the shared resource 14 by updating the value of the associated semaphore 20 if the semaphore 20 indicates that no process currently owns the shared resource. Thereafter, PROCESS_2 may write to the shared resource. After PROCESS_2 is done writing, PROCESS_2 relinquishes ownership of the shared resource by updating the semaphore 20 with an appropriate value. PROCESS_1, which is operative to sample the semaphore as indicated by a line 24, may subsequently determine that it may claim ownership of the resource, and eventually writes a value into the semaphore 20 to claim ownership. Note that the requirement that a process sample the semaphore can be problematic because time and processing power may be wasted. For example, if one of the processes 12 is a CPU, the CPU will "spin" while repetitively reading the semaphore in order to determine changes in ownership status.
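The claim, release, and sampling sequence described above may be sketched as follows. This is a minimal single-threaded illustration only, not an implementation taken from any particular OS. The class name OwnershipSemaphore, the value FREE, and the numeric process identifiers are assumptions introduced for the example. In a real system the test and the update inside try_claim would have to be performed atomically.

```python
FREE = 0  # assumed semaphore value meaning "no process owns the resource"


class OwnershipSemaphore:
    """Hypothetical semaphore value held in memory visible to every process."""

    def __init__(self):
        self.value = FREE

    def try_claim(self, owner_id):
        # A claim succeeds only if no process currently owns the resource.
        if self.value == FREE:
            self.value = owner_id
            return True
        return False

    def release(self, owner_id):
        # Only the current owner may relinquish ownership.
        if self.value != owner_id:
            raise ValueError("only the current owner may release")
        self.value = FREE


PROCESS_1, PROCESS_2 = 1, 2  # assumed identifiers
sem = OwnershipSemaphore()

sem.try_claim(PROCESS_2)          # PROCESS_2 claims the resource and writes
wasted_polls = 0
for step in range(5):
    if step == 3:
        sem.release(PROCESS_2)    # PROCESS_2 is done writing
    if sem.try_claim(PROCESS_1):  # PROCESS_1 samples the semaphore
        break
    wasted_polls += 1             # PROCESS_1 "spins" without making progress
```

Each failed sample in the loop corresponds to time and processing power spent by PROCESS_1 without useful work being done.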
FIG. 1B shows a block diagram generally illustrating a second example of a conventional resource sharing system at 30. In this example, a third process 12 designated PROCESS_3, and a fourth process 12 designated PROCESS_4 both need to write data to the resource 14 at the same time. If PROCESS_3 and PROCESS_4 were to write data to the shared resource 14 at the same time, the result might be that the shared resource 14 would include corrupted data. Therefore, PROCESS_4 may not write to the shared resource 14 as indicated by a line 32 while PROCESS_3 is writing to the shared resource 14 as indicated by a line 34. Likewise, a fifth process designated PROCESS_5 may not read from the shared resource 14 as indicated by a line 36 while PROCESS_3 is writing to the shared resource 14.
FIG. 1C shows a block diagram generally illustrating a third example of a conventional system at 30 including a sixth process designated PROCESS_6 that is operative to write to the resource 14, a seventh process 12 designated PROCESS_7 operative to read from the resource, and an eighth process designated PROCESS_8 operative to read from the resource. PROCESS_7 and PROCESS_8 may read from the resource 14 concurrently as indicated by lines 42 and 44 because data is not modified during reading, but only one process may write to the resource 14 at a time. However, while either or both of PROCESS_7 and PROCESS_8 is reading from the resource, PROCESS_6 cannot write to the resource 14 as indicated by a line 46.
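The access rules illustrated by FIGS. 1B and 1C may be summarized by a small predicate. The sketch below is illustrative only; the function name may_access and its parameters are assumptions and do not appear in the figures.

```python
def may_access(kind, active_readers, active_writers):
    """Return True if a new access of the given kind may begin.

    kind is "read" or "write"; the two counts give the number of
    processes currently reading from and writing to the resource.
    """
    if kind == "read":
        # Reads may overlap each other, but never a write.
        return active_writers == 0
    if kind == "write":
        # A write requires exclusive use of the resource.
        return active_writers == 0 and active_readers == 0
    raise ValueError("kind must be 'read' or 'write'")
```

For example, may_access("read", 1, 0) is true (PROCESS_8 may read while PROCESS_7 reads), whereas may_access("write", 2, 0) is false (PROCESS_6 may not write while PROCESS_7 and PROCESS_8 read).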
Note that each shared resource must have a semaphore associated with it. If only one semaphore was to be used for two different shared resources, the method would fail. Also, note that any number of processes may share access to a resource. However, a semaphore associated with a particular shared resource must provide a range of values (or must have a number of bits) sufficient to provide a unique value associated with each process sharing access to the resource.
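The width requirement may be illustrated with a short calculation. The sketch assumes, in addition to one unique value per process, one further value reserved to indicate that the resource is unowned; that reservation and the function name semaphore_bits are assumptions made for the example.

```python
def semaphore_bits(num_processes):
    """Bits needed to encode one value per process plus an assumed
    'unowned' value, i.e. the values 0 through num_processes."""
    if num_processes < 1:
        raise ValueError("at least one process is required")
    # The largest value to be stored is num_processes itself.
    return num_processes.bit_length()
```

Under this assumption, three processes sharing a resource require a two bit semaphore, while four processes require three bits.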
FIG. 2A shows a block diagram generally illustrating a first conventional type of computer graphics system at 50 wherein a plurality of execution units share a resource. The system 50 includes: a CPU 52; a system memory 54 coupled with the CPU via a bus 56 which may be a system bus or a local bus; a graphics engine 58 coupled with the bus via a first channel 60; a disk controller 62 coupled with the bus via a second channel 64; and an audio engine 66 coupled with the bus via a third channel 68. Each of the channels 60, 64 and 68 may be controlled via a programmed input/output (programmed I/O). Data is transferred to engines 58, 62, and 66 in accordance with a method wherein the CPU 52 writes instructions and data to each of the channels 60, 64, and 68, and each of the engines reads instructions and data from the associated channel. The engines 58, 62, and 66 and the channels 60, 64 and 68 provide a parallel I/O sub-system at 70. Note that only one engine or process may access the system memory 54 via the bus at an instant of time.
Each of the channels 60, 64 and 68 is typically a first-in first-out memory device (FIFO) including a dual-ported memory (not shown) having a semaphore built into it, the dual-ported memory providing a circular buffer using a "get" pointer and a "put" pointer to indicate a starting point and an ending point. Note that two separate clock domains may be associated with each of the channels. Ownership of each of the channels is determined in accordance with a free count semaphore method wherein a free count value indicates a number of bytes available for the CPU 52 to write to. When the CPU 52 writes to one of the channels, the free count semaphore value is decreased. As an example, when the CPU writes to one of the channel units, the associated free count value may be decreased by four where four bytes are written at a time. Likewise, when one of the engines reads from the associated one of the channels, the free count semaphore value is incremented by four. Each channel can be accessed during a single cycle by the CPU 52 and by the associated one of the engines. However, two different processes may not access one location of the channel at the same time. As an example, consider that the first channel 60 includes memory space for storing 64 bytes. In this case, the free count semaphore value is a seven bit value, because it must represent each of the values zero through 64. If the free count value is zero, this indicates that the CPU 52 has filled all of the memory locations, and that all of the filled memory locations are now owned by the graphics engine 58. When the graphics engine reads a portion of the memory space of the channel, ownership of the memory space is provided back to the CPU and the engine increments the free count value.
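The free count semaphore method may be sketched as follows. The sketch is a software model only, assuming a channel size that is a multiple of four bytes and a transfer of four bytes per access. The class name FreeCountChannel and its method names are hypothetical.

```python
class FreeCountChannel:
    """Hypothetical model of a channel: a circular buffer whose free
    count serves as the ownership semaphore."""

    WORD = 4  # bytes moved per access, as in the example above

    def __init__(self, size=64):
        if size % self.WORD:
            raise ValueError("size must be a multiple of WORD")
        self.size = size
        self.mem = bytearray(size)
        self.put = 0            # next location written by the CPU
        self.get = 0            # next location read by the engine
        self.free_count = size  # bytes currently owned by the CPU

    def cpu_write(self, word):
        if len(word) != self.WORD:
            raise ValueError("word must be WORD bytes long")
        if self.free_count < self.WORD:
            return False        # channel full; the CPU must wait
        self.mem[self.put:self.put + self.WORD] = word
        self.put = (self.put + self.WORD) % self.size
        self.free_count -= self.WORD  # ownership passes to the engine
        return True

    def engine_read(self):
        if self.free_count == self.size:
            return None         # nothing has been written yet
        word = bytes(self.mem[self.get:self.get + self.WORD])
        self.get = (self.get + self.WORD) % self.size
        self.free_count += self.WORD  # ownership returns to the CPU
        return word
```

In this model, sixteen consecutive calls to cpu_write on a 64 byte channel bring the free count to zero, a seventeenth call is refused, and a single call to engine_read restores a free count of four.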
The CPU 52 operates under control of an OS, and each of the engines also operates in accordance with rules imposed by the OS. The OS uses notification schemes including semaphores to determine ownership of shared resources between a plurality of software processes that function asynchronously and concurrently on the CPU 52, and also between the processes and the engines. Particular groups of the software processes and the engines share associated resources. A semaphore value associated with a particular one of the shared resources may be stored any place in the system 50 that is accessible by each of the execution units sharing access to the particular resource. A semaphore may be stored in a cache location or in a notification data structure of the system memory 54.
Many different methods may be used to communicate notification information, indicating ownership status of a shared resource, between the CPU 52 and the engines 58, 62, and 66. As an example, an interrupt operation may be executed by one of the engines to notify the CPU that the CPU now owns an associated resource such as the memory unit 54. Also, semaphores may be used by the CPU and the engines to notify each other regarding ownership of shared resources. As an example, consider that the OS instructs the disk controller 62 to read a portion of data from a disk (not shown) to the system memory 54. In this case, a process executed by the CPU 52 that requires access to the system memory cannot be executed until the specified data has been successfully transferred from the disk to the system memory by the disk controller 62. Therefore, the OS must ensure that the process is "put to sleep" or the process must stall until the transfer of data from the disk to the system memory is successfully completed.
In accordance with one conventional notification scheme, when the OS instructs the disk controller 62 to transfer data from the disk to the system memory, the OS writes a first semaphore value to a status register located in the disk controller 62, the first semaphore value indicating that the disk controller 62 owns the system memory 54. When the disk controller 62 is done transferring all of the data to the system memory 54, the disk controller 62 must provide notification to the OS that the disk controller has relinquished ownership of the memory unit. A notification data structure stored in the system memory 54 includes a second semaphore value accessible by both the CPU 52 and the disk controller 62, the second semaphore value indicating the ownership status of the system memory. While the disk controller 62 is transferring data to the system memory 54, the second semaphore value indicates that the system memory is owned by the disk controller, and the CPU spins while sampling the second semaphore value to determine any change in the ownership status. When the disk controller 62 is done transferring all of the data to the system memory 54, the disk controller 62 writes a semaphore value to the notification data structure stored in the system memory to indicate that the OS now has ownership. A problem associated with this conventional resource sharing system is that the CPU 52 "spins" while sampling semaphores while waiting for ownership status of shared resources to change. A significant amount of time and processing power is wasted while the CPU is spinning.
One advantage associated with the I/O system 70 is that it allows for concurrent processing of instructions by the engines 58, 62, and 66. However, a disadvantage associated with the I/O system 70 is that the CPU 52 must provide a significant amount of processing power in order to ensure that the instructions provided to each one of the engines are executed in sequence. The CPU must orchestrate the execution of instructions by each of the engines. So, it is problematic that the CPU 52 must wait for engines to process a number of instructions before providing another one of the engines with additional instructions. As an example, the CPU 52 may execute instructions of an application that coordinates processing of data by different engines. For example, if audio and video data are to be processed by the audio engine 66 and graphics engine 58, it is necessary to synchronize the audio and video data. The audio data rate must be maintained at a constant rate in order to maintain sound quality, and therefore many frames of the video data may need to be deleted or dropped in order to make the video data rate match the audio data rate. Therefore, a programmer must be provided with flexibility in coordinating the processing of data and instructions by the engines.
As the system speed increases, the processing requirements for the CPU 52 increase because more data and instructions need to be pushed to the engines. If the graphics engine stalls waiting for the audio engine, the CPU 52 sets up audio to provide a notification at a certain point. Then the CPU could start loading data and instructions into the graphics engine 72 via the associated channel. So, the CPU 52 is spinning while the audio engine 66 is processing.
FIG. 2B shows a block diagram generally illustrating a second type of conventional computer graphics system at 80 including the CPU 52, system memory 54, bus 56, disk controller, audio engine, and graphics engine. Each of the engines 58, 62, and 66 is communicatively coupled with the CPU and system memory via an associated one of three direct memory access controllers (DMA controllers) 82, 84, and 86. Each of the DMA controllers is operative to read instructions and data from the system memory, and to provide the instructions and data to the associated one of the engines.
With reference to the parallel type I/O interface system 70 (FIG. 2A), the CPU 52 may transmit a first set of commands to the audio engine 66, a second set of commands to the graphics engine 58, and a third set of commands to the disk controller 62. Note that each of the engines may process instructions at a fast rate, and therefore, an engine may stall while waiting for additional instructions to be sent from the CPU. Note further, that if one of the engines is stalled, then the associated channel is stalled. In the parallel type I/O sub-system 70, each of the engines may process instructions concurrently because instructions and data may be provided to each of the engines in parallel.
FIG. 3A shows a block diagram generally illustrating a prior art graphics system at 100. The system 100 includes: the CPU 52; a frame buffer 101; a video engine 102 operative to read data from the frame buffer 101; and a display unit 104 coupled with the video engine 102 as shown by a line 106. The frame buffer 101 includes a first buffer 112 designated BUFFER_0 and a second buffer 114 designated BUFFER_1. A first notification unit 116 stores a notification associated with BUFFER_0, and a second notification unit 118 stores a second notification associated with BUFFER_1. The notification units 116 and 118 are used to determine ownership of buffers 112 and 114, respectively. The CPU 52 renders image data into buffers 112 and 114 as shown at 120, and the video engine 102 scans image data from buffers 112 and 114 after the CPU 52 has finished rendering data as shown at 122. The CPU 52 and the video engine 102 are both operative to access the notifications 116 and 118. Each of the first and second notifications may be implemented in any memory storage unit that is mutually accessible by the CPU and video engine.
While the CPU 52 is rendering data into BUFFER_0, the first notification 116 indicates that BUFFER_0 is owned by the CPU, and the video engine 102 may not access BUFFER_0. When the CPU 52 is done rendering to BUFFER_0, the CPU notifies the video engine that the video engine now owns BUFFER_0 by writing a command to the video engine 102 as indicated by the line 130. Consider now that the CPU 52 renders to BUFFER_1. During this time, the video engine 102 accesses BUFFER_0, but may not access BUFFER_1. After the CPU 52 has finished rendering to BUFFER_1, the CPU notifies the video engine that the video engine now owns BUFFER_1 by writing a command to the video engine as indicated by the line 130. After the video engine finishes accessing BUFFER_0, the video engine provides a notification to the CPU by writing an appropriate value to the semaphore 116. The video engine may also indicate to the CPU that the CPU may begin rendering data to BUFFER_0 by providing a bit in a register, which is readable by the CPU.
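The exchange of ownership of BUFFER_0 and BUFFER_1 may be sketched as follows. The sketch is a simplification: the command written by the CPU to the video engine and the notification written by the video engine are both modeled as updates to a single owner field per buffer, which is an assumption made for brevity. The class name DoubleBuffer and its method names are hypothetical.

```python
CPU, VIDEO = "CPU", "VIDEO"  # assumed owner labels


class DoubleBuffer:
    """Hypothetical model of a frame buffer with two buffers."""

    def __init__(self):
        # notification[i] records the current owner of BUFFER_i.
        self.notification = [CPU, CPU]
        self.data = [None, None]

    def cpu_render(self, i, frame):
        if self.notification[i] != CPU:
            return False              # the CPU must wait for the video engine
        self.data[i] = frame
        self.notification[i] = VIDEO  # BUFFER_i is handed to the video engine
        return True

    def video_scan(self, i):
        if self.notification[i] != VIDEO:
            return None               # the video engine may not access BUFFER_i
        frame = self.data[i]
        self.notification[i] = CPU    # BUFFER_i is returned to the CPU
        return frame
```

In this model, a second call to cpu_render for BUFFER_0 is refused until video_scan has been called for BUFFER_0, which corresponds to the interval during which the CPU would otherwise spin.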
FIG. 3B shows a block diagram illustrating a conventional graphics system at 140 wherein the CPU 52 provides instructions and associated parameters to the graphics engine 58 as illustrated by a line 142. Typically, the CPU 52 provides instructions and associated parameters to the graphics engine 58 via the bus 56 using programmed I/O (FIG. 2A) or via the bus 56 using the DMA 82 (FIG. 2B). As described above, the graphics engine 58 renders data into the buffers 112 and 114 of the frame buffer, as illustrated by lines 144 and 146 respectively. Note that the CPU can also render directly to the buffers 112 and 114 of the frame buffer as illustrated by lines 148 and 150 respectively, in the event that the CPU needs to modify certain pixels. The CPU 52 is further operative to control the video engine 102 as illustrated by a line 154.
As in the system 100 (FIG. 3A), NOTIFICATION_0 and NOTIFICATION_1 are used by the video engine 102 to notify the CPU regarding ownership status of BUFFER_0 and BUFFER_1, respectively. Likewise, a third notification unit 160 designated NOTIFICATION_2 is written by the graphics engine 58 and read by the CPU 52 in order to notify the CPU when the graphics engine no longer owns BUFFER_0, and a fourth notification unit 162 designated NOTIFICATION_3 is written by the graphics engine and read by the CPU in order to notify the CPU when the graphics engine no longer owns BUFFER_1.
Typically, the interface between the CPU 52 and the graphics engine 58 includes a first-in first-out (FIFO) buffer (not shown). Also, the interface between the CPU 52 and the video engine 102 typically includes a FIFO (not shown) for buffering instructions and associated parameters. Often, the graphics engine 58 and the video engine 102 are implemented on separate chips.
A problem arises because the CPU 52 cannot program, that is write commands to, the graphics engine 58 to start rendering data into BUFFER_0 until the video engine 102 is done scanning data out of BUFFER_0, and therefore the CPU 52 spins while sampling NOTIFICATION_0 and waiting for the video engine to relinquish ownership of BUFFER_0 back to the graphics engine 58. While the CPU 52 is spinning, the CPU cannot set up a next display list or perform any data transfers for the engines. Because the CPU must orchestrate the use of each of the buffers 112 and 114 by the engines 58 and 102 (as well as any additional engines including the CPU), the CPU must spin while waiting for each engine to finish accessing the buffers, and an excess amount of time and processing power of the CPU is wasted. This problem increases as the lengths of the FIFOs (not shown) between the CPU 52 and the graphics engine 58, and between the CPU 52 and the video engine 102, increase.
What is needed is an apparatus and method for managing ownership of shared resources wherein each of the execution units sharing ownership of the resource may perform efficiently, and wherein it is not incumbent on a CPU to spend an excess amount of time and processing power orchestrating usage of the shared resources by the execution units.
It is an object of the present invention to provide an apparatus and method for coordinating accessing of at least one shared resource by a plurality of execution units wherein the execution units may perform concurrently and efficiently.
It is also an object of the present invention to provide coordinated access to a shared resource by a plurality of execution units wherein it is not incumbent on a CPU to spend an excess amount of time and processing power coordinating the accessing of the shared resource.
Briefly, a presently preferred embodiment of the present invention includes a shared resource management system providing coordinated accessing of at least one shared resource by a plurality of execution units. The system includes a memory access control unit operative to access a sequence of instructions stored in a memory unit, the sequence of instructions including a plurality of execution instructions and a plurality of synchronization commands interspersed between associated execution instructions. The system also includes: a first execution unit communicatively coupled with the memory access control unit for receiving associated execution instructions from the memory access control unit via a first channel; and a second execution unit communicatively coupled with the memory access control unit for receiving associated execution instructions from the memory access control unit via a second channel, the first and second execution units being capable of accessing at least one shared resource. The memory access control unit is responsive to the synchronization commands and operative to access at least one semaphore value stored in a semaphore storage location, the semaphore value being associated with a shared resource and indicating an ownership status for the shared resource. The control unit is operative to manage the flow of the execution instructions to the first and second execution units via the first and second channels in order to cause the execution units to cooperate in their accessing of the shared resource.
The synchronization commands include: an acquire command having an associated acquire value, the acquire command indicating that the associated execution unit may acquire ownership of a shared resource upon a determination that the associated acquire value has a predetermined relationship with an associated semaphore value; and a release command having an associated release value indicating that the associated execution unit is to relinquish ownership of the shared resource after the associated execution unit is done accessing a shared resource.
The memory access control unit is operative to perform a shared resource ownership acquisition process in response to an acquire command. The acquire process includes the steps of: determining whether the received acquire value has a predetermined relationship with an associated current semaphore value; and if the received acquire value has a predetermined relationship with the associated current semaphore value, providing a portion of the execution instructions associated with the received acquire command to the associated execution unit via the associated channel. The memory access control unit is also operative to perform a shared resource ownership release process in response to a release command. The release process includes the steps of: determining whether the associated execution unit is done processing a portion of the associated execution instructions associated with a previously received acquire command; and if the execution unit is done, writing the associated release value to an associated semaphore storage location.
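The acquire and release processes may be sketched in software as follows. This is an illustrative model only and not the claimed apparatus. It assumes that the predetermined relationship is equality, that a single semaphore is used, and that each execution instruction completes immediately, so that the release condition is always satisfied. The function name run and the tuple encoding of commands are assumptions made for the example.

```python
from collections import deque


def run(channels, semaphore=0, max_rounds=100):
    """Model a control unit feeding several channels.

    channels maps a channel name to a list of (op, arg) commands, where
    op is "acquire", "exec", or "release". Returns the order in which
    execution instructions were issued and the final semaphore value.
    """
    queues = {name: deque(cmds) for name, cmds in channels.items()}
    log = []
    for _ in range(max_rounds):       # bound the model in case of deadlock
        if not any(queues.values()):
            break                     # every channel has been drained
        for name, q in queues.items():
            if not q:
                continue
            op, arg = q[0]
            if op == "acquire":
                if arg != semaphore:
                    continue          # this channel stalls; others proceed
            elif op == "exec":
                log.append((name, arg))
            elif op == "release":
                semaphore = arg       # write the release value
            else:
                raise ValueError("unknown command: " + op)
            q.popleft()
    return log, semaphore


order, final = run({
    "video": [("acquire", 1), ("exec", "scan"), ("release", 0)],
    "graphics": [("acquire", 0), ("exec", "render"), ("release", 1)],
})
```

In this example the video channel stalls at its acquire command until the graphics channel writes the release value 1, so that the rendering instruction is issued before the scanning instruction without any sampling of the semaphore by a CPU.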
The memory access control unit includes: a register for storing the acquire values and the release values; and a control logic unit coupled with the register. In one embodiment, the first and second execution units are each capable of accessing a plurality of shared resources, each of the resources having an associated semaphore. In this embodiment, each of the synchronization commands has an associated pointer value indicating an associated semaphore, and the memory access control unit includes a third register for storing the pointer values.
An important advantage of the shared resource management system of the present invention is that it is not incumbent on a central processing unit to coordinate accessing of the shared resources by sampling semaphore values and spinning while the execution units finish executing instructions.
Another important advantage of the shared resource management system of the present invention is that ownership of a shared resource is transferred between the execution units quickly, and the execution units may process instructions efficiently.
The foregoing and other objects, features, and advantages of the present invention will be apparent from the following detailed description of the preferred embodiment, which makes reference to the several figures of the drawing.