The present invention relates to the field of Symmetric Multi-Processing (SMP) systems, and more particularly to an SMP system where attached processing units have restricted access to a shared memory without being structurally configured with an address translation mechanism.
One widely accepted system architecture for personal computers has been the Symmetric Multi-Processing (SMP) architecture. SMP computer architectures are known in the art to overcome the limitations of single processors (uniprocessors) in terms of processing speed and transaction throughput, among other things. Commercially available SMP systems are generally "shared memory" systems, characterized in that multiple processing elements on a bus, or on a plurality of buses, share a single global memory. In shared memory multiprocessors, all memory is uniformly accessible to each processing element, which simplifies the task of dynamic load distribution. Processing of complex tasks can be distributed among the various processing elements in the multiprocessor system, while the data used in the processing is substantially equally available to each processing element undertaking any portion of the complex task. Similarly, programmers writing code for typical shared memory SMP systems need not be concerned with data partitioning, as each of the processing elements has access to and shares the same, consistent global memory.
SMP systems typically run multiple processes or threads at a time where each process requires some amount of physical memory, i.e., a block of physical memory, in the shared memory. Since the amount of physical memory in the shared memory is limited, it must be allocated among the different processing elements. Typically, physical memory may be divided into pages where the pages are allocated to different processing elements. Physical memory that is so allocated may be referred to as mapped memory.
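As an illustration only, the division of physical memory into pages that are allocated among processing elements can be sketched in C as a per-page ownership table. The frame count, the first-fit policy, and the names frame_owner, frames_init and frames_alloc are assumptions made for this sketch, not part of the described system.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_FRAMES 16u   /* assumed size of the shared memory, in pages */
#define FREE       (-1)  /* marks an unallocated (unmapped) physical page */

/* Hypothetical table: frame_owner[i] is the processing element that owns
   physical page i, or FREE if the page is not yet allocated. */
static int frame_owner[NUM_FRAMES];

static void frames_init(void) {
    for (size_t i = 0; i < NUM_FRAMES; i++)
        frame_owner[i] = FREE;
}

/* First-fit allocation of n physical pages to processing element `owner`.
   Writes the allocated page numbers to out[] and returns n; if fewer than
   n pages are free, undoes the partial allocation and returns 0. */
static size_t frames_alloc(int owner, size_t n, size_t out[]) {
    size_t got = 0;
    for (size_t i = 0; i < NUM_FRAMES && got < n; i++) {
        if (frame_owner[i] == FREE) {
            frame_owner[i] = owner;
            out[got++] = i;
        }
    }
    if (got < n) {                       /* shortage: roll back */
        for (size_t j = 0; j < got; j++)
            frame_owner[out[j]] = FREE;
        return 0;
    }
    return got;
}
```

Pages need not be contiguous; any free page can back any part of a process's block, which is what makes the per-process translations described next necessary.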
Each process that may be allocated a block of physical memory may further be provided with a set of translations for translating virtual addresses to assigned physical addresses of the allocated block. Each set of translations may be stored in what is commonly referred to as a page table. Page tables are typically stored in the shared memory.
Page tables are commonly indexed by virtual page numbers and include a Page Table Entry (PTE) for each virtual page address. If a virtual page is stored in the shared memory, then a corresponding PTE may include a physical address of the page. The PTE for a page may be identified by looking at an index that corresponds to the virtual page address.
When a process requests access to a particular virtual memory address, a page table that is associated with the process is searched for the requested virtual memory address. When the virtual address is found, the process may access the desired page using the physical address in the PTE that is associated with the virtual address.
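The page table organization and lookup described above can be sketched as a single-level table indexed by virtual page number. The 4 KiB page size, the table size, and the names pte_t, page_table_t and pt_translate are assumptions chosen for illustration; practical page tables are often multi-level or hashed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                        /* assumed 4 KiB pages */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1u)  /* offset bits within a page */
#define NUM_VPAGES 256u                       /* assumed virtual pages per process */

typedef struct {
    bool     valid;  /* virtual page is resident in the shared memory */
    uint32_t pfn;    /* physical page (frame) number */
} pte_t;

typedef struct {
    pte_t entries[NUM_VPAGES];  /* one PTE per virtual page, indexed by VPN */
} page_table_t;

/* Translate a virtual address to a physical address. Returns false
   (a page fault) if the virtual page is out of range or not mapped. */
static bool pt_translate(const page_table_t *pt, uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpn = vaddr >> PAGE_SHIFT;       /* index into the page table */
    if (vpn >= NUM_VPAGES || !pt->entries[vpn].valid)
        return false;
    /* Physical page number from the PTE, page offset from the virtual address. */
    *paddr = (pt->entries[vpn].pfn << PAGE_SHIFT) | (vaddr & PAGE_MASK);
    return true;
}
```

For example, with virtual page 3 mapped to physical page 0x2A, the virtual address 0x3123 translates to the physical address 0x2A123.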
Each processing element in the SMP computer architecture may comprise a processing unit. The processing unit may comprise a central processing unit, e.g., a PowerPC™ processor, and an address translation mechanism such as a Translation Lookaside Buffer (TLB). A TLB may be used to store a number of the most recently used virtual-to-physical address translations, i.e., PTE translations. When a processing unit retrieves a translation from a PTE in the shared memory, it typically stores the translation in its associated TLB. The processing unit may retrieve a translation from the TLB faster than from an associated cache or from the shared memory.
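The caching role of a TLB can be sketched as a small fully associative cache of virtual-to-physical page translations, with least-recently-used replacement and a fallback to the page table on a miss. The TLB size, the LRU policy, and the stand-in page_table_walk function are hypothetical; actual TLB organizations vary by processor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_SIZE 4u   /* assumed number of TLB entries */

typedef struct {
    bool     valid;
    uint32_t vpn, pfn;   /* cached translation: virtual page -> physical page */
    uint64_t last_use;   /* timestamp for least-recently-used replacement */
} tlb_entry_t;

typedef struct {
    tlb_entry_t e[TLB_SIZE];
    uint64_t    clock;          /* advanced on every lookup */
    unsigned    hits, misses;
} tlb_t;

/* Hypothetical slow path standing in for a fetch of the PTE from the page
   table in shared memory; here the "page table" is simply pfn = vpn + 100. */
static uint32_t page_table_walk(uint32_t vpn) { return vpn + 100u; }

static uint32_t tlb_lookup(tlb_t *t, uint32_t vpn) {
    t->clock++;
    unsigned victim = 0;
    for (unsigned i = 0; i < TLB_SIZE; i++) {
        if (t->e[i].valid && t->e[i].vpn == vpn) {   /* fast path: TLB hit */
            t->e[i].last_use = t->clock;
            t->hits++;
            return t->e[i].pfn;
        }
        /* Choose a replacement victim: any invalid slot, else the LRU entry. */
        if (!t->e[i].valid ||
            (t->e[victim].valid && t->e[i].last_use < t->e[victim].last_use))
            victim = i;
    }
    t->misses++;
    uint32_t pfn = page_table_walk(vpn);            /* slow path: TLB miss */
    t->e[victim] = (tlb_entry_t){ true, vpn, pfn, t->clock };
    return pfn;
}
```

Repeated accesses to the same page hit in the TLB and avoid the page table walk; once the TLB is full, a miss evicts the translation that has gone unused the longest.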
Each processing element in the SMP computer architecture may further comprise a plurality of Attached Processing Units (APUs). In prior art SMP architectures, each APU may be structured to perform a particular task assigned by the processing unit, e.g., image compression, image decompression, transformation, clipping, lighting, texturing, depth cueing, transparency processing, set-up, or screen space rendering of graphics primitives. That is, an APU may be configured to perform a particular operation, e.g., a floating point calculation or a vector calculation. For example, an APU may be a floating point unit configured to execute floating point operations on source operands. One advantage of an APU structurally configured to perform a particular operation is that it does not have to perform address translation, i.e., the mapping of virtual addresses to physical addresses. Since they do not perform address translation, APUs need not be structurally configured with an address translation mechanism, e.g., a TLB, which reduces their complexity.
Unfortunately, APUs in prior art SMP computer architectures cannot access the shared memory because they are not structurally configured with an address translation mechanism, e.g., a TLB.
It would therefore be desirable to develop an SMP computer architecture where the APUs have restricted access to the shared memory without being structurally configured with an address translation mechanism. It would further be desirable to develop an SMP computer architecture where the APUs have more capabilities than prior art APUs, which are structured to perform a particular task. It would further be desirable to develop an SMP system where Translation Lookaside Buffer (TLB) consistency may be maintained by the processing units only.
The problems outlined above may at least in part be solved in some embodiments by an SMP system comprising direct memory access controllers with an address translation mechanism, e.g., a Translation Lookaside Buffer (TLB). Attached processing units may then be configured to issue requests to access the shared memory to their associated direct memory access controllers. Since the direct memory access controllers comprise an address translation mechanism, attached processing units may request access to the shared memory by specifying the range of addresses to be accessed as virtual addresses instead of physical addresses, thereby forgoing the need for an address translation mechanism in the attached processing units.
In one embodiment, a system comprises a shared memory. The system further comprises a plurality of processing elements coupled to the shared memory. Each of the plurality of processing elements comprises a processing unit, a direct memory access controller and a plurality of attached processing units. Each processing unit comprises an address translation mechanism. Each direct memory access controller comprises an address translation mechanism thereby enabling each of the plurality of attached processing units to access the shared memory in a restricted manner without an address translation mechanism. Each of the plurality of attached processing units is configured to issue a request to an associated direct memory access controller to access the shared memory where the request specifies a range of addresses to be accessed as virtual addresses. The associated direct memory access controller is configured to translate the range of virtual addresses to be accessed into an associated range of physical addresses.
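The translation performed by a direct memory access controller on behalf of an attached processing unit can be sketched as follows: the APU supplies a range of virtual addresses, and the controller splits it at page boundaries and converts each piece into a physical segment, rejecting the request if any page is unmapped. The names dma_request_t, phys_segment_t, dma_translate and dma_xlate, the page size, and the reject-on-unmapped policy are assumptions for illustration; the flat dma_xlate array stands in for the controller's TLB and the page table behind it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                    /* assumed 4 KiB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NUM_VPAGES 64u                    /* assumed virtual pages per process */
#define NO_MAPPING UINT32_MAX

/* Hypothetical translations available to the DMA controller: vpn -> pfn. */
static uint32_t dma_xlate[NUM_VPAGES];

typedef struct { uint32_t vaddr, len; } dma_request_t;   /* issued by an APU */
typedef struct { uint32_t paddr, len; } phys_segment_t;  /* produced by the DMAC */

static void dma_xlate_init(void) {
    for (size_t i = 0; i < NUM_VPAGES; i++)
        dma_xlate[i] = NO_MAPPING;
}

/* Translate a virtual range into physical segments, one per page touched
   (physically adjacent pages are not merged, for simplicity). Returns the
   segment count, or -1 if a page is unmapped or seg[] is too small. */
static int dma_translate(dma_request_t req, phys_segment_t seg[], size_t max) {
    size_t n = 0;
    uint32_t va = req.vaddr, left = req.len;
    while (left > 0) {
        uint32_t vpn = va >> PAGE_SHIFT;
        if (vpn >= NUM_VPAGES || dma_xlate[vpn] == NO_MAPPING || n == max)
            return -1;                        /* request rejected in this sketch */
        uint32_t off   = va & (PAGE_SIZE - 1u);
        uint32_t chunk = PAGE_SIZE - off;     /* bytes left in this page */
        if (chunk > left)
            chunk = left;
        seg[n].paddr = (dma_xlate[vpn] << PAGE_SHIFT) | off;
        seg[n].len   = chunk;
        n++;
        va   += chunk;
        left -= chunk;
    }
    return (int)n;
}
```

Because consecutive virtual pages may map to unrelated physical pages, a request that crosses a page boundary generally yields more than one physical segment; the APU itself only ever handles virtual addresses.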
In another embodiment of the present invention, a method is provided for maintaining TLB consistency in a system comprising a shared memory and a plurality of processing elements coupled to the shared memory, where each of the plurality of processing elements comprises a processing unit, a direct memory access controller and a plurality of attached processing units. Each of the plurality of processing units and each of the plurality of direct memory access controllers comprises a TLB. The method comprises the step of invalidating a copy of a page table entry that was updated in a particular TLB by a particular processing unit. The method further comprises issuing a TLB invalidated entry instruction by the particular processing unit. The TLB invalidated entry instruction may be broadcast by the particular processing unit to each of the plurality of processing units other than the particular processing unit. The method further comprises determining whether to invalidate any entries in the TLBs associated with each of the plurality of processing units other than the particular processing unit, and in the TLBs associated with each of the plurality of direct memory access controllers other than the direct memory access controller associated with the particular processing unit. The method further comprises issuing, by the particular processing unit, a synchronization instruction to each of the plurality of processing units other than the particular processing unit.
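The steps of the TLB consistency method can be simulated sequentially as a rough sketch. The broadcast and synchronization instructions are modeled here as ordinary function calls and an acknowledgment flag, and the names element_t, tlb_invalidate and update_pte_and_shootdown are hypothetical. Following the steps as listed, the sketch invalidates the issuer's own copy and then only the other elements' processing unit and DMA controller TLBs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_ELEMENTS 4u   /* assumed number of processing elements */
#define TLB_SIZE     8u   /* assumed entries per TLB */

typedef struct { bool valid; uint32_t vpn, pfn; } tlb_entry_t;
typedef struct { tlb_entry_t e[TLB_SIZE]; } tlb_t;

/* Each processing element: a processing unit TLB and a DMA controller TLB. */
typedef struct {
    tlb_t pu_tlb, dma_tlb;
    bool  acked;   /* finished handling the current invalidation? */
} element_t;

static element_t elem[NUM_ELEMENTS];

/* Invalidate any copy of vpn in one TLB; returns true if a copy was found. */
static bool tlb_invalidate(tlb_t *t, uint32_t vpn) {
    bool found = false;
    for (unsigned i = 0; i < TLB_SIZE; i++) {
        if (t->e[i].valid && t->e[i].vpn == vpn) {
            t->e[i].valid = false;
            found = true;
        }
    }
    return found;
}

/* Sequential simulation of the listed steps after `issuer` updates the PTE
   for vpn. Returns how many stale copies were removed from other elements. */
static unsigned update_pte_and_shootdown(unsigned issuer, uint32_t vpn) {
    unsigned removed = 0;
    /* Step 1: the issuing processing unit invalidates its own copy. */
    tlb_invalidate(&elem[issuer].pu_tlb, vpn);
    /* Steps 2-3: the invalidate-entry instruction reaches every other element,
       which checks its processing unit TLB and DMA controller TLB. */
    for (unsigned p = 0; p < NUM_ELEMENTS; p++) {
        if (p == issuer)
            continue;
        removed += tlb_invalidate(&elem[p].pu_tlb, vpn);
        removed += tlb_invalidate(&elem[p].dma_tlb, vpn);
        elem[p].acked = true;
    }
    /* Step 4: synchronization; check that every other element acknowledged. */
    for (unsigned p = 0; p < NUM_ELEMENTS; p++)
        assert(p == issuer || elem[p].acked);
    for (unsigned p = 0; p < NUM_ELEMENTS; p++)
        elem[p].acked = false;   /* reset for the next update */
    return removed;
}
```

In this model only processing units originate invalidations, which mirrors the stated goal that TLB consistency be maintained by the processing units alone; the DMA controllers' TLBs are only ever the targets of invalidation.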
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention.