Many data processing systems, such as computers, employ memory access techniques that map virtual address space to physical address space. For example, personal computers and other data processing devices have a fixed amount of physical address space, typically considered the host processor's physical memory address space. As processes such as software applications and peripheral devices, including graphics processors and modems, are used, the operating system typically provides fragmented, physically non-contiguous memory allocation for each process. Each process, however, is given virtual address spaces that are contiguous. The host processor typically includes a mapping function to map virtual address space into physical address space, defining a certain "view" of the memory subsystem as seen by the process. As part of this operation, the operating system manages page tables for free memory. Each process, such as a software application, may get its own map of memory and hence its own page tables. Each page table entry may, for example, specify the mapping for a 4 kilobyte block of physical memory. Cooperating processes may share page tables, while conflicting or competing processes may require exclusive use of unique page tables.
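The mapping just described can be sketched with a minimal single-level model. The 4 kilobyte page size comes from the text; the dictionary-based table and the frame numbers are illustrative assumptions, not any particular system's layout:

```python
PAGE_SIZE = 4 * 1024  # 4 KB pages, as described above

# Hypothetical per-process page table: virtual page number -> physical frame number.
# Physically non-contiguous frames back a contiguous virtual address range.
page_table = {0: 7, 1: 3, 2: 42}

def translate(virtual_addr):
    """Map a virtual address to a physical address via the page table."""
    vpn, offset = divmod(virtual_addr, PAGE_SIZE)
    frame = page_table[vpn]  # an unmapped page raises KeyError (a "page fault")
    return frame * PAGE_SIZE + offset
```

For example, `translate(4096 + 10)` falls in virtual page 1, which the table maps to frame 3, yielding physical address `3 * 4096 + 10`.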
As such, multiprocess environments may use hierarchical page tables. Operating systems, for example, use hierarchical page tables as an indexing system, typically with two levels of tables: a first level containing a page table directory and a second level containing the page tables indexed in the directory. A host processor may subdivide physical memory into 4 kilobyte logical pages, and a memory management unit logically composes a 4 gigabyte contiguous linear address space from a sequence of physically non-adjacent pages. The page table directory holds a directory of pointers to all page tables for a particular process or application. Each page table may hold 1,024 entries of 32 bits each, for example. Each entry in the page table directory corresponds to a page table, and each page table may in turn map 4 megabytes (1,024 pages of 4 kilobytes each) of noncontiguous physical memory. Because the page tables are indexed in a particular sequence, however, a processor or application sees the information as a contiguous memory space. As such, each entry in the page table directory points to one table, and each entry in a page table points to one page; the pages are not necessarily contiguous in physical address space. Such systems typically require translation from virtual to physical address space for each page by using page table directories and/or page tables.
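With 1,024-entry directories and tables and 4 kilobyte pages, a 32-bit virtual address decomposes into a 10-bit directory index, a 10-bit table index, and a 12-bit page offset. The two-level walk can be sketched as follows (the nested dictionaries stand in for in-memory directory and table structures and are illustrative assumptions):

```python
PAGE_SIZE = 1 << 12   # 4 KB pages
ENTRIES = 1 << 10     # 1,024 entries per directory and per page table

def split(vaddr):
    """Split a 32-bit virtual address into (directory index, table index, offset)."""
    offset = vaddr & (PAGE_SIZE - 1)
    table_idx = (vaddr >> 12) & (ENTRIES - 1)
    dir_idx = (vaddr >> 22) & (ENTRIES - 1)
    return dir_idx, table_idx, offset

def translate(vaddr, page_directory):
    """Two-level walk: directory entry -> page table -> physical frame."""
    d, t, off = split(vaddr)
    page_table = page_directory[d]  # each directory entry points to one page table
    frame = page_table[t]           # each page table entry points to one 4 KB page
    return frame * PAGE_SIZE + off
```

For instance, a directory `{1: {2: 99}}` maps the virtual address with directory index 1, table index 2, and offset 5 to physical address `99 * 4096 + 5`.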
Many peripheral devices typically do not provide addressing space equal to that of the host processor or main processor. This can become problematic when newer versions of the peripheral device or software application are developed, since an entirely new address management scheme may have to be implemented to accommodate the increased addressing space.
When such peripheral applications or devices, such as other non-host applications or non-operating system processes, issue real time requests for data from memory, the memory allocation technique must be fast enough, and have high enough bandwidth, to accommodate those requests. Real time requests may include, for example, real time video or audio requests from a display engine or audio engine.
Virtual memory based systems and page tables also typically utilize translation look aside buffers (TLBs), which are typically dedicated to each process or processor in a multiprocessor system. These buffers speed up data retrieval by avoiding redundant lookups of page table entries. A translation look aside buffer contains, for example, a directory entry and corresponding page table entries. However, translation look aside buffers must also be suitably updated so that the proper virtual address mapping is retrieved at any given time.
With multiple processors each maintaining a copy of a primary page table, it is difficult to keep the page tables coherent, since each processor has its own lookaside buffer serving as a cached version of the page lookup tables. These translation look aside buffers can get out of sync with the actual page table when the page table is updated. When multiple processors share the same page table, they access the same physical address space, so synchronization is needed to avoid conflicts. As a result, sophisticated synchronization techniques have to be used, which require additional processing power and can slow down the system.
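The coherence problem can be made concrete with a small sketch. The dictionaries stand in for a shared page table and two per-processor buffers, and the invalidation step shown is a generic illustration of the kind of synchronization the text refers to, not any particular system's protocol:

```python
page_table = {0: 7}   # shared primary page table: virtual page 0 -> frame 7
tlb_a = {0: 7}        # processor A's cached copy of the entry
tlb_b = {0: 7}        # processor B's cached copy of the same entry

# Processor A remaps virtual page 0 and updates only its own buffer...
page_table[0] = 9
tlb_a[0] = 9

# ...so processor B's cached entry is now stale: it still yields frame 7.
stale = tlb_b[0] != page_table[0]

# Keeping the buffers coherent requires forcing processor B to drop the
# stale entry (an invalidation), at the cost of interrupting its work.
tlb_b.pop(0, None)
```

Until the stale entry is invalidated, processor B would read or write the wrong physical frame, which is why the synchronization described above cannot simply be omitted.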
A major source of inefficiency results from one processor waiting for another processor to reach a synchronization point. Synchronization points guarantee that synchronized processes share the same view of memory at any point in time because they use equivalent page tables. The dynamic behavior of processes often results in changing views of memory, requiring dynamic maintenance of page tables. Because each process may progress at a different rate, processes may need the same view at different times, or equivalent views at the same time.
In other systems, each processor may store its own page table, but the multiple processors may each require different versions of the page table at different times, depending upon where they are in their particular processes. For example, with a host processor and a graphics accelerator processor, the graphics accelerator typically does not need the full page table of the system to operate and requires only a subset thereof. However, the host processor typically updates the graphics controller's page table only after the graphics controller has finished performing its operations relating to the current page table. The update of the host processor page table, and of the subset page table for the other processor, is typically done only when the other processor is idle. The host processor will also update its own translation look aside buffer, but typically does not update the translation look aside buffer of any other processor. Each processor typically has to update its own translation look aside buffer, and must also wait until the host processor updates the main page table, to avoid errors due to premature table updates.
Because one processing unit and another processing unit may process tasks at different speeds, and since the host processor is typically required to update all page tables, the host processor must wait until other processors have completed their tasks. This can be quite inefficient when the processors are performing functions for one another. For example, in a graphics accelerator system, the host processor may generate textures and store them within a new virtual memory space, but the graphics controller may not have finished using the current page tables, so the host processor has to wait for the graphics controller to become idle. Only once the host processor knows that the other processor is idle will it update the main page table and any page table for the graphics controller. Such a synchronization barrier slows down the system, since the host processor can be idle with respect to the graphics controller, thereby reducing concurrency between the processors.
A problem also exists with multiple processors using a common virtual memory space, since the host processor or other page table updater needs to notify the other processors immediately so that they can update their translation look aside buffers. The page table updater also needs to wait until all processors have completed their respective processes before the common page table can be updated. This too creates a synchronization barrier, since some processors may be idle waiting for others to finish using the page table.
Consequently, there exists a need for an improved page table update method and apparatus to facilitate efficient page table updates among a plurality of processing units or processes.