A reference in this specification to a published document is not to be taken as an admission that the contents of that document are part of the common general knowledge of the skilled addressee of the present specification. Examples of memory management architectures are disclosed in [1], [2], and [3]. The technical terms employed to describe the architecture of various memory protection unit (MPU) and memory management unit (MMU) technologies sometimes have conflicting definitions. Throughout this specification, including the claims:
‘Comprises’ and ‘comprising’ are used to specify the presence of stated features, integers, steps or components but do not preclude the presence or addition of one or more other features, integers, steps or components.
A memory store (e.g. 118 of FIG. 1) coupled with a memory controller (e.g. 115 of FIG. 1) may be described at a higher level of abstraction as a memory store.
A peripheral may (e.g. network controller 105 of FIG. 1) or may not (e.g. cryptographic accelerator module) have external I/O pins. A peripheral comprises at least one interconnect interface, in which each interconnect interface is either an interconnect-master or interconnect-target port.
A bus is a type of interconnect. A crossbar is a type of interconnect.
A memory-to-memory direct memory access (M2M DMA) unit (e.g. 140 of FIG. 1) is a programmable hardware circuit specifically optimised for issuing memory transfer requests over one or more interconnect-master ports (e.g. 143 and 144 of FIG. 1) for the purpose of reading the value of memory stored in one memory location and writing that value to a different memory location. An M2M DMA unit is a slave device subject to control by a different master device (e.g. 110 or 194 of FIG. 1). A well-known example of this type of M2M DMA unit is the Intel 8237A.
An M2M DMA unit may also offer additional memory movement related capabilities, such as reading contiguous memory locations from a memory store and writing each word of data to the same address of a memory mapped peripheral. An M2M DMA unit can be described as a programmable direct memory access (PDMA) unit.
A memory protection unit (MPU) receives a memory transfer request associated with an input address space and in response generates a memory transfer request associated with an output address space. An MPU is characterised in that (a) access controls may be applied to one or more regions of the input address space; and (b) the MPU always employs an identity transformation between the address of a memory transfer request in the input address space and the address of the corresponding memory transfer request in the output address space. Some MPU architectures are explicitly designed to support the mapping of two or more region descriptors to the same contiguous region of the input address space at run-time.
A memory management unit (MMU) receives a memory transfer request associated with an input address space and in response generates a corresponding memory transfer request associated with an output address space. An MMU is characterised in that (a) access controls may be applied to one or more regions of the input address space; and (b) the MMU is adapted to translate between the address of a memory transfer request associated with the input address space and the address of the corresponding memory transfer request in the output address space.
A well-formed memory transfer request is any memory transfer request that correctly satisfies the associated interconnect protocol requirements for a memory transfer request. The reception of a well-formed memory transfer request implies that the request was not malformed on issue and that the request was not corrupted in transit.
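By way of non-limiting illustration only, the distinction drawn above between an MPU and an MMU may be sketched as a small behavioural model. The names `Region`, `mpu_process` and `mmu_process`, the permission strings, and the first-match search policy are all hypothetical; they are introduced here for illustration and are not taken from this specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Region:
    """Hypothetical region of the input address space with access controls."""
    base: int                       # first address of the region
    length: int                     # region length in bytes
    perms: str                      # permitted access kinds, e.g. "r" or "rw"
    out_base: Optional[int] = None  # translated base; used only by the MMU model

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.length


def _find(regions: List[Region], addr: int, access: str) -> Region:
    """Return the first region that contains addr and permits the access."""
    for region in regions:
        if region.contains(addr) and access in region.perms:
            return region
    raise PermissionError(f"{access!r} access to {addr:#x} denied")


def mpu_process(regions: List[Region], addr: int, access: str) -> int:
    """MPU model: apply access controls, then use the identity transformation."""
    _find(regions, addr, access)
    return addr  # the output address always equals the input address


def mmu_process(regions: List[Region], addr: int, access: str) -> int:
    """MMU model: apply access controls, then translate the address."""
    region = _find(regions, addr, access)
    if region.out_base is None:
        return addr  # no translation configured for this region
    return region.out_base + (addr - region.base)


# Assumed example values: one 4 KiB read/write region.
regions = [Region(base=0x1000, length=0x1000, perms="rw", out_base=0x8000)]
mpu_out = mpu_process(regions, 0x1234, "r")  # identity mapping
mmu_out = mmu_process(regions, 0x1234, "w")  # relocated into 0x8000..0x8FFF
```

In this sketch both models share the same access-control search and differ only in the final address computation, which mirrors characteristic (b) of each definition.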
In many publications describing memory management technologies, a “virtual address space” is mapped to a “physical address space”. This terminology is unambiguous when a given computer architecture employs a single level of address translation means for software running on a general purpose processor. In this specification we instead say that an “input address space” is mapped to a “translated address space”. This latter terminology can be used consistently for each level of memory address translation means when considering computer architectures that have two or more levels of address translation means. It can also be used consistently for memory address translation means that are adapted to receive memory transfer requests from general purpose processors, graphics processors and other types of interconnect-master peripherals.
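As a non-limiting sketch of why this terminology composes across levels, each level below maps its own input address space to its own translated address space, and the translated address of one level serves as the input address of the next. The names `make_level` and `translate_chain`, and the use of a single contiguous mapping per level, are assumptions made only for this illustration.

```python
from typing import Callable, List

Level = Callable[[int], int]


def make_level(input_base: int, length: int, translated_base: int) -> Level:
    """Build one level that maps a contiguous input region to a translated region."""
    def translate(input_addr: int) -> int:
        if not input_base <= input_addr < input_base + length:
            raise LookupError(f"{input_addr:#x} is not mapped at this level")
        return translated_base + (input_addr - input_base)
    return translate


def translate_chain(levels: List[Level], input_addr: int) -> int:
    """Feed the translated address of each level to the next level as its input."""
    addr = input_addr
    for level in levels:
        addr = level(addr)
    return addr


# Assumed example values for two levels of address translation.
first_level = make_level(0x0000, 0x1000, 0x5000)
second_level = make_level(0x5000, 0x1000, 0x9000)
final_addr = translate_chain([first_level, second_level], 0x0123)
```

The same two terms describe every level, so no level needs to be singled out as "virtual" or "physical".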
Throughout this specification, including the claims, we define a page, page descriptor, frame, segment, segment descriptor and range descriptor as follows:
A “frame” of N bytes in length defines a contiguous region of memory in a translated address space that is N bytes in length and that starts on an N byte boundary.
A “page” of N bytes in length defines a contiguous region of memory in an input address space that is N bytes in length and that starts on an N byte boundary. A “page descriptor” describes a page of memory associated with an input address space. A page of memory in an input address space may be mapped to a frame of memory in a translated address space.
A “segment” of N bytes in length defines a contiguous region of memory in an input address space that is N bytes in length and that starts on an O byte boundary. The allocated portion of a segment may be less than N bytes in length and may also start at an address offset located within that segment. The terminology “a variable length segment” implies that the length of the allocated portion of a segment may vary. The allocated portion of a segment may be mapped to a contiguous region of memory on a P byte boundary in a translated address space. The value of O and the value of P may also be different. The relationship between the values of N, O and P varies depending on the segmentation scheme implementation details. In practice, some computer architectures are designed to employ means implemented in hardware to prevent the allocated portions of two programmable segments overlapping in the input address space. Correct operation of other computer architectures may require the executive software to ensure that the allocated portions of two programmable segments do not overlap in the input address space at run-time. A “segment descriptor” describes a segment of memory associated with an input address space.
A “range descriptor” of N bytes in length defines a contiguous region of memory in the input address space that is N bytes in length. A range may be defined by a lower-bound address and an upper-bound address, or by a lower-bound address and a range length. If there is no programmable address translation enabled in a given range descriptor, a range in the input address space is mapped using the identity transformation to a contiguous region of memory in the output address space. A range descriptor may start and stop on fine grain boundaries (e.g. 64-byte granularity) in the input address space. Alternatively, a range descriptor may start and stop on coarse grain boundaries in the input address space (e.g. kibibyte granularity). It is common for commercial off-the-shelf MPU implementations to explicitly permit two or more range descriptors, in which those range descriptors do not have programmable address translation capabilities, to be associated with overlapping memory regions in the input address space.
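The definitions above may be illustrated, without limitation, by the following sketch. The function `translate_page` and the classes `Segment` and `RangeDescriptor` are hypothetical names introduced here. The sketch assumes that the start of the allocated portion of a segment maps to the start of the translated region, which is only one of the possible segmentation schemes contemplated above.

```python
from dataclasses import dataclass
from typing import Dict


def translate_page(addr: int, page_size: int, page_to_frame: Dict[int, int]) -> int:
    """Map an input address to the frame that backs its page.

    page_to_frame maps a page base address (input address space) to a frame
    base address (translated address space). Pages and frames are page_size
    bytes long and start on a page_size-byte boundary.
    """
    offset = addr % page_size
    page_base = addr - offset
    frame_base = page_to_frame[page_base]  # KeyError models an unmapped page
    if frame_base % page_size != 0:
        raise ValueError("a frame must start on an N byte boundary")
    return frame_base + offset


@dataclass(frozen=True)
class Segment:
    """Hypothetical segment whose allocated portion may be smaller than N."""
    base: int          # start of the segment in the input address space
    length: int        # N: segment length in bytes
    alloc_offset: int  # offset of the allocated portion within the segment
    alloc_length: int  # length of the allocated portion
    out_base: int      # start of the mapped region in the translated space

    def translate(self, addr: int) -> int:
        start = self.base + self.alloc_offset
        if not start <= addr < start + self.alloc_length:
            raise LookupError(f"{addr:#x} is outside the allocated portion")
        return self.out_base + (addr - start)


@dataclass(frozen=True)
class RangeDescriptor:
    """Hypothetical range descriptor without programmable address translation."""
    lower: int  # lower-bound address (inclusive)
    upper: int  # upper-bound address (inclusive)

    @property
    def length(self) -> int:
        return self.upper - self.lower + 1

    def map(self, addr: int) -> int:
        """Identity transformation for addresses inside the range."""
        if not self.lower <= addr <= self.upper:
            raise LookupError(f"{addr:#x} is outside the range")
        return addr


# Assumed example values.
page_out = translate_page(0x2010, 0x1000, {0x2000: 0x7000})
seg_out = Segment(0x10000, 0x10000, 0x100, 0x200, 0x40000).translate(0x10180)
fine_range = RangeDescriptor(lower=0x40, upper=0x7F)  # one 64-byte range
```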
Throughout this specification, including the claims, we define a cache line, cache block, cache sub-block and a cache tag as follows:
A “cache line” is a contiguous region of memory. Traditionally in general purpose computer architectures, the length of a cache line ranges from 8 bytes to 32 bytes. In principle, a cache line could have the same length as the maximum length of a page or a segment. Each cache line is associated with a cache tag. In the context of cache lines, a “cache tag” stores metadata about a cache line. That metadata may include, but may not be limited to, its address in the input address space, its address in the translated address space and the status of that cache line.
A “cache block” is a contiguous region of memory subdivided into cache sub-blocks. Traditionally in general purpose computer architectures, a cache block is comprised of 2 to 4 cache sub-blocks. Traditionally in general purpose computer architectures, the length of a cache sub-block ranges from 8 bytes to 32 bytes. Each cache block is associated with a cache tag. In the context of cache blocks, a cache tag stores metadata about a cache block. That metadata may include, but may not be limited to, its address in the input address space, its address in the translated address space and the status of the cache sub-blocks.
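A cache tag for a cache block may be sketched, by way of non-limiting example, as follows. The names `Status` and `CacheTag`, the three status values, and the hit rule are assumptions made for this illustration only. A cache line corresponds to the special case of a block with a single sub-block.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Status(Enum):
    INVALID = 0
    VALID = 1
    DIRTY = 2


@dataclass
class CacheTag:
    """Hypothetical cache tag for a cache block split into sub-blocks."""
    input_addr: int       # block base address in the input address space
    translated_addr: int  # block base address in the translated address space
    sub_block_size: int   # bytes per cache sub-block
    status: List[Status]  # one status entry per cache sub-block

    @property
    def block_size(self) -> int:
        return self.sub_block_size * len(self.status)

    def hit(self, addr: int) -> bool:
        """A hit needs an address match and a sub-block that is not INVALID."""
        offset = addr - self.input_addr
        if not 0 <= offset < self.block_size:
            return False
        return self.status[offset // self.sub_block_size] is not Status.INVALID


# Assumed example: a 64-byte cache block made of four 16-byte sub-blocks.
tag = CacheTag(
    input_addr=0x1000,
    translated_addr=0x9000,
    sub_block_size=16,
    status=[Status.VALID, Status.INVALID, Status.DIRTY, Status.INVALID],
)
```

Keeping one status entry per sub-block allows part of a block to be present while the remainder is absent, at the cost of a single shared address field.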
Throughout this specification, including the claims, we define a “programmable memory transfer request processing” (PMTRP) unit and a “region descriptor” as follows:
A PMTRP unit is adapted to receive and process memory transfer requests according to various policies, in which each memory transfer request is associated with a specific address space, and each address space is associated with certain policies to be enforced by that PMTRP unit.
A “region descriptor” is used to associate various policies with a specific region of a specific address space associated with a specific PMTRP unit instance. For example:
a region descriptor may or may not be adapted with one or more access control fields;
a region descriptor may or may not be adapted with one or more address translation fields; and
a region descriptor may or may not be adapted with fields that modify the default behavior of the memory subsystem that receives memory transfer requests issued by the PMTRP unit (e.g. by adjusting the cache write policy and/or memory order policy).
The region of an address space associated with a region descriptor:
may or may not be constrained with regards to a specific subset of all possible base address offsets within an address space; and
may or may not be constrained with regards to a specific subset of all possible region lengths.
Consequently, the fields of a “region descriptor” can be adapted to implement a variety of descriptors. This includes, but is not limited to: page descriptors; segment descriptors; translation look aside buffer descriptors; range descriptors without programmatic address translation; range descriptors with programmatic address translation; and cache tags.
Clearly, the specific fields of a given region descriptor are defined specifically for that region descriptor instance.
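One way to picture how a single region descriptor format can be adapted to several descriptor types is sketched below. The class `RegionDescriptor`, its field names, and the helper functions `page_descriptor` and `range_descriptor` are hypothetical and are offered only as a non-limiting illustration. A field left as `None` models a descriptor that is not adapted with that field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RegionDescriptor:
    """Hypothetical region descriptor; every policy field is optional."""
    base: int                                 # region base in the input space
    length: int                               # region length in bytes
    perms: Optional[str] = None               # access control field
    translated_base: Optional[int] = None     # address translation field
    cache_write_policy: Optional[str] = None  # memory subsystem override

    def output_address(self, addr: int) -> int:
        if not self.base <= addr < self.base + self.length:
            raise LookupError(f"{addr:#x} is outside this region")
        if self.translated_base is None:
            return addr  # no translation field: identity transformation
        return self.translated_base + (addr - self.base)


def page_descriptor(page_base: int, page_size: int,
                    frame_base: int, perms: str) -> RegionDescriptor:
    """Constrained base and length: both ends are aligned to page_size."""
    if page_base % page_size or frame_base % page_size:
        raise ValueError("page and frame must start on a page_size boundary")
    return RegionDescriptor(base=page_base, length=page_size,
                            perms=perms, translated_base=frame_base)


def range_descriptor(lower: int, upper: int, perms: str) -> RegionDescriptor:
    """Unconstrained bounds and no programmatic address translation."""
    if upper < lower:
        raise ValueError("upper bound must not be below lower bound")
    return RegionDescriptor(base=lower, length=upper - lower + 1, perms=perms)


# Assumed example values.
page = page_descriptor(0x2000, 0x1000, 0x7000, "rw")
rng = range_descriptor(0x40, 0x7F, "r")
```

Here the page descriptor restricts the permitted base offsets and lengths, while the range descriptor leaves both unconstrained, matching the two kinds of constraint listed above.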
A PMTRP unit is defined independently from the one or more interconnect-masters that are adapted to issue memory transfer requests to the one or more interconnect-target ports of that PMTRP unit. By way of non-limiting example, a PMTRP unit that implements MMU functionality may be adapted for use as a private IOMMU for one interconnect-master peripheral without loss of generality. Furthermore, a PMTRP unit that implements MMU functionality may be adapted for use as a shared IOMMU that is shared across multiple interconnect-master peripherals without loss of generality.
Throughout this specification, including the claims, we define a “programmable region descriptor” as a region descriptor in which one or more fields of that region descriptor may be adjusted programmatically.
Early MMU schemes for managing the main memory of computer architectures were typically adapted for use with main memories that had small storage capacities.
As the storage capacity of physical memory increased, MMU schemes based on different principles were employed to overcome various perceived or actual limitations [1], [2] of those earlier MMU schemes that were designed for small storage capacities.
To the best of the author's knowledge, all published MMU schemes that support large input address spaces with fine grain memory allocation capabilities employ (either software or hardware controlled) translation look aside buffers (TLB). Those TLBs are used to cache a relatively small number of the potentially very large number of region descriptors that can be associated with an input address space. Consider the VAX-11/780 architecture [3]. The VAX-11/780 MMU scheme requires 8,388,608 region descriptors to allocate the entire 32-bit input address space [2]. Some implementations of the VAX-11/780 employed a unified TLB to cache up to 128 of those up to 8,388,608 region descriptors [2] in high-speed memory, and stored the enabled region descriptors in one or more tables held in relatively slower main-memory storage. Subsequently, to the best of the author's knowledge, industry practice has predominantly focused on employing two (or more) levels of indirection (indexed schemes, hash schemes, linked-list schemes) when searching for region descriptors to improve the management of a potentially very large number of enabled region descriptors. The industry trend towards the use of two or more levels of indirection is apparently intended to overcome various technical difficulties found in single-level translation architectures such as the VAX-11/780 architecture [2]. To the best of the author's knowledge, all published implementations of MMU schemes which support two or more levels of indirection to access a leaf region-descriptor in a 32-bit or 64-bit address space employ a (software or hardware controlled) TLB to accelerate their performance. It is well known that the use of a TLB to cache region descriptors in combination with a data cache significantly increases the complexity of performing static timing analysis of software running on a processor core on which both the data cache and the TLB are enabled.
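The descriptor count quoted above can be checked with simple arithmetic. The sketch below assumes the 512-byte page size of the VAX architecture, which is not stated in the passage itself. The variable names are introduced here for illustration only.

```python
INPUT_ADDRESS_BITS = 32  # width of the input address space
PAGE_SIZE = 512          # bytes per page (assumed VAX page size)
TLB_ENTRIES = 128        # TLB capacity quoted in the text

# Number of page-sized region descriptors needed to cover the whole space.
descriptors = 2**INPUT_ADDRESS_BITS // PAGE_SIZE

# Amount of memory that can be reached through the TLB without a miss.
tlb_reach_bytes = TLB_ENTRIES * PAGE_SIZE

# How many descriptors compete for each TLB entry in the worst case.
descriptors_per_tlb_entry = descriptors // TLB_ENTRIES

print(descriptors, tlb_reach_bytes, descriptors_per_tlb_entry)
```

Under these assumptions the TLB covers 64 KiB of a 4 GiB input address space at any one time, which illustrates why only a small fraction of the region descriptors can be cached.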
There is a long-felt need for an MMU architecture that has low-latency, high-throughput, constant-time operation with support for relatively fine-grain memory allocation in 32-bit and 64-bit input address spaces. In the microcontroller market, there is also a need to provide a PMTRP unit that can operate as a memory protection unit (MPU) and also operate as an MMU to run commercial high-assurance security-critical real-time operating systems (RTOS). This is because many high-assurance RTOS rely on the availability of a hardware MMU with address translation capabilities.
There is also a long-felt need for a means to cost-effectively accelerate the re-programming of region descriptors, with lower latency and in a time-analysable way, in real-time environments to support faster task-swapping and improved system performance.
There is also a long-felt need to support two levels of address translation, in which each level of the address translation is under the control of different software (e.g. a hypervisor controls a first level of the MMU scheme and an operating system hosted on the hypervisor controls a second level of that MMU scheme), in a manner that is suitable for use in statically time-analysable real-time systems.
In resource-constrained environments (such as the Internet of Things) that must run page-based MMU schemes to support general purpose operating systems such as Linux, there is also a compelling market need for an innovative MMU architecture that requires less hardware circuit area to implement than a conventional page-based MMU and that also supports faster execution of software after a user address space context swap.
Preferred embodiments of the present invention provide new and innovative solutions to the above-mentioned market needs.