1. Field of the Invention
The present disclosure relates to computer processing systems.
2. State of the Art
A computer processor and the program that it executes need places to put data for later reference. A computer processor will typically have many such places, each with its own trade-off of capacity, speed of access, and cost. Usually these are arranged in a hierarchical manner, referred to as the memory system of the computer processing system, with small, fast, costly places used for short-lived small data and large, slow, cheap places used for what doesn't fit in the small, fast, costly places. The hierarchical memory system typically includes the following components, arranged in order of decreasing speed of access:
register file or other form of fast operand storage;
one or more levels of cache memory (one or more levels of the cache memory can be integrated with the processor (on-chip cache) or separate from the processor (off-chip cache));
main memory (or physical memory), which is typically implemented by DRAM memory and/or NVRAM memory and/or ROM memory; and
on-line mass storage (typically implemented by one or more hard disk drives).
In many computer processing systems, the main memory can take several hundred cycles to access. The cache memory, which is much smaller and more expensive but faster to access than the main memory, is used to keep copies of data that resides in the main memory. If a reference finds the desired data in the cache (a cache hit), the data can be accessed in a few cycles, instead of the several hundred cycles needed when it does not (a cache miss). Because a program typically has nothing else to do while waiting to access data in memory, using a cache and making sure that desired data is copied into the cache can provide significant improvements in performance.
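The hit/miss cost difference can be made concrete with a toy direct-mapped cache. This is a minimal sketch in Python, not a model of any particular processor: the latencies (`HIT_CYCLES`, `MISS_CYCLES`), line size, and line count are hypothetical values chosen only to echo "a few cycles" versus "several hundred".

```python
from dataclasses import dataclass, field

# Hypothetical latencies, chosen only to echo "a few cycles" vs "several hundred".
HIT_CYCLES = 4
MISS_CYCLES = 300

@dataclass
class DirectMappedCache:
    """Toy direct-mapped cache that tracks only tags (no data)."""
    num_lines: int = 64
    line_size: int = 64                        # bytes per cache line
    tags: dict = field(default_factory=dict)   # line index -> stored tag
    hits: int = 0
    misses: int = 0

    def access(self, address: int) -> int:
        """Return the cycle cost of one access and update hit/miss counts."""
        block = address // self.line_size
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags.get(index) == tag:
            self.hits += 1
            return HIT_CYCLES
        # Miss: the line is fetched from main memory and installed in the cache.
        self.misses += 1
        self.tags[index] = tag
        return MISS_CYCLES

cache = DirectMappedCache()
# Sequential scan of 1024 bytes in 8-byte steps: one miss per 64-byte line,
# then hits for the remaining accesses to that line.
total = sum(cache.access(addr) for addr in range(0, 1024, 8))
print(cache.hits, cache.misses, total)  # 112 16 5248
```

Of the 5248 cycles spent, 4800 go to the 16 misses, which is why keeping needed data in the cache matters so much.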
The address space of the program can employ virtual memory, which serves two different purposes in modern processors. One purpose, hereinafter paging, permits the totality of the address spaces used by all programs to exceed the capacity of the main memory attached to the processor. The other purpose, hereinafter address extension, permits the totality of the address spaces used by all programs to exceed the address space supported by the processor.
Paging can be used to map the virtual addresses used by the program at page granularity to physical addresses recognized by the main memory or to devices such as disk that are used as paging store. The set of valid virtual addresses usable without error by a program is called its address space. The address mapping is represented by a set of mapping tables maintained by the operating system as it allocates and de-allocates memory for the various running programs. Every virtual address must be translated to the corresponding physical address before it may be used to access physical memory.
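Page-granularity translation can be sketched as splitting a virtual address into a page number and an offset, then looking the page number up in a mapping table. The sketch below assumes 4 KiB pages and uses a Python dict as a stand-in for the operating system's mapping tables; `translate`, `PageFault`, and the sample mapping are hypothetical illustrations.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages

class PageFault(Exception):
    """Raised when a virtual page has no mapping (e.g. it is held on paging store)."""

def translate(page_table: dict, vaddr: int) -> int:
    """Translate a virtual address to a physical address at page granularity."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)   # virtual page number, byte offset
    if vpn not in page_table:
        raise PageFault(hex(vaddr))
    # The offset within the page is unchanged; only the page number is remapped.
    return page_table[vpn] * PAGE_SIZE + offset

# Hypothetical mapping maintained by the operating system: virtual page -> physical frame.
page_table = {0x10: 0x2A, 0x11: 0x07}
print(hex(translate(page_table, 0x10123)))  # 0x2a123
```

An address in an unmapped page (for example `0x12000`) raises `PageFault`, which is where an operating system would bring the page in from paging store.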
Systems with caches differ in whether cache lines store tags defined by a physical address (physical caching) or a virtual address (virtual caching). In the former, virtual addresses must be translated no later than the point at which they are matched against the physically addressed tags of the cache; in the latter, translation occurs after cache access and is avoided if the reference is satisfied from cache.
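The difference in when translation happens can be shown by counting translations under each organization. This is an idealized Python sketch under stated assumptions: a cache is modeled as an unbounded set of whole addresses (no lines, no capacity limit), and `CountingTranslator`, `physical_cache_access`, and `virtual_cache_access` are hypothetical names.

```python
PAGE_SIZE = 4096  # assumed page size

class CountingTranslator:
    """Translates virtual to physical addresses and counts how often it is used."""

    def __init__(self, page_table):
        self.page_table = page_table  # virtual page number -> physical frame number
        self.translations = 0

    def __call__(self, vaddr):
        self.translations += 1
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        return self.page_table[vpn] * PAGE_SIZE + offset

def physical_cache_access(tags, translate, vaddr):
    # Physical caching: every reference is translated before the tag match.
    paddr = translate(vaddr)
    hit = paddr in tags
    tags.add(paddr)
    return hit

def virtual_cache_access(tags, translate, vaddr):
    # Virtual caching: match on the virtual address; translate only on a miss,
    # when the physical address is needed to fetch from memory.
    if vaddr in tags:
        return True
    translate(vaddr)
    tags.add(vaddr)
    return False

page_table = {0x10: 0x2A}
phys_t, virt_t = CountingTranslator(page_table), CountingTranslator(page_table)
phys_tags, virt_tags = set(), set()
for _ in range(10):
    physical_cache_access(phys_tags, phys_t, 0x10040)
    virtual_cache_access(virt_tags, virt_t, 0x10040)
print(phys_t.translations, virt_t.translations)  # 10 1
```

Ten references to the same address cost ten translations with physical caching but only one (on the initial miss) with virtual caching.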
Address extension is not needed when the space encompassed by the representation of a program address is large enough. Common representations of program addresses are four bytes (32 bits) and eight bytes (64 bits). The four-byte representation (yielding a four-gigabyte address space) is easily exceeded by modern programs, so addresses (and address spaces) must be reused with different meanings by different programs, and address extension must be used. Reuse of the same address by different programs is called aliasing. The computer processing system must disambiguate aliased use of addresses before they are actually used in the memory hierarchy.
In a computer processing system employing physical caching, alias disambiguation occurs prior to the caches. In a computer processing system employing virtual caching, disambiguation can occur after the caches if the caches are restricted to hold only memory from a single one of the aliased address spaces. Such a design requires that cache contents be discarded whenever the address space changes. However, the total space used by even thousands of very large programs will not approach the size representable in 64 bits, so aliasing need not occur and address extension is unnecessary in 64-bit machines. A computer processing system that does not use address extension permits all programs to share a single, large address space; such a design is said to use the single-address-space model.
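The size argument can be checked with a little arithmetic. The workload below (10,000 programs, each using a full 1 TiB address space) is a hypothetical figure picked to represent "thousands of very large programs"; the sketch simply compares it with the 32-bit and 64-bit address spaces.

```python
GiB = 2**30
TiB = 2**40

space_32 = 2**32  # bytes addressable with a four-byte address
space_64 = 2**64  # bytes addressable with an eight-byte address

# Hypothetical workload: 10,000 programs, each using a full 1 TiB address space.
programs, per_program = 10_000, TiB
demand = programs * per_program

print(space_32 // GiB)    # 4: a 32-bit space is 4 GiB
print(space_64 // TiB)    # 16777216 separate 1 TiB spaces fit in 64 bits
print(demand / space_64)  # fraction of the 64-bit space the workload needs
```

The combined demand vastly exceeds a 32-bit space but uses well under 0.1% of a 64-bit space, so every program can be given disjoint addresses without aliasing.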
It happens that the same hardware can be used both to disambiguate aliases and to map physical memory, and such is the common arrangement. Because alias disambiguation is typically performed prior to physical caches, using the common hardware means that page mapping occurs there too. When paging and alias disambiguation are in front of physical caches, it is also common to use the same hardware for access control, restricting the kinds of access and the addresses accessible to the program. The hardware-enforced restrictions comprise the protection model of the processor and memory system. Protection must apply to cache accesses, so the protection machinery must be ahead of the caches. Hence it is common to have one set of hardware that intercepts all accesses to the memory hierarchy and applies protection restrictions, alias disambiguation, and page mapping all together. Because all this must be performed for every reference to memory, and specifically must be performed before the cache can be accessed, the necessary hardware is power-hungry, large, and on the critical path for program performance.
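The combined arrangement can be sketched as a single lookup keyed by an address-space identifier and a virtual page number, returning both a physical frame and a set of permissions. This Python sketch is illustrative only; `TranslationUnit`, the `asid` key, and the permission letters are hypothetical and do not describe any specific processor's hardware.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size

class PageFault(Exception):
    """No mapping exists for this (address space, virtual page) pair."""

class ProtectionFault(Exception):
    """A mapping exists but does not permit the requested kind of access."""

@dataclass(frozen=True)
class Entry:
    frame: int        # physical frame number (page mapping)
    perms: frozenset  # allowed access kinds, e.g. {"r", "w", "x"} (protection)

class TranslationUnit:
    """Hypothetical front-end unit applying alias disambiguation, protection,
    and page mapping in a single lookup before any physical cache is accessed."""

    def __init__(self):
        # Keyed by (address-space id, virtual page number): the id disambiguates
        # identical virtual addresses reused by different programs.
        self.entries = {}

    def map(self, asid, vpn, frame, perms):
        self.entries[(asid, vpn)] = Entry(frame, frozenset(perms))

    def access(self, asid, vaddr, kind):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        entry = self.entries.get((asid, vpn))
        if entry is None:
            raise PageFault((asid, hex(vaddr)))
        if kind not in entry.perms:
            raise ProtectionFault((asid, hex(vaddr), kind))
        return entry.frame * PAGE_SIZE + offset  # physical address for the cache

tu = TranslationUnit()
tu.map(asid=1, vpn=0x1, frame=0x5, perms="r")   # program 1: read-only page
tu.map(asid=2, vpn=0x1, frame=0x9, perms="rw")  # program 2 reuses the same virtual page
print(hex(tu.access(1, 0x1004, "r")))  # 0x5004
print(hex(tu.access(2, 0x1004, "w")))  # 0x9004
```

Because every reference must pass through `access` before a physical cache can be consulted, the lookup sits on the critical path, which is the cost the text describes.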
Furthermore, modern CPU architectures support protected multiprocessing where different program invocations are given their own sets of private resources (a process) and then run in parallel, with a combination of hardware and software ensuring that no program can inspect or change the private resources of any other. This protected multiprocessing is often accomplished by letting the CPU execute the code of one process for a while (with access to the resources of that process), and then changing the hardware execution environment to that of another process and running that one for a while with access to the resources of the second but no longer with access to the resources of the first. Changing from running one process to running another is called a process switch and is very expensive in machine terms because of the amount of state that has to be saved and restored as the process context is changed.
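A process switch can be pictured as saving the outgoing process's execution state and loading the incoming one's. The Python sketch below models only a register file of a hypothetical size (`NUM_REGS`) and counts the words moved as a rough proxy for cost; a real process switch also changes translation and protection state and typically involves far more work than this.

```python
from dataclasses import dataclass, field

NUM_REGS = 32  # hypothetical register-file size

@dataclass
class Process:
    name: str
    saved_regs: list = field(default_factory=lambda: [0] * NUM_REGS)

class CPU:
    def __init__(self):
        self.regs = [0] * NUM_REGS
        self.current = None
        self.words_moved = 0  # rough proxy for the cost of switching

    def switch_to(self, proc):
        if self.current is not None:
            self.current.saved_regs = list(self.regs)  # save outgoing state
            self.words_moved += NUM_REGS
        self.regs = list(proc.saved_regs)              # restore incoming state
        self.words_moved += NUM_REGS
        self.current = proc

cpu, a, b = CPU(), Process("a"), Process("b")
cpu.switch_to(a)
cpu.regs[0] = 42          # process a writes a register
cpu.switch_to(b)          # b sees its own, untouched state
cpu.switch_to(a)          # a's value is restored
print(cpu.regs[0], cpu.words_moved)  # 42 160
```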
A process can contain multiple threads. A thread is a sequence of one or more instructions executed by the CPU. Typically, threads are used for small tasks, whereas processes are used for more heavyweight tasks, such as the execution of applications. Another difference between a thread and a process is that threads within the same process share the same address space, whereas different processes do not. This allows threads to read from and write to the same data structures and variables, and also facilitates communication between threads.
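Because threads of one process share an address space, they can all operate on the same data structure directly. In this minimal Python sketch, several threads append to one shared list; the lock is an added assumption that serializes the updates, and `worker` and `shared_results` are hypothetical names.

```python
import threading

shared_results = []      # one data structure visible to every thread
lock = threading.Lock()  # serializes updates to the shared list

def worker(n):
    value = n * n
    with lock:
        shared_results.append(value)  # write directly into shared memory

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(shared_results))  # [0, 1, 4, 9]
```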
Communication between processes (also known as inter-process communication) can be quite difficult and resource-intensive. For example, one process may be the source of some data that it computes or reads from a file, while a second process is a sink for the data, which it uses in its own operation or writes out to a file in turn. In the usual arrangement, the processes arrange for a buffer to be created as a shared resource in memory that they both have access to. The source then fills the buffer with data and triggers a process switch to the sink. The sink consumes the data and then triggers a process switch back to the source for more data. Each bufferful thus involves two process switches. In addition, the processes must establish some protocol to make sure that (for example) the source doesn't start putting more data into the buffer before the sink has finished emptying it of the previous data. Such protocols are difficult to write and a frequent source of subtle bugs.
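The kind of full/empty handshake such a protocol needs can be sketched with a one-slot buffer guarded by a condition variable. For simplicity this Python sketch uses two threads as stand-ins for the source and sink processes; `SharedBuffer` and the `DONE` end-of-stream marker are hypothetical, and real separate processes would need shared memory and process-level synchronization instead.

```python
import threading

class SharedBuffer:
    """One-slot buffer with a full/empty handshake so the source never
    overwrites data the sink has not yet consumed."""

    def __init__(self):
        self._cond = threading.Condition()
        self._data = None
        self._full = False

    def put(self, data):
        with self._cond:
            while self._full:          # wait until the sink has emptied it
                self._cond.wait()
            self._data, self._full = data, True
            self._cond.notify_all()    # wake the sink

    def get(self):
        with self._cond:
            while not self._full:      # wait until the source has filled it
                self._cond.wait()
            data, self._data, self._full = self._data, None, False
            self._cond.notify_all()    # wake the source
            return data

DONE = object()  # hypothetical end-of-stream marker

def source(buf, chunks):
    for chunk in chunks:
        buf.put(chunk)
    buf.put(DONE)

def sink(buf, out):
    while (chunk := buf.get()) is not DONE:
        out.extend(chunk)

buf, received = SharedBuffer(), []
consumer = threading.Thread(target=sink, args=(buf, received))
consumer.start()
source(buf, [[1, 2, 3], [4, 5], [6]])
consumer.join()
print(received)  # [1, 2, 3, 4, 5, 6]
```

The `while` loops around `wait()` are deliberate: a woken thread rechecks the buffer state before proceeding, which guards against the subtle ordering bugs the text warns about.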
Communication between the threads of a process is much easier. In one method, both source and sink threads can run concurrently (on separate cores) or semi-concurrently (being swapped in and out of a single core), and communicate using shared data structures similar to inter-process communication. In a second method, only one thread is active at a time (no matter how many cores are available), and a special operation or system function permits the running thread to give up control to an idle thread, possibly passing arguments to the idle thread. This method is typically referred to as “coroutines,” and the operation that stops the active thread and passes control to the idle thread is often called a “visit.” Processes can also communicate as coroutines.
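The coroutine style, in which only one participant runs at a time and control is handed over explicitly with an argument, can be sketched with Python generators. Here `send()` plays the role of a "visit": it passes a value to the idle coroutine, runs it until it yields, and then returns control to the caller. The `source` and `sink` functions are hypothetical illustrations, not the mechanism of any particular CPU.

```python
def sink():
    """Coroutine that accumulates items passed to it by each visit."""
    total = 0
    while True:
        item = yield total  # suspend here; resume when visited with a new item
        total += item

def source(consumer, items):
    """Drives the sink: each send() is a 'visit' that passes control and an
    argument to the idle coroutine and waits for it to yield back."""
    next(consumer)  # prime: run the sink up to its first yield
    result = 0
    for item in items:
        result = consumer.send(item)
    return result

print(source(sink(), [1, 2, 3, 4]))  # 10
```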
The difference between process-based and thread-based communication is that the threads share their whole environment, while processes don't, although they may share limited quantities of state for purposes of the communication. Thus, current CPU architectures require that the program code of the cooperating source and sink threads share resource environments. If the code of the source and sink are to have private resource sets, they must be organized as separate processes and utilize the process switch machinery and a custom protocol.