In conventional program development systems, a human operator (a programmer) specifies keyword definitions, variable declarations and program functions through syntactical data entry into a text file, commonly referred to as a source file. The source file is compiled into a sequence of machine-executable instructions through execution of a compiler (which is itself a computer program), and the resulting instruction sequence is stored in an object file. The object file may be linked to one or more other object files through execution of a linking program (e.g., a program that resolves inter-object (as opposed to intra-object) references to functions, variables, definitions and so forth), resulting in creation of an executable code sequence stored as yet another file, called an executable file. In a general purpose data processing system, an operating system (itself an executing program) responds to a program-execution command by retrieving a specified executable file from a relatively slow, non-volatile storage, placing the machine code from the executable file into a smaller, faster memory commonly referred to as main memory or system memory, and allocating storage for program variables in the main memory. Thereafter, program execution proceeds by repeatedly fetching (retrieving) program instructions from main memory, loading the instructions into an instruction register of a processor, and initiating instruction execution in the processor.
FIG. 1A illustrates the actions of the programmer (100), compiler (102) and then hardware (104) with regard to conventional variable declaration and run-time reference. That is, a programmer initially declares the variable through specification of a data type and a variable name as shown at 112. Thereafter, the programmer may specify an operation to be undertaken with respect to the variable by referencing the variable name in a program statement (e.g., the increment operation shown at 114).
Still referring to FIG. 1A, the compiler responds to the variable declaration by allocating an amount of storage space indicated by the data type specifier, and by correspondingly extending the total data storage space to be allocated to the executable program. The compiler converts the variable reference (e.g., in the increment operation) into a machine-level load and/or store instruction that is included within the overall executable code sequence loaded from non-volatile storage into a particular region of operating memory (i.e., placed in the operating memory) by the operating system. In an embedded system or for elemental or kernel programs (e.g., basic input/output services or the like), the executable code may be placed into a particular region of operating memory by a bootstrap loader (a primitive program that copies the executable code to a predetermined location in the operating memory) or by more permanent disposition in a non-volatile memory (e.g., a read only memory or any variant thereof).
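By way of illustration only, the compiler behavior described with reference to FIG. 1A may be modeled by the following C sketch. The names declare_variable, increment_int_at and total_data_storage, and the 64-byte data space, are hypothetical and are introduced here solely for explanation. The sketch also ignores the alignment padding that an actual compiler may insert between variables.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DATA_SPACE_MAX 64u  /* assumed capacity, chosen only for this sketch */

static unsigned char data_space[DATA_SPACE_MAX]; /* models the program's data storage */
static size_t data_space_used;                   /* total data storage allocated so far */

/* Models the response to a variable declaration: reserve the number of
 * bytes implied by the data type and extend the total data storage by
 * the same amount. Returns the offset assigned to the variable. */
size_t declare_variable(size_t size)
{
    assert(size <= DATA_SPACE_MAX - data_space_used);
    size_t offset = data_space_used;
    data_space_used += size;
    return offset;
}

/* Models the instructions emitted for an increment of an int variable:
 * a load, an add, and a store directed to the variable's storage. */
int increment_int_at(size_t offset)
{
    int value;
    assert(offset + sizeof value <= data_space_used);
    memcpy(&value, &data_space[offset], sizeof value);  /* load  */
    value += 1;                                         /* add   */
    memcpy(&data_space[offset], &value, sizeof value);  /* store */
    return value;
}

/* Reports the total data storage allocated to the modeled program. */
size_t total_data_storage(void)
{
    return data_space_used;
}
```

In this model, two successive calls to declare_variable(sizeof(int)) yield offsets 0 and sizeof(int), and each call to increment_int_at for a given offset returns a value one greater than the preceding call for that offset.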
During program execution, the processor executes the load/store instruction, resulting in issuance of a memory read request to an off-chip memory subsystem. If the system includes an on-chip or off-chip cache, the cache will be queried (e.g., by comparing a portion of the memory address issued in the memory access request with contents of a tag memory) to determine whether the data sought has been cached as part of a preceding memory access. If a cache hit occurs (data is in cache), the data will be retrieved from the cache and the off-chip memory access request canceled. Otherwise, a cache miss occurs, and the off-chip memory access request is completed to fetch the requested data to the processor. FIG. 1B illustrates the transfer of data from off-chip memory 159 (random access memory (RAM) such as dynamic RAM (DRAM) or static RAM (SRAM)) to a processor 150 and cache memory 155 that are disposed on an integrated circuit die 140. After the content of memory region ‘x’ is recorded in the cache (i.e., as x’), a subsequent memory access directed to memory region ‘x’ will result in a cache hit, obviating off-chip memory access.
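The tag comparison described above may be sketched, purely as an illustrative assumption, as a direct-mapped cache model in C. The line size, the number of lines, and the names tag_memory_t and cache_access are hypothetical; no particular cache organization is specified in the foregoing description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BITS  4u                  /* 16-byte cache lines (assumed) */
#define INDEX_BITS 6u                  /* 64 cache lines (assumed) */
#define NUM_LINES  (1u << INDEX_BITS)

/* Models the tag memory: one valid flag and one stored tag per line. */
typedef struct {
    bool     valid[NUM_LINES];
    uint32_t tag[NUM_LINES];
} tag_memory_t;

/* Selects the cache line from the middle bits of the address. */
static uint32_t line_index(uint32_t addr)
{
    return (addr >> LINE_BITS) & (NUM_LINES - 1u);
}

/* Extracts the upper address bits that are compared against the tag memory. */
static uint32_t line_tag(uint32_t addr)
{
    return addr >> (LINE_BITS + INDEX_BITS);
}

/* Returns true on a cache hit. On a miss, records the new tag (modeling
 * the fill that follows the off-chip access) and returns false. */
bool cache_access(tag_memory_t *tm, uint32_t addr)
{
    assert(tm != NULL);
    uint32_t index = line_index(addr);
    uint32_t tag = line_tag(addr);
    if (tm->valid[index] && tm->tag[index] == tag) {
        return true;
    }
    tm->valid[index] = true;
    tm->tag[index] = tag;
    return false;
}
```

With a zero-initialized tag memory, a first access to address 0x60000 misses and a second access to the same address hits, mirroring the x and x’ relationship of FIG. 1B. An intervening access to an address that maps to the same line with a different tag (0x60400 under the assumed parameters) evicts the entry, so that the next access to 0x60000 misses again.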
Although the combined actions of the hardware and compiler serve to hide the underlying complexity of memory access from the programmer, enabling the programmer to simply declare variables without concern for their placement in the memory hierarchy represented by the off-chip memory and cache, the on-chip cache tends to be relatively large and slow and thus compromises performance. Worse, in the event of a cache miss, which may occur whenever data has not yet been cached or has been evicted from the cache due to non-access or other reason, a substantial performance penalty occurs in forcing access to off-chip memory.
In high-performance processing systems where the penalties associated with cache operation/cache miss are generally not tolerable, the programmer may specifically place critical data in an on-chip memory that is immediately accessible to the processor. The immediate access to the on-chip memory results in substantially faster access than conventional on-chip cache architectures, and dramatically faster access than in instances of cache miss.
FIG. 2A illustrates the typical operations of a programmer 200, compiler 202 and hardware 204 in a system in which the programmer specifies the on-chip address of critical data. Specifically, the programmer specifies an on-chip address (e.g., as in the definition statement at 210) and anchors a variable at that address through declaration of a pointer to a specified data type, and assignment of the on-chip address to the pointer as shown at 212. Thereafter, the programmer may specify a reference to the on-chip address (i.e., access the content of the variable anchored at the on-chip address) by dereferencing the pointer. This is shown at 214 by an exemplary C programming language statement in which the ‘*’ symbol indicates that the content at the address specified by pointer_variable_name (i.e., 0x60000) is to be incremented.
Still referring to FIG. 2A, a compiler converts the reference to the on-chip address (i.e., the dereferenced pointer) into a machine-level instruction to load data from (and potentially to subsequently store the incremented data at) the on-chip address. As before, the machine-level instruction is fetched and executed by a processor, but in this case, due to the specification of the on-chip address, execution results in direct access to on-chip memory.
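The programmer-specified placement of FIG. 2A may be illustrated by the following C sketch. Because address 0x60000 is not ordinarily mapped in a hosted environment, the sketch substitutes a small array for the on-chip memory and translates addresses into that array. The names ON_CHIP_BASE, ON_CHIP_SIZE, COUNTER_ADDR, on_chip_ptr and increment_anchored_counter, and the 256-byte capacity, are assumptions made for this example. On an actual target, the pointer would instead be assigned the on-chip address directly.

```c
#include <assert.h>
#include <stdint.h>

#define ON_CHIP_BASE 0x60000u  /* on-chip address used in the example statement */
#define ON_CHIP_SIZE 256u      /* assumed on-chip capacity in bytes */

/* Host-side stand-in for the on-chip memory (an assumption of this sketch). */
static int on_chip_words[ON_CHIP_SIZE / sizeof(int)];

/* Translates an on-chip address into a pointer within the stand-in array.
 * On a real target this would be a plain cast: (int *)addr. */
int *on_chip_ptr(uint32_t addr)
{
    assert(addr >= ON_CHIP_BASE && addr - ON_CHIP_BASE < ON_CHIP_SIZE);
    assert((addr - ON_CHIP_BASE) % sizeof(int) == 0);
    return &on_chip_words[(addr - ON_CHIP_BASE) / sizeof(int)];
}

/* Definition statement naming the on-chip address (compare 210). */
#define COUNTER_ADDR 0x60000u

/* Anchors a variable at the on-chip address through a pointer (compare 212)
 * and increments its content by dereferencing the pointer (compare 214). */
int increment_anchored_counter(void)
{
    int *pointer_variable_name = on_chip_ptr(COUNTER_ADDR);
    (*pointer_variable_name)++;
    return *pointer_variable_name;
}
```

Each call to increment_anchored_counter returns a value one greater than the last, and the same content is observable through any other pointer obtained for address 0x60000.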
Although substantially higher run-time performance may be achieved through programmer specification of on-chip addresses, program development becomes substantially more complex, particularly where program development is carried out by a team of individuals, as care must be taken to avoid data placement errors (i.e., inadvertent overlap between on-chip storage spaces allocated to different program variables, as shown graphically in FIG. 2B by off-chip memory 159 and an integrated circuit 240 that includes a processor 250 and on-chip memory 255). Arranging data in on-chip memory efficiently tends to be time-consuming, lengthening the code writing process. Program debugging also tends to become more difficult, as data placement errors are often hard to trace. In the worst case, depending on the nature of the data stored and the test vectors exercised, the erroneous placement may not be detected at all, leading to release of defective software.
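The nature of a data placement error may be illustrated by the following C sketch, which merely tests whether two manually chosen placements overlap. The type placement_t, the functions placements_overlap and count_overlaps, and the example addresses are hypothetical. The sketch is offered only to make the error concrete and is not a technique drawn from the foregoing description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One manually specified placement: a starting on-chip address and a size in bytes. */
typedef struct {
    uint32_t addr;
    size_t   size;
} placement_t;

/* Two placements overlap when each one starts before the other one ends. */
bool placements_overlap(placement_t a, placement_t b)
{
    return a.addr < b.addr + b.size && b.addr < a.addr + a.size;
}

/* Counts the overlapping pairs in a list of placements. */
size_t count_overlaps(const placement_t *p, size_t n)
{
    assert(p != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (placements_overlap(p[i], p[j])) {
                count++;
            }
        }
    }
    return count;
}
```

For example, an 8-byte variable anchored at 0x60000 and a 4-byte variable anchored at 0x60004 overlap, so a write to either silently alters the other, whereas a 4-byte variable anchored at 0x60008 overlaps neither.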