1. Field of the Invention
The present invention relates generally to buffering information in information processing systems such as computer systems and microprocessors, and, more particularly, to optimization of single-port memories.
2. Description of the Related Art
Computer systems are information handling systems which may be found in many forms including, for example, mainframes, minicomputers, workstations, servers, personal computers, network computers, terminals, hand-held systems and embedded systems. A typical computer system includes at least one processing unit, associated memory and a number of input/output (I/O) devices. An embedded computer or embedded system is a special purpose computer system that is built into another device and that may or may not have the various elements of typical standalone computer systems. An embedded system is a part of the larger system and performs some of the requirements of the larger system.
A typical computer system processes information according to a program (a sequence of internally stored instructions such as a particular application program and/or an operating system) and produces resultant output information via I/O devices if present. Typically, a program counter of the processor of the computer system provides a series of memory addresses which the processor uses to fetch the instructions stored in the associated memory. For each memory access, the processor conveys the memory address to the memory over an address bus, and the memory responds by conveying to the processor, over an instruction/data bus, the instruction stored in the corresponding addressed memory location. The instructions stored in the memory constitute the program for the processor. Multitasking processors typically execute many programs or processes "concurrently."
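By way of illustration only, the fetch sequence described above may be sketched as follows. The two-field instruction format, the operation names ("nop", "jump", "halt") and the function name run are assumptions made for this sketch and do not correspond to any particular processor.

```python
def run(memory, start=0, max_steps=100):
    """Fetch instructions by program counter; return the list of addresses fetched."""
    pc = start                      # program counter supplies the next address
    fetched = []
    for _ in range(max_steps):
        op, arg = memory[pc]        # address goes out, instruction comes back
        fetched.append(pc)
        if op == "halt":            # hypothetical opcode: stop fetching
            break
        if op == "jump":            # hypothetical opcode: branch to target address
            pc = arg
        else:                       # any other instruction: fall through to next address
            pc += 1
    return fetched


# Hypothetical four-word program: the jump at address 1 skips address 2.
program = [("nop", None), ("jump", 3), ("nop", None), ("halt", None)]
print(run(program))
```

The list returned by run is, in effect, a simple record of instruction execution flow, and it changes discontinuously at each taken branch.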
During program development, it is advantageous to verify the correctness of program instructions stored in the memory to be executed by the processor. However, the growth in software complexity, coupled with increasing processor clock speeds, has placed an increasing burden on application software developers. The cost of developing and debugging new software products is now a significant factor in processor selection. A processor's failure to adequately facilitate software debug results in longer customer development times and reduces the processor's attractiveness for use within industry. The need to provide software debug support is particularly acute within the embedded products industry, where specialized on-chip circuitry is often combined with a processor core.
In addition to the software engineer, other parties are also affected by debug tool configuration. These parties include: the trace algorithm developer who must search through captured software trace data that reflects instruction execution flow in a processor; the in-circuit emulator developer who deals with problems of signal synchronization, clock frequency and trace bandwidth; and the processor manufacturer who does not want a solution that results in increased processor cost or design and development complexity.
With desktop systems, complex multitasking operating systems are currently available to support debugging. However, the initial task of getting these operating systems running reliably often requires special development equipment. While not the standard in the desktop environment, the use of such equipment is often the approach taken within the embedded industry. Logic analyzers, read-only memory (ROM) emulators and in-circuit emulators (ICE) are frequently employed. In-circuit emulators do provide certain advantages over other debug environments, offering complete control and visibility over memory and register contents, as well as overlay and trace memory in case system memory is insufficient. Use of traditional in-circuit emulators, which involves interfacing a custom emulator back-end with a processor socket to allow communication between emulation equipment and the target system, is becoming increasingly difficult and expensive in today's age of exotic packages and shrinking product life cycles.
Assuming full-function in-circuit emulation is required, there are a few known processor manufacturing techniques able to offer the required support for emulation equipment. Most processors intended for personal computer (PC) systems utilize a multiplexed approach in which existing pins are multiplexed for use in software debug. This approach is not particularly desirable in the embedded industry, where it is more difficult to overload pin functionality.
Other more advanced processors multiplex debug pins in time. In such processors, the address bus is used to report software trace information during a BTA-cycle (Branch Target Address). The BTA-cycle, however, must be stolen from the regular bus operation. In debug environments where branch activity is high and cache hit rates are low, it becomes difficult to hide the BTA-cycles. The resulting conflict over access to the address bus necessitates processor throttle back to prevent loss of instruction trace information. In the communications industry, for example, software typically makes extensive use of branching and suffers poor cache utilization, often resulting in 20% throttle back or more. This amount of throttling is unacceptable for embedded products, which must accommodate real-time constraints.
In another approach, a second trace or slave processor is combined with the main processor, with the two processors operating in-step. Only the main processor is required to fetch instructions. The second, slave processor is used to monitor the fetched instructions on the data bus and keeps its internal state in synchronization with the main processor. The address bus of the slave processor functions to provide trace information. After power-up, e.g., via a JTAG (Joint Test Action Group) input, the second processor is switched into a slave mode of operation. Free from the need to fetch instructions, its address bus and other pins provide the necessary trace information.
Another existing approach involves building debug support into every processor, but only bonding-out the necessary signal pins in a limited number of packages. These specially packaged versions of the processor are used during debug and replaced with the smaller package for final production. This bond-out approach suffers from the need to support additional bond pad sites in all fabricated devices. This can be a burden in small packages and pad limited designs, particularly if a substantial number of extra pins are required by the debug support variant. Additionally, the debug capability of the specially packaged processors is unavailable in typical processor-based production systems.
Yet another approach includes the Background Debug Mode (BDM) implemented by Motorola, Inc. of Schaumburg, Ill. In BDM, limited on-chip debug circuitry is provided for basic run control. Through a dedicated serial link requiring additional pins, this approach allows a debugger to start and stop the target system and apply basic code breakpoints by inserting special instructions in system memory. Once halted, special commands are used to inspect memory variables and register contents. This serial link, however, does not provide trace support; additional dedicated pins and expensive external trace capture hardware are required to provide instruction trace data.
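By way of illustration only, the technique of applying a code breakpoint by inserting a special instruction in system memory may be sketched as follows. The sketch is a generic model and is not the BDM command set; the marker value BREAK, the class name Debugger and the function run_until_break are hypothetical names introduced for this example.

```python
BREAK = "BKPT"  # hypothetical special instruction that halts the target


class Debugger:
    """Minimal model of a debugger that patches breakpoints into system memory."""

    def __init__(self, memory):
        self.memory = memory
        self.saved = {}             # address -> original instruction

    def set_breakpoint(self, addr):
        if addr not in self.saved:  # never save the BREAK marker itself
            self.saved[addr] = self.memory[addr]
            self.memory[addr] = BREAK

    def clear_breakpoint(self, addr):
        self.memory[addr] = self.saved.pop(addr)  # restore the original instruction


def run_until_break(memory, pc=0):
    """Step through straight-line code; return the address where execution halts."""
    while pc < len(memory) and memory[pc] != BREAK:
        pc += 1
    return pc


mem = ["load", "add", "store", "ret"]
dbg = Debugger(mem)
dbg.set_breakpoint(2)
halted_at = run_until_break(mem)    # target halts at the patched address
dbg.clear_breakpoint(2)             # original instruction is put back
```

In this model the debugger learns only where the target halted. It obtains no record of the instructions executed before the halt, which is consistent with the absence of trace support noted above.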
Thus, the current solutions for software debugging suffer from a variety of limitations, including: increased packaging and development costs, circuit complexity, processor throttling, and bandwidth matching difficulties. Further, there is currently no adequate low-cost procedure for providing trace information. The limitations of the existing solutions are likely to be exacerbated in the future as internal processor clock frequencies continue to increase.