1. Field of the Invention
The present invention relates to methods for controlling processors that have multiple execution processing cores (“cores”), such as CPUs (Central Processing Units), MPUs (Micro Processing Units), DSPs (Digital Signal Processors), GPUs (Graphics Processing Units, also called graphics processing LSIs or geometry engines), and other applicable processors.
2. Description of the Related Art
Conventionally, computer systems such as servers that demand particularly high processing capability, for example for mission-critical enterprise processing, have improved that capability by connecting multiple processors in either a loosely coupled cluster structure or a tightly coupled SMP (Symmetrical Multi-Processor) structure.
However, loosely coupled cluster structures suffer from communication overhead between server nodes, and tightly coupled SMP structures suffer from complex server hardware. With current architectures, either approach is therefore limited in how far it can improve the performance of a single computer system.
In the field of high-end processors, multi-core processors such as CMPs (Chip Multi-Processors), which improve performance by implementing multiple cores within one processor, are now becoming mainstream.
However, in multi-core structures such as CMPs, the gain in processing performance from adding cores comes at a cost: controlling multiple cores becomes more complicated, and the larger die size lowers yield in semiconductor production. The yield loss caused by larger die size is a particularly important issue for CMPs and other multi-core processors.
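To make the die-size and yield relationship concrete, the sketch below uses the simple Poisson yield model Y = exp(-A * D0), where A is die area and D0 is defect density. This model and every number in it are assumptions for illustration only; they are not part of this description, and real processes use more detailed yield models.

```python
import math


def poisson_yield(die_area_cm2: float, defect_density_per_cm2: float) -> float:
    """Fraction of defect-free dies under the simple Poisson model Y = exp(-A * D0)."""
    return math.exp(-die_area_cm2 * defect_density_per_cm2)


# Hypothetical values chosen only to illustrate the trend (not from any real process).
D0 = 0.5                 # defects per cm^2 (assumed)
single_core_area = 1.0   # cm^2 (assumed)
dual_core_area = 2.0     # cm^2, assuming the die roughly doubles with a second core

y_single = poisson_yield(single_core_area, D0)
y_dual = poisson_yield(dual_core_area, D0)
print(f"{single_core_area} cm^2 die: yield {y_single:.3f}")
print(f"{dual_core_area} cm^2 die: yield {y_dual:.3f}")
```

Under this model, doubling the die area squares the yield fraction, which is why yield loss grows faster than linearly as cores are added.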
FIG. 1 shows the basic hardware structure of a conventional single-core processor.
Processor 101 comprises common block 102, which consists of secondary shared cache 111 and local interconnect interface 110, and core block 103, which consists of primary command cache 112, primary data cache 113, command branch unit 114, command issuance unit 115, load/store unit 116, general-purpose register file 117, integer arithmetic unit 118, integer arithmetic completion unit 119, floating point register file 120, floating point operation unit 121, and floating point operation completion unit 122. Processor 101 connects to other processors and to main memory through local interconnect interface 110, and instructions and data are supplied from the main memory.
Instructions supplied from local interconnect interface 110 pass through secondary shared cache 111, primary command cache 112, and command branch unit 114, are supplied to general-purpose register file 117 or floating point register file 120, and are then given to integer arithmetic unit 118 or floating point operation unit 121.
Data supplied from local interconnect interface 110 passes through secondary shared cache 111, primary data cache 113, and load/store unit 116 to general-purpose register file 117 or floating point register file 120, and from there is given to integer arithmetic unit 118 or floating point operation unit 121.
The data for operations of integer arithmetic unit 118, or the operation result of integer arithmetic unit 118, is written back to general-purpose register file 117 through integer arithmetic completion unit 119 and retained there. The data for operations of floating point operation unit 121, or the operation result of floating point operation unit 121, is written back to floating point register file 120 through floating point operation completion unit 122 and stored there.
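The integer and floating point paths of FIG. 1 mirror each other, each pairing one register file with one execution unit and one completion unit. The following minimal sketch records that correspondence using the reference numerals from the description; the class, the `"int"`/`"fp"` keys, and the `trace` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datapath:
    register_file: int     # reference numeral of the register file holding operands and results
    execution_unit: int    # reference numeral of the unit that performs the operation
    completion_unit: int   # reference numeral of the unit that writes the result back


# Numerals taken from the FIG. 1 description; the dictionary keys are hypothetical labels.
DATAPATHS = {
    "int": Datapath(register_file=117, execution_unit=118, completion_unit=119),
    "fp": Datapath(register_file=120, execution_unit=121, completion_unit=122),
}


def trace(op_class: str) -> list[str]:
    """Return the FIG. 1 blocks an operation of the given class passes through, in order."""
    p = DATAPATHS[op_class]
    return [
        f"register file {p.register_file} (operands)",
        f"execution unit {p.execution_unit}",
        f"completion unit {p.completion_unit}",
        f"register file {p.register_file} (result written back)",
    ]


for step in trace("fp"):
    print(step)
```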
To improve the processing performance of computer systems such as servers, one method is to increase the number of processors included in the computer system.
FIG. 2 shows a server structure that uses conventional symmetrical multiprocessors. Each processor 201 consists of a single CORE block 211 and secondary cache block 212.
The server system comprises multiple processors 201 connected via the processor local interconnect, processor local interconnect arbiter 202, service processor 203 connected via the JTAG (Joint Test Action Group) interface standardized as IEEE 1149.1, and system back plane crossbar controller 206 connected via the system back plane crossbar. Processor local interconnect arbiter 202 arbitrates between the processors connected to the processor local interconnect, and system back plane crossbar controller 206 controls the interface between the system boards connected to the system back plane crossbar.
The registers, scan flip-flops (scan FFs), etc. within CORE block 211 of each processor 201 are set by operating service processor 203 through service processor program 204 and service processor terminal 205 and performing scans through the JTAG interface.
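Setting registers "by performing a scan" means shifting bits serially through a chain of scan FFs, one bit per test clock, from TDI toward TDO. The sketch below models only that shift step of an IEEE 1149.1 data-register scan; it omits the TAP state machine (Capture-DR, Update-DR, and so on), and the chain orientation, the 8-bit register width, and all function names are assumptions for illustration.

```python
def to_bits(value: int, width: int) -> list[int]:
    """Bits of value, least significant first (the order in which they are shifted in)."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: list[int]) -> int:
    """Inverse of to_bits: bits[i] is bit i of the result."""
    return sum(bit << i for i, bit in enumerate(bits))


def shift_dr(chain: list[int], tdi_bits: list[int]) -> list[int]:
    """Shift tdi_bits into the scan chain one clock at a time.

    chain[0] is the FF nearest TDO and chain[-1] the FF nearest TDI (assumed
    orientation). Returns the bits that appeared on TDO, in order.
    """
    tdo_bits = []
    for bit in tdi_bits:
        tdo_bits.append(chain[0])      # the FF nearest TDO drives TDO
        chain[:] = chain[1:] + [bit]   # every FF takes its neighbour's value; TDI fills the far end
    return tdo_bits


# Hypothetical 8-bit setup register whose current contents are 0x3C.
register = to_bits(0x3C, 8)
old = shift_dr(register, to_bits(0xA5, 8))
print(hex(from_bits(register)), hex(from_bits(old)))   # new contents, then contents shifted out
```

Shifting a full register width both loads the new value and reads out the old one, which is how a service processor can set and inspect core state over the same serial interface.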
Next, as an example application of multi-core processors, FIG. 3 shows the structure of a server system that uses 2-core CMP multi-core processors. Each processor 301 comprises CORE 0 block 311, CORE 1 block 312, and CMP common block 310. The server system comprises multiple processors 301 connected via the processor local interconnect, processor local interconnect arbiter 202, service processor 203 connected via the JTAG interface, and system back plane crossbar controller 206 connected via the system back plane crossbar. The registers, scan FFs, etc. within CORE 0 block 311 and CORE 1 block 312 of each processor 301 are set by operating service processor 203 through service processor program 204 and service processor terminal 205 and performing scans through the JTAG interface.
Also, FIG. 4 shows conventional multi-core processor structure number 1. Processor 401 is a 2-core multi-core processor comprised of CORE 0 block 411, CORE 1 block 412, and CMP common block 410.
The JTAG controller includes Test Access Port (TAP) controller 413, decoder 415, load controller 416, and load register 417 on the CMP common block side; CORE 0 register controller 418 and CORE 0 setup register 419 on the CORE 0 block side; and CORE 1 register controller 420 and CORE 1 setup register 421 on the CORE 1 block side.
TAP controller 413 sets the load data (scan data) for the cores by scan-controlling load register 417. JTAG command 414 issued from TAP controller 413 is then decoded by decoder 415. Based on the decoded result, load controller 416 uses a load control signal (load valid) for the load data scan-set into load register 417 to control CORE 0 register controller 418 and CORE 1 register controller 420, so that the same load data is set simultaneously in CORE 0 setup register 419 and CORE 1 setup register 421.
In conventional structure number 1, only the same load data can be set in CORE 0 setup register 419 and CORE 1 setup register 421, so individual settings cannot be made for each core.
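The limitation of structure number 1 follows directly from its single load register and single broadcast load valid. The behavioural sketch below is a simplified model under that reading, not a hardware description; the class and method names are hypothetical, and the comments map them to the reference numerals of FIG. 4.

```python
class Structure1:
    """Simplified behavioural model of conventional structure number 1 (FIG. 4)."""

    def __init__(self, num_cores: int):
        self.load_register = 0                   # shared load register 417
        self.setup_registers = [0] * num_cores   # CORE 0 setup register 419, CORE 1 setup register 421, ...

    def scan_load(self, data: int) -> None:
        """TAP controller 413 scan-sets the load data into load register 417."""
        self.load_register = data

    def jtag_load_command(self) -> None:
        """JTAG command 414 -> decoder 415 -> load controller 416.

        The single load valid reaches every core's register controller,
        so every setup register receives the same load data.
        """
        for core in range(len(self.setup_registers)):
            self.setup_registers[core] = self.load_register


chip = Structure1(num_cores=2)
chip.scan_load(0x11)
chip.jtag_load_command()
chip.scan_load(0x22)        # attempt to give CORE 1 its own value...
chip.jtag_load_command()    # ...but CORE 0 is overwritten as well
print([hex(v) for v in chip.setup_registers])
```

Any attempt to load a per-core value ends with every core holding the most recently loaded data.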
Next, FIG. 5 shows conventional multi-core processor structure number 2. Processor 501 is a 2-core multi-core processor composed of CORE 0 block 411, CORE 1 block 412, and CMP common block 410. The JTAG controller includes TAP controller 413, decoder for CORE 0 515, load controller for CORE 0 516, load register for CORE 0 517, decoder for CORE 1 519, load controller for CORE 1 520, and load register for CORE 1 521 on the CMP common block side; CORE 0 register controller 418 and CORE 0 setup register 419 on the CORE 0 block side; and CORE 1 register controller 420 and CORE 1 setup register 421 on the CORE 1 block side.
TAP controller 413 sets the load data for each core by scan-controlling load register for CORE 0 517 and load register for CORE 1 521.
First, JTAG command-0 514 issued from TAP controller 413 is decoded by decoder for CORE 0 515. Based on the decoded result, load controller for CORE 0 516 uses a load control signal (load valid) for the load data scan-set into load register for CORE 0 517 to control CORE 0 register controller 418, which sets that load data in CORE 0 setup register 419.
Next, JTAG command-1 518 issued from TAP controller 413 is decoded by decoder for CORE 1 519. Based on the decoded result, load controller for CORE 1 520 uses a load control signal (load valid) for the load data scan-set into load register for CORE 1 521 to control CORE 1 register controller 420, which sets that load data in CORE 1 setup register 421.
In conventional structure number 2, one set of core control hardware (a decoder, a load controller, and a load register, such as decoder for CORE 0 515, load controller for CORE 0 516, and load register for CORE 0 517 for CORE 0, and decoder for CORE 1 519, load controller for CORE 1 520, and load register for CORE 1 521 for CORE 1) is required for each core. Because the multi-core control logic therefore grows in proportion to the number of cores, this structure is difficult to apply to the large-scale multi-core processors likely to become mainstream in the future.
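Structure number 2 gains individual per-core settings by duplicating the decoder, load controller, and load register for every core. The simplified behavioural sketch below shows both effects: independent loads succeed, and the count of duplicated control blocks grows linearly with the core count. Class and method names are hypothetical, and counting three blocks per core is a modelling assumption based on the blocks listed for FIG. 5.

```python
class Structure2:
    """Simplified behavioural model of conventional structure number 2 (FIG. 5)."""

    # Control blocks duplicated for every core on the CMP common block side.
    PER_CORE_BLOCKS = ("decoder", "load controller", "load register")

    def __init__(self, num_cores: int):
        self.load_registers = [0] * num_cores    # load register for CORE 0 517, for CORE 1 521, ...
        self.setup_registers = [0] * num_cores   # CORE 0 setup register 419, CORE 1 setup register 421, ...

    def scan_load(self, core: int, data: int) -> None:
        """TAP controller 413 scan-sets data into that core's own load register."""
        self.load_registers[core] = data

    def jtag_load_command(self, core: int) -> None:
        """JTAG command-n is decoded by that core's own decoder, and its load
        controller asserts load valid only toward that core's register controller."""
        self.setup_registers[core] = self.load_registers[core]

    def control_block_count(self) -> int:
        """Number of duplicated control blocks on the CMP common block side."""
        return len(self.PER_CORE_BLOCKS) * len(self.load_registers)


chip = Structure2(num_cores=2)
chip.scan_load(0, 0x11)
chip.jtag_load_command(0)
chip.scan_load(1, 0x22)
chip.jtag_load_command(1)
print([hex(v) for v in chip.setup_registers])   # individual settings are now possible

for n in (2, 8, 32):
    print(n, "cores ->", Structure2(n).control_block_count(), "duplicated control blocks")
```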
Also, FIG. 6 shows conventional multi-core processor structure number 3. Processor 601 is a 2-core multi-core processor composed of CORE 0 block 411, CORE 1 block 412, and CMP common block 410. The JTAG controller includes TAP controller 413, decoder 616, load controller 617, and load register 618 on the CMP common block side; CORE 0 register controller 418 and CORE 0 setup register 419 on the CORE 0 block side; and CORE 1 register controller 420 and CORE 1 setup register 421 on the CORE 1 block side.
First, TAP controller 413 sets the load data for CORE 0 by scan-controlling load register 618. JTAG command-0 614 issued from TAP controller 413 is decoded by decoder 616, and based on the decoded result, load controller 617 uses a load control signal (load valid-0) for the load data scan-set into load register 618 to control CORE 0 register controller 418, so that the load data is set in CORE 0 setup register 419.
Next, TAP controller 413 sets the load data for CORE 1 by scan-controlling load register 618. JTAG command-1 615 issued from TAP controller 413 is decoded by decoder 616, and based on the decoded result, load controller 617 uses a load control signal (load valid-1) for the load data scan-set into load register 618 to control CORE 1 register controller 420, so that the load data is set in CORE 1 setup register 421.
In conventional structure number 3, one JTAG command is required per core, such as JTAG command-0 614 and JTAG command-1 615. Because the decode logic therefore grows with the number of cores, this structure is difficult to apply to the large-scale multi-core processors likely to become mainstream in the future.
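Structure number 3 shares one load register but needs a distinct JTAG command, and hence a distinct decoder entry, for every core. The simplified behavioural sketch below models decoder 616 as a lookup table. The opcode strings such as `"LOAD_CORE0"`, the class, and the method names are hypothetical; the real command encodings are not given in this description.

```python
class Structure3:
    """Simplified behavioural model of conventional structure number 3 (FIG. 6)."""

    def __init__(self, num_cores: int):
        self.load_register = 0                   # shared load register 618
        self.setup_registers = [0] * num_cores   # CORE 0 setup register 419, CORE 1 setup register 421, ...
        # Decoder 616 must recognise one distinct JTAG command per core (hypothetical opcode names).
        self.decode_table = {f"LOAD_CORE{i}": i for i in range(num_cores)}

    def scan_load(self, data: int) -> None:
        """TAP controller 413 scan-sets load data into shared load register 618."""
        self.load_register = data

    def jtag_command(self, opcode: str) -> None:
        """Decoder 616 selects the core; load controller 617 asserts load valid-n for it."""
        core = self.decode_table[opcode]
        self.setup_registers[core] = self.load_register


chip = Structure3(num_cores=2)
chip.scan_load(0x11)
chip.jtag_command("LOAD_CORE0")
chip.scan_load(0x22)
chip.jtag_command("LOAD_CORE1")
print([hex(v) for v in chip.setup_registers])

print(len(Structure3(64).decode_table), "commands to decode for 64 cores")
```

Individual settings work, but the decoder must distinguish as many commands as there are cores.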
Other patent literature for chip multiprocessors includes Japanese Unexamined Patent Application Publication 2001-51957.
As described above, conventional processors with multi-core structures such as CMPs suffer from increased control complexity for multiple cores and reduced yield due to increased die size. There is therefore a need for a technique that can control multiple cores without undue complexity or loss of yield.