In a computer system such as a server or a personal computer (PC), there is a method of using a system bus that is called DMA (direct memory access). With DMA, use of the system bus, which is normally performed by a CPU (central processing unit), is also permitted to devices (bus masters) other than the CPU. In addition, in a multi-processor system including a plurality of CPUs (or CPU cores), the CPUs are designed to hand over the use of the system bus to one another.
A system in which there are a plurality of devices (including a CPU) permitted to use a system bus is called a multi-master system, and the bus used in the multi-master system is called a multi-master bus.
In the multi-master system, only one device (which may be the CPU) can use the bus at any given time. Accordingly, the multi-master system includes an arbitration circuit (bus arbiter) that, when DMA requests are issued from a plurality of devices at the same time, arbitrates the issued DMA requests and gives a DMA permission to one of the devices.
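The arbitration described above can be illustrated with a minimal sketch. The text does not state which selection policy the bus arbiter uses, so the round-robin policy below is an assumption, and the class name `RoundRobinArbiter` and its methods are hypothetical names chosen for illustration only.

```python
class RoundRobinArbiter:
    """Grants the bus to at most one requester per arbitration cycle.

    Round-robin selection is an assumed policy; the source text only
    requires that exactly one requesting device receives permission.
    """

    def __init__(self, num_masters):
        self.num_masters = num_masters
        # Start so that master 0 is examined first on the first cycle.
        self.last_granted = num_masters - 1

    def arbitrate(self, requests):
        """requests: set of master indices asserting a DMA request.

        Returns the granted master index, or None if nobody requested.
        """
        for offset in range(1, self.num_masters + 1):
            candidate = (self.last_granted + offset) % self.num_masters
            if candidate in requests:
                self.last_granted = candidate
                return candidate
        return None


arbiter = RoundRobinArbiter(num_masters=3)
# All three masters request the bus on four consecutive cycles.
grants = [arbiter.arbitrate({0, 1, 2}) for _ in range(4)]
print(grants)
```

Whatever the policy, the key property is that each cycle yields a single grant even when several requests arrive together.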
FIG. 14 is a diagram that illustrates an example of the configuration of a computer system (multi-master system) 100 including a device that issues a DMA request, and FIG. 15 is a diagram that illustrates an example of the configuration of a bus arbiter 400 that arbitrates DMA requests in the computer system 100.
In the example illustrated in FIG. 14, devices that are DMA factors in the computer system 100 are HDDs (hard disk drives), which are not illustrated in the figure, corresponding to a USB (universal serial bus) interface and an SATA (serial advanced technology attachment) interface. The computer system 100 includes a USB host controller (UHCI: universal host controller interface) and an SATA host controller (AHCI: advanced host controller interface) as controllers that are used for connecting such devices to a system bus 600.
Such host controllers are configured as PCI devices 300-1 to 300-3 and are connected to a PCI bus that is the system bus 600. DMA requests generated from the PCI devices 300-1 to 300-3 are arbitrated by a bus arbiter 400 that is compliant with the PCI bus specification and are issued to a bus arbiter 820 as access requests (DMA requests) to a main memory 1000.
More specifically, as illustrated in FIG. 14, the PCI devices (a plurality of bus master devices) 300-1 to 300-3 serving as DMA request issuing sources issue DMA requests (access requests to make access to the main memory 1000) denoted by broken-line arrows A to C to the bus arbiter 400.
In the bus arbiter 400, as illustrated in FIG. 15, input timings of the DMA requests A to C are adjusted by a synchronization processing unit 410. An arbitration processing unit 420 that has received the DMA requests A to C requests a permission to use the host bus 810 by issuing a DMA request D to the bus arbiter 820 disposed on the high-level bus (host bus 810) side through a bus bridge 700. In the bus arbiter 820, it is determined whether or not the use of the host bus 810 is permitted in accordance with the use state of the host bus 810 for the DMA request. When the use of the host bus 810 is permitted by the bus arbiter 820, and a DMA permission signal E is transmitted, in the bus arbiter 400, a DMA request is arbitrated (selected) by the arbitration processing unit 420, and the output timing is adjusted by the synchronization processing unit 430. Then, DMA permissions (output signals; see A′ to C′ illustrated in FIGS. 14 and 15) are output to the PCI devices 300-1 to 300-3 corresponding to the selected (accepted) DMA requests from the bus arbiter 400. The device (any one of the PCI devices 300-1 to 300-3) that has received these output signals A′ to C′ acquires the right of use of the host bus 810.
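The two-level flow above (requests A to C, forwarded request D, permission signal E, and grants A′ to C′) can be sketched roughly as follows. This is a simplified model, not the circuit of FIG. 15: the synchronization processing units are omitted, the function names are hypothetical, and the selection rule (`min`, i.e. the lowest device name) is an assumption made only so that the example is deterministic.

```python
def upstream_permits(host_bus_busy):
    """Stand-in for bus arbiter 820: permit use only when the host bus is free."""
    return not host_bus_busy


def downstream_arbitrate(requests, host_bus_busy, select=min):
    """Stand-in for bus arbiter 400.

    requests: dict mapping a device name to True if it asserts a DMA request.
    Returns a dict mapping each device name to True if it receives a grant.
    """
    grants = {device: False for device in requests}
    pending = [device for device, asserted in requests.items() if asserted]
    if not pending:
        return grants                 # no request D is forwarded upstream
    if not upstream_permits(host_bus_busy):
        return grants                 # permission signal E not received yet
    grants[select(pending)] = True    # exactly one of A' to C' is asserted
    return grants


requests = {"300-1": True, "300-2": False, "300-3": True}
print(downstream_arbitrate(requests, host_bus_busy=False))
print(downstream_arbitrate(requests, host_bus_busy=True))
```

In this model a device is granted the bus only when both levels agree: the upstream arbiter permits use of the host bus, and the downstream arbiter selects that device among the pending requests.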
Here, the use of the host bus 810 is performed by transmitting a bus command representing the type (read, write, or the like) of the access together with a memory address desired to be accessed from the bus master. In other words, the use of the host bus 810 that is made by the bus master is performed similarly to a read/write operation of the data of a device, which is made by the CPU, using the host bus 810.
As the PCI bus specification has come into wide use, the relation between DMA operation and the total performance of a system has been studied and clarified, and it is known that performing DMA in small pieces lowers the efficiency of the entire system. That is, the amount of data that is read from or written into the main memory 1000 by a DMA performed by the bus master is large (for example, about several kilobytes to several megabytes). Accordingly, in the computer system 100, when access permission is given for DMA requests in small pieces (for example, every several bytes), the issuance of a DMA request and the permission thereof are performed repeatedly, whereby the processing efficiency of the entire system is lowered.
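The cost of small-piece DMA can be made concrete with a simple cycle count. The model below is an assumption introduced for illustration: each request/permission exchange is charged a fixed number of overhead cycles, and the bus moves a fixed number of bytes per cycle. The function names and all numeric values are hypothetical and do not come from the PCI specification.

```python
def handshake_count(total_bytes, bytes_per_grant):
    """Number of request/permission exchanges needed (ceiling division)."""
    return -(-total_bytes // bytes_per_grant)


def transfer_cycles(total_bytes, bytes_per_grant, bytes_per_cycle, overhead_cycles):
    """Total bus cycles: handshake overhead plus the data cycles themselves."""
    data_cycles = -(-total_bytes // bytes_per_cycle)
    return handshake_count(total_bytes, bytes_per_grant) * overhead_cycles + data_cycles


# Hypothetical figures: a 4096-byte transfer, 4 bytes per bus cycle,
# 2 overhead cycles per request/permission exchange.
small_pieces = transfer_cycles(4096, bytes_per_grant=4, bytes_per_cycle=4, overhead_cycles=2)
one_burst = transfer_cycles(4096, bytes_per_grant=4096, bytes_per_cycle=4, overhead_cycles=2)
print(small_pieces, one_burst)
```

Under these assumed numbers the data cycles are identical in both cases, so the difference comes entirely from the repeated request and permission exchanges.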
For such a reason and the like, the bus arbiter 400 is designed so as to respond to a DMA request at higher speed.
Meanwhile, there are cases where a large-capacity cache memory (hereinafter, referred to as a CM) 220 is mounted in a CPU in accordance with an increase in the processing speed of a recent computer system. A multi-master system in which such a CPU is mounted is desired to have a bus snoop function.
In the CM 220, a set of data and address information representing a place in the main memory 1000 at which the data is present is stored. When each device on the bus performs a write operation for the main memory 1000 using the host bus 810, in order to maintain coherency between the main memory 1000 and the CM 220, data written into the main memory 1000 through a memory controller 900 needs to be reflected also on the CM 220 of the CPU.
According to the bus snoop function, each device on the host bus 810 monitors the operation of the bus and detects whether or not address information corresponding to a memory address transmitted on the host bus 810 is present in the CM 220 included therein. When the address information corresponding to the memory address transmitted on the host bus 810 is detected in the CM 220 included therein, each device on the host bus 810 updates the CM 220, or the like, in accordance with the address information. In this manner, each device compares a memory address flowing on the host bus 810 with all the address information stored in the CM 220 every time any access operation is performed.
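A minimal software sketch of the snoop comparison follows. It is a model only: real hardware performs the comparisons with comparators rather than a loop, the text leaves open whether a hit updates or otherwise handles the cached entry (the sketch updates it), and the class `SnoopingCache` and its members are hypothetical names.

```python
class SnoopingCache:
    """Toy model of a cache that snoops writes observed on the bus."""

    def __init__(self):
        self.lines = {}        # address information -> cached data
        self.comparisons = 0   # number of address comparisons performed

    def fill(self, address, data):
        self.lines[address] = data

    def snoop_write(self, address, data):
        """Compare the bus address with every stored address.

        Returns True when the address is cached, in which case the cached
        copy is updated so that it stays coherent with main memory.
        """
        hit = False
        for stored_address in self.lines:
            self.comparisons += 1
            if stored_address == address:
                hit = True
        if hit:
            self.lines[address] = data
        return hit


cache = SnoopingCache()
cache.fill(0x100, "old")
cache.fill(0x200, "other")
print(cache.snoop_write(0x100, "new"), cache.lines[0x100], cache.comparisons)
```

Because every observed access is compared against all stored address information, the comparison work grows with the number of cached entries.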
Next, the power control of a CPU including the CM 220 will be described.
Tasks executed by the CPU are managed by an OS (operating system), and, in a case where there is no executed task (in the case of an idle state), the OS suppresses unnecessary power consumption of the CPU by stopping the CPU or turning off the power of the CPU. For example, in a case where the CPU includes a write-back-type CM 220, before the OS turns off the power, the CPU performs a process of reflecting data stored in the CM 220 on the main memory 1000.
This reflection process is performed by the CPU searching the CM 220 for portions whose content differs from that of the main memory 1000 and writing the data of those portions into the main memory 1000. The time taken by this reflection process depends on the size of the CM 220 and the performance of the memory. In recent years, there are CPUs each including a CM 220 having a capacity of about 6 megabytes, and writing all the content of the CM 220 into the main memory 1000 by the reflection process may take several milliseconds.
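The reflection (write-back) process can be sketched as below. The per-line dirty flag, the dictionary layout, and the function names are assumptions for illustration; the time estimate simply divides the amount of data by an assumed write bandwidth, and the 2 GB/s figure used in the example is hypothetical rather than taken from the text.

```python
def flush_dirty_lines(cache, main_memory):
    """Write every modified cache line back to main memory.

    cache: dict mapping address -> (data, dirty_flag)
    main_memory: dict mapping address -> data
    Returns the number of lines written back.
    """
    written = 0
    for address, (data, dirty) in list(cache.items()):
        if dirty:
            main_memory[address] = data
            cache[address] = (data, False)   # line now matches main memory
            written += 1
    return written


def estimate_flush_seconds(bytes_to_write, write_bandwidth_bytes_per_s):
    """Rough lower bound on flush time, ignoring search and bus overhead."""
    return bytes_to_write / write_bandwidth_bytes_per_s


cache = {0x00: ("a", True), 0x40: ("b", False), 0x80: ("c", True)}
memory = {0x00: "stale", 0x40: "b", 0x80: "stale"}
print(flush_dirty_lines(cache, memory), memory)
# Hypothetical: 6,000,000 bytes written at an assumed 2 GB/s.
print(estimate_flush_seconds(6_000_000, 2_000_000_000))
```

With the assumed bandwidth, writing back a fully modified cache of roughly 6 megabytes lands in the millisecond range, which is consistent in order of magnitude with the figure given above.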
In addition, the computer system performing such power control may include hardware that automatically turns on the power of the CPU and the CM 220, triggered by an interrupt request (IRQ) that is generated in a case where a new task is to be executed after the power of the CPU has been turned off.
Next, the sequence of the power control of the CPU will be described with reference to FIG. 16.
FIG. 16 is a flowchart that illustrates an example of power control of a CPU including the CM 220 in the idle state of the OS.
First, the OS initializes a power-off timer included in each CPU (CPU core 210) and starts its count (Step S1), and it is determined whether a system (or a task) assigned to each CPU is in the idle state (Step S2).
When the system is not in the idle state, in other words, when the system is in execution (No Route of Step S2), the processes of Steps S1 and S2 are repeated, for example, at every predetermined time, until the system enters the idle state. On the other hand, when the system is in the idle state (Yes Route of Step S2), the OS determines, for each CPU (core 210) corresponding to the system (or the task) that is in the idle state, whether or not the power-off timer has expired (Step S3).
In a case where the power-off timer has not expired (No Route of Step S3), the process returns to Step S2. On the other hand, in a case where the power-off timer has expired (Yes Route of Step S3), the content of the CM 220 included in the CPU is output to the main memory 1000 (Step S4). Then, the power of the CPU including the CM 220 is cut off by the OS (Step S5).
Subsequently, when the hardware that detects an IRQ determines that an IRQ has been generated (Yes Route of Step S6), power is supplied to the CPU and the CM 220 (Step S7), and the process relating to the system and the bus snoop process are performed by the CPU. After power is supplied to the CPU in Step S7, the process returns to Step S1. Until an IRQ is generated, the power-off state of the CPU and the CM 220 is maintained (Step S6 and No Route of Step S6).
According to such a process, the power of the CPU is controlled.
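The flow of FIG. 16 (Steps S1 to S7) can be approximated by the tick-based simulation below. It is a sketch under stated assumptions: time is divided into discrete ticks, the power-off timer is a simple tick counter, and the function name, the event format, and the action strings are all hypothetical.

```python
def simulate_power_control(events, timeout_ticks):
    """Simulate the idle-time power control of one CPU.

    events: list of (idle, irq) boolean pairs, one pair per tick.
    timeout_ticks: idle ticks after which the power-off timer expires.
    Returns the list of actions taken, in order.
    """
    actions = []
    powered = True
    timer = 0                                  # S1: timer initialized
    for idle, irq in events:
        if powered:
            if not idle:                       # S2, No route
                timer = 0                      # back to S1
                continue
            timer += 1
            if timer >= timeout_ticks:         # S3, Yes route
                actions.append("flush_cache")  # S4: CM content to main memory
                actions.append("power_off")    # S5
                powered = False
        elif irq:                              # S6, Yes route
            actions.append("power_on")         # S7
            powered = True
            timer = 0                          # back to S1
    return actions


events = [(False, False), (True, False), (True, False),
          (True, False), (True, False), (True, True), (False, False)]
print(simulate_power_control(events, timeout_ticks=2))
```

In this model the cache flush always precedes the power cut, and only an IRQ brings the CPU and cache back, mirroring the ordering of Steps S4 to S7.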
In addition, as a related technology, there is a technology in which, in order to perform data transmission efficiently and to enable data transmission in a DMA mode without exclusive use of the bus, a DMA unit control unit requests the right of use of the system bus from a CPU, or releases the right of use of the system bus to the CPU, in accordance with the state of a DMA operation enable signal while a DMA transmission request signal in the On state is supplied from the CPU.
Furthermore, as another related technology, there is a technology for achieving low power consumption by performing clock control so that DMA is performed collectively: in accordance with an instruction from a transmission control unit, a clock generating unit supplies a high-speed clock to a CPU that is in the sleep state and to the memory only for the required period, at the timing at which DMA transmission is performed.
Patent Literature 1: Japanese Laid-open Patent Publication No. 2000-90045
Patent Literature 2: Japanese Laid-open Patent Publication No. 2005-190332
As the capacity of the CM increases, the amount of address information that is compared by the CPU in the bus snoop process increases.
Since the CM is provided so that data exchange between the CPU and the main memory is performed at high speed, it is not preferable, from the viewpoint of increased processing time, for the CPU to compare the address information sequentially in the bus snoop process each time data flows on the host bus. Thus, for example, a configuration may be considered in which the required number of comparators is included and their comparison operations are performed at the same time; however, power corresponding to the number of the comparators is then consumed. While this power consumption differs in accordance with the capacity, the cache system, and the like of the CM, in recent years, in which the capacity and speed of the CM have increased, there is also an example in which the CM uses about 40% of the power consumed by the CPU.
In recent years, while there are cases where a CPU has a function for power control, it is difficult to suppress the power consumption of the CM. The reason for this is that, even in a case where the power consumption can be reduced by stopping the operation, as long as another bus master uses the host bus, the CPU needs to keep at least a part of the CM in the operation state for the bus snoop.
In order to reduce the power consumption of the CM, the content of the CM is vacated, and then, the cache operation thereof is stopped. However, as described above, the burden for vacating the CM is large, and, particularly, in a case where the CM of the write-back-type is used, corresponding time and power are needed. In addition, in a state (power-off state) in which the CM does not operate, it is difficult to achieve regular performance even by operating the CPU, and, in order to restart the operation of the CPU, the CM is returned to be in the operation state as well.
As above, since time and power are also taken for a state transition between a stop state and an operation state of the CM, there are cases where the power consumption actually increases as the frequency of the transition increases. The frequency of the transition changes in accordance with the number of devices making DMA requests and interrupt requests and also in accordance with the performance of the memory. In addition, when the time until a DMA request or an interrupt process request is received is long, the performance of the device may be degraded, or, in the worst case, the device may be incapable of continuing the operation. Accordingly, conventionally, a computer system (a host bus or the like) is frequently designed to suppress a reduction of the processing speed by responding to an interrupt request from each device as soon as possible, and there is a problem that the reduction of the power consumption does not advance.
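Why frequent transitions can increase power consumption may be seen from a simple energy balance. The model below is an assumption introduced for illustration: the CM draws a constant power while operating and none while stopped, and each stop/restart pair costs a fixed transition energy. The function names and the numeric values are hypothetical.

```python
def net_energy_saved(active_power_w, off_time_s, transition_energy_j):
    """Energy saved by one stop/restart cycle; negative means a net loss."""
    return active_power_w * off_time_s - transition_energy_j


def break_even_off_time(active_power_w, transition_energy_j):
    """Shortest stop duration for which stopping the CM pays off."""
    return transition_energy_j / active_power_w


# Hypothetical figures: 2 W while operating, 0.05 J per stop/restart pair.
print(net_energy_saved(2.0, off_time_s=0.01, transition_energy_j=0.05) < 0)
print(break_even_off_time(2.0, 0.05))
```

Under this model, when DMA requests or interrupts arrive so often that each stop period is shorter than the break-even time, every transition costs more energy than it saves.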
As above, in a case where a DMA is generated in the multi-master system, the bus snoop process is performed by the CPU, and there is a problem that the power consumption of the CPU (CM) increases.
In addition, in a case where a DMA is generated in a state in which the power of the CPU is cut off by the OS, power is input to the CPU (CM), and the CM transits from the stop state to the operation state, whereby there is a problem in that the processing time and the power consumption increase.