Peripheral Component Interconnect Express (PCIe) bus technology is a widely used solution for interconnecting devices within a system. It features high performance, low latency, low power consumption, excellent scalability, and strong interference immunity. The current standard has evolved to PCI Express V3.0, in which a maximum per-lane transfer rate of 8.0 GT/s and a link width of 32 lanes may provide a transmission bandwidth, counting both directions, of up to 8.0*32*2=512 Gbps. Because of these advantages, PCIe transmission is now also used for inter-system interconnection, from which the non-transparent bridge (NTB, Non-Transparent Bridge) technology is derived.
Refer to FIG. 1, which is a structural diagram of a prior-art NTB transmission system that applies the PCIe bus technology, with a specific data flow marked. Data is transmitted between a first system and a second system over a PCIe bus between the NT ports of two PCIe switches.
Referring to FIG. 1, the operation in which the first system writes data into a memory of the second system may be divided into the following five processes:
Process 1. A first central processing unit (CPU) of the first system acquires data from a first memory module and initiates a memory write request to a first northbridge, where a destination address of the memory write request is an address of a first NT port of a first PCIe switch.
Process 2. After translating the destination address, the first northbridge forms a memory write transaction-layer packet (TLP, Transaction Layer Packet) on a first root complex (RC, Root Complex) module and sends it to the first PCIe switch.
Process 3. After decoding the memory write TLP, the first PCIe switch converts the destination address of the memory write TLP into a corresponding address of a second memory module by using an address translation table on the first NT port and sends the memory write TLP after destination address translation to a second PCIe switch through a second NT port.
Process 4. As an active device, the second PCIe switch sends the memory write TLP to a second northbridge through a second root complex.
Process 5. The second northbridge depacketizes the memory write TLP, converts it into the memory write request, and sends it to the second memory module to complete the memory write task.
In the foregoing processes for writing data into the second memory module, the memory write TLP is generated by the first root complex of the first northbridge and is terminated by the second root complex of the second northbridge. The valid payload of the memory write TLP remains unchanged throughout; only the destination address and the source identifier (ID) in the packet header change, because of the address translation on the first NT port.
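The header rewrite performed on the first NT port in process 3 can be sketched as follows. This is a minimal illustration only: the `Window` and `Tlp` types, the single-window translation table, and all addresses and IDs are hypothetical and are not taken from any real switch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """One hypothetical entry of the NT port's address translation table."""
    local_base: int   # base of the NT port window as seen by the first system
    remote_base: int  # base of the target region in the second memory module
    size: int         # window length in bytes


@dataclass(frozen=True)
class Tlp:
    """Simplified memory write TLP: two header fields plus the payload."""
    dest_addr: int
    source_id: int
    payload: bytes


def translate(window: Window, addr: int) -> int:
    """Map an address inside the NT window to the second system's address."""
    offset = addr - window.local_base
    if not 0 <= offset < window.size:
        raise ValueError("address outside the NT port window")
    return window.remote_base + offset


def forward(tlp: Tlp, window: Window, nt_source_id: int) -> Tlp:
    """Rewrite destination address and source ID; the payload is untouched."""
    return Tlp(translate(window, tlp.dest_addr), nt_source_id, tlp.payload)


# Illustrative values only.
window = Window(local_base=0xD000_0000, remote_base=0x1000_0000, size=0x10_0000)
incoming = Tlp(dest_addr=0xD000_0040, source_id=0x0100, payload=b"\x01\x02\x03\x04")
outgoing = forward(incoming, window, nt_source_id=0x0200)
print(hex(outgoing.dest_addr), hex(outgoing.source_id), outgoing.payload == incoming.payload)
```

The point of the sketch is that the payload passes through unchanged, so the payload size chosen by the first CPU in process 1 is what reaches the second system.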
However, PCIe transmission efficiency = Number of bytes of valid payload / (Number of bytes of valid payload + Header length). The length of the TLP packet header is determined by the type of the access command and cannot be changed. Therefore, the larger the valid payload carried by a single TLP, the higher the PCIe transmission efficiency.
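As a worked example of this formula, the sketch below evaluates it for payloads of 4 bytes and 128 bytes, the two payload sizes reported for this setup. The 12-byte header length is an assumption (a three-doubleword memory write header with 32-bit addressing); link-layer and physical-layer overhead is ignored.

```python
def tlp_efficiency(payload_bytes: int, header_bytes: int) -> float:
    """Efficiency = payload / (payload + header), as defined in the text."""
    return payload_bytes / (payload_bytes + header_bytes)


# Assumption: 3-doubleword (12-byte) header; other overhead is not modeled.
HEADER_BYTES = 12

for payload in (4, 128):
    print(f"{payload:>3} B payload: {tlp_efficiency(payload, HEADER_BYTES):.1%}")
# prints:
#   4 B payload: 25.0%
# 128 B payload: 91.4%
```

Under this assumption the efficiency rises from 25.0% to about 91.4%, a factor of roughly 3.7.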
In process 1, the length of the valid payload carried in the memory write request that the first CPU initiates to the first northbridge determines the valid payload of the memory write TLP assembled by the first root complex on the first northbridge. Unfortunately, when a CPU is used to move data, a memcopy instruction is generally used, and the carried valid payload is only 4 bytes long. This has been confirmed by capturing TLPs on a PCIe bus with a logic analyzer.
The write combining technology may combine the data of multiple memcopy instructions, after which the first CPU initiates a single memory write request and sends the data to the first northbridge at one time, thereby obtaining a relatively large valid payload and improving the PCIe transmission efficiency. Write combining may be used when data is written into a memory space with a write-combine attribute: the data is not cached but may be combined inside the CPU into a single write operation, which reduces the number of memory accesses. Therefore, setting the memory space of the first NT port of the first PCIe switch to the write-combine attribute may be expected to improve the PCIe transmission efficiency in the first system shown in FIG. 1. Completed tests show that, when the write combining technology is not used, the valid payload of the memory write TLP is only 4 bytes, and after the write combining technology is used, the valid payload of the memory write TLP may increase to the length of a cache line, that is, 128 bytes. After the write combining technology is used on a known model, the overall transmission performance improves twofold, and the PCIe transmission efficiency improves about fourfold.
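The combining behavior described above can be modeled with a short simulation. This is a simplified sketch, not a description of any particular CPU: it assumes a single combining buffer, 4-byte writes that never straddle a line boundary, and a flush only when a write is non-contiguous or falls in a different 128-byte line. The function name `combine_writes` is hypothetical.

```python
LINE_BYTES = 128  # cache line length reported in the text


def combine_writes(writes, line_bytes=LINE_BYTES):
    """Merge contiguous small writes into bursts of at most one line.

    writes: iterable of (address, data) pairs, data being a bytes object.
    Returns a list of (start_address, data) bursts in issue order.
    """
    bursts = []
    start = None
    buf = bytearray()
    for addr, data in writes:
        contiguous = start is not None and addr == start + len(buf)
        same_line = start is not None and addr // line_bytes == start // line_bytes
        if not (contiguous and same_line):
            # Flush the pending burst (if any) and start a new one.
            if buf:
                bursts.append((start, bytes(buf)))
            start, buf = addr, bytearray()
        buf += data
    if buf:
        bursts.append((start, bytes(buf)))
    return bursts


# 64 sequential 4-byte writes (256 bytes) starting at a line-aligned address.
writes = [(0x1000 + 4 * i, bytes([i]) * 4) for i in range(64)]
bursts = combine_writes(writes)
print(len(writes), "writes ->", len(bursts), "bursts of", len(bursts[0][1]), "bytes")
```

In this model, 64 separate 4-byte requests collapse into two 128-byte requests, which is the effect the write-combine attribute is expected to have on the payload of each memory write TLP.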
However, allocating a memory space with a write-combine attribute to an NT port of a PCIe switch is currently a difficult problem.
In an existing method for allocating a memory space with a write-combine attribute, a user manually modifies the memory type range register (MTRR, Memory Type Range Register) attribute table in the operating system to add the write-combine attribute to the memory space of a device.
However, modifications to the MTRR attribute table must follow certain rules: for example, the size of a memory space must strictly be an integral power of 2, and the starting address of a memory space must be aligned to a boundary of that size. Consequently, adding the write-combine attribute to the memory space of one device may disturb the starting address and size of another memory space that follows it, so that the subsequent memory space must be re-split. The algorithm for re-allocating the subsequent memory space is complex and not easy to implement; in addition, a re-allocation that satisfies the foregoing rules cannot always be found. Therefore, the method of manually changing the attribute of a device's memory space to write-combine is not reliable.
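The two rules named above, and the re-splitting they force, can be illustrated with a small sketch. The helper names and the example addresses are hypothetical, and the greedy split is only one possible strategy; it also ignores that a processor offers only a limited number of range registers, which is one reason a valid split may not exist in practice.

```python
def is_valid_mtrr_range(base: int, size: int) -> bool:
    """Check the two rules from the text: power-of-two size, size-aligned base."""
    power_of_two = size > 0 and size & (size - 1) == 0
    return power_of_two and base % size == 0


def split_into_valid_ranges(base: int, size: int):
    """Greedily split [base, base + size) into ranges that satisfy both rules."""
    blocks = []
    while size > 0:
        largest_fit = 1 << (size.bit_length() - 1)      # largest power of two <= size
        alignment = base & -base if base else largest_fit  # lowest set bit of base
        block = min(alignment, largest_fit)
        blocks.append((base, block))
        base += block
        size -= block
    return blocks


# Illustrative 3 MiB region: not a power of two, so it cannot be one range.
base, size = 0xD000_0000, 0x30_0000
print(is_valid_mtrr_range(base, size))
for b, s in split_into_valid_ranges(base, size):
    print(hex(b), hex(s), is_valid_mtrr_range(b, s))
```

Here the 3 MiB region has to be described as a 2 MiB range followed by a 1 MiB range, so one region consumes two entries; shifting the start of a neighboring region can similarly multiply the entries it needs.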