A digital media server is a computing device that streams digital media content onto a digital data transmission network. In the past, digital media servers have been designed using a general-purpose personal computer (PC) based architecture in which PCs provide all significant processing relating to wire packet generation. But digital media are, by their very nature, bandwidth intensive and time sensitive, a particularly difficult combination for PC-based architectures whose stored-computing techniques require repeated data copying. This repeated data copying creates bottlenecks that diminish overall system performance, especially in high-bandwidth applications. And because digital media are time sensitive, any such compromise of server performance typically has a direct impact on the end user's experience when viewing the media.
FIG. 1 demonstrates the required steps for generating a single wire packet in a traditional media server comprising a general-purpose PC architecture. The figure makes no assumptions regarding hardware acceleration of any aspect of the PC architecture using add-on cards. Therefore, the flow and number of memory copies are representative of the prior art whether data blocks read from the storage device are reassembled in hardware or software.
Referring now to FIG. 1, in step 101, an application program running on a general-purpose PC requests data from a storage device. Using direct memory access (DMA), a storage controller transfers blocks of data to operating system (OS) random access memory (RAM). In step 102, the OS reassembles the data from the blocks in RAM. In step 103, the data is copied from the OS RAM to a memory location set aside by the OS for the user application (application RAM). These first three steps are performed in response to a user application's request for data from the memory storage device.
In step 104, the application copies the data from application RAM into central processing unit (CPU) registers. In step 105, the CPU performs the necessary data manipulations to convert the data from file format to wire format. In step 106, the wire-format data is copied back into application RAM from the CPU registers.
In step 107, the application submits the wire-format data to the OS for transmission on the network and the OS allocates a new memory location for storing the packet format data. In step 108, the OS writes packet-header information to the allocated packet memory from the CPU registers. In step 109, the OS copies the media data from the application RAM to the allocated packet RAM, thus completing the process of generating a wire packet. In step 110, the completed packet is transferred from the allocated packet RAM to OS RAM.
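Steps 107 through 109 can be sketched in C as follows. This is only an illustration under stated assumptions: the `pkt_header` layout and the `build_packet` function are hypothetical names introduced here, and a real OS would allocate packet memory from its own buffer pool rather than with `malloc`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet header; the source does not specify its fields. */
struct pkt_header {
    uint32_t stream_id;
    uint32_t sequence;
    uint32_t payload_len;
};

/* Builds a packet in newly allocated memory. The caller frees the result.
   Returns NULL if the allocation fails. */
uint8_t *build_packet(const struct pkt_header *hdr, const uint8_t *app_ram,
                      size_t payload_len, size_t *out_len)
{
    uint8_t *pkt;

    assert(hdr != NULL && app_ram != NULL && out_len != NULL);

    /* Step 107: allocate a new memory location for the packet. */
    pkt = malloc(sizeof *hdr + payload_len);
    if (pkt == NULL)
        return NULL;

    /* Step 108: write the packet-header information. */
    memcpy(pkt, hdr, sizeof *hdr);

    /* Step 109: copy the media data from application RAM. This is one more
       full pass over the payload through the CPU. */
    memcpy(pkt + sizeof *hdr, app_ram, payload_len);

    *out_len = sizeof *hdr + payload_len;
    return pkt;
}
```

The point of the sketch is that the payload already sitting in application RAM must be copied once more simply to place it behind its header.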
Finally, the OS sends the wire packet out to the network. In particular, in step 111, the OS reads the packet data from the OS RAM into CPU registers and, in step 112, computes a checksum for the packet. In step 113, the OS writes the checksum to OS RAM. In step 114, the OS writes network headers to the OS RAM. In step 115, the OS copies the wire packet from OS RAM to the network interface device over the shared I/O bus, using a DMA transfer. In step 116, the network interface sends the packet to the network.
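The cost of steps 111 and 112 comes from the fact that a checksum touches every byte of the packet. The source does not name the checksum algorithm, so the sketch below assumes a 16-bit one's-complement sum in the style of RFC 1071; the function name `packet_checksum` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a 16-bit one's-complement checksum in the style of RFC 1071.
   The source text does not say which checksum the OS computes. */
uint16_t packet_checksum(const uint8_t *packet, size_t len)
{
    uint64_t sum = 0;
    size_t i = 0;

    assert(packet != NULL || len == 0);

    /* Step 111: every byte of the packet is read from RAM into the CPU,
       here as big-endian 16-bit words. */
    for (; i + 1 < len; i += 2)
        sum += (uint32_t)((packet[i] << 8) | packet[i + 1]);

    /* An odd trailing byte is treated as if padded with a zero byte. */
    if (i < len)
        sum += (uint32_t)(packet[i] << 8);

    /* Step 112: fold the carries back into 16 bits and complement. */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (uint16_t)~sum;
}
```

Whatever algorithm is actually used, the loop structure is the same: one complete read of the packet, performed by the CPU, before the packet can leave OS RAM.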
As will be recognized, a general-purpose-PC architecture accomplishes the packet-generation flow illustrated in FIG. 1 using a number of memory transfers. These memory transfers are described in more detail in connection with FIG. 2.
As shown in FIG. 2, the transfer from storage device 201 to file system cache 202 uses a fast direct memory access (DMA) transfer. The transfer from file system cache 202 to file format data 203 requires each 32-bit word to be copied into a CPU register and back out into random access memory (RAM). This kind of copy is often referred to as a mem copy (or memcpy, after the C standard library function), and is a relatively slow process when compared to the wire speed at which hardware algorithms execute. The copies from file format data 203 to wire format data 204 and from wire format data 204 to OS Kernel RAM 205 are also mem copies. Network headers are added to the data while in the OS Kernel RAM 205, which requires a write of header information from the CPU to OS Kernel RAM. Determining the checksum requires a complete read of the entire data packet, and exhibits performance similar to a mem copy. The copy from the OS Kernel RAM 205 to Network Interface Card 206 is a DMA transfer across a shared peripheral component interconnect (PCI) bus. Thus, a total of 5 copies, and 1 complete iterative read into the CPU, of the payload data are required to generate a single network wire packet.
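The tally in the preceding paragraph can be checked with a small simulation. The sketch below is illustrative only: the names `traffic`, `copy_stage`, and `run_fig2_path` and the 4-word payload size are assumptions, the file-to-wire conversion is modeled as a plain copy, the checksum is a simple word sum, and the two DMA transfers are simulated with the same CPU loop even though real DMA hardware moves the data without passing it through CPU registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PAYLOAD_WORDS = 4 };  /* hypothetical payload size, in 32-bit words */

struct traffic {
    unsigned copies;      /* complete copies of the payload */
    unsigned full_reads;  /* complete reads of the payload into the CPU */
};

/* Copies the payload one 32-bit word at a time, as a mem copy does. */
static void copy_stage(uint32_t *dst, const uint32_t *src, struct traffic *t)
{
    for (size_t i = 0; i < PAYLOAD_WORDS; i++) {
        uint32_t reg = src[i];  /* load: RAM to CPU register */
        dst[i] = reg;           /* store: CPU register to RAM */
    }
    t->copies++;
}

/* Moves a payload along the FIG. 2 path and returns the simulated checksum. */
uint32_t run_fig2_path(const uint32_t *storage, uint32_t *nic,
                       struct traffic *t)
{
    uint32_t fs_cache[PAYLOAD_WORDS], file_fmt[PAYLOAD_WORDS];
    uint32_t wire_fmt[PAYLOAD_WORDS], kernel_ram[PAYLOAD_WORDS];
    uint32_t checksum = 0;

    assert(storage != NULL && nic != NULL && t != NULL);

    copy_stage(fs_cache, storage, t);     /* 201 to 202: DMA in hardware */
    copy_stage(file_fmt, fs_cache, t);    /* 202 to 203: mem copy */
    copy_stage(wire_fmt, file_fmt, t);    /* 203 to 204: mem copy */
    copy_stage(kernel_ram, wire_fmt, t);  /* 204 to 205: mem copy */

    /* Checksum: one complete read of the payload into the CPU. */
    for (size_t i = 0; i < PAYLOAD_WORDS; i++)
        checksum += kernel_ram[i];
    t->full_reads++;

    copy_stage(nic, kernel_ram, t);       /* 205 to 206: DMA over PCI */
    return checksum;
}
```

Starting from a zeroed `traffic` structure, one call to `run_fig2_path` leaves `copies` at 5 and `full_reads` at 1, matching the count given for a single wire packet.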