1. Field of the Invention
The present invention relates to cache servers of a content delivery network (CDN) and, in particular, to optimization of hard drive performance in such cache servers.
2. Description of the Related Art
Content delivery networks (CDNs) provide fast delivery of data from sources of data to clients via servers placed at strategic locations in a network. A server that is the original source of the data is called the origin server. Copies of the data are cached on various servers of the CDN. When a client requests data or files, the request is served by a server of the CDN that is near the client rather than by the origin server. As a result, the data access time is reduced and bottlenecks created by centralized sources of data are avoided.
Web applications often require access to a large number of small files to display a single web page to a user. For example, in a social networking website, the web page of a user may display different types of information including user profile, images of friends of the user in the social network, status information of the friends, newsfeed from various sources, and the like. This information may be scattered across a hard drive of a server or across multiple hard drives of different servers within the CDN. Aggregating the data for responding to a request may require access to multiple servers, or multiple hard drives within each server, as well as access to different positions within each hard drive (also referred to as a disk or a disk drive).
Accessing data in a hard drive requires mechanical movements in the hard drive. For example, a seek operation is required to move the disk head to a position that allows it to read the data. As a result, read/write operations of a hard drive are significantly slower than computer operations that do not require mechanical movements, such as processing or accessing data in random access memory (RAM). If an application accesses data that is physically scattered at various positions on a hard drive, multiple seek operations are required to access the data. This can cause significant overhead in accessing the data and degrade the performance of the application.
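The effect of scattered data on access time can be illustrated with a rough cost model. The following is a minimal sketch, not a measurement: the function name `read_time_ms`, the 9 ms seek time, the 100 MB/s transfer rate, and the workload of 100 files of 4 KB each are all illustrative assumptions that do not come from any particular drive.

```python
def read_time_ms(num_files, file_size_kb, seeks,
                 seek_ms=9.0, transfer_mb_per_s=100.0):
    """Estimate total read time as seek overhead plus transfer time.

    seek_ms and transfer_mb_per_s are assumed, illustrative values.
    """
    total_mb = num_files * file_size_kb / 1024.0
    transfer_ms = total_mb / transfer_mb_per_s * 1000.0
    return seeks * seek_ms + transfer_ms


# 100 small files scattered across the disk: one seek per file.
scattered_ms = read_time_ms(num_files=100, file_size_kb=4, seeks=100)

# The same 100 files stored contiguously: a single seek.
contiguous_ms = read_time_ms(num_files=100, file_size_kb=4, seeks=1)

print(f"scattered:  {scattered_ms:.1f} ms")
print(f"contiguous: {contiguous_ms:.1f} ms")
```

Under these assumed figures, the seek overhead dominates the total time when the files are scattered, while the time spent transferring the data is the same in both cases.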
Conventional ways of improving the disk throughput of such applications include use of a hard drive with better seek performance, use of solid-state drives (SSDs) instead of hard drives, or use of a large RAM. Hard drives with better seek performance can be expensive, and the cost of replacing existing hard drives with faster ones may be prohibitive. SSDs have fast read/write operations since they do not require mechanical movement of parts for accessing data. However, a solution based on SSDs can be prohibitive since SSDs have a higher cost per megabyte of storage compared to traditional hard drives. Similarly, use of machines with a large RAM or addition of RAM to existing machines involves additional costs that may be prohibitive. Besides, increasing RAM improves the performance of an application only if the same data is accessed multiple times. The performance improves because subsequent accesses of data loaded in RAM do not require disk access. A small RAM causes data to be flushed from the RAM to make room for newer data. As a result, if the data that has been flushed is requested again by the client, a disk access is required. A large RAM prevents subsequent disk accesses for the same data by storing a large amount of data in RAM. However, if an application does not access the same data multiple times, increasing the size of the RAM does not improve performance since the data stored in the RAM is never accessed again.
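The dependence of RAM caching on repeated accesses can be illustrated with a small simulation. The following is a minimal sketch that assumes a least-recently-used (LRU) eviction policy, which is one common policy and is not stated in the description above. The function `count_disk_reads`, the request patterns, and the cache capacities are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict


def count_disk_reads(requests, cache_capacity):
    """Count requests that miss an LRU cache and therefore need a disk access."""
    cache = OrderedDict()
    disk_reads = 0
    for key in requests:
        if key in cache:
            cache.move_to_end(key)         # hit: mark as most recently used
        else:
            disk_reads += 1                # miss: data must come from disk
            cache[key] = True
            if len(cache) > cache_capacity:
                cache.popitem(last=False)  # flush the least recently used entry
    return disk_reads


# 50 distinct files, each requested 20 times in a repeating cycle.
repeated = [i % 50 for i in range(1000)]
# 1000 distinct files, each requested exactly once.
unique = list(range(1000))

small_repeated = count_disk_reads(repeated, cache_capacity=10)
large_repeated = count_disk_reads(repeated, cache_capacity=100)
small_unique = count_disk_reads(unique, cache_capacity=10)
large_unique = count_disk_reads(unique, cache_capacity=100)

print(small_repeated, large_repeated, small_unique, large_unique)
```

In this sketch, the larger cache reduces disk reads only for the workload that requests the same files repeatedly. For the workload in which every file is requested once, the number of disk reads is the same for both cache sizes.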