A distributed storage system is a storage system which includes a plurality of servers, each provided with a storage device such as a hard disk drive (HDD) or a solid state drive (SSD), for example. In recent years, research and development of distributed storage systems have been extensively carried out so as to manipulate large quantities of data called big data.
The distributed storage system is characterized in that performance and capacity thereof are easily expanded and high reliability is secured by replication of data (a replicated copy is also called a replica). For example, it is possible to easily handle variation of a load which is imposed on the distributed storage system and variation of total capacity of data by increasing or decreasing the number of servers. Further, a plurality of servers hold replicas of the same data to make the data redundant, enabling improvement of availability and durability of the data.
In such a distributed storage system, performance degradation caused by a load pattern which is called a spike has been a problem. A spike is a load pattern in which access is concentrated only on specific data. Since access is concentrated only on specific data, access is performed only to nodes which hold a replica of the data even if the number of servers is simply increased. Thus, it is difficult to avoid performance degradation merely by adding servers.
In order to avoid performance degradation, it is preferable to detect data which is a cause of a spike in a short period of time (for example, approximately several seconds) and to increase replicas of the data. However, a spike occurs only rarely, and it is not favorable that many resources are routinely consumed to detect a spike. Accordingly, it is preferable to efficiently detect data which is a cause of a spike without using many resources.
Regarding analysis of access frequency, an algorithm called the space-saving algorithm is widely employed. The space-saving algorithm is an algorithm with which popularity of data is obtained by using an error ε. The analysis of access frequency in which the space-saving algorithm is used is described with reference to examples illustrated in FIGS. 1 to 3.
FIG. 1 is a schematic diagram illustrating an example of buckets and elements used for the space-saving algorithm. FIG. 1 illustrates buckets 101 to 103 and elements 104 to 107. A bucket is a piece of information used for a mechanism (for example, an instance) which manages elements, of which counter values are the same as each other, by using an element list. An element is a piece of information used for a mechanism (for example, an instance) which manages popularity of data and includes identification information of data and a value of a counter for the data. Hereinafter, “element” will be also expressed as “management information element” so as to imply that “element” is a piece of information for management. Here, an upper limit of the number of elements is 1/ε and is fixed. As a value of the counter for the elements managed by a bucket is larger, the bucket is arranged farther to the right, and elements having the same value of the counter are connected by a list.
A case in which a main page is accessed in a state depicted in FIG. 1 is described with reference to FIG. 2. When the main page is accessed, a value of the counter for the element 106 is incremented, and the element 106 turns to an element 108. At this point, there are no elements which are managed by the bucket 102 and the bucket 102 is not used, so that the bucket 102 is merged into the bucket 103.
A case in which an article on a work of Mr. M is accessed in a state which is illustrated on the lower half of FIG. 2 is described with reference to FIG. 3. Here, it is assumed that the upper limit 1/ε of the number of elements in FIG. 3 is 4. There is no element for the article on the work of Mr. M in the state illustrated on the lower half of FIG. 2. However, the number of elements has reached 4, which is the upper limit, so that an element cannot simply be added. Therefore, in the space-saving algorithm, the element 104 having the minimum value of the counter (an element related to a search page, in this example) among the elements is deleted and an element 110 related to the work of Mr. M is added. Here, a value of the counter for the element 110 is “91” which is obtained by incrementing the value of the counter for the element 104 by 1. Further, a bucket 109 for managing elements of which a value of the counter is 91 is added between the bucket 101 and the bucket 103.
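The update rule described with reference to FIGS. 1 to 3 can be sketched as follows. This is a minimal illustration, not the implementation of the cited art: it keeps the counters in a plain dictionary instead of the bucket and element lists of FIG. 1, and the class name `SpaceSaving`, the method `access`, the page names other than the search page and the article, and every counter value other than 90 and 91 are assumptions made for the example.

```python
class SpaceSaving:
    """Simplified space-saving counter with a fixed number of elements."""

    def __init__(self, capacity):
        self.capacity = capacity  # upper limit of elements, i.e. 1/epsilon
        self.counters = {}        # data identifier -> counter value

    def access(self, key):
        if key in self.counters:
            # Data already tracked: increment its counter.
            self.counters[key] += 1
        elif len(self.counters) < self.capacity:
            # Room left: start tracking the data with a counter of 1.
            self.counters[key] = 1
        else:
            # Upper limit reached: delete the element with the minimum
            # counter and give the new element that counter plus 1.
            # On ties, min() returns the first minimal key in insertion order.
            victim = min(self.counters, key=self.counters.get)
            self.counters[key] = self.counters.pop(victim) + 1


# State loosely modeled on the lower half of FIG. 2 (values are hypothetical
# except that the search page holds the minimum counter, 90).
ss = SpaceSaving(capacity=4)
ss.counters = {"search page": 90, "page B": 90, "main page": 120, "page C": 120}

ss.access("article on a work of Mr. M")
print(ss.counters)
```

In this sketch the search page is dropped and the article enters with a counter of 91, which matches the transition described for FIG. 3. A bucket-based implementation would find the minimum element in constant time, whereas the dictionary scan here is linear in the number of elements.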
In the space-saving algorithm, the number of elements which are used for counting the number of accesses is 1/ε and is fixed. Therefore, even in a case in which a wide variety of data such as big data are accessed, for example, a used amount of resources (for example, a memory) stays constant. However, a value of a counter is a cumulative total of the number of times of access from a time point at which execution of the space-saving algorithm is started to a current time point. That is, the number of times of access including the number of times of access during time in which data is unpopular is counted. Therefore, in a case in which the number of accesses to specific data is rapidly increased after the space-saving algorithm has been executed for a long period of time, it may be difficult to detect the increase from the counter values.
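The effect of cumulative counters can be illustrated with a small sketch. The function name `space_saving_access`, the page names, and all counter values below are hypothetical and chosen only to show the behavior; the update rule is the simplified dictionary form of the space-saving algorithm, not the bucket structure of FIG. 1.

```python
def space_saving_access(counters, key, capacity):
    """Apply one access to a dictionary of cumulative counters."""
    if key in counters:
        counters[key] += 1
    elif len(counters) < capacity:
        counters[key] = 1
    else:
        # Replace the element with the minimum counter.
        victim = min(counters, key=counters.get)
        counters[key] = counters.pop(victim) + 1


# Hypothetical state after a long run: three pages accumulated many
# accesses in the past, while page D has been unpopular so far.
counters = {"page A": 100_000, "page B": 90_000, "page C": 80_000, "page D": 10}

# A spike: page D alone receives 1,000 accesses in a short period.
for _ in range(1_000):
    space_saving_access(counters, "page D", capacity=4)

# Ranking by cumulative counter still places page D last.
ranking = sorted(counters, key=counters.get, reverse=True)
print(ranking, counters["page D"])
```

Under these assumed values, page D ends with a counter of 1,010 and remains at the bottom of the ranking, even though it is the only data accessed during the spike.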
Further, there is the following technique regarding management of an access history. Specifically, an access history management unit generates an access history for every file on a main memory. When a plurality of access history cells whose generation dates are the same as each other are present in one access history cell chain after elapse of a certain period of time from generation of an access history, the access history management unit integrates these access history cells to generate a single access history cell. Further, after a predetermined period of time further elapses, the access history management unit deletes the access history from the access history cell chain. However, in this technique, when many files are accessed before access histories are integrated or deleted, large quantities of data are temporarily held. That is, a large quantity of resources is consumed, thereby impairing efficiency.
Ahmed Metwally, Divyakant Agrawal, and Amr El Abbadi, “Efficient Computation of Frequent and Top-k Elements in Data Streams”, ICDT'05 Proceedings of the 10th International Conference on Database Theory, pp. 398-412, 2005 is an example of related art.
Japanese Laid-open Patent Publication No. 2011-100419 is another example of related art.