CPC G06F 16/1748 (2019.01) [G06F 16/172 (2019.01)] | 18 Claims |
1. A method for deduplication caching using an unreliable edge resource, comprising the following steps:
acquiring a total storage capacity of all edge servers;
searching for candidate cache files by a similarity-based hierarchical clustering (SHC) method, and acquiring file clusters of all the candidate cache files after clustering, wherein the candidate cache files each comprise a deduplicated data chunk; and
based on the file clusters and a reliability of all of the edge servers, selecting, by a heuristic algorithm, a file cluster from the file clusters to cache to an edge server until a size of cached content reaches the total storage capacity,
wherein the searching for the candidate cache files by the SHC method, and the acquiring of the file clusters of all the candidate cache files after clustering comprise:
determining, by a hierarchical clustering method based on a Jaccard index, whether a sorting index of two files after clustering is greater than sorting indexes of the two files before clustering, in each iteration of an iterative clustering process;
if yes, merging the two files into a new cluster;
determining a heat rate of the new cluster, and recalculating a file availability based on a chunk location in the new cluster; and
acquiring the file clusters after all iterative clustering is completed.
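The steps recited in claim 1 can be illustrated with a minimal sketch. The claim does not define the sorting index, the heat rate formula, or the particular heuristic, so the following are assumptions made only for illustration: each file is a set of unit-size chunk IDs, the heat of a cluster is the sum of its members' request rates, the sorting index is heat per unique chunk, clustering stops at the first most-similar pair whose merge does not raise the index, and the heuristic is a greedy placement onto the most reliable server with room. The names Cluster, shc, and greedy_cache are hypothetical, and the recalculation of file availability from chunk location is omitted.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Cluster:
    files: frozenset   # names of the files in this cluster
    chunks: frozenset  # unique chunk IDs after deduplication
    heat: float        # assumed: summed request rate of the member files

    @property
    def size(self) -> int:
        # Assumption: every chunk has unit size.
        return len(self.chunks)

    @property
    def sorting_index(self) -> float:
        # Hypothetical definition: heat served per unit of deduplicated storage.
        return self.heat / self.size


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard index |a & b| / |a | b| of two chunk sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def shc(clusters):
    """Merge the most similar pair while the merge raises the sorting index."""
    clusters = list(clusters)
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2),
                   key=lambda p: jaccard(p[0].chunks, p[1].chunks))
        merged = Cluster(a.files | b.files, a.chunks | b.chunks, a.heat + b.heat)
        # Merge only if the index after clustering exceeds both indexes before.
        if merged.sorting_index <= max(a.sorting_index, b.sorting_index):
            break  # simplification: stop at the first non-improving pair
        clusters = [c for c in clusters if c is not a and c is not b] + [merged]
    return clusters


def greedy_cache(clusters, servers):
    """servers maps a server name to (capacity, reliability)."""
    free = {name: cap for name, (cap, _) in servers.items()}
    placement = {}
    # Highest sorting index first; stop placing once nothing else fits.
    for c in sorted(clusters, key=lambda c: c.sorting_index, reverse=True):
        fits = [s for s in free if free[s] >= c.size]
        if not fits:
            continue
        best = max(fits, key=lambda s: servers[s][1])  # most reliable server
        placement[c.files] = best
        free[best] -= c.size
    return placement


files = [
    Cluster(frozenset({"A"}), frozenset({1, 2, 3, 4}), 4.0),
    Cluster(frozenset({"B"}), frozenset({1, 2, 3, 5}), 4.0),
    Cluster(frozenset({"C"}), frozenset({9, 10}), 1.0),
]
result = shc(files)
placement = greedy_cache(result, {"e1": (5, 0.9), "e2": (2, 0.6)})
```

In this example, files A and B share three of five distinct chunks, so merging them stores five chunks instead of eight while keeping the combined heat, which is why the assumed sorting index rises and the pair is clustered.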
|