With the wide availability of broadband networks and increasing network transmission speeds, Internet resources can now be used for online multimedia download and playback. Audio, video, and other multimedia information can be transmitted and played back over the Internet.
However, to further improve the utilization of existing network resources, various solutions have been introduced to increase download speed, including the P2P (Peer to Peer) mode. This mode forms a point-to-point network in which one user terminal directly establishes an uploading and downloading relationship with another user terminal. The download speed is therefore closely tied to the upload speed of the other end. Because the upload speed of most users is not fast, and some users further limit their upload speed, P2P uploads often lack sufficient bandwidth, which slows down P2P transmission.
The P2S (Peer to Server) mode is based on a user-to-server structure. The user downloads files directly from a large-scale download website. Because the files are stored on the servers of the download website, the speed can be guaranteed. However, problems such as scattered resources and difficulty in searching still exist.
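The dependence of P2P download speed on peer upload speed can be illustrated with a deliberately simplified model. The function name, the units, and the bandwidth figures below are assumptions chosen for illustration only; real transfers are also affected by protocol overhead, congestion, and peer availability.

```python
def p2p_download_rate(downlink_kbps, peer_uplinks_kbps):
    """Upper bound on the download rate in a simplified P2P model.

    The downloader cannot receive faster than its own downlink, and it
    cannot receive faster than its peers are collectively able to upload.
    """
    return min(downlink_kbps, sum(peer_uplinks_kbps))


# Hypothetical figures: a fast downlink, but three peers with slow or
# user-limited uplinks.
rate = p2p_download_rate(8000, [256, 512, 128])
print(rate)
```

In this model the downloader's own downlink is mostly idle, because the combined upload capacity of the peers is the limiting factor.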
The P2SP mode is based on a user-to-server-plus-user structure. Unlike the P2P and P2S modes, the P2SP download mechanism is a further extension of the P2P technology. The P2SP mode not only supports the P2P technology, but also uses an index database to integrate server resources and P2P resources. When a user downloads a file, other resources are searched automatically, and appropriate resources are selected to accelerate the download process. This leads to a large improvement in download stability and speed over the traditional P2P solution.
In the existing P2SP solutions, when a file is downloaded, the data may come from different sources, such as the original link, the P2P network, and third-party mirror sites, and the data is then integrated into a complete file using a unique identifier of the complete file, such as an MD5 (Message Digest Algorithm 5) or SHA (Secure Hash Algorithm) value. FIG. 1 is a flow chart of the existing P2SP downloading process. As shown in FIG. 1, the downloading process may include the following steps.
Step 101: When a downloading client terminal needs to download a file, the downloading client terminal acquires a URL (Uniform Resource Locator) link from the Internet or a resource website.
Step 102: The downloading client terminal uses the URL link as an entrance point, queries a resource index server for multiple resources and a file hash, and then downloads the data from the retrieved URL(s) after the query.
Step 103: After the downloading client terminal completes the download of the file, the downloading client terminal registers file information to a Tracker (the tracking point) server.
Step 104: Other downloading client terminals can find the peers who have completed the download process and the peers who are in the middle of the download process through the Tracker service.
Step 105: Other downloading client terminals start a multi-source P2P download, and P2P peers exchange data with one another.
Step 106: After the download process is complete, the statistics information is reported to a statistics server.
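Steps 101 to 106 can be sketched as a small in-memory simulation. This is only an illustration of the control flow; the class names (`ResourceIndex`, `Tracker`), the function `run_download`, the example URLs, and the placeholder hash are all hypothetical, and the actual data transfer is omitted.

```python
from collections import defaultdict


class ResourceIndex:
    """Hypothetical resource index server: maps URLs to a file hash and back."""

    def __init__(self):
        self.url_to_hash = {}
        self.hash_to_urls = defaultdict(set)

    def add(self, url, file_hash):
        self.url_to_hash[url] = file_hash
        self.hash_to_urls[file_hash].add(url)

    def query(self, url):
        # Unknown URL: no hash is known, so only the entry URL is usable.
        file_hash = self.url_to_hash.get(url)
        if file_hash is None:
            return None, {url}
        return file_hash, set(self.hash_to_urls[file_hash])


class Tracker:
    """Hypothetical Tracker server: remembers which peers hold which file."""

    def __init__(self):
        self.peers = defaultdict(set)

    def register(self, file_hash, peer_id):
        self.peers[file_hash].add(peer_id)

    def find_peers(self, file_hash):
        return set(self.peers.get(file_hash, ()))


def run_download(client_id, url, index, tracker, stats):
    file_hash, urls = index.query(url)          # Step 102: query index by URL
    peers = tracker.find_peers(file_hash)       # Step 104: look up peers
    # Steps 102/105: fetching data from `urls` and `peers` is omitted here.
    if file_hash is not None:
        tracker.register(file_hash, client_id)  # Step 103: register with Tracker
    stats.append({"client": client_id, "urls": len(urls), "peers": len(peers)})  # Step 106
    return urls, peers


index = ResourceIndex()
index.add("http://origin.example/f.exe", "hash-1")
index.add("http://mirror.example/f.exe", "hash-1")
tracker, stats = Tracker(), []

# Step 101: both clients start from the same entry URL.
urls_a, peers_a = run_download("client-a", "http://origin.example/f.exe", index, tracker, stats)
urls_b, peers_b = run_download("client-b", "http://origin.example/f.exe", index, tracker, stats)
```

The first client finds two server URLs and no peers; the second client additionally finds the first client through the Tracker.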
The server obtains resources through two main approaches: client terminals proactively report resources to the server, and a backend server uses a crawler system to crawl and collect appropriate download links, which are then written into the resource index database for client terminals to query. The quantity and quality of the collected URL index are essential to the overall quality of the multi-source download services.
In the existing HTTP (Hypertext Transfer Protocol) download protocol, the characteristics of URL links make it technically easy for a website to take content that is not stored on its own server, bypass the final page that carries the content owner's advertisements, and offer that content (for example, for download) directly on its own page alongside its own advertisements.
When browsing, a complete web page is often not transmitted to the client terminal all at once. If the client terminal requests a page with many pictures and other information, the data returned for the first HTTP request is the HTML (Hypertext Markup Language) text of the page. After the client terminal (e.g., the web browser) interprets the HTML text, the browser discovers that more files are referenced in the text. The browser then sends out one or more additional HTTP requests. After the server processes these requests, the subsequent files are transmitted to the client, and the browser places them at the proper positions in the page. A complete page can be fully displayed only after multiple HTTP requests are sent and fulfilled.
This mechanism makes hotlinking possible. An Internet service provider can embed links to others' resources in its own page and display those resources on its own page, which achieves the purpose of hotlinking.
Currently, the commonly used anti-hotlinking methods raise the threshold for hotlinking, for example, by changing the download address based on the source of the request, inserting random numbers into the requested links to obfuscate them, or adding timestamp information to the requested links. With these methods, even if the hotlinking website holds the original link, it cannot provide normal download services because the link soon expires. Because the generation of such links is under the control of the original website, the threshold for hotlinking can be raised by modifying the link generation rules.
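The page-loading mechanism described above can be sketched with Python's standard `html.parser` module. The parser class, the host names, and the sample page are hypothetical; the sketch only lists the follow-up URLs a browser would request and flags those served by a different host, which is the situation hotlinking relies on.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class EmbeddedResourceParser(HTMLParser):
    """Collects the URLs a browser would request after parsing the HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Images and scripts reference their files through the src attribute.
        if tag in ("img", "script") and attrs.get("src"):
            self.resources.append(urljoin(self.base_url, attrs["src"]))


page_url = "http://site.example/index.html"
page = (
    "<html><body>"
    '<img src="/a.png">'
    '<img src="http://other.example/b.jpg">'
    "</body></html>"
)

parser = EmbeddedResourceParser(page_url)
parser.feed(page)

# Each entry needs its own HTTP request before the page is complete.
page_host = urlsplit(page_url).netloc
cross_host = [u for u in parser.resources if urlsplit(u).netloc != page_host]
```

Here the first HTTP request returns only the HTML text; two further requests are needed for the images, and one of them goes to a host other than the one serving the page.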
For example, the download address that a website provides for a game file 17173_tlbb—0330580.exe during a given period has the following form: http://cdn1.download.17173.com/wangsu_key_XXXX00XXXXXXXXXX0059XXXXXXXX48 00XXXfXXXXXX00XXXX/t1/17173_tlbb—0330580.exe, where “XXXXXX” is a random number that, for the same file, differs when the file is requested at different times.
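One way a website could generate such time-varying links is sketched below. This is not the scheme used by the website in the example above, which is not described here; the secret, the `key_<signature>_<expiry>` path layout, the host name, and the function names are all assumptions made for illustration.

```python
import hashlib
import hmac
from urllib.parse import urlsplit

SECRET = b"hypothetical-secret"  # assumed server-side secret


def _sign(path, expires):
    message = f"{path}:{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()[:16]


def make_link(host, path, now, ttl=300):
    """Build a link whose key segment changes with the request time."""
    expires = int(now) + ttl
    return f"http://{host}/key_{_sign(path, expires)}_{expires}{path}"


def is_valid(url, now):
    """Accept the link only if the key matches and it has not expired."""
    prefix, _, rest = urlsplit(url).path.lstrip("/").partition("/")
    parts = prefix.split("_")
    if len(parts) != 3 or parts[0] != "key" or not parts[2].isdigit():
        return False
    _, signature, expires = parts
    expected = _sign("/" + rest, int(expires))
    return hmac.compare_digest(signature, expected) and now <= int(expires)


# The same file requested at two different times yields two different links.
link_early = make_link("cdn.example", "/t1/file.exe", now=1000)
link_late = make_link("cdn.example", "/t1/file.exe", now=2000)
```

A hotlinking site that copies `link_early` can serve it only until the expiry time passes, after which the original server rejects it, while both links still refer to the same file on the same server.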
From the backend download log of the P2SP download system, it can be seen that a large number of the reported download links have URL paths that follow the regular pattern of such randomized download addresses. In current P2SP multi-source download solutions, when a client terminal finishes a download, the original download link of the task added by the client is saved in the database as a resource. When other clients issue a query, the saved link is returned to them as a download resource.
The multi-source download system obtains URL download links mainly through reports from client terminals or through the server's web crawler. Those links are often written directly to the URL index database and the URL resource database. Therefore, many links for the same file may be saved to the databases. For example, if one file is downloaded 100 thousand times, 100 thousand link records may be created. For popular files, one hash value may be associated with a huge URL collection of even hundreds of thousands of records. Thus, when the URL resource collection or the correspondence between the hash value and the URLs is stored, useless records may take up a lot of disk space, causing heavy disk I/O and reduced resource query efficiency.
Thus, because a large number of URLs correspond to the same hash, multiple mapping records are stored in the database and in memory for that hash. A large amount of storage and indexing resources is occupied, which may affect system query efficiency and resource recording efficiency.
Further, because the original website adds random components to URL links to vary their form, the P2SP software records a large number of equivalent links (corresponding to the same file on the same server) when it records download links. Even if only one valid download link is returned to the client terminal, a lot of storage resources are occupied on the server, reducing system query efficiency.
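The record growth described above can be reproduced with a short simulation. The link format, the host name, the file name, and the placeholder hash are hypothetical; the seeded random generator stands in for the random component that the original website inserts into each link.

```python
import random
from collections import defaultdict

rng = random.Random(0)  # seeded so the simulation is repeatable


def randomized_link(file_name):
    """Hypothetical link format with a 64-bit random key in the path."""
    key = f"{rng.getrandbits(64):016x}"
    return f"http://cdn.example/key_{key}/t1/{file_name}"


# Naive index: every reported link is stored under the file's hash.
hash_to_urls = defaultdict(set)
file_hash = "0123456789abcdef0123456789abcdef"  # placeholder hash value

for _ in range(100_000):  # one report per completed download
    hash_to_urls[file_hash].add(randomized_link("game.exe"))

record_count = len(hash_to_urls[file_hash])
url_bytes = sum(len(url) for url in hash_to_urls[file_hash])
print(record_count, url_bytes)
```

Although every link points to the same file on the same server, a set keyed on the full URL string treats each one as a distinct record, so the storage for a single hash grows with the number of downloads.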
The disclosed methods and systems are directed to solve one or more problems set forth above and other problems.