The usage of the internet has proliferated as millions of users employ the internet as a medium of communication and a source of information. Due to the far-reaching capability and popularity of the internet, the internet has also been manipulated to become a tool for spreading malicious code to attack the computer systems of unsuspecting victims. Once the malicious code has successfully infiltrated a computer system, the malicious code can cause far-reaching damage (e.g., delete files, rewrite the registry, rewrite the disk space, etc.) that may not be limited to the individual computer system but may also spread to other computers that may be on the same network. Thus, individuals and enterprises are usually looking for a solution that may minimize the possibility of an attack because the task of removing the malicious code and/or addressing the problems caused by the malicious code can quickly become expensive in terms of time and resources.
A popular method by which malicious code may be spread is to embed the code onto a web page. When a user accesses the web page, the malicious code may be downloaded onto the user's computer system. In recent years, a plurality of suspicious web page clearinghouse web sites (e.g., phishtank.com, stopbadware.org, etc.) has been created to solicit users' help in identifying potential suspicious URLs. Many companies that provide anti-virus application programs may access the plurality of suspicious web page clearinghouse web sites to retrieve the potential suspicious URLs in the task of identifying new virus patterns to update the anti-virus application programs.
The task of identifying the true risk status of a potential suspicious web page is usually manually performed by one or more engineers of companies that develop anti-virus application programs. As discussed herein, a risk status refers to the status of a web page. A risk status may include, but is not limited to, safe, suspicious, and malicious.
To facilitate discussion, FIG. 1 shows a simple flow chart illustrating the process for identifying the risk status for a potential suspicious web page.
At a first step 102, a list of potential suspicious URLs may be extracted from a plurality of suspicious web page clearinghouse web sites. The task of extracting the list is usually manually performed by an engineer.
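The extraction in step 102 may be sketched as follows. This is a minimal illustration only; the plain-text, one-URL-per-line feed format is an assumption, since real clearinghouse web sites (e.g., phishtank.com) each publish submissions in their own formats.

```python
# Hypothetical sketch of step 102: parse a plain-text feed of submitted
# URLs (one URL per line, '#' lines as comments) into a work list.
# The feed format here is assumed for illustration.

def extract_url_list(feed_text):
    """Return the list of candidate URLs found in a feed body."""
    urls = []
    for line in feed_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls

sample_feed = """\
# submitted URLs (sample)
http://example.test/malware.html
http://example.test/phish/login
"""
print(extract_url_list(sample_feed))
```

In practice, an engineer would fetch the feed from each clearinghouse site and merge the resulting lists before analysis.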
At a next step 104, the web content of each of the potential suspicious URLs may be downloaded for analysis.
At a next step 106, each of the potential suspicious URLs may be analyzed. In an example, the web content of each of the potential suspicious URLs may be scanned by an anti-virus program to determine the risk status of the web page. Note that most anti-virus application programs may only be able to identify known threats. Thus, most anti-virus application programs may only be able to identify a web page as having a risk status of safe or malicious. Since most anti-virus application programs are unable to identify unknown threats, most anti-virus application programs are unable to identify whether or not a web page has a risk status of suspicious. However, some anti-virus application programs have been able to identify script that may look suspicious. Once a web page has been identified as a web page with suspicious script, the engineer may have to perform further analysis to determine whether or not the web page is suspicious.
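The classification in step 106 may be sketched as below. The signature list and the obfuscation heuristic are illustrative stand-ins for a real anti-virus engine's pattern database, which the passage does not specify; the sketch only shows the three-way outcome (malicious, suspicious, safe) described above.

```python
# Minimal sketch of step 106, assuming a signature-based scanner.
# KNOWN_SIGNATURES and the heuristic are illustrative assumptions,
# not a real anti-virus engine's pattern database.

KNOWN_SIGNATURES = ["eval(unescape(", "document.write(unescape("]

def classify(page_content):
    """Return 'malicious', 'suspicious', or 'safe' for downloaded content."""
    for sig in KNOWN_SIGNATURES:
        if sig in page_content:
            return "malicious"      # matches a known virus pattern
    # Crude stand-in for a suspicious-script heuristic:
    # eval() combined with hex-escaped strings suggests obfuscation.
    if "eval(" in page_content and "\\x" in page_content:
        return "suspicious"         # flagged for manual review by an engineer
    return "safe"

print(classify("<script>eval(unescape('%61'))</script>"))  # malicious
print(classify("<p>hello</p>"))                            # safe
```

A page returned as "suspicious" would still require an engineer's manual analysis, as the passage notes.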
At a next step 108, the web page with a risk status of malicious may be added to a database of malicious web pages. If the web page is identified as safe or as having suspicious script, the web page is not added to the database. In an example, until the engineer has a chance to analyze the suspicious script, the web page is usually unaccounted for in the database.
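Step 108's gating logic may be sketched as follows; the in-memory set is an illustrative stand-in for whatever database a vendor actually uses.

```python
# Sketch of step 108: only pages classified as malicious enter the
# database; safe pages and pages awaiting manual review do not.
# The in-memory set stands in for a real malicious-page database.

malicious_db = set()

def record_result(url, risk_status):
    """Add url to the malicious-page database only when warranted."""
    if risk_status == "malicious":
        malicious_db.add(url)
        return True
    return False    # safe or suspicious: left out until reviewed

record_result("http://example.test/bad", "malicious")
record_result("http://example.test/odd", "suspicious")
print(sorted(malicious_db))
```

Note that, exactly as the passage describes, a suspicious page leaves no trace in the database until an engineer resolves its status.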
The aforementioned method as described in FIG. 1 is dependent upon an engineer's schedule. In other words, whether or not a potential suspicious URL is timely analyzed is dependent upon the time and resources that may be available to analyze the web content associated with the potential suspicious URL. Thus, if the engineer is not able to timely analyze the web page, the web page may have expired and be unavailable for analysis by the time the engineer has sufficient time to perform the analysis. As a result, the database is not updated with the potential suspicious web page. For expired web pages with suspicious script, the engineer may have lost the opportunity to update the anti-virus program with new virus patterns.
Due to the sheer volume of potential suspicious URLs that may be listed on the suspicious web page clearinghouse web sites, the engineers may not be able to analyze each of the potential suspicious URLs that are listed on the plurality of suspicious web page clearinghouse web sites. In an example, on one suspicious web page clearinghouse web site, an average of a few thousand potential suspicious URLs is submitted daily. Given the time required to analyze each potentially suspicious web page, most companies that develop anti-virus application programs may not have sufficient resources to perform analysis on all of the potential suspicious URLs. Further, the cost of hiring additional engineers to enable such a possibility may be uneconomical. As a result, not all of the potential suspicious URLs are adequately analyzed. Further, many of the links on the potential suspicious URLs are left unchecked because the engineers simply do not have the capacity to broaden the scope of the analysis.
Since most companies that develop anti-virus application programs may have limited resources, each company's resources may be dedicated to reviewing potential suspicious URLs instead of cleaning out the database that may be storing the malicious web pages. Over time, the size of the database of malicious web pages may grow as more malicious web pages are added to the database. The database may quickly become bloated as new malicious web pages are added to the database but expired web pages are not removed. Since the task of maintaining the database is usually a manual process, the task of cleaning the database may not be given priority. As a result, the database of malicious web pages may continue to grow, thereby requiring additional memory space in order to store the ever-growing number of malicious web pages.
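The maintenance step that the passage says is usually deprioritized may be sketched as a simple age-based pruning pass. The entry timestamps and retention window are illustrative assumptions; a real system would instead verify whether each stored page is still reachable.

```python
# Sketch of the skipped maintenance task: pruning database entries
# whose pages have likely expired. Timestamps and the retention
# window are illustrative assumptions.

import time

def prune_expired(db, max_age_seconds, now=None):
    """Remove entries older than max_age_seconds; return count removed."""
    now = time.time() if now is None else now
    expired = [url for url, added in db.items()
               if now - added > max_age_seconds]
    for url in expired:
        del db[url]
    return len(expired)

db = {"http://example.test/old": 0, "http://example.test/new": 1000}
removed = prune_expired(db, max_age_seconds=500, now=1200)
print(removed, sorted(db))
```

Running such a pass periodically would keep the database from growing without bound, which is the storage problem the passage identifies.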