1. Field of the Invention
This application relates to scanning webpages for content and, more particularly, to a system and method that scans a webpage for updated content only if the webpage includes dynamic content.
2. Description of the Related Technology
Internet filtering is the ability to restrict users from accessing certain websites because of the content those websites contain. For example, an employer might restrict employees from accessing certain websites that are objectionable or interfere with productivity. The employer can set policies that allow employees to access only business-related websites during business hours. Similarly, schools and parents might restrict students and children to age-appropriate websites.
Additionally, Internet filtering is used to prevent users from accessing websites that might contain malicious content. As webpages contain more sophisticated content, the opportunity for malicious code to be downloaded onto the user's computer increases. As security vulnerabilities in operating systems and web browsing applications are identified, unscrupulous hackers have begun to write malicious code and applications that exploit these vulnerabilities to download themselves onto the user's machine without relying on any particular activity of the user to launch an infected file. One example of such an attack is the use of malicious code embedded in an active content object of a webpage.
Typically, webpage content is categorized either manually or by an automated process. A database containing website addresses (URLs) and the categorization of each website is created. This database is transmitted regularly to a network device that filters websites requested by users. However, with this method, there can be a delay between the time a website is categorized and the time the update is sent to the network device, such that malicious or inappropriate websites may remain accessible to network users during that delay.
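The database-driven approach described above can be illustrated with a minimal sketch. It is not any particular product's implementation: the `FilterDevice` class, the category labels, and the example host name are all hypothetical, and the sketch assumes the database is keyed by host name.

```python
from urllib.parse import urlsplit

# Hypothetical category labels that the filtering policy blocks.
BLOCKED_CATEGORIES = {"malicious", "adult"}


def host_of(url):
    """Return the lower-cased host name of a URL, used as the lookup key."""
    return (urlsplit(url).hostname or "").lower()


class FilterDevice:
    """Network device that filters requests against its last received snapshot."""

    def __init__(self):
        self.snapshot = {}  # host -> category, as of the last update

    def receive_update(self, database):
        # Copy the database so that later changes to the master copy are
        # not visible until the next scheduled transmission.
        self.snapshot = dict(database)

    def allows(self, url):
        category = self.snapshot.get(host_of(url))
        # Uncategorized hosts pass through in this simple sketch.
        return category not in BLOCKED_CATEGORIES


master = {"news.example": "news"}
device = FilterDevice()
device.receive_update(master)

# The site is recategorized centrally, but the device has not been updated yet.
master["news.example"] = "malicious"
stale_decision = device.allows("http://news.example/index.html")

# Only after the next transmission does the device see the new category.
device.receive_update(master)
fresh_decision = device.allows("http://news.example/index.html")
```

Between the recategorization and the second call to `receive_update`, the device still permits the request, which is the delay window the paragraph above describes.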
To reduce this delay, real-time scanning techniques have been developed. With these techniques, if a user requests an uncategorized website, the website is categorized immediately, before it is sent to the user. This permits the user to access the website immediately after categorization. However, real-time scanning is only appropriate for uncategorized websites. Due to the number of websites accessible on the Internet, it would be impractical to analyze each and every website a user requests in real time. Therefore, there is still the threat that malicious or inappropriate content could be present on a website that had been previously categorized. Furthermore, due to the changing nature of some websites (e.g., blogs and social networking sites), the categorization of a website might have changed such that the website should now be blocked by the network device.
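The limitation of real-time scanning described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not a description of any existing scanner: the `RealTimeFilter` class, the `categorize` stand-in classifier, the marker string, and the example URL are all hypothetical.

```python
def categorize(page_text):
    """Stand-in classifier (hypothetical): flags pages containing a marker string."""
    return "malicious" if "evil-payload" in page_text else "general"


class RealTimeFilter:
    """Scans a page before delivery only if its URL is not yet categorized."""

    def __init__(self, known, fetch):
        self.known = dict(known)  # url -> category
        self.fetch = fetch        # callable: url -> page text
        self.scans = 0            # number of real-time scans performed

    def request(self, url):
        """Return the page text if allowed, or None if blocked."""
        page = self.fetch(url)
        if url not in self.known:
            # Uncategorized page: categorize it now, before it reaches the user.
            self.known[url] = categorize(page)
            self.scans += 1
        return None if self.known[url] == "malicious" else page


pages = {"http://blog.example/": "hello world"}
flt = RealTimeFilter({}, lambda url: pages[url])

first = flt.request("http://blog.example/")   # scanned once, then delivered

# The page content changes after it was categorized.
pages["http://blog.example/"] = "evil-payload here"
second = flt.request("http://blog.example/")  # not rescanned, still delivered
```

Because the second request finds an existing category, no new scan is performed and the changed content is delivered, which mirrors the threat from previously categorized websites noted above.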