The invention relates to a method and system for controlling user access to inappropriate content on a network or database, for example, the Internet, by means of content blocking, and in case of attempts to circumvent said access control, notification to an appropriate supervisor, such as a system administrator. More particularly, the present invention is directed to a method and system for identifying and locating inappropriate content.
The Internet is a vast collection of resources from around the world with no sort of "central" or main database. Instead it is a collection of thousands of computers, each with its own individual properties and content, linked to a network which is in turn linked to other networks. Many of these computers have documents written in a markup language, such as Hypertext Mark-up Language ("HTML"), that are publicly viewable. These HTML documents that are available for public use on the Internet are commonly referred to as "Web Pages". All of the computers that host web pages make up what is known today as the World Wide Web ("WWW").
The WWW consists of an extremely large number of web pages, and that number is growing at an exponential rate every day. A naming convention known as a Uniform Resource Locator ("URL") is used to designate information on the Internet. Web pages are typically assigned to the subclass known as the Hypertext Transfer Protocol ("http") while other subclasses exist for file servers, information servers, and other machines present on the Internet. URLs are an important part of the Internet in that they are responsible for locating a web page and hence, for locating desired information. A user may locate a web page by entering the URL into an appropriate field of a web browser. A user may also locate web pages through "linking." When the user accesses any given web page, "links" to other web pages may be present on the web page. This expanding directory structure is seemingly infinite and can result in a single user seeking one web page and compiling, from the links on that one web page, a list of hundreds of new web pages that were previously unknown.
Large amounts of information are available on the WWW and are easily accessible by anyone who has Internet access. In many situations it is desirable to limit the amount and type of information that certain individuals are permitted to retrieve. For example, in an educational setting it may be undesirable for the students to view pornographic or violent content while using the WWW.
Until now, schools and businesses have either ignored inappropriate material available on the Internet or attempted to filter it with software originally designed for home use on a single computer. Others have tried to convert their filtering products to proxy servers so that they may filter entire networks. "Yes Lists" and "Content Filtering" are other industry methods, which have found use in this area, albeit with less success. Conventional "filtering" has several inherent flaws, despite the fact that it is considered the best alternative for inappropriate site management. If a filter list is broad enough to ensure complete safety for its users, unthreatening material is inevitably filtered along with material considered to be inappropriate. This leads to a reduction in the versatility of the Internet and the possibility of censorship accusations. On the other hand, if the filter list is too narrow, inappropriate material is more likely to pass through to the users. In addition, the filter vendor is in control of defining the filter list. This results in the moral and ethical standards of the vendor being imposed upon the user. Moreover, because new inappropriate sites appear on the Internet at a very rapid pace, and because Internet search engines tend to present newer web sites first, the newer sites that are least likely to be in a filter list are most likely to appear at the top of search results.
A "Yes List" is the safest method of protecting students on the Internet. However, it is the most expensive to administer, and, being the most restrictive, it dramatically reduces the benefits of the Internet in an educational setting. "Yes Lists" require the teachers to research the Internet for materials they wish students to have access to, then submit the list of suitable materials to an administrator. The administrator then unblocks these sites for student access, leaving all non-approved sites fully blocked and non-accessible.
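By way of illustration only, the "Yes List" approach described above can be sketched as a check of each requested URL against a set of approved hosts. The host names and the function name `is_allowed` are hypothetical and do not come from any particular product; the sketch assumes approval is granted per host rather than per page.

```python
from urllib.parse import urlsplit

# Hypothetical hosts approved by an administrator; placeholders only.
YES_LIST = {"www.example.edu", "library.example.org"}

def is_allowed(url, yes_list=YES_LIST):
    """Return True only if the URL's host appears on the approved list."""
    host = urlsplit(url).hostname  # None when the string carries no host
    return host is not None and host in yes_list
```

Because every host not on the list is refused, the sketch also shows why this approach is the most restrictive: a useful site remains blocked until someone submits it for approval.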
The final method of managing inappropriate material is "Content Filtering". This involves scanning the actual materials (not the URL) inbound to a network from the Internet. Word lists and phrase pattern matching techniques are used to determine whether the material is inappropriate. This process requires a great deal of computer processor time and power, slowing down Internet access and also making this a very expensive alternative. Furthermore, it is easily defeated by pictures, Java, or some other method of presenting words/content without the actual use of fonts.
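A minimal sketch of word-list and phrase matching of the kind described above may look as follows. The word and phrase lists are placeholders, and the names `find_matches` and `is_inappropriate` are assumptions made for illustration; an actual content filter would be considerably more elaborate.

```python
import re

# Hypothetical word and phrase lists; placeholders only.
BLOCKED_WORDS = {"badword", "worseword"}
BLOCKED_PHRASES = ["very bad phrase"]

def find_matches(text):
    """Return the blocked words and phrases found in the text."""
    lowered = text.lower()
    # Whole-word matching against the word list.
    words = set(re.findall(r"[a-z']+", lowered))
    hits = sorted(words & BLOCKED_WORDS)
    # Phrase matching after collapsing runs of whitespace.
    normalized = " ".join(lowered.split())
    hits += [phrase for phrase in BLOCKED_PHRASES if phrase in normalized]
    return hits

def is_inappropriate(text):
    """Return True if any blocked word or phrase is present."""
    return bool(find_matches(text))
```

The sketch inspects only text. Material that presents words as an image, for example a page consisting solely of `<img src="banner.gif">`, yields no matches, which illustrates how such filtering is defeated.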
These and other drawbacks exist.
An object of the invention is to overcome these and other drawbacks in existing devices.
It is an object of the present invention to circumvent the current lack of administrative intervention by notifying a system administrator when a user repeatedly attempts to bypass security measures that have been placed to restrict viewing of inappropriate material.
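As a sketch of how such notification might operate, the following counts blocked access attempts per user and calls a notification routine once a threshold is reached. The class name `BypassMonitor`, the threshold of three attempts, and the use of a callback for notification are all assumptions made for illustration; the description above does not fix any of them.

```python
from collections import defaultdict

class BypassMonitor:
    """Counts blocked attempts per user and notifies a supervisor."""

    def __init__(self, threshold=3, notify=print):
        self.threshold = threshold        # assumed value, not from the source
        self.notify = notify              # e.g. a function that sends e-mail
        self.attempts = defaultdict(int)  # user -> number of blocked attempts

    def record_blocked_attempt(self, user, url):
        """Record one blocked attempt; return True if a notice was sent."""
        self.attempts[user] += 1
        if self.attempts[user] == self.threshold:
            self.notify(
                f"User {user!r} reached {self.threshold} blocked attempts; "
                f"last URL: {url}"
            )
            return True
        return False
```

In this sketch a notice is sent exactly once, at the moment the count reaches the threshold, so that a supervisor is not sent a further message for every later attempt by the same user.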
It is another object of the present invention to provide a system and method of adaptively building a list of inappropriate materials so that, for example, newer websites containing inappropriate materials may be added to a filter list of websites containing inappropriate material.
It is another object of the present invention to provide a system and method of adaptively building a list of inappropriate materials by scanning local memory of network interfacing hardware such as the cache of a proxy server or the storage space of a firewall system.
According to one embodiment, a method for determining undesirable content on a public computer network is disclosed. The method operates in conjunction with a system for accessing content from the network. The system includes at least one computer and an interface for facilitating communication between the computer and the network. The interface includes local storage. The method scans the local storage of the interface for undesirable content. The scanning utilizes methods such as word, phrase and pattern matching to identify inappropriate content. After scanning, the method determines the location of undesirable content within the public computer network. For example, the method can identify the URL at which the inappropriate content is located. This is also done from local memory. These identified locations may then be added to a filter list to prevent other users from accessing the inappropriate content.
According to another embodiment, a system for determining undesirable content on a network is disclosed. The system includes at least one computer and an interface for facilitating communication between the computer and the network. The interface includes local storage, e.g., for storing accessed content. The system also includes a means for scanning the local storage of the interface for undesirable content and a means for determining the location of undesirable content within the network. Finally, the system also includes a filter list to which the identified sites are added.
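The steps of this embodiment can be sketched, under stated assumptions, as follows. The local storage of the interface is modeled here as a dictionary mapping each URL to the text cached for it, which is an assumption made for illustration, since actual proxy caches and firewalls store content in their own formats. The term list and the function names are likewise hypothetical, and simple substring matching stands in for the word, phrase and pattern matching described above.

```python
# Hypothetical terms considered undesirable; placeholders only.
UNDESIRABLE_TERMS = ("badword", "very bad phrase")

def scan_cache(cache, terms=UNDESIRABLE_TERMS):
    """Return the URLs whose cached content contains an undesirable term."""
    flagged = []
    for url, content in cache.items():
        # Lower-case and collapse whitespace before matching.
        normalized = " ".join(content.lower().split())
        if any(term in normalized for term in terms):
            flagged.append(url)
    return flagged

def update_filter_list(filter_list, cache):
    """Add newly identified URLs to the filter list; return those added."""
    new_urls = [url for url in scan_cache(cache) if url not in filter_list]
    filter_list.update(new_urls)
    return new_urls
```

Because the scan runs over content already held in local storage, it need not delay users while they browse, and each run can extend the filter list with locations that were not previously known.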
Other features and advantages of the present invention will be apparent to one of ordinary skill in the art upon reviewing the detailed description of the present invention and the attached drawings.