The use of computer systems and the Internet for communication and for obtaining information has grown dramatically over the years. The Internet may be described as a collection of interconnected computer networks. The World Wide Web (“Web”) may be described as a collection of interconnected documents and other resources, linked by hyperlinks and Uniform Resource Locators (“URLs”). The Web may be accessed using a computer via the Internet, and many other services are available through the Web, including e-mail, file sharing, and others. Through the Internet, millions of people worldwide have easy, virtually instant access to a vast and diverse amount of on-line information. Compared to encyclopedias and traditional libraries, the World Wide Web has enabled a sudden and extreme decentralization of information and data. The Web changes quickly, as existing web sites may change and new ones may appear at any time.
Content-control software, for example, Cyber Patrol™, squidGuard™, NetNanny™, NetBarrier™, ContentBarrier™, or DansGuardian™, may be used to control and restrict material delivered over the Web to a user. Typical users of such software include parents who wish to limit which web sites their children may view from home computers, schools performing the same function with regard to computers found at school, employers restricting what content employees may view while on the job, and the like.
Most existing content controls filter access to web pages based on a list of banned sites. Some filters allow control of advertisements through features such as blacklists, whitelists, and regular expression filters. Some web browsers include content filtering, which prevents certain external files named on a blacklist from loading.
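By way of illustration only, the list-based filtering described above may be sketched as follows. The host names, patterns, and the function name `is_allowed` are hypothetical and are not taken from any of the named products; the precedence chosen here (whitelist first, then blacklist, then regular expressions) is likewise an assumption.

```python
import re
from urllib.parse import urlsplit

# Hypothetical example lists; a real filter would load these from configuration.
BLACKLIST = {"ads.example.com", "banned.example.org"}
WHITELIST = {"school.example.edu"}
REGEX_FILTERS = [re.compile(r"/banners?/"), re.compile(r"[?&]adid=\d+")]


def is_allowed(url: str) -> bool:
    """Return True if the URL may be loaded under the example rules."""
    host = (urlsplit(url).hostname or "").lower()
    if host in WHITELIST:
        # Whitelisted hosts are always allowed (assumed precedence).
        return True
    if host in BLACKLIST:
        # Blacklisted hosts are always refused.
        return False
    # Otherwise refuse any URL that matches a regular expression filter.
    return not any(pattern.search(url) for pattern in REGEX_FILTERS)
```

Under these assumed rules, a request to a host on the blacklist is refused regardless of its path, while a request to an unlisted host is refused only if its URL matches one of the patterns.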
Typically, the existing content-controlling filters operate at a proxy server. These programs work by caching and filtering content before it is displayed in a user's browser. DansGuardian™, an open source proxy software, has a static list of weighted phrases and words in addition to a list of banned sites. DansGuardian™ may filter web pages based on phrase matching and URL matching. DansGuardian™, however, ignores a new word that is not in the static list.
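The weighted-phrase approach described above, and its limitation, may be illustrated by the following minimal sketch. It is not DansGuardian™'s actual implementation; the phrases, weights, threshold, and function names are all hypothetical and chosen only for the example.

```python
# Hypothetical static phrase weights and threshold (not any product's real lists).
WEIGHTED_PHRASES = {"gambling": 40, "casino bonus": 60, "homework help": -30}
BANNED_HOSTS = {"banned.example.org"}
THRESHOLD = 50


def page_score(text: str) -> int:
    """Sum the weight of every occurrence of each listed phrase in the page text."""
    lowered = text.lower()
    return sum(weight * lowered.count(phrase)
               for phrase, weight in WEIGHTED_PHRASES.items())


def should_block(host: str, text: str) -> bool:
    """Block on a URL (host) match, or when the phrase score reaches the threshold."""
    return host.lower() in BANNED_HOSTS or page_score(text) >= THRESHOLD
```

Because the list is static, a page that uses only words absent from the list (for example, "betting and wagering" under the hypothetical weights above) receives a score of zero and is not blocked.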
The proxy-based content-controlling filters, however, may not control all web traffic going through the system.
Additionally, to adapt to the changes that happen on the Web, the existing content-controlling filter solutions typically use neural network and Bayesian statistics methodologies.
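As one illustration of a Bayesian methodology of this kind, a naive Bayes text classifier with add-one smoothing may be sketched as follows. This is a generic textbook technique, not the method of any particular product; the labels "block" and "allow", the training texts, and the function names are assumptions made for the example.

```python
import math
from collections import Counter


def train(docs):
    """docs: iterable of (text, label) pairs. Returns per-label word counts and document counts."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the label with the highest posterior log-probability for the text."""
    vocab = set().union(*word_counts.values())
    total_docs = sum(doc_counts.values())
    best_label, best_logp = None, -math.inf
    for label, counts in word_counts.items():
        # Log prior: fraction of training documents carrying this label.
        logp = math.log(doc_counts[label] / total_docs)
        # Add-one (Laplace) smoothing so unseen words do not zero the probability.
        denom = sum(counts.values()) + len(vocab)
        for word in text.lower().split():
            logp += math.log((counts[word] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


# Hypothetical training data for illustration only.
TRAINING = [
    ("casino poker jackpot", "block"),
    ("poker betting odds", "block"),
    ("math homework algebra", "allow"),
    ("science homework lab", "allow"),
]
MODEL = train(TRAINING)
```

With such a model, a text containing a previously unseen word can still be classified from its remaining words, for example by calling `classify("poker jackpot tonight", *MODEL)`.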