The Internet is a collection of interconnected computer networks, which has long been employed for data management, sharing, and searching. As the Internet continues to evolve, the data being transmitted increases in volume and variety. With this increase in volume and variety, the risk of malicious and/or inappropriate information/contents on the Internet also increases. Inhibiting intentional or unintentional access to malicious and/or inappropriate information/contents is a critical task for individuals and organizations alike.
One technique to inhibit intentional or unintentional access to malicious and/or inappropriate information/contents involves the use of Internet filters, for example, Uniform Resource Locator (URL) filtering. URL filtering works by, for example, examining a URL that is requested by a client browser in order to decide whether to allow or prohibit access to the website associated with the URL. Generally speaking, URL filtering may employ one or more URL rating servers for examining the URL.
FIG. 1 shows an example of URL filtering involving a client 102, a gateway with URL filtering 104, and a Hyper Text Transfer Protocol (HTTP) URL rating server 114, representing a typical prior art URL filtering method. Gateway with URL filtering 104 intercepts the URL forwarded by client 102 when client 102 is attempting to access the website associated with the URL. Once the URL is intercepted, gateway with URL filtering 104 forwards the URL as-is by way of the Internet 112 to HTTP URL rating server 114, for example, Trend Micro URL Filtering Engine (TMUFE), over the HTTP protocol, which is based on Transmission Control Protocol (TCP). HTTP URL rating server 114 then employs a rating scheme to categorize/rate the URL. After the category/rating for the URL is generated, HTTP URL rating server 114 delivers a response back to gateway with URL filtering 104. Gateway with URL filtering 104 subsequently receives the response from HTTP URL rating server 114 and applies the policy accordingly to either allow or deny access to the URL by client 102.
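The request flow of FIG. 1 can be illustrated with a minimal sketch. It is not the TMUFE implementation: the rating table, the category names, the blocked-category policy, and the function names `rate_url` and `gateway_filter` are all hypothetical, and the network round trip between gateway 104 and rating server 114 is replaced by a local function call.

```python
from urllib.parse import urlsplit

# Hypothetical rating table standing in for HTTP URL rating server 114.
RATINGS = {
    "malware.example": "malicious",
    "casino.example": "gambling",
    "news.example": "news",
}

# Hypothetical gateway policy: categories to which access is denied.
BLOCKED_CATEGORIES = {"malicious", "gambling"}


def rate_url(url: str) -> str:
    """Stand-in for the rating server: string-based lookup on the host name."""
    host = urlsplit(url).hostname or ""
    return RATINGS.get(host, "unrated")


def gateway_filter(url: str) -> bool:
    """Stand-in for gateway 104: return True to allow, False to deny."""
    # In the prior art this call is an HTTP (TCP) round trip carrying
    # the full URL as-is; here it is a direct function call.
    category = rate_url(url)
    return category not in BLOCKED_CATEGORIES
```

In this sketch, a request from client 102 for a URL on `malware.example` would be denied, while a URL on a host absent from the table is rated `unrated` and allowed under the assumed policy.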
However, there are at least two problems with the prior art URL filtering technique. Because the TCP protocol tends to be fairly heavy and to introduce latency, congestion can occur when a large number of gateways with URL filtering 104 forward their numerous URLs by way of the Internet 112 over the HTTP protocol. Congestion can also occur when gateway with URL filtering 104 forwards the URL as-is to HTTP URL rating server 114, because URLs tend to be very large. Therefore, because URLs tend to be large, bandwidth utilization increases.
Another drawback of the prior art URL filtering method is on the side of HTTP URL rating server 114. When HTTP URL rating server 114 employs a rating scheme to categorize/rate the URL, a string-based lookup can be very expensive. String-based lookup is expensive because of its high computational requirement, and therefore more server bandwidth is required.