It is often desirable to classify the uniform resource locator (URL) of a web page into topical categories without accessing the underlying content. It is known to use URL classification techniques to classify a target web page. Known techniques rely on a variety of data sources, including: (i) the text of the web page itself; (ii) the hyperlink structure of the web page; (iii) the link structure of pages pointing to the target page; (iv) anchor text from pages pointing to the target page; and (v) the location of the page according to the URL.
It is known to carry out URL classification operations on network packet streams. These operations are generally performed in an inline manner, that is, during the transfer of packets within the network packet stream. Web domain information is extracted from a network packet in the stream. The extracted information is then looked up in a database to determine a corresponding URL category. The URL categories may be arbitrary and may be determined by the user. Example categories are: (i) art; (ii) business; (iii) computers; (iv) games; (v) health; (vi) home; (vii) news; (viii) reference; (ix) sports; (x) shopping; (xi) social networks; and/or (xii) finance. URL categories may also be based on open lists and/or commercially available lists.
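By way of illustration only, the extract-and-look-up operation described above may be sketched as follows. The sketch assumes a plaintext HTTP request payload in which the web domain appears in the Host header, and it uses a small in-memory dictionary in place of a category database. The names CATEGORY_DB, extract_host and categorize, the example domains, and the fallback to parent domains are all hypothetical and are not taken from any particular known system.

```python
from typing import Dict, Optional

# Hypothetical category database. A real deployment might instead load
# an open list or a commercially available list.
CATEGORY_DB: Dict[str, str] = {
    "example-news.test": "news",
    "example-shop.test": "shopping",
}


def extract_host(payload: bytes) -> Optional[str]:
    """Return the lowercased Host header value of a plaintext HTTP request.

    Returns None when no Host header is present. A trailing port is
    stripped. IPv6 literals are not handled in this sketch.
    """
    head = payload.split(b"\r\n\r\n", 1)[0]
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            host = value.strip().decode("ascii", errors="replace").lower()
            return host.rsplit(":", 1)[0] if ":" in host else host
    return None


def categorize(
    payload: bytes,
    db: Dict[str, str] = CATEGORY_DB,
    default: str = "uncategorized",
) -> str:
    """Look up the extracted domain in the database and return its category."""
    host = extract_host(payload)
    if host is None:
        return default
    # Assumption: try the full host first, then each parent domain,
    # e.g. www.example-news.test, then example-news.test, then test.
    labels = host.split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in db:
            return db[candidate]
    return default


packet = (
    b"GET /a HTTP/1.1\r\n"
    b"Host: www.example-news.test:8080\r\n"
    b"Accept: */*\r\n\r\n"
)
print(categorize(packet))
```

In this sketch the lookup is a constant-time dictionary access per candidate domain, which is consistent with performing the operation inline while packets are in transit. Encrypted traffic would require a different source of domain information and is outside the scope of the sketch.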