In many software applications, it is desirable to create and keep a record (e.g., an index) of content that is available from a site on the Internet. The content may represent different products, coupons, news articles, video clips, social networking information, or a variety of other information. In general, each web page that contains the content is referenced by a uniform resource identifier (URI) that defines the specific address for the web page.
One difficulty in creating an index for a web site is that many web pages that contain the same content or show the same product are referenced by different URIs. If each different URI were placed into a database for the web site as a separate entry, the database would contain many duplicate entries for the same content and would quickly become unmanageable. Given this problem, there is a need for a technique that can be implemented by a computer to associate numerous different URIs with a single entry in a database that stores information about content contained in a web page.
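By way of illustration only, the following minimal sketch shows how several different URIs might be associated with a single index entry. It is not the technique described in this document. It assumes, purely for the example, that the URIs differ only in host capitalization, a trailing slash, a fragment, or query parameters that do not affect the content. The names IGNORED_PARAMS, canonical_key, add_to_index, and index, as well as the example.com addresses, are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters assumed not to change the page content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "ref"}


def canonical_key(uri: str) -> str:
    """Reduce a URI to a single key under the assumptions stated above."""
    parts = urlsplit(uri)
    # Keep only parameters assumed to matter, sorted so that order is irrelevant.
    kept = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in IGNORED_PARAMS
    )
    # Treat "/product/42/" and "/product/42" as the same path.
    path = parts.path.rstrip("/") or "/"
    # Lowercase scheme and host; drop the fragment entirely.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(kept), "")
    )


# One entry per canonical key; each entry remembers every URI that mapped to it.
index = {}


def add_to_index(uri: str, content: str) -> str:
    """Record a URI under its canonical key and return that key."""
    key = canonical_key(uri)
    entry = index.setdefault(key, {"content": content, "uris": set()})
    entry["uris"].add(uri)
    return key


if __name__ == "__main__":
    for uri in (
        "http://Example.com/product/42?utm_source=mail",
        "http://example.com/product/42/",
        "http://example.com/product/42?ref=home#reviews",
    ):
        add_to_index(uri, "Product 42")
    print(len(index))  # number of distinct entries created for the three URIs
```

In this sketch the three example URIs map to one entry rather than three. Rule-based normalization of this kind covers only the variations that have been anticipated in advance, which is one reason a more general technique is desirable.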