With the advent of the Internet, the amount of information that can be accessed is becoming difficult to manage. To classify this large amount of information, personalized news recommendation engines use network crawling software and hardware that identifies a website and then classifies the information on that website; this allows a personalized news recommendation engine to identify which websites match certain search criteria.
Moreover, because of the vast amount of information on the Internet, crawling the Internet to identify specific types of information can be very processor intensive, which may interfere with other processes associated with personalized news recommendation engines.
In addition, information on different websites is typically stored in different computer formats that are specific to each website. For example, the title of a document on one webpage may be in Extensible Markup Language (XML) and the title of a second document may be in Portable Document Format (PDF). Moreover, the location of a title within the information on one webpage may differ from the location of a title on a second webpage. Because the format and location of information on different webpages may vary dramatically, it can be difficult to identify specific types of information on a website, such as the title of a document or the title of a news story.
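
The format and location problem described above can be illustrated with a minimal sketch. The two sample documents, the element names (`headline`, `h1`), and the helper functions `title_from_xml` and `title_from_html` are all hypothetical and chosen only for illustration; they are not taken from any particular website or crawler. The sketch shows that the same story title needs a different extraction rule for each source.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical sample documents: the same story title appears in a
# different format and at a different location in each one.
XML_DOC = (
    "<article><meta><headline>Markets rally</headline></meta>"
    "<body>...</body></article>"
)
HTML_DOC = (
    "<html><head><title>Site name</title></head>"
    "<body><h1>Markets rally</h1><p>...</p></body></html>"
)


def title_from_xml(text):
    """Return the text of meta/headline, or None if that path is absent."""
    root = ET.fromstring(text)
    node = root.find("./meta/headline")
    return node.text if node is not None else None


class _FirstH1Parser(HTMLParser):
    """Collect the text of the first <h1> element encountered."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and self.title is None:
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1 and self.title is None:
            self.title = data.strip()


def title_from_html(text):
    """Return the text of the first <h1>, or None if there is none."""
    parser = _FirstH1Parser()
    parser.feed(text)
    return parser.title
```

Applying the XML rule to the HTML sample finds nothing, because the title sits in a different element at a different location, which is the difficulty the paragraph above describes.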