Commercial publishers commonly research content that others have published. Such research is particularly common for content published on the internet, or “online”, such as news and entertainment. For example, in news and entertainment, commercial publishers need to ensure that the content they make available online is up-to-date and interesting to potential viewers.
For example, each major commercial news outlet typically has a “home page” on the internet on which it publishes headlines for major stories of the moment. The home page generally has many headlines, which typically are hypertext links that can be used to access the full story. In some cases, a few sentences of a story may also be provided on the home page. The organization of the headlines on the home page generally changes several times per day as new stories become available and older stories become less frequently viewed. Thus, online content from commercial publishers, particularly in news and entertainment, can change very quickly.
Identifying content published online by other publishers, and comparing such content to its own resources, is a challenging task for a commercial publisher due to the high volume of content, the rapid change of content, and limited access to content. Users can consume a large amount of time and computer resources in reviewing online content and content stored in their content management systems.