1. Field of the Invention
This invention relates to a system and a method for analyzing documents and images that include network identifiers and for detecting and marking network identifiers of particular known interest to a given user or group of users.
2. Background Art
In almost any industrialized country, it would be difficult if not impossible not to notice the rapid increase in the number and availability of network-based information services. This is especially true in the case of the Internet. Indeed, almost every popular news report involving the Internet makes some mention of how fast Internet-based services are growing.
Unfortunately, along with the rapid growth in the crop of Internet "wheat," there is a corresponding rapid growth in the amount of Internet "chaff." There are at present several different ways of enabling a user to separate these two types of information. The most common way is by using a search engine. As is well-known, the user of a search engine enters one or more keywords, which the engine then matches against the millions of potential available Web sites. Links to relevant Web sites are then presented to the user by means of some conventional browser.
The problem with this solution lies in the word "relevant." What the search engine considers to be relevant is often not at all what the user might consider to be relevant. This is a result not only of the inherent limitations of any search engine, but also of the increasing commercialization of the Web itself. In some cases, for example, a retrieved site may be listed very highly only because the site provider paid the search engine company to put it there. In many other cases, the results of a keyword search may include a large number of advertisements that appear either as banners or as actual sites and whose inclusion is triggered by a particular keyword. To the extent that links to various sites are presented to the user, they are thus chosen based on criteria set either by the provider or by entities associated with the provider. In other words, links to Web sites are differentiated by the provider.
There are also some programs available that allow a user to filter information retrieved from the network, for example, to prevent children from downloading and viewing pornography. Such programs give the user some ability to filter out undesirable information, but they still do nothing to increase the amount of relevant information that is presented to the user. This is in part because these programs also differentiate based on particular pre-defined keywords that are provided to the user's computer.
One well-known shortcoming of existing search engines and Web filters is that they do not incorporate any form of cross-reference to any generalized class of information that is known to be relevant to a given user. This means that the user must be even more skillful when selecting sets of keywords to be used as the basis of a Web search. Assume for example that a user is interested in Scandinavian poetry and wants to learn more about the famous series of poems by Swedish Nobel Laureate (1931) Erik Axel Karlfeldt describing his mythical rustic Swedish hero Fridolin. A simple search using a well-known search engine on the keyword "Fridolin" returned not a single reference to Scandinavian poetry in the list of the first 100 links; the top five links were related to 1) the genealogy of a German man; 2) boomerangs; 3) Saint Fridolin, the founder of a monastery; 4) a technical report on parallel processing; and 5) a race horse.
According to one attempt at helping a user identify relevant information in an Internet Web site, software downloaded into the user's computer analyzes the text in the HTML stream of the Web site. Wherever the software locates a word in the text of the Web site that matches a predetermined keyword list, it highlights this word and presents to the user a small overlaid display window containing network links to sites and site categories assumed to be of interest to the viewing user. For example, if the user is viewing a Web site that includes the phrase "New York, N.Y." then this conventional software generates a display with links to car rental agencies, maps, hotels, and other categories of Web sites assumed to be of interest to the viewer of this Web page.
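By way of illustration only, the conventional keyword-matching approach just described might be sketched as follows. This is a minimal sketch under stated assumptions, not the actual prior-art software: the table KEYWORD_LINKS, the function annotate, and the use of a mark tag for highlighting are all hypothetical, and the sketch operates on plain text rather than on a parsed HTML stream.

```python
import re

# Hypothetical provider-supplied keyword list. Each keyword maps to the
# link categories that the provider assumes will interest every viewer.
KEYWORD_LINKS = {
    "New York": ["car rental agencies", "maps", "hotels"],
}


def annotate(text, keyword_links=KEYWORD_LINKS):
    """Highlight each listed keyword found in text.

    Returns the marked-up text together with a list of
    (keyword, link categories) pairs, one per match, standing in
    for the overlaid display window.
    """
    overlays = []
    if not keyword_links:
        return text, overlays

    # Try longer keywords first so that overlapping entries match greedily.
    ordered = sorted(keyword_links, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in ordered))

    def mark(match):
        keyword = match.group(0)
        overlays.append((keyword, keyword_links[keyword]))
        return "<mark>" + keyword + "</mark>"

    return pattern.sub(mark, text), overlays


marked, overlays = annotate("Flights to New York, N.Y. are cheap.")
```

Because the keyword table in this sketch is fixed by the provider, every viewer of the same text receives the same highlighting and the same link categories.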
The problem with this known method is, however, that the assumptions are predetermined by the keyword list included in the software by the provider. Even if this keyword list is updated, whoever views the same Web page will get the same highlighting and same link presentation. One viewer might be interested in traveling to New York City and may actually wish to link to the overlaid sites about available car rental agencies. Another viewer, however, might look upon any potential trip to the Big Apple as a nightmarish ordeal, and links to maps of New York City would be completely irrelevant. No matter how often or "intelligently" the software were to update the keyword list, it would still not be able to ensure relevance to a particular user, since the user is not the one determining the list.
What is needed, therefore, is a system and a method that make it easier for a user to identify network links and similar data that are known to be particularly relevant to the user, and that let the user control what the system considers "relevant." Such a system should, however, preferably not require a great deal of user skill or intervention, and it should ideally identify sites of particular relevance to the user even when the user may not have known about these sites in advance. Unwanted advertisers and other sources of "noise" should not be able to interfere with the presentation of relevant information. In other words, a better system would allow for link differentiation at the client/user side. This invention provides such a system and method.