1. Field of the Invention
The present invention relates generally to data processing systems, and more particularly relates to techniques for manipulating results generated by a search engine used to access information by the data processing system. Still more particularly, the present invention relates to techniques for determining which results of a search engine are accessible by the data processing system.
2. Description of the Related Art
The Internet, also referred to as an “internetwork”, is a set of computer networks, possibly dissimilar, joined together by means of gateways that handle data transfer and the conversion of messages from a protocol of the sending network to a protocol used by the receiving network. When capitalized, the term “Internet” refers to the collection of networks and gateways that use the TCP/IP suite of protocols.
The Internet has become a cultural fixture as a source of both information and entertainment. Many businesses are creating Internet sites as an integral part of their marketing efforts, informing consumers of the products or services offered by the business or providing other information seeking to engender brand loyalty. Many federal, state, and local government agencies are also employing Internet sites for informational purposes, particularly agencies which must interact with virtually all segments of society such as the Internal Revenue Service and secretaries of state. Providing informational guides and/or searchable databases of online public records may reduce operating costs. Further, the Internet is becoming increasingly popular as a medium for commercial transactions.
Currently, the most commonly employed method of transferring data over the Internet is to employ the World Wide Web environment, also called simply “the Web”. Other Internet resources exist for transferring information, such as File Transfer Protocol (FTP) and Gopher, but have not achieved the popularity of the Web. In the Web environment, servers and clients effect data transactions using the Hypertext Transfer Protocol (HTTP), a known protocol for handling the transfer of various data files (e.g., text, still graphic images, audio, motion video, etc.). The information in various data files is formatted for presentation to a user by a standard page description language, the Hypertext Markup Language (HTML). In addition to basic presentation formatting, HTML allows developers to specify “links” to other Web resources identified by a Uniform Resource Locator (URL). A URL is a special syntax identifier defining a communications path to specific information. Each logical block of information accessible to a client, called a “page” or a “Web page”, is identified by a URL. The URL provides a universal, consistent method for finding and accessing this information, not necessarily for the user, but mostly for the user's Web “browser”. A browser is a program capable of submitting a request for information identified by an identifier such as a URL. A user may enter a domain name through a graphical user interface (GUI) for the browser to access a source of content. The domain name is automatically converted to an Internet Protocol (IP) address by the domain name system (DNS), a service that translates the symbolic name entered by the user into an IP address by looking up the domain name in a database.
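By way of illustration only, the relationship between a URL, its domain name, and the IP address obtained through DNS can be sketched in Python as follows. The function names `describe_url` and `resolve_host`, the `lookup` parameter, and the example URL are assumptions introduced for this sketch and do not appear in the original text.

```python
import socket
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the parts a browser uses to locate a page."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,    # protocol, e.g. "http"
        "host": parts.hostname,    # domain name to be resolved via DNS
        "port": parts.port,        # None when the URL gives no explicit port
        "path": parts.path or "/", # location of the page on the server
    }


def resolve_host(host, lookup=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses for a domain name.

    `lookup` defaults to the system resolver; it is a parameter only so
    that the sketch can be exercised without network access.
    """
    infos = lookup(host, None)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the first element of sockaddr is the IP address string.
    return sorted({info[4][0] for info in infos})
```

For example, `describe_url("http://www.example.com/docs/index.html")` yields the host `www.example.com`, which `resolve_host` would then pass to the system resolver to obtain one or more IP addresses.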
Due to the global and diverse nature of the Internet, it is sometimes difficult for a user to find desired information. Search engines help by searching the Internet for information pertaining to keywords that the user enters. A search engine or search service is a document retrieval system designed to assist a user in finding information maintained on the various computer systems that make up a network such as the World Wide Web or the Internet. These search engines allow a user to specify desired information or content using keywords, phrases, or questions. The search engine then retrieves a list of items matching or otherwise relating to the user-specified search criteria, typically URLs to computer systems which contain or relate to such items, or URLs that point directly to content maintained on such systems. Known search engines such as Google and Yahoo provide such functionality.
However, in some situations, the particular computer system containing the requested information may not be available to the user, or the information may no longer exist at that location. For example, a common nuisance to many users is selecting a URL from a search result list and receiving a ‘web site not found’ error message. The same result may occur when various filters are used to block access to one or more web sites, including country-level censorship (e.g., countries such as China blocking access to Western news media outlets, certain political, militant, or religious organizations, etc.) and parental controls over the web sites accessible to children. The problem is that today's search engines do not remove inaccessible or blocked web sites from the search results that are presented to the user.
Therefore, it would be advantageous to have an improved method, apparatus, and computer instructions for selectively removing search results, such as web pages or web sites, from a search result list when such a web page or web site is not accessible.
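By way of illustration only, the general idea of dropping inaccessible entries from a result list can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not the technique of the present invention: the function names `is_accessible` and `filter_results`, the use of an HTTP HEAD request as the accessibility test, and the five-second timeout are all hypothetical choices made for the example.

```python
import http.client
import urllib.error
import urllib.request


def is_accessible(url, timeout=5.0):
    """Return True if a HEAD request to `url` succeeds (assumed test)."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException,
            OSError, ValueError):
        # Covers DNS failures, refused or timed-out connections,
        # HTTP error statuses such as 404, and malformed URLs.
        return False


def filter_results(urls, check=is_accessible):
    """Keep only the URLs for which `check` reports success.

    The original ordering of the result list is preserved.
    """
    return [url for url in urls if check(url)]
```

Because the check is passed in as a parameter, a filter based on a block list (for example, one reflecting parental controls) could be substituted for the network probe without changing `filter_results`.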