1. Field of Invention
This invention relates to information retrieval systems. More particularly, the invention relates to a content-indexing search system and method providing search results consistent with content filtering and blocking policies implemented in a blocking engine.
2. Description of Prior Art
With the explosive growth of text and multimedia content that is available on the Internet and other data networks and systems, end users are increasingly relying on text and key word based search tools to locate information of potential interest. End users typically enter, as input to a search tool or engine, key words describing the information and documents they are seeking. The search tool or engine then searches an existing indexing database and returns a list of pointers to documents of potential interest, with document titles and often with a few descriptive lines of text extracted from the document body. End users then navigate to some or all of the returned pointers to retrieve and view the actual document or online content. The search engine indexing database is typically built automatically or semi-automatically by launching an automaton program against a content source (such as Internet Web Sites), having the automaton search the root content source as well as the links in the content tree (which often lead to other sites), and indexing the information in the database for future searches. For large content sources, such as Web Sites on the Internet, automated searching and indexing is the only practical way to create an index search database.
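The indexing and search steps described above can be sketched as follows. This is a minimal illustration only, not the implementation of any particular search engine: the in-memory `PAGES` table, the URLs, and the function names `crawl_and_index` and `search` are all hypothetical, and a real automaton would retrieve documents over the network rather than from a dictionary.

```python
from collections import defaultdict, deque

# Hypothetical in-memory content source: URL -> (title, body text, outgoing links).
PAGES = {
    "site/root": ("Home", "welcome to the garden site", ["site/roses", "site/tools"]),
    "site/roses": ("Roses", "pruning roses in the garden", ["site/root"]),
    "site/tools": ("Tools", "garden tools and pruning shears", []),
}


def crawl_and_index(root, pages):
    """Walk the link tree breadth-first from root and build keyword -> set of URLs."""
    index = defaultdict(set)
    seen = {root}
    queue = deque([root])
    while queue:
        url = queue.popleft()
        title, body, links = pages[url]
        # Index every word of the title and body under the document's URL.
        for word in f"{title} {body}".lower().split():
            index[word].add(url)
        # Follow links that have not been visited yet.
        for link in links:
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return index


def search(index, pages, query):
    """Return sorted (url, title) pointers for documents containing every key word."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted((url, pages[url][0]) for url in hits)


INDEX = crawl_and_index("site/root", PAGES)
```

A query such as `search(INDEX, PAGES, "pruning")` returns pointers and titles only; the end user must still navigate to each pointer to retrieve the document itself.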
With the increased diversity of information available on online systems and networks, corporations, individuals, groups, and Network Service Providers (NSPs) are increasingly implementing policies and controls to filter or otherwise limit the availability of content that is deemed inappropriate or undesirable for end users. Such content access control policies typically block undesirable content from reaching all or a subset of end users in a given online service and network. The blocking of content is typically performed in a content proxy gateway, data-network firewall, or other device inserted between the end user and the ultimate content source. Often the content filtering is implemented as part of a content caching engine, where only desirable content is kept in the cache for the user population, and undesirable content is prevented from being cached. In such systems, users can access network content only through the cache. Content is typically blocked for being offensive, for being inappropriate for a user group or for business use, for being unsuitable for viewing at a particular time of day, and for other similar reasons. Often NSPs and corporations will rely on a rating system or service such as Platform For Internet Content Selection (PICS) to determine the suitability of a content site or document for a particular population. End users may also select their own self-imposed set of blocking policies in some systems.
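A blocking policy of the kind described above can be sketched as a simple decision function placed in the path between the end user and the content source. The sketch below is illustrative only: the `RATINGS` table stands in for labels obtained from a rating service, and the class `BlockingPolicy`, the group names, the rating labels, and the function `gateway_fetch` are hypothetical.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional

# Hypothetical ratings table, standing in for labels from a rating service.
RATINGS = {
    "news/today": "general",
    "games/arcade": "entertainment",
    "adult/site": "adult",
}


@dataclass(frozen=True)
class BlockingPolicy:
    blocked_ratings: frozenset                 # always blocked for this group
    timed_ratings: frozenset = frozenset()     # blocked only inside the window
    window_start: Optional[time] = None
    window_end: Optional[time] = None

    def allows(self, url: str, now: time) -> bool:
        rating = RATINGS.get(url, "unrated")
        if rating in self.blocked_ratings:
            return False
        in_window = (
            self.window_start is not None
            and self.window_end is not None
            and self.window_start <= now < self.window_end
        )
        return not (in_window and rating in self.timed_ratings)


# Hypothetical per-group policies: employees lose entertainment content
# during business hours; guests are only denied adult content.
POLICIES = {
    "employees": BlockingPolicy(
        frozenset({"adult", "unrated"}), frozenset({"entertainment"}), time(9), time(17)
    ),
    "guests": BlockingPolicy(frozenset({"adult"})),
}


def gateway_fetch(group, url, now):
    """Return the content if the group's policy allows it, otherwise None."""
    if POLICIES[group].allows(url, now):
        return f"content of {url}"
    return None
```

For example, under these assumed policies `gateway_fetch("employees", "games/arcade", time(10))` is blocked, while the same request at `time(20)` is served.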
A significant problem is presented to NSPs and data delivery providers by the tension between the need for automated search engines that index vast amounts of content and the need for blocking engines that prevent some of that content from ultimately reaching end users. Specifically, the problem is the lack of integration and coordination of the search engines with the filter and blocking policy engines. The lack of integration has several causes, including:
(a) Many organizations deploy and implement content filtering blocking policies on their sites or service which rely on search engines, such as those available on the Internet, over which they have no control.
(b) Search engines by design must find and index as much content as possible and are biased toward seeking all content aggressively. On the other hand, filtering and blocking engines by design attempt to be selective in the documents that are stored on caches and presented to end users.
The intrinsically different missions of search engines and blocking engines, combined with the need for high performance and efficiency of implementation, impede integration and coordination of these two information retrieval functions.
The problem is manifested in the fact that end users, while utilizing the services of a search engine, will be presented with search results containing titles and descriptions of documents that will ultimately be inaccessible under the filtering/blocking policies. In addition to the end user inconvenience and frustration caused by the inconsistency, the titles and short descriptions of the content/documents returned by the search engine may, in themselves, be highly offensive or otherwise undesirable.
Accordingly, the need exists in information retrieval systems to have search results conformed to and be consistent with blocking policies with as little protocol and performance impact as feasible.
Prior art related to content-indexing search and blocking systems includes the following:
U.S. Pat. No. 5,701,469 issued Dec. 23, 1997 (Brandli et al.) discloses a content-index search system which invokes search result correction routines to remove from the results stored search results that were incorrectly included and to add stored search results that were incorrectly excluded. In this manner, the search results generated in response to a user query are made accurate even though the content-index used to generate the initial search result was not up to date.
U.S. Pat. No. 5,835,722 issued Nov. 10, 1998, filed Jun. 27, 1996, (Bradshaw et al.) discloses a terminal for blocking the use and transmission of inappropriate material by comprehensively monitoring computer operations for creation or transmission of such inappropriate material, upon which the terminal is blocked and may only be unblocked by supervisory intervention.
U.S. Pat. No. 5,706,507 issued Jan. 6, 1998, filed Jul. 5, 1995, (Schloss) discloses an advisory server operated by a third party which rates the content of data downloaded from a content server to a client in order to block or censor unwanted material.
U.S. Pat. No. 5,619,648 issued Apr. 8, 1997 (Canale et al.) discloses an e-mail filter which determines whether an e-mail message should be provided to a user in accordance with models of the user's correspondence.
None of the prior art discloses a content-indexing search system which provides search results consistent with blocking policies implemented in a blocking engine whereby only content allowed by the blocking policies is returned to the end user as the result of the content search, making the search results consistent with the blocking policies.
An object of the invention is an improved information retrieval system and method of operation providing consistency between search engine results and content blocking policies.
Another object is an improved content-index search system and method of operation providing search results consistent with blocking policies.
Another object is an improved content indexing search system which implements blocking policies in a caching and filtering engine.
Another object is an improved content-indexing search system and method of operation implementing blocking policies during a content-indexing phase.
Another object is an improved content-indexing search system and method of operation implementing blocking policies during an end user's search result presentation phase.
Another object is an improved content-indexing search system and method of operation by searching a local repository of a caching and blocking engine in lieu of searching and indexing ultimate content site sources and content servers.
Another object is an improved content-indexing searching system and method of operation configured to go through a caching and filtering engine to reach a target content.
These and other objects, features and advantages are achieved in an information retrieval network including a content-indexing search engine having a database, and a caching engine coupled between the search engine and the end user for implementing control policies that typically block undesirable content, such that search results are consistent with the filtering and blocking policies of the end user's organization. The policies are implemented in alternative embodiments.
In one embodiment, only content that is allowable by the blocking policy is added to the search engine indexing database. In a second embodiment, the search and presentation process of the search engine is modified to implement the blocking policies. In a third embodiment, the target of the search engine's scanning and indexing automaton process is modified to build an indexing database by searching the caching engine's content. In a fourth embodiment, the search engine's scanning and indexing automaton is configured in the same way as an end user browser, i.e., going through a caching and filtering engine to reach a target content.
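The difference between the four embodiments lies in where the blocking policy is applied. The following minimal sketch illustrates this under simplifying assumptions and is not the claimed implementation: the `SOURCE` table, the `BLOCKED` set standing in for a blocking policy, and the functions `allowed`, `build_index`, `search`, and `fetch_via_proxy` are all hypothetical.

```python
# Hypothetical content source: URL -> (title, body).
SOURCE = {
    "news/today": ("Today", "daily news headlines"),
    "adult/site": ("Adult", "explicit news content"),
    "games/arcade": ("Arcade", "arcade games news"),
}
BLOCKED = {"adult/site"}  # hypothetical blocking policy: a set of disallowed URLs


def allowed(url):
    return url not in BLOCKED


def build_index(docs, admit=lambda url: True):
    """Build keyword -> set of URLs, skipping documents rejected by admit."""
    index = {}
    for url, (title, body) in docs.items():
        if not admit(url):
            continue
        for word in f"{title} {body}".lower().split():
            index.setdefault(word, set()).add(url)
    return index


def search(index, keyword, present=lambda url: True):
    """Return sorted matching URLs, dropping those rejected by present."""
    return sorted(u for u in index.get(keyword.lower(), set()) if present(u))


# First embodiment: the policy is applied while the index is built.
results_1 = search(build_index(SOURCE, admit=allowed), "news")

# Second embodiment: the index is unfiltered; the policy is applied
# when results are presented to the end user.
full_index = build_index(SOURCE)
results_2 = search(full_index, "news", present=allowed)

# Third embodiment: the automaton indexes the caching engine's content,
# which already holds only allowable documents.
cache = {url: doc for url, doc in SOURCE.items() if allowed(url)}
results_3 = search(build_index(cache), "news")


# Fourth embodiment: the automaton fetches through the filtering engine,
# exactly as an end user browser would, and indexes whatever it receives.
def fetch_via_proxy(url):
    return SOURCE[url] if allowed(url) else None


fetched = {}
for url in SOURCE:
    doc = fetch_via_proxy(url)
    if doc is not None:
        fetched[url] = doc
results_4 = search(build_index(fetched), "news")
```

In this toy setting all four approaches return the same list, and none includes the blocked document, whereas an unfiltered `search(full_index, "news")` would. The approaches differ in practice in where the policy must be maintained and in whether the index itself ever contains blocked content.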