The present invention relates to the field of data searching, and particularly to a software system and associated method for use with a search engine, to search data maintained in systems that are linked together over an associated network such as the Internet or Intranet. More specifically, the invention relates to a graphical user interface (GUI) adapted to represent dynamic data sets in various applications and tables, and to query dynamic and large data repositories and indices of Internet search engine providers.
The World Wide Web (WWW) is comprised of an expansive network of interconnected computers upon which businesses, governments, groups, and individuals throughout the world maintain inter-linked computer files known as web pages. Users navigate these pages by means of computer software programs commonly known as Internet browsers. Due to the vast number of WWW sites, many web pages have a redundancy of information or share a strong likeness in either function or title. The vastness of the unstructured WWW causes users to rely primarily on Internet search engines to retrieve information or to locate businesses. These search engines use various means to determine the relevance of a user-defined search to the information retrieved.
The authors of web pages provide information known as metadata within the body of the hypertext markup language (HTML) document that defines the web pages. A computer software product known as a web crawler systematically accesses web pages by sequentially following hypertext links from page to page. The crawler indexes the pages for use by the search engines using information about a web page as provided by its address or Uniform Resource Locator (URL), metadata, and other criteria found within the page. The crawler is run periodically to update previously stored data and to append information about newly created web pages. The information compiled by the crawler is stored in a metadata repository or database. The search engines search this repository to identify matches for the user-defined search rather than attempt to find matches in real time.
A typical search engine has an interface with a search window where the user enters an alphanumeric search expression or keywords. The search engine sifts through available web sites for the user's search terms, and returns the search results in the form of HTML pages. Each search result includes a list of individual entries that have been identified by the search engine as satisfying the user's search expression. Each entry or “hit” includes a hyperlink that points to a Uniform Resource Locator (URL) location or web page.
In addition to the hyperlink, certain search result pages include a short summary or abstract that describes the content of the URL location. Typically, search engines generate this abstract from the file at the URL, and only provide acceptable results for URLs that point to HTML format documents. For URLs that point to HTML documents or web pages, a typical abstract includes a combination of values selected from HTML tags. These values may include text from the web page's “title” tag, from what are referred to as “annotations” or “meta tag values” such as “description,” “keywords,” etc., from “heading” tag values (e.g., H1, H2 tags), or from some combination of the content of these tags.
Typically, search engine providers resort to two types of queries: ad-hoc queries (also called “instant queries”), and persistent queries. Within the context of an ad-hoc query, a user issues a search query using a web-based search form. The search query is passed to the search engine, which processes it immediately and returns a list of matches (or search result set). Essentially, ad-hoc queries have a very short execution time, typically on the order of a fraction of a second, depending on the workload of the search engine. The search engine processes this type of query immediately, searching an indexed repository (or data store). On occasion, a user might seek a particular piece of information that is not available in the indexed repository at the time the ad-hoc search is conducted. Consequently, the search result set will not contain the desired piece of information.
The persistent type queries offer the users the possibility of a continuous search (hence the term persistent queries) over a long period of time, for example two weeks. During the time span of the persistent query the user receives notification, such collect new data from the Internet, for instance every second, and that continuously update the indexed repository using crawling and gathering technologies. Exemplary popular subscription or persistent query-type services are jCentral's(copyright) notification service, and Yahoo!'s(copyright) Auction notification.
However, because typical search engine repositories are very dynamic, a desired piece of information might not be indexed at the time the user performs the ad-hoc query. Even if the user issues a persistent query, as described earlier, this query type normally takes a long time to process, and the user might not receive the desired result for at least one day. The reason for such a delay is that, for essentially every incoming piece of information, a matching based on search profiles has to be processed, which requires extensive computational resources. For example, consider a search engine that receives 10,000,000 pieces of new information daily, with a set of 1,000,000 persistent queries. Typically, an off-line batch processing task could take several hours to perform the profile matching, and only after it completes are the users notified of a matching query result.
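The scale of the example above can be illustrated with a rough calculation. The following is a minimal sketch that assumes a naive matcher comparing every new item against every stored profile; the comparison rate of one billion comparisons per second is purely an assumption chosen for illustration and is not taken from any measured system.

```python
# Figures from the example in the text.
NEW_ITEMS_PER_DAY = 10_000_000
PERSISTENT_QUERIES = 1_000_000

# ASSUMPTION: hypothetical throughput of a naive profile matcher.
ASSUMED_COMPARISONS_PER_SECOND = 1_000_000_000


def naive_batch_hours(items: int, queries: int, rate: int) -> float:
    """Hours needed if every item is compared against every query."""
    comparisons = items * queries  # 10^13 for the figures above
    return comparisons / rate / 3600


hours = naive_batch_hours(
    NEW_ITEMS_PER_DAY, PERSISTENT_QUERIES, ASSUMED_COMPARISONS_PER_SECOND
)
print(f"Naive batch matching: about {hours:.1f} hours")
```

Under these assumptions the batch run takes roughly 2.8 hours, which is consistent with the "several hours" mentioned in the text. A different assumed rate or a smarter index would change the figure accordingly.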
It is therefore clear that the persistent queries do not remedy the shortcoming of the ad-hoc queries, as the persistent queries are typically processed only at daily or weekly intervals, which does not provide the users with instantaneous information. Yet another problem associated with persistent queries is that users typically forget to, or do not make the effort to, unsubscribe from the persistent queries they issued. Consequently, of the 1,000,000 persistent queries considered in the example above, only a small percentage is useful to process at all. The majority of the stored persistent queries might become obsolete after a certain period of time from the issuance of the queries, because users may have lost interest in the desired information.
There is currently no search mechanism that combines the convenience and speed of ad-hoc-type queries with the notification feature of persistent-type queries. The need for such a search mechanism has heretofore remained unsatisfied.
The session search system and associated method of the present invention satisfy this need by providing a novel type of query referred to herein as a “session query”. In the context of a session query, a user issues a search query using, for example, a web-based form. This query is processed immediately by the search engine, yielding search result elements that are returned within the new context of a “dynamic search result set”. In other terms, the search result set of the session query is not static.
One significant difference between the ad-hoc query and the session query is that as long as the user is reviewing the “dynamic search result set” of the session query, the search result is updated automatically in almost real time when new information arrives. When the user is no longer interested in continuing the search, such as when the user terminates the search result review process, the life span of the session query terminates. As a result, the session query spans from the initiation of the initial search until either a time-out occurs, for example after 20 minutes of inactivity, or the user expressly terminates the session query by closing the browser window.
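The life span described above can be sketched as follows. This is a minimal illustration only, not the implementation of the invention; the class name `SessionQuery`, its fields, and the use of plain numeric timestamps in seconds are assumptions introduced for the example. The 20-minute value is the example time-out given in the text.

```python
from dataclasses import dataclass

# Example time-out from the text: 20 minutes of inactivity.
INACTIVITY_TIMEOUT_S = 20 * 60


@dataclass
class SessionQuery:
    """Hypothetical record of one session query (names are illustrative)."""
    query: str
    last_activity: float   # timestamp of the most recent user activity
    closed: bool = False   # True once the user closes the browser window

    def touch(self, now: float) -> None:
        """Record user activity, restarting the inactivity window."""
        self.last_activity = now

    def close(self) -> None:
        """Express termination by the user."""
        self.closed = True

    def is_active(self, now: float) -> bool:
        """Active until explicitly closed or the inactivity time-out elapses."""
        return (not self.closed
                and (now - self.last_activity) < INACTIVITY_TIMEOUT_S)
```

In this sketch a session that is neither touched nor closed simply stops being active once 20 minutes have passed since its last recorded activity, so no explicit unsubscribe step is needed.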
The session search system of the present invention generally includes two modules: a client module, also referred to herein as a session manager, that presents the “dynamic search result set”, and a server module, also referred to herein as a dynamic query matcher, that manages the current set of active session queries. The client module is implemented as executable code, such as a Java applet, in the user's web browser, or, alternatively, as a stand-alone application.
During the search session, the client module and the server module exchange “alive” messages to ensure that the session query has not timed out or been terminated. In one embodiment the client module sends “alive” messages to the server module, advising the server module that the session is still active. In another embodiment the server module sends “alive” messages to the client module, inquiring whether the client is still interested in keeping the session active.
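The first embodiment above, in which the client module sends “alive” messages to the server module, might be tracked on the server side along the following lines. This is a hedged sketch: the class `AliveTracker`, its method names, and the 60-second default time-out are assumptions made for illustration, and the transport used to carry the messages is deliberately left out.

```python
from typing import Dict, List


class AliveTracker:
    """Hypothetical server-side bookkeeping for client 'alive' messages."""

    def __init__(self, timeout_s: float = 60.0) -> None:
        # ASSUMPTION: a session is dropped after timeout_s without a message.
        self.timeout_s = timeout_s
        self._last_alive: Dict[str, float] = {}

    def register(self, session_id: str, now: float) -> None:
        """Start tracking a newly issued session query."""
        self._last_alive[session_id] = now

    def on_alive(self, session_id: str, now: float) -> bool:
        """Handle an 'alive' message; returns False for unknown sessions."""
        if session_id not in self._last_alive:
            return False
        self._last_alive[session_id] = now
        return True

    def expire(self, now: float) -> List[str]:
        """Remove and return the sessions that have been silent too long."""
        expired = [sid for sid, seen in self._last_alive.items()
                   if now - seen > self.timeout_s]
        for sid in expired:
            del self._last_alive[sid]
        return expired
```

A server loop would call `expire` periodically and stop matching new information against the returned sessions. The second embodiment, in which the server polls the client, would invert the direction of the message but could keep the same bookkeeping.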
The server module maintains a record of all the current session queries. When new pieces of information arrive from a web crawler or gatherer, the new information is matched against the current set of session queries. Matched items are sent to the client module, which, in turn, automatically updates the user's graphical user interface that presents the dynamic search result set, e.g., the view screen of the web browser application.
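The matching step described above can be sketched as follows. The text does not specify the matching criterion, so this example assumes the simplest one: a session query matches a new item when every query term appears among the item's whitespace-separated words. The class `DynamicQueryMatcher` borrows the module name from the text, but its methods, the per-session outbox, and the example URL are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


class DynamicQueryMatcher:
    """Illustrative server module: matches new items to active sessions."""

    def __init__(self) -> None:
        self._queries: Dict[str, Set[str]] = {}                  # session -> terms
        self._outbox: Dict[str, List[str]] = defaultdict(list)   # session -> URLs

    def add_session(self, session_id: str, query: str) -> None:
        """Record a newly issued session query as a set of lowercase terms."""
        self._queries[session_id] = set(query.lower().split())

    def remove_session(self, session_id: str) -> None:
        """Forget a session that has timed out or been terminated."""
        self._queries.pop(session_id, None)
        self._outbox.pop(session_id, None)

    def on_new_item(self, url: str, text: str) -> List[str]:
        """Match one item from the crawler against all active session queries.

        ASSUMPTION: a query matches when all of its terms occur in the text.
        """
        tokens = set(text.lower().split())
        matched = [sid for sid, terms in self._queries.items()
                   if terms <= tokens]
        for sid in matched:
            self._outbox[sid].append(url)
        return matched

    def drain(self, session_id: str) -> List[str]:
        """Return and clear the pending updates for one client module."""
        return self._outbox.pop(session_id, [])
```

Because only sessions that are currently active are stored, each incoming item is compared against a set of queries that shrinks as sessions end, in contrast to a persistent-query store that grows until users unsubscribe.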
The session search system and associated method of the present invention provide numerous advantages and benefits to the users and to the search engine providers. For example, the session search system and method enable the users to easily and conveniently perform a search query similar to the ad-hoc query, without the need to subscribe to, or set up and manage, a persistent query. The management of persistent queries can be burdensome to a user, especially when notification of useless information is sent periodically, for instance every day, via e-mail.
The session query is performed automatically, without special user intervention. The life span of a session query could range, for example, from a few minutes to several hours, varying with the user's needs. The likelihood that a desired piece of information is found during the search query depends on the update frequency of the search engine repository. This improves the overall quality of the search result set, particularly when the search is conducted on very large and dynamic repositories.
The session search system and associated method of the present invention enable the search engine providers to offer a more pro-active interface with the users. In addition, search accuracy will be greatly improved with the increased probability of obtaining a desired piece of information (i.e., a perfect hit), that would have otherwise not been made available at the time a conventional search query was performed.
Moreover, the session search system and associated method of the present invention enable the automatic delivery of the updated information obtained subsequent to the formation and submission of the initial session query, directly into the user's displayed “dynamic search result set”. Another feature of the session search system and method is that the updated information is integrated with the users' view screen seamlessly, and almost transparently to the users, to avoid fatigue or distraction.
In addition, the session search system and method will significantly reduce the burden of maintaining and tracking persistent queries. Rather than being concerned about maintaining a large set of persistent queries, the users would rely on the self-maintaining feature of the session query. The session queries will result in a smaller matching process, thus requiring less computing resources, increasing the overall speed of the search process, and ultimately enabling the search engine providers to better allocate their resources.