The invention disclosed herein relates to computerized searching for electronic data within collections of stored data, such as, e.g., documents.
Electronically stored data, or information, is now available in immense quantities, and the rate of growth is accelerating. Such data may be stored as logical units of various kinds, such as, e.g., documents, files, records, etc.
A familiar example is the World Wide Web (the “Web”), a global information space comprising hyperlinked documents that are requested and distributed via the Internet. From its origin in 1990, the Web has grown such that a study conducted in January 2005 concluded that it comprised at least 11.5 billion indexable Web pages.
Specialized databases continue to grow as well. For example, electronic legal databases store, e.g., statutes, regulations, judicial opinions, and secondary sources and are constantly updated and expanded. Examples of such legal databases are described in commonly-owned U.S. patent application Ser. No. 10/045,586, filed on Jan. 11, 2002, and titled “DYNAMIC LEGAL DATABASE PROVIDING CURRENT AND HISTORICAL VERSIONS OF BODIES OF LAW,” and commonly-owned U.S. patent application Ser. No. 10/603,207, filed on Jun. 25, 2003, and titled “ELECTRONIC MANAGEMENT AND DISTRIBUTION OF LEGAL INFORMATION,” both of which are hereby incorporated by reference herein in their entirety.
Data stores of these sizes would not be useful without tools for searching for and retrieving desired information. Various types of search tools (commonly referred to as “search engines”) are well known. Typically, a search engine accepts a query from a user and then tries to identify all data that in some way corresponds to the query. A list of some or all matching logical units that contain data responsive to the query is provided to the user, who may then be able to retrieve some or all of the logical units.
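By way of illustration only, the query-and-match behavior described above can be sketched as a small keyword search over an inverted index. The function names (build_index, search), the sample documents, and the rule that a logical unit matches only if it contains every query term are assumptions made for this sketch; actual search engines may match, rank, and present results quite differently.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return sorted IDs of documents containing every term of the query."""
    terms = query.lower().split()
    if not terms:
        return []
    # Start from the documents matching the first term, then intersect
    # with the documents matching each remaining term.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


# Hypothetical logical units (documents) keyed by identifier.
documents = {
    "doc1": "statutes and regulations on water rights",
    "doc2": "judicial opinions interpreting water statutes",
    "doc3": "secondary sources on contract law",
}

index = build_index(documents)
hits = search(index, "water statutes")
print(hits)
```

In this sketch the list of identifiers plays the role of the list of matching logical units provided to the user, who could then retrieve any of the listed documents.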
The utility of a search tool, however, often depends on how well the user can formulate a query. For example, one query related to a topic may return little or no useful information, while a slightly different query may return hundreds, or even thousands, of hits, which may be far too many to examine. Users may waste considerable time in trial and error before stumbling upon a query that leads to a manageable number of relevant hits. In practice, a user may settle for a query known to be overinclusive and then waste additional time mining the results. Against this background, a user may wish to reduce the set of searchable data, and one approach is to limit the search to a subset comprising documents that are related to one another by topic.
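As a rough sketch of the approach mentioned above, the set of searchable data may be reduced to a topic-related subset before the query is applied. The topic labels, the function name search_within_topic, and the sample data are hypothetical and serve only to illustrate the idea; how documents come to be related by topic is not specified here.

```python
def search_within_topic(documents, topics, topic, query):
    """Search only the documents labeled with the given topic.

    documents: dict mapping document ID to text
    topics:    dict mapping document ID to a single topic label (assumed)
    """
    terms = query.lower().split()
    if not terms:
        return []
    hits = []
    for doc_id, text in documents.items():
        # Skip every document outside the chosen topic subset.
        if topics.get(doc_id) != topic:
            continue
        words = set(text.lower().split())
        if all(term in words for term in terms):
            hits.append(doc_id)
    return sorted(hits)


# Hypothetical documents and topic assignments.
documents = {
    "a": "appeal of a tax assessment",
    "b": "appeal of a criminal conviction",
    "c": "tax treatment of an appeal bond",
}
topics = {"a": "tax", "b": "criminal", "c": "tax"}

# An unrestricted search for "appeal" would match all three documents;
# restricting it to the "tax" subset leaves a smaller result set.
tax_hits = search_within_topic(documents, topics, "tax", "appeal")
print(tax_hits)
```

The same query thus returns fewer, more closely related hits when the search is confined to the subset.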