1. Field of the Invention
The present invention is directed to a method for searching electronic data files and, more particularly, to a method including the entering of a two-dimensional array of search concepts, each concept being predefined key words and expressions or user-defined key words and expressions, and detecting and displaying a correlation of occurrence, within the electronic data files, between entered concepts in the respective dimensions.
2. Related Art
The amount of information generated, collected, stored, communicated and accessible through the electronic media is continuing to increase. The increase is not only in the volume; it is in the number of sources, and the variety of formats in which the information is communicated and stored. The sources include newspapers, technical journals, government publications, literary works, laws, court opinions, business reports, and public records. More and more of these are being generated, stored, searched, retrieved, and distributed through networked systems of digital computers and other digital document generation and management devices. The migration of these and other sources, and large archives of the same, to electronic media is generally attributed to a combination of the Internet and the increasing number of and capabilities of personal computers (PCs) and other Internet access devices.
The average operator-user with an entry-level PC, a telephone line, and a subscription to an Internet Service Provider (ISP), such as America On Line, now has access to literally billions of documents, forms, images, and text files, stored throughout the world on a myriad of databases. A large number of the databases are available as free access, to anyone, while others are subscription based or otherwise limited access. There are large databases which, although not directly accessible through the World Wide Web, are available through controlled-access wide area networks (WANs). As known to persons skilled in the relevant art, these may be physically separate from the Internet or may be Virtual Private Networks (VPNs) which coexist on the Internet with public data traffic. Through such private networks an authorized person may have access to large proprietary databases of technical journals, customer profiles, medical records, criminal records, internal memoranda, business reports and the like.
There are continuing problems, though, with searching such a large number of electronic files. Many of these problems prevent users from fully exploiting the Internet, and other wide area networks, and the many databases which these networks make available for their use. One of the problems is the formulation of a search strategy. Search strategy includes the choice of particular features that the user believes, or has otherwise determined, would be contained in, described by, or descriptive of the electronic files relating to the topic that he or she is researching. The choosing of these search features is critical to the research task, yet in most cases it is carried out using nothing more than intuition, trial and error.
Stated more particularly, a typical search of the World Wide Web is as follows: A user accesses the Internet through, for example, an Internet Service Provider such as America On Line. The user then, using computer software features that are well known in the art, launches a web browser program that resides on his or her personal computer, such as, for example, Microsoft Explorer or Netscape Navigator. As is well known in the art, the web browser is usually programmed with a default “home page”, which is the Uniform Resource Locator (“URL”) of a specific web site. The web browser then performs the required Hypertext Transfer Protocol (“HTTP”) communications with the web server hosting the home page.
The home page may be hosted by a commercial web services/advertising entity, such as Microsoft Network, Excite, or Yahoo. Such commercial home pages generally have one or more icons representing search engines, both their own and those of third parties such as Lycos and Infobot. When the user clicks on a search engine icon, he or she is presented with a display page typically having a field for entering the search query terms, also referenced in the art as “key words”.
The typical user then proceeds to enter the key words. Many commercially available Internet search engines provide the Boolean connectors AND, OR and NOT for connecting the key words. Boolean searching ideally identifies all documents containing the defined connection, or string, of “key words”. This may be with or without further limitations, such as year, language, publisher, and other type characteristics. Some of the more sophisticated Boolean search methods permit the user to define search terms to include not only the term itself, but also synonyms of the term and ranges around it. Some available search engines can also group key words using parentheses, which permits more complex Boolean expressions.
The entry field, though, forms the key words into a one-line expression, regardless of the number of terms. Therefore, in that one-line expression, the user is attempting to formulate a single Boolean expression that will, based only on his or her intuitive sense, have a “feels OK” likelihood of finding relevant files, i.e., “hits”, but is not so broad that it retrieves an unwieldy number.
In a typical scenario of Boolean searching, however, the user would not simply formulate a single expression, and then conduct the entire search using only that expression. Instead, the process is typically as follows: The user attempts a first Boolean expression and gets a number of “hits”. If the number of hits is zero the user will usually vary the expression, either by removing one of the AND operators and thus lowering the criteria required for a document to qualify as a hit, or by substituting a synonym for one or more of the search terms. If the number is too high the user may retrieve, by one of the known methods, a sample set of the “hits” and read them to identify his or her next strategy. Most often the user will simply add further search criteria, typically by connecting another key word to the original Boolean phrase by an AND operator, and then run another search. When the process is completed, which is frequently coincident with the point where the user runs out of time, the typical user will have attempted a generally random sequence of different Boolean expressions, and many variations on each. The user has, hopefully at least, laboriously retrieved and reviewed documents obtained from each search expression and, in a method that is typically unique to each user, has collected and combined these into, for example, a research report.
There are numerous problems with this method. One major problem is that the user is attempting to find an optimal search phrase, using the number of “hits” resulting from each attempt compared to the previous attempt as the sole heuristic. For example, assume that a user is writing a paper on trends in the number of children who are transported to and from school by buses as compared to the number who are transported by parents or guardians. Assume that the first Boolean phrase that the person uses is (CHILD OR KIDS) AND (BUS OR (“PUBLIC TRANSPORTATION”)). Assume that the user is searching the Internet, using known methods of Internet access. If the number of hits is too high the user will add another search term. An example would be PERCENTAGE TRANSPORTED. The typical user would then run the search again and see the number of hits. After a number of iterations the user would finally obtain an acceptable number of hits, for example thirty.
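The hit-count-driven iteration described above can be sketched in a few lines of Python. This is only an illustration, not any search engine's actual implementation: the four-document corpus is invented, the helper names (`has`, `AND`, `OR`, `NOT`, `search`) are hypothetical, and a term "matches" by naive substring test, so BUS would also match "business".

```python
# Hypothetical in-memory corpus; the documents are invented for illustration.
DOCS = {
    1: "the child rides the bus to school",
    2: "kids use public transportation in the city",
    3: "percentage transported by bus: child ridership survey",
    4: "parents drive kids to school",
}

def has(term):
    """Return a predicate that is true when `term` occurs in the text (substring match)."""
    term = term.lower()
    return lambda text: term in text.lower()

def AND(*preds):
    return lambda text: all(p(text) for p in preds)

def OR(*preds):
    return lambda text: any(p(text) for p in preds)

def NOT(pred):
    return lambda text: not pred(text)

def search(pred):
    """Return the sorted ids of the documents that satisfy the Boolean predicate."""
    return sorted(doc_id for doc_id, text in DOCS.items() if pred(text))

# (CHILD OR KIDS) AND (BUS OR "PUBLIC TRANSPORTATION")
first = AND(OR(has("child"), has("kids")),
            OR(has("bus"), has("public transportation")))

# Too many hits: narrow by AND-ing in a further term, as the typical user would.
narrowed = AND(first, has("percentage transported"))

first_hits = search(first)
narrowed_hits = search(narrowed)
print(len(first_hits), len(narrowed_hits))
```

The only feedback the user receives between the two attempts is the change in the number of hits, which is the heuristic the paragraph above criticizes.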
The search “methodology” described above has other shortcomings. One is that the user might not record the various Boolean search phrases that were attempted before he or she finds the phrase that yields the desired thirty hits. As a result the user might run the same search twice, or might forget to try all possible substitutions of terms. Another problem, which is more fundamental, is that the search phrase that the user ended up with might not be the only search phrase that obtains thirty hits, and, of those phrases, it might not be the best one.
Still another problem, which overlays all of the previously identified problems, is that some users are better than others at formulating search expressions. This creates a statistical variance in the “quality” of searches, both in terms of time and coverage, which may itself be a problem, especially within certain institutions and professions.
Another problem with a “methodology” for Boolean searching such as the example above is that the user may not have fully defined or developed the topic of the paper before starting the research. As is well known among, for example, college students, the user frequently starts the search before fully identifying the topic, scope, or conclusion of the task for which the search is being conducted. The user then picks the topic, and composes the outline of the paper, or other reporting document, after sifting through the results obtained from his or her repeated searches with different Boolean expressions. However, in using the “trial and error” method of attempting numerous Boolean expressions to see which one provides results that inspire the user, the user may frequently overlook many Boolean expressions for which the search results would reveal more interesting or valuable topics.
Yet another problem with the prior art of searching using single-line Boolean expressions is that many users cannot easily generate or store an understandable description of, or history of, the overall search strategies that they employed when conducting a search. Therefore, frequently the user will run what is basically the same search twice, or will recreate the search strategy each time a particular project is picked up again or a new research project is undertaken.
Still another problem is that after trying multiple Boolean phrases and obtaining and relying on the results obtained with one or more of the searches, the user may have difficulty ascertaining or defending the quality of the search. This problem may be encountered by students, as well as by consultants and analysts, when having to defend the facts, analysis or conclusion presented in a final paper based on research results.
The present invention provides a structured, concept-exhaustive method for searching databases for documents and other electronic files by receiving a plurality of search concepts from the user, designating a first plurality of the search concepts as a first search vector defining a first dimension of a matrix, and designating a second plurality of the search concepts as a second search vector defining a second dimension of the matrix. The method then performs a search of one or more databases based on the matrix, and identifies a plurality of search results, each represented by a cell of the matrix. A row of the matrix is formed by a row of cells reflecting, on a one to one basis, a search result for each of the plurality of search concepts within the first search vector. A column of the matrix is formed by a column of cells reflecting, on a one to one basis, a search result for each of the plurality of search concepts within the second search vector. Other cells of the matrix reflect, on a one to one basis, a search result for each unique pair comprising a search concept from among said first plurality of search concepts and a search concept from among said second plurality of search concepts.
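The matrix structure summarized above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in this summary: a concept is modeled as a named list of key words, a document matches a concept if it contains any of those key words (substring match), the result for a pair of concepts is taken to be the set of documents matching both, and concept names are unique across the two vectors. The function names and the corpus are hypothetical.

```python
def matches(text, keywords):
    """True if any key word of the concept occurs in the text (naive substring match)."""
    text = text.lower()
    return any(k.lower() in text for k in keywords)

def matrix_search(docs, first_vector, second_vector):
    """Search `docs` (id -> text) with two vectors of concepts (name -> key words).

    Returns (single, pairs):
      single[name]  -> ids matching that one concept (the border cells)
      pairs[(a, b)] -> ids matching concept a AND concept b (the interior cells)
    Assumes concept names are unique across both vectors.
    """
    concepts = {**first_vector, **second_vector}
    single = {name: {i for i, t in docs.items() if matches(t, kws)}
              for name, kws in concepts.items()}
    pairs = {(a, b): single[a] & single[b]
             for a in first_vector for b in second_vector}
    return single, pairs

# Invented example data.
docs = {
    1: "school bus routes for children",
    2: "parents drive kids to school",
    3: "city bus fares rise",
    4: "carpool lanes for parents",
}
first_vector = {"children": ["child", "kids"], "parents": ["parent", "guardian"]}
second_vector = {"bus": ["bus"], "car": ["drive", "carpool"]}

single, pairs = matrix_search(docs, first_vector, second_vector)
for cell, ids in pairs.items():
    print(cell, sorted(ids))
```

Because every pairing of a first-vector concept with a second-vector concept gets its own cell, no combination is skipped, which is the sense in which the search is exhaustive over the entered concepts.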
A further embodiment of the invention presents the user with a visual display arranging the first plurality of search concepts as a border column, and the second plurality of search concepts as a border row. Each cell within the border is in a row-column position corresponding to a pair of search concepts, one being from the first plurality of search concepts and one being from the second plurality of search concepts. The step of displaying the search results forms each cell to have a visual state reflecting the search result for the search concept or pair of search concepts corresponding to that cell.
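As a rough illustration of such a display, the sketch below renders a text grid with the first plurality of concepts as the border column and the second plurality as the border row. The glyphs, the `many` threshold, and the function names are arbitrary choices made for this example; the embodiment itself does not prescribe any particular visual state.

```python
def cell_state(hits, many=10):
    """Map a hit count to a display glyph; the threshold `many` is arbitrary."""
    if hits == 0:
        return "."
    return "#" if hits >= many else "+"

def render(rows, cols, counts):
    """Render a grid: `rows` form the border column, `cols` the border row.

    `counts` maps (row_concept, col_concept) -> number of hits; missing cells count as 0.
    """
    width = max(len(name) for name in rows + cols) + 2
    lines = ["".join(s.ljust(width) for s in [""] + cols).rstrip()]
    for r in rows:
        cells = [cell_state(counts.get((r, c), 0)) for c in cols]
        lines.append("".join(s.ljust(width) for s in [r] + cells).rstrip())
    return "\n".join(lines)

# Invented hit counts for illustration.
rows = ["children", "parents"]
cols = ["bus", "car"]
counts = {("children", "bus"): 12, ("children", "car"): 3, ("parents", "car"): 0}

grid = render(rows, cols, counts)
print(grid)
```

A graphical implementation could substitute color or shading for the glyphs; the mapping from search result to visual state is the only part this sketch is meant to show.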
A still further embodiment of the invention includes a step of displaying the matrix of cells to appear as a two-dimensional plane, and displaying the search results to appear as a third dimension.
A further embodiment of the invention includes a step of receiving a search concept definition command from a user, and defining one or more of the plurality of search concepts in accordance with the received search concept definition command.
Another embodiment of the invention may be combined with any of the previously identified embodiments, and comprises the further steps of receiving a user-entered cell selection command and presenting the user with a cell result list identifying documents and other electronic files within the search results reflected by the selected cell. This embodiment optionally includes a further step of receiving a document selection command from the user and a step of displaying information reflecting the information content of a document or other electronic file selected in accordance with the document selection command. This embodiment optionally includes a further feature of simultaneously displaying the received cell selection command, the cell result list, data reflecting the document selection command, and the information reflecting the information content.
A further embodiment of the invention may be combined with any of the previously defined embodiments of matrix searching in accordance with the present invention, and includes the further steps of receiving a collection document command from the user, generating a collection document in response, receiving a document selection command from the user, displaying a document or other electronic file in response, receiving a portion storage command, and copying information into the collection document from a portion of the displayed document corresponding to the portion storage command.
A still further embodiment of the invention includes an organizing step which may be combined with any of the previously defined matrix searching with collection embodiments, and includes the further steps of receiving user-entered document tag data, and storing into the collection document information corresponding to the received document tag data and to a portion of the displayed document corresponding to the portion storage command. An optional feature of this embodiment includes user-entered relational database information data with the document tag data. A further optional feature of this embodiment includes steps of receiving a collection document store command from the user, storing the collection document into a collection database in response, and repeating the step of matrix searching to include searching the collection database.
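The collection and organizing steps can be pictured with a small data-structure sketch. Everything here is hypothetical: the class names, the representation of a "portion" as a character range, and the single free-text tag are assumptions chosen for brevity, and the optional relational database information and collection database are omitted.

```python
from dataclasses import dataclass, field

@dataclass
class Excerpt:
    """A portion copied from a displayed document, with an optional user-entered tag."""
    doc_id: int
    text: str
    tag: str = ""

@dataclass
class CollectionDocument:
    """Holds excerpts gathered while reviewing search results."""
    title: str
    excerpts: list = field(default_factory=list)

    def store_portion(self, doc_id, document_text, start, end, tag=""):
        """Copy document_text[start:end] into the collection (the portion storage command)."""
        excerpt = Excerpt(doc_id, document_text[start:end], tag)
        self.excerpts.append(excerpt)
        return excerpt

    def by_tag(self, tag):
        """Return the excerpts carrying the given tag."""
        return [e for e in self.excerpts if e.tag == tag]

# Invented example: the user selects the first 20 characters of a displayed document.
displayed = "School bus ridership fell in 1998."
collection = CollectionDocument("School transport trends")
stored = collection.store_portion(7, displayed, 0, 20, tag="ridership")
print(stored.text)
```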
A further embodiment of the invention includes a reporting step which may be combined with any of the previously defined matrix searching with collection and organizing embodiments, and includes the further steps of receiving a user-entered link analysis generation command, identifying information contained in the search result documents that is common between two or more search concepts, and generating a link document having link information reflecting the information identified as common. An optional feature of this embodiment includes a step of generating a graphical link chart showing the link information.
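This summary does not say how the common information is identified, so the following is only one possible reading, sketched under strong assumptions: "information" is reduced to individual lowercase words, a small hand-picked stopword list is ignored, and a link exists between two concepts whenever their result documents share at least one word. The function name `link_terms` and the data are invented.

```python
from itertools import combinations

STOPWORDS = frozenset({"the", "a", "of", "to", "for", "in", "and"})

def link_terms(results):
    """Find words shared between the result documents of each pair of concepts.

    `results` maps concept name -> list of document texts.
    Returns {(concept_a, concept_b): sorted list of shared words}, omitting pairs
    that share nothing. Pairs are ordered alphabetically by concept name.
    """
    vocab = {
        concept: {w for text in texts for w in text.lower().split()
                  if w not in STOPWORDS}
        for concept, texts in results.items()
    }
    links = {}
    for a, b in combinations(sorted(vocab), 2):
        shared = vocab[a] & vocab[b]
        if shared:
            links[(a, b)] = sorted(shared)
    return links

# Invented search results per concept.
results = {
    "bus": ["school bus routes", "district bus contract"],
    "parents": ["parents drive to school"],
    "fares": ["city fares rise"],
}

links = link_terms(results)
print(links)
```

A link chart could then be drawn by treating each concept as a node and each entry of `links` as an edge labeled with the shared words.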
A still further embodiment of this invention comprises any of the previously defined embodiments combined with a step of drill down matrix searching, the drill down matrix searching comprising the steps of receiving a cell search command from the user and receiving a new plurality of search concepts from the user, the receiving including entering or designating a first plurality of the new search concepts as a first search vector defining a first dimension of a new matrix, and entering or designating a second plurality of the new search concepts as a second search vector defining a second dimension of the new matrix. This embodiment then searches, based on the new matrix, the documents and other electronic files represented in the search results within a cell corresponding to the received cell search command.
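Drill down searching can be sketched as running a second matrix search restricted to the documents of one selected cell. As before, this is an assumption-laden illustration: concepts are lists of lowercase key words matched by substring, a pair cell holds the documents matching both concepts, and the names `hits`, `matrix`, and `drill_down` are hypothetical.

```python
def hits(docs, keywords):
    """Ids of documents containing any of the (lowercase) key words."""
    return {i for i, text in docs.items()
            if any(k in text.lower() for k in keywords)}

def matrix(docs, first_vector, second_vector):
    """Pair cells: (a, b) -> ids of documents matching concept a AND concept b."""
    return {(a, b): hits(docs, ka) & hits(docs, kb)
            for a, ka in first_vector.items()
            for b, kb in second_vector.items()}

def drill_down(docs, parent_cell_ids, new_first, new_second):
    """Run a new matrix search over only the documents in the selected parent cell."""
    subset = {i: docs[i] for i in parent_cell_ids}
    return matrix(subset, new_first, new_second)

# Invented corpus.
docs = {
    1: "school bus safety report 1998",
    2: "school bus funding cuts",
    3: "city bus safety audit",
    4: "parents fund school trips",
}

top = matrix(docs, {"school": ["school"]}, {"bus": ["bus"]})
selected = top[("school", "bus")]  # the cell named by the cell search command

drill = drill_down(
    docs, selected,
    {"safety": ["safety"], "money": ["fund", "budget"]},
    {"report": ["report", "audit"]},
)
print(sorted(selected), {cell: sorted(ids) for cell, ids in drill.items()})
```

Document 3 matches both "safety" and "report" but does not appear in the drill down results, because it was not in the selected parent cell.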
These and other objects, features and advantages of the present invention will become more apparent to, and better understood by, those skilled in the relevant art from the following more detailed description of the preferred embodiments of the invention taken with reference to the accompanying drawings, in which like features are identified by like reference numerals.