1. Field of the Invention
This invention relates to a method, system and program product for data searching in a computer environment, that is to say for acting upon a search query supplied to a computer by a user and for locating data in accordance with the query. More particularly, but not exclusively, the invention relates to locating text which may be present in a database of stored text files and which is in accordance with a user supplied search query.
The term “program product” here means a body of computer code stored by a machine readable storage medium such as a CD-ROM or one or more floppy discs, or made available for downloading from a remote computer site.
2. Related Art
In order to identify or locate particular documents or blocks of text in a database of text files, it is known to provide a method and apparatus which can receive a user-supplied search request comprising a particular text string and which will carry out a hierarchical search through an indexed database to find a matching string within the database. One such known method and apparatus is disclosed in U.S. Pat. No. 5,781,772 to Wilkinson, III et al. Also known are systems able to carry out Boolean searching, in which documents stored in a database are located on the basis of a search query made up of two or more text strings linked by logical operators such as AND, OR and AND NOT. Special operators are sometimes also available, for example “near”, under which a document is located if two particular words appear next to each other or within a specified number of words of each other in the document.
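The Boolean and “near” operators described above can be illustrated with a small inverted index. The following is a minimal sketch, not the implementation of any cited system: the helper names (`tokenize`, `build_index`, `docs_with`, `near`), the sample documents, and the reading of “within k words” as a difference of at most k word positions are all illustrative assumptions.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Split text into lower-case word tokens, discarding punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Inverted index: word -> set of ids of the documents containing it."""
    index: Dict[str, Set[str]] = {}
    for doc_id, text in docs.items():
        for tok in tokenize(text):
            index.setdefault(tok, set()).add(doc_id)
    return index


def docs_with(index: Dict[str, Set[str]], word: str) -> Set[str]:
    """Ids of documents containing `word` (empty set if none)."""
    return index.get(word.lower(), set())


def near(text: str, w1: str, w2: str, k: int) -> bool:
    """True if w1 and w2 occur within k word positions of each other, in either order.

    Assumption: "within k words" means |position(w1) - position(w2)| <= k.
    """
    tokens = tokenize(text)
    pos1 = [i for i, t in enumerate(tokens) if t == w1.lower()]
    pos2 = [i for i, t in enumerate(tokens) if t == w2.lower()]
    return any(abs(i - j) <= k for i in pos1 for j in pos2)


# Hypothetical sample documents.
DOCS = {
    "d1": "The man was lurking in the dark alley.",
    "d2": "The alley was dark. The man was lurking there.",
    "d3": "It was a dark night.",
}
index = build_index(DOCS)
both = docs_with(index, "man") & docs_with(index, "dark")        # man AND dark
either = docs_with(index, "alley") | docs_with(index, "night")   # alley OR night
dark_only = docs_with(index, "dark") - docs_with(index, "man")   # dark AND NOT man
```

Here AND, OR and AND NOT map directly onto set intersection, union and difference. Note that such a query only says *which* documents match; it says nothing about where in each document the words occur, which is the limitation discussed next.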
The result of a search of a large database may well comprise many, perhaps a very large number of, ‘hits’, because the searcher may be unable to recollect exactly the item being sought and because no search capability is available to refine the search further. Also, whilst known systems are able to identify particular documents containing the text strings in a search query, each document found must still be searched to identify where the text strings are located within that document and whether they add up to a meaningful whole, e.g. whether they occur together in a single text passage which they identify in a meaningful way.
In the specification of U.S. patent application Ser. No. JP919990273US1 entitled “Method and apparatus for data searching and computer readable medium for supplying program instructions”, assigned to the same assignee as the present application and incorporated herein by reference, there is disclosed a text search method, one embodiment of which is intended to seek a text portion comprising text fragments in a predetermined order. More generally, the method comprises receiving a sequence of two or more data fragments expected to be contained within a body of data (the data may be, but need not be, text); searching the body of data to locate matches between the data and the respective data fragments; and identifying a portion of the body of data from the address of a match with the first data fragment in the sequence and the terminal address of a match with the last data fragment in the sequence.
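The general method just summarised (locate matches for each fragment in sequence, then take the portion from the address of the first fragment's match to the terminal address of the last fragment's match) can be sketched as follows. This is an illustrative reading only, not the algorithm of the referenced application: the function name `find_ordered_portion`, the greedy left-to-right matching strategy, and the use of plain substring search are assumptions. Because it relies only on `.find`, it works on `bytes` as well as `str`, reflecting that the data need not be text.

```python
from typing import Optional, Sequence, Tuple, Union

Data = Union[str, bytes]


def find_ordered_portion(body: Data, fragments: Sequence[Data]) -> Optional[Tuple[int, int]]:
    """Hypothetical sketch: return (start, end) of the first portion of `body`
    in which `fragments` occur in the given order, or None if there is none.

    `start` is the address of the match with the first fragment; `end` is one
    past the terminal address of the match with the last fragment. `body` and
    every fragment must be the same type (all str or all bytes). Matching is
    plain substring search, so "man" would also match inside "woman".
    """
    if not fragments:
        return None
    pos = 0
    start: Optional[int] = None
    for frag in fragments:
        # Earliest match of this fragment after the end of the previous match.
        idx = body.find(frag, pos)
        if idx == -1:
            return None
        if start is None:
            start = idx
        pos = idx + len(frag)
    return start, pos


text = "The man was lurking in the dark alley."
span = find_ordered_portion(text, ["man", "lurking", "dark"])
```

With the sample sentence, `span` is `(4, 31)`, i.e. the portion `"man was lurking in the dark"`. The greedy scan returns the first portion found, which is not necessarily the shortest one.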
One embodiment of the method disclosed in the above specification identifies a minimal text portion containing text fragments in a given order. (By the term minimal portion is meant a portion which contains only one complete sequence of the text fragments. Generally at least one of the fragments will appear only once, but the portion may contain additional instances of one or more of the fragments.) However, there may well be a need to search within a given text for a portion which contains two or more given text fragments whose order is not known. For example, it might be remembered only vaguely that the text portion to be found is either:
    1. The man was lurking in the dark alley.
or
    2. The alley was dark. The man was lurking there.
If a search request comprising the text fragments “man . . . lurking . . . dark” is passed to the previously proposed algorithm, it will find the first text portion but not the second. Conversely, if the search request consists of the text fragments “dark . . . man . . . lurking”, the second text portion will be found but not the first. This is because the previously proposed algorithm looks for a portion of text in which the text fragments appear in the same sequence as in the search request.
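The order sensitivity described above can be reproduced with a small word-level sketch that finds the shortest window of tokens containing the fragments in the requested order. This is not the algorithm of the referenced application; the helper names, the treatment of each fragment as a single lower-case word, and the interpretation of “minimal portion” as the shortest such window are assumptions made for illustration.

```python
import re
from typing import List, Optional, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-case word tokens; punctuation is discarded."""
    return re.findall(r"[a-z]+", text.lower())


def minimal_ordered_window(tokens: List[str], fragments: List[str]) -> Optional[Tuple[int, int]]:
    """Hypothetical sketch: shortest token window [start, end) in which the
    fragments occur in the given order, or None if no such window exists."""
    if not fragments:
        return None
    best: Optional[Tuple[int, int]] = None
    for start, tok in enumerate(tokens):
        if tok != fragments[0]:
            continue
        pos = start + 1
        for frag in fragments[1:]:
            # Advance to the earliest occurrence of the next fragment.
            while pos < len(tokens) and tokens[pos] != frag:
                pos += 1
            if pos == len(tokens):
                break  # this fragment does not occur after the previous match
            pos += 1
        else:
            # Every fragment matched; the window ends just after the last match.
            if best is None or pos - start < best[1] - best[0]:
                best = (start, pos)
    return best


text1 = tokenize("The man was lurking in the dark alley.")
text2 = tokenize("The alley was dark. The man was lurking there.")
query_a = ["man", "lurking", "dark"]
query_b = ["dark", "man", "lurking"]
```

Query `query_a` finds a window in `text1` but none in `text2`, while `query_b` does the reverse, matching the behaviour described for the previously proposed algorithm.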
One object of the invention is to make available a search algorithm which provides additional functionality, or an additional search query format, for identifying documents and/or locating blocks of text in a database of text files.
Another object is to provide an apparatus and method for data searching able to better discriminate specific blocks of text identified by a search query.
In particular, it is an object to provide an algorithm for handling a search query comprising text fragments, which will find a text portion containing those fragments in an order different from that of the search query.
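To make the stated objective concrete, the sketch below finds the shortest window of words that contains every query fragment regardless of order. It uses the well-known sliding-window “minimum window” technique, swapped in here purely for illustration; it is not presented as the algorithm of the invention, which this section does not describe. The helper names and the treatment of each fragment as a single lower-case word are assumptions.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-case word tokens; punctuation is discarded."""
    return re.findall(r"[a-z]+", text.lower())


def minimal_unordered_window(tokens: List[str], fragments: List[str]) -> Optional[Tuple[int, int]]:
    """Hypothetical sketch: shortest token window [start, end) containing every
    fragment (with multiplicity) in any order, or None if none exists."""
    need = Counter(fragments)
    if not need:
        return None
    have: Counter = Counter()
    satisfied = 0  # distinct fragments whose required count is currently met
    best: Optional[Tuple[int, int]] = None
    left = 0
    for right, tok in enumerate(tokens):
        if tok in need:
            have[tok] += 1
            if have[tok] == need[tok]:
                satisfied += 1
        # Shrink from the left while the window still contains all fragments.
        while satisfied == len(need):
            if best is None or right + 1 - left < best[1] - best[0]:
                best = (left, right + 1)
            out = tokens[left]
            if out in need:
                have[out] -= 1
                if have[out] < need[out]:
                    satisfied -= 1
            left += 1
    return best


text1 = tokenize("The man was lurking in the dark alley.")
text2 = tokenize("The alley was dark. The man was lurking there.")
query = ["man", "lurking", "dark"]
```

Unlike the order-sensitive search, the single query `query` locates a portion in both example texts (`"man was lurking in the dark"` and `"dark the man was lurking"`), and permuting the query does not change the result. Each token enters and leaves the window at most once, so the scan is linear in the length of the text.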