The invention relates to searching and retrieving data stored in a digital data processing system.
A storage unit in a digital data processing system, e.g. a hard disk drive in a personal computer (PC), is capable of storing great volumes of data in its files. To search the files, the central processing unit (CPU) in such a system is capable of comparing given data with the data stored in one or more files in order to locate any occurrence(s) of the given data. For example, the CPU can compare a given word or phrase to the words or phrases in a lengthy file and locate the word or phrase if it occurs in the file. Having located the given data, the CPU can then retrieve the data or provide other information regarding it, e.g., the name of the file containing the data.
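The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the method of any particular system: the function name `locate` is hypothetical, and the files are modeled as an in-memory mapping from file name to contents (an assumption made only to keep the sketch self-contained).

```python
def locate(given, files):
    """Return (file name, character offset) for every occurrence of `given`.

    `files` is assumed to be a dict mapping a file name to its text.
    """
    hits = []
    for name, text in files.items():
        start = text.find(given)
        while start != -1:
            hits.append((name, start))
            # Resume one character later so overlapping occurrences are found.
            start = text.find(given, start + 1)
    return hits


# Hypothetical sample data standing in for files on a hard disk drive.
files = {
    "letter.txt": "The chemical patent was filed in March.",
    "memo.txt": "No filing is planned this quarter.",
}
print(locate("patent", files))
```

Returning the file name together with the offset mirrors the two outcomes mentioned above: retrieving the located data, or reporting which file contains it.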
The storage capacity and access speed of today's hard disk drives are increasing rapidly. At the same time, the price of hard disk drives is decreasing rapidly. As a result, there is a proliferation of hard disk drives installed in PCs, and users of varying levels of expertise are storing more and more data on the drives. Many users, however, encounter difficulties in searching and retrieving the data they have stored. For example, users sometimes cannot remember the name of the file that contains the data they seek, or even where the file is located within a maze of directories and subdirectories of files. Further, users who store vast amounts of data in files created with a growing diversity of software applications, e.g., spreadsheets, personal information managers, word processors, database managers, and electronic mail exchanges, often find that they cannot consolidate the data.
Toward managing this growing volume of data, a number of search techniques of varying scope and complexity have been devised. Some search techniques are quite literal, i.e., they search for and retrieve exactly what the user specifies. For example, given "chemical patent" a literal technique locates only occurrences of exactly those two words in that order and overlooks "patent on a chemical compound."
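A literal technique of this kind might look like the sketch below. The helper names `words` and `literal_search` are hypothetical, and ignoring letter case and punctuation is an assumption of this sketch rather than something the description above requires.

```python
import re


def words(text):
    """Split text into lowercase words (case-insensitivity is an assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def literal_search(query, text):
    """True only if the query words occur consecutively, in the same order."""
    q, t = words(query), words(text)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


print(literal_search("chemical patent", "A chemical patent was granted."))
print(literal_search("chemical patent", "A patent on a chemical compound."))
```

The second call fails to match because the two words appear in the opposite order and are separated by other words, which is exactly the limitation noted above.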
Other search techniques allow a user to issue a search request that contains data as well as Boolean operators, e.g., AND, OR, or NOT, which expand the range of data retrieved. For example, given "patent AND chemical OR pharmaceutical" a search technique including Boolean operators locates "chemical patent" as well as "pharmaceutical patent". The range of the search can be further expanded by adding operators that specify word order and proximity. For example, given "patent AND chemical WITHIN 3 WORDS" the technique locates "patent on a chemical".
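The two requests above can be sketched as follows. This is an illustration under stated assumptions, not a query parser: the function names `boolean_match` and `within` are hypothetical, the first request is read as "patent AND (chemical OR pharmaceutical)", and "WITHIN 3 WORDS" is taken to mean that the word positions differ by at most three.

```python
import re


def words(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def boolean_match(text, required, any_of):
    """True if `required` is present AND at least one word in `any_of` is present."""
    present = set(words(text))
    return required in present and any(w in present for w in any_of)


def within(text, a, b, n):
    """True if words `a` and `b` both occur and some pair lies at most n positions apart."""
    t = words(text)
    pos_a = [i for i, w in enumerate(t) if w == a]
    pos_b = [i for i, w in enumerate(t) if w == b]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)


print(boolean_match("a pharmaceutical patent", "patent", ["chemical", "pharmaceutical"]))
print(within("patent on a chemical", "patent", "chemical", 3))
```

Under these assumptions, `within` accepts either word order; a technique that also constrains order would compare the positions without taking the absolute value.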
Still other search techniques include a feature known as "fuzzy searching" which provides "wild card" characters, e.g., "!" and "*", that make it possible to locate variations of given data. For example, if "!" indicates one or more wild card characters, the fuzzy search technique given "chem!" locates "chemical", "chemist", and "chemistry".
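One way to sketch the "chem!" example is to translate the request into a regular expression. The function names below are hypothetical, and only "!" is handled, with the meaning given above (one or more characters); the meaning of "*" is not specified in the text, so it is left out. Restricting the wild card to word characters and ignoring case are further assumptions of this sketch.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a pattern in which '!' stands for one or more word characters."""
    parts = [re.escape(p) for p in pattern.split("!")]
    return re.compile(r"\b" + r"\w+".join(parts) + r"\b", re.IGNORECASE)


def wildcard_search(pattern, text):
    """Return every word in `text` that matches the wild card pattern."""
    return [m.group(0) for m in wildcard_to_regex(pattern).finditer(text)]


text = "A chemist studies chemistry and chemical bonds, not chem."
print(wildcard_search("chem!", text))
```

Because "!" requires at least one character, the bare word "chem" at the end of the sample sentence is not returned.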