1. Field of the Invention
The present invention pertains generally to a technique for searching through a collection of records to retrieve, select or identify those records which have particular or desired attributes, and specifically to such a technique wherein superimposed code words are utilized to catalogue or identify the records.
2. Background of the Invention
In a recent article, John R. Pierce wrote "after twenty-five years of extraordinary progress, the computer industry is ready to enter its infancy." While the wisdom of this statement with respect to the computer industry in general may be arguable, it does appear to accurately and succinctly sum up the present state of affairs in the segment of the computer industry that involves information retrieval. During the last several years, man's ability to collect and accumulate data in a computer has grown dramatically; however, his ability to interact with the stored information has not kept pace with this growth. Rather, the powerful interactive capabilities of the general purpose computer have become inaccessible and even mystical to the vast majority of our population, since programming techniques often require the user to be too intelligent and to know too much before they can obtain any useful output from a computer. Since further advances in device technology promise to bring us more memory and larger capability in the near future, the ability to sort through the stored data and extract that which is needed will become even more significant.
One facet of information retrieval that is the subject of the present invention is the technique of associative retrieval; this technique, if properly implemented, can play an important part in creating a flexible query system that is usable by people having only a small amount of special training. Broadly speaking, associative retrieval comprehends the selection or identification of one or more units of information based only on a specification of part of the unit's contents. At present, computers do not operate associatively. Rather, retrieval is based on foreknowledge of the exact memory location in which the desired information is stored. However, the human being, while unable to store amounts of data as large as that stored in a computer, has a superior ability to retrieve some unit of stored information on the basis of any of a large variety of informational clues. Thus, a broad object of the present invention is to provide an associative retrieval technique usable in conjunction with computers capable of storing large collections of data.
One prior art technique, originated in the 1940's, that is designed to permit associative retrieval in mechanical type systems rather than in conjunction with computers, is sometimes referred to as "Zatocoding". A complete description of a Zatocoding system, including some of the background mathematics, is contained in British Pat. No. 681,902 issued to Calvin Mooers on Oct. 29, 1952. In the Zatocoding system, a file or collection consisting of a large number of individual records is first examined to determine what attributes of each record are significant for purposes of retrieval. For example, in a file including records which represent particular books in a library, useful attributes might be the author, title, publication date, subject matter, Dewey decimal classification, et cetera, of each work; in a file of telephone subscribers' directory listings, the attributes of each listing or record would likely comprise the subscriber's name, street number, street and town, among others.
Next, each attribute value (i.e., the name of the author of a book being catalogued, the title of the work, and so on) is assigned a code indicative of the attribute value, and all of the codes describing each of the attribute values associated with a record are combined to yield an overall code word denominated a "superimposed code word" for that record. These superimposed code words are then stored in an auxiliary file. In the Zatocoding system, this file is generated by selectively notching various edge positions in a record medium or card; correspondence between the cards and the records they represent is maintained simply by writing an appropriate notation on each card.
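By way of illustration only, the combining step described above may be sketched in Python. The sketch assumes that each attribute code is a set of notch positions and that the superimposed code word is held as an integer bit mask, one bit per edge position; the function name `superimpose` and the sample codes are hypothetical.

```python
def superimpose(attribute_codes):
    """Combine per-attribute codes into one superimposed code word.

    Each code is an iterable of notch positions (assumed representation).
    The result is an integer whose set bits are the notched positions.
    """
    word = 0
    for positions in attribute_codes:
        for p in positions:
            word |= 1 << p  # notching the same position twice has no further effect
    return word


# Hypothetical codes for two attribute values of a single record.
author_code = {2, 9, 17}
title_code = {4, 9, 30}

word = superimpose([author_code, title_code])
notched = sorted(p for p in range(word.bit_length()) if (word >> p) & 1)
```

Position 9 appears in both codes but is notched only once, so the resulting word carries five notches rather than six.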
When it is desired to retrieve or identify those records in the collection that have one or more particular or desired attribute values, a query code or match specification is generated using techniques similar to those stated above for encoding the records of the collection. The superimposed code words stored in the auxiliary file are then examined to determine which ones include, in a Boolean logic sense, the query code. In the case where notched cards are employed, this examination is accomplished by inserting long pins or needles through holes formed in the card edges, such that those cards that are notched in the particular positions specified by the query code are separated from those that are not. Since the superimposed code technique embodies random coding principles, to be discussed hereinafter, which only assure that the cards thus selected will include (but not be limited to) those which satisfy the match specification, the retrieval step is completed by conducting, in any well-known manner, a linear search to remove "false drops," i.e., records which do have codes corresponding to the match specification code but which include undesired attribute values, and to retain only the remaining cards corresponding to desired records of the collection.
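The two-stage retrieval just described, namely a Boolean inclusion test followed by a linear search that removes false drops, may be sketched in Python as follows. This is a minimal sketch under stated assumptions: code words are integer bit masks, and the code assignments, record identifiers and helper names (`CODES`, `RECORDS`, `AUX_FILE`, `code_word`, `retrieve`) are all hypothetical, chosen so that one false drop occurs.

```python
# Hypothetical code assignments: attribute value -> set of bit positions.
CODES = {
    "Smith": {1, 2},
    "Jones": {1, 5},
    "Brown": {3, 7},
    "optics": {3, 4},
    "radio": {2, 6},
}

# Hypothetical collection: record identifier -> attribute values.
RECORDS = {
    "A": {"Smith", "optics"},
    "B": {"Jones", "radio"},
    "C": {"Brown", "optics"},
}


def code_word(values):
    """OR together the codes of the given attribute values."""
    word = 0
    for value in values:
        for p in CODES[value]:
            word |= 1 << p
    return word


# Auxiliary file of superimposed code words, one per record.
AUX_FILE = {rid: code_word(values) for rid, values in RECORDS.items()}


def retrieve(desired):
    """Return (candidates, hits) for a set of desired attribute values."""
    query = code_word(desired)
    # Stage 1: keep records whose code word includes every bit of the query.
    candidates = [rid for rid, w in AUX_FILE.items() if (w & query) == query]
    # Stage 2: linear search over the candidates to discard false drops.
    hits = [rid for rid in candidates if desired <= RECORDS[rid]]
    return candidates, hits


candidates, hits = retrieve({"Smith"})
```

In this example record B is selected by the inclusion test because the codes for "Jones" and "radio" together cover positions 1 and 2, yet it does not have the desired attribute value; the second stage removes it, leaving only record A.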
While many of the features of the Zatocoding system, including the theory of superimposed coding, may be quite valuable in enabling associative retrieval, it nevertheless remains that the technique was generally oriented toward manual type storage systems and was never expanded so as to be useful in the environment of modern day computers. This then yields another object of the present invention, namely, the adaptation of hardware which permits one to utilize superimposed coding in conjunction with general purpose computers, instead of with needling or other mechanical apparatus.
Other difficulties with the presently known manner of using superimposed coding for associative retrieval will be illustrated by a brief discussion of the assignment of superimposed codes to the records they represent. In Zatocoding, a list of random numbers in a range between 0 and b is initially generated, and numbers from the list are assembled in groups of K numbers. A code dictionary containing a listing of the groups of numbers previously assigned to other attribute values is next consulted, manually, to determine if the attribute value has appeared previously in the collection. If so, the same code assignment is (and must be) retained; otherwise, the next available group of numbers is assigned as the code for the attribute value, and the assignment is entered in the dictionary for further use in succeeding code assignments. Finally, the codes for each of the attribute values of a given record are combined to form the superimposed code word by a process, again usually manually performed, which amounts to logically OR'ing together the numbers in each of the number groups, so that "duplicate" or overlapping numbers are eliminated. The foregoing procedure is also used in the generation of a match specification or query code needed to retrieve records having desired attribute values from the collection; the codes for each individual attribute value must, however, be located in the dictionary and then combined as set forth above.
Besides the fact that the aforedescribed coding operations are largely manual, the most severe deficiency of the technique is the need to maintain and refer to a code dictionary each and every time a new record is entered in the data base and each time a query or search is undertaken. If a dictionary entry is miscatalogued, misplaced or otherwise improperly filed, correct functioning is frustrated. Eventually, the code dictionary itself can become quite large and cumbersome, further limiting the usefulness of the technique. Accordingly, a further object of the present invention is to accomplish the generation of code words of the type described above in an automatic manner that does not require the maintenance of a code dictionary or other listing which must be manually accessed.
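The dictionary-based assignment described above may be sketched in Python as follows. The sketch is illustrative only and rests on several assumptions: the range between 0 and b is taken to include both endpoints, the random list is drawn from a seeded generator rather than a printed table, and the class and method names (`CodeDictionary`, `assign`, `lookup`) are hypothetical.

```python
import random


class CodeDictionary:
    """Assumed model of a code dictionary: attribute value -> group of K numbers."""

    def __init__(self, b, k, seed=0):
        self.b = b
        self.k = k
        self._rng = random.Random(seed)  # stands in for the list of random numbers
        self._entries = {}

    def assign(self, value):
        """Return the code for a value, entering a new group if it is unseen."""
        if value not in self._entries:
            group = tuple(self._rng.randint(0, self.b) for _ in range(self.k))
            self._entries[value] = group
        return self._entries[value]

    def lookup(self, value):
        """Return an existing code; raises KeyError if the entry is missing."""
        return self._entries[value]


def superimpose(groups):
    """Combine number groups; the set union drops duplicate numbers."""
    combined = set()
    for group in groups:
        combined |= set(group)
    return combined


dictionary = CodeDictionary(b=39, k=3)
record_word = superimpose(dictionary.assign(v) for v in ["Mooers", "coding"])
query_word = superimpose([dictionary.lookup("Mooers")])
```

Both the encoding of a record and the formation of a query pass through the same dictionary, so a query for a value whose entry is missing cannot be formed at all.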