Most existing data storage systems, for example magnetic discs, drums and tapes, and magnetic bubble, charge-coupled and Josephson devices, are addressed by a numerical location address. However, where such stores contain text, it may be much more convenient to search them by content. The advantages are that hashing algorithms are avoided and that an arbitrary search key can be used.
In the past, interrogation of large data stores has been controlled by a central processing unit. Although it is possible to interrogate the data base with search keys using appropriate computer programs, this has not hitherto been very convenient for the searcher, especially if the searcher cannot specify the exact key word. If the data base has to be searched several times to find a match to one of a number of slightly varying key words, this can add appreciably to the search time.
Suppose, for example, that a searcher wishes to search a data base for articles relating to colour cathode ray tubes. The searcher would need to specify "color" as well as "colour" to ensure that articles using the spelling "color" were also found. This is a rather simple example, but it illustrates the problem that arises with more complex search keys.
These problems of retrieving data are reviewed in the article "Retrieving Information" by V. A. J. Maller in Datamation, September 1980, pages 164 to 172. In particular, this article describes the so-called Content Addressable File Store (CAFS), in which indexed files can be interrogated by an associative search unit. British patent Nos. 1,491,707, 1,492,260, 1,497,676, 1,497,677, 1,497,678 and 1,497,679 describe various aspects of the CAFS. Although the article and the patents refer to fuzzy matching, the term there denotes matching data between arithmetic limits rather than matching inexact words.
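The two senses of "fuzzy" can be contrasted with a short Python sketch. The arithmetic-limit test follows the description above: a field matches if its value lies between a lower and an upper limit. The inexact-word test uses Levenshtein edit distance with a one-error tolerance. This technique and threshold are assumptions chosen only for illustration and are not taken from CAFS, the cited patents or the Maller article. All function names are hypothetical.

```python
# Illustrative sketch only: these functions are hypothetical and are not the
# method of CAFS, of the cited patents, or of the Maller article.

def within_limits(value: float, low: float, high: float) -> bool:
    """Arithmetic-limit 'fuzzy' match: true if value lies in [low, high]."""
    return low <= value <= high


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute if different
        prev = curr
    return prev[-1]


def inexact_word_match(key: str, word: str, max_errors: int = 1) -> bool:
    """Inexact word match (assumed rule): accept words within max_errors edits."""
    return edit_distance(key.lower(), word.lower()) <= max_errors


if __name__ == "__main__":
    words = ["colour", "color", "collar", "cooler"]
    # A single inexact key retrieves both spellings in one pass.
    print([w for w in words if inexact_word_match("colour", w)])
    # prints ['colour', 'color']
    print(within_limits(42.0, 40.0, 45.0))  # prints True
```

With a one-edit tolerance, the single key "colour" also retrieves "color", so the two-spelling search in the colour cathode ray tube example becomes one pass. A looser tolerance would start to admit unrelated words such as "collar", which is two edits away.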