In addition to performing extensive and repetitive computations, digital computers are often used in the management and retrieval of vast amounts of data. One particularly important task for computers is to identify records, such as documents, that contain words or terms (collectively referred to as "words") set forth in a query. A number of systems have been developed to accomplish this, some of them also responding to more sophisticated queries such as identification based on proximity of the query words within a record or the presence of some of the words and not others.
Typically, such query systems use an "inverted file" in which the distinct words contained in all of the records are arranged in a predetermined order, such as alphanumerically. Each word is associated with the identification of the records containing the word. If a system is to respond to queries regarding proximity of words to each other within a record, the inverted file may also identify the location of the word within each record. In processing a query, the system uses the inverted file to identify the records containing the words in the query and the locations of the words within those records. The system can then identify those records whose words satisfy the query.
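By way of illustration only, the inverted file arrangement described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any particular prior system: the function names (build_inverted_file, records_with_all, records_within), the whitespace tokenization, and the sample records are all hypothetical.

```python
from collections import defaultdict

def build_inverted_file(records):
    """Map each distinct word to {record_id: [positions of the word]}."""
    postings = defaultdict(dict)
    for record_id, text in records.items():
        # Assumption: words are separated by whitespace and compared case-insensitively.
        for position, word in enumerate(text.lower().split()):
            postings[word].setdefault(record_id, []).append(position)
    # Arrange the words in a predetermined (here alphanumeric) order.
    return dict(sorted(postings.items()))

def records_with_all(inverted, words):
    """Return the records containing every word in the query."""
    sets = [set(inverted.get(word, {})) for word in words]
    return set.intersection(*sets) if sets else set()

def records_within(inverted, first, second, max_distance):
    """Return the records in which the two words occur within max_distance positions."""
    hits = set()
    for record_id in records_with_all(inverted, [first, second]):
        a = inverted[first][record_id]
        b = inverted[second][record_id]
        if any(abs(i - j) <= max_distance for i in a for j in b):
            hits.add(record_id)
    return hits

# Hypothetical sample records, keyed by record identification.
records = {
    1: "parallel computers process data",
    2: "serial computers retrieve data slowly",
    3: "data retrieval on parallel hardware",
}
inverted = build_inverted_file(records)
```

In this sketch, records_with_all(inverted, ["parallel", "data"]) identifies records 1 and 3, while records_within(inverted, "computers", "data", 2) identifies records 1 and 2, since the stored locations are what allow the proximity test to be applied.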
As a refinement, query systems have also been developed which respond to queries based on weighting values associated with the words in the records and in the queries. The weighting values may reflect, for example, such factors as the importance of the associated words in the record and the query, as determined by an external source, the number of times the word appears in the record, and so forth. In responding to a query, such a query system may use the weights to generate a score for each record that would satisfy the query without regard to the weighting values, and identify those records with the highest scores or whose scores exceed a selected threshold value. Such query systems may also provide a ranking of the records which satisfy the query.
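By way of illustration only, weighted scoring of this kind can be sketched as follows. The sketch assumes that a record's score is the sum, over the query words it contains, of the query weight multiplied by the record weight; the passage above does not specify a scoring formula, so this formula, the function names (score_records, rank), and the sample weights are all hypothetical.

```python
def score_records(record_weights, query_weights):
    """Accumulate a score for each record containing at least one query word.

    record_weights: {word: {record_id: weight of the word in that record}}
    query_weights:  {word: weight of the word in the query}
    """
    scores = {}
    for word, q_weight in query_weights.items():
        for record_id, r_weight in record_weights.get(word, {}).items():
            # Assumed scoring rule: sum of (query weight * record weight).
            scores[record_id] = scores.get(record_id, 0.0) + q_weight * r_weight
    return scores

def rank(scores, threshold=0.0):
    """Keep records whose scores exceed the threshold, highest score first."""
    kept = [(record_id, s) for record_id, s in scores.items() if s > threshold]
    # Ties are broken by record identification so the ordering is deterministic.
    return sorted(kept, key=lambda item: (-item[1], item[0]))

# Hypothetical weights, e.g. the number of times each word appears in each record.
record_weights = {
    "parallel": {1: 2.0, 3: 1.0},
    "retrieval": {2: 1.0, 3: 3.0},
}
query_weights = {"parallel": 1.0, "retrieval": 2.0}

scores = score_records(record_weights, query_weights)
ranking = rank(scores, threshold=1.5)
```

With these sample weights, record 3 receives the highest score because it contains both query words, and raising the threshold to 2.0 would leave record 3 as the only record identified.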
Generally, the query systems described above have been implemented on conventional serial computer systems. Recently, massively parallel computer systems have been developed which can process large amounts of data in parallel. One such computer system is generally described in the aforementioned Hillis patents and Hillis, et al., patent application. In such a computer system, a host computer controls a processing array comprising a large number of processing elements, the host computer controlling the processing elements in unison. The processing elements can also transfer data to other processing elements using several data routers, and can also transfer data to external mass storage devices.
Query systems have been described for such massively parallel computers. In one query system, described in C. Stanfill, et al., "A Parallel Indexed Algorithm for Information Retrieval," Proceedings, ACM Conference on Research and Development in Information Retrieval, June, 1988, pp. 88-97, each processing element is associated with a word in each document. Since the words for each record are likely to be associated with several processing elements, generation of scores in response to a query may require transmission of score information for each record over the data router. This can lengthen processing time if the number of words and records is large.
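By way of illustration only, the communication cost noted above can be sketched with a small serial simulation. This is not the algorithm of the cited reference: the placement of word entries on processing elements, the choice of a "home" element that accumulates each record's score, and the function name route_partial_scores are all assumptions made for the sketch.

```python
def route_partial_scores(postings, query_weights, num_elements):
    """Simulate score accumulation and count transfers over the data router.

    postings: list of (word, record_id, weight) entries; entry k is assumed
              to reside on processing element k % num_elements.
    """
    scores = {}
    messages = 0
    for slot, (word, record_id, weight) in enumerate(postings):
        if word not in query_weights:
            continue  # this element holds no query word and sends nothing
        source = slot % num_elements     # element holding this word entry (assumed layout)
        home = record_id % num_elements  # element accumulating the record's score (assumed)
        if source != home:
            messages += 1                # the partial score must cross the data router
        scores[record_id] = scores.get(record_id, 0.0) + query_weights[word] * weight
    return scores, messages

# Hypothetical word entries scattered over four processing elements.
postings = [
    ("parallel", 1, 2.0),
    ("retrieval", 2, 1.0),
    ("parallel", 3, 1.0),
    ("retrieval", 3, 3.0),
    ("data", 1, 1.0),
]
routed_scores, router_messages = route_partial_scores(
    postings, {"parallel": 1.0, "retrieval": 2.0}, num_elements=4
)
```

In this sketch, three of the four matching word entries reside on an element other than the one accumulating the record's score, so three partial scores are sent over the router. The number of such transfers grows with the number of matching word entries, which is the source of the delay described above.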