Automated information retrieval systems have been operating for several decades. Information retrieval can generally be thought of as a question-and-answer process in which users query an information source of objects (generally sets of documents or journals) seeking relevant information of quality. Most operating retrieval systems, both government and commercial, are based on the so-called Boolean strategy. The term Boolean denotes that queries are represented by propositions in an algebra of propositions, and answers by subsets of objects of the information source, both of which constitute Boolean algebras. As a consequence, to every query there corresponds a unique subset of objects.
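The correspondence between queries and answer subsets can be illustrated with a minimal sketch. The collection, the index terms, and the function names below are hypothetical and chosen only for illustration; they do not describe any particular operating system. Atomic queries map to sets of documents, and the connectives of the propositional algebra map to set intersection, union, and complement.

```python
# Hypothetical toy information source: document id -> set of index terms.
DOCS = {
    "d1": {"retrieval", "boolean", "logic"},
    "d2": {"retrieval", "citation"},
    "d3": {"citation", "journal", "quality"},
}

ALL_DOCS = frozenset(DOCS)


def term(t):
    """Answer set of an atomic query: every document indexed by term t."""
    return frozenset(d for d, terms in DOCS.items() if t in terms)


def AND(a, b):
    """Conjunction of queries corresponds to intersection of answer sets."""
    return a & b


def OR(a, b):
    """Disjunction of queries corresponds to union of answer sets."""
    return a | b


def NOT(a):
    """Negation of a query corresponds to complement within the source."""
    return ALL_DOCS - a


# Query: "retrieval AND NOT citation".
answer = AND(term("retrieval"), NOT(term("citation")))
print(sorted(answer))
```

Each document is either in the answer set or outside it, and no ranking among the retrieved documents is produced. This is the two-valued behavior of the strict Boolean strategy.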
Although the Boolean strategy is very attractive due to its simplicity, it is quite ineffective. Most of the major defects in this approach result from the fact that Boolean logic countenances fallacies of relevance due to the notion of strict implication. The more recent attempts at applying "fuzzy logic" or weighting procedures do not materially alter the situation. Moreover, as far as is known, no operating information retrieval system attacks the problem of the quality of the information retrieved. One of the many serious consequences of these approaches is that the information conveyed by any object in the source is treated as independent of the information conveyed by any other object, which is clearly absurd. Moreover, in the case of the strict Boolean strategy, objects must be either completely relevant or completely non-relevant because of the two-valued nature of Boolean logic.
The exponential increase in the costs of library and reference publications, together with the continuing exponential growth of scientific information, has led to the need for small yet effective and affordable library collections. Thus libraries and other institutions using scientific information cannot sustain a policy of buying and maintaining large and expensive collections of publications in the hope of capturing the useful material.
At present there are three general methods for selecting core collections of journals from an overall collection. These are: (1) expert consensus; (2) use studies; and (3) impact factor. However, all of these approaches have serious defects. Expert consensus is extremely subjective, time-consuming and difficult to implement, since the most qualified experts are loath to participate in such an enterprise. Use studies are very costly and time-consuming. They are also difficult to update and are clearly local in character. Selection by impact factor also has problems. The impact factor of a journal is defined as the average number of citations per published item over a previous two-year period. These data are available in the Journal Citation Reports (JCR), published annually by the Institute for Scientific Information, which lists raw citation data from over 4,000 scientific publications covering all major subject areas of science. The impact factor is supposed to constitute a measure of quality and to correct for the various sizes of scientific publications. However, the impact factor is not a measure in the mathematical sense, since it does not have the additive property required of a measure. Moreover, its claim of representing quality is also suspect, since all citations are treated equally. That is, citation by a high-quality journal counts as much as citation by a journal of poor quality.
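The failure of additivity can be seen in a small numerical sketch. The citation and item counts below are invented purely for illustration and are not taken from the JCR; the function name is likewise hypothetical. A measure would assign to the union of two disjoint sets of published items the sum of the values assigned to each, whereas the impact factor, being an average, does not.

```python
def impact_factor(citations, items):
    """Average number of citations per published item over the
    two-year window (both counts refer to that window)."""
    return citations / items


# Invented counts for two journals with no published items in common.
a_citations, a_items = 100, 50
b_citations, b_items = 30, 30

if_a = impact_factor(a_citations, a_items)
if_b = impact_factor(b_citations, b_items)

# Treat the two journals as a single combined collection.
if_combined = impact_factor(a_citations + b_citations, a_items + b_items)

print(if_a, if_b, if_combined)

# Additivity would require if_combined == if_a + if_b.
print(if_combined == if_a + if_b)
```

The combined value falls between the two individual values rather than equaling their sum. The sketch also makes the second objection visible: every citation contributes the same amount to the numerator, whatever the quality of the citing journal.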
Selection of core collections of journals might be treated as an information retrieval problem in which journals are to be selected on the basis of quality and relevance.