As the amount of published information pertaining to science, business, reference and the like has grown, there has arisen a concomitant requirement to address the space constraints associated with storing the great bulk of paper materials generated. Additionally, selective retrieval of documents and articles from these collections has become elaborate. As an example of the significance of storage requirements, a typical 10 year collection of merely one scientific journal may amount to 30,000 pages. Thus, conventional binding and shelf storage of significant numbers of such publications have become expensive for a library facility. To at least lessen the storage requirements, resort has been made to providing microfiche copies of the materials. However, users of such storage media find the approach somewhat unsatisfactory, particularly with respect to the information search and hard copy aspects.
Investigators have considered and implemented the storage of such information within the magnetic media of principal computer installations in conjunction with on-line communication with satellite or subscriber terminals provided, for example, as the ubiquitous personal computer (PC). Such systems offer advantages in terms of efficient information storage, as well as in providing searching techniques which may be interactive with the user or library patron. However, such advantages are somewhat offset by the costs of communication between host and satellite terminal, as well as the extended time factor often associated with host-to-PC interactive communication of graphical data, and the less than satisfactory visual quality of the retrieved graphical output.
Over the recent past, optical disk technology has introduced the CD ROM, a compact device capable of carrying a database of approximately 600 megabytes of digital data. Accordingly, one such device is capable of retaining the equivalent of a decade-long collection of full text and graphics for a significantly sized scientific journal or the like. Further, because of the reasonable production costs involved, once a master is produced, the CD ROMs can be relatively widely distributed for interactive employment by library patrons or users in the field. High communication costs are avoided, as are the excessive delays occasioned in accessing data through communications links, while high resolution output is provided on a wide variety of screens and printers.
To effectively implement such high density local memory devices for retaining these reference materials, however, a practical technique or system is required for creating the master from which they are formed. A practical commercialization of such technology requires that the output of complex typesetting systems used by publishers can be algorithmically translated into a standard format with a minimum of human intervention. Because such publications conventionally incorporate graphics, tabulations and, very often, chemical and mathematical symbols and equations, practical techniques also are required for effectively combining such materials with text in a manner wherein page printouts will closely reflect the quality of the original publication. Finally, an effective indexing and searching facility is required of the master structure to permit the local user or patron to access the data retained in the CD ROM in an effective and efficient interactive manner. For example, Boolean retrieval techniques and searches conducted by the occurrence of words and strings within specified fields and paragraphs are desirable. The outputs at the local terminal must be adequately presentable both on the screen of a conventional PC monitor and through a conventional, reasonably priced printer. Thus, searching can be carried out at this local situs and full text materials can be retrieved in printed form.
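By way of illustration only, the kind of Boolean, field-restricted word search described above can be sketched with a simple inverted index. The sketch below is an assumption-laden example and not the indexing structure of any particular system: the field names ("title", "abstract"), the sample records, and the helper names tokenize, build_index and lookup are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical sample records; each document is a mapping of field name to text.
DOCUMENTS = {
    1: {"title": "Optical disk storage",
        "abstract": "A CD ROM holds text and graphics."},
    2: {"title": "Magnetic media archives",
        "abstract": "Host systems store text on magnetic disk."},
    3: {"title": "Microfiche retrieval",
        "abstract": "Optical readers display microfiche pages."},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each (field, word) pair to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, fields in documents.items():
        for field, text in fields.items():
            for word in tokenize(text):
                index[(field, word)].add(doc_id)
    return index


def lookup(index, field, word):
    """Return a copy of the ids of documents whose given field contains the word."""
    return set(index.get((field, word.lower()), set()))


INDEX = build_index(DOCUMENTS)

# Boolean operators map directly onto set operations over the posting sets.
both = lookup(INDEX, "abstract", "text") & lookup(INDEX, "abstract", "disk")          # AND
either = lookup(INDEX, "title", "optical") | lookup(INDEX, "abstract", "optical")     # OR
without = lookup(INDEX, "abstract", "text") - lookup(INDEX, "abstract", "magnetic")   # AND NOT

print(sorted(both), sorted(either), sorted(without))
```

Keying the index on the field as well as the word is what allows a search to be restricted to a specified field; a paragraph-level restriction could be handled the same way by treating each paragraph as its own field, which is likewise an assumption of this sketch.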