Entering chemical structures into a computer is generally a difficult task because conventional coding procedures are not geared to handle graphic representations. The advent of microprocessors with menu-driven graphic screens provided more convenient coding procedures, but these too were relatively slow. An improvement was achieved by reducing the number of keystrokes needed to enter two-dimensional figures, such as chemical structures, through a computer keyboard. This was done by utilizing contextual relationships between the character being typed at a specific location and the characters surrounding that location to predict the next character to be typed and/or its location. This method is described in U.S. Pat. No. 4,476,462, issued Oct. 9, 1984, to the applicant herein, the entire content of which is incorporated herein by reference. An improvement in that method is described in applicant's U.S. patent application Ser. No. 863,981, filed May 16, 1986, the entire content of which application is incorporated herein by reference.
In order to permit processing by computer, chemical structures must be converted into a suitable form, usually connectivity matrices. In this respect chemical structures are not different from other types of data, for example text, which is transposed character by character into corresponding codes.
However, this is not the only transformation undergone by chemical structures. During data capture they may become distorted. Although the input operator keys into the computer what he or she perceives to be a faithful copy, that copy appears on the screen in a distorted form. This distortion is created by requirements such as that all bonds fit within multiples of character spaces and that Luhn dots represent the carbon atoms in rings. This type of distortion is not unique to chemical structures. "Shorthands" developed for taking dictation are, in effect, transformations undertaken to achieve speed so that the operation may be carried out in "real time".
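The effect of forcing bonds onto character spaces can be illustrated with a minimal sketch. It assumes, purely for illustration, that each atom position is rounded to the nearest whole character cell; the function names `snap_to_grid` and `bond_lengths` are hypothetical and are not taken from the referenced patents.

```python
import math

def snap_to_grid(points):
    """Round each (x, y) coordinate to the nearest whole character cell."""
    return [(round(x), round(y)) for x, y in points]

def bond_lengths(points):
    """Lengths of the ring bonds joining consecutive points, closing the ring."""
    n = len(points)
    return [math.dist(points[i], points[(i + 1) % n]) for i in range(n)]

# A regular hexagon (six-membered ring) with bond length 2,
# measured in units of one character cell. Assumed example data.
hexagon = [
    (2 * math.cos(math.radians(60 * k)), 2 * math.sin(math.radians(60 * k)))
    for k in range(6)
]

snapped = snap_to_grid(hexagon)

print("drawn  :", [round(length, 3) for length in bond_lengths(hexagon)])
print("snapped:", [round(length, 3) for length in bond_lengths(snapped)])
```

In this sketch the six bonds of the drawn ring are all of equal length, whereas after snapping the ring contains bonds of two different lengths. The connectivity is unchanged, but the appearance is distorted.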
If chemical structures arrive in a computer in a distorted form then, of necessity, another transformation is required, namely the transformation that restores the original appearance, approximates it, or perhaps even improves on it for output such as printing.
Both the connectivity matrix and the distorted input structure are available as starting points for generating an output structure. The connectivity matrix, although used by the computer for searching and recognizing structures, is not suitable for output. It constitutes a list or inventory of the atoms of a chemical structure and of the bonds connecting them. If printed as an output it is virtually incomprehensible to a chemist.
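As a hedged illustration of why a connectivity matrix is unsuitable as output, the following sketch builds one for the three non-hydrogen atoms of ethanol. The representation chosen here (a symmetric matrix of bond orders) is one common form and is an assumption; the function name `connectivity_matrix` and the example data are hypothetical.

```python
# Assumed example: the non-hydrogen atoms of ethanol, CH3-CH2-OH.
atoms = ["C", "C", "O"]
bonds = [(0, 1, 1), (1, 2, 1)]  # (atom index, atom index, bond order)

def connectivity_matrix(n_atoms, bonds):
    """Build a symmetric matrix whose entry [i][j] is the bond order
    between atoms i and j, or 0 if they are not bonded."""
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j, order in bonds:
        matrix[i][j] = order
        matrix[j][i] = order
    return matrix

matrix = connectivity_matrix(len(atoms), bonds)

# Print one row per atom: the atom symbol followed by its bonds.
for symbol, row in zip(atoms, matrix):
    print(symbol, row)
```

The printed rows record which atoms are bonded to which, but carry no coordinates, angles, or orientation, so nothing in them indicates how the structure should be drawn.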
Computer programs exist that can generate graphic chemical structures from the connectivity matrix (see, for example, R. E. Carhart, JCICS, 1976, pp. 82-88). In fact, not one but a large number of different-looking structures can be obtained from such a matrix. All are chemically correct, because they are all two-dimensional topological projections of a three-dimensional molecule. Of the many projections that can thus be obtained, some resemble each other but others may look quite different, so that even an experienced chemist can be misled. It is for this reason that the conventional appearance of the two-dimensional representation of a structure is important. Of the many possible depictions, the one of greatest significance is the one that has become the de facto standard with which chemists are familiar.
The information necessary to represent this conventional appearance is not present in the connectivity matrix. A number of computer programs have been created to supply this information from rules derived from actual experience (see, for example, C. A. Shelley, JCICS, 1983, pp. 61-65). However, experience is not easily reducible to rules, and a substantial portion of the output generated by such programs fails to achieve a conventional appearance.