This application relates to deriving chemical structural information.
A chemical substance is commonly represented in textual form (“name”) or graphical form (“structure”), each of which has its own advantages. For example, a name such as “benzene” is well-suited for use in a conversational or written statement such as “The object was immersed in 100% benzene.” Benzene can also be represented by a structure (FIG. 1), which shows that a benzene molecule is highly symmetric: six carbon atoms sit at the corners of a regular hexagon, and six hydrogen atoms extend a fixed distance outward from the respective corners.
A chemical substance can have multiple chemical names. For example, benzene is also known as “benzol”, “cyclohexatriene”, “1,3,5-cyclohexatriene”, “cyclohexa-1,3,5-triene”, “[6]annulene”, and “1-carbapyridine”. Some names are sanctioned by at least one of three major organizations that have developed chemical nomenclature systems: the International Union of Pure and Applied Chemistry (“IUPAC”), the International Union of Biochemistry and Molecular Biology (“IUBMB”), and the Chemical Abstracts Service (“CAS”), a division of the American Chemical Society (“ACS”). These organizations often disagree about the preferred name for a substance, and each organization's recommendations tend to be complex and have changed over time. In many instances, chemists produce or use chemically correct names that differ from the “sanctioned” names. Unintentional errors, such as typographical errors, are also common.
Chemical names are commonly found in one of two general forms, known as “normal” (e.g., “O-acetylsalicylic acid”) and “inverted” (e.g., “salicylic acid, O-acetyl-”). Each form has its uses. The normal form follows regular English writing style, is read from left to right, and is appropriate for use in prose. The inverted form places the main chemical feature of the substance (the parent, here “salicylic acid”) first and is particularly well suited for indexing, because sorting inverted names alphabetically groups substances of similar chemistry together. Many chemical names are available only in inverted form.
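As a rough illustration of the two forms, the following Python sketch converts a simple inverted name of the pattern “parent, prefix-” back to normal form, and sorts a few inverted names to show how they group by parent. The function `uninvert` and its rules are hypothetical simplifications, not a complete nomenclature converter: it handles only a single substituent-prefix modifier, keeps the hyphen when the parent begins with a locant (as in “1-propanol”), and returns any other pattern (such as a CAS-style “, methyl ester” modifier) unchanged.

```python
def uninvert(inverted: str) -> str:
    """Convert a simple inverted name ("parent, prefix-") to normal form.

    Hypothetical simplification: only a single substituent-prefix modifier
    is handled; names with further modifiers are returned unchanged.
    """
    # Inverted names separate parent and modifiers with ", "; locant commas
    # such as "2,4-" carry no space, so they are not split here.
    parts = inverted.split(", ")
    if len(parts) != 2 or not parts[1].endswith("-"):
        return inverted  # not the simple "parent, prefix-" pattern
    parent, prefix = parts
    # A parent that starts with a locant (e.g. "1-propanol") keeps the hyphen.
    if parent[:1].isdigit():
        return prefix + parent
    # Otherwise drop the trailing hyphen and join directly.
    return prefix[:-1] + parent


inverted_names = [
    "salicylic acid, O-acetyl-",
    "benzoic acid, 4-amino-",
    "1-propanol, 2-methyl-",
    "benzoic acid, 2-hydroxy-",
]

# Sorting inverted names alphabetically groups substances by parent.
for name in sorted(inverted_names):
    print(name, "->", uninvert(name))
```

In the sorted listing the two benzoic acid derivatives fall next to each other. Note that “benzoic acid, 2-hydroxy-” is another name for salicylic acid itself, so even a sorted index can file one substance under two different parents.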
The abundance of different names for the same chemical substance can create confusion and uncertainty when one chemist attempts to understand a written document produced by another chemist. Chemical structures, by contrast, tend to cause less confusion and uncertainty, because a structure depicts the atoms and bonds of the substance directly rather than encoding them in a naming convention.