Chemists and materials scientists advance their fields by using chemical building blocks in new ways. Access to catalogs of hundreds of thousands of candidate compounds enables the creation of these new products. One such rapidly growing catalog consists of molecules that are found in biological systems and that can be made accessible through synthetic biology. Many of these compounds would be extremely difficult and expensive to synthesize and purify using the classical techniques of synthetic chemistry. However, poor search tools limit the usefulness of these growing biological repositories: they prevent scientists from rapidly identifying the building blocks of greatest utility, including the chemicals of biological origin contained in these newer repositories.
For example, the natural compound class of terpenes, thought to contain over 50,000 members, is practically impossible to search. No commercially available search tool can start from this class of compounds, accept a search statement that targets compounds containing multiple substructures of interest (and excludes other substructures), and then return the compounds that meet those criteria. Instead, conventional commercial implementations reduce the tens of thousands of candidates to a few hundred using whichever selection criteria are easiest to apply (e.g., molecular weight). Each remaining candidate is then evaluated and sorted manually. Because this step requires substantial effort, even a minor change to the search or sort criteria carries an enormous cost. The result is lost time and lost opportunity: the best candidates may be missed, while many inappropriate candidates are offered up instead.
To make optimal use of these new compound collections, search tools are needed that enable scientists to easily construct queries specifying complex Boolean combinations of chemical substructures in a human-readable manner.