Most drug discovery programs follow the "empirical" drug development approach, in which large numbers of substances are screened for activity against a panel of assays that target a therapeutic group. Including natural products in this approach is problematic because of two issues--dereplication and recollection. Dereplication is important in a program where the number of assays is small and the number of natural product source organisms is large; the goal is to avoid processing the same organism (from the same site) twice or rediscovering known compounds from new sources. Collecting the same organism from different geographical regions, however, is appropriate because the same organism can produce different secondary metabolites in different habitats. Integrating chemistry and biological activity data is also important when applying the "empirical" approach to define models believed to have predictive value. These technical issues are often the effective core of collection guidance in an acquisition program.
Dereplication and recollection of source organisms must be addressed in a high throughput acquisition program. Both issues require comprehensive information management. They are related because they call for a similar technical approach--both have a spatial feature. Expedition planning requires information on past collections, to know what was collected where. Recollection is a less complicated issue; however, recollecting from the same site, or from an identical one, is critical for scale-up.
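The spatial feature shared by dereplication and recollection can be illustrated with a minimal sketch. It is not the method of the invention; the record layout, the function names, and the 5 km radius are assumptions chosen only for illustration. A candidate collection is treated as a duplicate when the same taxon was already collected within the radius, and the same lookup returns the prior sites to revisit for recollection.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class Collection:
    """One past collection record (hypothetical layout)."""
    taxon: str
    lat: float  # decimal degrees
    lon: float  # decimal degrees


def distance_km(a_lat, a_lon, b_lat, b_lon):
    """Great-circle distance between two points (haversine formula)."""
    dlat = radians(b_lat - a_lat)
    dlon = radians(b_lon - a_lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a_lat)) * cos(radians(b_lat)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def prior_collections_nearby(past, taxon, lat, lon, radius_km=5.0):
    """Past records of the same taxon within radius_km of the given site.

    radius_km is an assumed threshold for "the same site".
    """
    return [c for c in past
            if c.taxon == taxon
            and distance_km(c.lat, c.lon, lat, lon) <= radius_km]


def is_duplicate(past, taxon, lat, lon, radius_km=5.0):
    """True when collecting this taxon here would repeat a past collection."""
    return bool(prior_collections_nearby(past, taxon, lat, lon, radius_km))


# Hypothetical data: one earlier collection of "sponge sp. A".
past = [Collection("sponge sp. A", 24.55, -81.80)]
```

With this data, `is_duplicate(past, "sponge sp. A", 24.56, -81.80)` flags a site roughly 1 km from the earlier collection. The same taxon at a distant site, or a different taxon at the same site, is not flagged, which matches the point that collecting one organism from different regions remains appropriate.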
Dereplication by chemotaxonomy is also useful to natural products investigators. The identification of "nuisance" compounds, those which show positive results in a bioassay but are not considered potential drugs, is critical in drug discovery programs that use bioassay-guided schemes. Examples include detergents, salts, and chemical classes known not to be of interest but without defined structures, such as polysaccharides in anti-HIV assays and polyphenolics in anti-viral assays. Rapid identification of known compounds that have been previously tested and are no longer of interest also increases the efficiency of bioassay-guided isolations. For example, the discovery of the inhibitory effect of quercetin against a protein-tyrosine kinase (PTK) triggered a thorough study of the activity of flavonoids as PTK inhibitors. If new compounds with PTK inhibitory activity are sought that are not flavonoids, an effort must be made to dereplicate the flavonoid substructure.
Field identifications are an invaluable tool in dereplication by chemotaxonomy. Easily accessible on-site photographs, organism descriptions, and distribution information are critical to field identifications. In some cases identification only to the family or order level can be helpful to a researcher. For example, a chemist working with a sponge that was identified in the field as belonging to the order Verongida should expect to find bromotyrosine derivatives.
These information management requirements, however, can become assets that are used to guide the collection effort and enhance the probability of success. Screening natural products extracts does not necessarily require a random approach. Examples of structure/activity relationships are beginning to emerge for a variety of biological assays. Published information on marine natural products chemistry and related biological activity can guide a collection effort to target, but not duplicate collections from, organism groups with known properties. This targeting relates to the efforts of Shaman Pharmaceuticals, a drug discovery company that collects plant samples based on confirmed ethnobotanical features. Clearly, a contemporary high throughput drug development program requires advanced capabilities for information handling.
The global Internet has changed fundamental aspects of the way scientists work. Electronic mail and a variety of other data exchanges provide researchers with extended capabilities that set the tone and pace of many collaborative investigations. On-line services provide ready access to scientific journals and specialty databases. "The Internet is one of the absolutely critical tools for modern biology. Biological research will become increasingly dominated by the exchange of large amounts of information and by cooperative work on large amounts of information. And the only way to support that is through the networks" (Fields, 1994). Internet-based multimedia technologies are again extending the capability for scientific collaboration.
The Internet provides a unique forum for dissemination of biological information. However, bulletin board services (BBS) and workgroups cover such a wide spectrum that it is difficult to filter out a subset of appropriate information. Many users become frustrated not only by the amount of electronic mail they have to sort, but also by the number of non-refereed information sources that they must consider. Database projects on a broader scale are fragmented, and their coordination is an immense task. Indeed, the National Biological Information Infrastructure (NBII), currently under development by the National Biological Survey, is a federal-level project that seeks only to identify biological information and route access to it.
The present invention, known as the Natural Products Information System (NAPIS), provides an effective solution to the extended requirements for screening natural products. Combining dereplication and recollection with expedition planning turns the requirements for information handling into assets that enhance the probability of success. NAPIS technology is appropriate for natural products drug discovery efforts at the large-scale level, the university laboratory level, and the independent collector level. Biodiversity inventory projects, taxonomists, and environmental conservation groups will also find NAPIS features that support their efforts.