Text documents in the field of chemistry, such as patent applications, research reports, and other investigations, typically refer to chemicals using generic chemical formulas. These generic formulas commonly serve as stand-ins for a multiplicity of actual chemical formulas that are encompassed by the generic formula. For example, a text document might reference a substituent, moiety, or other generic placeholder that is shorthand for a number of possible atoms or molecules (e.g., a methyl, ethyl, or aryl group) that make up a particular formula.
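By way of illustration only, the relationship between a generic formula and the specific formulas it encompasses can be sketched as a simple enumeration. The function name `expand_generic`, the placeholder labels `R1` and `R2`, and the substituent lists below are hypothetical and chosen for the example; they are not taken from any particular document or system.

```python
from itertools import product


def expand_generic(template, substituents):
    """Yield every specific formula obtained by filling each placeholder.

    template     -- a string with named placeholders, e.g. "{R1}-O-{R2}"
    substituents -- maps each placeholder name to its candidate groups
    """
    names = list(substituents)
    # Take the Cartesian product of the candidate groups, one per placeholder.
    for combo in product(*(substituents[name] for name in names)):
        yield template.format(**dict(zip(names, combo)))


# Hypothetical generic ether R1-O-R2 with illustrative substituent lists.
specific = list(
    expand_generic(
        "{R1}-O-{R2}",
        {"R1": ["CH3", "C2H5"], "R2": ["CH3", "C6H5"]},
    )
)
print(specific)
```

Even this small example yields four specific formulas from a single generic one, which suggests how quickly the number of encompassed formulas can grow as placeholders and candidate groups are added.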
Similarly, in other technical fields, a convention or nomenclature is sometimes used to represent a set of related subject matter, such as nucleotide sequences, amino acid sequences, and so on. For instance, adenine and thymine can be represented in a nucleotide sequence with a generic substitute, x.
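The same idea can be sketched for sequence data. The following minimal example assumes, as in the instance above, that a generic symbol x stands for either adenine (A) or thymine (T); the function name `expand_sequence` and the sample sequence are hypothetical.

```python
from itertools import product


def expand_sequence(generic, alternatives):
    """Return every specific sequence represented by a generic sequence.

    generic      -- a sequence string that may contain generic symbols
    alternatives -- maps each generic symbol to the letters it stands for
    """
    # A symbol without an entry in `alternatives` stands only for itself.
    pools = [alternatives.get(symbol, symbol) for symbol in generic]
    return ["".join(letters) for letters in product(*pools)]


# Assumed convention for this sketch: x represents adenine (A) or thymine (T).
sequences = expand_sequence("GxC", {"x": "AT"})
print(sequences)
```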
Understanding the scope of the subject matter covered by, say, a generic formula or generalized representation enables researchers, patent attorneys, and business persons to identify areas of further study, potential business strategies, and intellectual property disputes. More particularly, identifying licensees, research partners, or infringements can be challenging when generic chemical names or other generic representations are used in patents and other documents, because the generic terminology can mask whether there is an overlap or opportunity relating to, say, a particular chemical formula or a particular nucleotide sequence that is of interest.
A number of analytical techniques allow a database to be searched for specific structural formulas or specific representational forms that are represented by a generic chemical formula or generalized representation. However, these solutions are usually time-consuming and require manually identifying or, in the case of chemical formulas, drawing the structural formula. They also fall short of providing an environment in which overlapping subject matter between documents can be identified.
Therefore, what is needed is a system and method that provides a mechanism for generating, from generic chemical identifiers, one or more sets of queries representing the generic formula and for searching a chemical database for specific compounds that match the queries, in order to identify those entries in the database having overlapping subject matter with the generic chemical formula. Likewise, what is needed is a system and method that provides a mechanism for generating, from generic representational data, one or more queries representing the generic representational data and for searching a database of representational data to identify those entries in the database that have overlapping subject matter with the generic representational data. What is further needed in the art is a system and method that can organize the identified documents in a virtual presentation in support of analytics upon the documents to reveal opportunities. The present invention addresses these and other needs.