Traditional computerized reference searching for patent-related information is typically conducted by a user manually interfacing with a database or set of databases. The user or searcher typically utilizes keyword searching for targeted word or phrase results. The searcher can also include, for example, date restrictions, reference result category restrictions, or author, inventor, or owner restrictions to further limit returned result sets. In the context of patent references, prior art searching is likewise typically conducted manually using principally keyword searching and restriction by date, patent class or type, or inventor or assignee, for example.
Patent Examiners, professional patent searchers, other patent professionals, individual inventors, IP insurance underwriters, or corporate officers, for example, often utilize these manual searching techniques to conduct landscape, prior art, clearance, or any number of other reference searches. The results gathered by the searching process can provide a view into the state of the art for the keywords or phrases being searched. Relevant references can be subsequently or concurrently identified and analyzed manually by the searchers.
However, the searching process is often more art form than scientific process, and therefore depends heavily on the skills of the searcher. For example, after returning a results set based on a search string, the searcher often makes a judgment call on whether to follow a research thread belonging to a particular result. The searcher's judgment can be based on various aspects of the reference; for example, the specification, the claims, or the figures (in the patent reference context). For many references, multiple aspects of the reference need to be manually studied in order to make a judgment on the reference, and the instincts and experience of the searcher are critical in this process.
The research threads followed by searchers can include other references citing or cited by the particular reference, and other references similarly or tangentially related. It takes little imagination to appreciate how the various components of a reference, or slight differences in what the searcher identifies in it, may factor into a fruitful or non-fruitful search. The capabilities of one searcher are often fundamentally different from those of another searcher. Further, this method of manually filtering potential research threads is often an acquired skill that involves instinct rather than a sequential set of pre-defined steps that can be followed by rote. Searchers can learn from previous mistakes, but this is a costly proposition for searchers (and those funding the searches) trying to climb the searching learning curve.
This problem of keyword searching and judgment-based research thread analysis is further compounded by the nature of language. Different reference authors may use different words or phrases for the same idea or topic. Creativity must therefore be invoked to successfully navigate any particular field, by using synonyms, slang, or other variations of any set of search terms. Many searchers lack this necessary skill. Again, searchers can learn from previous mistakes in language variations, but this is also costly. Because the searching process often invokes the aforementioned creative and learned skills, as well as instinct and intuition, the quality and efficiency of manual searching can vary wildly.
In addition, some searches mirror a “tree” structure that targets, for example, a primary reference or set of primary references, a set of secondary references identified by bibliographic or citation listing from each of the primary reference(s), and a set of tertiary references identified by bibliographic or citation listing from each of the secondary reference(s). In practice, by the tertiary depth, the number of references is unmanageable to review on a manual basis. Moreover, the number of references to manage compounds at every depth. This problem is likewise present for tree searches of generations of backward citations. For the reasons expressed above, the number of potential research threads that can be followed is essentially unlimited. Searches are therefore often bounded by budgets and not by any relevant substantive criteria. Existing automated keyword searching likewise cannot explore every potential research thread. The manual gathering of references can be, at best, tedious, and is often unmanageable. It is therefore desirable to effectively automate reference searching and, further, to rank the relevance of individual references within the results set.
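The compounding described above can be illustrated with a minimal sketch. If each reference cites roughly b others, a search to depth d can reach on the order of b^d references. The citation graph, the reference labels, and the function name `expand_citations` below are hypothetical and chosen only for illustration; they do not come from any particular database or system.

```python
def expand_citations(citations, primaries, max_depth):
    """Walk a citation graph breadth-first from the primary references.

    Returns a dict mapping each depth to the set of references first
    reached at that depth (depth 0 holds the primary references).
    """
    seen = set(primaries)
    levels = {0: set(primaries)}
    frontier = set(primaries)
    for depth in range(1, max_depth + 1):
        next_frontier = set()
        for ref in frontier:
            for cited in citations.get(ref, ()):
                if cited not in seen:  # count each reference only once
                    seen.add(cited)
                    next_frontier.add(cited)
        levels[depth] = next_frontier
        frontier = next_frontier
    return levels


# Hypothetical citation listings: one primary (P1), three secondary
# references (S1-S3), and the tertiary references (T1-T8) they cite.
citations = {
    "P1": ["S1", "S2", "S3"],
    "S1": ["T1", "T2", "T3"],
    "S2": ["T3", "T4", "T5"],
    "S3": ["T6", "T7", "T8"],
}

levels = expand_citations(citations, ["P1"], max_depth=2)
counts = {depth: len(refs) for depth, refs in levels.items()}
print(counts)  # {0: 1, 1: 3, 2: 8}
```

Even in this toy graph with only three citations per reference, the tertiary level is already eight times the size of the primary level; realistic citation counts grow far faster.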
U.S. Patent Publication No. 2011/0289040, entitled “Method for Creating Associating Index for the Analysis of Documents Classified in a Hierarchical Structure,” offers one example of a method of improving the efficacy of a patent or a portfolio of patents based on utilization of a tree-like, hierarchical structure, for example, that of the International Patent Classification System (IPC). Subjective data, such as the decision to pursue litigation based on the subject matter, the decision to pursue patents within a particular field, the reference of other patents in other classifications, the dollar value placed on patents of a particular subject matter, and the decision to pay maintenance fees, can, for example, be applied against the hierarchical structure. So-called unitary events, or those resulting from a human decision and comprehensible without reference to any other event, and so-called binary events, or those resulting from a human decision and comprehensible only with reference to itself and one other event, therefore affect the hierarchical structure differently, and therefore the results set.
In another example, U.S. Pat. No. 7,536,331, entitled “Method for Determining the Risk Associated with Licensing or Enforcing Intellectual Property,” describes interfacing with various input sources, including specifics of the intellectual property (IP) owner's task, litigation sources, PTO records, and government financial sources, and evaluating the information by comparing it to preset standards. The preset standards or risk factors can be weighted or otherwise customized, with some risk factors deemed more important than others. Other risk indicia, such as the number of successful lawsuits per one hundred intellectual property holders, can also be considered. Average recovery amounts and administrative cost amounts are also factored in. Ultimately, a composite score of the relative degree of strength associated with any undertaking to commercialize the IP at issue is calculated.
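As a generic illustration of how weighted factors can be combined into a single composite score, consider the following minimal sketch. It is not the method of the '331 patent; the factor names, scores, weights, and the weighted-average formula itself are all assumptions made for illustration.

```python
def composite_score(factor_scores, weights):
    """Combine per-factor scores into one weighted average.

    factor_scores: dict of factor name -> score (here assumed in [0, 1]).
    weights: dict of factor name -> non-negative importance weight.
    """
    total_weight = sum(weights[name] for name in factor_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(
        factor_scores[name] * weights[name] for name in factor_scores
    )
    return weighted_sum / total_weight


# Hypothetical factors; litigation history is deemed twice as
# important as each of the other two.
scores = {"litigation_history": 0.5, "pto_record": 1.0, "financial": 0.0}
weights = {"litigation_history": 2.0, "pto_record": 1.0, "financial": 1.0}

print(composite_score(scores, weights))  # 0.5
```

Dividing by the total weight keeps the composite on the same scale as the individual factor scores, so changing the weights alters relative importance without changing the range of the result.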
Relevancy analysis is prevalent in other fields. For example, in the biological and chemical fields, groups of molecules can be compared against other groups of molecules. Researchers at Washington State University have adapted Google's PageRank software, which measures and prioritizes the relevance of various Web pages in a user's search, to molecule analysis. Specifically, the researchers have equated the interactions between molecules to the links between Web pages. Some links between some molecules will be stronger and more likely than others. The same algorithm that is used to understand how Web pages are connected can be used to understand how molecules interact. Further, the adapted software can quickly characterize the interactions of millions of molecules and help researchers predict how various chemicals will react with one another. Eric Sorensen, Chemist Applies Google Software to Molecules, WSU News, Feb. 14, 2012.
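The general idea of link-based ranking can be sketched with a textbook power-iteration form of PageRank. This is a minimal illustration only; it is neither Google's implementation nor the adapted software described above, and the graph, node labels, damping factor, and iteration count are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links: dict of node -> list of nodes it links to.
    Returns a dict of node -> rank, with ranks summing to 1.
    """
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives a baseline share from random jumps.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank evenly among its outgoing links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing links spreads its rank evenly.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: A and B both link to C, and C links back to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"]}
rank = pagerank(links)
ordering = sorted(rank, key=rank.get, reverse=True)
print(ordering)  # ['C', 'A', 'B']
```

Node C ranks highest because it is linked to by both other nodes, and B ranks lowest because nothing links to it. The nodes could equally stand for Web pages, molecules, or cited references; only the link structure matters to the algorithm.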
Edward R. Tufte has published numerous texts on the visual display of information. For example, the texts “Envisioning Information,” Graphics Press LLC (1990), “Visual Explanations,” Graphics Press LLC (1997), and “The Visual Display of Quantitative Information,” Graphics Press LLC (2d ed. 2001) all provide numerous examples of illustrations of data representations.
Further, various natural language processing classes, in the field of computer science, are taught at leading universities. Stanford University, for example, offers a natural language processing class that includes instruction on word and sentence tokenization, text classification and sentiment analysis, spelling correction, information extraction, parsing, meaning extraction, and question answering. The class further touches on the underlying theory from probability, statistics, and machine learning, and fundamental algorithms like n-gram language modeling, naive Bayes and maxent classifiers, sequence models like Hidden Markov Models, probabilistic dependency and constituent parsing, and vector-space models of meaning.
In another example, text-mining software is known in other industries. For example, the text-mining application “I2E” provided by Linguamatics Ltd. allows for information extraction for information-rich and context-sensitive environments, like life science research and business intelligence needs. Relevant facts and relationships from large document collections are provided to users via real-time query results. Reporting of data is also provided in various structured forms. Semantic search capabilities are also provided using taxonomies, thesauri, and ontologies. (http://www.linguamatics.com/.)
However, at least two problems remain in the patent-related reference context. First, there remains the problem of how to obtain the appropriate harvested materials. Second, once the appropriate materials are harvested, there remains the problem of how to appropriately rank these materials. No technological solution currently exists to solve these problems. Therefore, there is a need for improvements in computerized systems for reference harvesting and reference ranking for patent-related references.