A large number of publicly available databases are accessible over the Internet and contain a wealth of information. A database is defined as a collection of data organized especially for rapid search and retrieval, as by a computer. The data may be text documents, images, numbers, or any combination of these.
One example of a database is the patent database made available on the United States Patent and Trademark Office (USPTO) web site. A provider, in this case the USPTO, makes the documents (i.e., patents) in the database available and defines a method of searching the database using standard patent fields, e.g., patent number, inventor, assignee, issue date and title, among others. A user may also search the database by entering a search query consisting of specific keywords combined with Boolean operators. The result of the database search is a list of documents that match the standard patent field values or contain the keywords the user requested.
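The fielded and Boolean keyword search described above can be illustrated with a minimal sketch. The record layout, the function names and the sample records below are hypothetical and chosen only for illustration; they do not represent the USPTO search interface or any actual patents.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical record layout; a real patent record has many more fields.
    number: str
    inventor: str
    assignee: str
    title: str
    text: str


def matches_fields(record, **fields):
    """True if every requested field contains the given value (case-insensitive)."""
    return all(value.lower() in getattr(record, name).lower()
               for name, value in fields.items())


def matches_keywords(record, all_of=(), any_of=(), none_of=()):
    """Evaluate a simple Boolean keyword query: AND, OR and NOT word lists."""
    words = set((record.title + " " + record.text).lower().split())
    return (all(w.lower() in words for w in all_of)
            and (not any_of or any(w.lower() in words for w in any_of))
            and not any(w.lower() in words for w in none_of))


def search(records, all_of=(), any_of=(), none_of=(), **fields):
    """Return the numbers of the records satisfying both field and keyword criteria."""
    return [r.number for r in records
            if matches_fields(r, **fields)
            and matches_keywords(r, all_of, any_of, none_of)]


# Made-up sample records, not real patents.
RECORDS = [
    Record("A-001", "Doe", "Acme", "Laser cutting tool", "a laser beam cuts metal sheets"),
    Record("A-002", "Roe", "Acme", "Optical sensor", "a sensor detects laser light"),
    Record("A-003", "Doe", "Globex", "Metal press", "a press shapes metal sheets"),
]

print(search(RECORDS, assignee="acme"))                       # field search
print(search(RECORDS, all_of=["laser"], none_of=["sensor"]))  # laser AND NOT sensor
```

In this sketch a query can only return documents whose predefined fields or words match the request, which is the limitation of conventional search methods.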
However, in many cases a user searches for a specific type of information which, although it may be contained in the database documents, cannot be directly accessed using the conventional search methods, because the specific search query was not envisioned by the provider or the information was not intended to be searched. These specific types of search queries require advanced search methods and are used in research applications.
One such advanced search method is described in U.S. Pat. No. 6,038,561, in which a dynamic concept (or “natural language”) query is performed. A user enters a list of words, ranging from a single keyword to an entire document, in a user-specified query document. This user-specified query document is then compared for similarity to a set of documents contained in the database, and similarity scores are obtained. These similarity scores provide answers regarding patent infringement between two patents, or synergy between companies and inventories, among others.
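The idea of scoring a query document against database documents can be illustrated with a minimal sketch. The sketch assumes cosine similarity over simple word-count vectors, which is a common generic technique and is not asserted to be the method of U.S. Pat. No. 6,038,561; the function names and sample texts are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Word-count vector of a text (lowercased, split on whitespace)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors, in the range [0, 1]."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Score every document against the query, highest similarity first."""
    q = term_vector(query)
    scores = [(doc_id, cosine_similarity(q, term_vector(text)))
              for doc_id, text in documents.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up sample texts for illustration only.
DOCUMENTS = {
    "D1": "laser beam cuts metal",
    "D2": "sensor detects light",
    "D3": "laser cuts wood",
}

for doc_id, score in rank("laser beam cuts metal", DOCUMENTS):
    print(doc_id, round(score, 3))
```

The output of such a comparison is a ranked list of scores for pairs of documents, which is a similarity analysis rather than a synthesis of new information.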
However, in general research applications, a user seeks answers to a “new set of questions” and is not looking to develop a similarity analysis between two documents but rather to develop “a thesis about a new subject matter”. The “new subject matter” may be an assessment of the technical capabilities of a given company, a business strategy, a marketing analysis, or the type of material or human resources required to set up a specific operation or to develop a specific type of technology. This type of research is usually performed manually and in a non-systematic way. It is also cumbersome and time-consuming.
There is a need for an advanced method of electronically researching information in existing databases in order to develop, via analysis and/or synthesis, a new type of information database and ultimately “a thesis about a new subject matter”.