With the advent of cheap and powerful computing systems and the development of the electronic database, there has been an explosion in the collection and electronic storage of data related to almost all areas of technology, industry, commerce and society. Data is held, in many instances, in the form of a “record”, which typically comprises a series of attributes that describe a real-world object or event. For example, one type of data record is a health record, which holds information regarding the attributes of a given person, such as their height, gender, weight, existing and past medical conditions, treatments undertaken, etc. Another type of data record is one describing a scientific publication, wherein a plurality of such data records may form a set and be held, for example, in a database of publications. Such a publications database can include attributes regarding the publications, such as the authors of each publication, citations or references to other publications, publication date and the subject matter of each publication.
Another structured set of data is data describing intellectual property rights, such as patent data records or trade mark data records. Many countries have legal regimes where owners or creators of intellectual property can register their rights to an invention, a sign and/or a design. Such records are highly structured and include a large number of attributes, such as a date of filing, the name of the owner or applicant, the names of the inventors or authors, data regarding the history of the invention and particular intellectual property office classification codes, such as the IPC (International Patent Classification) code, plus other attributes that describe the nature of the intellectual property right.
As patent data is effectively a record of innovative activity, value can be derived from searching patent data to extract commercially useful information. However, an ever-growing number of patents are filed every year, due to a constant increase in the rate of technological development and a greater awareness of the legal rights covering inventions. As a result, patent databases now contain millions or tens of millions of records, and each patent data record in turn contains a large and complex set of attributes. Therefore, traditional methods for searching such databases (such as by looking for keywords in the title, abstract or applicant details attributes) can lack precision, are prone to error and can return large and unwieldy data sets.
One method for selecting, analysing and visualising related database records utilises the network paradigm, in view of the relationships that exist between and amongst at least some of the records. US Publication 2010/0106752 (Eckardt, III et al.), for example, describes a network visualisation system and method for making sense of sets of related database records or documents by providing a network graphical representation of the records. However, the '752 publication itself recognises the difficulties inherent in analysing and graphically representing large and complex data sets, such as the representation of more than 1000 patent documents pictured in FIG. 13 of that publication. Eckardt considers at paragraph [0177] that it is difficult to determine what is to be understood from this network graph of patent documents, in which the nodes represent documents and the links are citation linkages.
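By way of illustration only, a network of the kind described above, in which nodes represent documents and links represent citation linkages, might be sketched as follows. This is a minimal sketch and is not taken from the '752 publication: the document identifiers (P1 to P5), the citation pairs and the function names are hypothetical placeholders chosen for the example.

```python
from collections import defaultdict

# Hypothetical (citing document, cited document) pairs; identifiers are placeholders.
CITATIONS = [
    ("P3", "P1"),
    ("P3", "P2"),
    ("P4", "P1"),
    ("P5", "P3"),
    ("P5", "P1"),
]


def build_citation_graph(citations):
    """Return (nodes, links), where links maps each citing document to the set it cites."""
    nodes = set()
    links = defaultdict(set)
    for citing, cited in citations:
        nodes.update((citing, cited))
        links[citing].add(cited)
    return nodes, dict(links)


def citation_counts(nodes, links):
    """Count, for every node, how many other documents cite it."""
    counts = {node: 0 for node in nodes}
    for cited_set in links.values():
        for cited in cited_set:
            counts[cited] += 1
    return counts


nodes, links = build_citation_graph(CITATIONS)
counts = citation_counts(nodes, links)
for doc in sorted(counts):
    print(doc, counts[doc])
```

Even in this five-document example the structure is simple to tabulate, whereas a graph of more than 1000 documents with many thousands of links cannot readily be interpreted by visual inspection alone.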
Furthermore, without seeking professional assistance and studying each patent specification in detail, it is difficult to judge the relative worth or “merit” of a particular patent, or of the underlying invention protected by the patent, in comparison to other patents and patented inventions. As such, traditional search methodologies struggle to provide any sophisticated or high-level information regarding the relative merit or worth of a patent.
In one proposal, U.S. Pat. No. 7,716,226 (Barney) describes a method and system for probabilistically quantifying and visualizing relevance between two or more citationally or contextually related data objects, in the context of statistically rating, valuing and analysing intellectual property assets including patents, patent applications and related documents. However, Barney relies on probabilistic analysis of patent documents, in particular utilising a multivariate regression, to provide a visual map. This approach has drawbacks, including inherent inaccuracies associated with averaging used