With the advent of cheap and powerful computing systems and the development of the electronic database, there has been an explosion in the collection of data across almost all areas of technology and society. Data is commonly held in the form of a “record”, which typically comprises a series of attributes that describe a real world object or event. For example, one type of data record is a health and physical record, which holds information regarding the attributes of a given person, such as their height, gender, weight, existing and past medical conditions, etc.
Another structured type of data is data regarding intellectual property rights, such as patent data records or trade mark data records. Many countries and jurisdictions have sophisticated legal regimes under which owners or creators of intellectual property can register their rights to an invention, a sign and/or a design. Such records are highly structured and include a large number of attributes, such as a date of filing, the name of the Owner or Applicant, the names of the Inventors, data regarding the history of the invention and particular intellectual property office classification codes, such as the IPC (International Patent Classification) code.
As patent data is effectively a record of innovative activity, value can be derived by searching patent data to extract commercially useful information.
However, an ever growing number of patents are filed every year, due to a constant increase in the rate of technological development and a greater awareness of the legal rights covering inventions. As a result, patent databases now contain millions or tens of millions of records, and each record has a complex set of attributes. Traditional methods for searching such databases (such as looking for keywords in the Title, Abstract or Applicant Details) therefore lack precision, are prone to error and can return large and unwieldy data sets.
More importantly, without seeking professional assistance and studying each patent specification in detail, it is difficult to judge the relative worth or “merit” of a particular patent, or of the underlying invention protected by the patent, in comparison to other patents and patented inventions. As such, traditional search methodologies struggle to provide any sophisticated or high-level information regarding the relative merit or worth of a patent.
In the context of the following description, it will be understood that a data set refers to a collection of one or more data records extracted from a database. In turn, a data record includes a number of attributes. The attributes define and quantify a number of characteristics about a “real world” entity. For example, in the case of patent data records, one attribute may be the patent number, another attribute may be the named Applicant or Patentee, a third attribute may be a list of documents cited against the patent during examination, etc. In turn, an attribute value is the actual value contained in a particular instance of a data record. For example, in a patent data record, an attribute is the patent number, and the attribute value, for a given record, is the actual value stored for that attribute.
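By way of illustration only, the terminology above may be sketched in Python as follows. This is a minimal sketch and not the implementation described herein: the class name `PatentRecord`, the field names, the helper functions and all of the values shown are hypothetical placeholders chosen for this example. Each field of the class corresponds to an attribute, each stored value is an attribute value, and the list of records stands in for a data set extracted from a database.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class PatentRecord:
    """One data record; each field is an attribute (names are illustrative)."""
    patent_number: str                 # attribute: the patent number
    applicant: str                     # attribute: named Applicant or Patentee
    cited_documents: List[str] = field(default_factory=list)  # documents cited during examination


# A data set: a collection of one or more data records.
# All values below are made-up placeholders, not real patent data.
data_set = [
    PatentRecord("EXAMPLE-001", "Example Applicant A", ["EXAMPLE-000"]),
    PatentRecord("EXAMPLE-002", "Example Applicant B"),
]


def attribute_names(record):
    """Return the names of the attributes that make up a record."""
    return [f.name for f in fields(record)]


def attribute_value(record, attribute):
    """Return the attribute value stored in this record for the named attribute."""
    return getattr(record, attribute)
```

In this sketch, `attribute_names(data_set[0])` lists the attributes shared by every record, while `attribute_value(data_set[0], "patent_number")` returns the attribute value held by one particular record.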