In many industries, the ability to generate and distribute human-readable information has far outstripped users' ability to sort, identify and read useful material. The financial services industry, for example, generates huge amounts of human-readable data on a daily basis. Broker-dealers produce large volumes of evaluative and analytical data for consumption by asset managers. Asset managers must collect, sort, prioritize and read the information necessary for them to do their jobs. Commercial asset managers may then become data generators themselves, for example by generating end-user-specific materials for reading and consideration by clients.
Well-known standards have developed for the organization and display of data. Extensible Markup Language (XML), for example, has been developed for the structuring of documents by the tagging of particular data types. A particular XML tag may, for example, indicate that the tagged data represents the body of a message. Particular document data types can then be formatted in particular manners. XML is currently the accepted industry standard for the organization of human-readable content. It is used pervasively in the preparation of distributed documents, including industry materials of the type described above.
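As a minimal illustration of this kind of tagging, the following Python sketch parses a small XML fragment with the standard-library `xml.etree.ElementTree` module and pulls out the data tagged as the message body. The element names (`message`, `subject`, `body`) and the helper `extract_tagged` are hypothetical, chosen only for illustration, and are not drawn from any particular standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tag names are illustrative assumptions only.
DOC = """
<message>
  <subject>Quarterly outlook</subject>
  <body>Revenue is expected to grow modestly.</body>
</message>
""".strip()


def extract_tagged(xml_text: str, tag: str) -> str:
    """Return the text of the first child element named `tag`, or '' if absent."""
    root = ET.fromstring(xml_text)
    element = root.find(tag)  # searches direct children of the root element
    if element is None:
        return ""
    return (element.text or "").strip()


body_text = extract_tagged(DOC, "body")
print(body_text)
```

Because the tag identifies the data type, a consumer of the document can locate or format the body without inspecting the rest of the content.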
A related markup language, Hypertext Markup Language (HTML), has developed as an industry standard for tagging document contents to control the appearance of data within a document. HTML is used pervasively in the preparation of Internet web pages. It is HTML that describes the colorful, graphically oriented web pages so common on the Internet today.
It will be appreciated, however, that neither XML nor HTML solves the problem described above: that of assisting consumers in sorting through voluminous quantities of documents and reports to identify and prioritize those of interest.
Research Information Exchange Markup Language, or RiXML, has been developed with the purpose of improving the process of categorizing, aggregating, comparing, sorting, and distributing global financial research. See the currently existing website for the industry-supported standards organization at www.rixml.org. Consistent with its roots in XML, RiXML enables document drafters to include control tags within the data content. However, in its XML implementation, RiXML defines data tags for content descriptors which describe a content ‘payload’ (a prepackaged content aggregate—usually a document). While this can be used by consumers to automatically sort and prioritize documents, it does not provide a mechanism for finding details within the document itself. For example, an author using RiXML may be able to tag a document so that it can be automatically identified by a user as a written document containing a fundamental analysis of a particular company, but the details surrounding that analysis would require a reading of the document to be identified.
RiXML, for its many benefits, does not solve two fundamental problems associated with document identification and sorting. The first problem is the potentially differing, or asymmetrical, interpretation by various parties of the nature of identical content. Because the RiXML tags are provided by the drafter, the categorization of the document enabled by RiXML represents the subjective interpretation of the drafter. For example, assume that a broker-dealer drafts a fundamental analysis document for a particular Company X. The drafter then uses RiXML to classify that document as a fundamental analysis document for Company X. An asset manager searching for a history of Company X might, using RiXML, miss that document. Similarly, an end-user may pull the identical document expecting an analysis of the current Company X management team and be disappointed by the content.
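The asymmetry described above can be sketched in a few lines of Python. This is an illustrative model only: it does not use the actual RiXML schema, and the `TaggedDocument` class, its fields, and the `find_by_category` helper are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class TaggedDocument:
    """A document carrying one drafter-assigned, package-level category."""
    company: str
    category: str  # hypothetical descriptor chosen by the drafter
    body: str      # human-readable content, not examined by the search below


DOCUMENTS = [
    TaggedDocument(
        company="Company X",
        category="fundamental analysis",
        body=(
            "Company X was founded as a regional supplier and later "
            "expanded. Its current valuation appears fair."
        ),
    ),
]


def find_by_category(documents, company, category):
    """Return documents whose drafter-assigned tags match the query exactly."""
    return [
        doc for doc in documents
        if doc.company == company and doc.category == category
    ]


# The drafter's perspective: the document is found.
analysis_hits = find_by_category(DOCUMENTS, "Company X", "fundamental analysis")
# A consumer's perspective: the same document is missed.
history_hits = find_by_category(DOCUMENTS, "Company X", "history")
print(len(analysis_hits), len(history_hits))
```

In this sketch the first query returns the document and the second returns nothing, even though the body discusses the company's history, because the search sees only the category the drafter chose.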
The second problem unsolved by RiXML is the inability to associate specific content entities and attributes with specific concepts within a content package. Such entities and attributes are instead associated with the entire content package, greatly diminishing the ability of a user to find desired content.
It would thus be desirable to develop systems and methods for more thoroughly and usefully analyzing, categorizing and sorting documents, particularly human-readable documents, by content. It would be particularly desirable to provide such systems and methods that would enable the evaluation of document content based on selected or multiple consumer perspectives. Such an evaluation capability would significantly enhance the abilities of various interested consumers to sort, prioritize and actually read the information of most interest. Equally important, it would provide a more precise means of pruning the overwhelming amount of available content that would not qualify as useful to the consumer.