XML and various extensions thereof, such as XBRL, are becoming widely accepted as platforms for documents that are exchanged within groups. By conforming to the XML standard, a document is structured in a manner that enables the information therein to be readily identified and displayed in a desired format for viewing purposes. The XBRL standard provides a good example of this functionality in the context of business and financial data. The structure of the data is defined by metadata that is described in Taxonomies. The Taxonomies capture the definition of individual elements of financial data, as well as the relationships between them. Within a document, these elements are identified by tags. The extensible nature of the language permits users to define custom Taxonomies, so that the range of metadata that can be expressed is effectively open-ended.
Significant efforts are currently underway to adopt XBRL as a replacement for paper-based financial data collection and for various electronic mechanisms for financial data reporting. In the United States, for example, the Federal Deposit Insurance Corporation (FDIC) has instituted a project in which banks and similar types of financial institutions employ a form-based template to submit data in an XBRL format. The Securities and Exchange Commission (SEC) also has a project for the disclosure of company financial performance information, utilizing XBRL. This information can then be downloaded online by authorized entities. Other users of XBRL-formatted information include companies that disseminate financial news. The XBRL format enables the various companies to distribute the financial information on a common platform.
It can be appreciated that, as the XBRL format is adopted for these types of uses, large collections of business and financial performance information in this format will be amassed. There is a growing need for an efficient mechanism to process and retrieve stored information from such a large collection.
In the past, the typical approach for information retrieval within a large repository of documents has been to pre-parse each document in its entirety, and store the parsed information in another storage medium, such as a relational database. The database, rather than the documents themselves, then functions as the source of information that is searched to obtain data responsive to a request. Such an approach significantly increases storage requirements, since each item of information is stored twice: once in the original document and once in the parsed form. In addition, the information is not available as soon as the document is loaded into the repository. Rather, the need to pre-process the document, to extract each item of information and store it in the database, results in a delay before the information contained in the document can be retrieved in response to a query.
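The conventional pre-parse approach described above might be sketched as follows. This is only an illustration under stated assumptions: the sample document, its element names (Revenue, NetIncome), the facts table layout, and the function names are all hypothetical, and a real XBRL instance carries namespaces, contexts, and units that are omitted here.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified instance document (an assumption for
# illustration; not a valid XBRL instance).
SAMPLE_DOC = """<report>
  <Revenue contextRef="FY2004">1200</Revenue>
  <NetIncome contextRef="FY2004">150</NetIncome>
</report>"""


def preparse_into_database(conn, doc_id, xml_text):
    """Parse the whole document up front and copy every item into a table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facts "
        "(doc_id TEXT, tag TEXT, context TEXT, value TEXT)"
    )
    root = ET.fromstring(xml_text)
    # Every item is extracted, so each value now exists twice: once in the
    # source document and once as a database row.
    rows = [(doc_id, el.tag, el.get("contextRef"), el.text) for el in root]
    conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def query_fact(conn, tag):
    """Answer a request from the database rather than from the document."""
    cur = conn.execute(
        "SELECT doc_id, value FROM facts WHERE tag = ?", (tag,)
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
stored = preparse_into_database(conn, "doc-1", SAMPLE_DOC)
result = query_fact(conn, "Revenue")
```

In this sketch, no query can be answered until preparse_into_database has finished for a given document, which corresponds to the delay noted above, and the fixed column layout of the table is what ties retrieval to a predetermined schema.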
Furthermore, since the information is stored in a database for retrieval, it is not readily adaptable to changes in the source documents or taxonomies. For example, if a new extension is created for the XBRL standard, the schema of the database needs to be redesigned to accommodate the extension. Until that is completed and the data is reloaded, queries cannot be based upon the extended features of the standard.