The present invention relates to the area of electronic storage and retrieval of information. In particular, the present invention pertains to a method and system for referencing, storing, retrieving and intelligently categorizing symbolically linked information.
Many types of information are referenced and archived in everyday life using a symbolic code. Typically a symbolic code is employed by a community of users who require a consistent and convenient language to refer to a particular set of signified objects, that is, entities in the real world signified by the symbols of the code. In practice, however, most symbolic codes are not formalized, so users do not employ these codes in a coordinated and consistent manner. Thus, interpretation of symbols is problematic.
For example, in the financial world, financial exchanges each use a different set of exchange (ticker) symbols to refer to companies and their securities. Although local exchanges within the United States coordinate symbol names, exchanges worldwide generally each use a particular symbol set and symbol structure for identifying companies and their securities. For example, both the PSE (Pacific Stock Exchange) and the NYSE (New York Stock Exchange) use the symbol ‘IBM’ to signify a security of IBM. However, in the United States the symbol ‘T’ refers to an AT&T security while in Canada ‘T’ refers to a security of the company Telos. In Britain the symbol ‘T’ may refer to the security of a different company.
Vendors of financial information such as Reuters, Bloomberg, Bridge, etc. also employ unique symbol sets and structures to refer to companies and their securities. Many vendors of financial information use a structured symbol code segmented into two portions separated by a delimiter character. For example, a vendor may use the symbol structure ROOT[delimiter character]SOURCE where the ROOT segment refers to a particular company's security and the SOURCE segment refers to a country or exchange where that security is traded. The delimiter character is typically a character such as ‘@’ or ‘.’.
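As an illustration of the ROOT[delimiter character]SOURCE structure described above, the following is a minimal sketch of how such a symbol might be split into its two segments. The names `DELIMITERS`, `ParsedSymbol` and `parse_symbol` are hypothetical, and the choices to split at the first delimiter and to drop any further delimiter characters from the source segment are assumptions made for the example, not rules stated in this description.

```python
from typing import NamedTuple, Optional

# Assumed delimiter set: the text names '@' and '.' only as typical examples.
DELIMITERS = "@."


class ParsedSymbol(NamedTuple):
    root: str               # segment identifying a company's security
    source: Optional[str]   # country or exchange; None for an isolated root


def parse_symbol(raw: str) -> ParsedSymbol:
    """Split a ROOT[delimiter]SOURCE symbol at its first delimiter character."""
    text = raw.strip().upper()
    for i, ch in enumerate(text):
        if ch in DELIMITERS:
            # Assumption: any further delimiter characters inside the source
            # segment are punctuation, so 'U.S.' is reduced to 'US'.
            source = "".join(c for c in text[i + 1:] if c not in DELIMITERS)
            return ParsedSymbol(text[:i], source or None)
    # No delimiter found: treat the whole input as an isolated root symbol.
    return ParsedSymbol(text, None)
```

Under these assumptions, `parse_symbol("IBM.FR")` yields the root `IBM` with source `FR`, and an input with no delimiter yields a root with no source.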
Because of the multiplicity of symbol sets in circulation, interpreting a symbol in order to identify a security and the company it belongs to is problematic. For example, a single vendor may use the symbol ‘IBM.FR’ to refer to an IBM security traded in France and ‘IBM.GB’ to refer to the same IBM security traded in Great Britain. In either case, both symbols IBM.GB and IBM.FR are associated with the same company IBM. However, two vendors may use the same root and source segments to refer to two different securities issued by two different companies. For example, a first vendor might use the symbol ‘T.U.S.’ to refer to an AT&T security traded in the United States while a second vendor might employ the symbol ‘T@US’ to refer to a security of a different company. On the other hand, two different vendors may use different root and source symbols to refer to the same security of a company. For example, a first vendor might use the symbol ‘IBM.UK’ to refer to an IBM security traded in Great Britain while a second vendor may use the symbol ‘IB.EG’ to refer to the same IBM security.
The need for a consistent system to reference information linked to particular companies has grown even more important as online financial research has increased. Document repositories storing financial documents are accessible to investors and researchers via public networks such as the Internet or private networks. Contributors may submit research documents related to particular companies or securities to a document repository for archival and clients (i.e., investors or researchers) of the document repository may retrieve documents related to particular companies or securities of interest.
In the archival process, contributors typically submit a document along with an input string that refers to the company or security that is the subject of the submitted document. However, because of the multiplicity of symbol sets in use, accurate archival and retrieval of documents is highly problematic. Contributors will typically submit an input string using any of the various vendor symbols and exchange symbols in circulation, or may use an idiosyncratic symbol unique to that contributor. Thus, identifying the company or security referred to by a contributor is difficult. Similarly, clients desiring to retrieve documents regarding a particular company will submit input symbols in a variety of formats including vendor symbols, exchange symbols or an isolated root symbol, which complicates the retrieval process.
The difficulties regarding the interpretation of security symbols illustrate a general need for a consistent and unambiguous system for referencing symbolically linked information so that the information may be accurately archived and retrieved.
Furthermore, the financial documents produced by these financial companies vary in type and topic. Some financial documents may emphasize a particular subject matter such as commodities, equity reports, industry reports, portfolio/asset strategies, derivatives, and/or foreign exchange/currencies. A particular company may produce documents predominantly related to a certain subject matter such as fund research or commodities. It would be advantageous for documents that emphasize a common topic to be readily retrievable. The authors of various financial documents use different research methodologies, such as fundamental, technical, quantitative or strategic research techniques, to produce those documents. The methodology used may affect the style, tone and conclusion of a financial document, so a person reviewing the document may want to know this information before reading it. Also, financial companies may have various reasons for generating a financial document, such as general commentary, forecasting, news reports and/or market data. Because financial documents cover various topical subjects, are derived using various research techniques and are produced for various purposes, all of which affect content, a method of document archival and retrieval based upon these criteria would be advantageous.
The present invention provides a method and system for the reference, archival and retrieval of symbolically linked information and the intelligent categorization of the information based upon subject matter, research methodology, publication purposes and primary subject matter. A master symbol database stores a plurality of master symbols, wherein each master symbol is formatted according to a predetermined structure. Each master symbol in the master symbol database is linked to a parent identifier that identifies a unique object. A categorical symbol database stores a plurality of categorical symbols which are also formatted according to a predetermined structure. The categorical symbol is linked to a unique document identifier that enables the retrieval of a document based upon its categorical assignment. Users may archive or retrieve symbolically linked information in an information database by providing an input symbol. The input symbol is normalized and the master symbol database is searched to find a matching master symbol. The parent identifier linked to the matching master symbol is then used to retrieve or archive information in the information database. If the input symbol includes a categorical symbol, then the categorical symbol database is searched to find a matching categorical symbol which is used to categorically retrieve or archive the information in the information database.
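The flow summarized above (normalize the input symbol, find a matching master symbol, then use the linked parent identifier to archive or retrieve information) can be sketched as follows. This is an illustrative sketch only: the in-memory dictionaries, the ‘ROOT.SOURCE’ master structure, the parent identifiers and the function names are all assumptions introduced for the example, not the structures used by the invention.

```python
from typing import Dict, List, Optional

# Hypothetical master symbol database: master symbol -> parent identifier.
# The 'ROOT.SOURCE' structure and the identifiers are illustrative only.
MASTER_SYMBOLS: Dict[str, str] = {
    "IBM.FR": "PARENT-IBM",
    "IBM.GB": "PARENT-IBM",
    "T.US": "PARENT-ATT",
}

# Hypothetical information database: parent identifier -> stored items.
INFORMATION: Dict[str, List[str]] = {}


def normalize(input_symbol: str) -> str:
    """Map an input symbol onto the assumed 'ROOT.SOURCE' master structure."""
    text = input_symbol.strip().upper().replace("@", ".")
    root, _, source = text.partition(".")
    source = source.replace(".", "")  # e.g. 'U.S.' becomes 'US'
    return f"{root}.{source}" if source else root


def find_parent(input_symbol: str) -> Optional[str]:
    """Return the parent identifier linked to the matching master symbol."""
    return MASTER_SYMBOLS.get(normalize(input_symbol))


def archive(input_symbol: str, item: str) -> bool:
    """Store an item under the parent identifier; False if no symbol matches."""
    parent = find_parent(input_symbol)
    if parent is None:
        return False
    INFORMATION.setdefault(parent, []).append(item)
    return True


def retrieve(input_symbol: str) -> List[str]:
    """Return every item linked to the parent identifier of the input symbol."""
    parent = find_parent(input_symbol)
    return list(INFORMATION.get(parent, [])) if parent else []
```

Because both ‘IBM.FR’ and ‘IBM.GB’ are linked to the same parent identifier in this sketch, an item archived under one input symbol is returned when retrieving with the other.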
According to one embodiment, the present invention is applied in the context of a computer based document repository in which automatic archival of documents submitted by contributors and automatic retrieval of documents requested by clients is provided based upon analysis of an input symbol. The document repository stores a database of master symbols and linked parent identifiers referencing a plurality of objects or sub-objects. The document repository also stores a database of categorical symbols that are used to intelligently categorize the documents within the database. In the archival process, the document repository electronically receives a contributor submitted document and an input symbol pertaining to an object referenced in the document. The input symbol is normalized and used to search the master symbol database to find a matching master symbol. The document is then stored in a document database so that it is linked to the parent identifier corresponding to the matching master symbol. For categorical archival, the input symbol is used to search the categorical symbol database to find a matching categorical symbol. The document is then stored in a document database where the document identifier is linked to the matching categorical symbol. If the normalized symbol is found in neither the master symbol database nor the categorical symbol database, an analysis of the contributor's historical patterns is performed to attempt to resolve the indeterminacy. Clients may retrieve documents stored in the repository by electronically providing an input symbol. The input symbol is normalized and at least one client preference parameter may be used to resolve any indeterminacy in the input symbol. The normalized symbol is used to search the master symbol database in order to find a matching master symbol. The parent identifier linked to the matching master symbol is then used to retrieve documents linked to the parent identifier.
The normalized symbol is also used to search the categorical symbol database in order to find a matching categorical symbol. The matching categorical symbol is then linked to the document identifier in order to retrieve documents with the appropriate categorical symbol.
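The archival process of this embodiment, including categorical archival and the fallback to a contributor's historical patterns, might be sketched as below. Everything concrete here is an assumption made for illustration: the sample symbols and identifiers, the convention of passing several symbols per document, and in particular the history heuristic (choose the master symbol with the same root that the contributor has used most often), which this description does not specify.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Set

# Hypothetical databases; all symbols and identifiers are illustrative.
MASTER: Dict[str, str] = {
    "IBM.US": "PARENT-IBM",
    "T.US": "PARENT-ATT",
    "T.CA": "PARENT-TELOS",
}
CATEGORICAL: Set[str] = {"EQUITY", "COMMODITIES"}

docs_by_parent: Dict[str, List[str]] = defaultdict(list)
docs_by_category: Dict[str, List[str]] = defaultdict(list)
# contributor -> count of each master symbol that contributor has used
history: Dict[str, Counter] = defaultdict(Counter)


def resolve_from_history(contributor: str, symbol: str) -> Optional[str]:
    """Assumed heuristic: the same-root master symbol used most often before."""
    candidates = {
        master: count
        for master, count in history[contributor].items()
        if master.split(".")[0] == symbol
    }
    return max(candidates, key=candidates.get) if candidates else None


def archive(contributor: str, doc_id: str, symbols: List[str]) -> None:
    """Link a document to parent identifiers and categorical symbols."""
    for raw in symbols:
        symbol = raw.strip().upper()  # simplified normalization
        if symbol in CATEGORICAL:
            docs_by_category[symbol].append(doc_id)
            continue
        if symbol in MASTER:
            master = symbol
        else:
            # Found in neither database: fall back to contributor history.
            master = resolve_from_history(contributor, symbol)
        if master is None:
            continue  # unresolved; left unlinked in this sketch
        history[contributor][master] += 1
        docs_by_parent[MASTER[master]].append(doc_id)
```

In this sketch, a contributor who has previously archived under ‘T.CA’ and later submits the isolated root ‘T’ has the new document linked to the same parent identifier, while the same isolated root from a contributor with no history remains unresolved.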