The invention relates generally to information retrieval, and more specifically to a technique for automatically and intelligently retrieving information. In particular, the invention relates to gathering information about business entities or industries by retrieving information on newsworthy events.
A wide variety of applications require data mining across multiple information sources. For example, monitoring customer business risk is a critical element of the corporate lending process, both to assess the repayment risk of new loan customers and to monitor the repayment risk of current customers. Several commercially available tools permit financial analysts to monitor the financial health of a business entity by analyzing its publicly available financial data. Typically, these tools use quantitative financial data to generate risk scores indicative of the financial health of the business entity. Examples of quantitative financial information include financial statement reports, stock price and volume, credit and debt ratings, and risk scores related to the business entity.
However, quantitative data does not provide all of the information that is pertinent to customer risk. Moreover, because quantitative financial data is typically generated only quarterly, these tools do not take into account other forms of information, such as events related to the business entity that may indicate business risk and that may arise between financial statement reports. For example, these tools do not consider qualitative business event information that may arise before the release of a financial statement, such as government investigations, management transitions, debt restructuring, or the entity losing several significant customers. Such business events also have considerable bearing on the overall risk of the business. Events outside the business, such as government regulatory changes and industry events, also affect business risk. Additionally, these tools generate risk scores on the assumption that the financial statement used to generate the score is accurate.
To compensate for the shortcomings of the above tools, financial analysts typically monitor qualitative and quantitative business event information related to a business entity or industry by using forensic accounting techniques. Qualitative and quantitative business event information includes, for example, business event data that reflect certain behavioral symptoms or catalysts of financial stress associated with the business entity, such as executive staff changes or accountant changes. Forensic accounting techniques identify financial inconsistencies related to a business entity through on-site audits of company books, interactive data mining of commercial databases, analysis of information in publicly available sources, surveys of financial notes related to the business entity, interviews with executive teams, and assessment of accounting standards and control systems. In particular, financial analysts manually read through business, industry, and trade news publications to gather intelligence on qualitative business event information relating to a business entity, and then use their judgment to predict the business risk of the entity. Effective intelligence gathering typically requires extracting and assimilating information from an extensive and diverse set of information sources. This often includes collecting and integrating both historical and current information from multiple data providers.
For example, to effectively assess the health of a business entity, information sources must be accessed and mined for relevant information, and the information must then be assimilated. This can include reviewing financial statements, financial footnotes, news (such as announcements of new product offerings or pending litigation), press releases, insider trading data, 8-K events of material significance, analyst commentaries, commercial credit ratings, and stock price data. Some, and perhaps all, of this information may be required to perform an effective analysis of a business entity's historical performance and current state of health. Additionally, if the business entity or industry is subject to ongoing monitoring, new information must be collected proactively. Furthermore, if information of sufficient significance is found during collection, a human may need to be notified so that additional action can be taken.
This manual process of collecting and analyzing qualitative business event information is traditionally ad hoc in both its methodology and its coverage, and may result in significant delays, in important events being missed entirely, and in failure to recognize trends that indicate overall business risk. Moreover, this process is very time consuming, especially given the increasing amount of information available on the Internet and in other media. Further, multiple heterogeneous data sources must be accessed and monitored for both historical and current information. There is no single source of all of the potentially relevant information, so this information must be gathered from different locations and, as a consequence, in different formats. Thus, the collection and fusion of such vast amounts of information is not standardized, is not subject to the rigor of statistical analysis, and is not scalable. Moreover, it is desirable to support adding new sources (and possibly removing old ones) over time, as new information sources are found or become available and old ones become obsolete.
Additionally, when evaluating the health of a large portfolio or an entire industry, it quickly becomes cost-prohibitive to capture all of the information on all of the companies in the portfolio or industry. Any experienced credit analyst recognizes that certain information is needed only in certain situations. For example, a lender may consider it necessary to examine insider trading patterns only for companies that exhibit a deteriorating operating cash flow position and to which the lender has extensive exposure. For other companies, such as those where the exposure is low and the financials are otherwise strong, the time and effort needed to collect and use this information is simply not cost-effective. It is also important to choose, for each type of information required, a data provider whose strengths suit that type. For example, if a lender requires information regarding recent CEO changes at a company, this information may be acquired from low-cost sources such as the Wall Street Journal, or from a significantly more expensive Factiva product. Currently, financial analysts assess the available data sources, subjectively weigh the pros and cons of each provider (or combination of providers), and then purchase those sources that seem best suited. Further, departments within the same organization often separately purchase and collect very similar information, leading to redundancy and overspending.
Many attempts have been made to automate the process of collecting this type of data. However, current techniques build a separate automation process specific to each provider, and a further automated system may be required to fuse the information. Additionally, none of the current techniques deals with fee-based sources for data mining or demonstrates the ability to work within cost constraints. Typically, natural language processing (NLP) techniques are used to identify specific word patterns in news articles, press releases, and financial footnotes to help automate the extraction of materially relevant events. Text-mining software may use these NLP techniques to search textual sources for events such as CEO and auditor changes. However, this technology is considerably less effective than a human at understanding the breadth of information conveyed in text, due in large part to the complexity of the English language and the many ways in which the same idea can be expressed. As a result, information extraction systems have difficulty recognizing concepts across the virtually limitless ways in which they can be phrased.
It is therefore desirable to provide a cost-effective and efficient technique for automatically retrieving relevant and useful information from diverse information sources. It is also desirable to provide a deliberative learning technique for intelligent information retrieval. Additionally, it is desirable to provide a technique for collecting and fusing such vast amounts of information in a standardized manner so that the information can be analyzed.