Interacting with documents is an everyday part of life. For example, renters and landlords may sign a written contract (i.e., a lease) to establish each party's rights and responsibilities with respect to the property being rented. A home purchase typically requires the buyer's signature on multiple documents/contracts to establish the buyer's responsibility to pay the mortgage and not rescind the agreed-upon offer, and the seller's responsibility to keep the promises that were warranted. Consumers download a piece of software on their computers and have to click “I Agree” to accept the terms of an end user license agreement. Employees sign an employment agreement binding them to the company's rules and regulations while employed and, sometimes, thereafter.
Interacting with documents is, of course, also a frequent occurrence in the professional world. For example, in the business world, all public companies, domestic and foreign, trading on any of the US exchanges are required to file registration statements, periodic reports, and other forms describing any significant changes with the US Securities and Exchange Commission (“SEC”). Filings typically contain financial statements as well as large amounts of ‘unstructured text’ describing the past, present and anticipated future of the firm. Corporate filings provide a central window into the health of the filing company, and thus investors consider them immensely important.
One of the most important corporate filings is the SEC Form 10-K. US public companies are obligated to file a Form 10-K within 90 days after the end of their fiscal year. 10-Ks are much more detailed than the annual reports sent to shareholders. Most pages of a 10-K include unstructured, qualitative descriptions of various aspects of the company and hence provide very useful context for understanding the financial data. For example, a company should mention if the earnings being reported might have been inflated because of a change in its accounting method or because expenses were shifted to a later period, or if extra sales were included because of a change in its fiscal year end, or whether revenues not yet receivable were included in the computation, or the impact of certain expensed/capitalized items. Ideally, an explanation of all such changes that lead to different interpretations of the financial statements should be included in the Notes to Financial Statements (“NTFS”). Detailed analyses of the risks that the company faces and their potential impact should be included in the Management Discussion and Analysis (“MD&A”) section. The MD&A section is also intended to assess a company's liquidity, capital resources, and operations. Hence, it is one of the most read and most important components of the financial statements.
Since the text in different filing types is a means of communication from the management to the investors, these textual disclosures provide a means to assess managers' behavioral biases and understand firm behavior. It is also important to have a method to maintain the history of the behavioral biases of managers or investors for future backtests and research. A careful consideration of the textual information has become even more important since the advent of XBRL, because XBRL provided a structure for the numeric information and hence encouraged the possibility of shifting gray areas of accounting to the textual information. However, the lack of a consistent format for the textual information, coupled with the fact that textual information is hard to quantify, makes its analysis a challenging task to automate and hence requires domain experts to interpret it.
Currently, a known approach to analyzing documents, such as contracts and contract clauses or financial documents, includes a manual component of asking another person, perhaps someone with more expertise on the subject, which language is best for the situation. While having access to this type of expertise may provide some benefit, the person drafting the contract may not have time to reach out to another individual, and/or that individual might be too busy to lend his/her expertise. In addition, the individual being asked may know more about the area than the person drafting but may not be a true expert.
Another known approach is looking through other documents, such as other contracts and contract clauses or other financial documents. For example, some law firms might have a document repository where all the contracts are stored. A lawyer searches the repository to bring up a particular set of contracts and/or contract clauses that might be applicable to the client's situation. However, this type of system does not analyze the contracts and contract clauses to determine what might be the “market standard” language. Market standard language refers to language that is generally accepted by members of a market. For example, the market standard language for a non-compete clause in a salesperson's employment agreement might be different than the market standard language for a non-compete clause in an engineer's employment agreement.
Additionally, each known approach described above consumes a tremendous amount of valuable time. For example, when attempting to engage an individual with more expertise, the drafter may have to work around the individual's schedule, which may be cumbersome and time-consuming. In addition, that individual may not have the time necessary to discuss the various options of contract language. In another example, when searching other contracts and clauses, several precious hours may be wasted trying to find the necessary language for the drafter's scenario, with the potential for little to no success. This practice is inconvenient, and the drafter spends multiple hours researching language instead of focusing on other important aspects such as, for example, the discussion and negotiation with opposing counsel.
Further, most institutions, such as investment firms, do not have dedicated analysts who can speedily unearth any inaccuracies or management latitude, nor do they have resources dedicated to interpreting text that is very hard to read. Apart from speed, the breadth of the companies covered is also limited at best. An automated system can overcome the speed and coverage challenges, but the fact that textual information in corporate filings is largely unstructured is an additional challenge. Unstructured information neither has a pre-defined data model nor fits well into relational tables. Even though the text may be physically organized into sections and subsections, such as the Notes to the Financial Statements in an SEC filing, from the perspective of information processing it is unstructured because of irregularities and ambiguities that make it difficult to automate processing. Compared to data stored in a structured data model like a database or annotated/tagged documents, it is typical for unstructured text to be higher dimensional and require further pre-processing.
Accordingly, the inventors have recognized the necessity for additional improvements in analyzing conceptually-related portions of text, in particular contracts and contract clauses.