1. Field of the Invention
The present invention relates generally to methods and systems for using computers and software for generating text and documents, and more particularly, to a computer-based method and system for generating sentences and paragraphs based on collected raw data, one or more user inputs, and a processing engine that builds a text or document in real time using a text model that includes a decision tree. The engine uses the raw data and the decision tree, both of which were selected based on the user input, to automatically select appropriate predefined sentence elements for the text being built and combines the selected sentence elements based on a set of connector rules provided in the text model and in an order defined by the text model.
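By way of illustration only, the selection-and-combination process described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation of the invention: the names `Leaf`, `Node`, `select`, `build_text`, `MODEL`, and `CONNECTORS`, the example sentence elements, and the field names in the raw data are all hypothetical. Each decision tree tests the raw data and yields one predefined sentence element, and a table of connector rules determines how adjacent elements are joined, in the order given by the text model.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    element: str  # predefined sentence element; may contain {placeholders}
    kind: str     # label used to look up a connector rule (hypothetical)


@dataclass
class Node:
    test: Callable[[dict], bool]      # predicate evaluated on the raw data
    if_true: Union["Node", Leaf]
    if_false: Union["Node", Leaf]


def select(tree, data):
    """Walk one decision tree until a sentence element (Leaf) is reached."""
    while isinstance(tree, Node):
        tree = tree.if_true if tree.test(data) else tree.if_false
    return tree


# Hypothetical text model: an ordered list of decision trees ...
MODEL = [
    Node(lambda d: d["market_change"] >= 0,
         Leaf("the broader market is higher", "up"),
         Leaf("the broader market is lower", "down")),
    Node(lambda d: d["stock_change"] >= 0,
         Leaf("{symbol} is gaining ground", "up"),
         Leaf("{symbol} is losing ground", "down")),
]

# ... plus connector rules keyed by the kinds of two adjacent elements.
CONNECTORS = {
    ("up", "up"): ", and ", ("down", "down"): ", and ",
    ("up", "down"): ", but ", ("down", "up"): ", but ",
}


def build_text(model, connectors, data):
    """Select one element per tree and join them using the connector rules."""
    leaves = [select(tree, data) for tree in model]
    parts = [leaves[0].element.format(**data)]
    for prev, cur in zip(leaves, leaves[1:]):
        parts.append(connectors[(prev.kind, cur.kind)])
        parts.append(cur.element.format(**data))
    text = "".join(parts)
    return text[0].upper() + text[1:] + "."


raw = {"symbol": "XXX", "market_change": 0.8, "stock_change": -2.1}
print(build_text(MODEL, CONNECTORS, raw))
```

In this sketch the stock symbol stands in for the user input, and the same model yields a differently constructed sentence (joined with "and" rather than "but") when the raw data changes.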
2. Relevant Background
Today, many people look to the Internet and web-based information and news services to obtain all or a large percentage of their daily news and to perform research on numerous topics. Most of these users demand that the information provided by the web-based services be fresh, with updates provided on a very frequent basis. For example, in the financial industry, users want to know the current status of the stock and commodities markets and of a particular stock or commodity, not what occurred yesterday or even a few hours ago. Likewise, sports enthusiasts want to follow the action of games and tournaments in real time as if they were watching or listening to the game live. Similarly, weather reports are expected to be tailored to a particular activity and geographic area and to provide more up-to-date information than is provided by newspapers. There are many other examples of users obtaining information online over the Internet, but a common theme is that users expect and demand that the information be current or at least fresher than information provided in printed sources.
An advantage that printed sources provide over many online sources is that a writer has taken the time to process a volume of information, such as stock exchange data, has condensed the data into a useful amount, and importantly, has written sentences or text that can easily be read and understood by a reader of the printed source. To provide a similar service, some online services have resorted to employing teams of writers whose task is to quickly digest incoming data and provide text, but this solution has proven to be unworkable in most cases and typically results in a significant delay. More and more, online services are attempting to provide information in real time by having a computer and associated software applications generate text and charts or other graphics based on collected and processed raw data. The computer-generated text needs to be updated on an ongoing or periodic basis because the raw data, such as weather information or the status of an athletic event, is rapidly changing. To date, a number of approaches to generating text with computers and software have been implemented, but none have fully addressed the demands of the online information industry for providing real-time text that is useful to readers or subscribers and that effectively simulates text written by a human writer in a time-sensitive manner.
A common text updating approach used by online information services is to provide fill-in-the-blanks text in which the blanks are updated based on processed current data. In these services, a form sentence or paragraph is provided in which one or more words are changed based on current information. However, the majority of the text is repeated regardless of the current information or the content of the user request, and the repeated text may be irrelevant and even misleading to the reader. For example, a user may request stock market or trading information by inputting a stock name or symbol. The online service uses a software application that retrieves at least some current market data and information on the stock and returns text to the user. The returned text typically includes one or two sentences that are provided for any input stock, such as “The market is up/down today,” which is generated by selecting the word “up” or the word “down” based on the retrieved data. The application may further provide another sentence that states “The XXX stock is at $20” with the “XXX” being provided from the user input and the “$20” being provided from processed current data. Further, a number of sentences or paragraphs may be included in the returned text that are generic to the market, i.e., not updated for the stock input, or that are specific to the stock but not currently updated, i.e., written by a human some time prior to the user request and input. For example, the returned text may state “Stock analysts rank this stock a buy,” with this repeated text being provided regardless of current conditions. In some cases, data is provided in raw form without descriptive text, such as financial ratios and values and graphs, which are difficult for average users to read or understand. The resulting text of these fill-in-the-blank systems is generally rather basic and typically is stilted or otherwise readily recognizable as being machine generated.
Often, such machine-generated text provides information that is not useful to the user because it is not specific to the user's input (such as a particular stock), is too simplistic (such as simply providing a stock price or whether the stock is up or down), and is locked to a single format or sentence construction with only small portions or single words being updated based on current information.
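The limitations of the fill-in-the-blanks approach can be seen in a short Python sketch. The function name `fill_in_the_blanks` and its parameters are hypothetical and chosen only for illustration; the sentences are the examples given above. Only a word or two changes with the current data, while the sentence construction and the final sentence are fixed for every request.

```python
def fill_in_the_blanks(symbol: str, market_change: float, price: float) -> str:
    """Return form text in which only a few words depend on current data."""
    # The single word "up" or "down" is chosen from the retrieved market data.
    direction = "up" if market_change >= 0 else "down"
    sentences = [
        f"The market is {direction} today.",
        # The symbol comes from the user input, the price from current data.
        f"The {symbol} stock is at ${price:.2f}.",
        # Static sentence repeated for every request, whatever the data says.
        "Stock analysts rank this stock a buy.",
    ]
    return " ".join(sentences)


print(fill_in_the_blanks("XXX", market_change=-1.4, price=20.0))
```

Whatever stock is requested and however the data moves, the output keeps the same three-sentence form, which is why such text is readily recognized as machine generated.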
A large number of computer-based text generators have been developed in other industries, but generally, these text generators are limited to processing a pre-existing source text to generate an output or target text and do not process current data to create a new, up-to-date text. For example, many translation systems that are computer-based have been developed, such as software applications that utilize word mapping or natural language processing techniques. These systems have been developed for translating text prepared in one language, i.e., the source document or text, into text produced in a second language, i.e., the output or target document or text. These computer-based systems generally are dictionary-based and attempt to comply with the many syntax and grammar rules present in the source and target languages to produce a target text that properly conveys the meaning of the source text. The use of computers for translation of written texts has proven difficult to implement because the rules of sentence construction, varying grammar rules, and even varying geographic lexicons of the source language are quite numerous and complex and directly affect the quality of the finished product or text. Generally, these translator tools attempt to map each word in a source document to a word in the output document while verifying that syntax and other rules of text construction have been satisfied. While providing a useful tool for generating text in a particular language, these translator and summarizing tools are not useful for creating up-to-date text for online information services from raw data or without a source document.
Hence, there remains a need for methods and systems for generating human-readable text in a timely manner from collected raw data, such as financial, weather, sports, or other content-specific information. Preferably, such methods and systems would create the text based on user input so as to create input-specific text from the collected and processed raw data. The created text also preferably would be less stilted than existing fill-in-the-blank products to provide more readable text that is much closer to human editorial text, which reads fluidly. Further, such methods and systems preferably would provide multiple models for constructing the text to limit the amount of superfluous or irrelevant information that is included in the text, and in some cases, the user may be able to select a particular model. The methods and systems preferably would act automatically to generate text without operator intervention and would be relatively easy to implement, update, and maintain.