1. Field
This disclosure relates generally to data processing, and, more particularly, to automated data processing of textual data.
2. Background
Natural language generation (NLG) of text has been a high-growth area in the recent past. Companies such as Narrative Science, Automated Insights, ARRIA, and Linguastat have created various systems that can generate text “stories” from rich, structured, well-understood data sets, for example, reported company financials, sales analytics, sports box scores and statistics, sensor readings, weather information, and product SKUs. Other companies, such as Yseop, have developed systems that are more interactive, using structured questions to elicit responses that can then be used to provide desired or relevant information.
Due to their structural nature, however, such systems generate from the input some form of highly structured output story that is hard to differentiate from every other story generated by that same system. Nevertheless, since the intent of such “stories” is largely to convey the data in narrative form, the narrative itself is of relatively minor importance.
To the extent the narrative is of increased importance, or of primary importance, the above-referenced systems are largely incapable of varying the content resulting from the exact same data, i.e., without other action, multiple runs using identical data will generate the same “story” each time.
In an attempt to overcome that issue, one company, Linguastat, takes its NLG result handling a step further. The Linguastat product, called Marquee, generates titles and descriptions for a retailer's products using NLG techniques and then uses performance metrics to rewrite those titles and descriptions to try to home in on the title and description that generates the most interest or sales for each media channel, e.g., web, mobile, etc. One major drawback of such a system is that it relies upon “after the fact” information, e.g., clicks or sales data, to make the changes. As a result, such reactive adaptivity can result in inconsistency of presentation and confusion. For example, a user may check the description of a product several times over time before buying, but may find that the description is somewhat different on each visit to the same product.
Moreover, even introducing human intervention into the process to review or edit the resulting NLG output cannot address the foregoing issues for at least two reasons. First, the very purpose of NLG is to minimize the human aspect of the process of text generation. Second, human intervention would merely add a subjective component to the process, yielding results that would differ based upon the individual reviewing and might even be inconsistent for the same editor/reviewer depending upon factors unique to that editor/reviewer, e.g., time of day, mood, time lag between reviews, intervening reviews of other NLG output, etc.
Thus, there is an ongoing technical problem present in the art in that current systems and methods are incapable of generating multiple versions of stories from identical data or, to the extent they can do so by modifying a generated story, they lack the ability to automatically, prospectively, consistently, and repeatably evaluate a particular output of an NLG system relative to any other different output it might generate.