Data and information pertaining to events, circumstances, and entities in various types of domains, such as sports, business and finance, crime, education, real estate, etc., is readily available. The subject invention and previous inventions described above function to use such available data and information to automatically create narrative stories that describes domain event(s), circumstance(s) and/or entity(ies) in a comprehensible and compelling, e.g., audience customized, manner.
The process comprises the following main steps:
1. Inputting relevant raw data
2. Computing derived features
3. Determining applicable angles
4. Ranking, selecting, and ordering angles
5. Generating the narrative in natural language
The output of the first four steps is a language-independent outline of the points, such as facts and interpretations, which constitute the story narrative.
The data and computational elements appropriate for use in each of these steps—raw data, derivations and derived features, angles and applicability conditions, ranking, selection, and ordering metrics and methods, and linguistic/lexical elements (words and parameterizable phrases)—are not universal. On the contrary, they depend critically upon the content vertical under consideration (e.g., sports, and in fact the particular sport, business, etc.), the genre or type of story being written (e.g., preview, recap, top 10 ranking, profile, etc.), the focus of the story (e.g., a company, a team, a person, etc.), its audience (e.g., experts, fans, children, etc.), and other factors.
For example: the vertical might be basketball; the type of story might be a “top 10 ranking” story (in which the top 10 competitors in some set are listed in order, with short descriptions of ranking changes and reasons for each entry); the foci might be players in a given league (e.g., the Big Ten); and the audience might be Big Ten fans. The data and computational elements necessary to appropriately carry out each of the five components of the process outlined above in order to generate such narratives—“top 10 rankings of Big Ten Basketball players”—is dependent on all these defining factors or characteristics of such narratives. To take an obvious example, because the focus is on ranking players rather than teams, the appropriate raw data, derived features, and angles revolve around individual player metrics—e.g., points scored, rebounds, etc.—as opposed to team records—e.g., games won or lost, point spreads, etc.
Because of their dependence on these defining characteristics or factors of the narratives to be generated, adapting a narrative generation system to a new content vertical, story type, focus, and/or audience, etc., requires specifying these computational and data components appropriately. In other words, in order to adapt the system to new story types, content verticals, etc., you must specify the appropriate performance and behavior required to carry out each component of the process outlined above as well as the appropriate data representations required for each step. In particular, you must specify, among other things:    1. The raw features that must be taken as input, how they will be input, and the appropriate data models for structuring, organizing and representing these features in the system.    2. The derived features that must be computed, how they are to be computed, and the appropriate data models for structuring, organizing, and representing the results in the system.    3. The appropriate angles, their conditions of applicability, how those conditions are to be computed, and how the results are to be structured, organized, and represented.    4. Appropriate methods for ranking, selecting, and ordering these angles, and how the results of these processes are to be organized and represented.    5. Appropriate data and methods for generating the resulting narrative in natural language, and appropriate data models for structuring, organizing and representing these data and methods.
To the extent that the necessary derivations, angles and their applicability conditions, etc., are specified as procedures directly expressed as code in some general purpose programming language, specification of the computational and data elements delineated above will require a significant programming effort. Adapting a system architected in this way to a new content vertical, story type, focus, and/or audience will therefore require skilled developers and significant time and effort to design and implement these procedures, in many cases from scratch. As a result, a system architected to carry out the processes described above, where the necessary components are specified directly in a general purpose programming language, is not horizontally scalable—it is relatively difficult and expensive to adapt such a system to a new content vertical, story type, etc.