Natural Language Generation (NLG) systems transform raw input data that is expressed in a non-linguistic format into a linguistically expressed format. In this manner, data presented in a readily machine-readable format (e.g., a spreadsheet, binary data file, or the like) may be expressed in terms that are more readily consumable by human beings (e.g., sentences, paragraphs, and the like). As an example, NLG systems may be used to monitor and/or analyze datastreams to detect noteworthy events indicated by the data from the datastream, and to output natural language notifications that inform users of the occurrence of each event in a manner that assists with interpretation of the data.
As a particular example, raw input data may take the form of a value of a stock market index over time and, as such, the raw input data may include data that is suggestive of a time, a duration, a value, and/or the like. An NLG system may be configured to receive the raw input data and output text that linguistically describes the value of the stock market index; for example, “securities markets rose steadily through most of the morning, before sliding downhill late in the day.”
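The stock index example can be sketched in a few lines of Python. This is only an illustrative sketch, not the method of any particular NLG system: the `Sample` type, the `describe_trend` and `describe_day` functions, the 14:00 split point, and the 0.1% flatness tolerance are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    hour: float   # hours since midnight, e.g. 9.5 for 09:30 (assumed encoding)
    value: float  # index level at that time


def describe_trend(samples, flat_tolerance=0.001):
    """Return a verb phrase for the relative change across a run of samples."""
    first, last = samples[0].value, samples[-1].value
    change = (last - first) / first
    if change > flat_tolerance:
        return "rose"
    if change < -flat_tolerance:
        return "fell"
    return "held steady"


def describe_day(samples, split_hour=14.0):
    """Describe the early and late sessions; assumes samples exist on both sides."""
    early = [s for s in samples if s.hour <= split_hour]
    late = [s for s in samples if s.hour >= split_hour]
    early_verb = describe_trend(early)
    late_verb = describe_trend(late)
    if early_verb == late_verb:
        return f"The index {early_verb} throughout the day."
    return (
        f"The index {early_verb} through the early session, "
        f"then {late_verb} late in the day."
    )


samples = [
    Sample(9.5, 100.0),
    Sample(11.0, 102.0),
    Sample(14.0, 105.0),
    Sample(15.0, 103.0),
    Sample(16.0, 101.0),
]
print(describe_day(samples))
```

Even this toy version shows the time, duration, and value aspects of the raw data being mapped onto verbs and temporal phrases.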
Data that is input into an NLG system may be provided in a variety of formal structures. An example recurrent formal structure may comprise a plurality of individual fields and defined relationships between the plurality of individual fields. For example, the input data may be contained in a spreadsheet or database, presented in a tabulated log message or other defined structure, encoded in a ‘knowledge representation’ such as the Resource Description Framework (RDF) triples that make up the Semantic Web, and/or the like. In some examples, the data may include numerical content, symbolic content, or the like. Symbolic content may include, but is not limited to, alphanumeric and other non-numeric character sequences in any character encoding, used to represent arbitrary elements of information.
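As a rough illustration of how the same observation might appear in two of these structures, the sketch below converts a tabular row into subject-predicate-object triples and separates numerical from symbolic content. The field names, the `obs1` identifier, and both helper functions are hypothetical; plain tuples stand in for RDF triples, and no RDF library or serialization is implied.

```python
def row_to_triples(subject_id, row):
    """Turn one tabular row (a dict of fields) into (subject, predicate, object) tuples."""
    return [(subject_id, field, value) for field, value in row.items()]


def split_content(row):
    """Separate numerical fields from symbolic (non-numeric) fields."""
    numeric, symbolic = {}, {}
    for field, value in row.items():
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        (numeric if is_number else symbolic)[field] = value
    return numeric, symbolic


# A hypothetical spreadsheet row describing one index observation.
row = {"index": "ExampleIndex", "time": "09:30", "value": 4012.5}

triples = row_to_triples("obs1", row)
numeric, symbolic = split_content(row)

for triple in triples:
    print(triple)
print("numeric:", numeric)
print("symbolic:", symbolic)
```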
The process of generating natural language may include multiple steps and processes. For example, input data may be analyzed to detect the occurrence of particular events, patterns, or inferences from the input data. These events, patterns, and inferences may be translated into a series of messages. Through a process known as document planning, particular messages may be selected and organized to create a document plan. The document plan may then be utilized to generate output natural language through processes known as microplanning and realizing.
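The stages described above can be sketched as a small pipeline. This is a deliberately simplified sketch under stated assumptions: the `Message` type, the stage functions (`analyze`, `plan_document`, `microplan`, `realize`), the importance measure, and the selection threshold are all invented for illustration and do not represent any particular system's implementation.

```python
from dataclasses import dataclass


@dataclass
class Message:
    kind: str          # "rise", "fall", or "flat"
    start: int         # index of the first sample in the run
    end: int           # index of the last sample in the run
    importance: float  # here: absolute change over the run (an assumed measure)


def direction(delta):
    if delta > 0:
        return "rise"
    if delta < 0:
        return "fall"
    return "flat"


def analyze(values):
    """Data analysis: emit one message per maximal run of same-direction moves."""
    messages = []
    start = 0
    for i in range(1, len(values)):
        at_end = i == len(values) - 1
        current = direction(values[i] - values[i - 1])
        following = None if at_end else direction(values[i + 1] - values[i])
        if at_end or following != current:
            change = abs(values[i] - values[start])
            messages.append(Message(current, start, i, change))
            start = i
    return messages


def plan_document(messages, min_importance=1.0):
    """Document planning: select important messages and order them by time."""
    selected = [m for m in messages if m.importance >= min_importance]
    return sorted(selected, key=lambda m: m.start)


VERBS = {"rise": "rose", "fall": "fell", "flat": "was unchanged"}


def microplan(plan):
    """Microplanning: choose words and connectives for each planned message."""
    specs = []
    for position, msg in enumerate(plan):
        connective = "" if position == 0 else "then "
        specs.append(f"{connective}{VERBS[msg.kind]} by {msg.importance:g} points")
    return specs


def realize(specs, subject="the index"):
    """Realization: assemble the phrase specifications into a sentence."""
    if not specs:
        return ""
    sentence = f"{subject} {', '.join(specs)}."
    return sentence[0].upper() + sentence[1:]


values = [100, 101, 103, 103, 99]
plan = plan_document(analyze(values))
print(realize(microplan(plan)))
```

In this sketch, the domain-specific knowledge is concentrated in `analyze` and `plan_document`: deciding what counts as an event and which messages are worth reporting.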
Different domains often require different methods of selecting messages for use in a natural language generation process. For example, the events, patterns, and inferences that are relevant to a series of messages derived from data provided by oil rig equipment require different analysis techniques than those that are relevant in a medical context, which in turn differ from the message selection and document planning techniques employed in a meteorological or financial domain. As such, generating document plans for different domains requires a significant investment of developer time and resources to generate code to select and organize messages into a document plan for each particular domain. Through applied effort, ingenuity, and innovation, Applicant has solved many of these identified problems by developing solutions that are embodied by the present invention, which is described in detail below.