1. Field of the Invention
The present invention generally relates to methods and systems for structuring and formalizing unstructured and imprecise information. In particular, it relates to methods and systems for taking unstructured information and making it both structured and formal, with support from a computer system that provides guidance. Additionally, the present invention relates to a system for taking an imprecise description of a procedure or task and making it precise and formal with the assistance of a computer.
2. Description of the Related Art
Many activities, particularly creative activities such as engineering, require a practitioner to produce structured diagrams, documents, or specifications of a solution to a customer's needs. For example, a civil engineer produces blueprints and structural analyses when designing a new bridge, an information technology (IT) architect produces requirements documents and design models when designing an IT system to meet a new business need, and a composer produces a score for a film. In each of these domains, the result of the creative work is a highly structured specification or design that is detailed enough for others to construct or act upon.
At the same time, there are systems dedicated to these domains that represent information in terms of the formal or structured concepts of the domain. For example, a typical IT architecture tool can represent IT constructs such as components and interfaces, while a typical film-scoring tool represents the parts of the score, the notes played by the instruments, and the temporal aspects of the film.
In contrast, at the outset of these activities, the available information is typically unstructured and varies in quality, quantity, and detail, reflecting its origins in a variety of sources. Much of the input comes from interviews and meetings, during which the practitioner may take notes. Other input is prepared by people who do not share the practitioner's training, or who for other reasons supply materials in an unstructured format rather than using a tool of the creative domain. For example, an IT customer may specify requirements or business goals in a text document.
Because of this mismatch between the content and format of the information available to the practitioner at the outset and those required by the tools and artifacts of the trade, systems dedicated to these domains often have limited appeal. In both the musical and IT architecture domains, for example, research shows that practitioners do not use domain-specific tools until after they have worked out many details of the solution. The result is often inefficiency and inaccuracy.
Moreover, although most of the world's data is unstructured and informal (including documents written in English and other natural languages), computers require information to be structured and formal before they can perform sophisticated processing on it, such as executing a sequence of steps that the information describes or detecting inconsistencies in it. Formal representations of information have a precise syntax and semantics that may be mathematically defined, and this precision is what allows a machine to process them. For example, formal representations with precise semantics are necessary for information processing activities such as querying databases, creating formal models of systems, and reasoning over collections of data.
Unfortunately, creating formal representations can be a bottleneck in applying these and related processing activities, because the experts who possess detailed knowledge of an application's domain are often unskilled at producing formal representations.
Two conventional approaches attempt to address this bottleneck, and both are severely limited. A person unskilled in formal notations who wishes to create a structured, formal representation of information can either (1) find an expert in formal notations to help them, or (2) try to learn the notation themselves. The first approach is limited because experts in formal notations who are also familiar with the domain are rare, and the second is limited by the difficulty of learning the complexities of any particular formal notation.
Existing techniques that address the processing and management of unstructured information are known as Unstructured Information Management (UIM) approaches. However, these approaches are typically concerned with tasks such as document categorization, clustering, and retrieval, rather than with reasoning over the information or executing it by machine.
There is therefore a need for a method and system that can automatically create elements of a formal representation from informal information, so that the information becomes suitable for machine processing.