The field of the disclosure relates generally to processing natural language for machine learning and, more specifically, to methods and systems for generating, from surveillance observations input as natural language, a mathematical function representative of membership of the observations, and for applying the function to discover new concepts and anomalous patterns of behavior.
Analysis of surveillance data is a major bottleneck in situational awareness for security applications in public spaces and war theatres. For example, there may be significantly more hours of video data available for a given security application than man-hours to review it. One task of an intelligence analyst in analyzing the surveillance data is estimating what an observable agent (e.g., a person or, by extension, a vehicle) intends to do based upon its previously observed behavior recorded on video. Recognizing the intent of such agents from their observed behaviors is a fundamental computational capability with numerous applications: in intelligence and surveillance (e.g., monitoring vehicle movements on a large scale from overhead assets), cyber-security (e.g., estimating the continuation of a cyber-attack sequence), and health care (e.g., assistive technologies recognizing the intended goal of an elderly or disabled person).
Computer technologies, such as machine learning systems, are one technological route to artificial systems for understanding and tracking the behavior of others. Typically, the more “prior knowledge” that can be made available to the machine-learning system, the better the results that can be obtained. However, additional data typically requires additional analysis time, and the prior knowledge data must be coded using complex computer languages in order to be used by the machine-learning system. In typical systems, prior knowledge is resident with “domain experts” (i.e., people with experience performing the same task). Often, the domain experts (also called subject matter experts, or SMEs) are not mathematically sophisticated, do not have computer programming experience, and do not have adequate time to program a machine-learning system with their prior-knowledge data in the midst of a mission-critical real-time task. Thus, it is desirable for the user to provide this domain-specific information in natural language.
When background knowledge expressed as natural language text cannot be processed as given, feedback is generated to indicate how the text or sentences need to be simplified for machine understanding. Typically, the changes required are shortening of sentences, simplification of syntax, and reduction in the number of clauses and prepositional phrases to be handled. Such simplified versions of natural language, similar to what a new speaker of a foreign language might comfortably handle, are called controlled natural languages.
Computer processing of natural language is very difficult compared to processing of traditional computer languages. As mentioned, controlled natural language is a subset of natural language that maps to formal representations. Generally, controlled natural languages do not have provisions for processing time and space values other than distinguishing between times and locations in terms of an answer to a simple when or where question. At least one known controlled natural language relies on situation calculus for its formal semantics, and is implemented in a frame-like knowledge representation language with a context mechanism, where each context is called a situation. Another known controlled natural language uses event calculus and relies on first-order logic theorem provers to implement its inference. In such controlled natural languages, the focus is on representing possible worlds that result from actions, and on making inferences in those worlds that are definitely true or definitely false. Consequently, existing controlled natural languages do not address whether events are close to places or times of interest, or the interrelationship of regions in time and space that may overlap to varying degrees. Rather, known controlled natural languages address temporal reasoning oriented towards planning and hypothetical reasoning. Spatial reasoning is not addressed, as there is no translation from spatial concepts to a mathematical representation. Temporal reasoning is addressed only by providing hard logical constraints, either true or false, which are incapable of mathematically formalizing vaguely defined concepts such as “near”, “close to”, “around”, or “at” versus “in”.
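The distinction drawn above between a hard logical constraint and a graded notion such as “near” can be illustrated with a minimal sketch. The function names, the 100-meter scale, and the Gaussian-shaped falloff below are hypothetical choices made only for illustration; the disclosure does not specify them, and other membership shapes would serve equally well.

```python
import math


def near_hard(distance_m: float, threshold_m: float = 100.0) -> bool:
    """Hard constraint: an event is either near (True) or not near (False).

    The 100 m threshold is an assumed value for illustration.
    """
    return distance_m <= threshold_m


def near_graded(distance_m: float, scale_m: float = 100.0) -> float:
    """Graded membership in [0, 1] for the vague concept "near".

    Returns 1.0 at the place of interest and decays smoothly with distance.
    The Gaussian-shaped falloff and the scale are assumptions, not taken
    from the source text.
    """
    return math.exp(-((distance_m / scale_m) ** 2))


if __name__ == "__main__":
    # Compare both formulations just below and just above the threshold.
    for d in (0.0, 99.0, 101.0, 300.0):
        print(f"{d:6.1f} m  hard={near_hard(d)!s:5}  graded={near_graded(d):.3f}")
```

Under the hard constraint, observations at 99 m and 101 m receive opposite truth values even though they are practically indistinguishable, whereas the graded function assigns them nearly equal degrees of membership. This is the kind of gap that a purely true-or-false formalism leaves unaddressed.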