By way of general background, in computer science, an abstract syntax tree (AST) is a tree representation of the syntax of source code written in a programming language, or some other functionally equivalent representation of that source code. Each node of the tree denotes a construct occurring in the source code. The tree is abstract in that it may not represent every construct that appears in the original source. An example of such an omission is grouping parentheses, since, in an AST, the grouping of operands is implicit in the tree structure.
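By way of illustration only, the omission of grouping parentheses can be observed with the `ast` module of the Python standard library. The following is a minimal sketch, not part of any described embodiment; the helper name `shape` and the sample expressions are assumptions introduced for this example.

```python
import ast

def shape(node):
    """Summarize an arithmetic expression's AST as nested tuples."""
    if isinstance(node, ast.Expression):
        return shape(node.body)
    if isinstance(node, ast.BinOp):
        # Operator name first, then the left and right operand subtrees.
        return (type(node.op).__name__, shape(node.left), shape(node.right))
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return node.value
    raise TypeError(f"unsupported node: {type(node).__name__}")

# The parentheses never appear as nodes; grouping is carried by nesting alone.
grouped = shape(ast.parse("(a + b) * c", mode="eval"))
redundant = shape(ast.parse("((a + b)) * (c)", mode="eval"))
ungrouped = shape(ast.parse("a + b * c", mode="eval"))

print(grouped)
print(ungrouped)
```

In this sketch, `grouped` and `redundant` yield the same structure even though the source text differs, while `ungrouped` yields a different nesting, reflecting that the tree shape alone encodes the grouping of operands.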
An AST is often built by a parser as part of the process of compiling source code. Once built, additional information is added to the AST by subsequent processing, e.g., semantic analysis, which can result in the production of an abstract semantic graph (ASG) based on the AST. Whereas an AST is used to express the syntactic structure of an expression or program, an ASG is a higher level abstraction. In computer science, an ASG is a data structure used in representing or deriving the semantics of an expression in a formal language, for example, a programming language.
An ASG is typically constructed from an abstract syntax tree by a process of enrichment and abstraction. For example, enrichment can be the addition of back-pointers or edges from an identifier node where a variable is being used to a node representing the declaration of that variable. Abstraction, for example, can entail the removal of details that are relevant only to parsing and not to semantics.
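By way of illustration only, enrichment and abstraction as described above can be sketched as two passes over a small tree. This is a simplified sketch under stated assumptions: the `Node` class, the node kinds (`"block"`, `"decl"`, `"use"`, `"paren"`), the single flat scope, and the function names `abstract` and `enrich` are all hypothetical and introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(eq=False)  # compare nodes by identity, as graph vertices
class Node:
    kind: str                                # "block", "decl", "use", or "paren"
    name: str = ""
    children: list = field(default_factory=list)
    declared_by: Optional["Node"] = None     # edge added by enrichment

def abstract(node):
    """Abstraction: drop parser-only 'paren' wrappers, keeping their children."""
    kept = []
    for child in node.children:
        abstract(child)
        kept.extend(child.children if child.kind == "paren" else [child])
    node.children = kept

def enrich(node, scope=None):
    """Enrichment: add an edge from each use to the node declaring its name."""
    scope = {} if scope is None else scope   # one flat scope, for simplicity
    if node.kind == "decl":
        scope[node.name] = node
    elif node.kind == "use":
        node.declared_by = scope.get(node.name)
    for child in node.children:
        enrich(child, scope)

decl_x = Node("decl", "x")
use_x = Node("use", "x")
tree = Node("block", children=[decl_x, Node("paren", children=[use_x])])

abstract(tree)   # the 'paren' wrapper disappears
enrich(tree)     # use_x now points at decl_x, so the tree has become a graph
```

After both passes, the use node refers directly to the declaration node by object reference, so the structure is no longer a pure tree.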
In this regard, current representations for semistructured data such as XML are limited to representing tree structures. With XML, representing graph structures requires using explicit references, such as XML ID, which introduce complexity and lack flexibility with respect to representation and storage of the underlying graph structures. For instance, use of XML ID requires that the type system define what an identifier is and what a reference is, which means such definitions are external to the underlying graph structures, making them difficult to use.
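By way of illustration only, the following sketch shows the kind of explicit identifier bookkeeping that a tree-only format imposes when a node is shared. The element names (`graph`, `node`, `edge`) and attribute names (`id`, `ref`) are hypothetical conventions chosen for this example; they are not taken from any XML schema or from the embodiments described herein. Parsing uses `xml.etree.ElementTree` from the Python standard library.

```python
import xml.etree.ElementTree as ET

# Two "use" nodes refer to the same "decl" node. Because XML nests strictly
# as a tree, the sharing must be spelled out with identifiers and references.
DOC = """
<graph>
  <node id="n1" label="decl x"/>
  <node id="n2" label="use x"><edge ref="n1"/></node>
  <node id="n3" label="use x"><edge ref="n1"/></node>
</graph>
"""

root = ET.fromstring(DOC)

# The parser yields only a tree; a separate pass must index identifiers...
by_id = {n.get("id"): n for n in root.iter("node")}

# ...and another must resolve each reference to recover the graph edges.
edges = [(n.get("id"), e.get("ref"))
         for n in root.iter("node")
         for e in n.findall("edge")]
targets = [by_id[ref] for _, ref in edges]

print(edges)
```

Note that nothing in the markup itself says that `id` names a node or that `ref` points to one; that meaning resides in the consuming code, or in a schema, outside the data.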
Accordingly, there is an outstanding need for the ability to author complex graph structured data using a compact, human-friendly syntax without the use of explicit identifiers. The above-described deficiencies of current representations of semistructured program data as graphs are merely intended to provide an overview of some of the problems of conventional systems, and are not intended to be exhaustive. Other problems with conventional systems and corresponding benefits of the various non-limiting embodiments described herein may become further apparent upon review of the following description.