1. Technical Field
The present invention relates to the analysis of ad hoc configuration languages for validation of an Internet protocol (“IP”) network configuration.
2. Description of the Related Art
Each network component has associated with it a configuration file containing commands that define that component's configuration. Different vendors offer syntactically different configuration languages. However, the semantic information stored in these files is the same. It describes the logical relationships and structures associated with standardized protocols. This information needs to be extracted from files and stored in a vendor-neutral format. Then, algorithms for validating configurations (i.e., checking whether they are consistent with requirements) can be written just once against this format, instead of once for every vendor's configuration language. A common format is a database with a vendor-neutral schema. A schema defines all the tables in a database and the column names and types in each table. This database is called the “semantic database.”
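The following minimal sketch shows what a vendor-neutral schema amounts to in practice. It uses Python's standard sqlite3 module to create a semantic database. The table and column names are borrowed from the example ipAddress, ipsec, and acl tables later in this section. Typing every column as TEXT and the helper name create_semantic_db are assumptions made for illustration.

```python
import sqlite3

# Hypothetical vendor-neutral schema: table and column names follow the
# example ipAddress, ipsec and acl tables; every column is assumed to be TEXT.
SCHEMA = {
    "ipAddress": ["Host", "Interface", "Address", "Mask"],
    "ipsec": ["Host", "SrcAddr", "DstAddr", "EncryptAlg", "HashAlg", "Filter"],
    "acl": ["Host", "Filter", "Protocol", "SrcAddr", "DstAddr", "Perm"],
}


def create_semantic_db(conn: sqlite3.Connection) -> None:
    """Create one table per schema entry; nothing here depends on vendor syntax."""
    for table, columns in SCHEMA.items():
        # Identifiers are double-quoted so names such as Filter cannot clash
        # with SQL keywords.
        cols = ", ".join(f'"{c}" TEXT' for c in columns)
        conn.execute(f'CREATE TABLE "{table}" ({cols})')


conn = sqlite3.connect(":memory:")
create_semantic_db(conn)
for table in SCHEMA:
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) rows.
    names = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    print(table, names)
```

Nothing in these definitions mentions IOS or any other vendor syntax. Validation queries written against the tables therefore stay the same when support for a new configuration language is added.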
There are three basic challenges in the design of a configuration acquisition system. The first is the design of a vendor-neutral database schema for storing configuration information. The second is extracting information from configuration files without knowing the entire configuration language for a given vendor. The third is making the extraction algorithms robust to inevitable changes in the configuration language.
In such systems, the structure of the configuration file is first computed. Then, this structure is analyzed to compute or build the semantic database. As illustrated in FIG. 1, the file structure in these prior art systems is computed by writing a grammar that recognizes the configuration language and produces an abstract syntax tree. This, however, involves a complicated structural analysis algorithm. By definition, grammars recognize languages. Since the content of a configuration file cannot be known in advance, the grammar has to recognize the content of every possible configuration file, i.e., account for every possible protocol and its associated commands. Configuration languages are vast, yet only a subset of their commands needs to be analyzed for the schema at hand.
To avoid having to recognize a vendor's entire configuration language, previous systems incorporate a pre-processing phase that removes from a file the commands not needed for the intended schema. However, such removal logic is hard to design, for several reasons. First, it depends on the schema: only information that will definitely not be needed in any schema table can be deleted. Second, as the schema evolves to analyze new protocols, the removal logic has to be updated, because what was irrelevant before may now become relevant. Third, the removal logic is sometimes expressed in the grammar itself. This distorts the purpose of a grammar, because grammars are used to specify legal syntax, not illegal syntax. In the absence of a clear guiding principle for writing such grammar rules, these rules become prone to error.
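A small Python sketch can illustrate the schema dependence of removal logic. The function remove_irrelevant and its keep-set convention are hypothetical. The sketch retains a command block only if the block's leading keyword appears in a set chosen for the current schema. It operates on the IOS example configuration given later in this section.

```python
CONFIG = """\
hostname router1
!
interface Ethernet0
 ip address 1.1.1.1 255.255.255.0
 crypto map mapx
!
crypto map mapx 6 ipsec-isakmp
 set peer 3.3.3.3
 set transform-set transx
 match address aclx
!
crypto ipsec transform-set transx esp-3des hmac
!
ip access-list extended aclx
 permit gre host 3.3.3.3 host 4.4.4.4
"""


def remove_irrelevant(config: str, keep: set[str]) -> str:
    """Hypothetical pre-processing pass: drop every command block whose
    leading keyword is not in `keep`, the set the current schema needs."""
    kept: list[str] = []
    keeping = False
    for line in config.splitlines():
        if not line.strip() or line.lstrip().startswith("!"):
            continue  # blank and comment lines carry no configuration
        if not line[0].isspace():
            # An unindented main command opens a new block; decide by keyword.
            keeping = line.split()[0] in keep
        if keeping:
            kept.append(line)  # subcommands inherit their block's decision
    return "\n".join(kept)


# A schema with only ipAddress and acl tables needs hostname, interface and ip.
print(remove_irrelevant(CONFIG, {"hostname", "interface", "ip"}))
```

If the schema later gains an ipsec table, crypto must be added to the keep set. Otherwise the crypto map and transform-set blocks the new table depends on are silently discarded. This is the second difficulty described above.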
A second drawback of previous approaches is the use of algorithmic methods for analyzing the abstract syntax tree to compute the semantic database. Because of the flexibility of ad hoc languages, the different pieces of information to synthesize semantic database tables can be located anywhere in the configuration file. This information has to be searched for based on definite criteria. Such a search is best implemented with a database engine with the criteria specified in a logical language such as SQL or Prolog. Algorithmic methods end up re-implementing the search features of databases, and therefore their complexity increases.
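The sketch below illustrates searching with a database engine. It loads facts, extracted by hand from the IOS example later in this section, into hypothetical SQLite tables (iface_addr, iface_cmap, cmap_peer, cmap_tset, cmap_acl, tset_algs). A single SQL join then follows the name references between command blocks. The table layout is an assumption, and the extraction step that would fill these tables is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact tables, one per kind of command; rows entered by hand here.
conn.executescript("""
CREATE TABLE iface_addr(host TEXT, iface TEXT, addr TEXT, mask TEXT);
CREATE TABLE iface_cmap(host TEXT, iface TEXT, cmap TEXT);
CREATE TABLE cmap_peer(host TEXT, cmap TEXT, peer TEXT);
CREATE TABLE cmap_tset(host TEXT, cmap TEXT, tset TEXT);
CREATE TABLE cmap_acl(host TEXT, cmap TEXT, acl TEXT);
CREATE TABLE tset_algs(host TEXT, tset TEXT, enc TEXT, hash TEXT);
INSERT INTO iface_addr VALUES ('router1', 'Ethernet0', '1.1.1.1', '255.255.255.0');
INSERT INTO iface_cmap VALUES ('router1', 'Ethernet0', 'mapx');
INSERT INTO cmap_peer VALUES ('router1', 'mapx', '3.3.3.3');
INSERT INTO cmap_tset VALUES ('router1', 'mapx', 'transx');
INSERT INTO cmap_acl VALUES ('router1', 'mapx', 'aclx');
INSERT INTO tset_algs VALUES ('router1', 'transx', 'esp-3des', 'hmac');
""")

# The search criteria: an interface's crypto map name links it to the peer,
# transform set and ACL; the transform set name links to the algorithms.
IPSEC_QUERY = """
SELECT a.host, a.addr, p.peer, t.enc, t.hash, l.acl
FROM iface_addr a
JOIN iface_cmap c ON c.host = a.host AND c.iface = a.iface
JOIN cmap_peer p ON p.host = c.host AND p.cmap = c.cmap
JOIN cmap_tset s ON s.host = c.host AND s.cmap = c.cmap
JOIN tset_algs t ON t.host = s.host AND t.tset = s.tset
JOIN cmap_acl l ON l.host = c.host AND l.cmap = c.cmap
"""
rows = conn.execute(IPSEC_QUERY).fetchall()
print(rows)
```

The query states only what must match. The database engine decides how to search, so none of the lookup logic has to be re-implemented by hand.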
Telcordia IP Assure and PADS/ML systems compute file structure by writing a grammar that recognizes the vendor's configuration language and using a parser, generated from this grammar, to construct an abstract syntax tree representing the file's configuration commands. See Y. Mandelbaum, K. Fisher, D. Walker, M. Fernandez, and A. Gleyzer, “PADS/ML: A functional data description language,” ACM Symposium on Principles of Programming Languages; IP Assure, Telcordia Technologies, Inc. (2007).
IP Assure employs a schema loosely modeled after the Distributed Management Task Force (“DMTF”) schemas. It uses the ANother Tool for Language Recognition (“ANTLR”) system to define a grammar for configuration files. The parser generated by ANTLR reads the configuration file and, if successful, returns an abstract syntax tree exposing the structure of the file. This tree is then analyzed by algorithms implemented in Java to create and populate tables in its schema. Often, information in a table is assembled from information scattered in different parts of the file.
The IP Assure system can be illustrated in the context of a configuration file containing the following commands in Cisco's IOS configuration language:
hostname router1
!
interface Ethernet0
 ip address 1.1.1.1 255.255.255.0
 crypto map mapx
!
crypto map mapx 6 ipsec-isakmp
 set peer 3.3.3.3
 set transform-set transx
 match address aclx
!
crypto ipsec transform-set transx esp-3des hmac
!
ip access-list extended aclx
 permit gre host 3.3.3.3 host 4.4.4.4
A configuration file is a sequence of command blocks, each consisting of a main command followed by zero or more indented subcommands. The first command specifies the name router1 of the router. It has no subcommands. Any line beginning with ! is a comment line. The second command specifies an interface Ethernet0. It has two subcommands. The first specifies the IP address and mask of this interface. The second specifies the name mapx of an IPSec tunnel originating from this interface. The parameters of the IPSec tunnel are specified in the next command block, whose main command specifies the name of the tunnel, mapx. Its subcommands specify the address 3.3.3.3 of the remote endpoint of the IPSec tunnel, the set transx of cryptographic algorithms to be used, and the profile aclx of the traffic that will be secured by this tunnel. The next command block defines the set transx as consisting of the encryption algorithm esp-3des and the hash algorithm hmac. The last command block defines the traffic profile aclx as any packet with protocol, source address, and destination address equal to gre, 3.3.3.3, and 4.4.4.4, respectively.
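The block structure just described can be recovered without knowing the meaning of any command. The following Python sketch uses a hypothetical function name, command_blocks, and assumes the IOS-style conventions above. Unindented lines start a command block, indented lines are its subcommands, and lines beginning with ! are comments.

```python
CONFIG = """\
hostname router1
!
interface Ethernet0
 ip address 1.1.1.1 255.255.255.0
 crypto map mapx
!
crypto map mapx 6 ipsec-isakmp
 set peer 3.3.3.3
 set transform-set transx
 match address aclx
!
crypto ipsec transform-set transx esp-3des hmac
!
ip access-list extended aclx
 permit gre host 3.3.3.3 host 4.4.4.4
"""


def command_blocks(config: str) -> list[tuple[str, list[str]]]:
    """Split an IOS-style configuration into (main command, subcommands) pairs."""
    blocks: list[tuple[str, list[str]]] = []
    for raw in config.splitlines():
        line = raw.rstrip()
        if not line or line.lstrip().startswith("!"):
            continue  # skip blank lines and '!' comment lines
        if line[0].isspace():
            if not blocks:
                raise ValueError(f"subcommand before any main command: {line!r}")
            blocks[-1][1].append(line.strip())  # attach to the current block
        else:
            blocks.append((line, []))  # an unindented line opens a new block
    return blocks


for main, subs in command_blocks(CONFIG):
    print(main, subs)
```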
Part of an ANTLR grammar for recognizing the above file is:
commands: command NL (rest=commands|EOF)
    -> ^(COMMAND command $rest?);

command: ('interface') => interface_cmd
       | ('crypto') => crypto_cmd
       | ('ip') => ip_cmd
       | unparsed_cmd;

interface_cmd: 'interface' ID (LEADINGWS interface_subcmd)*
    -> ^('interface' ID interface_subcmd*);

interface_subcmd: 'ip' 'address' a1=ADDR a2=ADDR -> ^('address' $a1 $a2)
                | 'crypto' 'map' ID -> ^(CRYPTO_MAP ID)
                | unparsed_subcmd;
The first grammar rule states that a configuration file (commands) is a sequence of one or more command blocks, each followed by a newline. The ^ symbol is a directive to construct the abstract syntax tree, whose root is the symbol COMMAND, whose first child is the command block just read, and whose second child is the tree representing the sequence of subsequent command blocks. The next rule dispatches on the first keyword of a command block: a block beginning with interface, crypto, or ip is handled by the corresponding rule. The symbol => means no backtracking: once the leading keyword matches, the parser commits to that alternative. The last alternative in this rule states that if a command block does not begin with any of these keywords, it is skipped. Skipping is done via the unparsed_cmd symbol. Grammar rules defining it skip all tokens until the beginning of the next command block. The last two rules define the structure of an interface command block. ANTLR produces a parser that processes the above file and outputs an abstract syntax tree. This tree is then analyzed to create the tables below. Note that the ipsec table assembles information from the interface, crypto map, crypto ipsec, and ip access-list command blocks.
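The following Python sketch mimics the grammar above by hand rather than through an ANTLR-generated parser. Blocks are dispatched on their first keyword, and recognized interface subcommands become small tuples standing in for subtrees. Everything else is skipped, playing the role of unparsed_cmd and unparsed_subcmd. Only the interface rule is implemented, because the crypto_cmd and ip_cmd rules are not shown. The function names and tuple layout are assumptions, and the result is a flat list rather than the nested COMMAND tree the grammar builds.

```python
CONFIG = """\
hostname router1
!
interface Ethernet0
 ip address 1.1.1.1 255.255.255.0
 crypto map mapx
!
crypto map mapx 6 ipsec-isakmp
 set peer 3.3.3.3
 set transform-set transx
 match address aclx
!
crypto ipsec transform-set transx esp-3des hmac
!
ip access-list extended aclx
 permit gre host 3.3.3.3 host 4.4.4.4
"""


def interface_subcmd(words: list[str]):
    """Mirror of interface_subcmd: recognized subcommands become subtrees."""
    if words[:2] == ["ip", "address"] and len(words) == 4:
        return ("address", words[2], words[3])
    if words[:2] == ["crypto", "map"] and len(words) == 3:
        return ("CRYPTO_MAP", words[2])
    return None  # unparsed_subcmd: skipped


def interface_cmd(words: list[str], subcommands: list[list[str]]):
    """Mirror of interface_cmd: ('interface', ID, [subtrees])."""
    subtrees = [t for t in map(interface_subcmd, subcommands) if t is not None]
    return ("interface", words[1], subtrees)


# Dispatch on the leading keyword. crypto_cmd and ip_cmd are not in the grammar
# excerpt, so in this sketch blocks starting with them are skipped as well.
HANDLERS = {"interface": interface_cmd}


def parse(config: str) -> list:
    lines = [ln.rstrip() for ln in config.splitlines()
             if ln.strip() and not ln.lstrip().startswith("!")]
    tree, i = [], 0
    while i < len(lines):
        main = lines[i].split()
        i += 1
        subcommands = []
        while i < len(lines) and lines[i][0].isspace():
            subcommands.append(lines[i].split())
            i += 1
        handler = HANDLERS.get(main[0])
        if handler is not None:
            tree.append(handler(main, subcommands))
        # Otherwise this is unparsed_cmd: the whole block is skipped.
    return tree


print(parse(CONFIG))
```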
ipAddress Table
Host     Interface  Address  Mask
router1  Ethernet0  1.1.1.1  255.255.255.0

ipsec Table
Host     SrcAddr  DstAddr  EncryptAlg  HashAlg  Filter
router1  1.1.1.1  3.3.3.3  esp-3des    hmac     aclx

acl Table
Host     Filter  Protocol  SrcAddr  DstAddr  Perm
router1  aclx    gre       3.3.3.3  4.4.4.4  permit
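The ipsec row above draws on several command blocks that are linked only by names: the interface refers to mapx, and mapx refers to transx and aclx. The sketch below shows what an algorithmic traversal assembling this row might look like, written in Python rather than the Java used by IP Assure. Its input is the example file already split into (main command, subcommands) pairs. The function name ipsec_rows and the intermediate dictionaries are hypothetical.

```python
BLOCKS = [
    ("hostname router1", []),
    ("interface Ethernet0", ["ip address 1.1.1.1 255.255.255.0", "crypto map mapx"]),
    ("crypto map mapx 6 ipsec-isakmp",
     ["set peer 3.3.3.3", "set transform-set transx", "match address aclx"]),
    ("crypto ipsec transform-set transx esp-3des hmac", []),
    ("ip access-list extended aclx", ["permit gre host 3.3.3.3 host 4.4.4.4"]),
]


def ipsec_rows(blocks: list[tuple[str, list[str]]]) -> list[tuple]:
    host = None
    iface_addr: dict[str, str] = {}   # interface -> IP address
    iface_map: dict[str, str] = {}    # interface -> crypto map name
    maps: dict[str, dict[str, str]] = {}  # crypto map -> {"set peer": ..., ...}
    tsets: dict[str, list[str]] = {}  # transform set -> [encryption, hash]
    for main, subs in blocks:
        w = main.split()
        if w[0] == "hostname":
            host = w[1]
        elif w[0] == "interface":
            for s in map(str.split, subs):
                if s[:2] == ["ip", "address"]:
                    iface_addr[w[1]] = s[2]
                elif s[:2] == ["crypto", "map"]:
                    iface_map[w[1]] = s[2]
        elif w[:2] == ["crypto", "map"]:
            settings = maps.setdefault(w[2], {})
            for s in map(str.split, subs):
                settings[" ".join(s[:-1])] = s[-1]  # "set peer" -> "3.3.3.3"
        elif w[:3] == ["crypto", "ipsec", "transform-set"]:
            tsets[w[3]] = w[4:6]
    # Second pass: follow the name references to assemble each row.
    rows = []
    for iface, cmap in iface_map.items():
        m = maps[cmap]
        enc, hsh = tsets[m["set transform-set"]]
        rows.append((host, iface_addr[iface], m["set peer"], enc, hsh, m["match address"]))
    return rows


print(ipsec_rows(BLOCKS))
```

Each additional table needs more hand-written cross-referencing of this kind. This is the growth in complexity that the related-art discussion attributes to algorithmic methods.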
IP Assure's vendor-neutral schema captures much of the configuration information for the protocols it covers. Its skipping idea allows a file to be parsed without recognizing the structure of every possible command and command block. However, the idea is quite hard to get right in the ANTLR framework. Although the aim is to avoid writing a grammar for the skipped part of the language, the only available method is to write grammar rules defining unparsed_cmd.