Some computer programs, such as search engines, must process extensive amounts of data. In the course of this processing, these programs often create huge files which tend to take the form of a flat-file database, e.g., a table of entries which are separated by delimiters but which lack the structural relationships one would find in a relational or object database. Colloquially, one might refer to a flat-file database as a spreadsheet or a log file.
In order to express a computation over large databases, programmers must either develop simple, non-parallel scripts which take a long time to run or complex, parallel scripts which take a long time to implement, debug, and maintain. Consequently, there has been considerable effort to develop systems that allow programmers to create computational expressions which are relatively simple and which are processed in parallel over a distributed system, possibly comprising a high-availability cluster of commodity servers.
In this regard, the Apache Software Foundation has developed a collection of programs called Hadoop (named after a toddler's stuffed elephant), which consists of: (a) a distributed file system (see U.S. Pat. No. 7,065,618, whose disclosure is incorporated herein by reference); and (b) an application programming interface (API) and corresponding implementation of the MapReduce functionality developed by Jeffrey Dean and Sanjay Ghemawat. As to the latter functionality, see “Scalable Computing with MapReduce” by Doug Cutting (Aug. 3, 2005; OSCON). While Hadoop is an improvement over what went before, programmers using it must still implement, debug, and maintain relatively complex computational expressions in the form of structured calls to the interfaces in the Hadoop API, and/or significantly extend Hadoop's implementation (and possibly also its API), in order to efficiently process large databases in parallel using a distributed system.
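By way of illustration only, the MapReduce pattern referred to above might be sketched in a single process as follows. This sketch does not use the Hadoop API; the function names (map_phase, shuffle, reduce_phase, run_job) and the word-count task are assumptions chosen for the example, and a real system would distribute the map and reduce calls across many machines.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for each word in one input record."""
    for word in record.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group intermediate values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce sequentially (hypothetical driver)."""
    intermediate = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = run_job(["the quick fox", "the lazy dog", "The end"])
```

Because each call to map_phase depends only on its own record, and each call to reduce_phase depends only on the values for its own key, both phases could in principle be run in parallel.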
Formal languages, and their corresponding translators, enable computational expression. A formal language might comprise a programming language or a scripting language. Some programming languages are procedural or imperative, such as C and Java. These languages typically require that the programmer specify an algorithm, in terms of instructions, to be executed or run by a computing platform.
Other programming languages are declarative and allow the programmer to specify the result to be achieved, leaving the implementation for achieving the result to other supporting software. An example of a declarative programming language is Structured Query Language (SQL), which is ordinarily used to process data in a relational database. A scripting language might be a general-purpose language, such as Perl, or a special-purpose or application-specific language, such as Game Maker Language. To specify a formal language, one might create a formal grammar for that language, such as a context-free grammar.
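The distinction between imperative and declarative expression might be illustrated by computing the same result both ways. The following is a minimal sketch, assuming a hypothetical table of names and ages and using the sqlite3 module from the Python standard library as the supporting software for the declarative version.

```python
import sqlite3

# Hypothetical sample data: (name, age) entries.
rows = [("alice", 30), ("bob", 45), ("carol", 52)]


def names_over_40_imperative(records):
    """Imperative style: spell out the algorithm step by step."""
    result = []
    for name, age in records:
        if age > 40:
            result.append(name)
    result.sort()
    return result


def names_over_40_declarative(records):
    """Declarative style: state the desired result; the SQL engine decides how."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
        conn.executemany("INSERT INTO people VALUES (?, ?)", records)
        cursor = conn.execute(
            "SELECT name FROM people WHERE age > 40 ORDER BY name"
        )
        return [name for (name,) in cursor]
    finally:
        conn.close()


imperative = names_over_40_imperative(rows)
declarative = names_over_40_declarative(rows)
```

Both functions return the same list; only the SELECT statement leaves the choice of iteration, filtering, and sorting strategy to the database engine.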
A translator is a program that takes another program as its input. A translator might be a preprocessor (or pre-compiler) such as a C preprocessor, a compiler such as a C++ compiler or a Java JIT (Just-In-Time) compiler, or an interpreter such as a Perl interpreter. Typically, a preprocessor runs before a compiler and performs textual substitution on source-code programs. In the case of embedded SQL, the SQL preprocessor or pre-compiler substitutes procedure calls to an API for declarative SQL statements embedded in a host source-code program written in a procedural language such as C or COBOL. In Java, embedded SQL often involves the use of an API called Java Database Connectivity (JDBC), which may in turn make use of an API called Open Database Connectivity (ODBC), e.g., through a JDBC-ODBC bridge driver.
The difference between a compiler and an interpreter is that a compiler is a pure translator that translates its input program into a program in another language, typically byte code or executable machine code, whereas an interpreter ordinarily executes its input program on the interpreter's computing platform.
One might think of a translator such as an interpreter as having a front-end parser and a back-end interpreter. Typically, the front-end parser will translate an input program into an intermediate representation, such as an abstract syntax tree, while detecting any lexical, syntactic, or semantic errors dictated by the language specification. Then the back-end interpreter will execute the intermediate representation, e.g., by walking the abstract syntax tree.
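The division between a front-end parser and a back-end interpreter might be sketched as follows for a toy language of integer arithmetic. The grammar, the tuple-based abstract syntax tree, and all names (tokenize, Parser, evaluate, interpret) are assumptions made for this example and are not drawn from any particular translator.

```python
import re


def tokenize(text):
    """Lexical analysis: split input into integers and one-character symbols."""
    tokens = re.findall(r"\d+|\S", text)
    for tok in tokens:
        if not tok.isdigit() and tok not in "+-*()":
            raise SyntaxError(f"unexpected character {tok!r}")
    return tokens


class Parser:
    """Front end: recursive-descent parser that builds an abstract syntax tree."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):  # expr := term (('+' | '-') term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):  # term := factor ('*' factor)*
        node = self.factor()
        while self.peek() == "*":
            self.advance()
            node = ("*", node, self.factor())
        return node

    def factor(self):  # factor := NUMBER | '(' expr ')'
        tok = self.advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok.isdigit():
            return ("num", int(tok))
        if tok == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(node):
    """Back end: execute the program by walking the abstract syntax tree."""
    if node[0] == "num":
        return node[1]
    op, left, right = node
    lhs, rhs = evaluate(left), evaluate(right)
    if op == "+":
        return lhs + rhs
    if op == "-":
        return lhs - rhs
    return lhs * rhs


def interpret(source):
    """Run the front end, then the back end, on one source string."""
    return evaluate(Parser(tokenize(source)).parse())
```

For instance, interpret("2 + 3 * (4 - 1)") first yields a tree in which the multiplication is nested beneath the addition, and then evaluates that tree; malformed input such as "2 +" is rejected by the front end with a SyntaxError before any execution takes place.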
Likewise, one might think of a translator such as a compiler as having a front-end parser and a back-end code generator. Typically, the front-end parser will translate a source-code program into an intermediate representation and the back-end code generator will generate optimized code, e.g., executable machine code, from the intermediate representation.
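A compiler's back-end code generator might be sketched in a similar way. In the following example, Python's standard ast module stands in for the front-end parser, and the target is a hypothetical stack machine whose instruction names (PUSH, ADD, SUB, MUL) are assumptions made for illustration; no optimization is attempted, and the small run function is included only to check the generated code.

```python
import ast

# Hypothetical mapping from syntax-tree operators to stack-machine opcodes.
OPCODES = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}


def generate(node):
    """Back end: emit stack-machine instructions for an expression tree."""
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return [("PUSH", node.value)]
    if isinstance(node, ast.BinOp) and type(node.op) in OPCODES:
        # Post-order walk: code for both operands first, then the operator.
        return (
            generate(node.left)
            + generate(node.right)
            + [(OPCODES[type(node.op)],)]
        )
    raise SyntaxError(f"unsupported construct: {type(node).__name__}")


def compile_expression(source):
    """Front end (Python's own parser) followed by the code generator."""
    tree = ast.parse(source, mode="eval")  # intermediate representation
    return generate(tree.body)


def run(program):
    """A tiny stack machine, used here only to check the generated code."""
    stack = []
    for instr in program:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            right, left = stack.pop(), stack.pop()
            results = {"ADD": left + right, "SUB": left - right, "MUL": left * right}
            stack.append(results[instr[0]])
    return stack.pop()


program = compile_expression("2 + 3 * 4")
```

Unlike an interpreter, compile_expression does not execute its input; it returns another program (here, a list of instructions) that a separate machine may execute later.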