In computer science, lexical analysis is the process of converting a sequence of characters (such as in a computer program or web page) into a sequence of tokens (strings with an assigned and thus identified meaning). A program that performs lexical analysis may be termed a lexer, tokenizer, or scanner, though scanner is also a term for the first stage of a lexer. A lexer is generally combined with a parser; together they analyze the syntax of programming languages, web pages, and so forth.
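
The conversion from characters to tokens can be illustrated with a minimal sketch. It assumes a tiny arithmetic language; the token categories (NUMBER, IDENT, OP) and the names Token, TOKEN_SPEC, and tokenize are hypothetical choices made for this illustration, not part of any particular lexer.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str   # the assigned meaning, e.g. "NUMBER"
    text: str   # the characters that make up the token


# Hypothetical token categories for a tiny arithmetic language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),      # whitespace separates tokens but is not one
    ("MISMATCH", r"."),    # any other character is an error
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(source: str) -> Iterator[Token]:
    """Yield the tokens found in source, left to right."""
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(
                f"unexpected character {match.group()!r} at offset {match.start()}"
            )
        yield Token(kind, match.group())


for token in tokenize("total = 3 + 42"):
    print(token.kind, token.text)
```

The lexer here only labels words; it does not check whether the sequence of tokens forms a valid statement, which would be the parser's job.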
More specifically, a lexer is a software program that separates a stream of characters into words, which in computer science are termed “tokens.” A parser goes one level further: it takes the tokens produced by the lexer and tries to determine whether they form proper sentences, in the computer science sense. Parsers work at the grammatical level; lexers work at the word level. When a document is “lexed,” it is broken into tokens. When a document is “parsed,” information is extracted from it. Generally, lexing precedes parsing, and most parsers include a lexer.
For example, a sentence lexer would take a block of text and break it into sentences by determining where the sentence breaks are. A sentence parser would take the output of the sentence lexer as its input and create a structure that describes how the concepts in each sentence relate (something akin to a sentence diagram).
As another example, many XML parsers parse an XML document into a DOM (a tree-like structure that contains all the information from the document in structured form). In the process, the input stream is lexed into tokens that the parser then uses to create the DOM. Another type of XML “parser” is the SAX parser. It may be viewed as a lexer that breaks the input stream into tokens and then relies on the calling implementation to create the meaningful output.
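
The difference between the two XML styles can be sketched with Python's standard xml.dom.minidom and xml.sax modules. The document string DOC and the handler class EventRecorder are hypothetical names chosen for this illustration; a real application would build whatever output it needs inside the handler.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document used only for this illustration.
DOC = "<note><to>Ana</to><body>Hi</body></note>"

# DOM style: the parser consumes the whole document and returns a tree
# that can be navigated afterwards.
dom = minidom.parseString(DOC)
to_text = dom.getElementsByTagName("to")[0].firstChild.data


# SAX style: the parser reports a stream of events (token-like pieces),
# and the caller's handler decides what meaningful output to build.
class EventRecorder(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        # Text may be delivered in more than one chunk; skip blank runs.
        if content.strip():
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


recorder = EventRecorder()
xml.sax.parseString(DOC.encode("utf-8"), recorder)

print(to_text)
for event in recorder.events:
    print(event)
```

In the DOM case the structure already exists when parsing returns, whereas in the SAX case the recorded events are all the parser provides and any structure must be assembled by the calling code.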