Reading text from photographs is a difficult computer vision problem that is important for a range of real-world applications. For instance, one application of interest is identifying building numbers posted on or adjacent to buildings. With this information, more accurate maps can be built and navigation services can be improved. Unfortunately, while highly restricted forms of character recognition are essentially solved problems (e.g., optical character recognition of printed documents, or recognition of hand-written digits), recognizing characters in natural scenes is more difficult: characters and digits in photographs are often corrupted by natural phenomena that are difficult to compensate for by hand, such as severe blur, distortion, and illumination effects, on top of wide style and font variations. As a result, systems based on hand-engineered representations perform far worse at reading text from photographs than a typical human.