In general, handwriting recognition is the ability of a computer to receive and interpret intelligible handwritten input from sources such as paper documents, photographs, touch-screens, and other devices. Beyond recognizing individual characters, a complete handwriting recognition system also handles formatting, segments the input correctly into characters, and finds the most plausible words. Similarly, optical character recognition (OCR) is the mechanical or electronic conversion of scanned images of typewritten or printed text into machine-encoded text. In both handwriting recognition and OCR, the task is to recognize a handwritten sample or a scanned document and to output a Unicode string matching the text. During a training phase of a handwriting recognition or OCR system, a received text input may be segmented into graphemes. These graphemes serve as recognition units: they may be used to train the system and to generate a trained text recognition model, which is then used during a recognition phase to recognize a received text input. In some cases, however, the graphemes can be long, which may present an obstacle during training and/or recognition and can adversely impact the recognition efficiency and computational cost of the text recognition system.
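
By way of illustration only, the segmentation of a Unicode text input into graphemes, and the way a single grapheme can span several code points, might be sketched as follows. This is a simplified approximation and not the segmentation used by any particular recognition system: it attaches combining marks to the preceding base character and does not implement the full Unicode grapheme cluster rules. The function names `split_graphemes` and `longest_grapheme_length` are hypothetical and chosen for this example.

```python
import unicodedata
from typing import List

# Unicode general categories for combining marks
# (nonspacing, spacing combining, enclosing).
MARK_CATEGORIES = ("Mn", "Mc", "Me")


def split_graphemes(text: str) -> List[str]:
    """Approximate grapheme segmentation (assumption: marks attach to the previous base).

    This is NOT a full implementation of the Unicode grapheme cluster rules;
    it only groups each base character with the combining marks that follow it.
    """
    clusters: List[str] = []
    for ch in text:
        is_mark = unicodedata.category(ch) in MARK_CATEGORIES
        if clusters and is_mark:
            # Extend the current grapheme with the combining mark.
            clusters[-1] += ch
        else:
            # Start a new grapheme at a base character (or a leading mark).
            clusters.append(ch)
    return clusters


def longest_grapheme_length(text: str) -> int:
    """Return the length, in code points, of the longest grapheme in text."""
    return max((len(g) for g in split_graphemes(text)), default=0)


if __name__ == "__main__":
    sample = "a\u0301\u0323b"  # 'a' + combining acute + combining dot below, then 'b'
    for grapheme in split_graphemes(sample):
        print([f"U+{ord(c):04X}" for c in grapheme])
    print(longest_grapheme_length(sample))
```

In this sketch the first grapheme of the sample consists of three code points while the second consists of one, which illustrates how the length of a recognition unit can vary widely within the same input.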