Segmentation in optical character recognition (OCR) typically involves extracting individual characters from an image comprising more than one character. Segmentation accuracy can affect the output accuracy of OCR systems.
Some conventional segmentation techniques involve determining a cross-correlation of an image with a kernel, in a manner similar to histogram-based methods. Such techniques can be ineffective, however, when an input image contains a significant amount of noise between characters. Other conventional techniques involve running a sliding window across the image and testing whether the portion of the image within each window represents a known target character or a non-character. These methods can be ineffective, however, when the font set is not known in advance.
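The histogram-based family of techniques mentioned above can be illustrated with a minimal sketch. The example below (function and variable names are illustrative, not from the source) segments a binary image by its vertical projection profile: columns with no ink are treated as gaps between characters. This sketch also shows the stated weakness, since any noise in the gap columns would merge adjacent segments.

```python
import numpy as np

def segment_by_projection(image, threshold=0):
    """Split a binary image into character segments using its
    vertical projection profile (column-wise ink counts).

    Columns whose ink count is <= threshold are treated as gaps
    between characters. Returns a list of (start, end) column
    ranges, end-exclusive.
    """
    profile = image.sum(axis=0)   # ink count per column
    ink = profile > threshold     # True where a character may be present
    segments = []
    start = None
    for col, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = col                    # segment begins
        elif not has_ink and start is not None:
            segments.append((start, col))  # segment ends at the gap
            start = None
    if start is not None:
        segments.append((start, len(ink)))  # segment runs to the edge
    return segments

# Synthetic example: two 3-column "characters" separated by blank columns.
img = np.zeros((5, 10), dtype=int)
img[:, 1:4] = 1
img[:, 6:9] = 1
print(segment_by_projection(img))  # [(1, 4), (6, 9)]
```

Raising `threshold` tolerates a little inter-character noise, but as the passage notes, no fixed threshold handles heavily noisy gaps reliably.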