The present application relates to image processing, and more particularly to generating original groundtruthed images at the pixel level.
The need for groundtruth is pervasive in document image analysis, both to train machine-learning-based algorithms and to evaluate algorithm performance. However, groundtruth data can be difficult to obtain. It is tedious and time-consuming to produce, and it relies on formats and standards that differ from research group to research group. The lack of abundant, high-quality groundtruth data is seen as an impediment to improvements in various areas of image processing. This state of affairs is evidenced by the fact that commercial OCR companies view their databases of groundtruthed documents as competitive assets.
Today, groundtruthing is performed primarily by two means. The dominant approach is to perform groundtruthing in terms of enclosures. Largely as a matter of expediency, most enclosure groundtruthing focuses on labeling rectangular regions, as it is relatively straightforward to devise a user interface for dragging rectangles that are color-coded according to label type. Another enclosure-type groundtruthing process employs polygonal regions as the enclosing mechanism.
A second groundtruthing approach is to use a standard image editing tool such as Microsoft Paint or Adobe Photoshop to label images on a pixel basis. For example, the brush action of these tools may be employed for the groundtruthing operation. However, this approach is tedious and imprecise, and it rapidly becomes very labor intensive.
Previous document image groundtruthing tools are discussed in L. C. Ha et al., “The Architecture Of TrueViz: A Groundtruth/Metadata Editing And Visualizing Toolkit”, Pattern Recognition, 36(3):811-825, 2003, which uses the layout and visual structure in document images. Another groundtruthing tool is described by Yang et al., “Semi-Automatic Groundtruth Generation For Chart Image Recognition”, DAS, pages 324-335, 2006.
A new tool and method that reduce the tedium and time required to obtain groundtruth data are considered beneficial.