To make data easier to understand, it is typically represented graphically in electronic as well as paper-based media. For example, charts are commonly used for presenting data on web pages and in online and print magazines. Charts depicted in print media can be digitized by capturing the chart with a camera or a scanning device, thereby providing a digital representation of the chart. A large variety of chart types exists, e.g. bar charts, pie charts, doughnut charts or the like. Charts of a particular type, e.g. bar charts, may vary greatly, e.g. with respect to colors or textures and to the size, type, position and orientation of bars and bar segments.
Where the data that is graphically represented in a digital bar chart image is to be used as a basis for further data processing, the complexity and diversity of digital bar charts have hitherto often resulted in poor data quality or the extraction of erroneous data, or have precluded an automated extraction of data completely. Thus, in many cases a user had to enter the data represented by a bar chart into a target application by hand.
Even where the data extraction from a digital bar chart image was performed automatically, the image analysis was computationally highly demanding. Accuracy and/or performance problems were observed in particular for chart segmentation tasks and for tasks related to the assignment of bars identified in bar chart images to categories and series.
Conventional bar chart image segmentation algorithms with high segmentation accuracy exist, but they were observed to consume considerable CPU capacity and thus often could not be used for performing chart segmentation in a real-time data extraction task. Computationally less demanding bar chart image segmentation algorithms also exist, but they were often observed to be unsuited for extracting data from a chart due to insufficient segmentation accuracy. Due to the variability of chart types and their segments, a highly accurate detection of chart segments and segment borders is crucial for any further image analysis steps, e.g. for assigning bars or bar segments to particular series or categories.
Even where a state-of-the-art bar chart image segmentation algorithm was able to correctly identify all bar segments in a chart, the automated assignment of the identified bars to their respective categories and data series was often performed erroneously due to the high variability of existing bar chart types, in particular due to the many possible coloring schemes and exceptions in bar coloring, e.g. for emphasizing single bars. Further sources of error are the use of similar colors and textures for bars of different data series, and bars that represent a value corresponding to a height of zero pixels and are therefore not visible in the chart image.
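The color-related sources of error described above can be illustrated with a minimal sketch. It is not taken from any particular prior-art system: the function name `assign_series`, the RGB sample values and the distance threshold of 40 are all assumptions chosen for illustration. The sketch groups bars into series purely by color similarity and shows how an emphasized bar and two similarly colored series lead to wrong assignments.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def assign_series(bar_colors, threshold=40.0):
    """Greedily assign each bar to the first series whose reference
    color lies within `threshold` (Euclidean RGB distance).
    Hypothetical helper for illustration only."""
    series_refs = []  # reference color of each series found so far
    labels = []       # series index assigned to each bar, in input order
    for color in bar_colors:
        for idx, ref in enumerate(series_refs):
            if dist(color, ref) <= threshold:
                labels.append(idx)
                break
        else:
            # No sufficiently similar series exists: open a new one.
            series_refs.append(color)
            labels.append(len(series_refs) - 1)
    return labels


# Two series (blue, orange) over two categories: assignment succeeds.
regular = [(30, 90, 200), (240, 140, 20), (32, 88, 198), (238, 142, 22)]
labels_regular = assign_series(regular)

# Same chart, but the third bar (blue series) is emphasized in red:
# a color-only approach wrongly opens a third series for it.
emphasized = [(30, 90, 200), (240, 140, 20), (220, 30, 30), (238, 142, 22)]
labels_emphasized = assign_series(emphasized)

# Two different series drawn in similar shades of blue are merged.
similar = [(30, 90, 200), (40, 100, 210)]
labels_similar = assign_series(similar)

print(labels_regular, labels_emphasized, labels_similar)
```

A bar with a height of zero pixels would not appear in the list of detected bar colors at all, so an approach of this kind would additionally shift the assignment of all subsequent bars unless the missing bar is inferred by other means.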