The present invention relates generally to the field of imagery annotation, and more particularly to medical imagery annotation.
In various domains, there is a need for an automated way of assessing and evaluating multi-modal content (text/image/video/audio) without having true reference data: medical imagery/report consistency checking, student book publishing, instruction manual development, construction planning, and error snapshot documentation for bug resolution. One key aspect common to all of these domains is that the textual descriptions have images associated with them. Usually these images are annotated with labels identifying the different segments of the image. There is no standardized way of labeling images in any of these domains, and the labeling techniques can be completely open-ended and subjective. No gold-labeled image exists that can be used as a reference. To the best of our knowledge, there is no system that understands an annotated image and validates the free-form textual description (position, size, texture, etc.) against the image referred to. Additionally, on average, one to three dollars is spent per page on basic proof-reading, and the general turnaround time is three days per chapter. While a plethora of methods is available for detecting consistency and typographical errors in natural language, there is no system for doing so between image and text.