Identification of textual content embedded in multimedia content is a challenging problem with many practical applications. Currently available optical character recognition (OCR) systems are the main tools used to recognize such textual content. However, such solutions are insufficient in cases where the input content has not been properly scanned, captured, or printed, so that it cannot be converted into accurate computer-readable text.
In addition, prior art solutions may have difficulty recognizing textual content that is in an unexpected font type and/or a particularly small font size (e.g., smaller than 12 point font). The inability of such solutions to appropriately identify textual content may lead to data loss and/or decreased efficiency as a result of compensating for such data loss.
It would therefore be advantageous to provide a solution that would overcome the deficiencies of the prior art by recognizing natural language characters in multimedia content.