Sheets of paper containing text may be put through a scanner to create an electronic document in which each page contains a text image. The scanner could output the text image in any digital format, such as PDF (Portable Document Format), JPEG (Joint Photographic Experts Group), GIF (Graphics Interchange Format), TIFF (Tagged Image File Format), PNG (Portable Network Graphics), or others. Typically, a scanning process does not encode the text image in a way that tags words or characters having stylistic emphasis, such as underlining or bold. It may be desirable in some situations to identify words or characters having stylistic emphasis, referred to herein as emphasized text. Once tagged, emphasized text can be the subject of further processing. For example, a system may perform a character recognition process that considers only the emphasized text in order to generate a brief abstract of the document without having to process other words in the document. Computing resources are conserved and processing could take less time if other text (non-emphasized text) is ignored by the character recognition process. Even in cases where computational cost is not a driving concern, lower recognition accuracy on emphasized text can be an issue in many optical character recognition (OCR) software programs. Accuracy can be even worse when there is a mixture of text styles in the text image. Thus, identification of emphasized text may allow a different character recognition algorithm to be applied to the emphasized text to improve character recognition accuracy. In another example, a person using a computer to read the electronic document may want to jump directly to the emphasized text within the electronic document, in which case the electronic document need not be subjected to any character recognition process. Accordingly, there is a need for a method and system for identifying emphasized text in an efficient and effective way.