Security software that intercepts data being printed in order to perform text analysis on the data faces challenges in analyzing such data. For example, items of data being sent to a printer are commonly not in plain text. Rather, such items of data are typically in-memory representations, such as one or more glyphs, of how the printer should print the data.
Existing approaches for glyph-to-text conversion include calculating the text from the glyph using a constant difference value. However, with such approaches, the text calculated for non-American Standard Code for Information Interchange (non-ASCII) and/or non-English characters is often inaccurate. Such inaccuracy leads to incorrect text being classified, as well as to failures in the classification of multilingual documents when printed.
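By way of illustration only, the limitation of a constant difference value can be sketched as follows. The offset of 29, the glyph indices, and the toy lookup table are assumptions chosen for the example and are not taken from any particular font or operating system; the function names are likewise hypothetical.

```python
# Illustrative sketch only: all glyph indices and the offset are assumed values.
OFFSET = 29  # assumed constant difference between code point and glyph index

# Hypothetical toy font table: glyph index -> Unicode code point.
TOY_GLYPH_TO_CODEPOINT = {
    43: ord("H"),        # 72 - 29, consistent with the constant offset
    76: ord("i"),        # 105 - 29, consistent with the constant offset
    140: ord("\u00e9"),  # e-acute (233); 233 - 140 != 29, so the offset breaks
}

def constant_offset_to_text(glyphs, offset=OFFSET):
    """Recover text by adding one fixed value to every glyph index."""
    return "".join(chr(g + offset) for g in glyphs)

def table_lookup_to_text(glyphs, table):
    """Recover text by looking each glyph index up in a per-font table."""
    return "".join(chr(table[g]) for g in glyphs)

glyphs = [43, 76, 140]
print(constant_offset_to_text(glyphs))                       # wrong last character
print(table_lookup_to_text(glyphs, TOY_GLYPH_TO_CODEPOINT))  # intended text
```

In this toy example, the fixed offset recovers the two ASCII characters but maps the third glyph to an unrelated character, whereas a lookup keyed on the font's own glyph assignments recovers the intended text.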
Accordingly, there exists a need for techniques that use font information present in a given operating system to accurately convert glyphs rendered to a device into text for content analysis.