Natural Language Processing (NLP) tasks that require unstructured textual input often face the problem of converting documents from a myriad of formats into “normal” sentences. For example, a Question/Answer (QA) system that parses volumes of unstructured sentences to build its corpus during ingestion has great difficulty with documents that include diagrams. Even if a diagram can be converted to a more textual format (e.g., HTML), the system may still find it very difficult to interpret the diagram's semantics correctly. One common approach is simply to ignore diagrams, images, and text that is structured in ways other than sentences. This tends to be easy to implement, but it discards content that might be very important. Another approach is to write a new converter for each of the many types of content, such as each type of diagram. While effective, this can be costly and time-consuming as the types and layouts of content multiply.
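The first approach, ignoring non-sentence content, can be illustrated with a minimal sketch. It assumes the document has already been converted to HTML and uses Python's standard `html.parser` module. The names `SKIPPED_TAGS`, `SentenceTextExtractor`, and `extract_sentence_text` are hypothetical, as is the choice of which tags count as "non-sentence" content. This is not the ingestion pipeline of any particular QA system.

```python
from html.parser import HTMLParser

# Assumption: these tags hold diagrams, tables, or other content that is
# not ordinary sentences. A real pipeline would choose its own set.
SKIPPED_TAGS = {"svg", "table", "figure", "script", "style"}


class SentenceTextExtractor(HTMLParser):
    """Collects text outside SKIPPED_TAGS and drops everything inside them."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside at least one skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.chunks.append(text)


def extract_sentence_text(html):
    """Return the sentence-like text of an HTML string, ignoring diagrams and tables."""
    parser = SentenceTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    sample = (
        "<p>Pumps move water.</p>"
        "<table><tr><td>Flow</td><td>10</td></tr></table>"
        "<svg><text>Valve A</text></svg>"
        "<p>Valves control flow.</p>"
    )
    print(extract_sentence_text(sample))
```

Run on the sample, the sketch keeps only the two paragraph sentences. The table cell values and the diagram label are silently lost, which is the drawback described above.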