Electronic document file formats, and the in-memory representations used by software that processes documents, are generally either stream representations or object representations. Stream representations generally consist of a sequence of character codes or other primitive data elements, interspersed with special non-character values or sequences of values that signal a change in state or context (e.g., text style, transform or drawing properties, or the beginning or end of a parsing mode). Software that processes such a stream must start at the beginning and visit each element in turn to determine the state in effect at a given data element, since that state is the net result of all changes occurring up to that point.
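As an illustration only, the following Python sketch models a stream as a list in which one-character strings stand for character codes and a hypothetical `Control` element stands for a non-character state change. The hypothetical function `state_at` shows why the state at a given element can only be found by scanning every element before it. The property names `bold` and `font_size` and their initial values are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Control:
    """Hypothetical non-character element that sets one state property."""
    prop: str
    value: object


# A str element stands for a single character code.
Element = Union[str, Control]


def state_at(stream: List[Element], index: int) -> dict:
    """Return the state in effect at stream[index].

    The state is the net result of every Control element before `index`,
    so the scan always starts at the beginning of the stream."""
    state = {"bold": False, "font_size": 12}  # assumed initial state
    for element in stream[:index]:
        if isinstance(element, Control):
            state[element.prop] = element.value
    return state


stream = (list("Hi ") + [Control("bold", True)] + list("there")
          + [Control("font_size", 14)] + list("!"))
print(state_at(stream, 4))   # {'bold': True, 'font_size': 12}
print(state_at(stream, 10))  # {'bold': True, 'font_size': 14}
```

The cost of `state_at` grows with the position of the element being examined, which is the property of stream representations described above.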
Object representations, on the other hand, consist of a collection of objects representing the component parts of the document. The objects can contain object properties, pointers or references to other related objects, and a portion of the document's content. For example, a paragraph object might include a line spacing property and the text of the paragraph, while a section object might include an ordered collection of pointers to all the paragraph objects and illustration objects that make up that section. Object representations are often primarily hierarchical, with the graph of object references forming a tree (e.g., documents contain pages, which contain zones; zones can contain other zones and layout areas; layout areas contain columns; and columns contain paragraphs). Even in hierarchical models, however, there may be additional object references outside the tree (e.g., zones may contain references to shared graphic objects that contribute to their own boundaries and to those of other zones, while paragraphs may contain references to shared text style objects).
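A minimal Python sketch of such an object representation follows, assuming hypothetical `Section`, `Paragraph`, `Illustration`, and `TextStyle` classes. The section holds an ordered list of references to its parts, which forms the tree, while both paragraphs refer to a single shared `TextStyle` object that lies outside the tree.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class TextStyle:
    font: str
    size: float


@dataclass
class Paragraph:
    text: str
    line_spacing: float  # object property
    style: TextStyle     # reference to a shared object outside the tree


@dataclass
class Illustration:
    caption: str


@dataclass
class Section:
    # Ordered references to the paragraph and illustration objects in this section.
    children: List[Union[Paragraph, Illustration]] = field(default_factory=list)


body = TextStyle("Serif", 11.0)
section = Section([
    Paragraph("First paragraph.", 1.2, body),
    Illustration("Figure 1"),
    Paragraph("Second paragraph.", 1.0, body),
])

# A particular object is reached directly through references, with no scan.
second_para = section.children[2]
print(second_para.text)  # Second paragraph.

# Both paragraphs refer to the same style object, so one change affects both.
body.size = 12.0
print(section.children[0].style.size, second_para.style.size)  # 12.0 12.0
```

Because the style object is shared rather than copied, changing `body.size` once changes the size seen by every paragraph that references it.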
Applications often choose a different representation for their file format than for their in-memory representation. Each representation is efficient for certain operations and inefficient for others. For instance, a stream representation is well suited to string searches and other processing that does not depend on detailed knowledge of object properties, but it is poorly suited to accessing a particular object, since locating that object requires scanning the stream from the beginning.