Microsoft Office 2007 stores content as XML conforming to a schema. XML content may be found within user-generated documents such as Microsoft Word documents, Microsoft Excel spreadsheets, and Microsoft PowerPoint presentations. A user can define an XML schema within a document. Microsoft Office allows the user to insert XML tags from that XML schema into the document, thereby annotating the document. For example, if the XML schema defines a “name” element, and if the document includes an actual name, then a user may attach the “name” element to that name (typically causing the name to be enclosed within opening and closing XML tags), thereby creating metadata that identifies that particular portion of the document's data as a name.
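The annotation described above can be illustrated with a minimal sketch. It is not how Microsoft Office implements annotation; the `annotate` helper, the `paragraph` container element, and the sample sentence are all hypothetical and exist only to show a name being enclosed within opening and closing “name” tags.

```python
import xml.etree.ElementTree as ET


def annotate(text: str, target: str, tag: str) -> ET.Element:
    """Wrap the first occurrence of `target` in `text` with <tag>...</tag>.

    Hypothetical helper for illustration only; the enclosing
    <paragraph> element is an assumed container, not an Office element.
    """
    before, found, after = text.partition(target)
    paragraph = ET.Element("paragraph")
    paragraph.text = before          # text preceding the annotated span
    if found:
        marked = ET.SubElement(paragraph, tag)
        marked.text = target         # the annotated span itself
        marked.tail = after          # text following the closing tag
    return paragraph


para = annotate("The report was written by Jane Doe in March.", "Jane Doe", "name")
xml_text = ET.tostring(para, encoding="unicode")
print(xml_text)
```

Once the tags are in place, any XML-aware consumer can locate the name by its element rather than by guessing at its position in the text.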
Microsoft Office stores the document in a file system as a compressed archive of multiple XML files. Typically, one of the XML files contains the document data itself, including any XML annotations that the user has made using the technique discussed above, while the rest of the XML files contain information regarding formatting, presentation, and other aspects of the document. For example, these additional XML files may describe fonts used in the document, multiple character set support, and templates. Microsoft Office 2007 documents conform to the Office Open XML format. Microsoft Office 2007 also supports the Open Document Format, which is another schema-based XML format.
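The archive layout described above can be sketched as follows. Office Open XML packages are ZIP archives, so Python's standard `zipfile` module can enumerate the XML files and read the one holding the document data. To keep the example self-contained, it builds a small stand-in archive in memory; the part names mirror those commonly found in a Word package, but the XML contents are simplified placeholders, not a valid Word document, and the helper functions are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Build a stand-in archive in memory. The contents are placeholders
# (assumptions for illustration), not a valid Office Open XML document.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr(
        "word/document.xml",
        "<document><body><name>Jane Doe</name></body></document>",
    )
    archive.writestr("word/styles.xml", "<styles/>")
    archive.writestr("word/fontTable.xml", "<fonts/>")
package = buffer.getvalue()


def list_xml_parts(data: bytes) -> list:
    """Return the names of the XML files inside the archive, sorted."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return sorted(n for n in archive.namelist() if n.endswith(".xml"))


def read_part(data: bytes, part_name: str) -> ET.Element:
    """Decompress one XML file from the archive and parse it."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return ET.fromstring(archive.read(part_name))


parts = list_xml_parts(package)
root = read_part(package, "word/document.xml")
print(parts)
print(root.find("./body/name").text)
```

With a real document, the same two helpers could be given the bytes of a saved file instead of the in-memory stand-in.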
Prior to Microsoft Office 2003, Microsoft editors (such as earlier editions of Microsoft Word) stored data in a proprietary binary format. These editors (which are also called “document editing applications” herein) would store these binary files in a standard file system rather than a database system. Storing data in a file system instead of a database system has some distinct disadvantages. For example, a file system generally cannot scale to the quantity of data that a database system can store. Existing database systems are capable of efficiently storing and searching petabytes of data. Database systems have also provided reliability and high availability features that most file systems have lacked. However, because traditional file systems have, at least in the past, satisfied the rather simple needs of most users of document editors, documents traditionally have been stored in file systems.
In 2003, Microsoft Office started storing document data in an XML format rather than the binary format mentioned above. Database systems are much better at parsing and understanding XML-formatted data than they are at parsing and understanding data that is in a binary format. To a database system, data stored in a binary format is usually little more than a stream of bytes that cannot be interpreted without knowledge of the proprietary format that produced it. XML, in contrast, is a format that many database systems can understand natively.
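The contrast drawn above can be made concrete with a small sketch. It does not use a database system; Python's standard XML parser stands in for any generic XML-aware consumer. The binary layout (a one-byte length prefix followed by UTF-8 bytes) is a hypothetical format invented for this example, chosen only to show that the bytes are unreadable unless the reader already knows the layout.

```python
import struct
import xml.etree.ElementTree as ET

# The same two names, stored two ways.
xml_record = "<document><name>Jane Doe</name><name>John Roe</name></document>"

# Hypothetical binary layout: 1-byte length prefix, then UTF-8 bytes.
binary_record = b"".join(
    struct.pack("B", len(n)) + n for n in (b"Jane Doe", b"John Roe")
)

# XML is self-describing: a generic parser can find every <name>
# element without any knowledge of the application that wrote it.
names_from_xml = [e.text for e in ET.fromstring(xml_record).iter("name")]


def names_from_binary(data: bytes) -> list:
    """Decode names from the binary record.

    This only works because the function hard-codes the layout;
    a consumer without that knowledge sees an opaque byte stream.
    """
    names, offset = [], 0
    while offset < len(data):
        (length,) = struct.unpack_from("B", data, offset)
        offset += 1
        names.append(data[offset:offset + length].decode("utf-8"))
        offset += length
    return names


print(names_from_xml)
print(names_from_binary(binary_record))
```

Both approaches recover the same names, but only the XML version does so with a general-purpose tool, which is the property that lets a database system index and query such content.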