Markup languages are used throughout the Internet for files containing structured information. The structured information includes both the data itself and information about the data. Examples of such markup languages include the eXtensible Markup Language (XML) and the Standard Generalized Markup Language (SGML). One of the useful traits of a markup language, and of XML in particular, is that it is user-defined and can therefore be implemented on any machine; in other words, the markup language is Internet browser “agnostic.” Therefore, a web page administrator may set up a page using a markup language such as XML and can be confident that any visitor to the page will be able to view the content, regardless of the visitor's browser software.
Unfortunately, this flexibility comes at a price: large file size. For example, XML is self-describing and easy for administrators to use because of its user-friendly textual environment. To achieve such a user-friendly environment, XML requires redundant tags and long, easy-to-understand references. When all such elements are combined into a file, the size of the file increases tremendously. As the size of an XML file increases, the formatting information in such a file can become a large percentage of the file's total size. When numerous XML files are transmitted over a network or other computer communications system, the system can become bogged down with the large files, which may slow system performance and cause errors.
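To make the overhead concrete, the following minimal Python sketch estimates what fraction of a small XML document is formatting information rather than data. The function name `markup_overhead` and the sample document are hypothetical, chosen only for illustration. The sketch assumes that element text and attribute values count as data and that everything else (tags, attribute names, quotes, whitespace) counts as formatting.

```python
import xml.etree.ElementTree as ET


def markup_overhead(xml_text: str) -> float:
    """Return the fraction of characters that are formatting rather than data.

    Assumption: element text, tail text, and attribute values are "data";
    every other character in the document is treated as formatting.
    """
    root = ET.fromstring(xml_text)
    data_chars = 0
    for elem in root.iter():
        # Character data directly inside or directly after this element.
        for chunk in (elem.text, elem.tail):
            if chunk:
                data_chars += len(chunk.strip())
        # Attribute values are counted as data; attribute names are not.
        data_chars += sum(len(value) for value in elem.attrib.values())
    return 1.0 - data_chars / len(xml_text)


# Hypothetical sample with long, descriptive tag names and very little data.
SAMPLE = (
    "<customerOrder>"
    "<customerName>Ann</customerName>"
    "<orderQuantity>2</orderQuantity>"
    "</customerOrder>"
)

if __name__ == "__main__":
    print(f"formatting overhead: {markup_overhead(SAMPLE):.0%}")
```

In this sample only four characters are data, so nearly the entire document consists of tags.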
Numerous compression schemes exist that attempt to resolve the problem of large file size due to the presence of formatting information, but such schemes are typically processor intensive and relatively expensive to implement. Many require expensive software programs and therefore need to be run on a computer with full computing functionality, as opposed to a more specialized device such as a router. Such a requirement increases the system overhead associated with processing the files. Still other schemes depend upon finding repetitive sections of data of a preselected minimum length within a large file and then using a shorthand reference in place of each repetitive section. A shortcoming of such schemes is that compression cannot occur until a repetitive section of data is located, so they are typically effective only in very large files.
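The repetitive-section approach can be illustrated with a minimal Python sketch. It is a simplified, greedy back-reference encoder written for illustration only, not the algorithm of any particular product. The names `compress_repeats`, `expand`, and `min_len` are hypothetical. The sketch emits each character literally until it finds an earlier section of at least `min_len` characters that repeats, at which point it emits a shorthand `(position, length)` reference.

```python
def compress_repeats(data: str, min_len: int = 6) -> list:
    """Encode data as literal characters and (position, length) references.

    A reference is emitted only when the text at the current position
    repeats an earlier, non-overlapping section of at least min_len chars.
    """
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_pos = 0, -1
        # Brute-force search of the already-seen text for the longest match.
        for j in range(i):
            k = 0
            while i + k < len(data) and j + k < i and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_pos = k, j
        if best_len >= min_len:
            tokens.append((best_pos, best_len))  # shorthand reference
            i += best_len
        else:
            tokens.append(data[i])  # no usable repeat yet: literal character
            i += 1
    return tokens


def expand(tokens: list) -> str:
    """Reverse compress_repeats by copying referenced sections back in."""
    out = []
    for token in tokens:
        if isinstance(token, str):
            out.append(token)
        else:
            pos, length = token
            out.extend(out[pos:pos + length])
    return "".join(out)


if __name__ == "__main__":
    doc = "<item>apple</item><item>pear</item>"
    print(compress_repeats(doc))
    print(compress_repeats("<a>1</a>"))  # short input: literals only
```

For the first sample, the first 18 characters are emitted as literals because nothing has repeated yet; the first shorthand reference appears only when the second `<item>` tag is reached. The short second sample contains no repeat of the minimum length, so it is not compressed at all, which mirrors the shortcoming described above.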
What is needed is a quick and efficient method of compression that can be implemented inexpensively. Such compression preferably could begin at or near the beginning of a file containing structured information, and would achieve meaningful compression without unduly burdening the device carrying out the compression. The resulting level of compression would not necessarily be the maximum achievable, but rather a level that can be achieved efficiently. Also, such a method could be implemented by a specialized device, such as a network router, and would be simple enough that it could be performed without undue interference with the device's other functions.