XML post-processing, such as schema validation, is typically performed by software running on general-purpose CPUs. Hardware acceleration techniques have been applied to some aspects of schema validation and have increased their performance significantly. However, one essential part of schema validation under the W3C XML Schema standard, namely simple type checking (e.g., string and format checking), is difficult to accelerate. This is mainly due to the standard's requirements for handling whitespace characters: spaces, tabs, line feeds, and carriage returns.
The W3C XML Schema standard specifies three ways of handling whitespace: “preserve”, “replace”, and “collapse”. “Preserve” leaves the whitespace unchanged. “Replace” converts each whitespace character (tab, line feed, or carriage return) to a 0x20 character (space). “Collapse” performs the same replacement, then reduces each run of contiguous whitespace to a single 0x20 character (space) and removes all leading and trailing whitespace. Whitespace handling is performed before any string checking. “Preserve” and “replace” are simple, because each character can be handled independently. “Collapse”, however, requires keeping state across characters (for example, whether a whitespace run is pending and whether any non-whitespace character has been seen yet). This can slow processing significantly, depending on how much whitespace the string contains.
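The difference between the stateless modes and “collapse” can be illustrated with a minimal software sketch. It assumes the whitespace set defined by XML 1.0 (0x20, 0x09, 0x0A, 0x0D). The function names `normalize_whitespace` and `_collapse` are hypothetical, and the code is not a model of any particular hardware design. “Replace” is a pure per-character mapping. “Collapse” needs two flags carried from one character to the next, and it must defer emitting a space until it knows that the whitespace run is not trailing.

```python
# Whitespace characters as defined by the XML 1.0 "S" production.
XML_WS = {"\x20", "\x09", "\x0a", "\x0d"}


def normalize_whitespace(value: str, mode: str) -> str:
    """Apply an XML Schema whiteSpace mode ("preserve", "replace", "collapse")."""
    if mode == "preserve":
        return value  # no change at all
    if mode == "replace":
        # Stateless: each whitespace character maps to a space on its own.
        return "".join(" " if ch in XML_WS else ch for ch in value)
    if mode == "collapse":
        return _collapse(value)
    raise ValueError(f"unknown whiteSpace mode: {mode!r}")


def _collapse(value: str) -> str:
    out = []
    pending_space = False  # whitespace seen since the last non-whitespace char
    seen_content = False   # at least one non-whitespace char seen (drops leading ws)
    for ch in value:
        if ch in XML_WS:
            # Defer output: this run may turn out to be trailing whitespace.
            # Leading whitespace (before any content) never sets the flag.
            pending_space = seen_content
        else:
            if pending_space:
                out.append(" ")  # the whole run becomes one 0x20
                pending_space = False
            out.append(ch)
            seen_content = True
    # A space still pending here belongs to trailing whitespace and is dropped.
    return "".join(out)
```

For example, `normalize_whitespace("\t a \n\n b  ", "collapse")` yields `"a b"`, while the same input in “replace” mode keeps every position and only changes each whitespace character to a space.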