1. The Field of the Invention
This invention relates to systems, methods, and computer program products for transforming interchange file format messages, such as XML messages, in an efficient manner between a sender and a receiver.
2. Background and Relevant Art
Interchange file formats, such as XML, are becoming increasingly popular due in part to the options they afford for data viewing or presentation across a variety of computing platforms. There are several different types of interchange format files, some of which are geared toward picture data, others toward sound, and still others toward general text. In any event, interchange data formats are commonly used for a wide range of data exchanges, ranging from low complexity instant messaging exchanges to high complexity, secure electronic commerce exchanges.
In general, an interchange data format comprises a series of elements and codes, such as namespace codes in XML, that can be read by virtually any computing platform with an appropriate reader. For example, a file represented in an interchange format may include one or more elements that indicate that the file is an interchange format file, as well as an identifier that indicates what type of file the message is (e.g., text, audio, video, etc.). The type of interchange format may also specify certain codes in the file that indicate a number of visible and hidden properties associated with the given message. For example, visible properties associated with the document might include information about how certain types of text should be viewed, such as may relate to color, font size, arrangement, and so forth. Hidden properties, on the other hand, might indicate the entity that created the document, the entity that is intended to view the document, security information, and so forth.
For a document to be sent, received, and viewed as intended between a sender and receiver, the sender will usually transform the entire message as appropriate for the relevant interchange format. On the receiving end, the receiver will then read the file upon receipt, assuming the receiver is capable of reading the specific interchange format. One type of transformation method often used with interchange formats such as XML is called “canonicalization,” which transforms the message or message unit to reduce it to a standard form so as to eliminate or minimize insignificant differences. For example, in the case of XML message units, canonicalization rules include specific rules on namespace selection for rendering, linefeed normalization, character translation, scope of internal commands, and so forth.
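By way of illustration only, the effect of canonicalization described above can be sketched with the `canonicalize` function of the Python standard library (Python 3.8 or later), which implements the C14N 2.0 rules. The two message units shown are hypothetical and are not taken from any particular system; they differ only in attribute order, quoting style, and empty-element syntax.

```python
from xml.etree.ElementTree import canonicalize

# Two hypothetical message units that carry the same information but
# differ in insignificant ways: attribute order, quote style, and the
# way the empty <body> element is written.
unit_a = '<msg b="2" a="1"><body/></msg>'
unit_b = "<msg a='1' b='2'><body></body></msg>"

# Canonicalization reduces each unit to a standard textual form.
canonical_a = canonicalize(unit_a)
canonical_b = canonicalize(unit_b)

# The raw text differs, but the canonical forms are identical, so a
# comparison or digest computed over the canonical form is stable.
print(unit_a == unit_b)
print(canonical_a == canonical_b)
```

Because a comparison or digest is computed over the canonical form rather than the raw text, a sender and a receiver that apply the same rules can agree on the content of a message unit even when its serialization varies.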
Unfortunately, some communication modes are more sensitive to inadvertent intermediary changes than others, and such changes can result in an intended recipient failing to view the message appropriately, or failing to receive the message in the first instance. For example, in high security implementations, data that is transformed with a small amount of variability or mismatch at one end may not be viewed or received appropriately by the intended recipient, or may not even be transformed in the first instance. Similarly, high security data that is transformed appropriately at the sending end may still not be received appropriately at the receiving end because the data is altered during transit, which can occur in allowed ways.
Accordingly, accurate and consistent transformation methods are increasingly important for interchange format data transfers, particularly in sensitive areas such as secure communications. Unfortunately, due at least in part to the care required to transform a message appropriately, transformation methods can constitute an expensive aspect of a computerized system's processing.
For example, when a message is intended to be transformed before being sent, a buffer is created proportional in size to the entire message size. A representation of the entire message is then loaded into the buffer, and a reader and writer are created for the message representation. In the case of XML messages, the message representation may also be loaded into a Document Object Model (“DOM”) component, which reads each element of the message into the buffer, and subsequently writes out the message as a single transformed output. In any event, a transformed output is then sent to the recipient.
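The conventional whole-message approach described above can be sketched as follows. This is a minimal illustration, assuming the DOM implementation of the Python standard library (`xml.dom.minidom`); the function name `transform_whole_message` and the sample message are hypothetical, and the "transformation" shown is simply a re-serialization of the parsed tree.

```python
import io
from xml.dom import minidom


def transform_whole_message(message_bytes: bytes) -> bytes:
    """Hypothetical whole-message transform: parse everything, then write."""
    # The entire message is read into an in-memory DOM tree, so memory
    # use grows with the size of the message.
    document = minidom.parseString(message_bytes)

    # A second buffer receives the entire transformed output before
    # anything is sent to the recipient.
    output = io.BytesIO()
    output.write(document.documentElement.toxml().encode("utf-8"))

    document.unlink()  # release the DOM tree
    return output.getvalue()


message = b"<order><item>1</item><item>2</item></order>"
transformed = transform_whole_message(message)
print(len(message), len(transformed))
```

In this sketch both the parsed tree and the output buffer hold a representation of the full message at the same time, which is the allocation pattern at issue.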
At least one problem with this approach is the need to create a buffer large enough to read the entire message, and a buffer large enough to receive the entire transformed message. In some cases, one or more buffers will also be allocated for both the transformed and non-transformed versions of the entire message before it is sent to the recipient, or at the recipient computer system after it is received. Especially for larger messages that may be in the gigabyte size range, such buffer allocation can be taxing on the processing and memory resources of a computerized system handling a large number of requests. These and other similar problems can be exacerbated in some cases since the sending computer system will implement both read and write functionality for the sending process, even though the sending computer system typically reads at a much slower rate than it can write.
A reverse process, having similar complications, occurs at the recipient end. In particular, the recipient computer system generally needs to allocate a buffer large enough for the entire message, even though the recipient computer system is receiving only small chunks of the message from the sender at a time. In some cases, the recipient computer system may also allocate duplicate buffer space so that there is a buffer for transformed and non-transformed versions of the entire received message. Furthermore, the recipient computer system has to expend additional resources loading the transformed message into a buffer, as well as reading and writing out the elements of the message. In a problem similar to, but the reverse of, that of the sending computer system, the recipient computer system typically writes much more slowly than it reads. As such, the allocation of several large buffers, and the creation of both read and write process functionality for both sending and receiving computer systems, can be fairly taxing on processing and memory resources.
Accordingly, an advantage in the art can be realized with systems, methods, and computer program products that provide efficient transformations of interchange format data at both a sending and receiving end. In particular, methods of transforming data without loading each entire message into memory would be an advantage in the art. Furthermore, methods of transforming data which take advantage of the relative strengths of the sending computer system or the receiving computer system would be an advantage in the art.