The present disclosure relates generally to data-parallel processing of strings and unstructured text, and more specifically, to exception-preserving access to strings and unstructured text as a sequence of data-parallel operations.
In general, contemporary parallel data processing systems that execute parallel vector operations to process strings require concurrent access to multiple bytes in memory. During this type of processing, however, a vector memory access instruction used to load a string may read past the string's termination character. As a result, it may raise a spurious exception if the memory access spans a protection boundary, which terminates processing. Further, data processing systems configured to avoid parallel accesses that span protection boundaries have been shown to produce excessively long routines that are unsuitable for in-lining, which degrades the performance of short string operations. Recent advances in vector instruction sets permit software to speculatively load beyond the end of a string using a no-fault vector load instruction, retaining the performance advantage of a high-bandwidth vector load over a low-bandwidth scalar load. If the memory access spans a protection boundary into a region of memory that does not exist, the no-fault vector load instruction does not raise a protection violation exception; instead, it returns a default value for any data in the memory access that corresponds to that region. The expectation is that the termination character of the string will be located before the protection boundary, even though the vector load reads beyond the end of the string and possibly across the protection boundary. The default value is chosen so that processing can continue normally without exposing data from any region of memory that the program does not have permission to access. For example, for string processing, the default value may be 0, which corresponds to the string termination character.
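The no-fault load behavior described above can be modeled in software. The following C sketch is illustrative only: it does not use any real instruction set's no-fault load, and all names (`sim_memory`, `nofault_load`, `vector_strlen`, `VLEN`) are hypothetical. A protection boundary is simulated as an array bound, a 16-byte vector width is assumed, and every lane at or beyond the boundary receives the default value 0, so a well-formed string that ends just before the boundary is measured correctly even when the final load crosses it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VLEN 16 /* hypothetical vector width in bytes */

/* Hypothetical model of an address space: only the first `accessible`
 * bytes are mapped; offsets at or beyond it lie past a protection boundary. */
typedef struct {
    const uint8_t *bytes;
    size_t accessible;
} sim_memory;

/* Hypothetical no-fault vector load: reads VLEN bytes starting at addr.
 * Lanes that fall beyond the protection boundary receive the default
 * value 0 instead of raising a protection violation. */
static void nofault_load(const sim_memory *m, size_t addr, uint8_t out[VLEN]) {
    assert(m != NULL && out != NULL);
    memset(out, 0, VLEN); /* default value for every lane */
    for (size_t i = 0; i < VLEN; i++) {
        if (addr + i < m->accessible) {
            out[i] = m->bytes[addr + i]; /* lane is readable */
        }
    }
}

/* Vector-style strlen: scans one VLEN-byte chunk at a time and stops at
 * the first zero byte, whether it is a real terminator or a default
 * value supplied by the no-fault load. */
static size_t vector_strlen(const sim_memory *m, size_t start) {
    uint8_t chunk[VLEN];
    for (size_t off = start;; off += VLEN) {
        nofault_load(m, off, chunk);
        for (size_t i = 0; i < VLEN; i++) {
            if (chunk[i] == 0) {
                return off + i - start;
            }
        }
    }
}
```

For example, in a 32-byte simulated region, a string `"abc"` whose terminator sits at offset 31 yields a length of 3 even though the load starting at offset 28 extends 12 bytes past the boundary. The boundary here is an arbitrary byte offset purely for illustration.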
However, this solution creates a problem for malformed strings (i.e., strings that erroneously extend across a protection boundary into a region that does not exist). Because the default value returned for the nonexistent region is interpreted as a termination character, a malformed string appears to the processing as a normally terminated string, whereas conventional string processing would have encountered the protection violation, since no terminating character precedes the boundary. Thus, the cost of accelerating string processing with data-parallel vector instructions is the loss of a valuable software debugging tool, namely the detection of malformed strings and similar errant data.
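The loss of malformed-string detection can be illustrated with a small C sketch. As before, the names (`sim_memory`, `nofault_load`, `vector_strlen`, `checked_strlen`, `VLEN`) are hypothetical, the protection boundary is simulated as an array bound, and a 16-byte vector width is assumed. `checked_strlen` stands in for conventional byte-at-a-time processing and returns -1 where a real program would take a protection violation; `vector_strlen` uses the simulated no-fault load and silently reports a length instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VLEN 16 /* hypothetical vector width in bytes */

/* Hypothetical address space: bytes at or beyond `accessible` lie past
 * a protection boundary. */
typedef struct {
    const uint8_t *bytes;
    size_t accessible;
} sim_memory;

/* Hypothetical no-fault load: inaccessible lanes read as default value 0. */
static void nofault_load(const sim_memory *m, size_t addr, uint8_t out[VLEN]) {
    assert(m != NULL && out != NULL);
    memset(out, 0, VLEN);
    for (size_t i = 0; i < VLEN; i++) {
        if (addr + i < m->accessible) {
            out[i] = m->bytes[addr + i];
        }
    }
}

/* Vector-style strlen built on the no-fault load: it cannot distinguish a
 * real terminator from the default value supplied at the boundary. */
static size_t vector_strlen(const sim_memory *m, size_t start) {
    uint8_t chunk[VLEN];
    for (size_t off = start;; off += VLEN) {
        nofault_load(m, off, chunk);
        for (size_t i = 0; i < VLEN; i++) {
            if (chunk[i] == 0) {
                return off + i - start;
            }
        }
    }
}

/* Scalar reference: returns -1 (a simulated protection violation) if it
 * reaches the boundary before finding a terminator. */
static long checked_strlen(const sim_memory *m, size_t start) {
    for (size_t i = start; i < m->accessible; i++) {
        if (m->bytes[i] == 0) {
            return (long)(i - start);
        }
    }
    return -1; /* conventional processing would have faulted here */
}
```

With a 32-byte region that contains no zero byte at all, `checked_strlen` starting at offset 24 reports the error (-1), while `vector_strlen` returns 8, treating the malformed string as a valid 8-character string.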