The present disclosure relates to text processing, and, more specifically, to matching un-shaped characters to shaped characters in a database.
Some languages, such as Arabic, contain characters whose shape varies with the position of the character in a word. Such a character can have a single “un-shaped” representation and a plurality of “shaped” representations, each corresponding to a position in which the character can appear in a word (e.g., isolated, initial, medial, or final).
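As an illustration only, the relationship between one un-shaped character and its shaped representations can be inspected with Unicode's Arabic Presentation Forms-B block, which assigns a separate code point to each positional form of a letter. The sketch below assumes Unicode presentation forms as a stand-in for "shaped format" (actual systems may use other encodings), and `unshaped_base` is a hypothetical helper, not part of the disclosure.

```python
import unicodedata

# Un-shaped representation: ARABIC LETTER BEH.
UNSHAPED_BEH = "\u0628"

# Shaped representations of BEH, one per position in a word
# (Unicode Arabic Presentation Forms-B, used here as an assumed model).
SHAPED_BEH = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}

POSITION_TAGS = {"<isolated>", "<initial>", "<medial>", "<final>"}


def unshaped_base(ch: str) -> str:
    """Hypothetical helper: map a single-letter shaped form to its un-shaped base.

    Characters without a positional decomposition (including un-shaped letters
    and multi-letter ligature forms) are returned unchanged.
    """
    parts = unicodedata.decomposition(ch).split()  # e.g. ["<initial>", "0628"]
    if len(parts) == 2 and parts[0] in POSITION_TAGS:
        return chr(int(parts[1], 16))
    return ch


for position, ch in SHAPED_BEH.items():
    base = unshaped_base(ch)
    print(f"{position:8} U+{ord(ch):04X} {unicodedata.name(ch)} -> U+{ord(base):04X}")
```

Each of the four shaped code points maps back to the same un-shaped letter, which is the one-to-many relationship described above.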
The Windows operating system (OS) can store characters and words in un-shaped format, whereas Power Systems or mainframe computers running various operating systems (e.g., z/OS, OS/400, zLinux) can store characters and words in shaped format. A structured query language (SQL) query can be used to retrieve text data from a relational database management system (RDBMS) storing data in either shaped or un-shaped format. Because a character in un-shaped format is encoded differently from each of its shaped representations, a query containing a string of characters in un-shaped format may not accurately identify a matching string of characters stored in shaped format.
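The mismatch can be sketched with a hypothetical stored word and query string. This example assumes both sides are Unicode strings in logical order and models the shaped data with Arabic Presentation Forms-B; real mainframe data is typically held in EBCDIC code pages and may be stored in visual order, which this sketch does not handle. The NFKC-based comparison is one generic technique swapped in for illustration, not the matching method of the present disclosure.

```python
import unicodedata

# Hypothetical stored value: the word "باب" (BEH, ALEF, BEH) in shaped format.
stored_shaped = "\uFE91\uFE8E\uFE8F"  # BEH initial, ALEF final, BEH isolated

# The same word as it might arrive in a query, in un-shaped format.
query_unshaped = "\u0628\u0627\u0628"  # BEH, ALEF, BEH


def naive_match(query: str, stored: str) -> bool:
    # Binary, code-point-by-code-point comparison.
    return query == stored


def normalized_match(query: str, stored: str) -> bool:
    # NFKC replaces each positional presentation form with its un-shaped
    # base letter, so both strings are compared in un-shaped format.
    return unicodedata.normalize("NFKC", query) == unicodedata.normalize("NFKC", stored)


print(naive_match(query_unshaped, stored_shaped))       # False
print(normalized_match(query_unshaped, stored_shaped))  # True
```

The binary comparison fails even though both strings spell the same word, while comparing after folding shaped forms to un-shaped ones succeeds.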