1. Field of the Invention
This patent application is a continuation-in-part of U.S. patent application Ser. No. 11/249,938, now U.S. Pat. No. 7,433,877, filed on Oct. 13, 2005, by Yen-Fu Chen, et al. This invention relates to the fields of data control, and especially to fields of determining and checking input data characteristics to databases.
2. Background of the Invention
Various types of databases, such as hierarchical, relational and object-oriented databases, offer consistent data storage and provide transaction persistence, security, concurrency and performance. Consequently, a distributed architecture (30) that uses databases (34-35) as the back-end storage mechanisms and applications (33) for programming logic has become prevalent, as shown in FIG. 3. Many of these database arrangements may be accessed over a network (32) by users of devices (31) such as web browsers, wherein users can retrieve information from, and enter information into, the databases via the application program.
Most databases have a maximum input string length requirement, which is often specified in characters. Most database designs, however, actually implement their maximum string length in bits, bytes or words. In such a computing environment, a front-end application (33) normally checks the length in characters of user input strings prior to submitting the queries to the back-end database (34, 35), so that it can prevent users from entering strings (36) that are longer than the database allows.
If the input strings are longer than the length the database allows, an error message (37) is typically generated by the database, and this message is often returned (37′) to the end user. However, this is an undesirable result because a database error message may reveal table and column names, which is not only unprofessional in appearance to the user but may also violate one or more security guidelines. Moreover, the error message may not be user friendly.
In today's world, multi-language operating environments have increasingly become the norm of everyday business, and the application programs that enterprises use are required to handle multi-language input strings. Checking user input string length against the allowable length of database fields is not troublesome in a purely English environment, such as a system using exclusively the American Standard Code for Information Interchange (“ASCII”), since each character in the ASCII encoding schema uses only one byte. Nor is it a significant issue in other native language encoding schemas with a fixed byte length per character. In such a case, if a database specifies a maximum input string length of 128 characters in ASCII, one can assume that the database can handle input strings of length 128 bytes.
In another example, consider a database application operating in a Chinese-only environment that uses GB5 encoding. GB5 stores every Chinese character in two bytes. To check input string length, the front-end application program can predict exactly how many characters are allowed in each database field by dividing the field's allowable byte length by two (i.e., two bytes per character).
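The fixed-width cases described above reduce to a simple division. The following is a minimal sketch, assuming a hypothetical helper class named FixedWidthCapacity that does not appear in any application discussed here, of how a front end could derive a field's character capacity from its byte length when every character occupies the same number of bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class FixedWidthCapacity {

    // Number of characters that fit in a field of fieldBytes bytes when
    // every character occupies exactly bytesPerChar bytes.
    static int maxChars(int fieldBytes, int bytesPerChar) {
        if (fieldBytes < 0 || bytesPerChar <= 0) {
            throw new IllegalArgumentException("invalid field size or character width");
        }
        return fieldBytes / bytesPerChar;
    }

    // For a string made up only of ASCII characters, the encoded byte
    // length equals the character count.
    static int asciiByteLength(String s) {
        return s.getBytes(StandardCharsets.US_ASCII).length;
    }
}
```

Under these assumptions, a 128-byte field holds 128 one-byte characters or 64 two-byte characters, which is the calculation a fixed-width front end can safely hard-code.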
However, when different languages are used simultaneously within the same database, checking input length becomes much more problematic. For example, a common multi-language encoding schema is UTF-8. UTF-8 encoded strings store characters using between one and four bytes per character (one to three bytes for characters in the Basic Multilingual Plane), depending on the language from which the character or symbol is taken. For instance, a Chinese character in UTF-8 requires three bytes for encoding, while an Arabic character consumes only two bytes, a Hebrew character takes two bytes, a French character takes one or two bytes, an English character takes one byte, and special characters like currency symbols can take two or three bytes.
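The per-language byte counts above can be observed directly. The following is a minimal sketch, assuming a hypothetical helper class named Utf8Width, that encodes a single Unicode code point with the standard Java String.getBytes call and reports how many bytes the UTF-8 form occupies.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class Utf8Width {

    // Bytes needed to encode one valid, non-surrogate Unicode code point
    // in UTF-8.
    static int bytesFor(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

With this helper, the English letter A (U+0041) reports one byte, the French letter é (U+00E9) two bytes, the Arabic letter beh (U+0628) two bytes, the Hebrew letter alef (U+05D0) two bytes, the Chinese character 中 (U+4E2D) three bytes, the pound sign (U+00A3) two bytes and the euro sign (U+20AC) three bytes.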
Many of today's front-end database applications are hard-coded to validate text entry length against the length the database allows. Moreover, these applications are also often hard-coded with logic to check whether text entry fields have at least one character, in order to fulfill the database requirement for not-nullable fields. Examples of validations done in code are shown in Table 1, using Sun Microsystems' Java™ code, and Table 2, using JavaScript™.
TABLE 1
Example Java Code to Validate Database Input String Length

    if (ss.strPoNumber.length() < 1) {
        throw new AsErrorException(getMessage("50001"));
    }
TABLE 2
Example JavaScript to Validate Database Input String Length

    // Use Maximum attribute in the text entry field in web pages.
    // Maxlimit is a hard-coded value in the html page.
    if (field.value.length > maxlimit) {
        field.value = field.value.substring(0, maxlimit);
    } else {
        countfield.value = maxlimit - field.value.length;
    }
The length() function in the example of Table 1 checks whether the user's text entry has at least one character, and the maxlimit in the example of Table 2 requires a variable holding the allowable character length to be declared within the scope of the code. These are fundamentally flawed processes for checking input string length, especially in multi-lingual applications, for two reasons.
First, the maxlimit variable and the maximum attribute count only the number of characters, not the number of bytes. In a multi-language environment, checking character length may produce wrong results because a single character in UTF-8 can occupy more than one byte, so the front-end applications cannot accurately predict whether a text string exceeds the allowable database length.
For example, suppose a text entry field in a front-end application is backed by a 10-byte database field, and a user enters the Chinese equivalent of “I like IBM very much”, a text string made up of Chinese characters and the Latin letters “IBM”.
Today's applications would calculate the total number of characters of this entry as 9, but this string actually uses 18 bytes (5*3+3) when encoded in UTF-8. The application will conclude that the text entry is shorter than the maximum length in the database and will submit the entry; the database will then detect the error and return an error message stating that the entry is too long. At this point, the user has no way of knowing how many characters to remove in order to fit the entry into the database field.
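The gap between a character count and a byte count can be sketched in code. The following is a minimal illustration, assuming a hypothetical helper class named ByteLengthCheck and UTF-8 storage in the database; it is not taken from any application described here. It measures a string in bytes, tests it against a byte limit, and finds the longest leading portion that would fit, which is the information the user lacks when the database simply rejects the entry.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only. Assumes the database
// stores text as UTF-8.
final class ByteLengthCheck {

    // Length of s in bytes once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if s fits in a field that holds at most maxBytes bytes.
    static boolean fits(String s, int maxBytes) {
        return utf8Length(s) <= maxBytes;
    }

    // Longest prefix of s whose UTF-8 encoding fits in maxBytes bytes,
    // cut on a code point boundary so that no character is split.
    static String longestFittingPrefix(String s, int maxBytes) {
        int used = 0;
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            int width = new String(Character.toChars(cp))
                    .getBytes(StandardCharsets.UTF_8).length;
            if (used + width > maxBytes) {
                break;
            }
            used += width;
            i += Character.charCount(cp);
        }
        return s.substring(0, i);
    }
}
```

As a usage illustration with a made-up sample (not the string from the example above), the seven-character string "\u4e2d\u6587\u4e2d\u6587IBM" passes a ten-character check but occupies 15 bytes, so fits(s, 10) is false and longestFittingPrefix(s, 10) keeps only the first three Chinese characters (9 bytes).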
Second, even if the front-end applications check the data length in bytes, it is tedious to change hard-coded variables when requirements or design call for a change in a database field's length or a change of a field from nullable to not-null. Such simple changes require considerable code rework in front-end applications, increasing project risk and slowing the pace of development.
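To make the maintenance burden concrete, the contrast with hard-coded limits can be sketched as follows. This is only an illustration under stated assumptions, not the mechanism of the invention: the class FieldRules, the field name and the messages are all hypothetical. The limits live in one table, so a change in field length or nullability is made in one place rather than at every validation site.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: field limits kept in one table instead of being
// hard-coded at each call site. Names and values are illustrative.
final class FieldRules {

    // Maximum byte length and nullability for one database field.
    static final class Rule {
        final int maxBytes;
        final boolean nullable;

        Rule(int maxBytes, boolean nullable) {
            this.maxBytes = maxBytes;
            this.nullable = nullable;
        }
    }

    private final Map<String, Rule> rules = new HashMap<>();

    void define(String field, int maxBytes, boolean nullable) {
        rules.put(field, new Rule(maxBytes, nullable));
    }

    // Returns null if the value is acceptable, otherwise a message that
    // does not expose table or column names.
    String validate(String field, String value) {
        Rule rule = rules.get(field);
        if (rule == null) {
            throw new IllegalArgumentException("no rule defined for field: " + field);
        }
        if (value == null || value.isEmpty()) {
            return rule.nullable ? null : "A value is required.";
        }
        int bytes = value.getBytes(StandardCharsets.UTF_8).length;
        if (bytes > rule.maxBytes) {
            return "The entry is too long by " + (bytes - rule.maxBytes) + " byte(s).";
        }
        return null;
    }
}
```

In this sketch, redefining a field with a new length or nullability changes the behavior of every validation call for that field without touching the calling code.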
Therefore, a method and mechanism are needed in the art to calculate text string lengths in bytes for multi-lingual text encoding schemes. Further, there is a need in the art in some circumstances to centralize input string length checking logic for applications, in order to enable rapid changes in text entry length and to enforce the not-null attribute. In other circumstances, there is a need in the art to distribute input string length checking in order to take advantage of the efficiencies of distributed and locally cached database storage.