1. Field
The disclosure relates to computerized systems, and, more specifically, to systems and methods for correlating free text.
2. Description of Related Art
Customers typically introduce variations when entering data for a given desired input into computerized forms. The variations often occur in free text fields, where the customer has the option to speak, type, or otherwise indicate an entry without preselections being offered to the customer as the proper response. By contrast, preselections may be offered in customary lists and other compilations, such as “drop-down menu” input selections. Free text fields are especially helpful when the number of preselections that would have to be offered to the customer is quite large. For example, entering the name of a financial institution into a new credit card application by preselection would require an enormous drop-down menu of the tens of thousands of possible financial institutions in the United States. While free text fields offer flexibility to a customer, the variations in possible entries for a known financial institution or other identity that the customer intends to enter into the free text field presently create challenges for computer software in correlating the free text entry to a known identity. Electronic data systems currently have difficulty at times correlating the free text entry with the known identity due to variations in spellings, different first words, abbreviations instead of the full words and vice versa, and other differences between the free text entry and the known identity.
From the customer perspective, the result is sometimes a frustrating experience with online entries, caused by the inability or slowness of the computerized data entry process to correlate what the customer considers to be a correct entry with the known identity. From a computer programmer perspective, the result is a loss of productive hours for a department that must manage large databases of free text entries, including entries with only small variations, that are periodically updated and sometimes manually correlated to an appropriate known identity.
Outside the context of computerized correlation, which has more exacting requirements than manual entry, the U.S. Government in the 1930's developed an algorithm for manually transcribing Census Bureau records of last names into compressed terms and indexing the terms for review. The algorithm appears to have included the following steps: (i) capitalizing all letters in the word, dropping all punctuation marks, and padding the word with rightmost blanks as needed during each procedure step; (ii) retaining the first letter of the word; (iii) changing all occurrences of the following letters to zero: A, E, I, O, U, H, W, and Y; (iv) changing letters from the following sets into the digit given to the set:
1=B, F, P, V
2=C, J, K, Q, S, X, Z
3=D, T
4=L
5=M, N
6=R
The steps then included (v) removing all pairs of digits which occur beside each other from the string that resulted after the changing of the letters to the digits; (vi) removing all zeros from the string remaining from step (v) that were placed in the string from step (iii); and (vii) padding the string with trailing zeros and returning only the first four positions, which have the form of an upper case letter followed by three digits.
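By way of illustration only, steps (i) through (vii) may be sketched in Python as follows. The names `CODES` and `census_code` are hypothetical and are introduced solely for this sketch. The sketch assumes that the vowel set in step (iii) includes the letter O, that step (v) collapses each run of identical adjacent digits into a single digit, and that the digit produced for the first letter is replaced by the retained letter itself; the description above does not state these points explicitly.

```python
import string

# Letter-to-digit sets from steps (iii) and (iv); vowels, H, W, and Y map to "0".
CODES = {}
for digit, letters in {
    "0": "AEIOUHWY",
    "1": "BFPV",
    "2": "CJKQSXZ",
    "3": "DT",
    "4": "L",
    "5": "MN",
    "6": "R",
}.items():
    for letter in letters:
        CODES[letter] = digit


def census_code(word: str) -> str:
    """Return a letter followed by three digits, or "" if the word has no letters."""
    # (i) Capitalize and drop punctuation (anything that is not A-Z).
    letters = [c for c in word.upper() if c in string.ascii_uppercase]
    if not letters:
        return ""
    # (ii) Retain the first letter.
    first = letters[0]
    # (iii)-(iv) Change every letter to its digit.
    digits = [CODES[c] for c in letters]
    # (v) Assumption: collapse runs of identical adjacent digits to one digit.
    collapsed = [digits[0]]
    for d in digits[1:]:
        if d != collapsed[-1]:
            collapsed.append(d)
    # Assumption: the retained first letter stands in for its own digit.
    tail = collapsed[1:]
    # (vi) Remove the zeros introduced in step (iii).
    tail = [d for d in tail if d != "0"]
    # (vii) Pad with trailing zeros and keep the first four positions.
    return (first + "".join(tail) + "000")[:4]


if __name__ == "__main__":
    for name in ("Robert", "Rupert", "O'Brien", "Lee"):
        print(name, census_code(name))
```

Under these assumptions, "Robert" and "Rupert" both reduce to R163, which shows how differently spelled names may be indexed under a common term.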
In the 1950's, some adjustments were made to this procedure to replace various other consonant sounds with letters that may be referenced to the sets above and changed into the appropriate digit, as follows:
DG with G
GH with H
GN with N
KN with N
PH with F
MP with M when it is followed by S, Z, or T
PS with S when it starts a word
PF with F when it starts a word
MB with M
TCH with CH
A or I with E when it starts a word and is followed by a vowel from the list of A, E, I, O
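The replacements listed above may likewise be sketched, by way of example only, as an ordered set of regular-expression substitutions applied to the capitalized word before any digits are assigned. The names `RULES` and `apply_adjustments` are hypothetical. The sketch assumes that the rules are applied once each, in the order listed, and that the final vowel list includes the letter O; the description above does not specify an order of application.

```python
import re

# Each entry is (pattern, replacement). "^" anchors a rule to the start of
# the word; "(?=...)" checks the following letter without consuming it.
RULES = [
    (r"DG", "G"),
    (r"GH", "H"),
    (r"GN", "N"),
    (r"KN", "N"),
    (r"PH", "F"),
    (r"MP(?=[SZT])", "M"),       # MP with M when followed by S, Z, or T
    (r"^PS", "S"),               # PS with S when it starts a word
    (r"^PF", "F"),               # PF with F when it starts a word
    (r"MB", "M"),
    (r"TCH", "CH"),
    (r"^[AI](?=[AEIO])", "E"),   # leading A or I with E before A, E, I, or O
]


def apply_adjustments(word: str) -> str:
    """Capitalize, drop non-letters, then apply each rule once in listed order."""
    text = re.sub(r"[^A-Z]", "", word.upper())
    for pattern, replacement in RULES:
        text = re.sub(pattern, replacement, text)
    return text


if __name__ == "__main__":
    for name in ("Knight", "Thompson", "Pfister", "Mitchell"):
        print(name, apply_adjustments(name))
```

Under these assumptions, "Thompson" becomes THOMSON and "Knight" becomes NIHT, after which the adjusted word may be changed into digits using the sets given earlier.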
However, the algorithm apparently did not gain widespread favor, and the steps apparently were not utilized for broad coding requirements in computer-based systems. Further, the steps lacked the ability to realistically correlate a free text entry of an identity to a known identity for a user, database, or other system.
Thus, there remains a need to provide a method and system for customers to enter free text data that may be correlated to a known identity using computer-based systems.