Searching alphanumeric databases for one or more search character strings can be a time-consuming procedure. In one application, an electronics equipment company (e.g., Lucent) that produces many different products may use parts obtained from many different parts manufacturers. The company may also assign Company Part Numbers (CPNs) that are different from the Manufacturer's Part Numbers (MPNs). A problem exists when a Part Change Notice (PCN) process requires that the MPNs supplied by the parts manufacturer be found in the company's internal CPN database. The problem is that the MPNs provided by the manufacturer may differ from what is stored in the company's internal CPN database. Typically, a PCN may contain hundreds of MPNs, so finding the MPNs in the company's internal CPN database can be a significant effort.
Traditionally, a user looks up these MPNs manually, one by one, or with the use of “wildcards.” A wildcard search is a technique of searching for one or more groups of characters, where each group contains a predefined set of characters and the groups are separated by a group of non-defined characters. For example, one wildcard may be AB***CDE, which searches character strings for the predefined groups of core characters AB and CDE separated by three non-defined characters. Thus, such a wildcard search would select character strings such as ABXYZCDE and ABABCCDE as satisfying the search criteria. Using wildcards to find matching MPNs is user dependent, since the user has to select the core characters very carefully.
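The wildcard example above can be sketched in a few lines of Python. This is only an illustration, not a description of any particular search tool: it assumes that each `*` stands for exactly one non-defined character (as in AB***CDE) and that the whole character string must satisfy the pattern. The function names `wildcard_to_regex` and `wildcard_search`, and the two non-matching candidate strings, are hypothetical and introduced here for illustration.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard such as AB***CDE into a compiled regular expression.

    Assumption: each '*' matches exactly one arbitrary character; every other
    character is a core character that must match literally.
    """
    parts = ["." if ch == "*" else re.escape(ch) for ch in pattern]
    return re.compile("".join(parts))


def wildcard_search(pattern: str, candidates: list) -> list:
    """Return the candidate strings that satisfy the wildcard in full."""
    regex = wildcard_to_regex(pattern)
    return [s for s in candidates if regex.fullmatch(s)]


# ABXYZCDE and ABABCCDE come from the text; the other two are made-up
# strings that should NOT be selected.
candidates = ["ABXYZCDE", "ABABCCDE", "ABXYCDE", "ACXYZCDE"]
matches = wildcard_search("AB***CDE", candidates)
print(matches)  # ['ABXYZCDE', 'ABABCCDE']
```

The sketch also shows why the approach is user dependent: a string with only two characters between the core groups, or with a different leading core group, is silently skipped, so the result depends entirely on how well the user chose the core characters and the number of wildcard positions.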
There are a number of problems with this manual wildcard approach. Working through a long PCN list can be quite time-consuming (one PCN could take nearly a week to complete). The approach also has a high probability of error (parts could easily be missed). Any error could be quite expensive. If an MPN is discontinued and the Last Time Buy (LTB) is missed, then the company's production could be disrupted and products not shipped.
Data cleansing techniques are known that eliminate duplication (deduplication) and improve data accuracy and reliability. One data cleansing technique is described in the article entitled “Fuzzy Lookups and Groupings Provide Powerful Data Cleansing Capabilities” by Jay Nathan, MSDN magazine SQL Server 2005, September 2005, pages 87-92, which is incorporated by reference herein. The Nathan article describes a deduplication process that uses tokens (subsets of reference values) to search the different reference fields (name, address, etc.) of customer records to eliminate the duplication of customer records in a database. The Nathan article uses delimiters within the reference fields to identify search tokens. However, since the use of delimiters in MPNs is not reliable, the technique of using delimiters to identify tokens will not be reliable in searching a company's database for such MPNs. Additionally, since the location of the relevant characters of the MPN for matching purposes does not align with the location of the relevant characters in the company's CPN, the techniques described in the Nathan article will not produce matches.
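The delimiter problem can be illustrated with a minimal Python sketch. It is not the method of the Nathan article; it only assumes a generic tokenizer that splits on whitespace and common punctuation. The function name `delimiter_tokens`, the delimiter set, and both part numbers are hypothetical and chosen only to show the effect.

```python
import re


def delimiter_tokens(value: str) -> list[str]:
    """Split a value into tokens at delimiters (assumed here to be
    whitespace, hyphen, underscore, slash, period, and comma)."""
    return [t for t in re.split(r"[\s\-_/.,]+", value.upper()) if t]


# Hypothetical example: the same part written with and without a hyphen.
mpn_from_pcn = "AB123-CDE"      # as listed by the manufacturer
mpn_in_database = "AB123CDE"    # as stored in the company's database

pcn_tokens = delimiter_tokens(mpn_from_pcn)
db_tokens = delimiter_tokens(mpn_in_database)
shared = set(pcn_tokens) & set(db_tokens)

print(pcn_tokens)  # ['AB123', 'CDE']
print(db_tokens)   # ['AB123CDE']
print(shared)      # set()
```

Because one form of the part number contains a delimiter and the other does not, the two values yield no tokens in common, so a token comparison that relies on delimiters would fail to pair them even though they differ by a single hyphen.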
Thus, there is a continuing need for a reliable technique for searching a company's CPN database for MPNs.