In today's information age, data has become a lifeline for most business establishments. From financial institutions such as banks to medical research institutions performing genetic research, almost all business establishments generate and use large amounts of data. Many such institutions frequently need to match a set of data against reference data. For example, banks need to verify the identity of a person before accepting that person as a customer. Similarly, medical research institutions may need to match a DNA sample against a reference data set of known DNA structures. Most such business establishments employ matching systems for matching a set of data with available reference data.
Typically, such matching systems represent a set of data which is to be matched as an entity. Depending upon the requirements of a business establishment, examples of an entity can include a person, a DNA sample, or a pixel representation of a photograph. An entity can have one or more attributes. The attributes of an entity define the characteristics of the entity. If an entity is a person, examples of attributes can include a first name, a middle name, a telephone number, etc. Typically, for matching an entity against a reference set of entities, the respective attributes of the entities are matched. Based on the degree of match between entities, an entity matching system may produce a True positive, a True negative, a False positive or a False negative. A True positive is a scenario in which the entity matching system determines two entities to be matching and a human expert confirms that they match. Similarly, a True negative is a scenario in which the entity matching system determines two entities to be non-matching and a human expert confirms that they do not match. Further, a False positive is a scenario in which the entity matching system determines two entities to be matching, whereas a human expert determines the two entities to be non-matching. Similarly, a False negative is a scenario in which the entity matching system determines two entities to be non-matching, whereas a human expert determines the two entities to be matching. It is hence the objective of any entity matching system to minimize the number of False positives and False negatives. However, the two tend to be inversely related: a stricter matching criterion reduces False positives but increases False negatives, and a more lenient criterion does the opposite. Rigid applications of conventional systems therefore typically manage to reduce one only at the expense of the other.
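The four outcomes and the trade-off between False positives and False negatives can be illustrated with a minimal sketch. It assumes, purely for illustration, that a matching system emits a similarity score between 0 and 1 for each pair of entities and declares a match when the score reaches a threshold; the function names, the scores and the expert labels below are hypothetical and are not taken from any particular system.

```python
from collections import Counter

def classify(score, expert_says_match, threshold):
    """Label one compared pair as TP, TN, FP or FN.

    The system declares a match when score >= threshold (an assumed rule);
    the human expert's decision is treated as the ground truth.
    """
    system_says_match = score >= threshold
    if system_says_match and expert_says_match:
        return "TP"
    if not system_says_match and not expert_says_match:
        return "TN"
    if system_says_match:
        return "FP"  # system matched, expert says non-matching
    return "FN"      # system did not match, expert says matching

def tally(pairs, threshold):
    """Count the four outcomes over (score, expert_says_match) pairs."""
    counts = Counter({"TP": 0, "TN": 0, "FP": 0, "FN": 0})
    for score, expert_says_match in pairs:
        counts[classify(score, expert_says_match, threshold)] += 1
    return dict(counts)

# Hypothetical scores and expert decisions, for illustration only.
PAIRS = [(0.95, True), (0.80, True), (0.65, False),
         (0.60, True), (0.40, False), (0.20, False)]

strict = tally(PAIRS, threshold=0.9)   # rigid: few matches accepted
lenient = tally(PAIRS, threshold=0.5)  # permissive: many matches accepted
print("strict :", strict)
print("lenient:", lenient)
```

On this made-up data, the strict threshold yields no False positives but two False negatives, while the lenient threshold removes the False negatives at the cost of one False positive.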
The process of entity matching becomes increasingly difficult as the structure of the entity becomes more complex and the volume of data increases. For example, in the case of name screening, matching name entities against a reference set of names is difficult because of variations in spelling, grammar and language, typographical errors, cultural, linguistic and phonetic differences, as well as variations caused by punctuation.
A number of algorithms are available for entity matching which address variations caused by one or more of punctuation, cultural differences and phonetic differences. Some existing methods select algorithms based on the requirements of the business establishment and implement the algorithms sequentially for matching an entity. Such methods typically give equal importance to all the algorithms selected for a business establishment. However, some algorithms may be more important than others in a particular business context. Similarly, some attributes may be more important than other attributes. Conventional methods for entity matching typically implement algorithms sequentially, regardless of the relative importance of the algorithms and/or attributes involved. Depending on how rigid the method is, such methods often result in False negatives or False positives.
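The limitation described above can be made concrete with a small sketch that contrasts giving every algorithm equal importance with assigning each algorithm a relative weight. The three comparison functions, their names and the weight values are illustrative assumptions chosen for brevity; they are stand-ins and do not represent the algorithms of any particular conventional system.

```python
from difflib import SequenceMatcher

def exact(a, b):
    """1.0 only when the two strings are identical."""
    return 1.0 if a == b else 0.0

def ignore_punctuation(a, b):
    """1.0 when the strings agree after dropping case and non-alphanumerics."""
    def strip(s):
        return "".join(ch for ch in s.lower() if ch.isalnum())
    return 1.0 if strip(a) == strip(b) else 0.0

def edit_similarity(a, b):
    """Character-level similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical registry of the algorithms selected for a business context.
ALGORITHMS = {
    "exact": exact,
    "ignore_punctuation": ignore_punctuation,
    "edit_similarity": edit_similarity,
}

def combined_score(a, b, weights):
    """Weighted average of the per-algorithm scores for one attribute pair."""
    total = sum(weights.values())
    return sum(w * ALGORITHMS[name](a, b) for name, w in weights.items()) / total

# Equal importance, as in the conventional approach described above.
EQUAL = {"exact": 1, "ignore_punctuation": 1, "edit_similarity": 1}
# Assumed weighting for a context where punctuation variants matter most.
PUNCTUATION_HEAVY = {"exact": 1, "ignore_punctuation": 4, "edit_similarity": 1}

equal_score = combined_score("O'Brien", "OBrien", EQUAL)
weighted_score = combined_score("O'Brien", "OBrien", PUNCTUATION_HEAVY)
print(f"equal weights: {equal_score:.3f}, weighted: {weighted_score:.3f}")
```

With equal weights, the failed exact comparison drags down the score of a pair that differs only by an apostrophe; when the punctuation-insensitive comparison is given more importance, the same pair scores higher. The same weighting idea could be applied across attributes, for example by weighting a last name more heavily than a middle name.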
There is therefore a need for a method and system that facilitate verification of an entity by providing the flexibility to assign a relative importance to a set of algorithms and to attributes of the entity, based on the business requirements of a business establishment. There is further a need for a method and system that are highly flexible and maintain an optimal balance between False negatives and False positives.