Finding potential duplicate values (as opposed to exact duplicates) is a problem many organizations need to solve across dozens of use cases, including fraud detection, information governance, analytics, storage reduction, and Master Data Management. With the emergence of Big Data (implying increases in volume, variety, and velocity), the problem of finding duplicates has become more acute. Traditional algorithms may not scale to higher data volumes, nor are they typically equipped to handle data variety, since they often require a prior understanding of the data in order to standardize it and find duplicates.
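To make the distinction concrete, here is a minimal sketch contrasting exact-match deduplication with fuzzy (potential) duplicate detection, using Python's standard-library difflib. The sample records and the 0.85 similarity threshold are illustrative assumptions, not part of any particular system; real pipelines tune the cutoff per dataset.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical records: note "Jon Smith" is a likely misspelling of "John Smith".
records = ["John Smith", "Jon Smith", "Jane Doe", "John Smith"]

# Exact duplicates: values that occur more than once, verbatim.
exact = {r for r in records if records.count(r) > 1}

# Potential duplicates: distinct values whose similarity exceeds a threshold.
THRESHOLD = 0.85  # illustrative cutoff, tuned per dataset in practice
unique = list(dict.fromkeys(records))
potential = [
    (a, b)
    for i, a in enumerate(unique)
    for b in unique[i + 1:]
    if similarity(a, b) >= THRESHOLD
]
```

Exact matching catches only the repeated "John Smith", while the fuzzy pass also pairs it with "Jon Smith", which differs by one character and would be missed entirely by a strict equality check.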