This invention relates, in general, to processing data. More particularly, this invention relates to grouping of database entries and detection of outliers.
In a business environment, companies need to control and optimize the money they spend. They also need to identify the areas in which costs can be cut. Therefore, the management of payables, referred to herein as spend items, over a period of time is required.
When an organization procures a product or service, an entry for the spend item is made in a table known as a spend table. The spend table comprises record identifiers, item descriptions, supplier names, and spend amounts. For the management and analysis of spend data over a period of time, similar spend items in the spend table need to be grouped together in the form of clusters.
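The layout of such a spend table can be sketched as follows. This is a minimal illustration only: the four fields come from the description above, while the class name `SpendItem`, the field names, the field types, and the sample rows are hypothetical and are not taken from the invention.

```python
from dataclasses import dataclass


@dataclass
class SpendItem:
    """One row of a spend table (field names are illustrative assumptions)."""
    record_id: int          # record identifier
    item_description: str   # free-text description, entered manually
    supplier_name: str      # supplier from whom the item was procured
    spend_amount: float     # amount spent on the item


# Hypothetical sample rows, for illustration only.
spend_table = [
    SpendItem(1, "Notebook", "Supplier A", 2.00),
    SpendItem(2, "Notebook", "Supplier B", 1000.00),
    SpendItem(3, "Stapler", "Supplier A", 12.50),
]

total_spend = sum(item.spend_amount for item in spend_table)
```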
Furthermore, since the spend items are entered in the spend table manually, the item descriptions of the spend items may be entered inaccurately. The item descriptions may be unstructured, or may contain spelling mistakes, non-standard abbreviations, arbitrary symbols, numeric values, etc.
In some cases, two entirely different items may have the same item description, for example, “Notebook of price $2” and “Notebook of price $1000”. If only the item descriptions are considered, both spend items will be incorrectly grouped into one cluster. Moreover, due to human mistakes or fraud, there may be discrepancies in the spend data. To detect such discrepancies, suspicious items, namely outliers, need to be determined and made apparent to the user.
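The notebook example can be reproduced with a short sketch that groups items on the description alone. The sample rows, the function names, and the max-to-min spread ratio are assumptions made for illustration. The ratio is only a crude diagnostic of the problem and is not presented as the outlier detection method of the invention.

```python
from collections import defaultdict

# Hypothetical rows: (record identifier, item description, spend amount).
items = [
    (1, "Notebook", 2.00),
    (2, "Notebook", 1000.00),
    (3, "Stapler", 12.50),
]


def group_by_description(rows):
    """Group rows on the lower-cased description only, ignoring the amount."""
    groups = defaultdict(list)
    for record_id, description, amount in rows:
        groups[description.strip().lower()].append((record_id, amount))
    return dict(groups)


def amount_spread(group):
    """Ratio of the largest to the smallest amount in a group.

    An illustrative diagnostic (an assumption of this sketch): a large
    ratio hints that unrelated items share one description.
    """
    amounts = [amount for _, amount in group]
    smallest = min(amounts)
    if smallest <= 0:
        return float("inf")
    return max(amounts) / smallest


groups = group_by_description(items)
notebook_spread = amount_spread(groups["notebook"])
```

Here both notebook entries land in the same group even though their amounts differ by a factor of 500, which is the kind of grouping error described above.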
Spend data is critical to a company. At times high accuracy is required for the grouping, whereas at other times moderate accuracy is acceptable. Therefore, configurable accuracy levels of grouping are required. Determining the distribution of a company's total spend among various clusters requires manual verification of the clusters. Further, human effort is required to determine the clusters covering a predetermined fraction of the total spend.
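Determining the clusters that cover a predetermined fraction of the total spend can be sketched as a cumulative sum over per-cluster totals. The function name `clusters_covering`, the cluster names, and the amounts are hypothetical. Ranking clusters by descending spend is an assumption of this sketch, chosen because it yields the smallest number of clusters that reaches the target.

```python
def clusters_covering(cluster_spend, fraction):
    """Return the largest clusters whose combined spend reaches `fraction`
    of the total spend.

    cluster_spend: mapping of cluster name to total spend in that cluster.
    fraction: target share of the total spend, in the range (0, 1].
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")

    total = sum(cluster_spend.values())
    target = fraction * total

    # Assumption of this sketch: visit clusters from largest to smallest.
    ranked = sorted(cluster_spend.items(), key=lambda kv: kv[1], reverse=True)

    selected, running = [], 0.0
    for name, spend in ranked:
        if running >= target:
            break
        selected.append(name)
        running += spend
    return selected


# Hypothetical per-cluster totals, for illustration only.
example = {
    "laptops": 5000.0,
    "office supplies": 3500.0,
    "travel": 1000.0,
    "catering": 500.0,
}
top_clusters = clusters_covering(example, 0.8)
```

In this example the two largest clusters together account for 8500 of the 10000 total, so they already cover the 80 percent target.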
Hence, there is a need for a computer-implemented method and system for efficient spend management and outlier detection in a company with minimal or no human intervention.