1. Technical Field
The present invention relates generally to data enrichment and classification. More particularly, the invention relates to systems, methods and computer program products for enrichment and classification of spend data into a set of selected taxonomies.
2. Description of the Prior Art
In any organization, procurement spend plays a significant role in the analysis of expenditures, savings and profits. Analysis of spend data can help in taking measures to reduce expenditures, maximize savings, and make key business decisions. To gain visibility from spend data, it is important to identify the category to which each transaction belongs. In most organizations, goods and services are procured from various service providers, and it becomes tedious to derive insights from the resulting spend data. Identifying the categories to which a transaction belongs can help in making key business decisions.
Various challenges are faced in the classification of spend data due to the characteristics of such data. For example, spend data is large in volume, and spend transactions carry very little information about the services procured from a vendor or service provider. Furthermore, transaction records often lack, or contain very little information regarding, vendor name, invoice description, purchase order description, material description, and general ledger account information. In addition, the data may contain redundant transactions, and inconsistencies introduced at the time of data entry may leave records absent or unclear. Also, although the vendor name and description columns in spend data are text fields, they are very short and contain very limited information, making them difficult to utilize directly. Hence, this classification problem poses more challenges than ordinary text classification problems.
Where supplier information, or any other entity information, in transaction or spend data is unavailable due to some error, the entire information has to be removed or rendered redundant for the purpose of classification. Any data classification performed on the basis of such incomplete information does not provide accurate results.
There is prior art on data classification, such as US 8239335, titled “Data classification using machine learning techniques”. However, none of the existing prior art provides classification of spend data with high accuracy. Also, the existing art does not provide a solution where the spend data to be classified lacks information.
Accordingly, there is a need in the art for improved systems and methods of data classification pertaining to procurement spend data.