In today's society, users and organizations increasingly rely on network and other service providers to access the Internet, request and access various types of content, access software applications and services, access large volumes of data, and perform a variety of other tasks and functions. As the number of users and organizations continues to grow, the amount of data generated by the devices, applications, and processes they utilize grows at a tremendous rate. As a result, big data comprising various types of data is being collected and analyzed today on an unprecedented scale, and organizations routinely make important decisions based on data stored in their databases. Massive amounts of network resources and data storage facilities have been deployed to handle big data. Nevertheless, given the huge volume of generated data, the fast velocity of arriving data, and the large variety of heterogeneous data, the veracity or quality of the data in databases is far from ideal.
Currently, many data feeds associated with organizations contain data errors or glitches in numerous domains, such as, but not limited to, medicine, finance, law enforcement, and telecommunications. Such data errors may have severe consequences for the organizations associated with the data feeds, as well as for those interacting with such organizations. Data errors can arise throughout the data lifecycle, from data entry through storage, data integration, data analysis, and decision making. Existing technologies have focused on detecting and correcting errors after the data has been collected in a database or during data integration processes. While existing commercial tools provide capabilities for performing record-level data quality checks and data cleansing during batch processes, there is still considerable room for improvement.
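As one illustration of the record-level data quality checks mentioned above, a minimal batch validator might apply per-record completeness and range rules and collect the issues found. This is a hedged sketch only: the field names ("id", "name", "timestamp", "age") and the thresholds are hypothetical examples, not taken from any particular commercial tool.

```python
# Illustrative record-level data quality check of the kind applied during
# batch cleansing. All field names and rules below are hypothetical.

def validate_record(record):
    """Return a list of data-quality issues found in a single record."""
    issues = []
    # Completeness check: required fields must be present and non-empty.
    for field in ("id", "name", "timestamp"):
        if not record.get(field):
            issues.append(f"missing required field: {field}")
    # Validity check: numeric range rule on a hypothetical 'age' field.
    age = record.get("age")
    if age is not None and not (0 <= age <= 130):
        issues.append(f"age out of range: {age}")
    return issues

def validate_batch(records):
    """Map each record's index to its issues; clean records are omitted."""
    report = {}
    for i, record in enumerate(records):
        issues = validate_record(record)
        if issues:
            report[i] = issues
    return report
```

Checks of this kind run after the data has already been collected, which is precisely the limitation noted above: errors introduced earlier in the data lifecycle are detected only at batch time, if at all.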