The present invention relates generally to processing data tables and more particularly to data matching for column-oriented data tables.
Column-oriented data tables, such as those found in NoSQL databases, are often used in distributed data processing environments to increase data processing throughput by assigning specific columns to specific servers and storage devices. Column-oriented data tables are also used for applications with unstructured or sparse data because of their flexible storage capabilities. Consequently, the processing demand for column-oriented data tables continues to increase.
Despite the advantages of column-oriented data tables, matching data in a probabilistic manner (e.g., using a probabilistic matching engine) remains a challenge, particularly when sparse and/or unstructured data sources are involved.