This description relates to matching metadata sources using rules for characterizing matches.
Metadata discovery (also known as metadata scanning) can be used to discover relationships between data elements representing metadata that describes values appearing within datasets, such as the names of fields or columns of database tables or spreadsheets. In some cases, the metadata for data appearing within a given dataset is stored in a variety of different sources. During the metadata discovery process, a match may be found between a data element in a first source and a data element in a second source. A match can correspond to similar field names and/or descriptions of metadata for fields in a table, for example. The match may indicate that the matching data elements represent metadata for the same types of data values in respective datasets. In some cases, a database of synonyms, whether user-specified or dictionary-based (e.g., WordNet), can be used to determine matches between data elements that have similar semantic meanings (e.g., a match between “day” and “date,” or between “gender” and “sex”). A master collection of metadata (sometimes called a “metadata registry”) can be generated or updated to store metadata based on the discovered relationships, or to link to metadata that has been found in the metadata discovery process.
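By way of illustration only, the kind of matching described above can be sketched in a few lines of Python. The synonym groups, the function names (`normalize`, `are_synonyms`, `find_matches`), the sample field names, and the similarity threshold of 0.8 are all hypothetical choices made for this sketch and are not taken from any particular metadata discovery system. String similarity is computed with the standard-library `difflib.SequenceMatcher`, which is one possible measure among many.

```python
from difflib import SequenceMatcher

# Hypothetical user-specified synonym groups; a dictionary-based
# resource such as WordNet could supply these instead.
SYNONYM_GROUPS = [{"day", "date"}, {"gender", "sex"}]


def normalize(name: str) -> str:
    """Lowercase a field name and treat '_' and '-' as word separators."""
    return " ".join(name.lower().replace("_", " ").replace("-", " ").split())


def are_synonyms(a: str, b: str) -> bool:
    """Return True if both normalized names fall in the same synonym group."""
    return any(a in group and b in group for group in SYNONYM_GROUPS)


def find_matches(first_source, second_source, threshold=0.8):
    """Compare every field name in one source with every name in another.

    Each match is reported as (name_in_first, name_in_second, kind), where
    kind is 'exact', 'synonym', or 'similar'. The threshold is an assumed
    value chosen for illustration.
    """
    matches = []
    for a in first_source:
        for b in second_source:
            na, nb = normalize(a), normalize(b)
            if na == nb:
                matches.append((a, b, "exact"))
            elif are_synonyms(na, nb):
                matches.append((a, b, "synonym"))
            elif SequenceMatcher(None, na, nb).ratio() >= threshold:
                matches.append((a, b, "similar"))
    return matches


# Hypothetical field names drawn from two metadata sources.
first = ["Gender", "birth_date", "cust_name"]
second = ["sex", "Birth-Date", "customer_name"]
matches = find_matches(first, second)
for match in matches:
    print(match)
```

In this sketch, synonyms are compared only at the level of whole field names, and every pair of names is examined, which is adequate for small sources but would likely need indexing or blocking for large ones. The matches found could then be used to create or update entries in a metadata registry.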