Present invention embodiments relate to data processing systems, and more specifically, to techniques for merging, with a data processing system, identified synonymous data from multiple data sources and generating a dataset for merged terms related to the synonymous data.
Data processing and analysis systems are being implemented across many fields to facilitate research and improve efficiency (e.g., to improve the speed and precision of research and development by streamlining a process for testing a hypothesis). In order to implement these systems, a knowledge base must be created by aggregating data from a large number of data sources. However, many fields or sectors (i.e., domains, such as the life sciences domain) use multiple words or phrases to refer to the same object, item, entity, etc., which may create confusion and/or inaccuracies. This is particularly true in the life sciences domain, since life sciences data is vast, scattered, and typically lacks global identifiers. For example, a drug or disease may have multiple names without having a global identifier that ties those names together. Consequently, in data processing systems, queries across databases may be inefficient, and the query results may be inaccurate and generally unhelpful.