The present invention relates to systems and methods for importing and managing data in a large-scale data repository. Particular aspects relate to import processes, data relationship discovery and metadata collection.
Organisations maintain increasingly large and complex collections of data. Often there is a need to bring together data from diverse data sources into a central data repository to enable processing and analysis of data sets. However, this presents a number of technical challenges. For example, important information about the structure and content of source databases may be lost during import, making it harder to access and manage the data efficiently. Furthermore, the exploitation of the data can be hampered by inconsistencies in data definitions across different sources, and the complexity of data sets often means that expert knowledge is required to access and manipulate the data, e.g. by creating data queries. As a result, making such data sets accessible to ordinary users in an efficient manner has in the past presented significant technical difficulties.