The present invention relates generally to the field of data mining and more specifically to analysis of resources within a data set.
A data set is a collection of data resources. A data resource may include one or more graphs, charts, diagrams, text portions, images, videos, or other organized portions of data. Each such portion of data can be described as a component within a data resource of the data set. Large quantities of data resources are often organized into one or more data sets. Big data is a term used to describe a quantity of data resources so large that processing, analyzing, or using the data resources is difficult. Because of the amount of data, big data is difficult to work with, and limitations arise (e.g., for scientists and researchers). These limitations can also affect, for example, internet searching and business analytics. Data mining is often used to mitigate these limitations. Data mining uses information about data resources within a data set to understand the structure of the data set and discover patterns therein.
Current data mining systems can use data resource metadata in determining the structure and patterns of a data set. The metadata of the data resource is data about the information contained in the resource (e.g., that it is a chart about bike sales, or a graph about the weather). A data resource can include one or more distinct pieces of metadata, each of which describes one or more components of the data resource. A piece of metadata corresponds to each component that the metadata describes. Current data mining systems often rank data resources based on the data resource metadata.
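By way of a non-limiting illustration, the relationship among data resources, components, and metadata described above can be sketched in Python as follows. The class names (Metadata, DataResource), the function names (metadata_score, rank_resources), the component identifiers, and the sample resources are hypothetical and are introduced only for this example. The ranking criterion shown, which counts the pieces of metadata whose description contains a given term, is an assumption made for illustration; the description above does not specify how current systems compute a rank.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Metadata:
    """One piece of metadata describing one or more components."""
    description: str
    component_ids: List[str]


@dataclass
class DataResource:
    """A data resource made up of components, with metadata about them."""
    name: str
    component_ids: List[str]
    metadata: List[Metadata] = field(default_factory=list)


def metadata_score(resource: DataResource, term: str) -> int:
    """Count metadata pieces whose description mentions the term.

    This criterion is an assumption chosen for illustration only.
    """
    needle = term.lower()
    return sum(1 for m in resource.metadata if needle in m.description.lower())


def rank_resources(resources: List[DataResource], term: str) -> List[DataResource]:
    """Order resources from most to fewest matching metadata pieces."""
    return sorted(resources, key=lambda r: metadata_score(r, term), reverse=True)


# Hypothetical sample data: each metadata piece points at the component(s) it describes.
sales_report = DataResource(
    name="sales_report",
    component_ids=["chart_1", "text_1"],
    metadata=[
        Metadata("chart about bike sales", ["chart_1"]),
        Metadata("summary of bike sales by region", ["text_1"]),
    ],
)
weather_log = DataResource(
    name="weather_log",
    component_ids=["graph_1"],
    metadata=[Metadata("graph about the weather", ["graph_1"])],
)

ranked = rank_resources([weather_log, sales_report], "bike sales")
print([r.name for r in ranked])  # prints ['sales_report', 'weather_log']
```

In this sketch, a ranking of this kind depends entirely on the metadata attached to each data resource, so a resource with sparse or missing metadata receives a low score regardless of what its components actually contain.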