This disclosure is directed to analyzing temporal data and, in particular, to providing a temporally meaningful data sample of a network.
When dealing with data in the form of a graph, a very common issue is that the size of the resulting dataset can be unmanageable. One approach to solving this is to sample the graph to arrive at a smaller subset that serves the needs of the problem in much the same way as the original data. Ideally, this sampling process preserves the original properties of the graph. Sampling large networks while preserving useful attributes is a complicated task. Sampling nodes or edges reduces the size of the network, but can leave it disconnected, which may significantly alter its properties. One approach uses network models to try to reduce graph size by inferring how these networks were created. In these cases, temporal information plays a key role in network analysis, but little attention has been paid to this information. Links can lose relevance over time and new links can gain importance, depending on the network being modeled. This is true for several graph mining tasks, such as link prediction and community detection.
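The fragmentation risk described above can be illustrated with a minimal sketch. It is not part of the disclosed method; the ring-shaped graph, the helper name `count_components`, and the 50% sampling rate are assumptions chosen only for illustration. The sketch samples half of the edges of a connected graph uniformly at random and counts the connected components before and after.

```python
import random


def count_components(nodes, edges):
    """Count connected components with a simple union-find structure."""
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(n) for n in nodes})


# Hypothetical example graph: a ring of 20 nodes, which is fully connected.
nodes = list(range(20))
ring_edges = [(i, (i + 1) % 20) for i in nodes]

# Uniform edge sampling that keeps half of the edges (seeded for repeatability).
rng = random.Random(0)
sampled_edges = rng.sample(ring_edges, 10)

original_components = count_components(nodes, ring_edges)
sampled_components = count_components(nodes, sampled_edges)
print(original_components, sampled_components)
```

In this toy graph, any 10 of the 20 ring edges form a cycle-free subgraph, so the sample always splits the single original component into 10 pieces, regardless of which edges are drawn. Real networks are less regular, but the same effect appears whenever sampling removes edges that hold the graph together.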
One of the main problems in temporal sampling is how to define the granularity of the sampling, especially where edges represent connections measured in years, months, or hours.
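As a rough illustration of the granularity problem, the sketch below groups the same timestamped edges into snapshots by year, by month, and by hour. The edge list, the `FORMATS` table, and the function name `bucket_edges` are hypothetical and serve only to show how the chosen granularity changes the number and size of the resulting snapshots.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical timestamped edges: (source, target, time of connection).
edges = [
    ("a", "b", datetime(2019, 1, 5, 9)),
    ("b", "c", datetime(2019, 1, 5, 17)),
    ("a", "c", datetime(2019, 3, 20, 9)),
    ("c", "d", datetime(2020, 3, 20, 9)),
]

# One strftime pattern per candidate granularity (an assumed set of choices).
FORMATS = {"year": "%Y", "month": "%Y-%m", "hour": "%Y-%m-%d %H"}


def bucket_edges(edges, granularity):
    """Group edges into snapshots keyed by the truncated timestamp."""
    fmt = FORMATS[granularity]
    buckets = defaultdict(list)
    for u, v, ts in edges:
        buckets[ts.strftime(fmt)].append((u, v))
    return dict(buckets)


# Number of snapshots produced at each granularity.
snapshot_counts = {g: len(bucket_edges(edges, g)) for g in FORMATS}
print(snapshot_counts)
```

With these four edges, yearly buckets yield two snapshots, monthly buckets three, and hourly buckets four, each holding a single edge. A coarse granularity hides the ordering of links within a bucket, while a fine one produces many sparse snapshots.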
Existing methods and systems that provide sampling features, such as temporal clustering, which clusters information based on time, either do not preserve the properties of the original graph or create a graph that is highly disconnected. Another problem is that many of the known approaches do not take link temporality into account, so an aged link has the same meaning, probability, or importance as a new link, which is not true in most real-world situations.