Time series data (e.g., a collection of CPU utilization measurements from a set of servers over a period of several days) is a key data source for IT analytics. It helps data center administrators manage the health of their information systems and monitor the performance and availability of the services those information systems provide to an organization. Gathering time series data from its sources (e.g., element managers responsible for monitoring individual devices and IT infrastructure components) into an analytics data warehouse is a difficult task.
One approach is to build bespoke (i.e., custom, built-to-order) collectors for each element manager, using the data export protocols exposed by those element managers. This approach can yield good runtime performance. However, it is expensive to produce and maintain as the number of different time series data sources increases, and it depends on the data source providing well-performing data export protocols. Furthermore, many element managers mask the inherent parallelism of the underlying infrastructure by aggregating data collected from multiple IT infrastructure components (e.g., routers, servers, virtual machines, network nodes, arrays, switches, etc.). This aggregation can prevent information from being collected from the element manager in a scale-out fashion, a limitation that may be referred to as a funneling effect.