In the telco domain, network traffic is generated continuously and at very high speed by a large number of nodes (e.g., consumer devices, routers, servers, and base stations). As used herein, the telco domain refers to a networking domain of Internet Service Providers (ISPs). Network traffic analytics is fundamental and critical to understanding the behavior of the network and optimizing the performance of the network and applications. Network traffic analytics also plays an important role in identifying attacks on the network and allowing network administrators to take appropriate security measures. To cope with the high volume and high speed of the traffic data, big data technologies could be applied in the telco domain to help develop network traffic analytics. As used herein, big data technologies/analytics refer to technologies for processing/characterizing big data (i.e., data of high volume, speed, and variety). However, current big data technologies originated largely in the Internet domain. As used herein, the Internet domain refers to a networking domain of content providers such as Google, Yahoo, Facebook, Twitter, etc. Such content providers aggregate and process human-generated content in centralized data centers. Due to the fundamental differences between the properties of data in the telco and Internet domains, those technologies would be suboptimal for the telco domain.
Maintaining normal operation of the network is the topmost concern in the telco domain. Running data analytics should not degrade, disrupt, or compromise network operation. Uploading all the traffic data to a few centralized data centers would itself require significant network resources. Aggregating data at this scale would compromise the utilization of the network.
In the telco domain, it is machines rather than human users that automatically and continuously generate data at a very high speed. To add real value, data collection and analytics must at least keep pace with data generation, if not run faster. Centralized data processing at this scale would cause long lags, which may render the analytic results irrelevant, e.g., in detection of worm and DDoS attacks.