Large distributed systems usually consist of thousands of servers deployed in data centers. As infrastructure, these systems support services such as electronic commerce, online news, and social networks. However, because such systems contain a large number of servers, their energy consumption is very high. This energy consumption also increases global carbon emissions, which is detrimental to the environment. According to related research, the annual energy consumption of a large distributed network composed of 100,000 servers can reach 190,000 MWh, which is equivalent to the annual consumption of one hundred thousand households.
A content distribution network (CDN) is a typical example of a large-scale distributed system. The primary purpose of a CDN is to transfer content from a remote server to a replica server close to the end user, so as to improve network performance. A traditional CDN, for example Akamai, is composed of ten thousand servers deployed across data centers worldwide. Generally, a node in a traditional CDN is composed of a firewall, a server, a router, a content gateway, and the like. These components are provisioned redundantly to handle the peak flow of the network and thereby improve the user's service experience. However, research shows that the average load rate of CDN components provisioned for peak flow is only 40%-60% during off-peak periods. Traditional CDN components are usually built from dedicated hardware equipment. The energy consumption of this dedicated hardware is very high, and it is very inconvenient to change the scale of the hardware dynamically. In a network function virtualization (NFV) environment, the CDN components can instead be deployed on general-purpose hardware servers as software versions of the dedicated equipment, so that the number of software instances can be adjusted dynamically through a control center to follow the changing flow. In this way, energy is saved.
There is a substantial body of existing research on the network flow prediction problem. One network flow prediction method is based on the BP (back-propagation) neural network, which is a bionic, or artificial intelligence, method. The operation of a BP neural network is divided into two processes: (1) forward propagation of the working signal and (2) backward propagation of the error signal. In a BP neural network, a single sample has m inputs and n outputs, and one or more hidden layers are usually arranged between the input layer and the output layer. In 1989, Robert Hecht-Nielsen proved that any continuous function on a closed interval can be approximated by a BP network with one hidden layer. Therefore, a three-layer BP neural network (an input layer, a hidden layer, and an output layer) can realize any mapping from m dimensions to n dimensions. For time series prediction, any m consecutive historical average flow values are mapped to the average flow values of the next n time periods, so that the BP neural network can be applied to predict the network flow. However, this technology has the following disadvantages:
(1) The BP neural network has high complexity and a long convergence time.
(2) The predicted result is the average flow in the next time period. Thus, if the CDN scale is determined from the predicted result, any instantaneous flow within that period that exceeds the period average cannot be served effectively.
(3) In the BP neural network algorithm, the selection of the hidden layer lacks theoretical guidance, and during training the network tends to forget old samples while learning new ones.
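The mapping described above, from m consecutive historical average flow values to the averages of the next n periods, can be sketched as follows. This is a minimal illustration under stated assumptions and is not taken from the cited research: the class name ThreeLayerBP, the helper names, the sigmoid hidden layer with a linear output layer, the hidden-layer size of 5, the learning rate, and the synthetic sine-shaped flow series are all hypothetical choices made for the example.

```python
import math
import random


def make_windows(series, m, n):
    """Pair each run of m past values with the n values that follow it."""
    return [(series[i:i + m], series[i + m:i + m + n])
            for i in range(len(series) - m - n + 1)]


class ThreeLayerBP:
    """Input layer (m) -> sigmoid hidden layer -> linear output layer (n)."""

    def __init__(self, m, hidden, n, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(m)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(n)]
        self.b2 = [0.0] * n

    def forward(self, x):
        """Working-signal forward pass; returns hidden activations and outputs."""
        h = [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
             for row, b in zip(self.w1, self.b1)]
        y = [sum(w * hj for w, hj in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        return h, y

    def train_step(self, x, target, lr):
        """Error-signal backward pass for one sample (squared-error loss)."""
        h, y = self.forward(x)
        err = [yk - tk for yk, tk in zip(y, target)]
        # Hidden-layer deltas must use the output weights before they are updated.
        delta_h = [hj * (1.0 - hj) * sum(err[k] * self.w2[k][j] for k in range(len(err)))
                   for j, hj in enumerate(h)]
        for k, ek in enumerate(err):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * ek * hj
            self.b2[k] -= lr * ek
        for j, dj in enumerate(delta_h):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj


def mean_loss(net, pairs):
    """Mean squared-error loss (with the usual 1/2 factor) over all pairs."""
    total = 0.0
    for x, target in pairs:
        _, y = net.forward(x)
        total += 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, target))
    return total / len(pairs)


# Synthetic, normalized "average flow" with a 24-period cycle (made-up data).
series = [0.5 + 0.4 * math.sin(2 * math.pi * t / 24) for t in range(96)]
pairs = make_windows(series, m=4, n=1)

net = ThreeLayerBP(m=4, hidden=5, n=1)
loss_before = mean_loss(net, pairs)
for _ in range(200):  # many passes over the data are needed to converge
    for x, target in pairs:
        net.train_step(x, target, lr=0.1)
loss_after = mean_loss(net, pairs)
print(loss_before, loss_after)
```

Even in this toy setting the network needs many passes over the data, and its output is only a per-period average, which is consistent with disadvantages (1) and (2) above. The hidden-layer size here was picked by hand, which reflects disadvantage (3).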
In existing CDN energy-saving schemes, related research uses the average flow of the previous time period as a direct reference to determine the network scale of the next time period. Meanwhile, redundant servers are added to prevent server congestion caused by peak flow. When the network scale is determined, each server is counted at less than its full load capacity, so that the resulting spare capacity can absorb sudden peak flow. This technology also has the following disadvantages:
(1) Although the network flow of the previous time period can, to a certain extent, serve as a prediction of the flow in the next time period, the prediction error is large.
(2) The redundant design in the scheme increases the energy consumption of the CDN to a certain extent.
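The sizing rule described above can be sketched as follows. This is a hedged illustration only: the function name servers_for_next_period, the load_factor and spare parameters, the 1000 Mbit/s server capacity, and the flow samples are assumptions made for the example and do not come from the cited research.

```python
import math


def servers_for_next_period(prev_avg_flow, server_capacity, load_factor=0.8, spare=1):
    """Size the next period from the previous period's average flow.

    load_factor < 1 counts each server below full load (hypothetical value);
    spare is the number of redundant servers added on top (hypothetical value).
    """
    usable_per_server = server_capacity * load_factor
    return math.ceil(prev_avg_flow / usable_per_server) + spare


# Made-up flow samples (Mbit/s) for two consecutive time periods.
prev_period = [3800, 4100, 4400, 4100]
next_period = [4200, 4600, 7400, 4300]

prev_avg = sum(prev_period) / len(prev_period)
servers = servers_for_next_period(prev_avg, server_capacity=1000)
full_capacity = servers * 1000

print(servers, full_capacity, max(next_period))
```

With these made-up numbers, the scheme keeps more servers powered on than the previous average strictly requires, yet a spike in the next period still exceeds the total capacity. This illustrates both disadvantages: the previous average is a poor predictor, and the redundancy costs energy.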