A conventional approach to generating a network control policy is to formulate the problem as a series of optimization problems. The inputs to these optimizations include the topology of the network, the link bandwidths, the traffic matrix, and the like. The outputs are generally an optimized end-to-end path through the network, an optimized transmission rate at a transmitting end, and so on. This conventional method has the following disadvantages. 1. The optimizations are generally solved by means of linear programming or integer programming. Limited by the complexity of linear programming or integer programming, the method scales poorly. In particular, as the quantities of network elements, service types, and traffic continue to increase, the optimization may become too complex to solve or may incur an excessively high cost (such as computation time). In addition, because the optimization is performed offline, it is difficult to adjust the policy dynamically and in real time. 2. When the network configuration, such as the topology, changes (for example, nodes are added or removed), the optimization needs to be performed again. Re-performing the optimization, on the one hand, introduces significant lag, and on the other hand, requires a large amount of manpower to adapt the optimization model to the new scenario. Due to the above disadvantages, the conventional method for generating a network control policy results in low network control efficiency.
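For illustration only, the kind of optimization described above can be sketched on a toy network. The sketch below is not the method of any particular system: the four-node topology, the link bandwidths, the traffic demands, and the function names `simple_paths` and `optimize_routing` are all assumptions chosen for the example. It also uses exhaustive search in place of a linear or integer programming solver, so that it stays self-contained. It takes the topology, link bandwidths, and a traffic matrix as inputs and returns one end-to-end path per demand, chosen to minimize the maximum link utilization.

```python
from itertools import product


def simple_paths(adj, src, dst, path=None):
    """Yield every loop-free path from src to dst in an undirected graph."""
    path = [src] if path is None else path
    if src == dst:
        yield list(path)
        return
    for nxt in adj[src]:
        if nxt not in path:
            path.append(nxt)
            yield from simple_paths(adj, nxt, dst, path)
            path.pop()


def optimize_routing(capacity, demands):
    """Exhaustively pick one path per demand to minimize max link utilization.

    capacity: {(u, v): bandwidth} for undirected links (hypothetical units)
    demands:  {(src, dst): traffic volume}, i.e. a sparse traffic matrix
    Returns (best utilization, {demand: path}, number of candidates examined).
    """
    adj, cap = {}, {}
    for (u, v), c in capacity.items():
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
        cap[frozenset((u, v))] = c

    keys = list(demands)
    candidates = [list(simple_paths(adj, s, d)) for s, d in keys]

    best_util, best_choice, examined = float("inf"), None, 0
    # One candidate solution = one path chosen for every demand.
    for choice in product(*candidates):
        examined += 1
        load = {}
        for key, path in zip(keys, choice):
            for u, v in zip(path, path[1:]):
                link = frozenset((u, v))
                load[link] = load.get(link, 0.0) + demands[key]
        util = max(load[link] / cap[link] for link in load)
        if util < best_util:
            best_util, best_choice = util, dict(zip(keys, choice))
    return best_util, best_choice, examined


# Assumed toy inputs: a square topology with equal link bandwidths.
capacity = {("A", "B"): 10, ("B", "D"): 10, ("A", "C"): 10, ("C", "D"): 10}
demands = {("A", "D"): 8, ("A", "B"): 6}

best_util, routing, examined = optimize_routing(capacity, demands)
print(best_util, routing, examined)
```

In this sketch the number of candidate solutions is the product of the per-demand path counts, so it grows multiplicatively with every added demand and with every added node that creates new paths. Any change to `capacity` (for example, adding or removing a node) also requires rerunning the whole search, which mirrors the two disadvantages noted above.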