The adoption of SON (Self-Organizing Network) is being considered for the automated optimization of control parameters used to control various pieces of equipment in a wireless communication network. One example of a wireless communication standard that considers the use of SON is LTE (Long Term Evolution), which is being standardized by the 3GPP (Third Generation Partnership Project).
Separately, reinforcement learning, which is one form of machine learning, is known as an optimization algorithm. In reinforcement learning, an agent learns from its interaction with the environment by referring to the environment's state variables, and improves its policy so as to maximize the total reward it eventually receives.
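As a concrete illustration of the agent, environment, and reward loop described above, the following minimal sketch uses tabular Q-learning, one standard reinforcement-learning algorithm chosen here only as an example (no specific algorithm is named above). The five-state chain environment, the `step` function, and the parameters `alpha`, `gamma`, and `epsilon` are all hypothetical.

```python
import random

# Hypothetical toy environment: states 0..N_STATES-1 on a line. Action 0 moves
# left, action 1 moves right; entering the last state gives reward 1 and ends
# the episode. Every other transition gives reward 0.
N_STATES = 5
ACTIONS = (0, 1)


def step(state, action):
    """Return (next_state, reward, done) for the toy chain environment."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: q[s][a] estimates the discounted total reward."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[state])
                # Exploit, breaking ties randomly so early episodes still explore.
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # Temporal-difference target: immediate reward plus discounted future value.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


def greedy_policy(q):
    """Best action in each non-terminal state according to the learned values."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Calling `greedy_policy(q_learning())` should yield action 1 (move right) in every non-terminal state, because that is the shortest route to the only reward; the policy improves purely from observed rewards, without a model of the environment.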
An agent learning machine is also known which learns from its environment and acts on the environment based on the results of that learning. The agent learning machine includes: an environment abstracting means for observing the state of the environment in the form of an observed variable through an observation function, and for abstracting the environment from a continuous state to a discrete state; a state determining means for determining an index that specifies the one state, among the discrete states obtained by the environment abstracting means, that best abstracts the environment at the current time; an action determining means for determining, by learning, an index that specifies the one action to be taken in the discrete state determined by the state determining means; a low-order reward selecting means for selecting, from among a plurality of low-order reward functions that are continuous functions, the one low-order reward function identified by the index determined by the state determining means and the index determined by the action determining means; and a control output determining means for determining a control output to the environment so as to maximize the low-order reward function selected by the low-order reward selecting means, and for acting on the environment by using that control output.
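The chain of means in the agent learning machine (abstraction, state index, action index, low-order reward selection, control output) can be sketched as follows. This is a minimal illustration under assumptions that the description does not state: equal-width binning as the abstraction, a greedy lookup in an already-learned value table as the action-determining step, quadratic low-order reward functions, and a grid search over candidate outputs as the maximization. All function names are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

RewardFn = Callable[[float], float]


def abstract_state(observation: float, low: float, high: float, n_bins: int) -> int:
    """Environment abstraction: map a continuous observation to one of n_bins
    equal-width discrete states (values outside [low, high] are clipped)."""
    if observation <= low:
        return 0
    if observation >= high:
        return n_bins - 1
    return min(int((observation - low) / (high - low) * n_bins), n_bins - 1)


def choose_action(q_table: Sequence[Sequence[float]], state: int) -> int:
    """Action determination: index of the highest-valued action in this state."""
    values = q_table[state]
    return max(range(len(values)), key=lambda a: values[a])


def control_output(reward_fn: RewardFn, candidates: Sequence[float]) -> float:
    """Control output: the candidate that maximizes the selected low-order reward."""
    return max(candidates, key=reward_fn)


def quadratic_reward(target: float) -> RewardFn:
    """Hypothetical continuous low-order reward, peaking when the output equals target."""
    return lambda u: -(u - target) ** 2


def act(
    observation: float,
    q_table: Sequence[Sequence[float]],
    reward_fns: Dict[Tuple[int, int], RewardFn],
    low: float,
    high: float,
    candidates: Sequence[float],
) -> Tuple[int, int, float]:
    """Run one observe -> abstract -> choose -> select reward -> output cycle."""
    state = abstract_state(observation, low, high, len(q_table))
    action = choose_action(q_table, state)
    # Low-order reward selection: the (state index, action index) pair picks one function.
    u = control_output(reward_fns[(state, action)], candidates)
    return state, action, u
```

The two indices act only as a key into the table of continuous reward functions; the final control value is still chosen on a continuous scale (here, a fine grid), which is what lets a discrete learner drive a continuous actuator.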
There is also known an access prediction method which predicts the number of accesses by using a layered neural network that consists of an input layer, an intermediate layer, and an output layer, each having one or more units, with weights assigned to the connections between the layers. In this method, the value of each unit of the neural network at the time the current number of accesses is predicted is calculated from past numbers of accesses; each connection weight of the neural network is updated so that the number of accesses a plurality of cycles ahead can be predicted from the current number of accesses; and the number of accesses a plurality of cycles ahead is then predicted from the current number of accesses by using the neural network with the unit values and connection weights thus obtained.
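The idea of training a three-layer network to predict the number of accesses several cycles ahead can be sketched as below. This is a minimal sketch, not the cited method: it assumes a sliding window of past counts as the input, one tanh hidden layer, a single linear output unit, and plain stochastic gradient descent. The names `make_pairs`, `TinyMLP`, and `fit`, and all parameter values, are hypothetical.

```python
import math
import random


def make_pairs(series, window, horizon):
    """Each input is `window` consecutive past counts; the target is the count
    `horizon` cycles after the last input value."""
    return [
        (series[t - window + 1 : t + 1], series[t + horizon])
        for t in range(window - 1, len(series) - horizon)
    ]


class TinyMLP:
    """Input layer -> one tanh hidden layer -> single linear output unit."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return hidden, sum(w * h for w, h in zip(self.w2, hidden)) + self.b2

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, target, lr):
        hidden, y = self.forward(x)
        err = y - target  # gradient of 0.5 * (y - target) ** 2 with respect to y
        for j, h in enumerate(hidden):
            grad_h = err * self.w2[j] * (1.0 - h * h)  # uses w2[j] before its update
            self.w2[j] -= lr * err * h
            self.b1[j] -= lr * grad_h
            self.w1[j] = [w - lr * grad_h * xi for w, xi in zip(self.w1[j], x)]
        self.b2 -= lr * err


def mse(model, pairs):
    return sum((model.predict(x) - y) ** 2 for x, y in pairs) / len(pairs)


def fit(series, window=4, horizon=3, n_hidden=8, epochs=300, lr=0.05, seed=0):
    """Train on (window -> value `horizon` cycles ahead) pairs; return model and MSEs."""
    pairs = make_pairs(series, window, horizon)
    model = TinyMLP(window, n_hidden, seed)
    before = mse(model, pairs)
    for _ in range(epochs):
        for x, y in pairs:
            model.train_step(x, y, lr)
    return model, before, mse(model, pairs)
```

For example, with a synthetic, normalized series such as `series = [0.5 + 0.4 * math.sin(2 * math.pi * t / 12) for t in range(48)]`, `model, before, after = fit(series)` trains the network, and `model.predict(series[-4:])` estimates the count three cycles after the last observation. Counts are assumed to be scaled to roughly [0, 1] beforehand.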
It is also known to provide a learning process supervising apparatus for use with a data processing apparatus having a network configuration. The data processing apparatus forms a layered network from basic units, each of which receives one or more inputs from the preceding layer together with a connection weight by which each input is multiplied, computes the sum of the products, and converts that sum by a threshold function to provide a final output. The layered network consists of an input layer with a plurality of basic units, one or more stages of intermediate layers each with a plurality of basic units, and an output layer with one or more basic units; internal connections are formed between the input layer and the first-stage intermediate layer, between successive intermediate layers, and between the final-stage intermediate layer and the output layer, and a weight is set for each internal connection.
The data processing apparatus includes: an output signal deriving means for supplying a plurality of prescribed input signals to the basic units at the input layer and thereby deriving, from the basic units at the output layer, the output signals corresponding to those input signals; an error calculating means for taking as inputs the output signal of each layer unit obtained by the output signal deriving means and a teacher signal, held in a learning pattern holding unit, that specifies the value the output signal should take, and for calculating an error value representing the degree of mismatch between the two signals; and a weight learning means for obtaining weight values for which the sum of the error values falls within a predetermined tolerance, by sequentially updating each connection weight from its initial value by an update amount based on the sum of the error values calculated by the error calculating means.
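The three means above correspond to the forward pass, the error calculation, and the weight update of a layered network trained by backpropagation. The sketch below assumes a sigmoid as the (differentiable) threshold function, half the squared difference as the error value, and per-pattern gradient-descent updates, which is one common variant. The function names, learning rate, tolerance, and epoch limit are hypothetical and not taken from the cited patent.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layers(sizes, seed=0):
    """One weight matrix per pair of adjacent layers; each row holds one unit's
    input connection weights followed by a trailing bias term."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]
        for n_in, n_out in zip(sizes, sizes[1:])
    ]


def forward(layers, x):
    """Output signal deriving: activations of every layer, input layer first.
    Each unit forms a sum of products and passes it through the sigmoid."""
    activations = [list(x)]
    for layer in layers:
        prev = activations[-1]
        # zip stops at len(prev), so the trailing bias row[-1] is added separately.
        activations.append(
            [sigmoid(sum(w * a for w, a in zip(row, prev)) + row[-1]) for row in layer]
        )
    return activations


def train(layers, patterns, lr=0.5, tolerance=0.05, max_epochs=20000):
    """Weight learning: update weights until the summed error is within tolerance.
    Returns (summed error of the last epoch, number of epochs run)."""
    total_error = float("inf")
    for epoch in range(1, max_epochs + 1):
        total_error = 0.0
        for x, teacher in patterns:
            acts = forward(layers, x)
            out = acts[-1]
            # Error calculation: half the squared mismatch with the teacher signal.
            total_error += 0.5 * sum((t - o) ** 2 for t, o in zip(teacher, out))
            # Output-layer deltas: dE/dz for each output unit.
            deltas = [(o - t) * o * (1.0 - o) for o, t in zip(out, teacher)]
            for li in range(len(layers) - 1, -1, -1):
                prev, layer = acts[li], layers[li]
                if li > 0:
                    # Propagate deltas backward using the weights before this update.
                    prev_deltas = [
                        sum(d * layer[j][i] for j, d in enumerate(deltas))
                        * prev[i] * (1.0 - prev[i])
                        for i in range(len(prev))
                    ]
                for j, d in enumerate(deltas):
                    for i, a in enumerate(prev):
                        layer[j][i] -= lr * d * a
                    layer[j][-1] -= lr * d
                if li > 0:
                    deltas = prev_deltas
        if total_error <= tolerance:
            return total_error, epoch
    return total_error, max_epochs
```

For example, a 2-3-1 network created with `init_layers([2, 3, 1])` can be trained on the four input patterns of a logical OR with teacher signals `[0]`, `[1]`, `[1]`, `[1]`; `train` runs until the summed error is at most `tolerance` or `max_epochs` is reached, mirroring the stopping condition described above.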
Related art is disclosed in Japanese Laid-open Patent Publications No. 2007-52589 and No. 2000-122985, and Japanese Patent No. 2732603.