Neural networks are used in the field of machine learning and artificial intelligence. Neural networks comprise arrangements of sets of nodes which are interconnected by links and which interact with each other. The principles of neural networks in computing are based on information about how electrical stimuli convey information in the human brain. For this reason the nodes are often referred to as neurons. They may also be referred to as vertices. The links are sometimes referred to as edges. The network can take input data and certain nodes perform operations on the data. The result of these operations is passed to other nodes. The output of each node is referred to as its activation or node value. Each link is associated with a weight. A weight defines the connectivity between nodes of the neural network. Many different techniques are known by which neural networks are capable of learning, which takes place by altering values of the weights.
FIG. 1 shows an extremely simplified version of one arrangement of nodes in a neural network. This type of arrangement is often used in learning or training and comprises an input layer of nodes, a hidden layer of nodes and an output layer of nodes. In reality, there will be many nodes in each layer, and nowadays there may be more than one layer per section. Each node of the input layer Ni is capable of producing at its output an activation or node value which is generated by carrying out a function on data provided to that node. A vector of node values from the input layer is scaled by a vector of respective weights at the input of each node in the hidden layer, each weight defining the connectivity of that particular node with its connected node in the hidden layer. In practice, networks may have millions of nodes and be connected multi-dimensionally, so the vector is more often a tensor. The weights applied at the inputs of the node Nh are labelled w0 . . . w2. Each node in the input layer is connected at least initially to each node in the hidden layer. Each node in the hidden layer can perform an activation function on the data which is provided to it and can similarly generate an output vector which is supplied to each of the nodes N0 in the output layer 0. Each node weights its incoming data, for example by carrying out the dot product of the input activations of the node and its unique weights for the respective incoming links. It then performs an activation function on the weighted data. The activation function can be for example a sigmoid. See FIG. 1A. The network learns by operating on data input at the input layer, assigning weights to the activations from each node and acting on the data input to each node in the hidden layer (by weighting it and performing the activation function). Thus, the nodes in the hidden layer operate on the weighted data and supply outputs to the nodes in the output layer. Nodes of the output layer may also assign weights.
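By way of illustration only, the weighting and activation carried out at a node can be sketched as follows. This is a minimal sketch and not a definitive implementation: the function names `sigmoid`, `node_output` and `layer_output`, and all numerical values, are hypothetical and are chosen only to mirror the dot product and sigmoid activation described above.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, one example of an activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def node_output(activations, weights):
    """Weight the incoming activations by a dot product, then apply the activation function."""
    weighted_sum = sum(a * w for a, w in zip(activations, weights))
    return sigmoid(weighted_sum)


def layer_output(activations, weight_rows):
    """Compute the activation of every node in a layer; each row holds one node's weights."""
    return [node_output(activations, row) for row in weight_rows]


# Hypothetical values: three input-layer activations feeding two hidden-layer nodes.
input_activations = [0.5, -1.0, 2.0]
hidden_weights = [
    [0.1, 0.2, 0.3],  # weights w0..w2 at the inputs of the first hidden node
    [0.0, 0.0, 0.0],  # an all-zero row, so the weighted sum is 0
]
hidden_activations = layer_output(input_activations, hidden_weights)
```

In this sketch the vector of hidden activations would in turn be supplied, with its own weights, to the nodes of the output layer using the same `layer_output` function.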
Each weight is characterised by a respective error value. Moreover, each node may be associated with an error condition. The error condition at each node gives a measure of whether the error in the weight of the node falls below a certain level or degree of acceptability. There are different learning approaches, but in each case there is a forward propagation through the network from left to right in FIG. 1, a calculation of overall error, and a backward propagation of the error through the network from right to left in FIG. 1. In the next cycle, each node takes into account the back propagated error and produces a revised set of weights. In this way, the network can be trained to perform its desired operation.
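By way of illustration only, one such cycle of forward propagation, error calculation, backward propagation and weight revision can be sketched as follows. The sketch assumes a particular learning approach (gradient descent on a squared error, with one hidden node and one output node, both using a sigmoid); these choices, the function name `train_step`, the learning rate `lr` and all numerical values are assumptions made for the example and are not taken from the description above.

```python
import math


def sigmoid(x):
    """Logistic sigmoid activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def train_step(x, target, w_hidden, w_out, lr=0.5):
    """One training cycle for a network with one hidden node and one output node."""
    # Forward propagation (left to right).
    h = sigmoid(sum(xi * wi for xi, wi in zip(x, w_hidden)))  # hidden node activation
    y = sigmoid(w_out * h)                                    # output node activation

    # Overall error (assumed here to be half the squared difference).
    error = 0.5 * (y - target) ** 2

    # Backward propagation of the error (right to left), using the chain rule.
    delta_out = (y - target) * y * (1.0 - y)
    delta_hidden = delta_out * w_out * h * (1.0 - h)

    # Each node produces a revised set of weights.
    new_w_out = w_out - lr * delta_out * h
    new_w_hidden = [wi - lr * delta_hidden * xi for xi, wi in zip(x, w_hidden)]
    return new_w_hidden, new_w_out, error


# Hypothetical training sample and initial weights.
sample, target = [1.0, 0.5, -1.0], 0.9
w_hidden, w_out = [0.1, -0.2, 0.3], 0.5
errors = []
for _ in range(200):
    w_hidden, w_out, err = train_step(sample, target, w_hidden, w_out)
    errors.append(err)
```

With these assumed values the recorded error decreases over the cycles, which corresponds to the network being trained towards its desired operation on the sample.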
One problem which can arise with a neural network is “overfitting”. Large networks with millions or billions of parameters (weights) can easily overfit. Overfitting causes a network to memorise each training sample that has been provided to it (a training sample providing data to the input nodes), rather than learning to extract relevant features. As a result, the trained neural net is poorly suited to extracting features from samples more generally. A wide range of techniques has been developed to address this problem by regularising neural networks to avoid overfitting/memorising.
When processing large datasets using neural nets, there are techniques involving the use of random numbers which can improve their performance. One technique is so-called Monte Carlo sampling which is a term used for a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. The underlying idea of Monte Carlo sampling is that randomness may be able to solve problems that might be deterministic in principle. When using Monte Carlo sampling, a prescribed probability distribution of the random numbers is desirable. Monte Carlo sampling can be used for example in generative models.
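By way of illustration only, the principle of Monte Carlo sampling can be sketched as follows. The sketch estimates the expected value of x squared for random numbers drawn from a prescribed (here, standard Gaussian) distribution, a quantity whose exact value is 1. The function name `monte_carlo_mean`, the seed and the number of samples are hypothetical choices made for the example.

```python
import random


def monte_carlo_mean(func, sampler, num_samples):
    """Estimate the expected value of func(x) by repeated random sampling of x."""
    total = 0.0
    for _ in range(num_samples):
        total += func(sampler())
    return total / num_samples


# Seeded generator so that the sketch is repeatable (the seed is an arbitrary choice).
rng = random.Random(0)

# Samples follow a prescribed distribution: Gaussian with mean 0 and standard deviation 1.
estimate = monte_carlo_mean(
    func=lambda x: x * x,
    sampler=lambda: rng.gauss(0.0, 1.0),
    num_samples=100_000,
)
```

The estimate approaches the exact value as the number of samples increases, and its quality depends on the random numbers actually following the prescribed distribution.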
Techniques have recently been devised for improving the performance of neural nets by adding random noise to weights or activations. Gaussian noise has been explored as a possibility in this respect.
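By way of illustration only, the addition of Gaussian noise to a set of weights or activations can be sketched as follows. The function name `add_gaussian_noise`, the standard deviation and the values are hypothetical, and the sketch does not represent any particular published technique.

```python
import random


def add_gaussian_noise(values, stddev, rng):
    """Return a copy of values with independent zero-mean Gaussian noise added to each element."""
    return [v + rng.gauss(0.0, stddev) for v in values]


# Seeded generator so that the sketch is repeatable (the seed is an arbitrary choice).
rng = random.Random(42)

# Hypothetical weights; the same function could equally be applied to activations.
weights = [0.1, -0.2, 0.3]
noisy_weights = add_gaussian_noise(weights, stddev=0.01, rng=rng)
```

A fresh set of noise values would typically be drawn each time the function is called, so that repeated calls perturb the weights differently.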
Implementing neural networks using known computer technology has various challenges. Implementing randomising techniques, for example using a CPU or GPU, is non-trivial and may impact the full benefits that could be achieved with an efficient implementation.