Artificial Neural Networks are devices intended to simulate or mimic the behaviour of the networks of neurons that exist in the human brain. An artificial neural network generally consists of one or more layers of neurons. The network is trained by presenting known data at its inputs, comparing the actual output against the desired output (the training data), and adjusting the network accordingly. While neural networks have a number of potential applications, the growth of the technology has been hampered by the number of neurons needed to make a functional network, the training data and time required, and the performance of the network when implemented in software or hardware.
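The train-and-adjust cycle described above can be sketched in a few lines of code. The following is a minimal illustrative example only; the single linear neuron, the learning rate, and the sample data are hypothetical and not part of any system described herein:

```python
# Minimal sketch of the training cycle: present known input, compare the
# actual output against the desired output, and adjust the network.
def train_step(weights, inputs, target, lr=0.1):
    """One supervised update of a single linear neuron (illustrative only)."""
    output = sum(w * x for w, x in zip(weights, inputs))  # actual output
    error = target - output                               # desired - actual
    # adjust each weight in proportion to the error and to its input
    return [w + lr * error * x for w, x in zip(weights, inputs)]

# Repeatedly presenting one known input/target pair drives the output
# toward the desired value.
weights = [0.0, 0.0]
for _ in range(50):
    weights = train_step(weights, inputs=[1.0, 2.0], target=5.0)
```

Real networks repeat this cycle over many patterns and many neurons, but the present-compare-adjust structure is the same.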
One common artificial neural network (ANN) format consists of multi-layer perceptrons trained using the error back-propagation algorithm (MLP-BP). An MLP-BP network can be used in a wide variety of applications. However, to date, an MLP-BP network has typically been implemented only in software systems or in statically designed hardware systems.
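As a concrete illustration of the MLP-BP format, the sketch below implements one back-propagation step for a small 2-2-1 multi-layer perceptron. The network size, initial weights, and learning rate are hypothetical choices for illustration, not a description of any claimed design:

```python
import math

def sigmoid(x):
    """Standard logistic activation used in classic MLP-BP networks."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(w_hid, w_out, x):
    """Forward pass: input -> hidden layer -> single output neuron."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hid]
    y = sigmoid(sum(w * hi for w, hi in zip(w_out, h)))
    return h, y

def backprop_step(w_hid, w_out, x, target, lr=0.5):
    """One error back-propagation update for a single training pattern."""
    h, y = forward(w_hid, w_out, x)
    # output-layer delta: error times the sigmoid derivative y*(1-y)
    d_out = (target - y) * y * (1.0 - y)
    # hidden-layer deltas, propagated back through the output weights
    d_hid = [d_out * w * hi * (1.0 - hi) for w, hi in zip(w_out, h)]
    # gradient-descent weight updates for both layers
    w_out = [w + lr * d_out * hi for w, hi in zip(w_out, h)]
    w_hid = [[w + lr * d * xi for w, xi in zip(row, x)]
             for row, d in zip(w_hid, d_hid)]
    return w_hid, w_out
```

Each call to `backprop_step` nudges the output toward the target for the presented pattern; a full training session repeats this over the whole training set.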
A major issue in using an MLP-BP network is the lack of a clear methodology for setting up the initial topology and parameters. Topology has a significant impact on the network's computational ability to learn the target function and to generalize from training patterns to new patterns.
If a network has too few free parameters (for example, weights), training may fail to achieve the required error threshold. On the other hand, if the network has too many free parameters, a large data set is needed to provide adequate training, and the possibility of over-fitting is higher, which jeopardizes generalization as well. Generalization is the ability of a network to predict the outcome (network output) for previously unseen input patterns or vectors. Over-fitting occurs during training when the input patterns of a limited data set are presented too many times and the network has more free parameters than needed. The result is a network that recognizes previously seen patterns very well but fails to generalize, producing poor predictions for some or all of the remaining possible input patterns.
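The trade-off above turns on the number of free parameters, which for a fully connected MLP is straightforward to compute from the layer sizes. The helper below is an illustrative sketch; the function name and example layer sizes are hypothetical:

```python
def mlp_free_parameters(layer_sizes, biases=True):
    """Count the free parameters (weights and optional biases) of a
    fully connected multi-layer perceptron with the given layer sizes."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out   # one weight per connection between layers
        if biases:
            total += n_out      # one bias per neuron in the receiving layer
    return total

# A 4-8-3 network: (4*8 + 8) + (8*3 + 3) = 67 free parameters
count = mlp_free_parameters([4, 8, 3])
```

Even modest topologies accumulate parameters quickly, which is why the size of the training set must grow with the size of the network.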
Because of the long training sessions required, it is typically not possible to experiment with a large number of topologies to determine the effects of topology changes on network performance. As a result, heuristics have typically been used to speed the training process while preventing over-fitting. Yet even with the use of heuristics, training is generally limited to off-line learning, to applications where the training data is static, or to applications where the conditions initially determined will remain the same for the duration of the network's useful operation.
However, when on-line learning is necessary, or when the solution space is dynamic and new data is added continuously, there exists a need to test a wide range of topologies in real time. For example, real-time data mining of continuously updated customer databases is a growing area of significant commercial interest. Moreover, since ANNs are inherently parallel architectures, there have been some efforts to explore real-time parallel computing architecture implementations.
Conventional ANN implementations range from software-based implementations on general-purpose computers to specialized hardware dedicated to ANN simulations (neurocomputers). Other efforts include designing and building parallel systems based on transputers, digital signal processors (DSPs), or Application Specific Integrated Circuits (ASICs) that include multiple parallel processing units and act like ANN accelerators.
However, software designs tend to be slow in operation, and conventional hardware designs require special hardware boards or ASIC chips, which limits their use on a large scale. In addition, resource utilization is static: implementations cannot adapt to differing amounts of available hardware resources, and the resulting networks are constrained by the size and type of algorithm implemented.
More recently, the focus on ANN hardware implementation has shifted toward reconfigurable platforms, and particularly Field Programmable Gate Arrays (FPGAs). One past effort used Runtime Reconfiguration (RTR) to improve the hardware density of FPGAs by dividing the BP algorithm into three sequentially executed stages, with the FPGA configured to execute only one stage at a time. However, the enhanced processing density came at the expense of a significant deterioration in performance.
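The staging idea can be sketched in software: back-propagation is split into stages that execute strictly one at a time, by analogy with reconfiguring the device for each stage in turn. Everything below (the stage names, the single linear neuron, the learning rate) is illustrative and is not a description of the referenced design:

```python
# Illustrative three-stage decomposition of back-propagation, with only
# one stage "configured" (running) at any moment, as in the RTR approach.
def forward_stage(state):
    # stage 1: compute the output from the current weights and input
    state["y"] = sum(w * x for w, x in zip(state["w"], state["x"]))

def backward_stage(state):
    # stage 2: compute the error term from the desired output
    state["delta"] = state["t"] - state["y"]

def update_stage(state, lr=0.1):
    # stage 3: adjust the weights using the stored error term
    state["w"] = [w + lr * state["delta"] * x
                  for w, x in zip(state["w"], state["x"])]

state = {"w": [0.0, 0.0], "x": [1.0, 2.0], "t": 5.0}
for _ in range(50):
    for stage in (forward_stage, backward_stage, update_stage):
        stage(state)  # only one stage is resident at a time
```

The shared `state` dictionary plays the role of the intermediate results that must be preserved across reconfigurations; the sequential stage loop is where an RTR design pays its reconfiguration-time penalty.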
Another past effort involved using a systolic array to implement an MLP network with a pipelined modification of the on-line back-propagation algorithm. However, the modification itself requires circumventing some temporal properties of the algorithm, creating a marginal degradation in training convergence. Moreover, the resource utilization of this design is static, increasing with ANN size and topology regardless of the resources available on the hardware device. The resources required to implement large-scale networks may make this design impractical for current configurable hardware device (e.g., FPGA) sizes.
As such, there is a need for improved architectures, systems and methods of implementing ANNs, for example on configurable hardware devices, that overcome at least some of the problems with conventional systems and methods.