The present invention generally relates to embedded devices and, more particularly, to a method and apparatus for executing neural network applications on a network of embedded devices.
An embedded device 100 is a portable device with an embedded electronic chip (which we call a central processing unit or CPU 120) and memory 130 which enable it to perform digital computations and communicate with other computers and embedded devices. Such devices are becoming ubiquitous. Examples include digital cellular telephones, hand-held devices like the Palm Pilot, digital watches, calculators, pens, and even household appliances like television sets, radio sets, toasters, microwaves, etc. Embedded devices can communicate with each other using telephone or cable wires, or cellular wireless communication.
The embedded chips in embedded devices have relatively little processing power, which is insufficient for complex tasks such as recognizing speech phonemes or understanding natural language. Currently, the processing of such complex tasks requires the use of non-embedded devices with sufficient computation resources (e.g. desktop computers, laptops, etc.).
One approach to enabling complex computation through embedded devices is to use a client-server interface in which client programs executing in embedded devices communicate (wirelessly) with a remote server on a workstation. FIG. 1 shows an embedded device 100 (cellular phone) communicating with a remote server 110 (a mainframe computer) using cellular wireless technology. Using the setup shown in FIG. 1, the cellular phone can execute complex applications. However, bandwidth limitations on typical current wireless communication channels severely limit the utility of this approach.
There are other disadvantages of much of this prior art. For example, often there is a lack of fault tolerance and a lack of speedy execution. The prior art often cannot recover from a cell phone going out of range and cannot take advantage of additional cooperating cell phones coming into range. Also, bandwidth limitations cause slow computation.
Another approach to enabling complex computation on embedded devices is to perform parallel distributed processing on distributed representations of task input. Neural networks are an eminently suitable mechanism for achieving this. This approach has the advantage of increased fault tolerance and can make use of newly available embedded devices. Failure of some device does not fatally impair overall computation. Also, execution of the target application is much speedier, even on devices with low compute power and limited bandwidth.
FIG. 2 shows a feedforward neural network. A feedforward neural network 200 is a network of simple processing units, called “nodes” 210, each of which computes an activation function 230 of all the inputs received by it and sends the result of the computation, called the “activation” 240, to some other nodes. Designated input nodes 250 do not perform any computation and simply send the inputs received by them (the inputs to the neural network 220) to their connecting nodes. The activation 240 at designated output nodes 260 is the “output” 270 of the neural network. Each connection between two nodes is directed and has a “weight”, typically 280, attached to it. For example, n5 is the starting node 211 and n7 is the ending node 212 for the connection whose weight is w75. This weight 280 is used in the computation of the activation function 230 (FIG. 3 below) at the ending node 212 of the connection. We refer to all the starting nodes of connections feeding into a node as the ‘incoming nodes’ (typically 213) for that node. Similarly, we refer to all the ending nodes of connections feeding out of a node as the ‘outgoing nodes’ (typically 214) for that node. To continue the example, all nodes feeding node n5, i.e. nodes n1 and n2, are incoming nodes 213 for n5 and all nodes receiving information from n5, i.e. nodes n6 and n7, are outgoing nodes 214 of node n5. The pattern of connectivity of the nodes, the weights associated with connections, and the specific function computation at each node determine the output 270 of the neural network.
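The incoming and outgoing node terminology can be illustrated with a minimal sketch. It assumes the connections of FIG. 2 are stored as (ending node, starting node) pairs, mirroring the weight naming in which w75 belongs to the connection from n5 to n7. The names `CONNECTIONS`, `incoming_nodes` and `outgoing_nodes` are hypothetical and chosen only for illustration.

```python
# Connections of FIG. 2 as (ending_node, starting_node) pairs, following the
# convention that weight w75 is attached to the connection n5 -> n7.
# This list is an illustrative encoding, not part of the described apparatus.
CONNECTIONS = [
    (3, 1), (3, 2), (4, 1), (4, 2), (5, 1), (5, 2),
    (6, 3), (6, 4), (6, 5), (7, 3), (7, 4), (7, 5),
]


def incoming_nodes(node):
    """Starting nodes of all connections feeding into `node`."""
    return sorted(start for end, start in CONNECTIONS if end == node)


def outgoing_nodes(node):
    """Ending nodes of all connections feeding out of `node`."""
    return sorted(end for end, start in CONNECTIONS if start == node)
```

Under this encoding, `incoming_nodes(5)` yields nodes 1 and 2 and `outgoing_nodes(5)` yields nodes 6 and 7, matching the example for node n5.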
Neural networks 200 are usually implemented as software simulations of the networks. Neural networks are widely applied to statistical pattern classification, regression and time series analysis tasks. In most applications, the inputs to the neural network represent mathematical representations of task related experience, which are used to learn the weights 280 of the connections, such that the correct output can be predicted with minimal error.
FIG. 2 shows a three-layered feedforward neural network 200, where n1, n2, n3, n4, n5, n6 and n7 are the nodes 210 and w31, w32, w41, w42, w51, w52, w63, w64, w65, w73, w74 and w75 are the weights 280 of the connections between the nodes. Nodes n1 and n2 are the designated input nodes 250 of the network. Nodes n3, n4, and n5 receive inputs from nodes n1 and n2. Nodes n6 and n7 are the designated output nodes 260 which receive inputs from nodes n3, n4, and n5. Nodes n3, n4, n5, n6 and n7 compute an activation function 230 which is a weighted sum of their inputs from other nodes as shown in FIG. 3. The results of the computations (activations 240) of nodes n3, n4, and n5 are sent to nodes n6 and n7. The activations 240 of nodes n6 and n7 represent the output 270 of the neural network. In this example, the inputs 220 to the network (i.e. inputs to nodes n1 and n2) might represent two parameters (e.g. pitch and fundamental frequency) from which the gender of a speaker needs to be determined. In such a scenario, the outputs of nodes n6 and n7 might represent the two genders male and female. The actual classification is achieved by comparing the numerical values of the activations of the nodes n6 and n7 and assigning the gender corresponding to the node with the greater numeric value. The weights 280 of the network are learned by presenting the network with several examples of (pitch, frequency, gender) triplets and “training” the network. There are a number of well known neural network training algorithms.
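The computation described for FIG. 2 can be sketched as a short forward pass. This is an illustration under stated assumptions: the weight values are arbitrary placeholders rather than trained weights, the activation function is taken to be the plain weighted sum, and the assignment of n6 to "male" and n7 to "female" is assumed only for the sake of the example. The names `WEIGHTS`, `forward` and `classify` are hypothetical.

```python
# Placeholder weights keyed by (ending_node, starting_node), so that
# WEIGHTS[(7, 5)] plays the role of w75. The values are made up, not trained.
WEIGHTS = {
    (3, 1): 1.0, (3, 2): 0.0,
    (4, 1): 0.0, (4, 2): 1.0,
    (5, 1): 1.0, (5, 2): 1.0,
    (6, 3): 1.0, (6, 4): -1.0, (6, 5): 0.5,
    (7, 3): -1.0, (7, 4): 1.0, (7, 5): 0.5,
}


def forward(x1, x2):
    """Return the activations of output nodes n6 and n7 for inputs x1, x2."""
    # Input nodes n1 and n2 perform no computation and pass their inputs on.
    activation = {1: x1, 2: x2}
    # Remaining nodes compute a weighted sum of their incoming activations.
    for node in (3, 4, 5, 6, 7):
        activation[node] = sum(
            weight * activation[start]
            for (end, start), weight in WEIGHTS.items()
            if end == node
        )
    return activation[6], activation[7]


def classify(x1, x2):
    """Pick the label of the output node with the greater activation."""
    a6, a7 = forward(x1, x2)
    # Assumed mapping: n6 -> "male", n7 -> "female"; ties fall to "female".
    return "male" if a6 > a7 else "female"
```

With these placeholder weights, `forward(2.0, 1.0)` gives hidden activations of 2.0, 1.0 and 3.0 for n3, n4 and n5, and output activations of 2.5 for n6 and 0.5 for n7, so `classify(2.0, 1.0)` selects the label assigned to n6.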
It is an object of the present invention to provide a method and a system for combining the computational resources in embedded devices for executing neural network based applications.
It is yet another object of this invention to provide a method and a system for representing each embedded device as a node in a neural network that communicates with other nodes (embedded chips) for executing neural network based applications.
This invention is directed towards a system and a method for combining the computational resources of numerous embedded devices to enable any of them to perform complex tasks like speech recognition or natural language understanding. A distinguished master device communicates with a network of embedded devices, and organizes them as the nodes of a neural network. To each node (embedded device) in the neural network, the master device sends the activation function for that node and the connectivity pattern for that node. The master device sends the inputs for the network to the distinguished input nodes of the network. During computation, each node computes the activation function of all of its inputs and sends its activation to all of its outgoing nodes. The outputs of the neural network are sent to the master device. Thus, the network of embedded devices can perform any computation (like speech recognition, natural language understanding, etc.) which can be mapped onto a neural network model.
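The interaction between the master device and the nodes can be sketched as a small in-process simulation. It is only an illustration under assumptions: direct method calls stand in for messages exchanged between devices, and the names `DeviceNode`, `MasterDevice`, `build_network`, `weighted_sum` and `pass_through` are hypothetical rather than taken from the described system.

```python
def weighted_sum(weights, values):
    """Activation function: weighted sum of the received activations."""
    return sum(weights[sender] * values[sender] for sender in weights)


def pass_through(weights, values):
    """Input nodes do no computation; they forward the master's input."""
    return values["master"]


class DeviceNode:
    """Stands in for one embedded device acting as a network node."""

    def __init__(self, name):
        self.name = name
        self.received = {}

    def configure(self, activation_fn, incoming, outgoing):
        # Sent by the master: activation function and connectivity pattern.
        self.activation_fn = activation_fn
        self.incoming = incoming    # sender name -> connection weight
        self.outgoing = outgoing    # objects that receive this activation

    def receive(self, sender, value):
        self.received[sender] = value
        # Compute only once every incoming node has reported.
        if len(self.received) == len(self.incoming):
            activation = self.activation_fn(self.incoming, self.received)
            self.received = {}
            for target in self.outgoing:
                target.receive(self.name, activation)


class MasterDevice:
    name = "master"

    def __init__(self):
        self.outputs = {}

    def receive(self, sender, value):
        # Output nodes report their activations back to the master.
        self.outputs[sender] = value

    def run(self, input_nodes, inputs):
        self.outputs = {}
        for node, value in zip(input_nodes, inputs):
            node.receive(self.name, value)
        return self.outputs


def build_network(master, weights, input_names, output_names):
    """Configure one DeviceNode per name; weights use (end, start) keys."""
    names = set(input_names) | {n for pair in weights for n in pair}
    nodes = {name: DeviceNode(name) for name in sorted(names)}
    for name, node in nodes.items():
        if name in input_names:
            fn, incoming = pass_through, {"master": 1.0}
        else:
            fn = weighted_sum
            incoming = {s: w for (e, s), w in weights.items() if e == name}
        outgoing = [nodes[e] for (e, s) in weights if s == name]
        if name in output_names:
            outgoing.append(master)
        node.configure(fn, incoming, outgoing)
    return nodes
```

As a usage example, a two-input, one-output network with assumed weights `{("n3", "n1"): 2.0, ("n3", "n2"): 3.0}` can be built with `build_network(master, weights, ["n1", "n2"], ["n3"])`; calling `master.run([nodes["n1"], nodes["n2"]], [1.0, 2.0])` then returns the activation 8.0 reported by n3. A real deployment would replace the method calls with device-to-device communication and would need to handle devices joining or leaving, which this sketch does not attempt.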