The present invention is generally related to the field of Artificial Neural Network based control systems. The invention provides a method of Artificial Neural Network training without the need of an offline learning phase and training vectors.
Moore's law has long been cited as the prime mover in the information technology revolution. Now this revered law may have to share the spotlight with another, equally important phenomenon: the amazing growth of areal densities in the storage industry.
Driven by the increasingly rich content of files, easy access to Internet downloads and data collected by corporate web sites (not to mention the pervasive reluctance among users to delete old e-mail files), the need for storage capacity is growing at rates estimated as high as 100% per year.
Luckily for corporate space planners, areal density is keeping pace with the need. Data Storage Magazine reports annual increases of 60%, with 100% increases projected in the near future. Maintaining such increases in a highly price-competitive environment puts tremendous pressure on every aspect of disk drive technology. Given the relentless market pressure to increase areal density by at least 60% annually and the problems associated with increasing linear bit densities (bpi), considerable interest has been focused on radial track density (tpi).
Magnetic disk drive motion control design has been dominated by classical control techniques. While this tradition has produced many effective control systems, recent developments in radial track density and micro-actuators have forced a growing interest in robust, nonlinear and artificial intelligence control.
Disk drives store data on constantly spinning disks made of aluminum or glass and coated on both sides with a thin film of magnetic material. Magnetic read/write heads store data on both sides of the disk by magnetizing very small areas on the disk surface in closely spaced, concentric tracks. A positioning device called an actuator, or voice coil motor (VCM), moves the heads rapidly from one track to another under the direction of the servo control system. An example of such Prior Art disk drive architecture is disclosed in C. Denis Mee, Magnetic Recording, McGraw Hill Inc., 1988, incorporated herein by reference.
The expected increase in track density places an extra burden on the servo control system, which must hold the off-track motion of the heads within increasingly tighter limits for errorless reading and writing. This limit, known as track mis-registration (TMR), amounts to about 10% of the total track width. At today's average track density of 25000 tracks per inch, the TMR budget is approximately four micro-inches. This implies eliminating, as far as possible, any disturbance that might cause the head to move off track.
The major causes of off-track disturbances include non-repeatable runout from the spindle bearing, residual vibrations due to actuator modes, servo writer errors, nonlinear friction effects from the VCM bearings, slipped disks (which lead to repeatable runout), and casting warpage.
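The arithmetic behind the four micro-inch figure can be sketched as follows. This is a minimal illustration that assumes the track width is approximately equal to the track pitch (one inch divided by the track density); the function name `tmr_budget_microinches` is hypothetical and does not come from the text.

```python
def tmr_budget_microinches(tracks_per_inch, tmr_fraction=0.10):
    """Estimate the TMR budget in micro-inches.

    Assumption: track width is taken to be the track pitch,
    i.e. one inch divided by the number of tracks per inch.
    """
    track_pitch_uin = 1.0e6 / tracks_per_inch  # micro-inches per track
    return tmr_fraction * track_pitch_uin


if __name__ == "__main__":
    # 25000 tracks per inch gives a 40 micro-inch pitch, so a 10% budget
    # is roughly 4 micro-inches.
    print(tmr_budget_microinches(25000))
```

Under the same assumption, doubling the track density halves the budget, which is why higher tpi tightens the servo requirement.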
The servo bandwidth frequency provides a measure of how well the control system will dampen the effects of off-track disturbances.
Voice coil motor control can be classified into two basic problems: tracking a reference trajectory (seeking) and track following. Several linear controllers as well as nonlinear controllers have been proposed in the Prior Art for solving these problems. The main idea behind these control systems is to achieve suitable bandwidth to obtain the track density required by the marketplace.
Artificial Neural Networks are known in the art. Although Artificial Neural Networks have been around since the late 1950s, it wasn't until the mid-1980s that algorithms became sophisticated enough for general applications. Today, Artificial Neural Networks are being applied to an increasing number of real-world problems of considerable complexity. They are good pattern-recognition engines and robust classifiers, with the ability to generalize in making decisions about imprecise input data. They offer ideal solutions to a variety of classification problems such as speech, character and signal recognition, as well as functional prediction and system modeling where the physical processes are not understood or are highly complex.
Artificial Neural Networks may also be applied to control problems, where the input variables are measurements used to drive an output actuator, and the network learns the control function. The advantage of Artificial Neural Networks lies in their resilience against distortions in the input data and their capability of learning. They are often good at solving problems that are too complex for conventional methods and are often well suited to problems that people are good at solving, but for which traditional methods are not.
In its most general form, an Artificial Neural Network is a machine that is designed to model the way in which the brain performs a particular task. Artificial neural networks are collections of mathematical models that emulate some of the observed properties of biological nervous systems and draw on the analogies of adaptive biological learning. The key element of the Artificial Neural Network paradigm is the novel structure of the information processing system. It is composed of a large number of highly interconnected processing elements that are analogous to neurons and are tied together with weighted connections that are analogous to synapses.
Learning in biological systems involves adjustments to the synaptic connections that exist between the neurons. This is true of Artificial Neural Networks as well. Learning typically occurs by example through training, or exposure to a training set of input/output data where the learning algorithm iteratively adjusts the connection weights. These connection weights store the knowledge necessary to solve specific problems.
There are many different types of Artificial Neural Networks. Some of the more common include the multilayer perceptron which is generally trained with the back-propagation of error algorithm, learning vector quantization, radial basis function, Hopfield, and Kohonen, to name a few. Some Artificial Neural Networks are classified as feedforward while others are recurrent depending on how data is processed through the network.
Another way of classifying Artificial Neural Network types is by their method of learning, as some Artificial Neural Networks employ supervised training while others are referred to as unsupervised or self-organizing. Supervised training is analogous to a student being guided by an instructor. Unsupervised algorithms essentially perform clustering of the data into similar groups based on the measured attributes or features serving as inputs to the algorithms. This is analogous to a student who derives the lesson totally on his or her own. Artificial Neural Networks can be implemented in software or in specialized hardware.
FIG. 1 illustrates the general architecture of a two layer artificial neural network. The left layer represents the input layer, in this case with three input nodes 110, 120, and 130 receiving inputs X1 through X3. The middle layer is called the hidden layer, with five nodes 140, 150, 160, 170, and 180. It is this hidden layer which performs much of the work of the network. The output layer in this case has one node 190 outputting signal Y1 representing output values determined from the inputs.
Each node 140, 150, 160, 170, and 180 in the hidden layer may be fully connected to the input nodes 110, 120, and 130. That means what is learned in a hidden node is based on all the inputs taken together. This hidden layer is where the network "learns" interdependencies in the model. FIG. 2 provides some detail into what goes on inside a hidden node.
As illustrated in FIG. 2, a weighted sum 210 may be performed as follows: X1 times W1 plus X2 times W2 and so on through X3 and W3. This weighted sum is performed for each hidden node and each output node and is how interactions are represented in the network. Each summation is then transformed using activation function 220 before the value is passed on to the next layer. The activation function translates the weighted input into a value which may be used as an input to a next step or portion of the neural network.
Examples of activation functions include a linear function 230, a clipped linear function 240, and a Gaussian distribution 250. Of course, other types of functions may be applied in activation function block 220. However, in general, neural networks work best with activation functions with limited dynamic ranges.
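The weighted sum and activation step described above can be sketched in a few lines. This is an illustrative sketch only: the function names, the weight values, and the clipping limits of -1 and +1 are assumptions chosen for the example, and the clipped linear function stands in for any of the activation functions mentioned.

```python
def clipped_linear(s, lo=-1.0, hi=1.0):
    """Clipped linear activation: linear inside [lo, hi], saturated outside."""
    return max(lo, min(hi, s))


def weighted_sum(inputs, weights):
    """X1*W1 + X2*W2 + ... for one node."""
    return sum(x * w for x, w in zip(inputs, weights))


def forward(inputs, hidden_weights, output_weights, activation=clipped_linear):
    """Forward pass through one hidden layer and a single output node.

    hidden_weights holds one weight list per hidden node (fully connected
    to the inputs); output_weights holds one weight per hidden node.
    Returns the output value and the list of hidden node values.
    """
    hidden = [activation(weighted_sum(inputs, w)) for w in hidden_weights]
    y = activation(weighted_sum(hidden, output_weights))
    return y, hidden


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]                    # inputs X1..X3
    w_hidden = [[0.1, 0.1, 0.1]] * 5       # five hidden nodes, hypothetical weights
    w_out = [0.2] * 5                      # one output node
    y1, hidden_values = forward(x, w_hidden, w_out)
    print(y1, hidden_values)
```

The layer sizes mirror FIG. 1 (three inputs, five hidden nodes, one output); the bounded activation keeps every node value within a limited dynamic range.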
In the human brain, information is passed between the neurons in the form of electrical stimulation along the dendrites. If a certain amount of stimulation is received by a neuron, it generates an output to all other connected neurons and so information takes its way to its destination where some reaction will occur. If the incoming stimulation is too low, no output is generated by the neuron and the information's further transport will be blocked. A further description of the operation of neural networks may be found in S. Haykin, Neural Networks: A Comprehensive Foundation. Macmillan, NY, 1994, incorporated herein by reference.
Neural nets try to simulate the human brain's ability to learn. Unlike the biological model, an artificial neural network generally has an unchangeable structure, built of a specified number of neurons and a specified number of connections between them, which have certain values (weights). What may change during the learning process are the values of those weights. When incoming information exceeds the specified threshold value of a neuron, that neuron passes the information on to connected neurons; otherwise it prevents further transport along the weighted connections. The value of a weight will be increased if information should be transported and decreased if not. While learning different inputs, the weight values are changed dynamically until their values are balanced, so each input will lead to the desired output.
The training of a neural net results in a matrix that holds the weight values between the neurons. Once a neural net has been trained correctly, it will be able to find the desired output for a given input that has been learned, by using these matrix values.
For control engineers, the approximation capability of artificial neural networks is usually used for system identification. However, very little has been reported about the use of Artificial Neural Networks in closed loop controllers that yield guaranteed performance. A problem with the use of Artificial Neural Networks in control applications is the uncertainty of how to initialize the Artificial Neural Network weights, which leads to the necessity of offline tuning. See, e.g., T. Yamada, Remarks on a learning type self-tuning neural network controller, Int. Conf. Adv Robotics, 1993, pp 43-48, and F. C. Chen, Adaptive Control of nonlinear systems using Neural networks, IEEE Control Systems Magazine, vol 55 no. 6, pp. 1299-1317, 1992, both of which are incorporated herein by reference.
Additionally, the processing requirements of Artificial Neural Networks for a voice coil motor control system have, until recently, been prohibitive. New high-performance 32-bit processors such as the ARM7 and ARM9 logic cores used in the Cirrus Logic 3Ci platform, the industry's most-advanced mixed-signal system-on-a-chip for magnetic hard disk drives, are playing a major role in enabling the recent developments in motion control. With these processors and associated support peripherals, sophisticated algorithms from modern control theory, nonlinear control and AI control can be implemented in real time, and the effective control bandwidth can be pushed to higher limits.
Traditional controller design usually involves complex mathematical analysis and yet has many difficulties in controlling highly nonlinear plants. These nonlinearities are due in part to VCM bearing friction and electro-mechanical properties of the motor and power driver. The use of the learning ability of a neural network helps control design to be rather flexible, especially where plant dynamics are complex and highly nonlinear. This is a distinct advantage over traditional methods.
Artificial Neural Network voice coil motor controllers are known in the art (e.g., Khan, U.S. Pat. No. 5,471,381, issued Nov. 28, 1995 and incorporated herein by reference). However, such conventional Artificial Neural Network controllers may have to be trained such that they learn the characteristics of the plant or the plant and controller combination. Learning typically occurs by example through training, or exposure to a training set of input/output data where the learning algorithm iteratively adjusts the connection weights.
These connection weights store the knowledge necessary to solve specific problems. However, the use of Artificial Neural Networks in a control system application has limitations with respect to its practicality. The general learning phase as illustrated in FIG. 2 is very broad. To achieve maximal generalized learning, rigorous learning in the form of using many sets of training patterns may be required.
In order to train a neural network to perform some task, the weights of each unit may need to be adjusted in such a way that the error between the desired output and the actual output is reduced. This process requires that the neural network compute the error derivative of the weights (EW). In other words, it must calculate how the error changes as each weight is increased or decreased slightly. The back propagation algorithm is the most widely used method for determining the EW.
The back-propagation algorithm is easiest to understand if all the activation functions in the network are linear. The algorithm computes each EW by first computing the EA, the rate at which the error changes as the activity level of a unit is changed. For output units, the EA is simply the difference between the actual and the desired output.
To compute the EA for a hidden unit in the layer just before the output layer, all the weights between that hidden unit and the output units to which it is connected are first identified. Those weights are then multiplied by the EAs of those output units, and the products are added. This sum equals the EA for the chosen hidden unit.
After calculating all the EAs in the hidden layer just before the output layer, the EAs for other layers may be computed in like fashion, moving from layer to layer in a direction opposite to the way activities propagate through the network. This is what gives back propagation its name. Once the EA has been computed for a unit, it is straightforward to compute the EW for each incoming connection of the unit. The EW is the product of the EA and the activity through the incoming connection.
Note that for non-linear units, the back-propagation algorithm includes an extra translation step. Before back-propagating, the EA must be converted into the EI, the rate at which the error changes as the total input received by a unit is changed.
Due to the complexities of off-line training and weight initialization, it would therefore be desirable to have an Artificial Neural Network based control system that does not require this specialized learning phase and that remains stable when the weights are initialized to zero. Thus, a need exists in the Prior Art for an artificial neural network voice coil motor controller system which does not require a learning phase in order to operate properly.
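The EA and EW computations described above can be sketched for the simplest case, in which every unit is linear, so the EA-to-EI conversion is the identity and is omitted. This is an illustrative sketch under that assumption; the function name `backprop_linear` and the example weights are hypothetical.

```python
def backprop_linear(inputs, hidden_weights, output_weights, desired):
    """Compute the EW terms for a network with one hidden layer of linear units.

    hidden_weights[j][i] connects input i to hidden unit j.
    output_weights[k][j] connects hidden unit j to output unit k.
    Returns (ew_hidden, ew_output) with the same shapes as the weights.
    """
    # Forward pass with linear activations.
    hidden = [sum(x * w for x, w in zip(inputs, ws)) for ws in hidden_weights]
    outputs = [sum(h * w for h, w in zip(hidden, ws)) for ws in output_weights]

    # EA for output units: actual output minus desired output.
    ea_out = [y - d for y, d in zip(outputs, desired)]

    # EA for each hidden unit: sum over connected output units of
    # (weight to that output unit) * (EA of that output unit).
    ea_hidden = [
        sum(output_weights[k][j] * ea_out[k] for k in range(len(outputs)))
        for j in range(len(hidden))
    ]

    # EW: EA of the receiving unit times the activity on the incoming connection.
    ew_output = [[ea_out[k] * h for h in hidden] for k in range(len(outputs))]
    ew_hidden = [[ea_hidden[j] * x for x in inputs] for j in range(len(hidden))]
    return ew_hidden, ew_output


if __name__ == "__main__":
    ew_h, ew_o = backprop_linear(
        inputs=[1.0, 2.0],
        hidden_weights=[[0.5, 0.0], [0.0, 0.5]],
        output_weights=[[1.0, 1.0]],
        desired=[1.0],
    )
    print(ew_h, ew_o)
```

A gradient-descent learner would then move each weight a small step in the direction opposite to its EW. For non-linear units, each EA would first be scaled by the slope of the unit's activation function to obtain the EI.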
An Artificial Neural Network control system is derived using VCM control techniques. This means that the neural network weights are tuned online, implying that no offline learning is needed. Additionally, constant stimulus of the plant is not required for the Artificial Neural Network to learn online as is true for some adaptive systems as well as Real Time Recurrent Learning. The control system ensures good performance during the initial period if the neural network weights are initialized to zero. Tracking performance is guaranteed using a Lyapunov approach, even though no ideal weights exist for which the neural net perfectly reconstructs the required nonlinear function.
The control system is comprised of an Artificial Neural Network incorporated into a VCM dynamical system, where the structure comes from some error notations standard in VCM control. Unlike adaptive VCM control, where a regression matrix must be computed from the dynamics of the structure, the basis functions for the Artificial Neural Network controller can be derived from the physics of the VCM structure.
The present invention uses an Artificial Neural Network controller in solving the tracking problems associated with the nonlinear properties of the voice coil motor of a magnetic disk drive. The present invention provides a method and apparatus for training an Artificial Neural Network based controller in the normal mode of operation (i.e., without the need for a specialized learning phase). The resulting Artificial Neural Network control system weight update algorithm overcomes many of the inherent difficulties associated with the standard Artificial Neural Network control architectures.
The approach of the present invention uses the error states of the system to modify the weights. As the error states approach zero, the weight changes approach zero. The present invention is particularly novel when compared to conventional supervised learning methods which use training sets of input-output pairs and gradient-descent methods for weight update algorithms. Examples of such conventional supervised learning methods are disclosed, for example, in B. Kosko, Neural Networks and Fuzzy Systems, 1992, and C. S. George Lee, Neural Fuzzy Systems, 1996, both of which are incorporated herein by reference.
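The qualitative behavior described above, in which weight changes shrink as the error states shrink, can be illustrated with a generic error-driven update. The sketch below is not the update law of the invention, which this section does not state; the rule (weight change proportional to basis function output times tracking error), the function name, the gain, and the time step are all assumptions made for illustration.

```python
def update_weights(weights, basis, error, gain=0.1, dt=1.0):
    """One online update step of a generic error-driven rule (hypothetical).

    Each weight moves by gain * basis_output * error * dt, so the change
    vanishes when the error is zero. No training set is involved.
    """
    return [w + gain * b * error * dt for w, b in zip(weights, basis)]


if __name__ == "__main__":
    w = [0.0, 0.0]                 # weights initialized to zero
    phi = [1.0, 2.0]               # hypothetical basis function outputs
    for e in (0.5, 0.25, 0.0):     # error states decaying toward zero
        w = update_weights(w, phi, e)
        print(e, w)
```

With a zero error the weights are returned unchanged, so a rule of this shape stops adapting once tracking is achieved.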
Additionally, the control system of the present invention is stable when the initial weights are set to zero.