The present invention relates to a method for controlling and preconfiguring a steelworks or parts of a steelworks. In this context, the term parts of a steelworks is intended to mean rolling mill trains, rolling stands, continuous or strip casting systems and units for heat treatment or cooling.
The present invention also relates to a method for controlling and/or preconfiguring a rolling stand or a rolling mill train for rolling a strip, the rolling stand or the rolling mill train being controlled and/or preconfigured by means of a model of the rolling stand or the rolling mill train, the model having at least one neural network whose parameters are matched or adapted to the actual conditions in the rolling stand or in the rolling mill train, in particular to the properties of the strip.
In order to control and preconfigure rolling stands or a rolling mill train for rolling a strip, models may be used which have at least one neural network whose parameters are matched or adapted to the actual conditions in the rolling stand or in the rolling mill train, in particular to the properties of the strip. Model-assisted control or preconfiguration of this type is possible in particular for applications as described in DE 41 31 765, EP 0 534 221, U.S. Pat. No. 5,513,097, DE 44 16 317, U.S. Pat. No. 5,600,758, DE 43 38 608, DE 43 38 615, DE 195 22 494, DE 196 25 442, DE 196 41 432, DE 196 41 431, DE 196 42 918, DE 196 42 919 and DE 196 42 921. If they are adapted on-line, neural networks for these applications are adapted at a constant adaptation rate. This means that the error function is calculated for each strip which is rolled. The gradient of this error function is then determined and, in the sense of a gradient optimization, the error function is reduced by a step of the chosen adaptation rate. It has been shown that on-line adaptation, the term being intended to mean the adaptation of a neural network on the basis of each strip which is rolled, significantly improves the quality of rolled steel. Difficulties are, however, encountered with the reliability of convergence during the adaptation. If incorrect control or deficient preconfiguration arises because of deficient adaptation or a malfunction, this may lead to large losses for the operator on account of inferior rolled steel or damage to the rolling mill train. Furthermore, because of the high investment costs of a rolling mill train, downtimes are very expensive. This being the case, the adaptation of neural networks for the control or preconfiguration of rolling stands or rolling mill trains is problematic.
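The constant-rate on-line adaptation described above can be sketched as a single gradient step per rolled strip. The quadratic error function, the linear model and all numerical values below are illustrative assumptions, not the implementation used in the cited applications:

```python
import numpy as np

def constant_rate_update(weights, grad_error, rate=0.01):
    """One on-line gradient step at a fixed adaptation rate.

    weights    : current network weights (1-D array)
    grad_error : gradient of the per-strip error function w.r.t. the weights
    rate       : the constant adaptation rate of the conventional approach
    """
    return weights - rate * grad_error

# Illustrative use with the error E(w) = (w . x - t)^2 for one strip
x = np.array([1.0, 2.0])          # measured strip properties (assumed)
t = 3.0                           # observed target, e.g. rolling force
w = np.array([0.5, 0.5])
grad = 2.0 * (w @ x - t) * x      # dE/dw for the quadratic error
w_new = constant_rate_update(w, grad, rate=0.1)
```

With a fixed rate, every strip moves the weights by the same fraction of the gradient, regardless of whether the data point is typical, rare or corrupted; this is exactly the weakness that motivates the variable-rate method below.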
An object of the present invention is to provide a method for making the control or preconfiguration of a steelworks or parts of a steelworks more reliable. It is furthermore desirable to improve the accuracy of the model values determined by means of a neural network.
The object is achieved according to the invention by providing a method in which the rate at which the parameters are matched or adapted to the actual conditions in the rolling stand or in the rolling mill train, in particular to the properties of the strip, is varied. It is in this way possible, for example, to distinguish whether the neural network has already properly mastered the function to be approximated at the corresponding point, whether the data point belongs to an infrequent event, that is to say to a steel which is rarely rolled, or whether, because of a measuring error or an error in the subsequent calculation, the data point to be trained is in fact completely unusable. This leads to much more robust adaptation. In an advantageous embodiment of the present invention, the rate at which the parameters are matched or adapted to the actual conditions in the rolling stand or in the rolling mill train, in particular to the properties of the strip, is varied as a function of the information density, in particular of the density of training data pertaining to strips of the same or a similar type.
The information density D is in this case an (abstract) measure of how much information is present at a given point in the input space (typically, how many strips of the same or a similar quality have already been rolled). An illustrative embodiment for a definition of the information density is

$$D(x_n) = \sum_{k=1}^{\mathrm{sizenet}} b_k(x_n)\, D_k(x_n)$$
D(x_n) is the estimate of the information density for the point x_n after processing all the patterns x_1 to x_{n-1}. b_k(x_n) is the activity of the k-th neuron in the hidden layer or layers of the neural network on application of the pattern x_n. D_k(x_n) is the estimate of the local information density at the site of the k-th neuron after processing all patterns x_1 to x_{n-1}. sizenet corresponds to the number of neurons in the hidden layer or layers of the neural network. b_k is calculated from

$$b_k(x_n) = \exp\!\left(-\tfrac{1}{2}\,(x-\mu)^T\, \Sigma^{-1}\, (x-\mu)\right)$$

with

$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \qquad
\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix}
\qquad \text{and} \qquad
\Sigma^{-1} = \begin{pmatrix}
\frac{1}{\sigma_1^2} & 0 & \cdots & 0 \\
0 & \frac{1}{\sigma_2^2} & & \vdots \\
\vdots & & \ddots & \\
0 & \cdots & & \frac{1}{\sigma_n^2}
\end{pmatrix}$$
μ_i being the expected value and σ_i² the variance of x_i.
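The activity b_k can be sketched as a Gaussian radial basis function with a diagonal covariance matrix. The sketch below normalizes the activities so that they sum to one over all neurons, which the text assumes further below when summing b_k over the network; all names and numerical values are illustrative assumptions:

```python
import numpy as np

def activities(x, centers, sigmas):
    """Normalized Gaussian activities b_k(x) of the hidden-layer neurons.

    centers : (sizenet, dim) array of neuron centres (the mu vectors)
    sigmas  : (sizenet, dim) array of per-component standard deviations,
              i.e. one diagonal covariance matrix per neuron
    """
    # Squared Mahalanobis distance to each centre with diagonal Sigma
    d2 = (((x - centers) / sigmas) ** 2).sum(axis=1)
    raw = np.exp(-0.5 * d2)
    return raw / raw.sum()        # enforce sum_k b_k(x) = 1

# Two hidden neurons in a 2-D input space (illustrative values)
centers = np.array([[0.0, 0.0], [4.0, 4.0]])
sigmas = np.ones((2, 2))
b = activities(np.array([0.1, 0.0]), centers, sigmas)
```

A pattern close to a neuron's centre thus concentrates nearly all of the activity on that neuron, which is what makes b_k usable as a local weighting in the sums that follow.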
D_k(x_n) is calculated as

$$D_k(x_n) = \frac{I_k(x_n)}{I(x_n)}$$
I_k(x_n) is the information accumulated locally at the k-th neuron of the hidden layer or layers of the neural network over the entire history of all patterns x_1 to x_{n-1}; I(x_n) is the information correspondingly accumulated in the network as a whole. I_k(x_n) is calculated as

$$I_k(x_n) = \sum_{x' \in \{x_1, \ldots, x_{n-1}\}} b_k(x')\, f\big(E(x'),\, \eta(x')\big)$$
f is a function of the prognosis error E(x′) (see below) and the learning rate η(x′). It takes into account that patterns which were learned in the past with only a low learning rate contribute only a small amount of information. In the simplest case, it would be possible to set

$$f = 1 \quad \forall\, x' \in \{x_1, \ldots, x_{n-1}\}$$
For I(x_n),

$$I(x_n) = \sum_{k=1}^{\mathrm{sizenet}} I_k(x_n) = \sum_{x' \in \{x_1, \ldots, x_{n-1}\}} f\big(E(x'),\, \eta(x')\big)$$

since, for all x′ ∈ {x_1, …, x_{n-1}},

$$\sum_{k=1}^{\mathrm{sizenet}} b_k(x') = 1$$
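The accumulation of I_k and the resulting density estimate D(x_n) can be sketched as follows. The history representation, the helper names and the threshold-free handling of an empty history are illustrative assumptions; with f ≡ 1 the simplest case from the text is recovered:

```python
import numpy as np

def gauss_activities(x, centers, sigmas):
    """Normalized activities b_k(x), summing to 1 over all neurons."""
    raw = np.exp(-0.5 * (((x - centers) / sigmas) ** 2).sum(axis=1))
    return raw / raw.sum()

def information_density(x_n, history, centers, sigmas,
                        f=lambda E, eta: 1.0):
    """Estimate D(x_n) = sum_k b_k(x_n) * I_k(x_n) / I(x_n).

    history is a list of (x', E(x'), eta(x')) tuples for the already
    processed patterns x_1 ... x_{n-1}.
    """
    I_k = np.zeros(len(centers))
    for x_p, E_p, eta_p in history:
        I_k += gauss_activities(x_p, centers, sigmas) * f(E_p, eta_p)
    I = I_k.sum()
    if I == 0.0:
        return 0.0            # no strip processed yet: no information
    return float(gauss_activities(x_n, centers, sigmas) @ (I_k / I))

# All strips seen so far lie near the first neuron's centre
centers = np.array([[0.0], [5.0]])
sigmas = np.ones((2, 1))
history = [(np.array([0.1]), 0.2, 0.01), (np.array([-0.1]), 0.3, 0.01)]
d_near = information_density(np.array([0.0]), history, centers, sigmas)
d_far = information_density(np.array([5.0]), history, centers, sigmas)
```

A query point in the well-populated region of the input space thus receives a density close to one, while a point far from every previously rolled strip receives a density close to zero.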
In a further particularly advantageous embodiment of the present invention, the rate at which the parameters are matched or adapted to the actual conditions in the rolling stand or in the rolling mill train, in particular to the properties of the strip, is varied as a function of the expected error, in particular the average error over the entire adaptation phase or the average error over a long time interval during the adaptation.
The expected error F is, for example, the average error over the entire history at the point x_n in the input space. It may, for example, be of the following form:

$$F(x_n) = \sum_{k=1}^{\mathrm{sizenet}} b_k(x_n)\, F_k(x_n)$$
F_k(x_n) being the local expected error for the n-th pattern at the k-th neuron of the hidden layer of the neural network. F_k(x_n) is given, for example, as

$$F_k(x_n) = \frac{\displaystyle\sum_{x' \in \{x_1, \ldots, x_{n-1}\}} b_k(x')\, E(x')\, f\big(E(x'),\, \eta(x')\big)}{I_k(x_n)}$$

Through multiplication of the error E(x′) by b_k(x′), the numerator contains a measure of the local error. This local error is divided by the local information density.
A further approach for calculating the expected error is for the calculation to be carried out in the form of local statistics, in which not only the average of the local error but also its variance are taken into account.
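The expected error can be accumulated pattern by pattern in the same fashion as the information density. The sketch below keeps per-neuron sums of the weighted errors and of the collected information, so that F_k is simply their quotient; all names and values are illustrative assumptions, with f ≡ 1 as the simplest case:

```python
import numpy as np

def gauss_activities(x, centers, sigmas):
    """Normalized activities b_k(x), summing to 1 over all neurons."""
    raw = np.exp(-0.5 * (((x - centers) / sigmas) ** 2).sum(axis=1))
    return raw / raw.sum()

def expected_error(x_n, history, centers, sigmas, f=lambda E, eta: 1.0):
    """F(x_n) = sum_k b_k(x_n) F_k(x_n), where F_k is the locally
    accumulated, activity-weighted error divided by I_k(x_n)."""
    sizenet = len(centers)
    num = np.zeros(sizenet)     # sum_x' b_k(x') E(x') f(E, eta)
    I_k = np.zeros(sizenet)     # sum_x' b_k(x') f(E, eta)
    for x_p, E_p, eta_p in history:
        b = gauss_activities(x_p, centers, sigmas)
        num += b * E_p * f(E_p, eta_p)
        I_k += b * f(E_p, eta_p)
    # Avoid division by zero for neurons that have seen no information
    F_k = np.where(I_k > 0.0, num / np.maximum(I_k, 1e-30), 0.0)
    return float(gauss_activities(x_n, centers, sigmas) @ F_k)

# With a single neuron, F reduces to the plain average of past errors
centers, sigmas = np.array([[0.0]]), np.ones((1, 1))
history = [(np.array([0.0]), 1.0, 0.01), (np.array([0.0]), 3.0, 0.01)]
F = expected_error(np.array([0.0]), history, centers, sigmas)
```

Extending this with running local variances, as mentioned above, would only require accumulating a third per-neuron sum of squared errors.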
In a further advantageous refinement of the invention, the rate at which the parameters are matched or adapted to the actual conditions in the rolling stand or in the rolling mill train, in particular to the properties of the strip, is varied as a function of the current error during the adaptation, i.e., the current error between the conditions in the rolling stand and/or in the rolling mill train, in particular the properties of the strip, determined by means of the neural network and the actual conditions.
The current error E is, for example, the Euclidean or another distance between the network prediction, i.e., the value determined by means of the neural network, and the actual value. The squared Euclidean distance, which is advantageously used as the current error E, is defined as

$$E = \big(y_n(x_n, w) - t_n(x_n)\big)^2$$

x_n being the input variable or variables of the network, y_n(x_n, w) being the output variable of the neural network, for example the rolling force, for a pattern x_n as a function of the network weights w, and t_n(x_n) the actual value corresponding to y_n(x_n, w). The index n corresponds to the chronological sequence of the training patterns.
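As a minimal numerical illustration of this definition (the prediction and the measured value are assumed):

```python
def current_error(y_pred, t_actual):
    """Squared Euclidean distance between network prediction and
    the actually measured value."""
    return (y_pred - t_actual) ** 2

# A rolling-force prediction of 10.5 MN against a measured 10.0 MN
e = current_error(10.5, 10.0)
```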
According to the invention, a case distinction is drawn for at least one of the three variables: information density, expected error and current error. In a particularly advantageous embodiment of the present invention, a distinction is made between a normal case (well-trained network), an unusual case (typically a very infrequently rolled steel, for example coin steel), an aberrant data point (for example due to the failure of a measuring sensor) and an unstable process (for a very similar type of steel, the target value fluctuated considerably in the past). The degree of adaptation of the network is chosen in accordance with this case distinction, as shown by Table 1. In this case, ↑ indicates high, ↓ indicates low (possibly equal to zero) and → indicates medium.
If the information density is high, the expected error is low and the current error is low, then a well-trained network is assumed and the adaptation rate is kept at a medium value. If the information density and the current error are low, then it is assumed that the neural network achieves good generalization in the case of an infrequent kind of steel, i.e., an unusual case; the adaptation rate is likewise kept at a medium value. If, however, the current error is high while the information density is low, then the adaptation rate is increased. A combination in which the information density and the current error are high, but the expected error is low, is interpreted as an aberrant data point, and the adaptation rate is accordingly reduced or no adaptation takes place at all. If both the information density and the expected error are high, this is assessed as an indication of an unstable adaptation process, and the adaptation is terminated.
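The case distinction just described can be sketched as a simple decision rule. The thresholds and the concrete rate values below are illustrative assumptions, since the text leaves their choice open:

```python
def adaptation_rate(density, expected_err, current_err,
                    density_high=0.5, error_high=1.0,
                    medium_rate=0.01, increased_rate=0.1):
    """Choose the adaptation rate according to the case distinction.

    Returns a (rate, case) pair; a rate of 0.0 means that no
    adaptation takes place or that adaptation is terminated.
    """
    dens_high = density >= density_high
    exp_high = expected_err >= error_high
    cur_high = current_err >= error_high
    if dens_high and exp_high:
        return 0.0, "unstable process: adaptation terminated"
    if dens_high and cur_high:
        return 0.0, "aberrant data point: no adaptation"
    if not dens_high and cur_high:
        return increased_rate, "unusual case with poor generalization"
    return medium_rate, "normal or well-generalized unusual case"
```

The ordering of the tests matters: an unstable process (high density, high expected error) must be recognized before the aberrant case, since both involve a high information density.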
The method according to the present invention may be used in conjunction with the applications described in DE 41 31 765, EP 0 534 221, U.S. Pat. No. 5,513,097, DE 44 16 317, U.S. Pat. No. 5,600,758, DE 43 38 608, DE 43 38 615, DE 195 22 494, DE 196 25 442, DE 196 41 432, DE 196 41 431, DE 196 42 918, DE 196 42 919 and DE 196 42 921.