European Published Patent Application No. 0 399 491 further describes the establishment of a communications connection between the first logical component and the second logical component, and the transmission of an information message across the established communications connection from the second logical component to the first logical component. The information message contains information regarding the current application status of the second logical component. The application also describes a comparison, in the first logical component, of the transmitted information with corresponding information available at the first logical component.
The present invention can be applied in wide areas of communications technology, but it is explained in greater detail, by way of example, only in relation to a CAN network system for use in the automotive area.
In systems of this type installed in a motor vehicle, it often happens that only individual components of the network system carry out a restart, i.e., a reset, in response to a failure or fault. A typical trigger for a reset of this type is the detection of an undervoltage condition, which can vary from component to component.
From this arises the fundamental problem of defining specific communications mechanisms between the components, in order to achieve a restarting of the distributed applications as rapidly as possible. In this context, it is generally advantageous to reproduce or coordinate the state of the entire system before the reset. This is of particular significance for systems that have user interaction, for example via a display and operating element, in order to minimize the disturbances triggered by resets during operation, such as images on a display that are no longer current, delays in the servicing of components, lost inputs, etc.
The problem underlying the present invention thus generally lies in effectively coordinating, in a network having distributed applications, the network components that correspond to the applications. This applies specifically to a restart after a failure in individual components.
In general, the following four cases must be detected and treated appropriately.
i) “Normal” start-up of all components (system start).
ii) Simultaneous failure in all components after system start, e.g., as a result of an undervoltage for all components in the network.
iii) The failure in a partial system after system start, i.e., a failure condition in one or a plurality of components.
iv) A hardware reset after a first power-on or fatal failure.
First, the specific problem of a reciprocal coordination or synchronization of distributed applications and the logical components that correspond to the latter will be explained in greater detail.
In systems in which components communicate with each other via a network, for example, a bus system, a basic prerequisite for the method described here is a separation of communication and application within one component (see ISO 7498, Information Processing Systems - Open Systems Interconnection Basic Reference Model, 1984).
The systems at issue here communicate via a network, e.g., a bus system, with communication and application being defined as follows.
Communication denotes all functions that are required for the purpose of reliable data exchange with other components. Typically, a stratification is used that is derived from the OSI model of the ISO (see above), i.e., the conversion from the physical layer to the application interface. In the current approaches in the automotive area, only a subset of the OSI model is used, adjusted to the requirements of this application area, i.e., some layers remain “empty.” Furthermore, as an expansion of the OSI model, a network management that spans the different layers is usually used, synchronizing the different components with respect to communication.
Application denotes the specific task of each component, e.g., the functionality of the CD player or the various functions of a car telephone.
In systems having at their disposal logical point-to-point (1:1) connections between the individual components, the “commands” of the application of a component A are relayed to component B via a connection of this type, component B then responding, e.g., with an application acknowledgment. An example is the use of an operating element to start playback on a CD changer.
Advantageously, these 1:1 connections of the application plane are reproduced in the 1:1 connections of the transport layer (layer 4 in the OSI model). While the 1:1 transport connections can be set up or, in the case of failure, reset by both participating components, which corresponds to a symmetrical method, this does not apply to the application plane. Here, for example, only one component A—the “master”—is entitled to control one component B—the “slave.” This is especially true for switching the main states of the slave, for example, “on” and “off.”
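The asymmetry described above can be illustrated by the following minimal sketch. It is not taken from any cited application; the class names `Component` and `Connection`, the method names, and the state strings are assumptions introduced only for illustration. The sketch shows that either endpoint may set up the transport connection, whereas only the master may switch a main state of the slave.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Component:
    """Hypothetical network component with a single main state."""
    name: str
    main_state: str = "off"


@dataclass(eq=False)
class Connection:
    """Hypothetical 1:1 logical connection with fixed application roles."""
    master: Component
    slave: Component
    transport_up: bool = False

    def _is_endpoint(self, component: Component) -> bool:
        return component is self.master or component is self.slave

    def set_up_transport(self, initiator: Component) -> None:
        # Symmetrical: either endpoint may set up (or reset) the transport connection.
        if not self._is_endpoint(initiator):
            raise ValueError("initiator is not an endpoint of this connection")
        self.transport_up = True

    def switch_main_state(self, initiator: Component, new_state: str) -> None:
        # Asymmetrical: only the master may switch the main state of the slave.
        if initiator is not self.master:
            raise PermissionError("only the master may switch the slave's main state")
        if not self.transport_up:
            raise ConnectionError("transport connection is not set up")
        self.slave.main_state = new_state


panel = Component("operating element")
changer = Component("CD changer")
link = Connection(master=panel, slave=changer)

link.set_up_transport(changer)        # the slave may open the transport connection
link.switch_main_state(panel, "on")   # but only the master may switch "on"/"off"
```

In this sketch, a call such as `link.switch_main_state(changer, "off")` raises `PermissionError`, reflecting that the slave is not entitled to control its own main state over the connection.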
Advantageously, a system of this type is based on network management, which also distinguishes between master and slave functionality. In this case, an “application master” is usually also simultaneously a “network management master.” In addition, however, systems are possible that, in network management, recognize only equal-access stations, so that the master/slave distinction is limited to the application plane. A typical example of the latter network management method is “decentralized network management” on CAN networks in the chassis area of the automotive industry.
Unless otherwise expressly stated, in what follows the designations “master” and “slave” always apply to the application plane.
Therefore, a relatively simple system is composed of one master and at least one slave, employing in each case a 1:1 connection between the master and each slave. Of course, more complex systems can be constructed that are composed of a plurality of master components having the same or even different slaves. In this context, however, the assumption is that for every logical connection it is unambiguously established which component is the master and which component is the slave. In this manner, a hierarchical system can then be formed out of master, sub-master(s), and slaves, as is described in German Published Patent Application No. 196 373 12.
Usually, the master is largely responsible for the coordination of network-wide applications. For the slave, it is sufficient to detect the reset of the master, for example, via a network management service.
For the slave, this leads, for example, to the initiation of certain emergency functions or to an autonomous shutdown.
The master detects the reset of the slave component either through a cyclic querying of the slave status or through a fresh communication setup, initiated by the slave (e.g., a communication system having network management and a transport protocol in accordance with the prior applications, German Published Patent Application No. 41 31 133 or German Published Patent Application No. 196 373 12).
The above-mentioned approach is disadvantageous because the cyclic querying of the slave status by the master is cumbersome and communications-intensive, since usually no change of status is present. Furthermore, this querying mechanism is inflexible, since generally only components that have already been installed are queried.
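The communications overhead of cyclic querying can be made concrete with the following rough sketch. The function name `poll_slave`, the status strings, and the cost of two bus messages per query (one request, one response) are assumptions chosen for illustration and are not specified in the cited applications.

```python
def poll_slave(status_by_cycle):
    """Count bus messages and detected status changes for one polled slave.

    Assumption: each polling cycle costs one request and one response.
    """
    messages = 0
    changes = 0
    last = None
    for status in status_by_cycle:
        messages += 2  # request from the master plus response from the slave
        if last is not None and status != last:
            changes += 1  # the master notices a change only when it polls
        last = status
    return messages, changes


# 100 polling cycles in which the slave resets exactly once, halfway through.
timeline = ["playing"] * 50 + ["initialized"] * 50
messages, changes = poll_slave(timeline)
print(messages, changes)  # prints "200 1"
```

Under these assumptions, 200 messages are exchanged in order to observe a single change of status, which illustrates why the querying is described as communications-intensive.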
Finally, a fresh communication setup by the slave, as an alternative to the above cyclic querying mechanism, brings with it the disadvantage that, except for the information that the communication has again been established, the master receives no messages concerning the cause of the reset and/or its prehistory, i.e., by way of example, the previous application status.
The type of local reset can be detected by the master, e.g., through the entry of local status information into a non-volatile memory, such as an EEPROM, and through evaluation of the entry at a subsequent restart.
If, after a reset, the master, for example, discovers the entry “system started and in normal operation,” then it can infer a restart as a result of a failure condition. On the other hand, if “system is shut down” is entered, then a normal startup has occurred. In the case of a failure, the problem is the limited memory capacity of the master.
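The evaluation of the stored entry can be sketched as follows. This is a minimal illustration only: the class `NonVolatileStore` merely stands in for an EEPROM cell, and the function names, the enumeration, and the assumed initial entry of a new device are hypothetical.

```python
from enum import Enum

RUNNING = "system started and in normal operation"
SHUT_DOWN = "system is shut down"


class StartType(Enum):
    NORMAL_STARTUP = "normal startup"
    RESTART_AFTER_FAILURE = "restart after failure"


class NonVolatileStore:
    """Stand-in for an EEPROM cell whose content survives a reset."""

    def __init__(self, entry=SHUT_DOWN):
        # Assumption: a new device starts with the "shut down" entry.
        self.entry = entry


def classify_start(store):
    """Evaluate the stored entry at startup, then mark the system as running."""
    if store.entry == RUNNING:
        # No orderly shutdown was recorded, so a failure must have occurred.
        start = StartType.RESTART_AFTER_FAILURE
    elif store.entry == SHUT_DOWN:
        start = StartType.NORMAL_STARTUP
    else:
        raise ValueError(f"unexpected entry: {store.entry!r}")
    store.entry = RUNNING
    return start


def orderly_shutdown(store):
    """Record a regular shutdown before power is removed."""
    store.entry = SHUT_DOWN


store = NonVolatileStore()
first = classify_start(store)    # normal startup
second = classify_start(store)   # reset without shutdown: failure restart
orderly_shutdown(store)
third = classify_start(store)    # normal startup again
```

The sketch stores only a single local entry, which is consistent with the limited memory available in the master.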
In general, only the local status can be saved, since, for the status storage of all connected slaves, the time and/or the memory capacity is usually not sufficient. In addition, in the event of a failure-reset of the master, having recourse to status information concerning the slave stored, for example, in an EEPROM is risky, since the slave could also have carried out a reset and thus the stored slave state deviates from the current state.
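The risk of relying on a stored slave state can be shown with a short sketch. The classes `Slave` and `Master`, the attribute `saved_slave_state` (standing in for an EEPROM entry), and the state strings are assumptions made for illustration.

```python
class Slave:
    """Hypothetical slave whose application state is lost on reset."""

    def __init__(self):
        self.state = "initialized"

    def command(self, new_state):
        self.state = new_state

    def reset(self):
        self.state = "initialized"


class Master:
    """Hypothetical master that records the last commanded slave state."""

    def __init__(self):
        self.saved_slave_state = None  # stands in for a non-volatile entry

    def command(self, slave, new_state):
        slave.command(new_state)
        self.saved_slave_state = new_state


master, slave = Master(), Slave()
master.command(slave, "playing")

# An undervoltage resets the slave; the master also restarts and afterwards
# has only its stored entry to go on.
slave.reset()
assumed = master.saved_slave_state   # "playing"
actual = slave.state                 # "initialized"
stale = assumed != actual            # True: the stored state is out of date
```

In this scenario the master cannot tell from its stored entry alone that the slave has also been reset, which is why recourse to such an entry is described as risky.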
Since, in the above standard approaches, no detailed and stored status information concerning the slave is available, the master will generally restart or initialize the slave application.
Since, as a result, the previous settings generally must be reset, this signifies the loss of knowledge concerning the prehistory of the slave, i.e., the original operating state or status of the application can no longer be derived. Noticeable delays generally result from the restart of the slave application.