The present invention relates generally to microcontrollers. More specifically, the present invention relates to a method for verifying the proper operation of a microprocessor-based control module.
The use of electronics in automobiles is continually increasing. Many electronic applications include a microcontroller unit (MCU) that comprises a central processing unit (CPU) and associated peripheral devices. The peripheral devices may be specific or customized to the controller application. These can include communication devices such as serial peripheral interfaces, memory devices such as RAM, ROM/FLASH and EEPROM, timers, power supplies, A/D converters and other devices, either built on the same integrated circuit or as separate devices. The CPU and its peripheral devices are linked together by a communications bus.
An MCU dedicated to the control of one subsystem (such as anti-lock brakes, ABS) is said to be embedded in that subsystem. When the MCU is part of an application Electronic Control Unit (such as an ABS ECU) that contains interface circuits, for example to aid in the collection of data or to support high-current drive requirements, the combination may be referred to as an embedded controller. The method described is not limited in use to embedded controllers.
MCUs typically include self-tests to verify the proper operation of the CPU and the associated peripheral devices. Such self-tests typically detect illegal memory access decoding and illegal opcode execution, or perform a simple Watchdog/Computer Operating Properly (COP) test. A mission critical system requires more fault coverage than this. In a mission critical system, the correct operation of the CPU and of the MCU's peripherals (such as timer modules, A/D converters, and communication modules) is important for the satisfactory operation of the vehicle. Correct operation of the MCU must be established during the initialization phase following power-on, and during repetitive execution of the control program.
Allowing the device under test (such as the CPU) to test itself is a questionable practice. Test methods that execute while the application algorithm is running will be referred to as "On-Line" or "concurrent" testing. Further, "Off-Line" testing will refer to the condition in which the device is placed in a special mode that inhibits execution of the application algorithm. Off-line testing is used for manufacturing test or for special purpose test tools, such as those a technician might use in the field to run unique diagnostic tests. On-line, concurrent testing using redundant software techniques consumes throughput. The ability of the CPU to test its own instruction set with a practical number of test vectors is limited at best.
Manufacturing tests require tens of thousands of generated test vectors to establish a 99% fault detection level for complex microcontrollers. Writing routines that test the CPU's ability to execute the instructions used in the application, by running each instruction under test on sample data, is practically futile.
Consider a separate "Test ROM" included in the system to either:
1. Generate a special set of inputs and monitor the capability of the CPU and application algorithm or a test algorithm to properly respond.
2. Generate and inject test vectors derived from manufacturing fault detection testing and then evaluate the capability of the CPU to process them properly and produce the correct resultant data at circuit-specific observation points.
In a complex system, a test ROM would become inordinately large merely to guide the CPU through a limited number of paths or "threads" of the application algorithm. The data used must be carefully selected, which requires detailed knowledge of the MCU on the part of the test designer, and MCU suppliers rarely provide sufficient information to allow effective design. Thus the first test ROM method would be contrived and limited in its ability to simulate an actual operating environment. If the second technique were employed, the resulting tests would be partial and lengthy unless all of the manufacturing test vectors were used. If an attempt were made to isolate the portion of the system that was used and then target it with the proper vectors (to reduce the overall vector quantity), the subset of vectors would be subject to further scrutiny, and possible modification, every time the algorithm changed. The technique would have further implementation difficulties for continuous validation of the system in a dynamic run mode of operation. Nor does the technique consider monitoring a system based on execution "Dwell Time" in any particular software module or application "Run Time Mode" condition.
Modifying the CPU to have built-in self test, such as parity to cover the instruction set look-up table, or duplication or Totally Self-Checking (TSC) circuit designs of sub-components of the CPU, may require a significant modification to a basic cell design. CPU designers are reluctant to modify proven designs for limited applications.
Software techniques that involve time redundancy, such as calculating the same parameter twice via different algorithms, also require that multiple variables be used and assigned to different RAM locations and internal CPU special function registers. Thus time redundancy also requires hardware resource redundancy to be effective. Because redundancy requires a substantial amount of CPU execution time, the CPU needs excess capacity to accomplish the redundant calculations in a real time control application. Because of the added complexity of this implementation, the verification process is commonly long.
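As an illustration of this overhead only, the Python sketch below computes one parameter twice by two different algorithms (direct multiplication and shift-and-add multiplication, both chosen here as hypothetical examples) and accepts the result only when the two agree. Each call does the work twice and holds both results in separate variables, which is the execution-time and resource cost described above.

```python
class RedundancyFault(Exception):
    """Raised when the two redundant computations disagree."""


def scale_direct(x: int, k: int) -> int:
    """Primary path: ordinary multiplication."""
    return x * k


def scale_shift_add(x: int, k: int) -> int:
    """Redundant path: shift-and-add multiplication, using separate variables."""
    if k < 0:
        raise ValueError("k must be non-negative")
    result = 0
    shifted = x
    while k:
        if k & 1:
            result += shifted
        shifted <<= 1
        k >>= 1
    return result


def scale_checked(x: int, k: int) -> int:
    """Compute the same parameter twice and accept it only if both results agree."""
    primary = scale_direct(x, k)
    redundant = scale_shift_add(x, k)
    if primary != redundant:
        raise RedundancyFault(f"{primary} != {redundant}")
    return primary
```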
The process of requiring the CPU to perform the self-test is time consuming and inadequate, especially in applications having a relatively large memory and many peripheral devices. To date, the most direct way to solve this problem has been simply to place two microcontrollers in the system. In such systems, each microcontroller is the complement of the other, and each memory and peripheral module is duplicated. Both devices then execute the same code in near lock step. The technique is effective because it checks the operation of one microcontroller against the other. Although the system tests are performed with varied threads through the algorithm, variable dwell in any portion of the application, and the random-like data that occurs in the actual application environment, the following must be considered:
1. Data faults or hardware faults, when they occur, corrupt the values used to calculate system parameters. In a dual microcontroller system these parameters may be filtered before the second microcontroller compares them. Thus parametric faults are "second order" to the data or hardware faults that initially created them.
2. Parameters have to be carefully checked against tolerance ranges.
3. The number of miscompares between the two devices that must occur before a fault is logged and responded to must be established.
4. The fail-safe software is not independent from the application algorithm.
Because adding parameters modifies the application algorithm, the fail-safe software alterations must also be evaluated.
This technique is not an efficient form of resource allocation. Using two identical, fully equipped microcontrollers to perform the same task is expensive. Also, extensive communication software is required to synchronize the data between the two microcontrollers.
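The tolerance check and miscompare count of items 2 and 3 above can be sketched as follows. This is a minimal Python illustration, not an implementation from any particular system: the class name, the tolerance value, and the choice to count consecutive (rather than cumulative) miscompares are assumptions, since the text only states that these must be established.

```python
from dataclasses import dataclass


@dataclass
class CrossCheck:
    """Parameter comparison between a primary and a secondary microcontroller."""
    tolerance: float          # allowed |primary - secondary| difference
    miscompare_limit: int     # consecutive miscompares required before logging a fault
    consecutive: int = 0      # current run of out-of-tolerance comparisons
    fault_logged: bool = False

    def compare(self, primary: float, secondary: float) -> bool:
        """Compare one pair of parameters; return True once a fault has been logged."""
        if abs(primary - secondary) > self.tolerance:
            self.consecutive += 1
            if self.consecutive >= self.miscompare_limit:
                self.fault_logged = True   # latched: stays set once logged
        else:
            self.consecutive = 0           # an in-tolerance result resets the run
        return self.fault_logged
```

Each extra tuning parameter here (tolerance, limit, reset policy) is application-specific, which is part of why the fail-safe software in such systems is not independent of the application algorithm.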
Other dual microprocessor systems may use a smaller secondary processor to do a limited check of a few portions of the algorithm, or to accomplish a control flow analysis of the main controller to validate its execution from one module to the next or its ability to transfer to and return from subroutines. These schemes are inherently limited and can only detect a small subset of all possible system faults.
A common technique for verifying the operation of an MCU memory peripheral is to use a checksum. A checksum arithmetically sums the contents (for example, the bytes or words) of a block of memory. The checksum is then compared to a reference value for that block of memory at that point in the operation of the CPU. One disadvantage of checksums is that if two bits of equal weight in different words are flipped in opposite directions (one from 0 to 1 and the other from 1 to 0), the checksum remains correct. This is referred to as aliasing. This technique is also slow, and the memory may not be validated within the time response of the system.
Another technique for verifying the operation of MCU memory peripherals is to use parity. Single bit parity is faster than the checksum method described above, and it synchronizes memory validation with the memory's use in executing the application algorithm. However, it requires the memory array design to be modified, and it requires decoding by special hardware. The consequences of a parity fault must also be processed by the CPU. Single bit parity is also insensitive to double bit flips in a data byte. The failure to correctly detect data faults is known as aliasing.
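The aliasing weakness can be shown with a small 8-bit additive checksum, a simple example chosen here for illustration (practical systems may use wider sums). Flipping bit 2 from 0 to 1 in one byte and from 1 to 0 in another leaves the sum unchanged, while a single flipped bit is caught:

```python
def checksum8(block: bytes) -> int:
    """8-bit additive checksum: the sum of all bytes, modulo 256."""
    return sum(block) & 0xFF


reference = bytes([0x12, 0x34, 0x56, 0x78])

# Compensating fault: bit 2 (value 0x04) flips 0 -> 1 in byte 0 (0x12 -> 0x16)
# and flips 1 -> 0 in byte 1 (0x34 -> 0x30), so the sum is unchanged.
aliased = bytes([0x16, 0x30, 0x56, 0x78])

# A single-bit fault, by contrast, does change the sum.
single = bytes([0x16, 0x34, 0x56, 0x78])

print(hex(checksum8(reference)), hex(checksum8(aliased)), hex(checksum8(single)))
# 0x14 0x14 0x18
```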
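A minimal Python sketch of even parity over one byte (the function names are hypothetical) shows why a double bit flip escapes detection: flipping any two bits leaves the count of 1s with the same parity.

```python
def even_parity_bit(byte: int) -> int:
    """Parity bit that makes the total count of 1s (data plus parity) even."""
    return bin(byte & 0xFF).count("1") & 1


def parity_ok(byte: int, stored_parity: int) -> bool:
    """Recompute parity on read and compare it with the stored parity bit."""
    return even_parity_bit(byte) == stored_parity


data = 0b1010_0110             # four 1s, so the stored parity bit is 0
stored = even_parity_bit(data)

single_flip = data ^ 0b0000_0001   # one bit flipped: parity mismatch, detected
double_flip = data ^ 0b0000_0011   # two bits flipped: parity unchanged, missed

print(parity_ok(single_flip, stored), parity_ok(double_flip, stored))  # False True
```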
To circumvent the problem of adding special hardware to the CPU or software to the application, multiple bit parity schemes and standalone Error Detect and Correct (EDC) processors have been developed. The problem of modifying the memory array to include the extra parity bits still exists. In a typical application, 6 bits may be added to a 16-bit word. Using Hamming codes, this technique can detect and correct single bit errors, detect (but not correct) all double bit errors, and detect some triple bit errors.
In the automotive market, adding check bits to each word of a memory array is considered an excessive cost burden. The circuits involved are complex and add significant cost, although these systems can be integrated into the MCU bus architecture. The drawback to this scheme is that it is intrusive: all data must first be channeled through this device for processing before it is sent to the CPU, adding a delay to every memory read.
A small amount of configuration software is still needed to run these devices. If a two or three bit error is detected in the data, an interrupt must be handled to alert the CPU that the affected data is not valid.
Finally, these systems target memory only. The device described in this patent significantly reduces the possibility of aliasing. Further, the device and method described in this patent process the CPU instruction streams and detect faults in them. The device as described can ensure that selected software modules are processed by the CPU the same way each time they are executed. In this fashion, fault detection coverage is added to both the memory and the CPU in a single, non-intrusive, cost effective module.
It would be advantageous to verify the MCU memory in automotive applications at startup initialization and during operation of the vehicle. However, verifying that the memory is functioning properly, using either a continuous checksum or dual microcontrollers with synchronization and data communications software and parameter validation, may place such a burden on the CPU as to slow it to the point that it no longer functions as required. An alternative may be to upgrade the CPU system capacity to regain the necessary throughput, but providing additional capacity increases system cost.
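One common way to obtain the single-error-correct, double-error-detect behavior described above with 6 check bits on a 16-bit word is an extended Hamming code: 5 Hamming check bits at the power-of-two positions 1, 2, 4, 8 and 16 of a 21-bit codeword, plus 1 overall parity bit. The Python sketch below assumes that layout; actual EDC hardware may use a different check matrix (a Hsiao code, for example), and the function names are illustrative.

```python
from typing import Tuple

CODE_LEN = 21                        # 16 data bits + 5 Hamming check bits (positions 1..21)
CHECK_POSITIONS = (1, 2, 4, 8, 16)   # power-of-two positions hold Hamming check bits


def _is_data_position(pos: int) -> bool:
    return (pos & (pos - 1)) != 0    # not a power of two


def encode(word: int) -> int:
    """Return a 22-bit codeword: Hamming code in bits 1..21, overall parity in bit 0."""
    code, d = 0, 0
    for pos in range(1, CODE_LEN + 1):          # scatter the 16 data bits
        if _is_data_position(pos):
            if (word >> d) & 1:
                code |= 1 << pos
            d += 1
    for p in CHECK_POSITIONS:                   # each check bit covers positions with bit p set
        parity = 0
        for pos in range(1, CODE_LEN + 1):
            if (pos & p) and (code >> pos) & 1:
                parity ^= 1
        if parity:
            code |= 1 << p
    if bin(code).count("1") & 1:                # overall parity makes total weight even
        code |= 1
    return code


def extract(code: int) -> int:
    """Gather the 16 data bits back out of a codeword."""
    word, d = 0, 0
    for pos in range(1, CODE_LEN + 1):
        if _is_data_position(pos):
            if (code >> pos) & 1:
                word |= 1 << d
            d += 1
    return word


def decode(code: int) -> Tuple[int, str]:
    """Return (data word, status); status is 'ok', 'corrected', 'double' or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, CODE_LEN + 1):
        if (code >> pos) & 1:
            syndrome ^= pos                     # XOR of set positions is 0 for a valid word
    overall = bin(code).count("1") & 1
    if overall == 0:
        if syndrome == 0:
            return extract(code), "ok"
        return extract(code), "double"         # even number of flips: detect, do not correct
    if syndrome > CODE_LEN:
        return extract(code), "uncorrectable"  # odd flips pointing outside the word: 3+ errors
    code ^= 1 << syndrome                       # syndrome 0 means bit 0 (overall parity) flipped
    return extract(code), "corrected"
```

An odd-weight error whose syndrome points beyond position 21 can only arise from three or more flipped bits, so it is reported as uncorrectable; this is one way some triple bit errors are detected, while others are miscorrected as single errors.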
As mentioned before, providing a second microcontroller operating in parallel with the first is not very cost effective. This led to the development of a dual CPU system incorporated into a single microcontroller unit (MCU). In such a system, both CPUs operate from a common memory. The main function is to compare the operation of the extra CPU with that of the primary CPU in a functional comparison module. Because the two devices execute out of the same memory, only the step-by-step code execution of the dual CPUs is compared; if the data from the memory is corrupt, it will be discovered at a later step in the validation process. To demonstrate that the CPUs are healthy, both CPUs must respond to the same data in exactly the same way. The dual CPU system employs continuous cross-functional testing of the two CPUs as multiple paths are taken through the application algorithm. If the system dwells in one software module or mode disproportionately to others, the testing is weighted in the same proportion. Further, the random-like parameter data is "operated on" by the algorithm, and any inappropriate interaction with the current instruction data stream is detected. This technique is effective under all environmental conditions, such as temperature, voltage, or Electro-Magnetic Interference (EMI).
In essence, the actual algorithm and data execution become the test vectors used to ensure "critical functionality" of the system. This is a corollary to common test methods that are designed to detect "critical faults". The system tests only those resources that the software application algorithm utilizes, and does not spend any time testing unused portions of the MCU system. If the algorithm is modified to use a previously unused set of available instructions (such as a fuzzy logic instruction set), or new operational modes are added (such as ABS Adaptive Braking or Vehicle Yaw Control), no modification of the system is required.
The dual CPU fail-safe system architecture is inherently independent of the application algorithm. Also, the primary design intent of a dual CPU system is to respond to a fault on its first occurrence.
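A dual CPU functional compare module in this arrangement reacts to the first difference between the two CPUs instead of waiting for a count of miscompares. A minimal Python model of that comparison follows; in practice the comparison is performed in hardware on every bus cycle, and the BusCycle fields sampled here (address, data, read/write) are assumptions for illustration.

```python
from itertools import zip_longest
from typing import Iterable, NamedTuple, Optional


class BusCycle(NamedTuple):
    """What the compare module samples from one CPU on one bus cycle (assumed fields)."""
    address: int
    data: int
    write: bool


def first_miscompare(primary: Iterable[BusCycle],
                     checker: Iterable[BusCycle]) -> Optional[int]:
    """Return the index of the first cycle on which the two CPUs differ, else None."""
    for cycle, (a, b) in enumerate(zip_longest(primary, checker)):
        if a != b:          # a missing cycle (None) on either side also counts
            return cycle    # fault on first occurrence: no miscompare counter
    return None
```

Because the comparison is exact and unconditional, nothing in it depends on which parameters the application computes, which is the sense in which this architecture is independent of the application algorithm.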
Another disadvantage of previously known verification methods is that the increased complexity of both hardware and software degrades the reliability of dual MCU systems. Further, increased care must be taken to reduce EMI susceptibility.
In the dual CPU concept, successful testing of peripheral modules by the main CPU is predicated on its correct state of health (the ability of the CPU to execute the algorithm as intended) and on the "Built In Self Test" (BIST) circuits incorporated into the MCU peripheral modules. The job of the secondary CPU/Functional Compare Module is to guarantee the correct state of health of the main CPU. Then, as a secondary step, the main CPU methodically tests all subordinate peripherals by exercising or polling their unique BIST circuits.
This sequential scheme of first validating the CPU and then the MCU peripheral modules can be considered a "bootstrap" validation system. Because of the sequential nature of the bootstrap method, and because the scheme runs both at the initialization phase and during repetitive execution of the control program, the speed at which the CPU can detect faults in MCU support peripherals is essential. It is therefore advantageous to the execution speed of this method to incorporate peripheral BIST circuits that are independent of, and require minimal interaction with, the CPU.
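The sequential order of the bootstrap scheme can be sketched in Python as below. The callables standing in for the CPU health check and for each peripheral's BIST result are hypothetical placeholders; the point illustrated is only that peripherals are polled one at a time, and only after the CPU has been shown healthy, so every peripheral BIST that needs CPU interaction adds directly to the total validation time.

```python
from typing import Callable, Dict, List


def bootstrap_validate(cpu_healthy: Callable[[], bool],
                       peripheral_bists: Dict[str, Callable[[], bool]]) -> List[str]:
    """Validate the CPU first, then poll each peripheral BIST; return failed units."""
    if not cpu_healthy():
        # Peripheral results are untrustworthy if the CPU doing the polling is faulty,
        # so the sequence stops here.
        return ["CPU"]
    failed = []
    for name, bist_passed in peripheral_bists.items():   # sequential, one at a time
        if not bist_passed():
            failed.append(name)
    return failed
```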
It is therefore one object of the invention to provide a microcontroller unit capable of self testing in a time efficient manner.
Since a dual CPU system has a limited ability to detect corrupt data streams, a further object of this invention is to ensure, whenever possible, that the data streams the CPUs operate on are not corrupt. Further, the present invention eliminates the secondary step of memory peripheral validation (checksums) in the bootstrap process, returning throughput capacity to the CPU.
In one aspect of the invention, a circuit for determining the health of a microcontroller is provided, the circuit having a bus that includes a data line and an instruction line. A first CPU is coupled to the bus. A reference memory stores a reference signature. A shift register is coupled to the bus and generates a second signature in response to the data line and the instruction line. A controller is coupled to the shift register and the reference memory for controlling reading of the data line and instruction line. The controller compares the reference signature to the second signature and generates a fault signal when the reference signature is not equal to the second signature.
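The description does not, at this point, fix the internal structure of the shift register. One conventional choice for compressing a stream of bus words into a signature is a multiple-input signature register (MISR), modeled in Python below. The 16-bit width, the feedback polynomial x^16 + x^12 + x^5 + 1, the seed, and the sampling order (instruction line, then data line, on each cycle) are all assumptions made for illustration, as are the sample bus values.

```python
from typing import Iterable, Tuple

POLY = 0x1021      # assumed feedback polynomial x^16 + x^12 + x^5 + 1
MASK = 0xFFFF      # assumed 16-bit register width


def misr_step(state: int, word: int) -> int:
    """One clock: shift left with polynomial feedback, then XOR in a parallel input word."""
    feedback = state & 0x8000
    state = (state << 1) & MASK
    if feedback:
        state ^= POLY
    return state ^ (word & MASK)


def compute_signature(bus_cycles: Iterable[Tuple[int, int]], seed: int = 0xFFFF) -> int:
    """Compress (instruction, data) pairs into a 16-bit signature."""
    state = seed
    for instruction, data in bus_cycles:
        state = misr_step(state, instruction)   # sample the instruction line
        state = misr_step(state, data)          # then the data line
    return state


def fault_signal(reference: int, bus_cycles: Iterable[Tuple[int, int]]) -> bool:
    """Controller comparison: True (fault) when the signatures differ."""
    return compute_signature(bus_cycles) != reference
```

Because the register's next-state function is linear and invertible, any single-bit error on either line changes the final signature; multi-bit errors can still alias, though rarely for a register of this width.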
In a further aspect of the invention, a method of validating the memory of a microcontroller unit comprises the steps of: obtaining a reference signature; reading the contents of a memory block; generating a second signature in response to contents of the memory block; comparing the reference signature to the second signature to obtain a comparison; and, indicating a fault in response to the comparison.
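The five method steps can be sketched in Python as follows. Here the standard library's zlib.crc32 stands in for the signature-generating hardware, and the memory image, block boundaries, and function name are hypothetical; only the sequence of steps (obtain reference, read block, generate signature, compare, indicate fault) follows the text.

```python
import zlib


class MemoryFault(Exception):
    """Fault indication raised when a block's signature does not match its reference."""


def validate_memory_block(memory: bytes, start: int, length: int,
                          reference_signature: int) -> int:
    """Check one memory block against its reference signature; return the signature."""
    if start < 0 or length < 0 or start + length > len(memory):
        raise ValueError("block lies outside the memory image")
    block = memory[start:start + length]          # read the contents of the block
    signature = zlib.crc32(block)                 # generate the second signature
    if signature != reference_signature:          # compare with the reference
        raise MemoryFault(                        # indicate a fault
            f"block at 0x{start:04X}: got 0x{signature:08X}, "
            f"expected 0x{reference_signature:08X}")
    return signature


# Obtaining the reference signature, e.g. when the memory image is built:
image = bytes(range(256))
REFERENCE = zlib.crc32(image[16:80])
```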
One advantage of the invention is that, by using the circuit and method of the present invention, faults in the microcontroller may be more readily determined.
Another advantage of the invention is that the apparatus can be incorporated into the die; it is estimated that the present invention may be implemented in approximately 0.1% of the overall die area. A further advantage is that the circuit has minimal impact on CPU throughput and is thus very resource effective.
As will be shown in the following description, there are some modes in which the invented apparatus operates with total independence of the "state of health" of even the Dual CPU/Functional Compare System. Another advantage of this invention is that the speed at which corrupt data is detected is several orders of magnitude faster than the response time of an automotive vehicle.
These and other features and advantages of the present invention will become apparent from the following description of the invention, when viewed in conjunction with the accompanying drawings and appended claims.