This invention relates to a method and apparatus for debugging faults occurring in a router or other network device, and more particularly to compressing a core file and storing the compressed core file in an internal flash memory.
Network servers and other types of network devices often experience unrecoverable faults. One example of an unrecoverable fault occurs when a routine writes an invalid address value into core memory. When a process tries to access the illegal address value, a fault occurs. For example, a process may request a memory address for a status register used for conducting a direct memory access (DMA) operation. If the memory address is invalid, a fatal error occurs when the process attempts to access the memory address, which causes the router to reset.
Viewing core files is vital to resolving fatal fault errors. A core file is essentially a copy of DRAM, which contains the program, program pointers, program variables, etc. The core file provides a snapshot of the router at the time the fault occurred. DRAM is used to meet the performance requirements of the system. Because the contents of the DRAM are destroyed after a reset operation, the core file must be downloaded to another storage device. Routers can be equipped with some flash memory. However, due to the cost of flash memory, the flash memory is not large enough to hold all DRAM contents. Thus, the core file must be downloaded to an external server connected to the router through a local area network (LAN). The core file can then be analyzed by an engineer from a computer or workstation to identify the source of the fault.
The problem with copying a core file to an external device is that the fault condition causing the router to shut down may be caused by a process that must be operational in order to download the core file. For example, the fault may be caused by a software error in a network protocol or in the LAN media drivers. If these network interface processes are not operational, the core file cannot be successfully downloaded to an external network device. Thus, in the past, a special image had to be created in order to investigate the fault. The special image is produced by modifying the operating code to print out specific identified information before the fault occurs. Generating special images to locate faults requires a large amount of trial and error, which is extremely time-consuming. Alternatively, the router is taken out of production so that the current contents of the main memory can be analyzed with a ROM monitor.
Accordingly, a need remains for a faster, more reliable way to save a core file after a fault condition occurs in a network device.
A network device, such as a router or switch, downloads a core file into a local flash memory. In order to increase storage capacity, the core file is compressed before being dumped into the local flash memory. The flash memory is local and internal to the network device. Because network interface elements do not have to be functional for a successful core download, the core download is faster and more reliable than existing download techniques.
In one embodiment, the network device comprises a router having a CPU for controlling packet processing operations. DRAM is used for a main memory and its contents constitute the core file. Network interface elements are coupled between the CPU and different external networks. The network interface elements process and route the packets received from the external networks. The core file is downloaded from the main memory to local flash memory independently of these network interface elements.
During the shutdown routine, interrupts are disabled for any processing elements, such as the network interface elements, that are not needed to perform the core download. Thus, the CPU is not interrupted by routines that could generate additional fault conditions. Because these processing elements are disabled, the DRAM contents cannot be modified by other processes that might be operating after the fault condition. Thus, the core file will more accurately represent a snapshot of the system at the time the fault condition occurred.
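The interrupt masking described above can be illustrated with a minimal simulation. This is only a sketch under stated assumptions: the `InterruptController` class, the source names ("ethernet", "serial", "flash"), and the choice of which source stays enabled are all hypothetical and are not taken from the described router.

```python
class InterruptController:
    """Hypothetical model of an interrupt controller with per-source enables."""

    def __init__(self, sources):
        self.enabled = set(sources)

    def disable(self, source):
        self.enabled.discard(source)

    def raise_irq(self, source, handler):
        """Run the handler only if the source is still enabled."""
        if source in self.enabled:
            handler()
            return True
        return False


def enter_shutdown(ctrl, needed=frozenset({"flash"})):
    """Disable every interrupt source not needed for the core download.

    The `needed` set is an assumption made for illustration.
    """
    for source in sorted(ctrl.enabled - needed):
        ctrl.disable(source)


# Usage: once shutdown begins, a network interrupt can no longer modify DRAM.
dram = bytearray(b"state-at-fault")
ctrl = InterruptController({"ethernet", "serial", "flash"})
enter_shutdown(ctrl)


def ethernet_handler():
    dram[0:5] = b"XXXXX"  # would corrupt the snapshot if allowed to run


serviced = ctrl.raise_irq("ethernet", ethernet_handler)
```

Because the handler never runs, the simulated DRAM contents remain exactly as they were when the shutdown routine started.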
In one embodiment of the invention, the CPU downloads the core file to the same local flash memory used for storing the router operating routine and the router shutdown routine. Router platforms may contain more than one flash memory device and different flash memory configurations. The network device can also be configured by a user to download all or part of the core file into one or more of the different flash memory devices used in the specific platform.
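One way to picture a user configuration that spreads all or part of the core file over several flash devices is a simple planning function. The sketch below is illustrative only: the `FlashTarget` type, the `plan_dump` function, the `max_bytes` option, and the device names are hypothetical and do not come from the described platform.

```python
from dataclasses import dataclass


@dataclass
class FlashTarget:
    name: str      # hypothetical device name
    capacity: int  # free bytes available for the core file


def plan_dump(core_size, targets, max_bytes=None):
    """Return (device, core_offset, length) regions in the configured order.

    `max_bytes` models a user choice to save only part of the core file.
    Whatever does not fit in the listed devices is left out of the plan.
    """
    remaining = core_size if max_bytes is None else min(core_size, max_bytes)
    offset = 0
    plan = []
    for target in targets:
        if remaining == 0:
            break
        length = min(remaining, target.capacity)
        if length:
            plan.append((target.name, offset, length))
            offset += length
            remaining -= length
    return plan


targets = [FlashTarget("flash_a", 100), FlashTarget("flash_b", 300)]
full_plan = plan_dump(250, targets)
partial_plan = plan_dump(250, targets, max_bytes=80)
```

Here `full_plan` splits the 250 bytes across both devices, while `partial_plan` keeps only the first 80 bytes on the first device.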
In order to increase download capacity, each byte of the core file is compressed using a standard compression routine. The compressed core file is written into a temporary buffer in main memory. Once the temporary buffer is full, the contents of the buffer are downloaded into the local flash memory.
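The buffer-then-flush flow described above can be sketched as follows. The text does not name the compression routine, so zlib is substituted here purely as an assumption. The `FlashDevice` class, the `dump_core` function, and the buffer and chunk sizes are likewise hypothetical.

```python
import zlib


class FlashDevice:
    """Hypothetical stand-in for local flash: records each write as a block."""

    def __init__(self):
        self.blocks = []

    def write(self, data):
        self.blocks.append(bytes(data))


def dump_core(core, flash, buf_size=4096, chunk=1024):
    """Compress `core` and write it to flash one full buffer at a time.

    Returns the number of compressed bytes written.
    """
    compressor = zlib.compressobj()  # assumption: zlib as the "standard" routine
    buf = bytearray()                # temporary buffer held in main memory
    written = 0

    def drain(final=False):
        nonlocal written
        # Flush whenever the buffer is full; at the end, flush the remainder.
        while len(buf) >= buf_size or (final and buf):
            written += min(len(buf), buf_size)
            flash.write(buf[:buf_size])
            del buf[:buf_size]

    for off in range(0, len(core), chunk):
        buf += compressor.compress(core[off:off + chunk])
        drain()
    buf += compressor.flush()
    drain(final=True)
    return written


core = b"\x00" * 10000 + bytes(range(256)) * 40
flash = FlashDevice()
n = dump_core(core, flash, buf_size=64)
```

Joining `flash.blocks` and decompressing the result reproduces the original `core` bytes, and every block except possibly the last is exactly one full buffer.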
The router is coupled to a network server through a LAN. The router is reset after completing the core download. The server uses a file transfer operation to access the router and read the core file from local flash memory. The core file is then analyzed to determine the state of the router when the shutdown event occurred.
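On the server side, the retrieved file would have to be decompressed before analysis. The following is a minimal sketch that assumes the core file was compressed with zlib, which the text does not specify. The `restore_core` function is hypothetical, and the file transfer itself is represented only by a file-like object.

```python
import io
import zlib


def restore_core(compressed_stream, chunk_size=4096):
    """Decompress a core file read in chunks from a file-like object."""
    decompressor = zlib.decompressobj()
    out = bytearray()
    while True:
        chunk = compressed_stream.read(chunk_size)
        if not chunk:
            break
        out += decompressor.decompress(chunk)
    out += decompressor.flush()
    return bytes(out)


# Usage: io.BytesIO stands in for the file fetched from the router's flash.
original = b"router state " * 1000
stream = io.BytesIO(zlib.compress(original))
restored = restore_core(stream, chunk_size=128)
```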