1. The Field of the Invention
This invention relates to computer software and, more particularly, to novel systems and methods for parsing textual streams and extracting desired contents within textual data from the textual streams and collating the desired contents into an output stream.
2. The Background Art
In today's world, a typical work environment includes one or more computers. The software run on these computers enables users to provide documentation through word processing programs, analyze and track financial information through spreadsheets, plan and organize calendars through planning software, access information through database software and/or providers, etc.
To ensure the proper functioning of this software, software companies test each computer program before it is sold to the public. Some companies test their software more extensively than others. Testing helps the authors of the software find and correct problems with the software. To accurately track the operation of software, some software engineers program the software to write files as the software is run. These files typically contain a type of log or record of what happened when the software was running. Such a log file is useful to an engineer because, when a problem is encountered, the engineer can go back to the file (the “log”) and see what types of events occurred before, during, and after the problem.
On large computer programs or large computer systems including several computer programs, the number of errors and/or problems that should be analyzed may become very large. Often the different computer programs and/or different pieces of the computer programs were authored by different software engineers. In testing their portions of the larger system, these isolated engineers may have their own format for files containing information helpful in debugging their portion. When all the pieces are placed together, a set of heterogeneous log files is usually created containing information that may be helpful in achieving the proper functioning of the computer program. The files are heterogeneous in that they have different formats. For example, one file may have all the errors at the top of the file under a heading marked “ERRORS” with a log of events following. On the other hand, another file may only contain a log of events where the errors are scattered throughout the file marked with a simple “prob:”. Depending upon the number of engineers involved and the number of differing log file formats, there may be a good number of heterogeneous log files.
Many software testing teams write separate computer programs for testing computer programs. This testing software will most likely have its own set of log files created as it is run. Some testing software may be quite complicated and create a number of log files. These log files may be heterogeneous for the reason stated above: the different parts of the program will often be written by different people.
Some testing software may be designed to remotely launch any test which can be run from the command line (e.g., “c: ”) on any given operating system. Different tests may be desirable to fully test one or more computer programs. The tests that are likely to be run need not be developed by the same engineer or team. As a result, the log files produced by running any given group of tests can differ radically in format and content. This makes it very difficult for someone trying to interpret the results of a series of tests to analyze the data returned by these tests. The ideal situation would be for the results of all the tests to be output in a standard format so that analyzing the results becomes an easy task even for someone not completely familiar with the tests being run.
In addition, it is often desirable to have the results of a test output to different places or devices. For instance, a user may want to print a summary report for a supervisor while at the same time storing all of the log information in a database which can be queried to analyze the data. This becomes very difficult if the user is using output from a test which the user didn't write and for which the user does not have the source code. A user would likely have to write a parser to parse the different log files and put the data where the user wants it.
From the foregoing, it would be an advancement in the art to provide a method for extracting multiple log files of differing formats, converting the extracted log files into a standard format, and writing the extracted log files to various devices.
Such an invention is disclosed and claimed herein.
The present invention provides a method for extracting desired contents from multiple heterogeneous textual streams to provide normalized data which represents the desired contents. The invention selects input streams containing text data wherein the text data of one input stream differs in format and content from the text data of another input stream. The invention further selects a first set of parse rules corresponding to one input stream and a second set of parse rules, distinct from the first set, which corresponds to a second input stream. The invention extracts desired contents from the input streams and provides normalized data which represents the desired contents. The invention selects an output interface and adapts the normalized data representing the desired contents to the output interface. The invention sends the normalized data to the output interface, and the output interface is instructed to transform and format the normalized data into device specific data.
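The extraction step summarized above can be illustrated with a minimal sketch. It is not the claimed implementation. The two log layouts follow the “ERRORS” heading and “prob:” marker examples given in the background; the Record type, the function names, the file names, and the convention that a blank line ends the “ERRORS” block are all assumptions made only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical normalized record shared by every input stream."""
    source: str
    kind: str
    message: str


def parse_headed_errors(name, lines):
    """First rule set (assumed): errors follow an 'ERRORS' heading
    and the block is closed by a blank line."""
    records = []
    in_errors = False
    for line in lines:
        text = line.strip()
        if text == "ERRORS":
            in_errors = True
        elif in_errors and not text:
            in_errors = False
        elif in_errors:
            records.append(Record(name, "error", text))
    return records


def parse_prob_markers(name, lines):
    """Second rule set (assumed): errors are scattered through the
    log and flagged with a 'prob:' marker."""
    records = []
    for line in lines:
        match = re.search(r"prob:\s*(.*)", line)
        if match:
            records.append(Record(name, "error", match.group(1).strip()))
    return records


def extract_all(streams):
    """Apply each stream's own rule set and collate the results."""
    normalized = []
    for name, lines, rule in streams:
        normalized.extend(rule(name, lines))
    return normalized


# Two heterogeneous streams with invented sample content.
log_a = ["ERRORS", "disk full", "timeout on port 80", "",
         "10:00 started", "10:05 stopped"]
log_b = ["10:00 started", "10:02 prob: null pointer", "10:05 stopped"]

records = extract_all([
    ("a.log", log_a, parse_headed_errors),
    ("b.log", log_b, parse_prob_markers),
])
```

In this sketch each stream is paired with its own rule set, so supporting a further log format would only require adding another rule function; the collated records have the same shape regardless of which stream they came from.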
The invention includes data structures stored on a memory device which include inputted textual streams which contain the desired contents. The data structures comprise an opening module for opening the textual streams and an extraction module for extracting the desired contents from the textual streams. Device configuration data is also included within the data structures and serves to define a configuration of the output interface. The device configuration data comprises identification data for identifying an output device and comprises format data for formatting the normalized data. The data structures further comprise parse rules which are associated with the textual streams defining locations of the desired contents relative to other textual data in the textual streams. Finally, the data structures comprise output interface modules which are executable by a processor for processing the device configuration data, receiving the normalized data, and formatting the normalized data to provide device specific data.
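One way to picture the device configuration data and output interface modules described above is the following sketch. It is illustrative only. The DeviceConfig and OutputInterface names, the "text" and "json" format keys, and the use of in-memory buffers in place of real devices are all assumptions and are not taken from the specification.

```python
import io
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceConfig:
    """Hypothetical device configuration data."""
    device_id: str  # identification data: which output device
    fmt: str        # format data: how normalized data is rendered


def format_text(records):
    """Render normalized records as a line-oriented summary."""
    return "\n".join(
        f"[{r['source']}] {r['kind']}: {r['message']}" for r in records
    )


def format_json(records):
    """Render normalized records as JSON for later querying."""
    return json.dumps(records, sort_keys=True)


# Assumed mapping from format data to a formatting routine.
FORMATTERS = {"text": format_text, "json": format_json}


class OutputInterface:
    """Hypothetical output interface module: reads its configuration,
    receives normalized data, and writes device specific data."""

    def __init__(self, config, sink):
        self.config = config
        self.sink = sink  # any object with a write() method

    def write(self, records):
        data = FORMATTERS[self.config.fmt](records)
        self.sink.write(data)
        return data


# The same normalized data is sent to two differently configured outputs.
records = [{"source": "a.log", "kind": "error", "message": "disk full"}]
report = io.StringIO()  # stands in for a printer
store = io.StringIO()   # stands in for a database loader

OutputInterface(DeviceConfig("printer", "text"), report).write(records)
OutputInterface(DeviceConfig("database", "json"), store).write(records)
```

Keeping the format choice in configuration data rather than in the extraction code means the same normalized records can feed a printed summary and a queryable store without reparsing the logs.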
Thus, it is an object of the invention to provide a process for extracting desired contents from multiple heterogeneous or homogeneous textual streams in accordance with applicable parse rules.
It is another object of the invention to collate the desired contents into normalized data.
It is yet another object of the invention to format the normalized data representing the desired contents into device specific data for one or more output interfaces and to write the normalized data to the one or more output interfaces.