The present invention relates generally to methods and systems for data file creation. Embodiments of the present invention provide for automatic data extraction and data feed file creation at remote locations and transferring completed data feed files to locations electronically accessible to a software application.
The ability to gather and manipulate data is increasingly important in today's society. Small businesses and large corporations alike are aware that compiling, manipulating and reviewing data can provide valuable information about their present and future customers. Whether it is data on product usage or data on seasonal trends in customer purchases, the need to locate, gather on demand, and manipulate data is important. Even computer software designers are aware that the ability to manipulate and present data may provide valuable insight to business owners and managers. To that end, many software designers have developed software applications capable of manipulating and presenting data in a variety of ways to help improve data compilation and manipulation.
Companies have long recognized the need for gathering and storing data relating to their customers, suppliers, etc. At one time, this data was kept in boxes and stored in rooms. When storage space for data became cumbersome, data was often transferred to mainframe computers and other like electronic data storage devices. These storage devices allow massive electronic data files to be stored in one location. However, when mainframes and other like devices became the preferred method of storing large volumes of data, the need and the technology to perform data manipulations to present data in a variety of ways were not as prevalent as they are today. Instead, the mainframe was often used to archive data that was important but infrequently accessed.
As times changed, so did the purpose of storing data on a mainframe and the frequency with which this data was accessed. No longer is data stored on a mainframe primarily because it is historic information that needs to be archived. To the contrary, data is sometimes stored on a mainframe because it is so voluminous that it would occupy too much space on a local area network server, a wide area network server or a desktop computer hard drive. The increased frequency with which data on a mainframe is needed creates problems with efficiently providing desktop software applications access to the data when it is stored on a remote mainframe or other like data storage device.
To obtain information stored in data files on a mainframe, end-users often have to print the entire data file from the mainframe. Oftentimes, the data manipulation performed by the desktop software application does not require all the data stored in the data file on the mainframe. The end-user is, therefore, forced to manually filter through the voluminous data file, pinpoint the data needed and manually create a data feed file for use by the desktop software application. Thus, there is an obvious need for the capability to automatically and electronically select the data stored on the mainframe and generate a data feed file that is accessible to desktop software applications for data manipulation and presentation.
Generally speaking, the present invention relates to a method and system for generating a data file by extracting specified data from a larger raw data file. The goal of this invention is to generate a data feed file that can be used by desktop applications for any number of operations including, but not limited to, data compilation and data tracking.
According to an embodiment of this invention, the raw data file contains large volumes of related data. The raw data file is comprised of multiple records, each record comprised of at least one data field. Each data field contains the same category of data in each record of the raw data file. Assume, for this example and all examples hereafter, that the raw data file contains customer-billing information. Each record in the raw data file would therefore contain customer-billing information pertaining to an individual customer. Each record in the raw data file would further be comprised of data fields, each containing a different category of data, such as customer name, customer address, customer phone number, etc. Below is an example of the records and data fields of a raw data file.
In an embodiment of this invention, the raw data file is so voluminous that it is stored on a remote data storage device, such as a mainframe, instead of being stored on a computer hard drive, local area network (LAN) or other like electronic storage device accessible to a user-preferred desktop software application or applications.
In an embodiment of this invention, desktop software applications include, but are not limited to, the following commonly known software applications: MICROSOFT EXCEL, MICROSOFT ACCESS, and ORACLE. These software applications, and other like applications, are used to perform operations such as data compiling, data tracking and other types of data manipulation. In an embodiment of this invention, the user-preferred software application uses data contained in a raw data file that resides on a mainframe or other like data storage device. In an aspect of the invention, the operation performed by the desktop application may not require all the data contained in the raw data file. Instead, the desired data manipulation may only require select data contained in the raw data file. It is the goal of this invention to generate a data feed file by extracting only needed data from the raw data file. The data feed file will then contain only the data the user-preferred desktop software application needs to perform its designated operation.
According to an embodiment of this invention, the data feed file is generated by extracting the desired data from the raw data file and populating the data feed file with the extracted data. To determine what data to include in the data feed file, the end-user develops a set of data feed file criteria.
In an embodiment of the invention, the data feed file criteria identify the location of the desired data in the raw data file by identifying the data field in each record of the raw data file that contains the desired data. The data field containing the desired data may be identified in any number of ways. In another embodiment of the invention, the location of the data field is identified by the category of data the data field contains. For example, a data field in each record of the raw data file may contain the amount payable by each customer. In such cases, the data field could be identified as the "amount payable data field".
In another embodiment of the present invention, the location of the data field is identified by the position of the data field in relation to other data fields in each record of the raw data file. For example, the data field that contains the desired data may be the fifth data field from the left in each record of the raw data file. In such cases, the data field could be identified as data field number five (5).
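By way of illustration only, the two ways of identifying a data field described above (by category of data, or by position within the record) might be sketched as follows. The field layout in `FIELD_NAMES`, the function name `locate_field` and the sample record are assumptions made for this example and are not taken from the raw data file described herein.

```python
from typing import Sequence, Union

# Hypothetical layout of one customer-billing record (assumed for this sketch).
FIELD_NAMES = ["customer_name", "customer_address", "phone_number",
               "zip_code", "amount_payable"]


def locate_field(record: Sequence[str], locator: Union[str, int]) -> str:
    """Return one data field, located by category name or by 1-based position."""
    if isinstance(locator, int):
        # Position counted from the left, starting at 1 ("data field number five").
        return record[locator - 1]
    # Category name, resolved against the assumed layout above.
    return record[FIELD_NAMES.index(locator)]


record = ["Jane Doe", "12 Elm St", "555-0100", "30301", "250.00"]
by_name = locate_field(record, "amount_payable")
by_position = locate_field(record, 5)
```

In this sketch both locators resolve to the same data field, since the amount payable is assumed to be the fifth data field from the left.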
In addition to identifying the location of the desired data in the raw data file, the data feed file criteria may specify additional requirements for data to be included in the data feed file. For example, the data feed file criteria may specify that data located in the amount payable data field should be included in the data feed file only if the amount payable is greater than two hundred dollars ($200).
In yet another embodiment of the invention, the set of data feed file criteria may further contain criteria for including data not in any data field of the raw data file. For example, different service fees may apply to customers depending on the customer's zip code. However, the raw data file may not contain the various fees. In such cases, the data feed file criteria may specify that a data field in the data feed file include a designated service fee for customers depending on the customer's zip code.
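As a non-limiting sketch of the service fee example above, a data field that is absent from the raw data file might be derived from the zip code by a simple lookup. The fee schedule, the default fee and the names `SERVICE_FEE_BY_ZIP` and `add_service_fee` are hypothetical and appear here only for illustration.

```python
# Hypothetical fee schedule; actual zip codes and fees are not specified herein.
SERVICE_FEE_BY_ZIP = {"30301": "5.00", "30302": "7.50"}
DEFAULT_FEE = "0.00"  # assumed fee for zip codes not in the schedule


def add_service_fee(record: dict) -> dict:
    """Return a copy of the record with a derived 'service_fee' data field."""
    extended = dict(record)  # leave the extracted data untouched
    extended["service_fee"] = SERVICE_FEE_BY_ZIP.get(record["zip_code"], DEFAULT_FEE)
    return extended


extended = add_service_fee({"customer_name": "Jane Doe", "zip_code": "30301"})
```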
In an embodiment of the invention, multiple raw data files may contain the data needed for the data feed file. In such cases, the data feed file criteria may further identify the raw data file from which specified data should be extracted.
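One possible way to express criteria that name the raw data file from which each data field is taken is sketched below. The class `FieldCriterion`, the file names and the assumption that record number n in every raw data file refers to the same customer are all made for this example only; how records in different raw data files are matched is not specified herein.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldCriterion:
    source_file: str     # which raw data file to read (hypothetical name)
    field_position: int  # 1-based position of the data field in that file


def extract(raw_files: dict, criteria: list, row: int) -> list:
    """Collect the requested data fields for one customer.

    Assumes, for illustration, that the same row index in every raw data
    file refers to the same customer.
    """
    return [raw_files[c.source_file][row][c.field_position - 1] for c in criteria]


raw_files = {
    "billing": [["Jane Doe", "250.00"]],
    "contacts": [["Jane Doe", "555-0100"]],
}
criteria = [
    FieldCriterion("billing", 1),
    FieldCriterion("billing", 2),
    FieldCriterion("contacts", 2),
]
values = extract(raw_files, criteria, 0)
```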
Once the end-user develops a set of data feed file criteria for the data feed file, the end-user provides the data extraction tool with those criteria that identify the location of the desired data in the raw data file (the "data-locating criteria"). The data extraction tool then locates and extracts the desired data in the raw data file using the data-locating criteria. The data extraction tool further creates a report containing the extracted data.
Further, in another embodiment of the invention, data extracted from multiple data fields of the same record are placed on the same line in the report. The data extracted for each data field is separated by a space in the report. Each line of the report is also delineated by a print stop character.
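The report layout described above might be sketched as follows. This example assumes that the print stop character is a newline and that no extracted value contains an embedded space; neither assumption is stated herein, and the name `build_report` is hypothetical.

```python
PRINT_STOP = "\n"  # assumption: a newline stands in for the print stop character


def build_report(records: list, positions: list) -> str:
    """Extract the data fields at the given 1-based positions from every record.

    Values taken from the same record share one report line, separated by a
    space, and every line ends with the print stop character.
    """
    lines = []
    for record in records:
        values = [record[p - 1] for p in positions]
        lines.append(" ".join(values))
    return "".join(line + PRINT_STOP for line in lines)


records = [
    ["C001", "30301", "250.00"],
    ["C002", "30302", "120.00"],
]
report = build_report(records, [1, 3])
```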
As described above, the data feed file criteria may include criteria that define the characteristics of data to include in the data feed file ("data-limiting criteria"). To ensure that the data feed file contains only the data specified by the data-limiting criteria, a discrimination tool reads the report created by the data extraction tool and verifies that the data on each line of the report complies with the data-limiting criteria. When the discrimination tool locates data on a line in the report that does not comply with the data-limiting criteria, the discrimination tool deletes the entire line of data from the report. By eliminating the entire line of data, the discrimination tool eliminates all the unnecessary data from the report and prevents unnecessary information from being included in the data feed file.
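A minimal sketch of such a discrimination step is given below, using the earlier example in which data is kept only if the amount payable is greater than two hundred dollars ($200). The function name `discriminate`, the report layout (space-separated values, one record per line) and the column holding the amount are assumptions made for illustration.

```python
from decimal import Decimal


def discriminate(report: str, column: int, minimum: Decimal) -> str:
    """Delete every report line whose value in `column` is not above `minimum`.

    `column` is a 1-based position within the space-separated report line.
    """
    kept = []
    for line in report.splitlines():
        fields = line.split(" ")
        if Decimal(fields[column - 1]) > minimum:
            kept.append(line)  # the line complies with the data-limiting criteria
        # otherwise the entire line is dropped, not just the offending value
    return "".join(line + "\n" for line in kept)


report = "C001 250.00\nC002 120.00\nC003 200.00\n"
filtered = discriminate(report, 2, Decimal("200"))
```

Because the comparison is strictly greater than, the line with an amount payable of exactly 200.00 is also deleted in this sketch.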
In an embodiment of the present invention, the discrimination tool may not always be required. When the end-user only develops data-locating criteria, the discrimination tool may not be necessary and the step involving the discrimination tool may be eliminated.
After the report is created and includes only the data necessary to generate the data feed file, a formatting tool is used to complete the data feed file. The data feed file, like the raw data file, contains at least one record. Each record in the data feed file will further comprise at least one data field and each data field in each record will comprise a single category of data. The number of data fields in the data feed file is determined by the data feed file criteria. For example, if the data-locating criteria identify the location of five (5) data fields in the raw data file from which data should be extracted, the data feed file will contain at least five data fields. Additional data fields may also be included in the data feed file if the data feed file criteria designate data to be included in the data feed file that is not contained in the raw data file.
In an embodiment of the invention, the formatting tool uses the data from the report to populate each data field in each record of the newly created data feed file. The formatting tool may further populate fields in the data feed file with additional data specified by the data feed file criteria. The formatting tool may also arrange data fields in the data feed file in any order specified by the end user.
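The three functions of the formatting tool described above (populating data fields from the report, adding data specified by the criteria, and ordering the data fields) might be sketched as follows. The function `format_feed`, the field names and the fee lookup are hypothetical, and the report is assumed to hold space-separated values with one record per line.

```python
def format_feed(report: str, report_fields: list, extra: dict, output_order: list) -> list:
    """Turn report lines into data feed file records.

    report_fields: names for the values on each report line, left to right.
    extra:         additional field name -> function computing it from a record.
    output_order:  order of the data fields requested by the end-user.
    """
    feed = []
    for line in report.splitlines():
        record = dict(zip(report_fields, line.split(" ")))
        for name, compute in extra.items():
            record[name] = compute(record)  # data not present in the raw data file
        feed.append([record[name] for name in output_order])
    return feed


fees = {"30301": "5.00"}  # hypothetical fee schedule
feed = format_feed(
    "C001 30301 250.00\n",
    ["customer_id", "zip_code", "amount_payable"],
    {"service_fee": lambda r: fees.get(r["zip_code"], "0.00")},
    ["customer_id", "amount_payable", "service_fee", "zip_code"],
)
```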
Once the formatting tool has populated the data fields of the data feed file, the data feed file is ready to be used by the user-preferred software application or applications. The completed data feed file is thereafter transmitted through the appropriate network to a location electronically accessible to the user-preferred desktop software application. In one embodiment of the present invention, the data feed file is transferred using a file transfer protocol (FTP) to a local area network (LAN) accessible to the user-preferred desktop software application.
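One way the FTP transfer might be carried out is sketched below using the `ftplib` module of the Python standard library. The host, credentials, file name and the helper names `upload_feed_file` and `send_feed_file` are placeholders assumed for this example; no particular server or directory layout is implied.

```python
import io
from ftplib import FTP


def upload_feed_file(ftp, feed_text: str, remote_name: str) -> None:
    """Store the completed data feed file on an already connected FTP session."""
    payload = io.BytesIO(feed_text.encode("utf-8"))
    ftp.storbinary(f"STOR {remote_name}", payload)


def send_feed_file(host: str, user: str, password: str,
                   feed_text: str, remote_name: str) -> None:
    """Connect, log in and upload; all arguments are placeholders."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        upload_feed_file(ftp, feed_text, remote_name)
```

Separating the upload from the connection step allows the upload to be exercised against a stand-in session object without a live server.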
An advantage of this invention is that the data feed file is created automatically. The invention eliminates the need to manually review the raw data file and manually enter the desired data from a raw data file to use with the preferred desktop application.
It is a further object of the invention to improve the integrity of information generated by desktop applications by eliminating data entry errors that inevitably occur when a data feed file is manually generated.
It is a further object of this invention to reduce the time needed to compile data for use with desktop applications.
It is yet another object of this invention to improve the ability to change and update reports created by desktop applications using data located on mainframes or other electronic storage devices.