Enterprise applications in fields such as banking and healthcare often use flat files to import and export data between applications. Flat files contain machine-readable data that is typically encoded in printable characters. The term “flat” means that the file is not indexed. To some, the term also implies that a flat file has no hierarchical structure; however, many flat files do have a hierarchical structure.
Data stored in a flat file is most often formatted either as text delimited by a character or group of characters, or according to fixed-length formatting. This provides a structure for the data and a way to differentiate between sections of the data. Because they are relatively simple text files and lack an index, flat files are not easily queried, nor do they provide rigorous validation functions. Further, flat files can contain a vast amount of redundant data, as the same data may be repeated in several locations. This wastes disk space and slows down queries. Also, data entry is time consuming because the same data often must be entered repeatedly. For example, to record the sale of 500 widgets, the price, description, and supplier details will have to be recorded 500 times. Moreover, delimited and fixed-length flat files are not easily interpreted by humans.
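By way of illustration only, the two common flat file layouts mentioned above might be read as in the following minimal sketch. The field names, the pipe delimiter, the column widths, and the function names `parse_delimited` and `parse_fixed` are all assumptions made for this example and do not come from any particular flat file format.

```python
import csv

# Hypothetical record layout, assumed for illustration only.
FIELDS = ("part", "quantity", "unit_price")

# Delimited form: fields separated by an assumed pipe character.
delimited_line = "widget|500|2.50"

# Fixed-length form: each field occupies an assumed column width.
WIDTHS = (10, 5, 8)
fixed_line = f"{'widget':<10}{500:05d}{'2.50':>8}"


def parse_delimited(line, delimiter="|"):
    """Split one delimited record into a dict keyed by field name."""
    values = next(csv.reader([line], delimiter=delimiter))
    return dict(zip(FIELDS, values))


def parse_fixed(line, widths=WIDTHS):
    """Slice one fixed-length record by column width, trimming padding."""
    record, offset = {}, 0
    for name, width in zip(FIELDS, widths):
        record[name] = line[offset:offset + width].strip()
        offset += width
    return record


if __name__ == "__main__":
    print(parse_delimited(delimited_line))
    print(parse_fixed(fixed_line))
```

In both cases the file itself carries no field names, so the reader must already know the layout. Note also that the fixed-length reader returns the quantity with its zero padding intact, which hints at why such files offer little validation and are hard for a person to interpret.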
Entering the same data multiple times can also lead to typographical errors. In addition, a change to existing data may have to be applied to each occurrence of that data. Because of these drawbacks, and the need for flat-file based applications to interact with XML-aware applications and Web services, there is a growing need to convert flat file data to an XML format.
XML is well suited for the interchange of data because XML documents are tagged, easily parsed, and can represent complex data structures. As a result, many large entities wish to convert their legacy data, stored in a flat file format, to XML. The conversion of a flat file to an XML format requires that the data embedded in the flat file first be properly represented in some template form so that it can be converted to XML.
One way to accomplish this task is by hand, simply copying the flat file data into a new XML document. This is unwieldy for large files and prone to human error. Another current method to convert flat files to XML documents is to write complex scripts in languages such as Perl. These scripts attempt to parse the flat file data and create a new XML file with the flat file information properly tagged and in the correct hierarchical structure. This method is also unsuitable for large and complex flat file data, as developing and debugging the scripts takes significant time and resources.
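The script-based approach described above can be pictured with the following minimal sketch, written in Python rather than Perl. It is offered only as an example of such an ad hoc script, not as the technique of this disclosure. The function name `flat_to_xml`, the element names, the pipe delimiter, and the flat one-record-per-line layout are all assumptions made for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET


def flat_to_xml(text, field_names, delimiter="|",
                root_tag="records", row_tag="record"):
    """Convert delimited flat file text to an XML string.

    The layout (field order, delimiter, tag names) is hard-coded by the
    caller, which is what makes scripts of this kind costly to maintain.
    """
    root = ET.Element(root_tag)
    for line_no, row in enumerate(
            csv.reader(io.StringIO(text), delimiter=delimiter), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != len(field_names):
            raise ValueError(
                f"line {line_no}: expected {len(field_names)} fields, "
                f"got {len(row)}")
        record = ET.SubElement(root, row_tag)
        for name, value in zip(field_names, row):
            ET.SubElement(record, name).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = "widget|500|2.50\ngadget|12|19.99\n"
    print(flat_to_xml(sample, ("part", "quantity", "unit_price")))
```

Even this small script fixes the field order, delimiter, and tag names in code, and it handles only a single flat record type. Supporting nested or multi-record layouts would require further hand-written parsing logic for each new file format, which illustrates the development and debugging burden noted above.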
Another approach is taken by commercial software products that convert flat files to XML instances based on proprietary templates and conversion routines. These approaches are deficient in that they rely upon closed-format, proprietary technology that a developer must first have access to, and then learn, before implementing a solution. In addition, these approaches are not tailored to meet specific needs and often do not scale to fit the requirements of generic flat file-to-XML instance generation.
Consequently, there is a need for an improved technique that does not suffer from the shortcomings previously described.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, the approaches described in this section may not be prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.