The present invention relates to a method and apparatus for processing data files, the data files being generated in accordance with different protocols. In particular, the present invention relates to the generation of and reading of XML (eXtensible Mark-up Language) files generated in accordance with different document structures (e.g. DTDs, schemas).
XML is a mark-up language which is used for transferring structured data. The XML files include a number of mark-up tags (e.g. “elements”, “attributes” and “entities” etc.) which are associated with respective data. Each respective mark-up tag has a specific meaning within the context of the particular XML file and this allows third parties to determine the nature of the data associated with the respective mark-up tags.
The mark-up tags associated with an XML file are defined by the XML document type definition or schema. Because each document type definition or schema can define its own set of mark-up tags, a large number of different file formats are available, which in turn leads to many problems with the extraction and storage of data from the files.
Currently, data is extracted from XML files by parsing the file to locate specific mark-up tags and then extracting and storing the corresponding data accordingly. However, a respective data loader must be provided for each specific type of XML file. As a result a large number of different data loaders can be required for extracting and storing data from a number of different XML files.
As the use of electronic information interchange increases rapidly, users increasingly require data to be transferred to and from their systems in a variety of XML formats. Accordingly, it is desirable to be able to process different XML file formats using a single piece of software.
In accordance with a first aspect of the present invention, we provide a method of processing data files, the data files being generated in accordance with different protocols, each protocol defining a number of mark-up tags and each data file including a number of respective mark-up tags, each mark-up tag having respective data associated therewith, at least some of the data files also including a protocol definition indicating the protocol used to generate the data file, wherein the method comprises storing the data contained in a data file by:
a. receiving the data file;
b. determining the protocol definition;
c. using the protocol definition to determine storage location(s) or processing action(s) for each of the mark-up tags of the data file;
d. using the protocol definition to determine contextual information for each of the mark-up tags of the data file;
e. extracting the marked up data contained within the data file; and,
f. storing or processing the data in accordance with the determined contextual information and at least one of the storage location(s) or the processing action(s).
In accordance with a second aspect of the present invention, we provide apparatus for processing data files, the data files being generated in accordance with different protocols, each protocol defining a number of mark-up tags and each data file including a number of respective mark-up tags, each mark-up tag having respective data associated therewith, at least some of the data files also including a protocol definition indicating the protocol used to generate the data file, the apparatus comprising:
a. a processor; and,
b. a store, the processor being adapted to store data contained in a data file by:
i. receiving the data file;
ii. determining the protocol definition;
iii. using the protocol definition to determine storage location(s) or processing action(s) for each of the mark-up tags of the data file;
iv. using the protocol definition to determine contextual information for each of the mark-up tags of the data file;
v. extracting the data contained within the mark-up tags of the data file; and,
vi. storing or processing the data in accordance with the determined contextual information and at least one of the storage location(s) or the processing action(s).
Accordingly, the present invention provides a method and apparatus for storing data contained in a data file, and in particular a structured data file. The system operates by examining the data file to determine a protocol definition, which is then used to determine contextual information and storage location(s) (also referred to as locational information) or processing action(s) for each of the mark-up tags in the data file. The data associated with the mark-up tags is then stored or processed in accordance with this contextual information and the storage location(s) or processing action(s), as appropriate. As the protocol definition is different for each different type of data file, different mark-up tags contained in different types of data file can resolve to the same contextual information and storage location(s) or processing action(s), allowing the data to be stored or processed in the same way irrespective of the type of data file.
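By way of non-limiting illustration only, the storing operation summarised above might be sketched in Python as follows. The dictionary contents, the tag names, the storage locations and the functions `determine_protocol` and `store_file` are hypothetical and are introduced purely for this example. The sketch also assumes that the protocol definition is identified by the SYSTEM identifier of a DOCTYPE declaration, which is only one of several ways a file could indicate its protocol.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical data dictionary, keyed by protocol definition (assumed here to
# be the DTD system identifier). Each mark-up tag maps to a pair of
# (contextual information, storage location).
DATA_DICTIONARY = {
    "orders-a.dtd": {
        "custName": ("customer", "customers.name"),
        "amt": ("order", "orders.total"),
    },
    "orders-b.dtd": {
        "client": ("customer", "customers.name"),
        "total": ("order", "orders.total"),
    },
}

DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+\S+\s+SYSTEM\s+"([^"]+)"')


def determine_protocol(xml_text):
    """Return the DTD system identifier declared in the file, or None."""
    match = DOCTYPE_RE.search(xml_text)
    return match.group(1) if match else None


def store_file(xml_text, store):
    """Store each tagged value at the location its protocol resolves it to."""
    protocol = determine_protocol(xml_text)
    # Raises KeyError when the protocol is unknown or absent; a fuller
    # system would fall back to another way of resolving the tags.
    mapping = DATA_DICTIONARY[protocol]
    for element in ET.fromstring(xml_text).iter():
        if element.tag in mapping:
            context, location = mapping[element.tag]
            store.setdefault(location, []).append((context, element.text))
    return store


FILE_A = ('<!DOCTYPE order SYSTEM "orders-a.dtd">'
          '<order><custName>Ann</custName><amt>10</amt></order>')
FILE_B = ('<!DOCTYPE purchase SYSTEM "orders-b.dtd">'
          '<purchase><client>Bob</client><total>25</total></purchase>')

store = {}
store_file(FILE_A, store)
store_file(FILE_B, store)
print(store)
```

In this sketch the differently named tags `custName` and `client` resolve to the same storage location, so a single loader handles both file types.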
If the data file does not include a protocol definition, then the method of determining contextual and locational information for each of the mark-up tags of the data file typically comprises parsing the data file to locate the mark-up tags, thereby generating a protocol definition for use with this file, and considering each mark-up tag and the data associated therewith to determine contextual and locational information for each of the mark-up tags. Accordingly, if no protocol definition can be generated to determine contextual information, it is then necessary to look at each of the mark-up tags and the data contained therein to determine the contextual and locational information directly.
Typically, when no protocol definition is available or can be generated, the contextual and locational information is determined by requesting the input of contextual and locational information from an external source, such as the user. Thus, the user of the apparatus and/or method would examine the data and the mark-up tags and use their own knowledge of the receiving database system, and the way in which data is stored therein, to determine the contextual and locational information appropriate to the given data.
Once this has been completed, a protocol definition can then be defined for the mark-up tags for which contextual and locational information has been derived. This can then be used in subsequent processing of data files.
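As a minimal sketch of this fallback, and under stated assumptions only, the following Python example walks a file that carries no protocol definition, requests contextual and locational information for each data-bearing tag from an external source, and collects the answers into a definition that can be reused. The function `build_protocol_definition` and the callback `ask_user` are hypothetical names. The callback stands in for whatever form of user interaction a real system would provide.

```python
import xml.etree.ElementTree as ET


def build_protocol_definition(xml_text, ask_user, known=None):
    """Derive a tag mapping for a file that has no protocol definition.

    ask_user(tag, sample) is a hypothetical callback returning a pair of
    (contextual information, storage location) for the given tag.
    """
    definition = dict(known or {})
    for element in ET.fromstring(xml_text).iter():
        tag = element.tag
        if tag in definition:
            continue  # already resolved, no need to ask again
        # Only tags that directly hold data need a mapping in this sketch.
        if element.text is None or not element.text.strip():
            continue
        definition[tag] = ask_user(tag, element.text.strip())
    return definition


def scripted_user(tag, sample):
    """Stand-in for a real user; the answers are purely illustrative."""
    answers = {
        "client": ("customer", "customers.name"),
        "total": ("order", "orders.total"),
    }
    return answers[tag]


def no_user(tag, sample):
    """Callback that fails if the user would be asked at all."""
    raise AssertionError("unexpected request for tag " + tag)


SAMPLE = "<purchase><client>Bob</client><total>25</total></purchase>"

definition = build_protocol_definition(SAMPLE, scripted_user)
# On subsequent files of the same type, the saved definition is reused and
# the user is not consulted again.
reused = build_protocol_definition(SAMPLE, no_user, known=definition)
```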
In accordance with a third aspect of the present invention, we provide a method of processing data files, the data files being generated in accordance with different protocols, each protocol defining a number of mark-up tags and each data file including a number of respective mark-up tags, each mark-up tag having respective data associated therewith, at least some of the data files also including a protocol definition indicating the protocol used to generate the data file, wherein the method comprises generating a data file by:
a. determining the protocol definition of the protocol to be used;
b. locating the data to be incorporated into the file, the data being stored in accordance with contextual and locational information;
c. using the protocol definition and the contextual information to determine the mark-up tags with which the data should be associated; and,
d. generating a data file by associating the data with respective mark-up tags in accordance with the contextual and locational information.
In accordance with a fourth aspect of the present invention, we provide apparatus for processing data files, the data files being generated in accordance with different protocols, each protocol defining a number of mark-up tags and each data file including a number of respective mark-up tags, each mark-up tag having respective data associated therewith, at least some of the data files also including a protocol definition indicating the protocol used to generate the data file, the apparatus comprising:
a. a processor; and,
b. a data dictionary, the processor being adapted to generate data files by:
i. determining the protocol definition of the protocol to be used;
ii. locating the data to be incorporated into the file, the data being stored in accordance with contextual information;
iii. using the protocol definition and the contextual and locational information to determine the mark-up tags with which the data should be associated; and,
iv. generating a data file by associating the data with respective mark-up tags in accordance with the contextual and locational information.
Accordingly, the present invention also provides a method and apparatus for generating the data files. This is achieved by using stored protocol definitions, and the corresponding locational information of the data to be incorporated into the file, to determine the respective mark-up tags which should be used for generating the file and the data associated with them for a user-determined context.
Typically the first, second, third and fourth aspects of the present invention use the protocol definition to access a data dictionary, the data dictionary including an indication of the locational or processing information for each of the mark-up tags. This provides a simple method of locating the locational information for each of the mark-up tags in the particular data file.
Alternatively, for example, the locational information may be associated with the data file itself, for which a corresponding protocol definition is available.
Typically in this case the data dictionary also includes an indication of the contextual information for each of the attributes as well as an indication of the specific data associated with each of the entities. However, it is not essential that each attribute and entity is defined, as any undefined attributes and entities can simply be determined by requesting the information from the user of the system.
As will be understood by a person skilled in the art, the protocol is typically eXtensible Mark-up Language (XML), with the protocol definition being a document type definition or schema. However, the system may also apply to other mark-up languages such as SGML, HTML or the like.