In the operation of computer software applications, a need often arises to store and share tabular data. For present purposes, tabular data includes collections of data values organized in rows and columns. Each element, or table cell, of such a collection represents an individual data value and is formed using a sequence of bytes. Such data values may be alphanumeric (i.e., text), in which case the contents of the bytes making up the value are limited to a subset of the possible values that bytes may store. Alternatively, such data values may be binary (e.g., image, sound, or program data), in which case the bytes of the value may store any possible value. Conventionally, however, it is not possible to store both alphanumeric data values and binary data values in the same tabular data collection.
One reason that tabular data collections do not include both alphanumeric data values and binary data values is that computer software applications using such tabular data would need to parse the resulting hybrid collection. Parsing a tabular data collection, usually a sequential process that involves analyzing or separating the data into more easily processed components, permits the precise retrieval of data values.
In addition, the physical format storing tabular data must support random access to the data. Unfortunately, stored binary data may be quite voluminous, making the act of parsing all stored binary data difficult and tedious. Moreover, for many uses of binary data, there is no need to parse all of the data in order to retrieve it. For many binary data values, sequential access may result in long retrieval times, especially when the tabular data collection includes large binary values. Retrieval times may also be inconsistent, depending on the position of the value in the collection.
No known data format supports all of these requirements. A format known as the “comma separated values” or CSV format attempts to support rapid sequential parsing of tabular data. Unfortunately, such a format is better suited to storing only alphanumeric data and does not readily permit the parsing of binary data together with alphanumeric data. Furthermore, the CSV format requires sequential data access and does not provide for the storage or use of meta-information.
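The two CSV limitations noted above can be illustrated with a minimal Python sketch. It is offered only as an illustration under stated assumptions, not as part of any described method: the helper names `naive_csv_split` and `read_cell_sequentially`, the sample byte values, and the sample tables are all hypothetical. The first helper shows how a binary value containing delimiter bytes disturbs the row and column structure when no escaping is applied; the second shows that reaching a given cell requires scanning every preceding row.

```python
import csv
import io


def naive_csv_split(data: bytes) -> list[list[bytes]]:
    """Split raw bytes on newline and comma, with no quoting or escaping."""
    return [line.split(b",") for line in data.split(b"\n")]


def read_cell_sequentially(csv_text: str, row_index: int, col_index: int):
    """Return (value, rows_scanned) for one cell, reading rows in order."""
    reader = csv.reader(io.StringIO(csv_text))
    for rows_scanned, row in enumerate(reader, start=1):
        if rows_scanned - 1 == row_index:
            return row[col_index], rows_scanned
    raise IndexError("row_index is beyond the end of the data")


# Hypothetical binary value that happens to contain a comma and a newline byte.
blob = b"\x00,\n\xff"

# Three logical rows are written: a header and two data rows.
table = b"id,payload\n1," + blob + b"\n2,ok"

rows = naive_csv_split(table)
print(len(rows))  # 4: the binary value was split across two rows

# Reaching the last of four rows requires scanning all four.
text = "id,name\n1,a\n2,b\n3,c\n"
value, scanned = read_cell_sequentially(text, row_index=3, col_index=1)
print(value, scanned)
```

Quoting or text-encoding the binary value would avoid the structural damage, but the cost of the sequential scan would remain and would grow with the size of the values that precede the requested cell.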
Accordingly, there is a need for a method and system that permits the flexible storage and unequivocal retrieval of both binary and alphanumeric data, and that also supports random access to the binary data.
There is also a need for a self-descriptive data format that supports the storage of information about the data together with the data itself. Such a format may reduce the degree to which use of such data depends on particular software applications residing in an associated computing platform or network.
Still further, a need exists for a method and system for communicating both alphanumeric and binary data, together with associated meta-information, in a single open and flexible package or format that changes according to the dynamics of a particular use situation or programming environment.
Finally, in order to transfer such a tabular collection of alphanumeric and binary data values together with its meta-information between computer software applications (potentially over computer networks), the physical storage of all data and meta-information has to be a standalone structure.