This invention pertains to a domain-independent database system (DBS) to store, retrieve and manipulate arbitrary arrays in databases. The term array is understood here in a programming language sense: A fixed-length or variable-length sequence of similarly structured data items (cells, in computer graphics and imaging often called pixels) addressed by (integer) position numbers. The cells themselves may be arrays again, so that arrays of arbitrary dimensions can be constructed.
DBSs have a considerable tradition and play an important role in the storage, retrieval and manipulation of large amounts of data. To this end, they offer means for flexible data access, integrity and consistency management, concurrent access by a plurality of users, query and storage optimization, backup and recovery, etc. Various techniques exist for communication between the database application (client) and the database program (server). For example, function libraries (Application Programmer's Interface, API) linked to the application carry out the network data communication, including data conversion, invisibly to the application.
However, these advantages are, as yet, only available for data items such as integers and strings, as well as, for some time now, for so-called long fields or blobs (binary large objects), i.e. byte strings of variable length. As for general raster data (for example audio (1D), raster images (2D) and video (3D)), current practice is to regard them as bit strings and to map or project them onto linear blobs; for example in Meyer-Wegener, K., Lum, W. and Wu, C., Image Management in a Multimedia Database, Proc. Working Conference on Visual Databases, Tokyo, Japan, April 1989, Springer 1989, pp. 29-40, the authors first state that "raw image data form a matrix of pixels" and then conclude that "the raw data appear (in the database) just as a string of bits".
Due to this loss of semantics through the FORTRAN-type linearisation in the database, raster data can only be read or written as a whole or in a line-by-line fashion. Raster structures cannot be given in search criteria and cannot be processed by the DBS, for example to extract only relevant parts. Moreover, the application must choose one of a plurality of existing data formats for encoding and compression. This choice is binding on all other database applications that access the data, and those applications alone must ensure correct decoding and decompression. Thus, the database application programmer is burdened with a multitude of low-level (near-to-machine), repetitive, error-prone and time-consuming programming tasks.
Furthermore, linearisation of arrays in secondary and tertiary storage destroys the neighboring relationships (locality) between array elements, as shown in FIG. 2. A cut-out, which conveys a high degree of locality on a logical level, is disposed on the background storage in a way which favors only line-by-line access and puts all other access means at a drastic disadvantage. The consequence is an inadequate response time behavior.
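The loss of locality can be illustrated with a minimal sketch. It assumes a row-major (line-by-line) layout; the function names and the array dimensions are hypothetical and chosen only for illustration. The sketch lists the contiguous runs of storage positions that a small 2D cut-out occupies.

```python
def row_major_offset(row, col, array_width):
    """Linear position of cell (row, col) in a line-by-line layout."""
    return row * array_width + col


def cutout_runs(row0, col0, height, width, array_width):
    """Return one (start_offset, length) pair per line of the cut-out.

    Each line of the cut-out is contiguous on storage, but consecutive
    lines lie array_width positions apart.
    """
    return [
        (row_major_offset(r, col0, array_width), width)
        for r in range(row0, row0 + height)
    ]


# Hypothetical example: a 4x4 cut-out from an array that is 1000 cells wide.
runs = cutout_runs(row0=100, col0=100, height=4, width=4, array_width=1000)

# Distance between the first and the last storage position touched.
span = runs[-1][0] + runs[-1][1] - runs[0][0]
```

In this example the 16 cells of the cut-out are spread over a range of 3004 storage positions in four separate runs, although they are direct neighbors on the logical level.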
A typical system is described in Meyer-Wegener, see id. Raster images are provided in the database as blobs, encoded in one of various possible image exchange formats (for example TIFF or GIF); an additional flag indicates to the application the format presently used. The EXTRA/EXCESS system, Vandenberg, S. and DeWitt, D., Algebraic Support for Complex Objects with Arrays, Identity, and Inheritance, Proc. ACM SIGMOD Conf. 1991, pp. 158-167, offers an algebra for modeling and querying raster data, but there is no accompanying appropriate storage technique, so that only small arrays (for example 4×4 matrices) can be efficiently managed. Sarawagi, S. and Stonebraker, M., Efficient Organization of Large Multidimensional Arrays, Proc. 10th Int. Conf. on Data Engineering, February 1994, recently proposed a storage architecture for arrays based on tiling (see below), but without a spatial index for access acceleration and without optimizable query support beyond pure cut-out extraction. In Baumann, P., On the Management of Multidimensional Discrete Data, VLDB Journal 3(4) 1994, Special Issue on Spatial Databases, pp. 401-444, an approach for raster data management is proposed, which concerns the conceptual as well as the physical level. A general solution for the problem of data independence, i.e. the processing of raster data in a DML or API without knowledge of or reference to its physical storage structure, encoding and compression, does not exist at the moment. However, from Baumann, P., Database Support for Multidimensional Discrete Data, ISBN 3-540-56869-7, pp. 191-206, it is known to store, in addition to the data, an indicator of the compression used.
In imaging, tiling techniques are used for the processing of images, which, due to their size, do not fit into the main memory as a whole (for example, Tamura, H., Image Database Management for Pattern Information Processing Studies; Chang, S. and Fu, K. (eds.), Pictorial Information Systems, Lecture Notes in Computer Science Vol. 80, Springer 1980, pp. 198-227). A tile is a rectangular cut-out of an image. The image is decomposed into tiles, so that these do not overlap and together cover the image (see FIG. 5A). Within a tile, data are stored using a conventional linearisation scheme. Tiling can be advantageously used in order to simulate neighbourhood/vicinity within an array on a linear storage medium, and thus forms an important basis for the present invention.
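The decomposition described above can be sketched as follows. This is a minimal illustration assuming regular, axis-aligned tiles of a fixed size; the function names, the half-open coordinate convention and the dimensions are hypothetical and are not taken from the cited works.

```python
def decompose(rows, cols, tile_h, tile_w):
    """Split a rows x cols image into non-overlapping tiles.

    Each tile is a half-open box (r0, c0, r1, c1); tiles at the lower and
    right borders are clipped so that all tiles together cover the image.
    """
    tiles = []
    for r0 in range(0, rows, tile_h):
        for c0 in range(0, cols, tile_w):
            tiles.append((r0, c0, min(r0 + tile_h, rows), min(c0 + tile_w, cols)))
    return tiles


def tiles_hit(tiles, row0, col0, row1, col1):
    """Return the tiles that intersect the half-open cut-out box.

    A plain linear scan is used here for brevity.
    """
    return [
        t for t in tiles
        if t[0] < row1 and row0 < t[2] and t[1] < col1 and col0 < t[3]
    ]


# Hypothetical example: a 10x10 image cut into tiles of at most 4x4 cells.
tiles = decompose(10, 10, 4, 4)
covered = sum((r1 - r0) * (c1 - c0) for r0, c0, r1, c1 in tiles)

# A 2x2 cut-out that straddles a tile corner touches four tiles.
hit = tiles_hit(tiles, 3, 3, 5, 5)
```

Because the tiles do not overlap and their areas sum to the image area, every cell belongs to exactly one tile. A cut-out then requires reading only the few tiles it intersects, rather than every line it crosses.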
For the management of geometric data such as points, lines and areas, many spatial indices (geo-indices) exist in order to accelerate access to such data elements in a database (Gueting, R. H., An Introduction to Spatial Database Systems, VLDB Journal 3(4) 1994, Special Issue on Spatial Databases, pp. 357-400). In the present invention, such a geo-index is used for fast tile location.