Data partitioning is a challenge in many database systems. The challenge arises from the need for constant access to data and continuous indexing of new content while insertions, deletions, and updates of content and data are being performed. A database that manages content and data must respond to queries that arrive at almost the same time as the content and data are being added, modified, retrieved, or deleted. Further, many of these datasets may consist of electronic files and records in arbitrary formats. A format may be structured or semi-structured, or the dataset may be completely unstructured. Examples of such content are word processing documents, annotated documents such as XML or HTML, and documents containing free text.
Content searching and data query searching in datasets are time consuming when the data constantly changes. This is due, in large measure, to the maintenance of the inverted indices that refer to words, phrases, or other information within the datasets and documents. For example, a relevant word and/or phrase can occur in many locations in the datasets, and if the datasets change, the indices must be updated to reflect the change. Indexing data is a computationally intensive process that involves reading the datasets and creating the corresponding indices. Further, constant changes to the datasets fragment the storage device, rendering it ineffective over time. De-fragmentation, however, is a costly process that involves moving massive amounts of data around the storage devices, which is time consuming and may, in most cases, necessitate further re-indexing. Whether for re-indexing or for data management, the current state of the art is not effective for the access and delivery of information to the information user, and this has been a major challenge in database management.
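The maintenance cost described above can be illustrated with a minimal sketch of an inverted index. The class name `InvertedIndex`, its methods, and the whitespace tokenization are assumptions made for illustration only; they are not taken from any particular system. The sketch shows that changing a single document requires removing every posting that refers to its old content before the new content can be indexed.

```python
from collections import defaultdict


class InvertedIndex:
    """Hypothetical in-memory inverted index: term -> {doc_id: [positions]}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        # Remember which terms each document contributed, so that an update
        # can find and remove the stale postings.
        self.doc_terms = {}

    def add(self, doc_id, text):
        # Re-adding an existing document is an update: stale postings go first.
        if doc_id in self.doc_terms:
            self.remove(doc_id)
        terms = set()
        # Assumption: lowercase, whitespace-separated tokens.
        for position, word in enumerate(text.lower().split()):
            self.postings[word].setdefault(doc_id, []).append(position)
            terms.add(word)
        self.doc_terms[doc_id] = terms

    def remove(self, doc_id):
        # Every term the document contained must be visited.
        for term in self.doc_terms.pop(doc_id, set()):
            del self.postings[term][doc_id]
            if not self.postings[term]:
                del self.postings[term]

    def lookup(self, term):
        # Return a copy so callers cannot alter the index by accident.
        return dict(self.postings.get(term.lower(), {}))


index = InvertedIndex()
index.add("d1", "index the data index")
print(index.lookup("index"))  # positions of "index" in d1
index.add("d1", "fresh data")  # the document changes
print(index.lookup("index"))  # the stale postings are gone
```

Even in this simplified form, the work done on each change grows with the number of distinct terms in the changed document, which is one reason frequent updates make index maintenance expensive.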
What is needed is a system and associated method for flexibly indexing and searching a plurality of databases, which may have different structures or no structure, that provides and uses information on each feature (byte, word, phrase, symbol expression, image component or other appropriate information segment), each document that contains the feature, each file that contains the document, each folder that contains the file and/or each database that contains or refers to the folder. Preferably, the indexing scheme should allow extension downstream (e.g., to provide further metadata for the referenced feature) or upstream (e.g., to allow further refinement in referral to databases within the collection). Preferably, the index expression referring to a particular feature should be sufficiently intuitive to provide, at a glance, useful information on the nature of the feature.
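One way to picture such an index expression is as an ordered sequence of levels running from database down to feature. The sketch below is purely illustrative: the class name `IndexExpression`, the `/` separator, and the sample level names are hypothetical and are not the notation of any particular indexing scheme. It shows only how a readable expression could be extended downstream or upstream without disturbing the existing levels.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class IndexExpression:
    """Hypothetical index expression, ordered from broadest to most specific level."""

    levels: Tuple[str, ...]

    def render(self, sep: str = "/") -> str:
        # A human-readable form that shows the containment chain at a glance.
        return sep.join(self.levels)

    def extend_downstream(self, *more: str) -> "IndexExpression":
        # Append finer-grained levels, e.g. metadata about the feature.
        return IndexExpression(self.levels + tuple(more))

    def extend_upstream(self, *more: str) -> "IndexExpression":
        # Prepend broader levels, e.g. a collection that holds the database.
        return IndexExpression(tuple(more) + self.levels)

    def contains(self, other: "IndexExpression") -> bool:
        # True when this expression is a prefix of (or equal to) the other one.
        return other.levels[: len(self.levels)] == self.levels


# Assumed sample levels: database, folder, file, document, feature.
ref = IndexExpression(("sales_db", "contracts", "2004.doc", "section-2", "word:indemnity"))
print(ref.render())
print(ref.extend_downstream("offset:118").render())
print(ref.extend_upstream("collection-A").render())
```

Representing the expression as an immutable tuple keeps each reference hashable, so it could serve directly as a dictionary key, and prefix comparison gives a simple test for whether a feature lies within a given folder or database.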