A glossary of conventional terms follows. The meaning of these terms is generally known per se; accordingly, the definitions below are provided for clarity and should not be regarded as binding.
Glossary of Terms
Data—Information that one wants to store and/or manipulate.
Database—A collection of data organized by some set of rules.
Attribute—A feature or characteristic of specific data, represented e.g. as “columns” in a relational database. A record representing a person might have an attribute “age” that stores the person's age. Each column represents an attribute. In XML (XML is defined below), there is an “attribute” that exists as part of a “tag.”
Column—In a relational database, columns represent attributes for particular rows in a relation. For example, a single row might contain a complete mailing address. The mailing address would have four columns (“attributes”): street address, city, state, and zip code.
Record—A single entry in a database. Often referred to as a “tuple” or “row” in a relational database.
Tuple—See “record.”
Row—See “record.”
Table—See “relation.”
Relation—A way of organizing data into a table consisting of logical rows and columns. Each row represents a complete entry in the table. Each column represents an attribute of the row entries. Frequently referred to as a “table.”
Relational database—A database that consists of one or more “relations” or “tables.”
Database administrator—A person (or persons) responsible for optimizing and maintaining a particular database.
Schema—The organization of data in a database. In a relational database, all new data that comes into the database must be consistent with the schema, or the database administrator must change the schema (or reject the new data).
Index—Extra information about a database used to reduce the time required to find specific data in the database. It provides access to particular rows based on a particular column or columns.
Path—A series of relationships among data elements. For instance, a path from a grandson to grandfather would be two steps: from son to father, and from father to grandfather.
Structure—The embodiment of paths in particular documents or data. For example, in a “family tree,” the structure of the data is hierarchical: it is a tree with branches from parents to children. Data without a hierarchical structure is often referred to as “flat.”
Query—A search for information in a database.
Range query—A search for a range of data values, like “all employees aged 25 to 40.”
I/O—A read from a physical device, such as a fixed disk (hard drive). I/Os take a significant amount of time compared to memory operations: usually hundreds or even thousands of times (or more) longer.
Block read—Reading a fixed-size chunk of information for processing. A block read implies an “I/O” if the block is not in memory.
Tree—A data structure that is either empty or consists of a root node linked by means of d (d≧0) pointers (or links) to d disjoint trees called subtrees of the root. The roots of the subtrees are referred to as “child nodes” of the root node of the tree, and nodes of the subtrees are “descendent nodes” of the root. A node in which all the subtrees are empty is called a “leaf node.” The nodes in the tree that are not leaves are designated as “internal nodes.”
In the context of the invention, leaf nodes are also nodes that are associated with data.
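As a rough illustration of the tree terminology defined above, the following Python sketch builds a small tree and separates leaf nodes from internal nodes. The `Node` class, the `leaves` helper, and the family-tree data are hypothetical names invented for this example. The sketch assumes, as one simple reading, that only leaf nodes carry data.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator, List, Optional


@dataclass
class Node:
    """One tree node; `children` holds the roots of its d >= 0 subtrees."""
    label: str
    data: Optional[Any] = None  # assumption: only leaves carry data here
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        # A leaf is a node whose subtrees are all empty.
        return not self.children

    def descendants(self) -> Iterator["Node"]:
        # Every node reachable from this one, excluding the node itself.
        for child in self.children:
            yield child
            yield from child.descendants()


def leaves(root: Node) -> List[Node]:
    """Collect the leaf nodes under `root` in left-to-right order."""
    if root.is_leaf():
        return [root]
    return [leaf for child in root.children for leaf in leaves(child)]


# Hypothetical family tree: two internal nodes and three leaves.
tree = Node("grandfather", children=[
    Node("father", children=[
        Node("son", data={"age": 12}),
        Node("daughter", data={"age": 9}),
    ]),
    Node("uncle", data={"age": 41}),
])

leaf_labels = [n.label for n in leaves(tree)]
internal_labels = [n.label for n in [tree, *tree.descendants()] if not n.is_leaf()]
```

Here `son` and `daughter` are child nodes of `father` and descendent nodes of `grandfather`, and the same `Node` type could equally stand for a block in a tree of blocks.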
Nodes and trees should be construed in a broad sense. Thus, the definition of tree also encompasses a tree of blocks wherein each node constitutes a block. In the same manner, the descendent blocks of a given block are all the blocks that can be accessed from that block. For a detailed definition of “tree,” refer also to the book by Lewis and Denenberg, “Data Structures and Their Algorithms.”
B-tree—A tree structure that can be used as an index in a database. It is useful for exact match and range queries. B-trees frequently require multiple block reads to access a single record. A more complete description of B-trees can be found on pages 473-479 of The Art of Computer Programming, volume 3, by Donald Knuth (©1973, Addison-Wesley).
Hash table—A structure that can be used as an index in a database. It is useful for exact match queries. It is not useful for range queries. Hash tables generally require one block read to access a single record. A more complete description of hash tables can be found on e.g. pages 473-479 of The Art of Computer Programming, volume 3, by Donald Knuth (©1973, Addison-Wesley).
Inverted list—A structure that can be used as an index in a database. It is a set of character strings that points to records that contain particular strings. For example, an inverted list may have an entry “hello.” The entry “hello” points to all database records that have the word “hello” as part of the record. A more complete description of inverted lists can be found on e.g. pages 552-559 of The Art of Computer Programming, volume 3, by Donald Knuth (©1973, Addison-Wesley).
Semi-structured data—Data that does not conform to a fixed schema. Its format is often irregular or only loosely defined.
Data mining—Searching for useful, previously unknown patterns in a database.
Object—An object is some quantity of data. It can be any piece of data, a single path in a document path, or some mixture of structure and data.
An object can be a complete record in a database, or formed “on the fly” out of a portion of a record returned as the result of a query.
Markup—In computerized document preparation, a method of adding information to the text indicating the logical components of a document, or instructions for layout of the text on the page or other information which can be interpreted by some automatic system. (From the Free On-Line Dictionary of Computing.)
Markup Language—A language for applying markup to text documents to indicate formatting and logical contents. Markup languages are increasingly being used to add logical structure information to documents to enable automated or semi-automated processing of such documents. Many such languages have been proposed, ranging from generic ones such as SGML and XML, to industry or application-specific versions.
SGML—A specific example of a markup language: Standard Generalized Markup Language. SGML is a means of formally describing a language, in this case, a markup language. A markup language is a set of conventions used together for encoding texts (e.g., HTML or XML).
XML—A specific example of a markup language: eXtensible Markup Language. A language used to represent semi-structured data. It is a subset of SGML. XML documents can be represented as trees.
Key—An identifier used to refer to particular rows in a database. In the context of a relational database, keys represent column information used to identify rows. For instance, “social security number” could be a key that uniquely identifies each individual in a database. Keys may or may not be unique.
Join—A method of matching portions of two or more tables to form a (potentially much larger) unified table. This is generally one of the most expensive relational database operations, in terms of space and execution time.
Key search—The search for a particular value or data according to a key value. This search is usually performed by an index.
Search—In the context of data, searching is the process of locating relevant or desired data from a (typically much larger) set of data based on the content and/or structure of the data. Searching is often done as a batch process, in which a request is submitted to the system, and after processing the request, the system returns the data or references to the data that match the request. Typical (yet not exclusive) examples of searching are the submission of a query to a relational database system, or the submission of key words to a search engine on the World Wide Web.
Path search—The search for a particular path in the database. A “path” is a series of relationships among data elements. For instance, part of an invoice might have the “buyer,” and that buyer has an “address” on the invoice. A search for the address of all buyers is really a search for the path “invoice to buyer to address.” This is a search for a particular structure, which is different from key search (the search for particular values). Path search and key search may be combined.
Browsing—In the context of data, browsing is the process of interactively locating relevant or desired data by wandering or navigating through a (typically much larger) set of data. Browsing can be done based on data content, structure, or a combination of these. A common example of browsing is the traversal of hyperlinks in the World Wide Web in order to locate relevant web pages.
Access—In the context of data, access is the process of obtaining data, typically through searching, browsing, or through following references.
Sibling—Elements of a tree that share the same parent are siblings. This is the same sense in which brothers and sisters are siblings.
Tag—An XML tag represents structural information in an XML document. A tag may or may not surround data and may or may not contain other tags. All tags have a parent, except the first tag.
Additionally see “markup.”
Parent-child—In a tree, a child is an element that branches from its parent. In XML, if “tag1” immediately surrounds “tag2,” then “tag1” is the parent of “tag2.” “Tag2” is the child of “tag1.”
Token—A short pattern used to represent another pattern.
Complete-key indexing—An indexing method that stores the key as part of the index. This provides an exact “hit or miss” result when using the index, but the index is very large when the keys are large. This is contrasted with “compressed-key indexing.”
Compressed-key indexing—A compressed-key index does not store the entire key in the index and thus can be significantly smaller than a complete-key index (for the same keys). However, it may provide “false positives” (that can be removed later). It should not miss relevant records (“false negatives”). This is contrasted with “complete-key indexing.” Compressed-key indexing is described e.g. in U.S. Pat. No. 6,175,835.
Encoding—Transforming one representation into a different, equivalent representation. For example, representing the Roman numeral “VII” as the decimal number “7” is a form of encoding.
Sibling Order—Semi-structured data stored in files has a specific “order” associated with the data. In a race, finishers are ordered based on their order of appearance across the finish line: “first,” “second,” “third,” etc. With semi-structured data, siblings can be ordered by their appearance in the document.
Semantic information—“Of or relating to meaning, especially meaning in language.” (The American Heritage® Dictionary of the English Language, Third Edition, © 1996, 1992 by Houghton Mifflin Company) The difference between the word “orange” used to represent a color and the word “orange” to represent a fruit is a “semantic” difference. “Semantic information” is information about the meaning of tags and data.
Syntactic information—Syntax is the study of the rules whereby words or other elements of sentence structure are combined to form proper sentences. “Syntactic information” in semi-structured data represents the tags and data, without information regarding the meaning of the tags and data.
Homonym—A word that is used to designate several different things. The word “bow” represents a stringed weapon, the front of a ship, and a loop of ribbon, among other things. When used with more than one semantic meaning, “bow” would be an example of a homonym.
Synonym—A word having the same or nearly the same meaning as another word in a language. Words like “top,” “peak,” and “apex” are synonyms in English.
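Several of the glossary terms above (tag, parent-child, sibling order, path search, and key search) can be illustrated together on a small XML document. The sketch below uses Python's standard `xml.etree.ElementTree` module. The invoice document and all variable names are invented for illustration, and a plain scan stands in for the index that would normally serve a key search.

```python
import xml.etree.ElementTree as ET

# Hypothetical invoice document, invented for this example.
DOC = """
<invoice>
  <buyer>
    <name>Acme Corp</name>
    <address>12 Main St</address>
  </buyer>
  <seller>
    <name>Widgets Ltd</name>
    <address>9 Side Rd</address>
  </seller>
</invoice>
"""

# The first tag, <invoice>, is the only tag without a parent.
root = ET.fromstring(DOC)

# Path search: look for the structure "invoice to buyer to address".
buyer_addresses = [e.text for e in root.findall("./buyer/address")]

# Key search: look for a particular value wherever it occurs.
# (A full scan is used here; a real system would consult an index.)
tags_holding_value = [
    e.tag for e in root.iter() if e.text and e.text.strip() == "9 Side Rd"
]

# Combined path and key search: a given value on a given path.
combined = [
    e.text for e in root.findall("./seller/address") if e.text == "9 Side Rd"
]

# Sibling order: the children of <buyer> in order of appearance.
buyer_children = [child.tag for child in root.find("buyer")]
```

The path search returns only the buyer's address even though the seller also has an `address` tag, which is the distinction the glossary draws between searching for structure and searching for values.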