The present disclosure relates to data processing by means of a digital computer, and more particularly to selection of rows and values from indexes with updates.
Search engines may search large amounts of data in database tables, such as relational tables, to find results. The data in database tables or indexes may be structured business data. The data is structured in the sense that it may consist of attributes or key figures organized in the table or index, and the attributes or key figures may have dependencies. For example, in a table of information, the data in a row may have dependencies such that the data in each column of the row is associated with the data in the other columns of that row.
Data may originate from database tables and be stored in memory as indexes, in which the data may be compressed using various techniques. One such technique, which may be referred to as dictionary-based compression, involves generating value identifiers that are stored in the indexes in lieu of values (e.g., attribute values), together with one or more dictionaries that describe the associations between the value identifiers and the values they represent.
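As a minimal illustration of dictionary-based compression for a single column, the following hypothetical Python sketch assumes that the dictionary is the sorted list of distinct values and that each value identifier is simply the position of its value in that list. The function names `dictionary_encode` and `dictionary_decode` are illustrative only.

```python
from typing import Dict, List, Sequence, Tuple


def dictionary_encode(values: Sequence[str]) -> Tuple[List[str], List[int]]:
    """Compress one column (hypothetical sketch).

    Returns (dictionary, value_ids), where dictionary[i] is the value
    represented by value identifier i, and value_ids[r] is the identifier
    stored for row r in lieu of the value itself.
    """
    # Assumption: the dictionary is the sorted list of distinct values.
    dictionary = sorted(set(values))
    # Assumption: a value identifier is the value's position in the dictionary.
    value_to_id: Dict[str, int] = {v: i for i, v in enumerate(dictionary)}
    value_ids = [value_to_id[v] for v in values]
    return dictionary, value_ids


def dictionary_decode(dictionary: Sequence[str], value_ids: Sequence[int]) -> List[str]:
    """Translate stored value identifiers back into the values they represent."""
    return [dictionary[i] for i in value_ids]
```

For example, encoding the column `["Berlin", "Paris", "Berlin", "Madrid"]` yields the dictionary `["Berlin", "Madrid", "Paris"]` and the stored identifiers `[0, 2, 0, 1]`, so each repeated string is kept only once.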
Value identifiers may be numbers that replace attribute values within indexes. Attribute values may be long text strings, whereas the value identifiers that represent them may be defined to be as small as reasonably possible in order to minimize the memory resources they consume. For instance, the number of bits used to represent value identifiers in a column of an index may be based on the cardinality of the values of the attribute in that column, so that a minimum number of bits is used. Value identifiers may be assigned locally to a table, such that the value identifier for one and the same value may differ between tables.
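The relationship between cardinality and identifier width can be sketched as follows. This hypothetical Python example assumes identifiers run from 0 to cardinality − 1, so that ⌈log₂(cardinality)⌉ bits (at least one) suffice, and packs the identifiers of a column into a single integer at that fixed width. The helper names are illustrative and do not describe any particular product's storage layout.

```python
from typing import List, Sequence


def bits_for_cardinality(cardinality: int) -> int:
    """Minimum bits needed for identifiers 0 .. cardinality - 1 (at least 1)."""
    if cardinality < 1:
        raise ValueError("cardinality must be positive")
    return max(1, (cardinality - 1).bit_length())


def pack(value_ids: Sequence[int], width: int) -> int:
    """Pack identifiers into one integer, `width` bits per identifier."""
    packed = 0
    for position, vid in enumerate(value_ids):
        if not 0 <= vid < (1 << width):
            raise ValueError(f"identifier {vid} does not fit in {width} bits")
        packed |= vid << (position * width)
    return packed


def unpack(packed: int, width: int, count: int) -> List[int]:
    """Recover `count` identifiers of `width` bits each from a packed integer."""
    mask = (1 << width) - 1
    return [(packed >> (position * width)) & mask for position in range(count)]
```

A column with three distinct values thus needs only 2 bits per row, and a column with 256 distinct values needs 8 bits, regardless of how long the underlying strings are.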
Dictionaries may be ordered lists of row identifiers or value identifiers, with the corresponding row key values or attribute values stored beside the identifiers. A dictionary may be local to a table or local to one or more columns. For example, dictionary-based compression may be applied on a column-by-column basis, where each column of data has one or more dictionaries that are separate from the dictionaries of other columns, and the columns of indexes store value identifiers that may be non-unique across columns. When a search engine reads a request, the search engine may use the dictionaries to look up the attribute values contained in the request. When the search engine returns a result set, the search engine may use the dictionaries to look up the row identifiers and value identifiers in the result set and translate them into values that, for example, a user can understand.
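The request and result-set lookups described above can be sketched as follows. This hypothetical Python example assumes one sorted dictionary per column, a request consisting of a single equality condition, and a full scan of the column's value identifiers; `ColumnIndex` and `select` are illustrative names, not an existing API. The request value is translated into a value identifier by binary search, the scan compares integers rather than strings, and only the matching rows are translated back into values.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Sequence


class ColumnIndex:
    """Hypothetical column with its own sorted dictionary."""

    def __init__(self, values: Sequence[str]) -> None:
        self.dictionary: List[str] = sorted(set(values))
        lookup = {v: i for i, v in enumerate(self.dictionary)}
        self.value_ids: List[int] = [lookup[v] for v in values]

    def value_id_of(self, value: str) -> Optional[int]:
        """Request side: binary search in the ordered dictionary."""
        pos = bisect_left(self.dictionary, value)
        if pos < len(self.dictionary) and self.dictionary[pos] == value:
            return pos
        return None  # value absent, so no row can match

    def value_of(self, row_id: int) -> str:
        """Result side: translate a row's stored identifier into its value."""
        return self.dictionary[self.value_ids[row_id]]


def select(table: Dict[str, ColumnIndex], column: str, value: str,
           project: Sequence[str]) -> List[Dict[str, str]]:
    """Return the projected values of all rows where `column` equals `value`."""
    vid = table[column].value_id_of(value)
    if vid is None:
        return []
    # Scan compares small integers, not strings.
    row_ids = [r for r, v in enumerate(table[column].value_ids) if v == vid]
    return [{name: table[name].value_of(r) for name in project} for r in row_ids]
```

For a table built as `{"city": ColumnIndex(["Berlin", "Paris", "Berlin"]), "product": ColumnIndex(["Pen", "Ink", "Pad"])}`, the call `select(table, "city", "Berlin", ["product"])` returns `[{"product": "Pen"}, {"product": "Pad"}]`. Identifier 0 means "Berlin" in the city column but "Ink" in the product column, which is why value identifiers are non-unique across columns.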
Updates to data may be stored in delta indexes, which are indexes separate from a main index. Delta indexes may also be generated on a column-by-column basis and may be compressed using dictionary-based compression.
A record of an index may be updated by another record in a delta index. The two may be viewed as separate records in the sense that they are separate database records, although both database records may represent the same logical record. For example, one logical record may be updated with an updated copy of that logical record.
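One way a reader could resolve main and delta database records into logical records is sketched below. This hypothetical Python example assumes that every record carries a logical key in a column named `key`, that delta records are appended in update order, and that values are shown already decoded from their dictionaries for brevity. The latest database record for each key is taken as the visible version of the logical record, while both database records remain stored unchanged.

```python
from typing import Dict, List, Sequence

Record = Dict[str, str]


def visible_records(main: Sequence[Record], delta: Sequence[Record],
                    key: str = "key") -> List[Record]:
    """Resolve database records into logical records (hypothetical sketch).

    A delta record supersedes a main record with the same logical key; among
    several delta records for one key, the later one wins (assumes the delta
    is in update order). Output follows first appearance of each key.
    """
    latest: Dict[str, Record] = {}
    order: List[str] = []
    for record in list(main) + list(delta):
        k = record[key]
        if k not in latest:
            order.append(k)
        latest[k] = record  # later database record for the same logical record wins
    return [latest[k] for k in order]
```

For instance, with `main = [{"key": "1", "city": "Berlin"}, {"key": "2", "city": "Paris"}]` and `delta = [{"key": "2", "city": "Madrid"}]`, the visible logical records are `[{"key": "1", "city": "Berlin"}, {"key": "2", "city": "Madrid"}]`, and the main index itself is neither modified nor re-sorted.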
For massive amounts of data, such as a combination of tables containing millions of records, processing the data, including updating thousands of records within brief time intervals, may require substantial hardware resources. For example, re-sorting a table to include updates may require a large amount of processor resources.