Embodiments of the invention relate to refreshing a full-text search index in a partitioned database.
Efficient searching of textual data is useful in database systems that store structured, semi-structured, and unstructured data. Full-text search indexing and full-text search are used to find relevant information stored in the database. A full-text search index may be described as a list of search terms that is built by scanning the text of all the documents. A full-text search uses the full-text search index to try to match search criteria in a search request provided by a user. To minimize the impact on transactions in the database when the data source changes, a full-text search index is generally maintained via index update and refresh operations that are separate from the operation that modifies the data source.
Large data volumes and heavy workloads have led to an increased use of partitioned database environments, in which a database is split into multiple logical or physical partitions to provide a scalable solution.
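As an illustration of a full-text search index built by scanning document text, the following minimal sketch builds a term-to-document mapping and answers a conjunctive query against it. The function names, the tokenization rule, and the sample documents are assumptions made for this example and are not taken from any particular database system.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents.
documents = {
    1: "Partitioned databases scale out",
    2: "Full-text search over a partitioned table",
    3: "Refreshing a search index",
}
index = build_index(documents)
print(sorted(search(index, "partitioned")))   # [1, 2]
print(sorted(search(index, "search index")))  # [3]
```

In this sketch the index is rebuilt from scratch. A separate update or refresh operation, as described above, would instead apply only the changes made to the data source since the last refresh.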
Multi-partition database tables can be located in one or more partitions of the database, with data distributed based on a hashing function. In such partitioned tables (i.e., table partitions), some of the table data rows are stored in one partition, while other table data rows are stored in other partitions.
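A hash-based distribution of rows can be sketched as follows. The use of CRC32 modulo the partition count is an assumption chosen only because it is deterministic and easy to read; an actual database system may use a different hashing function and partition map.

```python
import zlib


def partition_for(key, num_partitions):
    """Hypothetical distribution function: stable CRC32 hash modulo partition count."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def distribute(rows, num_partitions):
    """Place each (key, text) row in the partition selected by its key."""
    partitions = {p: [] for p in range(num_partitions)}
    for key, text in rows:
        partitions[partition_for(key, num_partitions)].append((key, text))
    return partitions


# Hypothetical table rows: (distribution key, text column).
rows = [(1, "alpha"), (2, "beta"), (3, "gamma"), (4, "delta")]
placement = distribute(rows, 3)
for partition, stored in sorted(placement.items()):
    print(partition, [key for key, _ in stored])
```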
A partitioned table (i.e., a data source) might host multiple, independently managed, text search indexes to facilitate full-text search. For each of the text search indexes, a full-text search indexing system may index the data in a single text search index that contains the data for all partitions or, alternatively, use distinct, physical full-text search indexes on separate partitions that are then logically combined to give a unified view of the system.
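The second alternative, in which distinct per-partition indexes are logically combined, can be sketched as a search that consults each partition's index and unions the results. The data layout (a dictionary of term-to-row-id sets per partition) and the name search_all are assumptions for illustration.

```python
def search_all(partition_indexes, term):
    """Union the postings of every partition index into one logical result."""
    hits = set()
    for index in partition_indexes.values():
        hits |= index.get(term, set())
    return hits


# Hypothetical per-partition indexes: partition id -> {term: row ids}.
partition_indexes = {
    0: {"search": {10}, "index": {10, 12}},
    1: {"search": {21}},
}
print(sorted(search_all(partition_indexes, "search")))  # [10, 21]
```

Because each partition index is independent, the per-partition lookups in such a design could run in parallel, which is one of the motivations for splitting the index.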
A change in the number of partitions used by the data source may result in a redistribution of the table data rows, changing the partitions in which the table data rows are stored.
A non-partitioned full-text search index that does not include partition information is not impacted by such a data redistribution. However, such an index may have increased search times because a search cannot be restricted to the relevant partitions, and it may have longer index update times because a single index offers fewer parallelization opportunities and requires longer-running, more frequent merge operations.
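The effect of changing the partition count can be seen in a small worked example. The sketch below assumes integer keys and a deliberately simple distribution rule, key modulo partition count, as a stand-in for whatever hashing function a real system uses.

```python
def moved_rows(keys, old_count, new_count):
    """Return the keys whose partition changes when the partition count changes."""
    return [k for k in keys if k % old_count != k % new_count]


keys = list(range(12))
moved = moved_rows(keys, old_count=3, new_count=4)
print(moved)                        # [3, 4, 5, 6, 7, 8, 9, 10, 11]
print(len(moved), "of", len(keys))  # 9 of 12
```

Under this assumed rule, adding a single partition relocates nine of the twelve rows, which shows why partition information kept by a text index can become stale for most of a table at once.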
To improve selectivity, appropriate partitioning metadata may be kept in the text index. Alternatively, to provide parallelization opportunities for increased performance and scalability, the text index may be split into multiple text index partitions that match the data table partitioning.
In either case, the partition information held by the text index will have to be updated after a redistribution of table rows. If this type of change is not accounted for, a query may return invalid search results because the full-text search index for the table data rows on a partition contains incorrect information about (1) rows that have moved to a different partition or (2) rows that have moved to that partition from another partition.
One approach is to mark the full-text search index invalid whenever a base table has its data redistributed after a change in the number of partitions. A user might be notified that a refresh of the full-text search index is required either immediately or the next time the full-text search index is referenced.
Another approach to ensuring consistency is to allow operations that affect the data distribution in a multi-partition database to proceed only if they include a refresh of the text search index.
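The two kinds of incorrect information can be made concrete by comparing the row ids a partition's index was built from with the row ids the partition holds after redistribution. The sketch assumes integer keys distributed by key modulo partition count; the function name stale_entries is hypothetical.

```python
def stale_entries(indexed_ids, current_ids):
    """Compare a partition index's row ids with the rows now stored there.

    Returns (moved_out, moved_in):
      moved_out: rows still in the index that now live on another partition
      moved_in:  rows now on this partition that the index does not know about
    """
    return indexed_ids - current_ids, current_ids - indexed_ids


keys = range(12)
# Partition 0 was indexed while the table had 3 partitions.
indexed = {k for k in keys if k % 3 == 0}
# The table now has 4 partitions.
current = {k for k in keys if k % 4 == 0}

moved_out, moved_in = stale_entries(indexed, current)
print(sorted(moved_out))  # [3, 6, 9]
print(sorted(moved_in))   # [4, 8]
```

A query served from the stale index for partition 0 could report rows 3, 6, and 9, which are no longer stored there, and could miss rows 4 and 8, which are.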
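The invalidation approach might look like the following sketch, in which the index carries a validity flag and refuses to answer queries until it has been refreshed. The class TextIndex, its methods, and the error message are assumptions for illustration, and the notification on next reference is modeled here as a raised exception.

```python
class TextIndex:
    """Hypothetical text index with a validity flag."""

    def __init__(self):
        self.postings = {}
        self.valid = True

    def invalidate(self):
        """Called when the base table's rows have been redistributed."""
        self.valid = False

    def refresh(self, rows):
        """Rebuild the postings from the current rows and mark the index valid."""
        self.postings = {}
        for row_id, text in rows.items():
            for term in text.lower().split():
                self.postings.setdefault(term, set()).add(row_id)
        self.valid = True

    def search(self, term):
        """Answer a single-term query, or report that a refresh is required."""
        if not self.valid:
            raise RuntimeError("text index is invalid; refresh required")
        return self.postings.get(term.lower(), set())


rows = {1: "partitioned table", 2: "search index"}
index = TextIndex()
index.refresh(rows)
print(sorted(index.search("index")))  # [2]

index.invalidate()
try:
    index.search("index")
except RuntimeError as error:
    print(error)  # text index is invalid; refresh required
```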
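The enforcement approach can be sketched as a redistribution operation that refuses to run unless the caller supplies an index refresh step. The function name redistribute, the refresh_index callback, and the modulo placement rule are all assumptions for this example.

```python
def redistribute(keys, new_partition_count, refresh_index=None):
    """Recompute row placement, but only if an index refresh is included."""
    if refresh_index is None:
        raise ValueError("redistribution requires a text index refresh")
    # Assumed placement rule: integer key modulo the new partition count.
    placement = {key: key % new_partition_count for key in keys}
    refresh_index(placement)
    return placement


refreshed_with = []
placement = redistribute(range(6), 4, refresh_index=refreshed_with.append)
print(placement)            # {0: 0, 1: 1, 2: 2, 3: 3, 4: 0, 5: 1}
print(len(refreshed_with))  # 1

try:
    redistribute(range(6), 4)
except ValueError as error:
    print(error)  # redistribution requires a text index refresh
```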