The present invention relates to computer-based searches, and more specifically, to indexing systems. A search engine is an information retrieval system that is designed to find data on either the Internet or an intranet. Indexing forms an important part of a search engine, since the index describes how to store the data in a way that facilitates fast and accurate information look-up.
Currently, both Web data and enterprise data are growing rapidly and in large volumes. As a result, scalability, reliability, performance, query capability, and merge factor have become critical for indexing systems. Some of the most popular indexing systems for supporting big data search and analysis today include, for example, relational database indexes and inverted indexes.
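By way of illustration only, an inverted index of the kind mentioned above can be sketched as a mapping from each term to the set of documents that contain it. The function names `build_inverted_index` and `search`, the whitespace tokenization, and the sample documents are assumptions made for this sketch and do not describe any particular indexing system.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # Assumption: terms are separated by whitespace; real systems use analyzers.
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample data, used only to exercise the sketch.
documents = {
    1: "fast index lookup",
    2: "inverted index for text search",
    3: "relational database index",
}
index = build_inverted_index(documents)
print(sorted(search(index, "index")))
print(sorted(search(index, "inverted index")))
```

Because each term points directly to its documents, a look-up touches only the posting sets of the query terms rather than scanning every document.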
Indexing systems offer different ways to configure how data should be indexed, for example to allow searching on specific data items or faceting certain results over others. This is generally referred to as “schema configuration” and has a direct impact on the overall performance of the search process. Another important configuration aspect is the target architecture, e.g., using one or more nodes for holding the indexes.
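As a purely hypothetical sketch, a schema configuration and a target architecture could be captured together in a small data structure such as the following. The names `FieldConfig`, `IndexPlan`, `searchable`, `facetable`, and `num_nodes` are assumptions introduced for illustration and are not taken from any existing indexing system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldConfig:
    """Hypothetical per-field schema setting."""
    name: str
    searchable: bool = True   # the field can be matched by queries
    facetable: bool = False   # results can be grouped by this field


@dataclass
class IndexPlan:
    """Hypothetical pairing of a schema configuration with a node count."""
    fields: List[FieldConfig] = field(default_factory=list)
    num_nodes: int = 1        # number of nodes holding the indexes

    def searchable_fields(self):
        return [f.name for f in self.fields if f.searchable]

    def facet_fields(self):
        return [f.name for f in self.fields if f.facetable]


# Example plan: two searchable fields, one facet-only field, three nodes.
plan = IndexPlan(
    fields=[
        FieldConfig("title"),
        FieldConfig("category", searchable=False, facetable=True),
        FieldConfig("body"),
    ],
    num_nodes=3,
)
print(plan.searchable_fields(), plan.facet_fields(), plan.num_nodes)
```

Two plans that differ only in their field settings or in `num_nodes` represent alternative configurations of the same dataset.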
As the skilled person realizes, deciding which indexing system, which schema configuration, and how many nodes are most appropriate for a dataset is a challenging task. There are already a few popular indexing systems that accept different schema configurations and offer different application program interfaces (APIs). Nevertheless, it remains difficult to evaluate, measure, and select an appropriate indexing system for different types of data and queries.
Some of the key challenges include:
    Capacity planning for a specific type of indexing system always involves many options, and evaluating these options is complex and time-consuming work.
    How to easily measure performance and query capacity for an indexing system.
    How to evaluate different types of schema configuration in an indexing system.
    How to evaluate different hardware configurations in an indexing system.
    How to recommend a specific configuration option for a given dataset description.
    How to easily apply an existing benchmark to different types of indexing systems.
Some of the drawbacks of existing solutions include:
    They run only query performance tests.
    They can run benchmark tests only for a specific type of indexing system.
    They cannot compare different configuration plans for an indexing system.
    They cannot compare different types of indexes.
    They cannot recommend appropriate configurations based on a dataset description.
Thus, there is a need for improved techniques for discovering and evaluating indexing systems.