Document search in digital libraries, on the Internet, and in organizational intranets is best served by combining metadata processing with content searching. Searchers often rely on content when metadata is absent, erroneous, or incomplete. Metadata-based searches present challenges of their own. For example, when a large legacy collection is paired with a budget too small to permit complete and consistent tagging, the metadata associated with its documents may be limited or non-existent. Furthermore, the wide variety of document types and processing approaches results in non-standardized ways of using metadata to assign properties to documents. Different content generators may use not only different types of properties but entirely different properties altogether (e.g., author, expiration date, version, and so on). At the other extreme, some or all of the documents may be catalog records consisting entirely of metadata (e.g., in museums, libraries, or repositories).
Often for reasons of economy or practicality, a service platform that lets customers search sets of documents annotated with metadata properties may be unable to dictate which metadata schema a customer should use. For the service platform to support multiple customers with a reasonably sized physical implementation, it is desirable for the service to combine documents from different customers, and thus with different metadata schemata, in a single search-engine index without loss of data.