Keyword-based text searching is ubiquitous. For example, Google uses a fairly typical keyword-based search technology (brilliantly implemented to scale). A keyword search finds all the documents matching some pattern of text (usually the presence of some collection of words) and returns a list of documents sorted by “value.” In the case of Google, the corpus is that part of the World Wide Web accessible to its crawlers, and the value function makes use of the massive human effort that goes into choosing which pages to link to, and likely also keeps track of which pages people select after a given search. One difficulty with such searches is that they make no use of the intrinsic contents of the corpus (if Google did not have access to all this human evaluation, the search would be poor). Another is that the results are returned as a (usually very long) list.
Keyword search can be improved by using thesauri to augment the set of query words with synonyms, but if the corpus is private, no comparable body of human evaluation is available, so the value function is necessarily very poor.
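The mechanics described above can be illustrated with a minimal Python sketch. It is not how Google or any production engine works. The function names, the sample corpus, and the thesaurus are all hypothetical, and the "value" used here (a simple count of matching words) is an assumption chosen to show how little a private corpus offers when no human evaluation is available.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_search(corpus, query, thesaurus=None):
    """Return (doc_id, value) pairs for documents matching every query word.

    corpus:    dict mapping a document id to its text.
    thesaurus: optional dict mapping a lowercase word to lowercase synonyms;
               a synonym counts as a match for the original query word.
    The value function is an assumed stand-in: the number of tokens in the
    document that match any accepted word.
    """
    thesaurus = thesaurus or {}
    # Each query word becomes a group of acceptable alternatives.
    groups = []
    for word in tokenize(query):
        alternatives = {word}
        alternatives.update(thesaurus.get(word, ()))
        groups.append(alternatives)
    accepted = set().union(*groups)

    results = []
    for doc_id, text in corpus.items():
        tokens = tokenize(text)
        present = set(tokens)
        # A document matches only if every group has at least one word present.
        if all(group & present for group in groups):
            value = sum(1 for token in tokens if token in accepted)
            results.append((doc_id, value))
    # Highest value first; ties broken by document id for a stable order.
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results

# Hypothetical three-document corpus and one-entry thesaurus.
corpus = {
    "a": "The car dealer sold a red car.",
    "b": "An automobile needs regular service.",
    "c": "Bicycles need little service.",
}
thesaurus = {"car": ["automobile"]}
```

Searching for "car" without the thesaurus finds only document "a". With the thesaurus, document "b" is found as well. In both cases the output is still a flat ranked list, and the ranking reflects nothing more than word counts.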