An extremely large collection of data can be difficult to search or analyze. On the Web, for example, a large fraction of the data is unstructured, and its value is locked in the data itself. It is not enough to store a service provider's web page; for the information to be useful, it must be understood. A string of digits could be a model number, a bank account number, or a phone number, depending on context. For instance, in the context of a ski product, the string “Length: 170,175,180 cm” refers to three different ski lengths, not a single ski length of about 1,700 kilometers. An incorrect interpretation of the data may render the information useless.
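The ski-length example can be made concrete with a minimal Python sketch. It is only an illustration of context-dependent interpretation, not a method described in this document. The function names and the plausible ski-length range of 60 to 220 cm are assumptions introduced here.

```python
import re

# Hypothetical plausible range for one ski length, in centimeters (an assumption).
SKI_LENGTH_RANGE_CM = (60, 220)


def _digits_before_cm(text):
    """Return the run of digits and commas that precedes the unit 'cm'."""
    match = re.search(r"([\d,]+)\s*cm", text)
    if match is None:
        raise ValueError("no centimeter value found")
    return match.group(1)


def naive_length_cm(text):
    """Context-free reading: treat the commas as thousands separators."""
    return int(_digits_before_cm(text).replace(",", ""))


def ski_lengths_cm(text, valid_range=SKI_LENGTH_RANGE_CM):
    """Context-aware reading: prefer a comma-separated list of lengths
    when every item is a plausible ski length."""
    raw = _digits_before_cm(text)
    parts = [int(p) for p in raw.split(",") if p]
    low, high = valid_range
    if all(low <= p <= high for p in parts):
        return parts
    # Otherwise fall back to the single-number reading.
    return [int(raw.replace(",", ""))]


text = "Length: 170,175,180 cm"
print(naive_length_cm(text) / 100_000, "km")  # the single-number reading, in kilometers
print(ski_lengths_cm(text))                   # [170, 175, 180]
```

The same characters yield either one implausible length or three plausible ones. Only the product context, encoded here as a range check, selects the useful reading.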
Search queries raise the same problem. If a user enters the two words “mtor” and “stock” into an Internet search engine and the results largely consist of web pages about the drug target mTOR, the search engine has failed to recognize the search as a stock quote query. Likewise, if a user enters the two words “seattle” and “sushi” and the results largely consist of web pages about hotels in Seattle, the search engine has failed to recognize the search as a restaurant query. While Internet search engines often do a reasonable job for head queries and documents, accuracy falls off quickly in the tail because the search engines do not automatically understand the information.