Information retrieval typically involves two parties. On the one hand, there are producers of the information to be stored by the information retrieval system. The producers either actively publish the information to the system or let the system select the information from the producer's source system (Internet search engines, for example, work that way). On the other hand, there are consumers of the information stored in the information retrieval system. Consumers want to locate information that satisfies their information need. Examples of published and stored information types include, but are not limited to, digitized documents (e.g. scanned and converted with optical character recognition), electronic message documents (e-mails), images, HTML pages, binary documents such as office documents, and text documents.
Information retrieval systems extract information from a source document and store the corresponding document representation for later retrieval. Consumers of the system formulate queries, which are formal statements of an information need. The information retrieval system evaluates each query and locates matching document representations, which match with varying degrees of relevance. The corresponding documents are returned to the consumer. The information retrieval system may also calculate the relevance of each document representation and sort the returned documents accordingly.
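The store-then-query cycle described above can be illustrated with a minimal sketch. It assumes a deliberately simple document representation (term counts) and a simple relevance measure (the summed counts of the query terms); real systems use richer representations and scoring. The names `TinyIndex`, `tokenize`, `add`, and `search` are hypothetical and chosen only for this illustration.

```python
from collections import Counter, defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Toy retrieval system (hypothetical): stores term counts per document."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term count}
        self.documents = {}                # doc_id -> original text

    def add(self, doc_id, text):
        # Extract a document representation (term counts) and store it.
        self.documents[doc_id] = text
        for term, count in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = count

    def search(self, query):
        # Assumed relevance measure: sum of the counts of each query term.
        scores = Counter()
        for term in tokenize(query):
            for doc_id, count in self.postings.get(term, {}).items():
                scores[doc_id] += count
        # Return (doc_id, score) pairs, most relevant first; ties by doc_id.
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


index = TinyIndex()
index.add("a", "quarterly report on email archive growth")
index.add("b", "email filtering rules for the email archive")
index.add("c", "holiday photos")
print(index.search("email archive"))  # prints [('b', 3), ('a', 2)]
```

Document "c" contains none of the query terms and is therefore not returned, while "b" ranks above "a" because the term "email" occurs in it twice.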
One reason for the broad adoption of information retrieval systems is the large amount of digital information available. Technologies such as the public Internet, electronic messaging systems (e-mail), social media networks, and mobile devices allow producers to publish information more easily and thus more frequently. All of this results in rapid growth in the volume of digital information. Many of today's digital information sources, such as electronic messaging systems and social media networks, emit information continuously. Such sources create a continuous stream of information for the consumers to process. Thus, on the consumer side, information overload is a widespread problem: there is too much information to be processed. Consumers need tools to navigate the available information efficiently and find the subset that satisfies their information need. To handle such digital information streams more efficiently, consumers commonly turn to an information filtering system.