An information retrieval system, for example a search engine or a question answering system, retrieves the content a user requires according to a query input by that user. The query may include words and phrases that occur frequently but carry no actual meaning; such words and phrases are referred to as stop words. To improve retrieval efficiency and accuracy, the information retrieval system needs to identify the stop words in the query and remove them to obtain the keywords of the query. The information retrieval system then performs matching based on the acquired keywords to retrieve the content the user requires.
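For illustration only, the identify, remove, and match steps described above can be sketched as follows. This is a minimal sketch, not the method of this document: it assumes stop words are identified by exact matching against a small, fixed, hand-compiled list (`STOP_WORDS` is a hypothetical example), that queries are tokenized on word characters, and that matching is a naive keyword-overlap count. The names `extract_keywords` and `match_documents` are hypothetical.

```python
import re

# Hypothetical, hand-compiled stop word list (illustrative assumption only).
STOP_WORDS = frozenset({
    "a", "an", "the", "of", "is", "what", "how", "to", "in", "for", "me", "please",
})


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def extract_keywords(query, stop_words=STOP_WORDS):
    """Identify stop words by exact list matching and remove them from the query."""
    return [token for token in tokenize(query) if token not in stop_words]


def match_documents(keywords, documents):
    """Rank documents by how many distinct query keywords they contain.

    `documents` maps a document id to its text. Documents sharing no
    keyword with the query are dropped; ties are broken by document id.
    """
    keyword_set = set(keywords)
    scored = []
    for doc_id, text in documents.items():
        overlap = len(keyword_set & set(tokenize(text)))
        if overlap:
            scored.append((overlap, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


if __name__ == "__main__":
    docs = {
        "d1": "Weather forecast for Beijing",
        "d2": "History of Beijing",
        "d3": "Cooking rice",
    }
    keywords = extract_keywords("What is the weather forecast in Beijing")
    print(keywords)                         # ['weather', 'forecast', 'beijing']
    print(match_documents(keywords, docs))  # ['d1', 'd2']
```

Because every word in the list is removed regardless of context, a query such as "how to tie a tie" loses "how" and "to" even though they shape the user's intent; a fixed list has no way to tell when a frequent word is meaningful.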
As information retrieval systems become more widely available and more intelligent, more users search by entering queries in natural or semi-natural language. This imposes higher requirements on the stop word identification capability of information retrieval systems. In the prior art, stop word identification relies mainly on a stop word list compiled manually in advance by lexical experts. However, compiling a stop word list manually incurs high production costs. Moreover, identifying stop words in an input sentence merely by matching against the list cannot adapt to increasingly complex user search behavior.