Information retrieval refers to the process and technology of organizing and storing information in a certain manner and finding relevant information according to the requirements of an information user. In a narrow sense, information retrieval refers only to the process of finding required information from an information set, and is equivalent to what is commonly called an information query. With the rapid development of the Internet, the amount of information on the Internet is increasing exponentially, and when facing such a huge amount of information resources, it is increasingly important for people to acquire the information they require rapidly and efficiently. To improve the quality and efficiency of information retrieval for a user, a powerful information retrieval tool, that is, a search engine, may be used. However, while the search engine brings great convenience to the user, it also exposes many defects as a search technology that uses a keyword as its basic index unit. In one aspect, no matter what keyword the user submits, an excessive number of results is returned, and the information actually required by the user accounts for only a small part of them, so the user has to spend much time manually filtering the results. In another aspect, because of synonyms and near-synonyms, many texts related to a search topic do not exactly match the keyword input by the user, so the search engine fails to find these texts. Classifying and retrieving information based on topics is an efficient way to solve the foregoing problems; it can largely resolve the heterogeneity and disorder of information on the Internet, thereby shrinking the search space, increasing the retrieval speed, and improving the query results.
In the conventional art, during the process of solving for a hyper-parameter of a hierarchical Latent Dirichlet Allocation (hLDA) model, for a given text set, a nested Chinese restaurant process (nCRP) prior corresponding to the model first needs to be given, and the hLDA model hyper-parameter is treated as a constant. A corresponding topic path is then acquired for each document through distributed Gibbs sampling, a corresponding topic is acquired for each word in a document, and finally the most approximate hLDA model hyper-parameter is calculated according to the topic-word and document-topic count matrices.
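For illustration only, the role of the nCRP prior with a hyper-parameter held constant can be sketched as follows. This is a minimal sketch and not the conventional solver itself: it draws a topic path for each document from the prior alone, without the Gibbs sampling step that conditions on the words. The names `crp_choose`, `sample_ncrp_paths`, and `gamma`, as well as the fixed tree depth, are assumptions introduced here and do not come from the original text.

```python
import random
from collections import defaultdict


def crp_choose(child_counts, gamma, rng):
    """One Chinese restaurant process draw at a single tree node.

    An existing child k is chosen with probability n_k / (n + gamma);
    a new child is chosen with probability gamma / (n + gamma).
    Returns the chosen child, or None to signal "create a new child".
    """
    total = sum(child_counts.values()) + gamma
    r = rng.random() * total
    acc = 0.0
    for child, n in child_counts.items():
        acc += n
        if r < acc:
            return child
    return None


def sample_ncrp_paths(num_docs, depth, gamma, seed=0):
    """Draw one root-to-leaf topic path per document from an nCRP prior.

    gamma is held constant for the whole run, mirroring the assumption
    described above. Nodes are tuples of child indices; the root is ().
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    rng = random.Random(seed)
    children = defaultdict(dict)  # node -> {child node: documents routed through it}
    paths = []
    for _ in range(num_docs):
        node = ()
        path = [node]
        for _level in range(1, depth):
            counts = children[node]
            child = crp_choose(counts, gamma, rng)
            if child is None:
                child = node + (len(counts),)  # open a new branch under this node
            counts[child] = counts.get(child, 0) + 1
            node = child
            path.append(node)
        paths.append(path)
    return paths


if __name__ == "__main__":
    for doc_id, path in enumerate(sample_ncrp_paths(num_docs=5, depth=3, gamma=1.0)):
        print(doc_id, path)
```

In this sketch, a larger `gamma` makes new branches more likely, so the topic tree grows wider; because the value is fixed in advance, the shape of the tree cannot adapt to the text set during solving.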
However, because the conventional art treats the hLDA model hyper-parameter as a constant, a maximum approximation cannot be reached during the solving process. As a result, the hLDA model hyper-parameter finally obtained through solving has low precision, and the solving speed is slow.