Volumes can be digitally stored and hosted by a volume server, enabling the volumes to be easily searched and retrieved for viewing by users. Volumes include, for example, digitized books and magazines. The volumes are searched based on a user query, and the volumes that best match the query are displayed to the user.
The volumes can be crawled and indexed for inclusion in search results. While a crawler can readily obtain the content of a volume, it is often difficult to determine semantic information for the volume. For example, it is difficult to determine whether particular content (e.g., a page of a volume stored as a graphical image) is a table of contents, an index, or a synopsis. Heuristics or machine learning techniques can be used to infer the semantic information. However, such techniques can introduce errors that reduce the accuracy of that information and, in turn, the quality of the search results provided to the user. Semantic information for the volume is also useful for other purposes, such as properly displaying the volume to users.
Accordingly, there is a need in the art for a way to better determine the semantic information of a digitally hosted volume for improved searching and display of the volume.