Webpages may be identified and indexed based on the content they contain. However, a given webpage may contain both user-relevant content and user-irrelevant content, including content that is not rendered to a user accessing the webpage (e.g., backend content). Identifying and/or indexing a webpage based on the irrelevant or backend content may result in poor indexing and, subsequently, ineffective use of the identification or index. For example, a webpage containing the text “40 inch television” within an advertisement may be indexed as containing information about 40-inch televisions even when the actual content of the webpage is unrelated to televisions. Additionally, header and footer information of a webpage may be indistinguishable from the content of the webpage and may therefore contribute to inefficient identification and/or indexing of the webpage.