Although the origins of the Internet trace back to the late 1960s, the more recently developed World Wide Web (“Web”), together with the long-established Usenet, has revolutionized access by a worldwide audience to untold volumes of information stored in electronic form, including written, spoken (audio) and visual (imagery and video) information, in both archived and real-time formats. The Web provides information via interconnected Web pages that can be navigated through embedded hyperlinks. In short, the Web provides desktop access to a virtually unlimited library of information in almost every language.
The Web provides a readily accessible and widely available electronic communications channel to content providers of all types, from advertisers and search engines to individual end users. Web content is presented as a visual medium, which can be complemented by sound, tactile and other forms of non-visual feedback. Visual Web content includes text, images, graphics, video and similar information. The available space on every Web page is finite and both practical and physical limitations can restrict the information presented.
In particular, Web page text is often restricted. The space available for display on a Web page is limited and content providers attempt to make the best use of the space available by limiting the size and amount of text. For instance, Web-enabled cellular telephones and personal digital assistant devices have considerably less display space than a full-sized computer monitor. Similarly, advertisements are often subject to strict space limits, and advertisers are incentivized to work within the space restrictions for practical and budgetary reasons. Web content providers frequently charge on-line advertisers both for the space occupied by each advertisement and for the number of times an advertisement is displayed to and selected by end users. Likewise, Web search engines must balance finding quality search results against being able to present only those search results that will fit on a given Web page. Consequently, Web search engines often rank search results to ensure presentation of the best search results first. Finally, Web pages can include columnar and tabular presentation formats, which respectively include headings with text and individual cells. Headings and cells are inherently limited in the space available and, if necessary, text must be condensed or truncated to fit.
Substantively, quality Web content gets read, yet providing salient and responsive Web content can be difficult. For instance, advertisements are frequently presented alongside other competing Web-based advertisements and unrelated but distracting content. Relevance and succinctness thus become particularly important. Product names typically are presented prominently to attract the attention of a user, and each word appearing must be carefully selected to maximize user appeal yet conserve available space. Crafting a suitable product name can be particularly problematic for advertisers who have a significant body of advertisements, such as a Web retailer with a large product catalog. Such advertisers may prefer to generate Web-based advertisements through automated means, which draw advertising content from stored advertisement feeds.
Unfortunately, information contained in the stored advertisement feeds tends to be unstructured and of relatively poor quality. Generally, the advertisements are overly wordy and often contain only nouns, adjectives, conjunctions, and prepositions. Improper capitalization often occurs in the descriptions. Consequently, information extracted from the feeds may be unsuitable for mapping directly into standardized Web-based advertisements. Moreover, arbitrarily truncating the product names can result in grammatically improper or nonsensical wording. Other types of information feeds, such as news wires, present similar challenges with respect to editing and condensing the text into quality Web content usable within available Web page space.
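The truncation problem noted above can be illustrated with a minimal sketch. The product name, the character limits, and the function names `truncate_chars` and `truncate_words` are hypothetical and chosen only for illustration; they are not drawn from any particular advertisement feed or system. The sketch shows that cutting at a fixed character count can split a word, and that cutting at a word boundary can still leave a dangling preposition.

```python
def truncate_chars(text: str, limit: int) -> str:
    """Cut at exactly `limit` characters, ignoring word boundaries."""
    return text[:limit]


def truncate_words(text: str, limit: int) -> str:
    """Keep only whole words whose joined length stays within `limit`."""
    kept = []
    length = 0
    for word in text.split():
        # A single space separates each kept word from the previous one.
        extra = len(word) + (1 if kept else 0)
        if length + extra > limit:
            break
        kept.append(word)
        length += extra
    return " ".join(kept)


# Hypothetical product name, as it might appear in a feed.
name = "stainless steel espresso machine with milk frother"

mid_word = truncate_chars(name, 28)   # ends in the middle of "machine"
dangling = truncate_words(name, 37)   # ends on the preposition "with"
print(repr(mid_word))
print(repr(dangling))
```

Neither result is suitable as displayed text without further editing, which is the difficulty the preceding paragraph describes.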
Therefore, there is a need for an approach to providing text summarization of information provided as Web content. Preferably, such an approach would enable candidate text, including unstructured content, to fit within a limited space budget while maintaining quality and format.
There is a further need for an approach to building standardized advertisements in the form of Web-based advertising creatives based on information retrieved from advertising excerpts. Preferably, such an approach would identify and summarize information selected from advertising excerpts of stored advertisement feeds in a fashion that is succinct and relevant to user queries.