A. Field of Art
This application relates generally to the integration of documents into web pages, and in particular to systems and techniques for preserving a document's original nature and appearance when displaying the document within the pages of a website, and automatically sharing users' reading-related activities on that website across their external social networks.
B. Description of Related Art
Well before the advent of the Internet and the World Wide Web, software developers struggled to display documents on a computer monitor in the form intended by the authors of such documents. Initially, documents displayed on a computer screen were limited to text, with little or no choice of fonts, much less page layout and formatting of any kind. As word processors and other presentation programs evolved, fonts were integrated and other media were added (such as images, animation and even video), along with page layout features for presenting the various components of a document with a particular appearance desired by the document's author. Moreover, documents themselves have evolved well beyond traditional text, to include a variety of static and interactive media and page layout attributes, and to appear in many different forms, ranging from short emails or blog posts to book previews, news articles and creative writing samples, to long novels or reference books, and almost anything in between.
As the Web gained traction in the early to mid-1990s, an entirely new medium for presenting and distributing documents evolved, and a new type of document was created: the “web page” within a “website” containing a collection of related (and often linked) web pages. This new type of document, employing a document format known as “Hypertext Markup Language” (HTML), went through an evolution similar to that of traditional documents, initially being limited to text, and soon adding other media, including images, animation, and video, as well as hyperlinks, buttons and various other interactive objects and functionality.
Whether an author initially creates a document as a web page (typically displayed via a program known as a “web browser”) or as a more traditional page-oriented document (i.e., a document that is inherently divided into pages corresponding to static “printable” pages), the author intends for the document to be printed or displayed on a computer monitor with a particular desired appearance. A document's appearance includes a variety of presentation and page layout characteristics, such as the position, size and orientation of various component text, graphic and other static and interactive objects on each page of the document. It should be noted that the nature or functionality of these object types also is generally intended to be preserved, particularly when displayed on a computer monitor.
Of particular importance, however, are the various fonts associated with specific text, which themselves have various attributes, including font type, size, style, etc. Given that most documents consist primarily of text, it is not surprising that the particular fonts employed within a document play a significant role in the document's overall appearance.
Maintaining a document's appearance as it is distributed among different computers and platforms (including its appearance when printed or displayed within a web page) has long been a problem addressed by various software technologies. For example, if a document is created with a particular word processing program and transferred to another computer which does not have access to that program, then the document may not even be accessible on the destination computer, or may only be accessible via another program that displays the document with a modified appearance (e.g., with different fonts or other formatting attributes).
One of the leading solutions to this problem, even pre-dating the Web, is the “portable document format” (PDF) created by Adobe Systems, Inc. The PDF is designed to preserve fonts, as well as page layout and other object and document formatting characteristics, so that documents retain a virtually identical appearance when distributed across computers and platforms, displayed on a computer monitor or printed onto a physical medium, such as paper. For this reason, the PDF has become a widely adopted standard document format for printing and distributing documents across computers and platforms, regardless of which program the document's author used to create the document.
At this point, it is virtually impossible to distinguish the appearance of a document created as a web page (HTML) from that of one created as a more traditional page-oriented document via a word processing, presentation or page layout program. Both can contain various media types, from static text and graphics to animation, video and other interactive objects and functionality, such as hyperlinks, buttons and other controls. Moreover, both can be printed as static pages on physical paper, even though HTML documents are not generally divided into distinct pages unless and until they are printed. Finally, both can be converted into PDF documents so as to retain their intended appearance when printed or distributed among different computers and platforms.
Even PDF documents, however, have been difficult to integrate into web pages, while preserving their intended appearance, due to historical formatting limitations of the HTML format, which traditionally has allowed for the display of only a limited number of fonts. For example, Adobe and others have created programs that display existing PDF documents within a web browser's window. Yet, these programs cause the document to occupy the entire web browser window (along with the controls typically associated with Adobe's “Acrobat” program for displaying PDF documents). In other words, although the PDF document may appear within a web browser's window, it is not truly integrated into another web page; instead it becomes a distinct “web page” of its own. Thus, the author of a web page cannot easily integrate an existing PDF document as part of a web page that includes other web elements or objects, such as text, images, advertisements, etc.
Other approaches to this problem include programs that use Adobe “Flash” (or other programming languages/platforms) to display a PDF document in a distinct window within a web page, preserving the appearance of the PDF document while still allowing for other components of the web page to be displayed within the same web browser window. This approach has a number of disadvantages, however, in that the PDF document is not truly integrated into the web page; instead it remains in a separately controlled window within that web page. For example, a user must scroll through the PDF document separately from the rest of the web page, resulting in the significant inconvenience of having to switch between scrolling through the PDF document and scrolling through the web page. Moreover, the “zoom” level and controls of the PDF document are distinct from those of the web page, often forcing the user to zoom the PDF document to a desired level for reading, then switch to a “global” zoom level to read the other components of the web page (text, images, ads, etc.), and then reset the zoom level of the PDF document to continue reading (often while repeatedly readjusting the scrolling positions of the PDF document and the overall web page). In short, the PDF document becomes a separately controllable object that is subservient to the primary web browser controls for the overall web page window, resulting in significant inconvenience to the user.
Other approaches include PDF-to-HTML converters that enable the integration of the PDF document into a web page containing other component elements, but do so by sacrificing the original appearance of the document. For example, they convert the fonts embedded within the PDF document into the limited number of fonts typically made available to a computer's web browser. This approach defeats the primary objective of preserving the author's intended appearance of the PDF document.
Yet another approach involves converting the PDF document into an “image” which preserves its intended appearance while allowing for other components of the web page to be displayed within the same web browser window. To the extent this approach employs a separately scrollable window, it suffers from the same disadvantages as noted above. Even if the image of the entire document is truly integrated into a discrete area of the web page (as opposed to a separate scrollable “sub-window”), this approach, while preserving the appearance of text, does not preserve the nature of the text itself. In other words, the ability to search and recognize the text is sacrificed, which results in a significant loss of functionality. Not only are users unable to search through the PDF document, but other programs cannot search through and identify words and phrases within the PDF document, a critical feature for targeted advertising engines.
Google has adopted a variation of this approach with its “Google PDF viewer,” which is integrated into its “Gmail,” “Google Docs” and other programs. While each page of a PDF document is still converted into an “image” under this approach, users can search for individual words within the document by virtue of Google's “thin client” approach, which relies upon frequent interaction between the user's web browser and a remote web server.
For example, upon detecting that the user has attempted to select a word by clicking on the portion of the image containing that word, the user's web browser invokes the remote web server, which must parse the page of the PDF document to identify the “text” version of that word (e.g., the individual ASCII characters of the word), which can then be sent to the user's web browser, for example, to highlight the word or permit it to be copied and pasted elsewhere. Moreover, a user can search for words within the document by typing them into the user's web browser, which again must invoke the remote web server to conduct the search on the “text” within the PDF document, and then return the results to the user's web browser.
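The round trips described above can be illustrated with a minimal sketch. It is not Google's implementation; the names `WordBox`, `PageServer`, `word_at` and `search` are hypothetical, and the server is simulated in-process on the assumption that it holds, for each page, the words of the document together with the rectangular image regions they occupy. The browser has only the page image, so every selection or search is counted as one server interaction.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class WordBox:
    """A word and the rectangle it occupies in the page image (hypothetical layout)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


class PageServer:
    """Stands in for the remote web server that holds the parsed text."""

    def __init__(self, pages: Dict[int, List[WordBox]]) -> None:
        self._pages = pages
        self.round_trips = 0  # counts simulated browser-to-server interactions

    def word_at(self, page: int, x: float, y: float) -> Optional[str]:
        """Map a click on the page image to the text of the word under it."""
        self.round_trips += 1
        for box in self._pages.get(page, []):
            if box.contains(x, y):
                return box.text
        return None

    def search(self, query: str) -> List[Tuple[int, int]]:
        """Return (page, word index) pairs whose text matches the query."""
        self.round_trips += 1
        q = query.lower()
        return [
            (page, i)
            for page, boxes in sorted(self._pages.items())
            for i, box in enumerate(boxes)
            if box.text.lower() == q
        ]


if __name__ == "__main__":
    server = PageServer({
        1: [WordBox("Hello", 10, 10, 50, 22), WordBox("world", 55, 10, 95, 22)],
    })
    print(server.word_at(1, 60, 15))   # click lands inside the second word
    print(server.search("hello"))      # search is also resolved on the server
    print(server.round_trips)          # one interaction per user action
```

In this sketch the counter makes the cost visible: because the text lives only on the server, each click or query adds one interaction.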
Yet, this “thin client” approach suffers from a number of disadvantages that result from converting the PDF document into an “image” rather than directly into text (along with the fonts that determine the appearance of that text). For example, the “image” of each page of the document is significantly larger than the corresponding text on that page (even apart from other non-text elements on the page), resulting in an additional delay before each page of the document can be delivered to and displayed by the user's web browser.
Moreover, the frequent server interaction imposes further delays whenever the user interacts with the document, e.g., by scrolling to a new page or selecting or searching for words within the document. Even though the “image” of each page can be “zoomed” with the user's standard web browser controls, the words of the document become distorted when zoomed (as would any bitmapped image of text), which led Google to include a custom “zoom” control to avoid this distortion, but at the expense of further delay due to additional server interaction.
In short, there remains a need for the true integration of PDF and other documents into a web page that preserves the original nature and appearance of the documents (including in particular the original text fonts and the ability to search the text), allows for other components of the web page to coexist within the same web browser window, and enables users to read, interact with and control all components of the web page (including the document) via the controls built into standard web browsers.
In addition to reading a PDF or other document as an integral part of a web page, users may also desire to share their reading-related activities (e.g., viewing, annotating, rating, uploading and downloading documents) with friends or other members of their social networks. Yet, actively choosing to share an activity or behavior is burdensome. For this reason, “passive sharing” is more desirable (i.e., setting predefined sharing preferences, with future behavior resulting in the automatic sharing of such behavior in accordance with those preferences).
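The idea of passive sharing described above can be sketched as follows. This is an illustrative sketch only, under stated assumptions: the names `SharingPreferences` and `PassiveSharer`, the activity labels, and the network identifiers `network_a` and `network_b` are all hypothetical, and an in-memory `outbox` list stands in for whatever mechanism would actually post to an external social network.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class SharingPreferences:
    """Set once by the user; consulted on every later activity."""
    shared_activities: Set[str] = field(default_factory=set)
    networks: List[str] = field(default_factory=list)


@dataclass
class PassiveSharer:
    prefs: SharingPreferences
    # (network, message) pairs; a placeholder for real posting to a network
    outbox: List[Tuple[str, str]] = field(default_factory=list)

    def record(self, user: str, activity: str, document: str) -> int:
        """Record an activity and share it automatically if preferences allow.

        Returns the number of networks the activity was queued for.
        """
        if activity not in self.prefs.shared_activities:
            return 0  # the user did not opt in to sharing this activity type
        message = f'{user} {activity} "{document}"'
        for network in self.prefs.networks:
            self.outbox.append((network, message))
        return len(self.prefs.networks)


if __name__ == "__main__":
    prefs = SharingPreferences({"viewed", "rated"}, ["network_a", "network_b"])
    sharer = PassiveSharer(prefs)
    sharer.record("alice", "viewed", "Moby Dick")      # shared, no prompt needed
    sharer.record("alice", "downloaded", "Moby Dick")  # not shared
    for entry in sharer.outbox:
        print(entry)
```

The point of the design is that the user makes the sharing decision once, in the preferences, and never again at the moment of the activity itself.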
While passive sharing is becoming increasingly common, it has yet to be integrated into the activities or behavior within a website independent of the sharing process itself. For example, the sharing of activities and behavior on social networking sites, such as Facebook, Twitter and MySpace, is integral to the nature of these sites. Sharing messages, high scores of games played on the site and other activities is the very essence of participation in these social networks.
As these social networks have grown exponentially in popularity, even external behavior is now being “passively shared” among members of these social networks. For example, “Blippy” (a service offered via the website, www.blippy.com) enables users to share their “purchasing behavior” (i.e., purchases made anywhere via a credit card, registered at the “Blippy” website) with other members of their social networks. Yet, even Blippy is designed with sharing as an integral component. Users already purchase items with their credit cards, and they already share their activities and behavior on their social networks with other members. Blippy simply connects the two, enabling the passive sharing of this existing external behavior (shopping) with users' existing social networks (e.g., Facebook friends).
As the concept of “passive sharing” increases in popularity, there is a desire on the part of many users to enable their activities and behavior on a website (that are otherwise unrelated to their social networks) to be passively shared among their social networks (even beyond that website).