The present invention relates to a method and system for converting digital publication files into digital data, and the use of that data to generate a display on a computer system. Aspects of the invention relate to an information display system and more particularly to an information display system which provides for the simultaneous display of a graphical representation of a printed publication, or part of a publication, and text data appearing in the printed publication.
In today's society, particularly in the business community, it is a necessity to receive published information as quickly as possible. This is especially important for financial information. Thus, the desire to provide such information in electronic form has expanded rapidly in recent years.
In the United Kingdom, there are a number of suppliers of news information delivered electronically for on-screen or other media consumption. These can be segmented into a number of categories:
(a) an electronic text feed of general and specific news items, and data where the only structure consists of headers detailing news category orders (e.g. Press Association);
(b) an electronic text feed of news items addressing specific market sectors (e.g. Extel Finance);
(c) an electronic text feed (not in real time) providing the textual information contained in previously published material. This information is provided primarily for archival and search activity (e.g. FT Profile).
The common component of these information provisions is their emphasis on editorial quantity, leaving the editorial and sub-editorial functions to the consumer. Essentially they are providers of a raw material to be used by the customer base as one of their ingredients for the production of their products, or as data for customers to filter to generate information for their own internal or external use. Thus, with this vast quantity of raw data provision with no relative importance attached to each of the individual news items, the user is forced to sift through irrelevant and/or unimportant information to discover their requirement. Additionally, the feeds are, in general, specifically objective rather than subjective.
A further disadvantage of this method of supplying information is that only text information can be provided. Although this text may be searchable or processable, as opposed to a graphical image or microfiche of the publication, it contains less information than the publication. In particular, editorial information is lost. The foregoing problems of prior art information systems manifest the need for improvement. Specifically, there is a need for an information display system that can make use of information provided in publications such as newspapers and magazines in real time, thereby benefiting from the editorial experience of the publishers. Furthermore, since a great deal of information can be obtained from the editorial layout of the publication, the foregoing need can be better met by the provision of a simultaneous image of the actual publication together with the actual text in a clear and legible form.
The present invention provides a screen-based information display system which utilizes both the graphical images of pages of a printed publication and its text data. The present invention allows for the simultaneous display of an image of the pages of a publication and text data. It is not sufficient merely to provide a readable image of the pages of the publication, as this only provides a microfiche representation. Whereas this allows the user to read the text, it does so at a representational level which does not give the overview perspective. That the user "cannot see the wood for the trees" is a realistic analogy. The purpose of providing a simultaneous image of the publication is to allow the user to interpret the editorial importance that has been attached to articles, thereby allowing the user to benefit from the editorial experience of the publishers, as well as giving immediate access to the edited text.
The present invention allows a user to select a passage of text composing an article or story on the displayed page of the publication, whereupon the system of the present invention will simultaneously display the text of the passage adjacent to the image of the full page of the publication. This allows the user to read the article clearly if desired. In view of the small size of the image of the page of the publication, the text in the image is not clear, and it is therefore highly advantageous to provide a clear copy of the text separately. The provision of the text separately also allows for further advantages of the present invention, including allowing identifier words such as company names to be clearly seen, e.g. highlighted. The present invention provides for further information on the identifier word, e.g. company information, to be displayed by the selection of the identifier word. The further information, e.g. company reports, can then be displayed simultaneously with the image of the page of the publication.
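By way of illustration only, the marking of identifier words in the separately displayed text might be sketched as follows. The function name `highlight_identifiers`, the bracket markers, and the company names are hypothetical and are not taken from the specification; an actual display would presumably apply visual highlighting rather than brackets.

```python
import re


def highlight_identifiers(text, identifiers, open_mark="[", close_mark="]"):
    """Return text with each whole-word identifier wrapped in markers.

    The markers stand in for whatever visual highlighting the display uses.
    """
    if not identifiers:
        return text
    # Try longer identifiers first so "Acme Holdings" wins over "Acme".
    alternatives = sorted(identifiers, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(a) for a in alternatives) + r")\b"
    )
    return pattern.sub(lambda m: open_mark + m.group(1) + close_mark, text)


# Hypothetical story text and identifier list.
marked = highlight_identifiers(
    "Acme and Acme Holdings rose.", ["Acme", "Acme Holdings"]
)
print(marked)
```

Each marked span could then serve as the selectable element through which the further information on the identifier word is requested.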
A further feature of the present invention is that a list of contents of the pages of the publication can be displayed, wherein the list of contents for each page is displayed such that the passages of text (articles or stories) are listed in the order of importance which can be attached to them by the way in which they are formatted on the page of the publication by the editors. Thus, the list of contents for the publication provided by the present invention provides an easy means for the important passages in the publication to be identified by a user. When a particular passage is identified which the user wishes to read, this can be selected and the text displayed along with the image of the page of the publication from which the text is taken.
The present invention is particularly applicable to business and financial publications such as newspapers. For example, in the United Kingdom, the London Evening Standard is published five times during a day with the financial information in each publication being updated. Electronic data of each publication can be obtained rapidly from the publisher, thereby allowing the information display system of the present invention to be updated rapidly in response to each new edition. The present invention thus removes the need for financial institutions to have to purchase multiple hard copies of the newspaper. Instead, the information can be provided electronically over a network to as many users in the institution as is required. Furthermore, the information provided is in a far more user-friendly form than the original hard copy and reaches the user rapidly, even where the publication is printed some distance from the desired user, e.g. overseas.
According to a first embodiment there is provided a computerized method of generating an information display from an input of publication files containing text, graphics, and other data viewable as page images of a publication having stories (text passages) and graphics images appearing therein, comprising the steps of: extracting text data from the publication files corresponding to stories appearing in the page images of the publication, and maintaining them as text data files; processing page images from the publication files and maintaining them as page image files; mapping story areas for respective stories appearing in the page images and indexing each story area to a text data file corresponding to the text passage in the story area, and maintaining the mapped story areas as image map files; and generating a display on a computer system of page images using the page image files, and linking the stories in the story areas of the displayed page images to the corresponding text data using the text data files and image map files.
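The linking of story areas to text data files described above can be pictured with a minimal sketch. It assumes, purely for illustration, that each story area is a rectangle in page-image pixel coordinates and that an image map is a list of such rectangles; the class `StoryArea`, the function `story_at`, and the file names are hypothetical and do not come from the specification.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StoryArea:
    """One mapped story area on a page image (assumed rectangular)."""
    left: int
    top: int
    right: int
    bottom: int
    text_file: str  # hypothetical name of the story's text data file

    def contains(self, x: int, y: int) -> bool:
        # Right and bottom edges are treated as exclusive.
        return self.left <= x < self.right and self.top <= y < self.bottom


def story_at(image_map: list[StoryArea], x: int, y: int) -> Optional[StoryArea]:
    """Return the story area containing point (x, y), or None if there is none."""
    for area in image_map:
        if area.contains(x, y):
            return area
    return None


# Hypothetical image map for a single page image of 600 x 500 pixels.
PAGE_1_MAP = [
    StoryArea(0, 0, 400, 300, "p01_s01.txt"),
    StoryArea(400, 0, 600, 300, "p01_s02.txt"),
    StoryArea(0, 300, 600, 500, "p01_s03.txt"),
]

selected = story_at(PAGE_1_MAP, 450, 100)
print(selected.text_file if selected else "no story at this point")
```

A selection on the displayed page image would be resolved to a story area in this way, and the indexed text data file would then be loaded for display beside the page image.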
According to a second embodiment there is provided a computerized method of generating an information display from an input of publication files containing text, graphics, and other data viewable as page images of a publication having stories (text passages) and graphics images appearing therein, comprising the steps of: extracting text data from the publication files corresponding to stories appearing in the page images of the publication, and maintaining them as text data files; parsing the text data to find predetermined keywords appearing therein, indexing each keyword to a page number and a story number for the story corresponding to the text passage in which the keyword is found, and maintaining the indexed keywords on a keyword list; processing page images from the publication files and maintaining them as page image files; generating a display on a computer system of the keyword list, and displaying the page image containing the story in which a selected keyword appears when the keyword is selected from the keyword list.
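The keyword parsing and indexing step can be illustrated with a short sketch. It assumes that story text is held in a mapping keyed by (page number, story number) and that keywords are matched as whole words without regard to case; the function `build_keyword_index`, the data layout, and the sample names are hypothetical choices made for illustration.

```python
import re
from collections import defaultdict


def build_keyword_index(stories, keywords):
    """Map each keyword to the (page, story) pairs whose text contains it.

    stories:  dict of (page_number, story_number) -> story text
    keywords: predetermined keywords to look for
    """
    index = defaultdict(list)
    # Visit stories in page order, then story order, for a stable list.
    for (page, story), text in sorted(stories.items()):
        for keyword in keywords:
            pattern = r"\b" + re.escape(keyword) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                index[keyword].append((page, story))
    return dict(index)


# Hypothetical extracted story text.
STORIES = {
    (1, 1): "Acme posts higher profits.",
    (1, 2): "Shares in Northbank rose after Acme raised its forecast.",
    (3, 1): "Northbank opens a new branch.",
}

KEYWORD_LIST = build_keyword_index(STORIES, ["Acme", "Northbank"])
print(KEYWORD_LIST)
```

Selecting a keyword from such a list yields the page numbers whose page images are to be displayed, together with the story numbers of the stories in which the keyword appears.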
According to a third embodiment there is provided a computerized method of generating an information display from an input of publication files containing text, graphics, and other data viewable as page images of a publication having stories (text passages) and graphics images appearing therein, comprising the steps of: extracting text data from the publication files corresponding to stories appearing in the page images of the publication, and maintaining them as text data files; processing page images from the publication files and maintaining them as page image files; assigning to each story appearing on a page of the publication a page number on which the story appears, and a story number ranking corresponding to the relative importance of the story to other stories on the page; indexing the text data files to the page numbers and story number rankings for the corresponding stories appearing in the page images of the publication; generating a display on a computer system of a page image using the page image files, and a side-by-side display of a list of story titles for the stories appearing on the displayed page ranked in order of their assigned story number rankings.
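The ranked list of story titles can be sketched as follows. The sketch assumes that a story number ranking of 1 denotes the most important story on a page; the class `Story`, the function `contents_for_page`, and the sample titles and file names are hypothetical and serve only to illustrate the ordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Story:
    page: int       # page number on which the story appears
    rank: int       # story number ranking; 1 is assumed to be most important
    title: str
    text_file: str  # hypothetical name of the indexed text data file


def contents_for_page(stories, page):
    """Return the titles of the stories on a page, most important first."""
    on_page = [s for s in stories if s.page == page]
    return [s.title for s in sorted(on_page, key=lambda s: s.rank)]


# Hypothetical stories, deliberately listed out of rank order.
STORIES = [
    Story(1, 2, "Retail sales steady", "p01_s02.txt"),
    Story(2, 1, "Bond yields ease", "p02_s01.txt"),
    Story(1, 1, "Bank raises rates", "p01_s01.txt"),
    Story(1, 3, "Market diary", "p01_s03.txt"),
]

print(contents_for_page(STORIES, 1))
```

The resulting list would be shown beside the page image, and the `text_file` field indicates how a selected title could be resolved to the corresponding text data file.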