1. Field of the Invention
The present invention relates to systems for browsing documents and in particular to methods and systems for using a web browser to quickly search large collections of documents such as arbitrary text documents.
2. Discussion of Related Art
It is common to use a computer to assist a user in browsing through large collections of documents. For example, patent attorneys and patent examiners frequently review large patent documents or collections of related patent or legal documents. Similarly, computer programmers frequently browse large files of computer source language programs or collections of related source language programs. Computers are applied in such situations to improve, in particular, the speed of searching for symbols or keywords in the collection of documents. Manually searching large collections of documents can be extremely cumbersome and unproductive.
Text editors or word processors on computer systems are known to allow such browsing by simple sequential paging or scrolling through the documents or by search capabilities to locate particular words or phrases. However, such known techniques typically do not use indexed searching techniques to locate desired search terms in the document(s). Indexed searches are those which use an index to rapidly locate occurrences of a particular symbol or keyword in the text. Known text editors and word processors instead most commonly use simple linear search techniques, which examine the text sequentially for each search. Such simple, non-indexed search techniques cannot provide adequate performance when scaled up to very large collections of documents.
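By way of illustration only, the distinction between a linear search and an indexed search may be sketched as follows. The function names `build_index`, `indexed_search`, and `linear_search`, and the sample documents, are hypothetical and are not taken from any particular editor or browser; the index shown is a simple inverted index, which is one possible indexing technique among many.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to (document name, line number) pairs, in document order."""
    index = defaultdict(list)
    for name, text in documents.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            # set() avoids recording the same line twice for a repeated word
            for word in set(re.findall(r"\w+", line.lower())):
                index[word].append((name, line_no))
    return index


def indexed_search(index, word):
    """Look the word up directly; cost does not depend on the total text size."""
    return index.get(word.lower(), [])


def linear_search(documents, word):
    """Scan every line of every document; cost grows with the total text size."""
    hits = []
    for name, text in documents.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            if word.lower() in re.findall(r"\w+", line.lower()):
                hits.append((name, line_no))
    return hits


# Hypothetical two-file collection used only for this sketch.
docs = {
    "a.c": "int main(void)\n{\n    return helper();\n}\n",
    "b.c": "int helper(void)\n{\n    return 0;\n}\n",
}
index = build_index(docs)
```

Both searches return the same occurrences for a given keyword. The index is built once, after which each lookup avoids rescanning the text.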
For example, a team of programmers may need to rapidly search for related terms or phrases in the collection of source code files which implement an operating system. One such operating system, by way of example, comprises over 14,000 directories including 70,000 files totaling over 40,000,000 lines of source code. Simple, non-indexed search techniques are inadequate for such large collections of files.
To aid in browsing applications for computer programmers, source code browser programs are often included in program development environments (e.g., in computer aided software engineering (CASE) toolsets). Source code browser programs are usually tightly coupled to the underlying program development package and therefore are only operable in conjunction with the corresponding tools. Consequently, source code browsers do not in general provide browsing service for arbitrary text documents outside the context of the program development tools. Furthermore, they are often constrained by the underlying databases which control the operation of the program development toolset. The databases which contain design information regarding a software development "project" often cannot handle such large collections of files as noted above. Lastly, different source code browser programs each provide a unique user interface, potentially forcing a user to learn a proprietary user interface in order to scan collections of documents.
It is a particular problem to display a large text document and quickly jump to a region of interest as represented, for example, by a line number. Present solutions for text browsing access files in a generally sequential manner from first line through last line. To display a region of interest that is not near the start of the file of text requires that the browser sequence through other lines of text at the start of the file to locate the region of interest to the requesting user. Requiring the text browser to sequence through all lines of text in the file before displaying the user's requested line slows the perceived responsiveness of the system.
It can be seen from the above discussion that a need exists for a text search capability that is efficient at searches of large collections of documents and in particular rapidly displays the user's region of interest in the text files.
The present invention solves the above and other problems, thereby advancing the state of the useful arts, by providing a system and associated methods for using a text browsing (viewing) client/server system to improve perceived performance of the system. More specifically, the present invention provides a client process that requests text files from a server process and presents the received text files in a manner intended to improve performance as perceived by the user. In a first preferred embodiment, the client process receives requested text from the server process in blocks (also referred to herein as "chunks") and parses the received chunks of text to identify line numbers of text received. The received, parsed text is stored in local cache memory associated with the client process and indexed to rapidly locate desired line numbers. In a second preferred embodiment, the server process is directed to return chunks of the text file in other than sequential order. In particular, the client process provides the server process with one or more desired line numbers. The chunk or chunks containing the desired line numbers are returned first to the client process, followed by the remaining chunks of the text file. As used herein, the desired line numbers are referred to as "hot lines" and the chunks containing the hot lines are referred to as "hot chunks." Other lines and chunks of the text file are referred to herein as "normal."
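By way of illustration only, the ordering of hot chunks ahead of normal chunks may be sketched as follows. The sketch assumes that chunks hold a fixed number of whole lines and that each chunk is tagged with the number of its first line; these assumptions, and the names `split_into_chunks` and `order_chunks`, are hypothetical and are not prescribed by the description above.

```python
def split_into_chunks(lines, lines_per_chunk):
    """Return (first_line_number, chunk_lines) tuples; line numbers are 1-based."""
    return [
        (start + 1, lines[start:start + lines_per_chunk])
        for start in range(0, len(lines), lines_per_chunk)
    ]


def order_chunks(chunks, hot_lines):
    """Return hot chunks first, then normal chunks, keeping file order within each group."""
    def is_hot(chunk):
        first, body = chunk
        # A chunk is hot if any requested line number falls inside its range.
        return any(first <= n < first + len(body) for n in hot_lines)

    hot = [c for c in chunks if is_hot(c)]
    normal = [c for c in chunks if not is_hot(c)]
    return hot + normal


# Hypothetical 10-line file split into chunks of 3 lines (starting at lines 1, 4, 7, 10).
lines = [f"line {n}" for n in range(1, 11)]
chunks = split_into_chunks(lines, 3)
ordered = order_chunks(chunks, hot_lines=[8])
first_lines = [first for first, _ in ordered]
```

In this sketch the chunk beginning at line 7, which contains hot line 8, is returned ahead of the chunks beginning at lines 1, 4, and 10.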
A first aspect of the invention provides for a system for displaying large text files comprising: a display for presenting text to a user; a client process responsive to user requests to display selected portions of a text file on the display; and a cache memory associated with the client process for storing data representative of the text file, wherein the client process is operable to parse the text file to identify line numbers associated with the text file and wherein the client process is further operable to store indices in the cache memory identifying the line numbers and corresponding portions of the text file and wherein the client process is further operable to display the selected portions of the text file in accordance with the line numbers.
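By way of illustration only, a cache memory that stores indices identifying line numbers may be sketched as follows. The class name `LineCache` and its methods are hypothetical. The sketch assumes that text arrives in sequential order, that lines are terminated by a newline character, and that the index records the character offset at which each line begins; chunk boundaries need not coincide with line boundaries.

```python
class LineCache:
    """Holds received text plus the offset at which each line begins."""

    def __init__(self):
        self._text = ""
        self._line_starts = [0]  # line 1 begins at offset 0

    def append(self, chunk):
        """Store a chunk and record where each new line starts."""
        base = len(self._text)
        self._text += chunk  # simple concatenation; adequate for a sketch
        pos = chunk.find("\n")
        while pos != -1:
            self._line_starts.append(base + pos + 1)
            pos = chunk.find("\n", pos + 1)

    def get_lines(self, first, count):
        """Return up to `count` lines starting at 1-based line number `first`."""
        if first < 1 or first > len(self._line_starts):
            return []
        start = self._line_starts[first - 1]
        end_index = first - 1 + count
        if end_index < len(self._line_starts):
            end = self._line_starts[end_index]
        else:
            end = len(self._text)
        return self._text[start:end].splitlines()


cache = LineCache()
cache.append("alpha\nbr")          # chunk ends in the middle of line 2
cache.append("avo\ncharlie\n")
```

With the offsets recorded, displaying a requested line requires only an index lookup and a slice of the cached text, rather than a scan from the first line of the file.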
In another aspect of the invention, the system further comprises a server process responsive to requests from the client process to retrieve an identified text file and to return chunks of text from the identified text file to the client process. The client process includes: a graphical user interface thread for interacting with a user of the system and for displaying the selected portions of the text file on the display; a parser thread operable substantially in parallel with the graphical user interface thread for parsing the text file to identify the line numbers; and a fetcher thread operable substantially in parallel with the parser thread for receiving the chunks of text from the server process and for storing the chunks of text in the cache memory. Further, requests from the client process to the server process include at least one line number of interest to a user of the system, and the server process is operable to return chunks of text that include the at least one line number before other chunks of text that do not include the at least one line number.
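By way of illustration only, the cooperation of the fetcher thread and the parser thread may be sketched as follows. The sketch substitutes an in-memory list for the server process and uses the main thread in place of the graphical user interface thread. It further assumes that each chunk is tagged with its first line number and ends on a line boundary. The names `fetcher`, `parser`, `server_chunks`, and `line_index` are hypothetical.

```python
import queue
import threading


def fetcher(server_chunks, cache, pending):
    """Receive chunks, store them in the cache, and notify the parser."""
    for chunk in server_chunks:
        cache.append(chunk)
        pending.put(len(cache) - 1)  # position of the newly cached chunk
    pending.put(None)                # sentinel: no more chunks


def parser(cache, pending, line_index):
    """Parse each cached chunk as it arrives and index its lines by number."""
    while True:
        position = pending.get()
        if position is None:
            break
        first_line, text = cache[position]
        for offset, line in enumerate(text.splitlines()):
            line_index[first_line + offset] = line


# Simulated server response: the hot chunk (lines 4-6) arrives before lines 1-3.
server_chunks = [(4, "d\ne\nf\n"), (1, "a\nb\nc\n")]
cache, line_index = [], {}
pending = queue.Queue()

threads = [
    threading.Thread(target=fetcher, args=(server_chunks, cache, pending)),
    threading.Thread(target=parser, args=(cache, pending, line_index)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # the main thread stands in for the graphical user interface thread
```

The queue lets the parser begin indexing the first (hot) chunk while the fetcher is still receiving later chunks, so that the requested lines may be available for display before the whole file has arrived.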