The invention relates to apparatus and process for editing a document with multiple classes of elements, and particularly, to apparatus and process for searching and replacing items in a document with embedded or nested elements.
The Internet is becoming an increasingly popular medium for communicating and publishing widely accessible documents. A network of networks, the Internet transfers information using a common protocol which tells computers connected to the network how to locate and exchange files with one another. Documents communicated over the Internet generally conform to a Hyper Text Markup Language (HTML) that a World-Wide-Web (WWW or Web) browser can translate and display. Once posted on a Web server, these documents or compilations of pages can be retrieved and viewed by anyone who has access to the Internet.
Each document is essentially a collection of HTML codes or tags which provide the document with a general structure of a head and a body, as well as headings, format controls, forms, tables, and one or more element types. The head contains a title, while the body contains information actually displayed. The body can be made up of a number of elements such as texts, images, sound clips and tags for forming controls, lists, tables, frames, layers, and others. A sample HTML document which displays an image and a string “one two three” in an increasing font size for each word in the string is as follows:
<HTML>
<HEAD>
<TITLE>SAMPLE</TITLE>
</HEAD>
<BODY>
<IMG SRC="EWLogo.gif" WIDTH="72" HEIGHT="73" ALIGN="BOTTOM" NATURALSIZEFLAG="3"> one <FONT SIZE=+2>two</FONT> <FONT SIZE=+3>three</FONT>
</BODY>
</HTML>
Even for this relatively small document with one image element, three text elements and formatting elements, the HTML codes in the document are complicated. Not surprisingly, even though many people use the Internet daily, only a fraction can compose HTML documents without appropriate tools.
A basic familiarity with HTML codes is only one aspect in the process of creating HTML documents. Another issue relates to the process of editing such documents. Although a conventional text editor or word processor can be used to add HTML markups to the document, this method of composing and editing the HTML document is quite tedious, as the process does not allow a user to see the document as actually displayed by the browser. Without visual feedback, the process of composing and editing the HTML document can be error-prone and inefficient.
Moreover, when the HTML document contains elements other than the usual text and text formatting codes, the process of composing and editing the HTML document can be challenging. For example, if image elements are embedded in the document, the conventional text editor or word processor would reference each image using only its access path and file name. Consequently, the user has to be more careful and more exact in selecting the elements, as the user cannot visually verify that the correct image is being edited. Hence, the difficulty in generating the desired HTML document is increased when non-text elements are embedded in the document. The difficulty is particularly accentuated when elements such as table elements with embedded elements need to be specified.
Additionally, when search or replacement operations are performed on non-text elements, the formulation of the search queries can be daunting. Although each search query can designate the desired elements using access paths and file names, this approach is non-intuitive, tedious and error-prone. Moreover, in the event that the user wishes to specify elements having specific attributes, such as text case, text style, or element size, the search query can become quite unwieldy. In such events, if the conventional text editor or word processor is used, the user has to be intimately familiar with the HTML tags and must be entirely accurate in entering the HTML tags, which include element addresses, element attributes, and HTML-specific search terms, in the search query. Moreover, in the event that the user wishes to perform wild-card type searches on these elements, a search specification language would be needed to supplement conventional word processors in locating the diverse element types. Thus, the process of searching for elements embedded in the HTML document can be non-trivial, especially when a large number of non-text elements such as images, sound clips and animation sequences are dispersed throughout the HTML document or embedded within elements of the HTML document.
An apparatus and a method perform search operations on a document with nested elements of varying types. The apparatus finds in the document an element which is capable of containing nested elements of one or more varying types. The apparatus can also replace the found element with a substitute element. The substitute element is also capable of containing nested elements of one or more varying types.
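The replacement behavior described above can be illustrated with a minimal Python sketch. It is not taken from the invention: the `Element` class, its `kind` field, and the `replace_all` helper are hypothetical names chosen for illustration, and matching is reduced to a comparison of element kinds. The sketch only shows that a found element can be swapped for a substitute that itself contains nested elements.

```python
class Element:
    """Hypothetical element: a kind label plus any nested elements."""

    def __init__(self, kind, children=None):
        self.kind = kind
        self.children = list(children or [])

    def matches(self, target):
        # Simplifying assumption: two elements match when their kinds agree.
        return self.kind == target.kind


def replace_all(container, target, make_substitute):
    """Replace every nested element that matches target; return the count."""
    count = 0
    for i, child in enumerate(container.children):
        if child.matches(target):
            # The substitute may itself contain nested elements.
            container.children[i] = make_substitute()
            count += 1
        else:
            # Only unmatched elements are searched further, so a freshly
            # inserted substitute is never searched again.
            count += replace_all(child, target, make_substitute)
    return count


doc = Element("body", [Element("image"),
                       Element("table", [Element("image")])])
replaced = replace_all(doc, Element("image"),
                       lambda: Element("table", [Element("text")]))
```

In this sketch a fresh substitute is built for every replacement, so that two positions in the document never share one element object.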
The types or classes of elements include texts, images, animation, and sound clips, among others. For each element, a matching function and a find function are provided. The matching function associated with one element determines if a target element matches itself based on specified search criteria. The find function associated with one element searches for a match of a target element within itself. The find function of one element can in turn invoke find and matching functions associated with elements embedded within itself in carrying out the search. As such, a hierarchical composition of elements can be searched.
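One way to picture the pairing of a matching function and a find function is the following Python sketch. The class names `Element`, `Image`, and `Table`, and the method names `matches` and `find`, are assumptions made for illustration; the default matching rule shown here (same class only) is likewise a simplification of the "specified search criteria" mentioned above.

```python
class Element:
    """Hypothetical base class shared by every element type."""

    def __init__(self, children=None):
        self.children = list(children or [])

    def matches(self, target):
        # Default matching function: only the element class is compared.
        return type(self) is type(target)

    def find(self, target):
        # Default find function: search the elements nested inside this one.
        hits = []
        for child in self.children:
            if child.matches(target):
                hits.append(child)
            # Delegate to the child's own find function for deeper nesting.
            hits.extend(child.find(target))
        return hits


class Image(Element):
    pass


class Table(Element):
    pass


# An image, plus a table that holds an image and a further nested table.
document = Element([Image(), Table([Image(), Table([Image()])])])
found = document.find(Image())
```

Because each element calls the find function of the elements it contains, the search reaches every level of the hierarchy without the caller knowing how deep the nesting goes.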
One class of element includes image elements where each image element has a Uniform Resource Locator (URL) address. For this class, a default base class find function is used, as the image class does not contain any other elements. However, the image element has an overridden matching function which detects whether a target element belongs to the image class of elements, and if so, checks for matching URL addresses and other attributes such as image sizes.
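The image case might be sketched as follows in Python. The names `ImageElement`, `url`, `width`, and `height` are hypothetical, and treating an unspecified size on the target as "any size" is an assumption of this sketch rather than something the description states.

```python
class Element:
    """Hypothetical base class with default matching and find functions."""

    def matches(self, target):
        return type(self) is type(target)

    def find(self, target):
        # Default find function: a leaf element contains nothing to search.
        return []


class ImageElement(Element):
    def __init__(self, url, width=None, height=None):
        self.url = url
        self.width = width
        self.height = height

    def matches(self, target):
        # Overridden matching function: class check first, then the URL.
        if not isinstance(target, ImageElement):
            return False
        if self.url != target.url:
            return False
        # Assumption: a size left as None on the target means "any size".
        for attr in ("width", "height"):
            wanted = getattr(target, attr)
            if wanted is not None and wanted != getattr(self, attr):
                return False
        return True


logo = ImageElement("EWLogo.gif", width=72, height=73)
same_url = logo.matches(ImageElement("EWLogo.gif"))
wrong_size = logo.matches(ImageElement("EWLogo.gif", width=100))
wrong_url = logo.matches(ImageElement("Other.gif"))
```

The image class inherits `find` unchanged, mirroring the statement that the default base class find function suffices for an element with no nested content.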
In a second class relating to text elements, the text element has an overridden matching function which detects whether a target element belongs to the text element class, and if so, checks whether the length of the text element matches the length of the target element. The overridden matching function also checks whether specified characteristics of the element match the corresponding characteristics of the target element. The characteristics checked may include text, font, size, case, and style characteristics. The text element class also has an overridden find function which detects whether an embedded element matches the target element.
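A rough Python sketch of the text case follows. It rests on several assumptions that are not in the description: the names `TextElement`, `size`, `style`, and `ignore_case` are hypothetical, an attribute left as `None` on the target is treated as "do not care", and the find function is interpreted as sliding a window of the target's length across the text and applying the matching function to each span.

```python
class TextElement:
    """Hypothetical text run with optional size and style attributes."""

    def __init__(self, text, size=None, style=None):
        self.text = text
        self.size = size
        self.style = style

    def matches(self, target, ignore_case=False):
        # Class check, then the inexpensive length check, then the details.
        if not isinstance(target, TextElement):
            return False
        if len(self.text) != len(target.text):
            return False
        own, wanted_text = self.text, target.text
        if ignore_case:
            own, wanted_text = own.lower(), wanted_text.lower()
        if own != wanted_text:
            return False
        # Assumption: None on the target means the attribute is not checked.
        for attr in ("size", "style"):
            wanted = getattr(target, attr)
            if wanted is not None and wanted != getattr(self, attr):
                return False
        return True

    def find(self, target, ignore_case=False):
        """Return start offsets of spans inside this text that match target."""
        if not isinstance(target, TextElement) or not target.text:
            return []
        n = len(target.text)
        offsets = []
        for start in range(len(self.text) - n + 1):
            span = TextElement(self.text[start:start + n], self.size, self.style)
            if span.matches(target, ignore_case):
                offsets.append(start)
        return offsets


run = TextElement("one two three two", size=2)
plain = run.find(TextElement("two"))
any_case = run.find(TextElement("TWO"), ignore_case=True)
other_size = run.find(TextElement("two", size=3))
```

Checking the length before the characters keeps the matching function cheap for the common case in which the candidate span and the target obviously differ.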
Advantages of the invention include the following. The invention allows users to formulate search queries using an intuitive, easy to use process such as a drag and drop procedure, a copy and paste procedure, or a suitable composition procedure to place desired elements into a find dialog without any knowledge of HTML codes or tags. The invention thus supports a “What You See is What You Get” (WYSIWYG) HTML editor without requiring the user to learn the innards of HTML tags. Moreover, the invention provides a search and replace system that can handle documents with embedded or nested element types.