1. Field of the Invention
The present invention relates to computer networks and more particularly to a method and apparatus for populating a form with data.
2. Background Art
Computer users can shop for merchandise using the Internet. A number of companies maintain web sites where customers can purchase books, records, videos, and many other products. Other web sites provide subscription services that provide the user with access to a service or product. Typically, users make purchases or sign up for services by completing forms. A problem with current Internet schemes is that each field on a form must be filled out separately and manually. This is a time-consuming and often frustrating process for the user.
Current systems do not provide a way for users to quickly complete a form (referred to as “populating” a form) with data. For example, when a user wishes to purchase a product online, the user typically supplies a credit card number, a shipping address, and any other personal information that may be needed to complete the transaction. The user supplies this information using an input device such as a keyboard to manually enter data into the fields of a form. However, after buying goods from one vendor, for example, the user may wish to purchase goods from a second vendor. The user must manually complete the second vendor's form even though the second vendor may require the purchaser to provide the same kind of information the user provided to the first vendor. This presents a problem to the user because it forces the user to manually enter the same information into different vendors' forms multiple times.
Another problem presented by the prior art is that users do not have a way to securely store sensitive information and then later use that information to populate a form. Current systems do not provide a way to prevent unauthorized users from obtaining access to sensitive information stored on a user's computer. Instead such systems leave data that is entered into a form readily accessible. For example, data that is entered into a form may be stored locally using a technology called cookies. A cookie is a local representation of information related to a particular web page that was previously visited by the user. Cookies are not encrypted and may be read by anyone who can obtain them. The term “anyone” also includes any computer program that knows where the cookies are stored. This is problematic because it unnecessarily exposes sensitive information to unauthorized users. Therefore, there is a need for a method and apparatus for populating forms with data where that data is accessible only to authenticated users.
The problems associated with form population may be better understood by the following discussion of the Internet/World Wide Web, web page creation, embedded forms, cookies, and web browser technology.
The Internet/World Wide Web
A web browser is a type of computer program that provides users with a mechanism for accessing the World Wide Web (WWW). The WWW is a segment of the Internet comprised of numerous web clients and web servers that communicate with one another using a standard set of protocols. A web server is a computer configured to provide web pages to the web client upon request. A web client typically utilizes the web browser application to request web pages from the web server.
The Internet is a global computer network comprised of an amalgamation of interconnected networks that are capable of readily communicating with one another using a standardized set of protocols. Protocols provide a uniform set of communication parameters that enable computers to effectively transmit and receive data. Most computer networks, including the Internet, utilize several different layers of protocols. Each layer provides the network with different functionality.
The WWW is a segment of the Internet that utilizes an application layer protocol called the HyperText Transfer Protocol (HTTP) to disseminate and to obtain information from users. HTTP is a request/response protocol used with distributed, collaborative, hypermedia information systems. In operation, HTTP enables one computer to communicate with another. For example, referring now to FIG. 1, web client 100 can use HTTP to communicate with web server 150 via network 125. In this scenario the web server acts as a repository for files 160 and is capable of processing the web client's requests for such files. The files 160 stored on the web server may contain any type of data. For example, the files may contain data used to construct a form, image data, text data, or any other type of data.
HTTP has communication methods that allow web client 100 to request or send data to web server 150. The web client 100 may use web browser 110 to initiate request 115 and receive one or more files from file repository 160. Typically, web browser 110 sends a request 115 for at least one file to web server 150 and the web server forwards the requested file to web client 100 in response 120. The connection is then terminated between web client 100 and web server 150. Web client 100 then uses web browser 110 to display the requested file. A complete exchange therefore consists of establishing a connection between the web client and the web server using network 125, sending request 115, issuing a response 120 to the request 115, and terminating the connection.
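The request/response exchange described above can be sketched in a few lines of Python. This is a minimal illustration and not a complete HTTP implementation. The function names build_request and parse_response, the host www.example.com, and the sample response text are assumptions made for the example and do not appear in FIG. 1.

```python
def build_request(host: str, path: str) -> bytes:
    """Format a minimal HTTP/1.0 GET request for one file."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "",  # the blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes):
    """Split a raw HTTP response into (status code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status_code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


# Hypothetical response, as a web server might return it for a small HTML file.
sample = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<HTML><BODY>Hello</BODY></HTML>"
)
status, headers, body = parse_response(sample)
print(status, headers["content-type"], len(body))
```

Nothing in either function retains information between calls, which mirrors the way each request is handled on its own.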
Web server 150 does not utilize state information about the request once the connection is terminated. HTTP is, therefore, a stateless application protocol. That is, a web client can send several requests to a web server, but each individual request is treated independently of any other request. In some instances, the web server maintains a record of such requests, but the web server does not use that information to process later requests. Thus, for example, if a form is completed by the user and submitted to the web server for processing, the web server may maintain a record of the data entered into the form, but that record will not be used to later influence another request from the same client. In other words, web server 150 may record a request in a log file, but it does not later read from the log to determine how to respond to another request from the same client.
Once a file is sent from web server 150 to web client 100 it becomes ready for display. The web browser 110 of web client 100 is typically used to format and display files. Web browser 110 allows the user to request and view a file without having to learn a complicated command syntax. Examples of several widely used web browsers include Netscape Navigator, Internet Explorer, and Opera. Some web browsers can display several different types of files. For example, files written using the HyperText Markup Language (HTML), the JavaScript programming language, the ActiveX programming language, or the Portable Document Format (PDF) may be displayed using a web browser. It is also possible to display various other types of files using languages such as Standard Generalized Markup Language (SGML) or eXtensible Markup Language (XML).
Creating a Web Page
A form, which provides one or more places for a user to enter data, can be embedded inside of a web page. A web page may be created using a variety of different data formats and/or programming languages. Most web pages, and as a result most forms, are created using the HyperText Markup Language (HTML). The techniques used to create a web page will now be discussed in further detail.
HTML is a language that may be used to specify the contents of a web page (e.g. web page 220). An HTML description is typically comprised of a set of markup symbols which are described in more detail below. HTML file 250 or any type of data file that contains the markup symbols for web page 220 may be sent to web browser 210. Web browser 210 executing at web client 200 parses the markup symbols in HTML file 250 and produces web page 220, which is then displayed, based on the information in HTML file 250. Web page 220 may contain text, pictures, or forms comprised of embedded text fields, checkboxes, or other types of data that are to be displayed on the web client using web browser 210. Consequently, HTML document 250 defines the web page 220 that is rendered by web browser 210. For example, the following set of markup symbols directs web browser 210 to display a title, a heading, and an image called “image.jpg”:
<HTML>
<HEAD>
<TITLE>This is a document title</TITLE>
</HEAD>
<BODY>
<H1>This text uses heading level one</H1>
<IMG SRC="http://www.idealab.com/image.jpg">
</BODY>
</HTML>
In the above example, markup symbols (e.g. “<” and “>”) indicate where each HTML command (e.g. TITLE) begins and ends. An HTML command, which is typically surrounded by markup symbols, provides the web browser with instructions to execute. The “<” symbol indicates the start of an HTML command and the “</” symbol indicates the end of an HTML command. Each start or end command has a corresponding “>” to indicate the close of that particular command. Information associated with the HTML command may be contained between the HTML command's start and end symbols. An HTML command is used by the web browser 210 to determine how to process the block of information enclosed by the start and end commands.
In the above example, “<TITLE>” and “</TITLE>” are examples of HTML commands surrounded by markup symbols. The “<TITLE>” HTML command directs web browser 210 to place the text “This is a document title” in the title bar of web browser 210.
Some HTML commands have attribute names associated with the command. For example, HTML command “<IMG>” directs web browser 210 to display an image. A “SRC=” attribute identifies the location and name of the image to be displayed. In the above example, the statement “<IMG SRC="http://www.idealab.com/image.jpg">” tells the web browser to display an image named “image.jpg” that can be obtained from the web server located at “http://www.idealab.com”.
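The way a program might parse these markup symbols can be illustrated with Python's standard html.parser module. The sketch below is not the parsing logic of any particular web browser. The class name PageSummary and its attributes are hypothetical names chosen for the example.

```python
from html.parser import HTMLParser

PAGE = """<HTML>
<HEAD>
<TITLE>This is a document title</TITLE>
</HEAD>
<BODY>
<H1>This text uses heading level one</H1>
<IMG SRC="http://www.idealab.com/image.jpg">
</BODY>
</HTML>"""


class PageSummary(HTMLParser):
    """Collect the title, level-one heading, and image sources of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.heading = ""
        self.images = []
        self._open = None  # name of the text-bearing command currently open

    def handle_starttag(self, tag, attrs):
        # html.parser reports command and attribute names in lower case.
        if tag in ("title", "h1"):
            self._open = tag
        elif tag == "img":
            self.images.append(dict(attrs).get("src"))

    def handle_endtag(self, tag):
        if tag == self._open:
            self._open = None

    def handle_data(self, data):
        # Text between a start command and its end command belongs to it.
        if self._open == "title":
            self.title += data
        elif self._open == "h1":
            self.heading += data


summary = PageSummary()
summary.feed(PAGE)
print(summary.title.strip())
print(summary.heading.strip())
print(summary.images)
```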
Embedding a Form into a Web Page
An HTML file may also contain HTML commands that cause the web browser to render a web page that contains fields for entering data. The portion of the web page that contains the data entry fields may be referred to as a form (e.g. the portion enclosed in the HTML tags <FORM> </FORM>). When the web page is comprised primarily of data entry fields, the entire web page is sometimes also referred to as a form. As is discussed below, HTML includes an HTML form command that may cause the browser to display data entry fields. A text box, a drop down menu, a check box, a command button, a toggle button, or any other kind of interface component capable of receiving input are some examples of the type of data entry fields that may be placed in a form. Data is manually entered into the fields by the user and then submitted to a web server for processing. As previously discussed, when a user wishes to use a form to purchase a product, for example, the user manually fills in text boxes located on the form with the type of information needed to carry out the transaction. Other types of data entry fields require the user to select from one or more options such as in the case of a menu, a toggle button, or a radio button, for example. When the form is complete the user submits the completed form to the vendor's web server by pressing a command button, for example.
A problem with collecting information from the user in this manner is that the user is required to manually enter data into the fields of a form. This takes time and becomes unnecessarily burdensome when the user wishes to complete more than one form with the same information.
FIG. 3 provides an example of a form created using the HTML definition language. Code block 310 contains HTML command examples. When a document comprising code block 310 is transmitted to web browser 300 executing on web client 330, it causes form 305 to be displayed. Web browser 300 displays form 305 by parsing the HTML commands contained in code block 310 and then using the information obtained to format form 305. Once the user finishes entering information into form 305, the user may submit the form by pressing command button 331. Pressing command button 331 sends the information entered by the user in form 305 to web server 320 using the POST method. However, one of ordinary skill in the art could modify code block 310 to use the GET method instead of the POST method.
The <FORM> command shown in code block 310 indicates the beginning of a form. Once the initial FORM command is placed into the HTML document, other HTML commands may be entered between the initial FORM command and the closing FORM command (e.g. </FORM>) that represent, for example, one or more data entry fields such as text boxes, drop-down lists, check boxes, radio buttons, and input buttons. The INPUT command is used to specify different types of data entry fields. The type of data entry field created is dependent upon the value assigned to the INPUT command's TYPE attribute. For example, the INPUT command can create a text box, a password field, a hidden field, a button, a radio button, or a checkbox. A “text” value assigned to the TYPE attribute of an INPUT command creates a text box, for example. Similarly, a TYPE of “checkbox” creates a checkbox. The following code, if inserted after the FORM element, creates a text box and a checkbox:
<INPUT TYPE="text" NAME="user_name">
<INPUT TYPE="checkbox" NAME="user_item1">
The HTML tags and text contained in code block 310, for example, create form 305 when displayed using web browser 300.
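As an illustration of how a program can discover the data entry fields of a form, the following Python sketch lists the INPUT commands found between the FORM commands. The markup in FORM_HTML is a hypothetical form assembled from the fragments quoted in the text, since the actual contents of code block 310 are not reproduced here. The class name FormFieldLister is likewise an assumption.

```python
from html.parser import HTMLParser

# Hypothetical form markup; not the actual contents of code block 310.
FORM_HTML = """<FORM METHOD="POST" ACTION="http://server320/form_processor.pl">
<INPUT TYPE="text" NAME="user_name">
<INPUT TYPE="checkbox" NAME="user_item1">
<INPUT TYPE="submit" VALUE="Submit">
</FORM>"""


class FormFieldLister(HTMLParser):
    """Record the method, action, and INPUT fields of a form."""

    def __init__(self):
        super().__init__()
        self.method = None
        self.action = None
        self.fields = []  # (TYPE, NAME) pairs in document order

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            self.method = attributes.get("method")
            self.action = attributes.get("action")
        elif tag == "input":
            # HTML treats a missing TYPE attribute as a text box.
            field_type = attributes.get("type", "text")
            self.fields.append((field_type, attributes.get("name")))


lister = FormFieldLister()
lister.feed(FORM_HTML)
print(lister.method, lister.action)
for field_type, name in lister.fields:
    print(field_type, name)
```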
A <FORM> tag may contain an ACTION and/or METHOD attribute. The ACTION attribute specifies what program on the server to execute when form 305 is submitted for processing. The METHOD attribute describes how data entered into the form is handled. There are two METHOD choices: GET and POST. Each one of them is capable of processing the data entered into a form. To illustrate, if a user named Bill who resides at 130 West Union Street, Pasadena, Calif. 91103 with a credit card number of 1111222233334444 completes the Name, Address, and Credit Card number fields shown on form 305 by filling in spaces 345-347, then the following data string 350 is created and sent to web server 320:
user_name=Bill&user_address=130 West Union Street
Pasadena CA 91103&user_cc=1111222233334444
This data string may be handled by either the GET method or the POST method. However, in FIG. 3 it is handled by the POST method. The GET method appends information entered into form 305 by the user onto the end of a Uniform Resource Locator (URL 335) or places it in a QUERY_STRING environment variable. URL 335 is submitted to web server 320 by web client 330 in an HTTP request 315. Web server 320 is responsible for processing the submitted information. A problem with the GET method is that the information contained in URL 335 is not stored on web client 330 for later use and therefore cannot later be used to populate form 305. With the POST method, the information placed into form 305 by the user is placed into the data stream of the HTTP protocol. This allows web server 320 to pass form 305 data through the standard input data stream to a separate program or script on the server. Standard input is a method for providing a program or script with data that is well known to programmers skilled in the art. The POST method also does not provide a way to store data on web client 330 for later use.
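The difference between the two methods can be sketched as follows, using Python's standard urllib.parse module. The field names, values, and ACTION address come from the example in the text. The request layout is a simplified illustration and an assumption of this sketch, not a capture of real traffic. Note that a browser encodes the data string before sending it, so spaces appear as “+”.

```python
from urllib.parse import urlencode

# Field names and values taken from the example in the text.
form_data = {
    "user_name": "Bill",
    "user_address": "130 West Union Street Pasadena CA 91103",
    "user_cc": "1111222233334444",
}
ACTION = "http://server320/form_processor.pl"

# URL-encode the name/value pairs: spaces become "+", pairs are joined by "&".
data_string = urlencode(form_data)

# GET: the data string travels in the URL after a "?".
get_request = f"GET {ACTION}?{data_string} HTTP/1.0\r\n\r\n"

# POST: the data string travels in the body of the request.
post_body = data_string.encode("ascii")
post_request = (
    f"POST {ACTION} HTTP/1.0\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(post_body)}\r\n"
    "\r\n"
).encode("ascii") + post_body

print(data_string)
print(get_request.splitlines()[0])
print(post_request.decode("ascii"))
```

In both cases the data leaves the web client as part of the request, and neither method keeps a copy on the web client for later use.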
The ACTION attribute of the FORM tag indicates where the computer program tasked with processing data string 350 resides. For example, the following portion of code block 310 specifies that a computer program (e.g. a PERL script) named form_processor.pl is located in cgi-bin 321 of server 320.
<FORM METHOD=POST ACTION=http://server320/form_processor.pl>
The form_processor.pl program is responsible for processing data string 350. Programs that utilize the Common Gateway Interface (CGI) standard are typically located in a common directory (e.g. cgi-bin 321). CGI is a specified standard for communicating between HTTP servers and server-side gateway programs. A CGI program is capable of storing data supplied by the user, but it cannot determine whether the user asking for the form is the same user that previously submitted data to the CGI program. If the user has never visited the web site before, the CGI program does not have any information about that user. CGI programs can collect information about users who visit a web site, but such programs cannot collect information about users who have never visited the web site.
The program “form_processor.pl” used in the above example is an example of a server-side program that processes data input by the user via form 305, for example. When form 305 is submitted, web server 320 executes the “form_processor.pl” program and passes it the data that was entered by the user and then sent from web client 330 to server 320. That is, data string 350 is passed to “form_processor.pl” when a form is submitted. When the gateway program is done processing data string 350 the result is sent to server 320 and a response may be forwarded back to web client 330. Gateway programs can be compiled programs written in languages such as C or C++ or they can be executable scripts written using a scripting language such as PERL, TCL, or various other languages. Once data string 350 is processed it may be stored on server 320 in data store 322. A problem with using CGI programs is that since they are located on the server, information cannot easily be saved on the web client for later use.
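A gateway program of this kind can be sketched in Python as shown below. This is an illustrative stand-in, not the form_processor.pl script itself, which is not reproduced in the text. The function name read_form_data is hypothetical. REQUEST_METHOD, CONTENT_LENGTH, and QUERY_STRING are the environment variables defined by the CGI standard. The submission is simulated in memory, so no web server is involved.

```python
import io
from urllib.parse import parse_qs


def read_form_data(environ, stdin):
    """Return submitted form fields as a dict, following the CGI conventions."""
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        # POST: read exactly CONTENT_LENGTH bytes from standard input.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length).decode("ascii")
    else:
        # GET: the data arrives in the QUERY_STRING environment variable.
        raw = environ.get("QUERY_STRING", "")
    # parse_qs maps each name to a list of values; keep the first of each.
    return {name: values[0] for name, values in parse_qs(raw).items()}


# Simulated POST submission of data string 350 in its encoded form.
body = (
    b"user_name=Bill"
    b"&user_address=130+West+Union+Street+Pasadena+CA+91103"
    b"&user_cc=1111222233334444"
)
fields = read_form_data(
    {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(body))},
    io.BytesIO(body),
)
print(fields["user_name"], "/", fields["user_address"])
```

Everything the function learns about the user comes from the current submission, which is why such a program has nothing to offer for a user who has never submitted data to that server.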
Thus, existing mechanisms for processing forms reside on the server and are unable to populate a form on the client with user data from users that have never visited the web site having the form (e.g. the data is only available to web servers the user has previously visited).
HTTP Cookies
One example of data that may be stored on the client is a cookie. A cookie is an HTTP header that consists of a text-only string. The text-string is entered into the memory of a web client and is accessible by the web browser. This text-string is initially set by the computer serving the web page and consists of the domain name, path, lifetime, and value of a variable that the server sets. If the lifetime of this variable is longer than the time the user spends at that site, then the text-string is saved on the web client for later use. As a result, cookies provide a way to store and later recall values entered into a web page by the user. For example, cookies allow a web server to individually customize a web page for each user who visits the page.
A cookie may be used to populate the elements of a form that a user has previously completed. For example, referring now to FIG. 9, if a user completes fields 945-947 of form 905 and submits it to server 920 via submission path 915, server 920 can place the data submitted by the user in cookie 931 by sending web client 910 data via path 925. If the user returns to form 905 again, web browser 930 executing at web client 910 may use the data saved in cookie 931 to populate fields 945-947 with the data previously entered by the user. Thus, a cookie can free the user from having to complete form 905 more than once. Cookies can be used to fill forms the user has previously visited. However, the data that is stored by the cookie cannot be used to populate other forms. That is, a problem with cookies is that forms located on different servers are not provided access to the same cookie. Only web servers that are listed as having permission to access a particular cookie may obtain such access.
A cookie is only accessible to the server that set, or created, it. Thus, a second form cannot use the first form's cookies to obtain the user's name and address. For example, the web server located at www.merchantA.com may be allowed to read a cookie, but the web server located at www.merchantB.com may not be allowed to read the cookie created by www.merchantA.com.
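The scoping rule can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the Cookie class and the cookies_for function are hypothetical names, and the comparison is an exact host match, whereas real browsers also apply path and domain-suffix rules that are omitted here.

```python
from dataclasses import dataclass


@dataclass
class Cookie:
    name: str
    value: str
    domain: str  # host that set the cookie


def cookies_for(host, cookie_jar):
    """Return only the cookies whose domain matches the requested host."""
    wanted = host.lower()
    return [c for c in cookie_jar if c.domain.lower() == wanted]


# A cookie set by merchant A after the user completed its form.
jar = [Cookie("user_name", "Bill", "www.merchantA.com")]

print(cookies_for("www.merchantA.com", jar))  # merchant A receives its own cookie
print(cookies_for("www.merchantB.com", jar))  # merchant B receives nothing
```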
Referring now to FIG. 4, the process used to create and retrieve a cookie is illustrated. Initially, web browser 401 which resides on web client 400 issues a request 410 for web page 415 from web server 450. Web server 450 responds by sending a copy of web page 415 to web client 400 via response 411. Web client 400 uses web browser 401 to display web page 415 to the user. To create a cookie 460, web server 450 sends a “Set-Cookie” command in the header line contained in response 411. For example, in response to a request 410, web server 450 may send the following command in HTTP response 411:
Set-Cookie: NAME=VALUE; expires=DATE; path=PATH; domain=DOMAIN_NAME; secure.
The NAME and VALUE parameters contain the information included in the cookie. DATE is the time at which the cookie expires and is thus no longer saved on web client 400. DOMAIN_NAME is the host address or domain name for which the cookie is valid. For example, if a cookie is set by a computer having a domain name of www.merchantA.com, only the server responsible for that particular domain name is provided access to the cookie. The PATH parameter specifies a subset of URLs at the appropriate domain for which the cookie is valid. If the keyword secure is used then the cookie is only transmitted over a secure connection. All of these parameters except NAME=VALUE are optional when setting a cookie. Once the Set-Cookie command is sent to web client 400 a cookie 460 is placed on web client 400.
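The structure of the Set-Cookie command can be made concrete with a small Python sketch that splits a header value into its parameters. The function name parse_set_cookie and the example header value, including its expiry date and path, are assumptions made for illustration.

```python
def parse_set_cookie(header_value):
    """Split a Set-Cookie value into (name, value, attributes)."""
    parts = [part.strip() for part in header_value.split(";") if part.strip()]
    # The first part is always NAME=VALUE.
    name, _, value = parts[0].partition("=")
    attributes = {}
    for part in parts[1:]:
        key, sep, val = part.partition("=")
        # Flags such as "secure" have no "=VALUE" part; store them as True.
        attributes[key.strip().lower()] = val.strip() if sep else True
    return name.strip(), value.strip(), attributes


# Hypothetical header value following the syntax shown above.
example = (
    "FooBar=foo; expires=Fri, 31-Dec-99 23:59:59 GMT; "
    "path=/; domain=www.merchantA.com; secure"
)
name, value, attributes = parse_set_cookie(example)
print(name, value)
for key, val in attributes.items():
    print(key, "=", val)
```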
To create a cookie 460 using the UNIX operating system, for example, web server 450 could execute the following shell script in response to a request 410:
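#!/bin/sh
# Emit the HTTP headers, including the cookie, then a blank line and the page body.
# The strings are quoted so that the shell does not interpret ";", "<", or ">".
echo "Content-type: text/html"
echo "Set-cookie: FooBar=foo; expires=Wednesday, 02-11-99 12:00:00 GMT"
echo
echo "<H1>A Cookie named FooBar containing the text foo is now present on your computer</H1>"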
This script stores “FooBar=foo” on web client 400. The following script allows web server 450 to read the HTTP cookie:
#!/bin/sh
echo "Content-type: text/html"
echo
echo "The data supplied here was obtained from a cookie:<P>"
# The web server passes the Cookie request header to the script in HTTP_COOKIE.
echo "$HTTP_COOKIE"
Once cookie 460 is created, the information stored in the cookie can be used in many different ways. Information in cookie 460 may be accessed by a script that has authorization to insert cookie 460 data into a form, for example. However, a problem with cookies is that no authentication mechanism is required to obtain the information they store. A cookie is a text file and is not stored in encrypted form. While a browser may limit access to a cookie, the text that is stored in the cookie may be accessed or read by any program that can read a text file. Therefore, it is not wise to store any sensitive data in a cookie. An additional problem with cookies is that users do not have to enter a user name and password to obtain access to the cookie.
Predefined Paste Operation
Some web browsers provide users with a mechanism for pasting predefined data into a form. Under the generic preferences menu of the Opera web browser, for example, the user can select a command button labeled personal information. When this command button is selected a dialog box that contains fields for entering personal information opens. If the user desires, the user can enter a name, an address, a phone number, an e-mail address, and/or any other kind of information into the text fields of the dialog box.
Once the user enters information into the fields of the dialog box and clicks “OK”, the information that was entered can be pasted into the fields of a form by performing a right click operation. For example, if the user encounters a form that has a place for entering an address, the user can elect to paste the address previously specified in the dialog box by first selecting the address field and then right-clicking on a pointing device such as a mouse and selecting from a drop down menu the item titled “insert address”.
A problem with pasting information into the fields of a form in this manner is that the user is required to manually select the place to insert the information and the type of information to insert into that place. Another problem is that unauthorized users are not prevented from accessing the information entered into the dialog box.
A method and apparatus for populating a form with data is described. One embodiment of the present invention executes on a web client that is connected to a computer network such as the Internet. The web client is capable of obtaining web pages that contain forms from the Internet. The web client contains computer code, such as a form completion program, configured to enable a user to populate a form with data. The form completion program interacts with a target application (e.g. a web browser) installed at the web client to provide the user with a mechanism for filling out multiple forms. The form completion program, for example, provides the user with a mechanism to complete a form by dragging a graphical representation of a data set over to the form and then dropping it.
In one embodiment of the invention, a form is displayed to the user via the target application. Each form has one or more data receptacles. The data receptacles of a form are filled with data when the user executes a data population command. The form completion program executes the data population command when a graphical representation of a particular data set is placed over the form. The form completion program collects data from the user via a data collection interface. The data collection interface provides data entry fields for the user to enter a particular type of information (e.g. a data set). Each data set that is entered using the data collection interface is stored in an encrypted manner and is accessible to users who enter the appropriate information into an authentication mechanism.
To populate a form with data the form completion program obtains an image of the form and then searches for a template file that resembles the form image to within a certain threshold. The template files are typically stored on the computer hosting the target application in a template directory that is arranged according to a predefined structure. The form completion program is configured to search for templates that resemble the form image in the template directory to within a certain threshold. However, the invention also contemplates searching for templates at alternate locations. Templates, for example, may reside on any other computer that is accessible via a telecommunication medium such as a computer network.
The form completion program examines each template file in order to determine if one or more of the template files resembles the form image to within a certain threshold. If a template file that resembles the form image to within a certain threshold is located, then the form completion program acknowledges that a match occurred. When a match occurs the form completion program utilizes the template file to identify what kind of data to insert into each of the form's data receptacles. For example, the template file allows the form completion program to determine which of the data receptacles contain personal information and which data receptacles contain payment information. Once the form completion program successfully identifies what kind of data to insert into each data receptacle the program begins to input the appropriate kind of data into the appropriate data receptacle.
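The matching and population steps described above can be sketched in Python under some loudly stated assumptions. The text does not specify how a form image is represented or how resemblance is measured. In this sketch the form image is reduced to a list of data receptacle names, resemblance is measured with the Jaccard similarity of those names, and the threshold of 0.75 is arbitrary. The template contents, the data set, and all function names are hypothetical.

```python
def similarity(form_fields, template_fields):
    """Jaccard similarity between two sets of receptacle names (0.0 to 1.0)."""
    form_set, template_set = set(form_fields), set(template_fields)
    if not form_set and not template_set:
        return 1.0
    return len(form_set & template_set) / len(form_set | template_set)


def find_template(form_fields, templates, threshold=0.75):
    """Return the name of the best template at or above the threshold, else None."""
    best_name, best_score = None, 0.0
    for name, mapping in templates.items():
        score = similarity(form_fields, mapping)
        if score >= threshold and score > best_score:
            best_name, best_score = name, score
    return best_name


def populate(form_fields, mapping, data_set):
    """Fill each receptacle whose kind of data is known from the template."""
    filled = {}
    for field in form_fields:
        kind = mapping.get(field)
        if kind in data_set:
            filled[field] = data_set[kind]
    return filled


# Hypothetical templates: receptacle name -> kind of data it expects.
TEMPLATES = {
    "checkout": {"user_name": "name", "user_address": "address", "user_cc": "card_number"},
    "newsletter": {"email": "email"},
}
# Hypothetical data set previously collected from the user.
DATA_SET = {
    "name": "Bill",
    "address": "130 West Union Street",
    "card_number": "1111222233334444",
}

form = ["user_name", "user_address", "user_cc", "coupon"]
match = find_template(form, TEMPLATES)
filled = populate(form, TEMPLATES[match], DATA_SET) if match else {}
print(match)
print(filled)
```

Receptacles that the matched template does not describe, such as the hypothetical coupon field, are left empty for the user to complete.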
The present invention contemplates the use of multiple techniques to insert information into the data receptacles. The technique used is largely dependent upon the type of target application used to display the form.