1. Field of the Invention
This invention relates to the field of computer software, and, more specifically, to advertising on the internet.
Portions of the disclosure of this patent document contain material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office file or records, but otherwise reserves all copyright rights whatsoever. Sun, Sun Microsystems, the Sun logo, Solaris, Java, JavaOS, JavaStation, HotJava Views and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.
2. Background Art
In a computer network environment and the internet, advertising is increasingly utilized by owners of web sites (referred to as web hosts) as a revenue source and for the advertisers to gain publicity and web site access. Web hosts sell advertising space on their web site and distribute web pages including the advertisements to internet users or clients. It is desirable for advertisements to target specific audiences and persons that may be interested in the specific good or service being advertised. Prior art advertising schemes poorly target audiences and create excessive internet traffic when retrieving and transmitting advertisements. These problems can be understood by reviewing networks, internets, advertising and how they work.
Networks
In modern computing environments, it is commonplace to employ multiple computers or workstations linked together in a network to communicate between, and share data with, network users. A network also may include resources, such as printers, modems, file servers, etc., and may also include services, such as electronic mail.
A network can be a small system that is physically connected by cables (a local area network or "LAN"), or several separate networks can be connected together to form a larger network (a wide area network or "WAN"). Other types of networks include the internet, telecom networks, the World Wide Web, intranets, extranets, wireless networks, and other networks over which electronic, digital, and/or analog data may be communicated.
Computer systems sometimes rely on a server computer system to provide information to requesting computers on a network. When there are a large number of requesting computers, it may be necessary to have more than one server computer system to handle the requests.
The Internet
The Internet is a worldwide network of interconnected computers. An Internet client accesses a computer on the network via an Internet provider. An Internet provider is an organization that provides a client (e.g., an individual or other organization) with access to the Internet (via analog telephone line or Integrated Services Digital Network line, for example). A client can, for example, read information from, download a file from or send an electronic mail message to another computer/client using the Internet.
To retrieve a file or service on the Internet, a client must search for the file or service, make a connection to the computer on which the file or service is stored, and download the file or service. Each of these steps may involve a separate application and access to multiple, dissimilar computer systems. The World Wide Web (WWW) was developed to provide a simpler, more uniform means for accessing information on the Internet.
The components of the WWW include browser software, network links, servers, and WWW protocols. The browser software, or browser, is a user-friendly interface (i.e., front-end) that simplifies access to the Internet. A browser allows a client to communicate a request without having to learn a complicated command syntax, for example. A browser typically provides a graphical user interface (GUI) for displaying information and receiving input. Examples of browsers currently available include Mosaic, Netscape Navigator and Communicator, Microsoft Internet Explorer, and Cello.
Information servers maintain the information on the WWW and are capable of processing a client request. Hypertext Transfer Protocol (HTTP) is the standard protocol for communication with an information server on the WWW. HTTP has communication methods that allow clients to request data from a server and send information to the server.
To submit a request, the client contacts the HTTP server and transmits the request to the HTTP server. The request contains the communication method requested for the transaction (e.g., GET an object from the server or POST data to an object on the server). The HTTP server responds to the client by sending a status of the request and the requested information. The connection is then terminated between the client and the HTTP server.
A client request, therefore, consists of establishing a connection between the client and the HTTP server, performing the request, and terminating the connection. The HTTP server does not retain any information about the request after the connection has been terminated. HTTP is, therefore, a stateless protocol. That is, a client can make several requests of an HTTP server, but each individual request is treated independently of any other request. The server has no recollection of any previous request.
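By way of illustration only, the request and response exchange described above may be sketched as follows. The document table, the function names, and the host name example.com are hypothetical and are not part of any particular HTTP server; the sketch handles only the GET method and keeps no record of earlier requests, which is the stateless behavior described.

```python
# Hypothetical content held by the server; any mapping of paths to documents would do.
DOCUMENTS = {"/index.html": "<HTML><BODY>Welcome</BODY></HTML>"}


def build_request(method, path, host):
    """Return the text of a minimal HTTP/1.0 request."""
    return f"{method} {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"


def handle_request(request_text):
    """Serve one request. Nothing about the request is kept afterwards."""
    request_line = request_text.split("\r\n", 1)[0]
    method, path, _version = request_line.split(" ")
    if method != "GET":
        return "HTTP/1.0 501 Not Implemented\r\n\r\n"
    body = DOCUMENTS.get(path)
    if body is None:
        return "HTTP/1.0 404 Not Found\r\n\r\n"
    # Status line first, then headers, a blank line, and the requested information.
    return f"HTTP/1.0 200 OK\r\nContent-Length: {len(body)}\r\n\r\n{body}"


def status_code(response_text):
    """Extract the numeric status from the first line of a response."""
    return int(response_text.split("\r\n", 1)[0].split(" ")[1])
```

Because handle_request consults only its argument and the fixed document table, two identical requests produce identical responses regardless of what was requested in between.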
Instead of transmitting the information from the server that maintains the information, some systems utilize what is referred to as a proxy. Referring to FIG. 1, a proxy 102 is a server that carries out requests transmitted to it (i.e., from client 100), keeping copies of fetched documents or information for some time so that they can be accessed more quickly in the future, speeding up access for commonly requested information. This maintaining of information and fetched documents by the proxy 102 is referred to as caching and the information maintained in the proxy 102 is referred to as a cache or proxy cache.
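A minimal sketch of the caching behavior of such a proxy is given below, for illustration only. The class name, the fetch callback, and the fixed time-to-live are assumptions; actual proxies typically decide how long to keep a document from information supplied with the response.

```python
import time


class CachingProxy:
    """Carries out requests and keeps copies of fetched documents for a while."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch      # callable that retrieves a URL from the origin server
        self._ttl = ttl_seconds  # how long a cached copy is considered usable
        self._clock = clock      # injectable so the behavior can be demonstrated
        self._cache = {}         # url -> (expiry_time, document)

    def get(self, url):
        now = self._clock()
        entry = self._cache.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the origin server is not contacted
        document = self._fetch(url)  # cache miss or stale copy: go to the origin
        self._cache[url] = (now + self._ttl, document)
        return document
```

A second request for the same URL inside the time-to-live is answered from the cache, which is the source of the faster access noted above.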
To protect information in internal computer networks from external access, a firewall is utilized. A firewall is a mechanism that blocks access between the client and the server. To provide limited access to information, a proxy or proxy server may sit atop a firewall and act as a conduit, providing a specific connection for each network connection. Proxy software retains the ability to communicate with external sources, yet is trusted to communicate with the internal network. For example, proxy software may require a username and password to access certain sections of the internal network and completely block other sections from any external access.
An addressing scheme is employed to identify Internet resources (e.g., HTTP server, file or program). This addressing scheme is called Uniform Resource Locator (URL). A URL contains the protocol to use when accessing the server (e.g., HTTP), the Internet domain name of the site on which the server is running, the port number of the server, and the location of the resource in the file structure of the server.
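For illustration, the parts of a URL listed above can be separated with the urlsplit function of the Python standard library. The URL shown is a made-up example and the helper name url_parts is hypothetical.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Return the protocol, domain name, port number, and resource location of a URL."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "domain": parts.hostname,
        "port": parts.port,   # None when the URL does not name a port
        "location": parts.path,
    }


example = url_parts("http://www.example.com:8080/docs/index.html")
```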
The WWW uses a concept known as hypertext. Hypertext provides the ability to create links within a document to move directly to other information. To activate the link, it is only necessary to click on the hypertext link (e.g., a word or phrase). The hypertext link can be to information stored on a different site than the one that supplied the current information. A URL is associated with the link to identify the location of the additional information. When the link is activated, the client's browser uses the link to access the data at the site specified in the URL.
If the client request is for a file, the HTTP server locates the file and sends it to the client. An HTTP server also has the ability to delegate work to gateway programs. The Common Gateway Interface (CGI) specification defines a mechanism by which HTTP servers communicate with gateway programs. A gateway program is referenced using a URL. The HTTP server activates the program specified in the URL and uses CGI mechanisms to pass program data sent by the client to the gateway program. Data is passed from the server to the gateway program via command-line arguments, standard input, or environment variables. The gateway program processes the data and returns its response to the server using CGI (via standard output, for example). The server forwards the data to the client using HTTP.
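A gateway program of the kind described may be sketched as follows, for illustration only. REQUEST_METHOD, QUERY_STRING, and CONTENT_LENGTH are environment variables defined by the CGI specification; the function name run_gateway, the "name" field, and the greeting are hypothetical. The sketch follows the CGI convention that the program writes its response, header first, to standard output, and it assumes the posted data is plain ASCII text.

```python
import io
from urllib.parse import parse_qs


def run_gateway(environ, stdin, stdout):
    """Read client data the way a CGI program receives it and write a response."""
    if environ.get("REQUEST_METHOD") == "POST":
        # POSTed data arrives on standard input; its length is in the environment.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        # For GET, the data is the query portion of the URL.
        raw = environ.get("QUERY_STRING", "")
    fields = parse_qs(raw)
    name = fields.get("name", ["world"])[0]
    # The response starts with a header block, then a blank line, then the body.
    stdout.write("Content-Type: text/plain\r\n\r\n")
    stdout.write(f"Hello, {name}\n")


# Simulated GET request, so the sketch can be tried without an HTTP server.
demo_output = io.StringIO()
run_gateway({"REQUEST_METHOD": "GET", "QUERY_STRING": "name=Ada"}, io.StringIO(""), demo_output)
```

When run under a real server, the same function would be called with os.environ, sys.stdin, and sys.stdout.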
A browser displays information to a client/user as pages or documents (referred to as "web pages" or "web sites"). A language is used to define the format for a page to be displayed in the WWW. The language is called Hypertext Markup Language (HTML). A WWW page is transmitted to a client as an HTML document. The browser executing at the client parses the document and displays a page based on the information in the HTML document.
HTML is a structural language that is composed of HTML elements that are nested within each other. An HTML document is a text file in which certain strings of characters, called tags, mark regions of the document and assign special meaning to them. These regions are called HTML elements. Each element has a name, or tag. An element can have attributes that specify properties of the element. Blocks or components include unordered lists, text boxes, check boxes, and radio buttons, for example. Each block has properties such as name, type, and value. The following provides an example of the structure of an HTML document:
<HTML>
<HEAD>
. . . element(s) valid in the document head
</HEAD>
<BODY>
. . . element(s) valid in the document body
</BODY>
</HTML>
Each HTML element is delimited by the pair of characters "<" and ">". The name of the HTML element is contained within the delimiting characters. The combination of the name and delimiting characters is referred to as a marker, or tag. Each element is identified by its marker. In most cases, each element has a start and ending marker. The ending marker is identified by the inclusion of another character, "/", that follows the "<" character.
HTML is a hierarchical language. With the exception of the HTML element, all other elements are contained within another element. The HTML element encompasses the entire document. It identifies the enclosed text as an HTML document. The HEAD element is contained within the HTML element and includes information about the HTML document. The BODY element is also contained within the HTML element. The BODY element contains all of the text and other information to be displayed. Other HTML elements are described in HTML reference manuals.
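The hierarchy described above can be made visible with a short sketch that records how deeply each element is nested. It is illustrative only: the class and function names are hypothetical, and the sketch assumes every element has both a start and an ending marker. The HTMLParser class of the Python standard library reports element names in lower case.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Records (nesting depth, element name) for each start marker encountered."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []

    def handle_starttag(self, tag, attrs):
        self.outline.append((self.depth, tag))
        self.depth += 1  # elements that follow are nested inside this one

    def handle_endtag(self, tag):
        self.depth -= 1  # the ending marker closes the current element


def outline(document):
    parser = OutlineParser()
    parser.feed(document)
    parser.close()
    return parser.outline


page = "<HTML><HEAD><TITLE>Example</TITLE></HEAD><BODY><P>Hello</P></BODY></HTML>"
page_outline = outline(page)
```

In the result, the HTML element appears at depth 0 and the HEAD and BODY elements both appear at depth 1, matching the containment described above.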
Advertising
In traditional media (e.g., television, radio, and newspaper), local advertising is provided by radio stations, television stations, the different newspaper editions, and different newspaper distributors. The local advertisers target a sub-group, often defined geographically, of the audience for that media outlet. For example, a local newspaper distributor for a metropolitan city newspaper (e.g., the Houston Chronicle or Los Angeles Times) may include advertising inserts and coupons from local grocers and shopkeepers with the newspaper.
Online advertising on the internet has followed the advertising approach in traditional media. Advertising space on the internet is sold by web hosts to third parties (advertisers). Additionally, an advertising agency may be hired by the advertiser to conduct internet advertising.
Advertising space on the internet often appears as a banner or icon on a web page. Banners often range from ½ to 4 inches high and from 4 to 8½ inches wide. The banner or icon may be an image, text, or an image with text. Additionally, the banner or icon may have a hyperlink to the advertiser's web page. Thus, if a user clicks on an advertiser's banner, the user's browser will load the advertiser's web page.
Payment schemes for online advertising vary. For example, an advertiser may pay based on the number of times different users access a web site (referred to as hits or page impressions). Alternatively, an advertiser may only pay if a user clicks on the advertiser's banner or icon and views the advertiser's web page (referred to as a click-through). Further, a web host may also receive payment based on any completed transactions that result from a click-through (e.g., the web host receives a percentage of the payment received by the advertiser from the user) (referred to as referral commissions).
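The three payment schemes can be compared with simple arithmetic, sketched below for illustration. All rates and counts are invented figures, and the function names are hypothetical; pricing impressions per thousand is one common convention and is assumed here.

```python
from decimal import Decimal


def impression_fee(impressions, rate_per_thousand):
    """Payment based on page impressions, priced per thousand impressions."""
    return Decimal(impressions) * rate_per_thousand / 1000


def click_through_fee(clicks, rate_per_click):
    """Payment owed only for users who clicked the banner or icon."""
    return Decimal(clicks) * rate_per_click


def referral_commission(sales_total, commission_rate):
    """Share of completed transactions that resulted from a click-through."""
    return sales_total * commission_rate


# Invented figures: 20,000 impressions, 150 click-throughs, 400.00 in resulting sales.
by_impressions = impression_fee(20000, Decimal("5.00"))
by_clicks = click_through_fee(150, Decimal("0.25"))
by_referral = referral_commission(Decimal("400.00"), Decimal("0.05"))
```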
Advertising schemes attempt to target audiences that would most likely be interested in the product or service being advertised. For example, the commercials aired in connection with cartoons on television often relate to children's toys, cereal, or other items that children would utilize. Consequently, the more information known about a viewer or user of a particular web site, the more targeted an advertisement may be.
In existing internet advertising schemes, a web host often provides one advertisement that all clients or users view. Consequently, there is one global advertisement that all users of the web site see. Such a global advertisement assumes a homogeneous interest by all users and does not provide different advertisements based on different interests or characteristics of users.
Prior Art Advertising Schemes
One internet advertising scheme attempts to target specific audiences based on demographics. For example, a web site that provides information about a specific city (e.g., San Francisco) may attempt to capture local audiences by placing advertisements for businesses located in or near San Francisco. Thus, advertising on a Yahoo-San Francisco bay web site would attempt to target a local San Francisco bay area audience.
Another advertising scheme bases the advertisement on input from the user. For example, if a search for baby books were made on a search engine such as Yahoo, the web host for Yahoo may display advertisements relating to baby merchandise such as strollers and high chairs.
Another advertising scheme accesses cookies stored on individuals' browsers to determine the types of web sites that have been accessed. Cookies are small pieces of information that can later be read back from a browser. When a web site is accessed, a cookie is sent by the web site identifying itself to the web browser. Cookies are stored by the browser and may be read back at a later date by the server that set them. Based on the information retrieved from the cookies, a local advertisement targeted to the specific user's interests (based on the web sites that user has accessed or retrieved a cookie from) is provided. Alternatively, the advertising scheme may evaluate the HTTP referring page information. To prevent this information from being distributed or used in any manner, software is available that enables users to strip off cookies or HTTP referring page information. Further, the information collected only pertains to the small set of sites with which the advertiser has a business relationship, either directly or indirectly through an advertisement network.
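For illustration only, setting a cookie and reading it back can be sketched with the SimpleCookie class of the Python standard library. The cookie names and values ("visited", "autos", "region", "sf") and the two helper functions are hypothetical.

```python
from http.cookies import SimpleCookie


def make_set_cookie(name, value):
    """Return the value of a Set-Cookie header a web site might send to a browser."""
    cookie = SimpleCookie()
    cookie[name] = value
    return cookie[name].OutputString()


def read_cookies(cookie_header):
    """Parse the Cookie header a browser sends back on a later request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return {key: morsel.value for key, morsel in cookie.items()}


sent = make_set_cookie("visited", "autos")
returned = read_cookies("visited=autos; region=sf")
```

An advertising scheme of the kind described would choose an advertisement from the returned values, for example an automobile advertisement for a browser that presents visited=autos.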
Another advertising scheme attempts to guess the geographic location of a user based on the client's internet protocol (IP) address. When accessing the internet, individual clients are differentiated from each other by a unique number referred to as an IP address. In this advertising scheme, a database is maintained by the web host that contains a mapping that provides a correspondence between each IP address and a modem phone number. The mappings are created by retrieving the modem phone numbers and the different IP addresses that the modem phone numbers correspond to from internet service providers (ISPs), which are companies that provide internet access to users. By searching the database for the IP address, the web host or advertising company can deduce which modem phone number the user called in from. Based on the modem phone number and area code, the web host or advertising company can deduce where geographically the user is from or what telephone exchange the user is closest to. Consequently, the user is provided advertisements based on the estimated geographic location of the user.
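The two-step lookup described above, from IP address to modem phone number and from area code to region, may be sketched as follows. The tables are invented sample data: the IP addresses come from ranges reserved for documentation and the phone numbers are fictitious. The function name is hypothetical.

```python
# Hypothetical mapping obtained from ISPs: IP address -> modem phone number.
IP_TO_MODEM_NUMBER = {
    "192.0.2.10": "415-555-0100",
    "198.51.100.7": "713-555-0123",
}

# Hypothetical mapping from telephone area code to a geographic region.
AREA_CODE_TO_REGION = {
    "415": "San Francisco",
    "713": "Houston",
}


def estimate_region(ip_address):
    """Guess a user's region from the modem number associated with an IP address."""
    phone = IP_TO_MODEM_NUMBER.get(ip_address)
    if phone is None:
        return None  # the database has no entry for this address
    area_code = phone.split("-")[0]
    return AREA_CODE_TO_REGION.get(area_code)
```

The result is only an estimate: an address missing from the database yields no region at all, and the region found reflects where the modem pool is located, which need not be where the user is.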
Each of the above advertising schemes relies on the insertion and transmission of the advertisement by the web host. Additionally, each advertising scheme relies on information retrieved from the user (which may be modified by the user) or attempts to guess information about the user. Consequently, advertising is not precisely targeted and the premiums paid for "good demographics" and "precise targeting" are lower.
The above advertising schemes also create additional processing overhead for the web host (for implementing an advertising scheme), require extra bandwidth to transmit the advertisement across the network to the user, and poorly target specific audiences. Further, due to the increased overhead and low hit count for small web sites, advertisers are reluctant to advertise on the smaller web sites. Additionally, due to the high advertising costs for large and frequently used web sites, small businesses cannot afford to conduct advertising.
Advertising payment schemes do not provide for payment to the ISPs. Since ISPs do not benefit from the advertising, ISPs often do not cache the advertisements (resulting in increased transmission time), and software to strip off advertisements from web pages has appeared. Further, since payment schemes may be based on the number of hits, the hit count and number of page impressions must be determined. Techniques for checking and auditing the hit counts and number of page impressions are primitive and primarily based on trusting the web host, who may inflate the numbers.
The invention provides a method and apparatus for local advertising on the internet. Advertising is increasingly utilized by owners of web sites (referred to as web hosts) as a revenue source and for the advertisers to gain publicity and web site access. Web hosts sell advertising space on their web site and distribute web pages including the advertisements to internet users or clients. It is desirable for advertisements to target specific audiences and persons that may be interested in the specific good or service being advertised. Prior art advertising schemes poorly target audiences and create excessive internet traffic when retrieving and transmitting advertisements.
According to one or more embodiments of the invention, Internet Service Providers (ISPs) or proxies owned by an ISP insert advertisements that are transmitted from a web host to a client. Additionally, any entity may insert or replace an advertisement that is transmitted to a client. The inserted advertisement may be an advertisement that is stored in the proxy's cache or may be retrieved from a web server for an advertiser. By providing the ISP with the ability to insert the advertisement, advertisements appear on small web sites that do not normally attract advertisers. Additionally, due to the number of advertisements placed by an ISP, small advertisers may have their advertisement appear in connection with frequently used web sites.
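One way a proxy might insert an advertisement into a page passing from a web host to a client is sketched below. This is an assumption made for illustration, not a mechanism stated above: the sketch places the advertisement immediately after the opening BODY marker, and the function name and the advertisement markup are hypothetical.

```python
import re

# Matches the opening BODY marker in either case, with or without attributes.
_BODY_OPEN = re.compile(r"<body\b[^>]*>", re.IGNORECASE)


def insert_advertisement(page, ad_html):
    """Return the page with ad_html placed at the top of the document body."""
    match = _BODY_OPEN.search(page)
    if match is None:
        return page  # no BODY element found: pass the page through unchanged
    return page[:match.end()] + ad_html + page[match.end():]


original = "<HTML><BODY><P>Local news</P></BODY></HTML>"
with_ad = insert_advertisement(original, '<IMG SRC="ad.gif">')
```

The advertisement markup passed in could come from the proxy's cache or be retrieved from an advertiser's web server, as described above.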
In addition to inserting advertisements, one or more embodiments of the invention provide for an ISP to collect and store information regarding particular users in a user profile. The information may include demographic information such as the user's age, residence, credit history, etc. Additionally, the information may include the web sites that the user has accessed, the time spent on each web site, and any internet searches performed by the user.
The profile information may be utilized by the proxy to conduct targeted advertising or the information may be provided to a web host so that the web host may conduct targeted advertising. The profile information may also be utilized to associate a cost with certain demographic information. For example, if the profile information indicates that the user is interested in automobiles, a premium may be charged to an automobile advertiser. The profile information may be evaluated by the ISP for advertisement insertion. Alternatively, the profile information may be forwarded to an advertiser or advertising agency that evaluates and forwards back an advertisement for the proxy to transmit to the user. Thus, the profile and demographic information is utilized to precisely target advertisements to specific users.
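A minimal sketch of profile-based selection and premium pricing follows, for illustration only. The profile fields, the matching rule (advertisement topic found among the user's interests), the fallback to the first advertisement, and the 50 percent premium are all assumptions; the description above does not fix any of them.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    age: int
    residence: str
    interests: set = field(default_factory=set)        # e.g. inferred from sites and searches
    sites_visited: list = field(default_factory=list)


@dataclass
class Advertisement:
    advertiser: str
    topic: str
    base_price_cents: int


def select_advertisement(profile, ads, premium_percent=50):
    """Return (advertisement, price in cents), or None when no ads are available."""
    for ad in ads:
        if ad.topic in profile.interests:
            # A targeted placement is charged at a premium over the base price.
            price = ad.base_price_cents * (100 + premium_percent) // 100
            return ad, price
    if ads:
        return ads[0], ads[0].base_price_cents  # untargeted fallback at the base price
    return None
```

For a user whose profile lists automobiles as an interest, an automobile advertiser with a base price of 200 cents would be selected and charged 300 cents under these assumed figures.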