A number of fields of endeavor are relevant to the present invention, and exemplary prior art references, incorporated herein by reference, are disclosed below. The references disclosed provide a skilled artisan with embodiments of elements of the present invention, and the teachings therein may be combined and subcombined in various manners in accordance with the present teachings. The topical headings are advisory only, and are not intended to limit the applicability of any reference. While some embodiments are discussed as being preferred, it should be understood that all embodiments discussed, in any portion of this document, whether stated as having advantages or not, form a part of the invention and may be combined and/or subcombined in a consistent manner in accordance with the teachings hereof.
Internet
The Internet is structured such that various networks are interconnected, with communications effected by addressed packets conforming to a common protocol. Based on the packet addressing, information is routed from source to destination, often through a set of networks having multiple potential pathways. The communications medium is shared between all users. Statistically, some proportion of the packets are extraordinarily delayed, or simply lost. Therefore, protocols involving communications using these packets include error detection schemes that request a retransmit of required data not received within a time window. In the event that the network nears capacity or is otherwise subject to a limiting constraint, the incidence of delayed or lost packets increases, thereby increasing requests for retransmission and retransmission. Therefore, as the network approaches available bandwidth, the load increases, ultimately leading to failure. In instances where a minimum quality of service must be guaranteed, special Internet technologies are required, to reserve bandwidth or to specify network pathways. End-to-end quality of service guarantees, however, may exceed the cost of circuit switched technologies, such as dialup modems, especially where the high quality needs are intermittent.
Internet usage typically involves an Internet server, an automated system capable of responding to communications received through the Internet, and often communicating with other systems not directly connected to the Internet. The server typically has relatively large bandwidth to the Internet, allowing multiple simultaneous communications sessions, and usually supports the hypertext transfer protocol (HTTP), which provides, in conjunction with a so-called web browser on a remote client system, a human readable interface which facilitates navigation of various resources available in the Internet. The client systems are typically human user interfaces, which employ a browser to display HTTP “web pages”. The browser typically does not provide intelligence. Bandwidth between the client and Internet is typically relatively small, and various communications and display rendering latencies are considered normal. Typically, both client and server are connected to the Internet through Internet service providers, each having its own router.
It is also known to provide so-called proxy servers and firewalls, which are automated systems that insulate the client system from the Internet. Further, so-called Internet applications and applets are known which provide local intelligence at the client system. Further, it is known to provide a local server within the client system for locally processing a portion of the information. These local servers, applications and applets are non-standard, and thus require special software to be available locally for execution.
Thus, the Internet poses a number of advantages for commercial use, including low cost and ubiquitous connectivity. Therefore, it is desirable to employ standard Internet technologies while achieving sufficient quality communications to effect an efficient transaction.
Market Economy Systems
In modern retail transactions, predetermined price transactions are common, with market transactions, i.e., commerce conducted in a setting which allows the transaction price to float based on the respective valuation allocated by the buyer(s) and seller(s), often left to specialized fields. While interpersonal negotiation is often used to set a transfer price, this price is often different from a transfer price that might result from a best-efforts attempt at establishing a market price. Assuming that the market price is optimal, it is therefore assumed that alternatives are suboptimal. Therefore, the establishment of a market price is desirable over simple negotiations.
One particular problem with market-based commerce is that both seller optimization and market efficiency depend on the fact that representative participants of a preselected class are invited to participate, and are able to promptly communicate, on a relevant timescale, in order to accurately value the goods or services and make an offer. Thus, in a traditional market-based system, all participants are in the same room, or connected by a high quality telecommunications link. Alternately, the market valuation process is prolonged over an extended period, allowing non-real time communications of market information and bids. Thus, attempts at ascertaining a market price for non-commodity goods can be subject to substantial inefficiencies, which reduce any potential gains by market pricing. Further, while market pricing might be considered “fair”, it also imposes an element of risk, reducing the ability of parties to predict future pricing and revenues. Addressing this risk may also reduce efficiency of a market-based system.
Auction Systems
When a single party seeks to sell goods to the highest valued purchaser(s), to establish a market price, the rules of conduct typically define an auction. Typically, known auctions provide an ascending price or descending price over time, with bidders making offers or ceasing to make offers, in the descending price or ascending price models, respectively, to define the market price. After determining the winner of the auction, the pricing rules define uniform price auctions, wherein all successful bidders pay the lowest successful bid, second price auctions wherein the winning bidder pays the amount bid by the next highest bidder, and pay-what-you-bid auctions. The pay-what-you-bid auction is also known as a discriminative auction while the uniform price auction is known as a non-discriminative auction. In a second-price auction, also known as a Vickrey auction, the policy seeks to create a disincentive for speculation and to encourage bidders to submit bids reflecting their true value for the good. In the uniform price and second price schemes, the bidder is encouraged to disclose the actual private value to the bidder of the good or service, since at any price below this amount, there is an excess gain to the buyer, whereas by withholding this amount the bid may be unsuccessful, resulting in a loss of the presumably desirable opportunity. In the pay-what-you-bid auction, on the other hand, the buyer need not disclose the maximum private valuation, and those bidders with lower risk tolerance will bid higher prices. See, www.isoc.org/inet98/proceedings/3b/3b_3.html; www.ibm.com/iac/reports-technical/reports-bus-neg-internet.html.
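By way of illustration only, the three pricing rules described above may be sketched as follows. The function name settle, its parameters, and the tie-breaking behavior (earlier entries win ties) are assumptions made for this sketch and are not drawn from any cited reference. The second price rule is sketched for a single unit only, and a lone bidder is assumed to pay his own bid.

```python
def settle(bids, units, rule):
    """Return {bidder: price_paid} for the winning bidders.

    bids  -- dict mapping a bidder name to a bid for one unit (hypothetical input shape)
    units -- number of identical units offered
    rule  -- "pay_what_you_bid", "uniform", or "second_price"
    """
    # Rank bids from highest to lowest; sorted() is stable, so ties keep input order.
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winners = ranked[:units]
    if not winners:
        return {}
    if rule == "pay_what_you_bid":
        # Discriminative auction: each winner pays the amount he bid.
        return {name: amount for name, amount in winners}
    if rule == "uniform":
        # Non-discriminative auction: all winners pay the lowest successful bid.
        lowest_successful = winners[-1][1]
        return {name: lowest_successful for name, _ in winners}
    if rule == "second_price":
        if units != 1:
            raise ValueError("second price rule is sketched for a single unit only")
        # Winner pays the next highest bid (assumption: own bid if unopposed).
        price = ranked[1][1] if len(ranked) > 1 else ranked[0][1]
        return {winners[0][0]: price}
    raise ValueError("unknown pricing rule: " + rule)
```

For example, with bids of 100, 90 and 80 and two units offered, the uniform price rule charges both winners 90, while the pay-what-you-bid rule charges 100 and 90, respectively.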
Two common types of auction are the English auction, which sells a single good to the highest bidder in an ascending price auction, and the Dutch auction, in which multiple units are available for sale, and in which a starting price is selected by the auctioneer, which is successively reduced, until the supply is exhausted by bidders (or the minimum price/final time is reached), with the buyer(s) paying the lowest successful bid. The term Dutch auction is also applied to a type of sealed bid auction. In a multi-unit live Dutch auction, each participant is provided with the current price, the quantity on hand and the time remaining in the auction. This type of auction typically takes place over a very short period of time and there is a flurry of activity in the last portion of the auction process. The actual auction terminates when there is no more product to be sold or the time period expires.
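A multi-unit Dutch auction of the type described above may be simulated, under simplifying assumptions, as follows. The function run_dutch_auction and its parameters are hypothetical. Each bidder is assumed to buy as soon as the descending price reaches his limit price, and the time limit is represented only by the minimum price.

```python
def run_dutch_auction(start_price, decrement, min_price, supply, limits):
    """Simulate a descending price auction for identical units.

    limits -- dict mapping bidder -> (limit_price, quantity_wanted); hypothetical shape
    Returns (allocations, lowest_successful_price, units_remaining).
    """
    price = start_price
    remaining = supply
    allocations = {}
    lowest_successful = None
    # The auction ends when the supply is exhausted or the minimum price is passed.
    while price >= min_price and remaining > 0:
        for bidder, (limit, wanted) in limits.items():
            if bidder in allocations or remaining == 0:
                continue
            if limit >= price:
                granted = min(wanted, remaining)
                allocations[bidder] = granted
                remaining -= granted
                # Buyers pay the lowest successful bid, i.e. the last price accepted.
                lowest_successful = price
        price -= decrement
    return allocations, lowest_successful, remaining
```

With a starting price of 100, a decrement of 10, a minimum price of 50 and three units, bidders with limit prices of 95, 80 and 70 each obtain one unit, and the lowest successful bid is 70.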
In selecting the optimal type of auction, a number of factors are considered. In order to sell large quantities of a perishable commodity in a short period of time, the descending price auctions are often preferred. For example, the produce and flower markets in Holland routinely use the Dutch auction (hence the derivation of the name), while the U.S. Government uses this form to sell its financial instruments. The format of a traditional Dutch auction encourages early bidders to bid up to their “private value”, hoping to pay some price below the “private value”. In making a bid, the “private value” becomes known, helping to establish a published market value and demand curve for the goods, thus allowing both buyers and sellers to define strategies for future auctions.
In an auction, typically a seller retains an auctioneer to conduct an auction with multiple buyers. (In a reverse auction, a buyer solicits the lowest price from multiple competing vendors for a desired purchase). Since the seller retains the auctioneer, the seller essentially defines the rules of the auction. These rules are typically defined to maximize the revenues or profit to the seller, while providing an inviting forum to encourage a maximum number of high valued buyers. If the rules discourage high valuations of the goods or services, or discourage participation by an important set of potential bidders, then the rules are not optimum. A rule may also be imposed to account for the valuation of the good or service applied by the seller, in the form of a reserve price. It is noted that these rules typically seek to allocate to the seller a portion of the economic benefit that would normally inure to the buyer, creating an economic inefficiency. However, since the auction is to benefit the seller, not society as a whole, this potential inefficiency is tolerated. An optimum auction thus seeks to produce a maximum profit (or net revenues) for the seller. An efficient auction, on the other hand, maximizes the sum of the utilities for the buyer and seller. It remains a subject of academic debate as to which auction rules are optimal in given circumstances; however, in practice, simplicity of implementation may be a paramount concern, and simple auctions may result in highest revenues; complex auctions, while theoretically closer to optimal, may discourage bidders from participating or from applying their true and full private valuation in the auction process.
Typically, the rules of the auction are predefined and invariant. Further, for a number of reasons, auctions typically apply the same rules to all bidders, even though, with a priori knowledge of the private values assigned by each bidder to the goods, or a prediction of the private value, an optimization rule may be applied to extract the full value assigned by each bidder, while selling above the seller's reserve.
In a known ascending price auction, each participant must be made aware of the status of the auction, e.g., open, closed, and the contemporaneous price. A bid is indicated by the identification of the bidder at the contemporaneous price, or occasionally at any price above the minimum bid increment plus the previous price. The bids are asynchronous, and therefore each bidder must be immediately informed of the particulars of each bid by other bidders.
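The bid acceptance rule and status notification described above may be sketched as follows. The class AscendingAuction, its fields, and the message format are hypothetical. It is assumed for this sketch that the first bid need only meet the opening price, that later bids must meet or exceed the previous price plus the minimum increment, and that appending to a list stands in for the immediate notification of all bidders.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AscendingAuction:
    current_price: int                 # opening price until a bid is accepted
    min_increment: int
    is_open: bool = True
    high_bidder: Optional[str] = None
    broadcasts: list = field(default_factory=list)  # stands in for messages to all bidders

    def place_bid(self, bidder: str, amount: int) -> bool:
        """Accept the bid if the auction is open and the amount is high enough."""
        if not self.is_open:
            return False
        floor = self.current_price
        if self.high_bidder is not None:
            floor += self.min_increment
        if amount < floor:
            return False
        self.current_price = amount
        self.high_bidder = bidder
        # Bids arrive asynchronously, so every accepted bid is announced at once.
        self.broadcasts.append({"status": "open", "price": amount, "bidder": bidder})
        return True

    def close(self) -> None:
        """Close the auction and announce the final status."""
        self.is_open = False
        self.broadcasts.append(
            {"status": "closed", "price": self.current_price, "bidder": self.high_bidder}
        )
```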
In a known descending price auction, the process traditionally entails a common clock, which corresponds to a decrementing price at each decrement interval, with an ending time (and price). Therefore, once each participant is made aware of the auction parameters, e.g., starting price, price decrement, ending price/time, before the start of the auction, the only information that must be transmitted is auction status (e.g., inventory remaining).
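Because the price in such an auction is a function only of the published parameters and the common clock, each participant can derive the contemporaneous price locally. A minimal sketch follows, in which the function price_at and its parameter names are hypothetical.

```python
def price_at(elapsed_seconds, start_price, decrement, interval_seconds, end_price):
    """Return the price in effect after elapsed_seconds on the common clock.

    The price drops by `decrement` at each `interval_seconds` and never
    falls below `end_price`.
    """
    # Number of completed decrement intervals; never negative before the start.
    steps = max(0, int(elapsed_seconds // interval_seconds))
    return max(end_price, start_price - steps * decrement)
```

For example, with a starting price of 100, a decrement of 5 every 10 seconds and an ending price of 60, the price after 25 seconds is 90. Only the auction status, such as the inventory remaining, would then need to be transmitted during the auction.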
As stated above, an auction is traditionally considered an efficient manner of liquidating goods at a market price. The theory of an auction is that either the buyer will not resell, and thus has an internal or private valuation of the goods regardless of others' perceived values, or that the winner will resell, either to gain economic efficiency or as a part of the buyer's regular business. In the latter case, it is a general presumption that the resale buyers are not in attendance at the auction or are otherwise precluded from bidding, and therefore that, after the auction, there will remain demand for the goods at a price in excess of the price paid during the auction. Extinction of this residual demand results in the so-called “winner's curse”, in which the buyer can make no profit from the transaction during the auction. Since this detracts from the value of the auction as a means of conducting profitable commerce, it is of concern to both buyer and seller. In fact, experience with initial public offerings (IPOs) of stock through various means has demonstrated that by making stock available directly to all classes of potential purchasers, latent demand for a new issue is extinguished, and the stock price is likely to decline after issuance, resulting in an IPO which is characterized as “unsuccessful”. This potential for post IPO decline tempers even initial interest in the issue, resulting in a paradoxical decline in revenues from the vehicle. In other words, the “money on the table” resulting from immediate retrading of IPO shares is deemed a required aspect of the IPO process. Thus, methods that retain latent demand after issuance of IPO shares result in post IPO increases, and therefore a “successful” IPO. Therefore, where the transaction scheme anticipates demand for resale after the initial distribution, it is often important to assure a reasonable margin for resellers and limitations on direct sale to ultimate consumers.
Research into auction theory (game theory) shows that in an auction, the goal of the seller is to optimize the auction by allocating the goods inefficiently, and thus to appropriate to himself an excess gain. This inefficiency manifests itself by either withholding goods from the market or placing the goods in the wrong hands. In order to assure for the seller a maximum gain from a misallocation of the goods, restrictions on resale are imposed; otherwise, post auction trading will tend to undo the misallocation, and the anticipation of this trading will tend to control the auction pricing. The misallocation of goods imposed by the seller through restrictions allows the seller to achieve greater revenues than if free resale were permitted. It is believed that in an auction followed by perfect resale, any mis-assignment of the goods lowers the seller's revenues below the optimum and likewise, in an auction market followed by perfect resale, it is optimal for the seller to allocate the goods to those with the highest value. Therefore, if post-auction trading is permitted, the seller will not benefit from these later gains, and the seller will obtain suboptimal revenues.
These studies, however, typically do not consider transaction costs and internal inefficiencies of the resellers, as well as the possibility of multiple classes of purchasers, or even multiple channels of distribution, which may be subject to varying controls or restrictions, and thus in a real market, such theoretical optimal allocation is unlikely. In fact, in real markets the transaction costs involved in transfer of ownership are often critical in determining a method of sale and distribution of goods. For example, it is the efficiency of sale that motivates the auction in the first place. Yet, the auction process itself may consume a substantial margin, for example 1-15% of the transaction value. To presume, even without externally imposed restrictions on resale, that all of the efficiencies of the market may be extracted by free reallocation, ignores that the motivation of the buyer is a profitable transaction, and the buyer may have fixed and variable costs on the order of magnitude of the margin. Thus, there are substantial opportunities for the seller to gain enhanced revenues by defining rules of the auction, strategically allocating inventory amount and setting reserve pricing.
Therefore, perfect resale is but a fiction created in auction (game) theory. Given this deviation from the ideal presumptions, auction theory may be interpreted to provide the seller with a motivation to misallocate or withhold based on the deviation of practice from theory, likely based on the respective transaction costs, seller's utility of the goods, and other factors not considered by the simple analyses.
A number of proposals have been made for effecting auction systems using the Internet. These systems include consumer-to-consumer, business-to-consumer, and business-to-business types. Generally, these auctions, of various types and implementations discussed further below, are conducted through Internet browsers using hypertext markup language (HTML) “web pages”, using HTTP. In some instances, such as BIDWATCH, discussed further below, an application with associated applets is provided to define a user interface instead of HTML.
As stated above, the information packets from the transaction server to client systems associated with respective bidders communicate various information regarding the status of an interactive auction during the progress thereof. The network traffic from the client systems to the transaction server is often limited to the placement of bids; however, the amount of information required to be transmitted can vary greatly, and may involve a complex dialogue of communications to complete the auction offer. Typically, Internet based auction systems have scalability issues, wherein economies of scale are not completely apparent, leading to implementation of relatively large transaction server systems to handle peak loads. When the processing power of the transaction server system is exceeded, entire system outages may occur, resulting in lost sales or diminished profits, and diminished goodwill.
In most Internet auction system implementations, there are a large number of simultaneous auctions, with each auction accepting tens or hundreds of bids over a timescale of hours to days. In systems where the transaction volume exceeds these scales, for example in stock and commodity exchanges, which can accommodate large numbers of transactions per second involving the same issue, a private network, or even a local area network, is employed, and the public Internet is not used as a direct communications system with the transaction server. Thus, while infrastructures are available to allow successful handling of massive transaction per second volumes, these systems typically avoid direct public Internet communications or use of some of its limiting technologies. The transaction processing limitations are often due to the finite time required to handle, e.g., open, update, and close, database records.
In business-to-business auctions, buyers seek to ensure that the population of ultimate consumers for the goods or services are not present at the auction, in order to avoid the “winner's curse”, where the highest bidder in the auction cannot liquidate or work the asset at a profit. Thus, business-to-business auctions are distinct from business-to-consumer auctions. In the former, the optimization by the seller must account for the desire or directive of the seller to avoid direct retail distribution, and instead to rely on a distribution tier represented in the auction. In the latter, the seller seeks maximum revenues and to exhaust the possibilities for downstream trade in the goods or services. In fact, these types of auctions may be distinguished by various implementing rules, such as requiring sales tax resale certificates, minimum lot size quantities, preregistration or qualification, support or associated services, or limitations on the title to the goods themselves. The conduct of these auctions may also differ, in that consumer involvement typically is permissive of mistake or indecision, while in a pure business environment professionalism and decisiveness are mandated.
In many instances, psychology plays an important role in the conduct of the auction. In a live auction, bidders can see each other, and judge the tempo of the auction. In addition, multiple auctions are often conducted sequentially, so that each bidder can begin to understand the other bidders' patterns, including hesitation, bluffing, facial gestures or mannerisms. Thus, bidders often prefer live auctions to remote or automated auctions if the bidding is to be conducted strategically.
Internet Auctions
On-line electronic auction systems which allow efficient sales of products and services are well known, for example, EBAY.COM, ONSALE.COM, UBID.COM, and the like. Inverse auctions that allow efficient purchases of product are also known, establishing a market price by competition between sellers. The Internet holds the promise of further improving efficiency of auctions by reducing transaction costs and freeing the “same time-same place” limitations of traditional auctions. This is especially appropriate where the goods may be adequately described by text or images, and thus a physical examination of the goods is not required prior to bidding.
In existing Internet systems, the technological focus has been on providing an auction system that, over the course of hours to days, allows a large number of simultaneous auctions, among a large number of bidders, to occur. These systems must be scalable and have high transaction throughput, while assuring database consistency and overall system reliability. Even so, certain users may selectively exploit known technological limitations and artifacts of the auction system, including non-real time updating of bidding information, especially in the final stages of an auction.
Because of existing bandwidth and technological hurdles, Internet auctions are quite different from live auctions with respect to psychological factors. Live auctions are often monitored closely by bidders, who strategically make bids, based not only on the “value” of the goods, but also on an assessment of the competition, timing, psychology, and progress of the auction. It is for this reason that so-called proxy bidding, wherein the bidder creates a preprogrammed “strategy”, usually limited to a maximum price, is disfavored. A maximum price proxy bidding system is somewhat inefficient, in that other bidders may test the proxy, seeking to increase the bid price, without actually intending to purchase, or contrarily, after testing the proxy, a bidder might give up, even below a price he might have been willing to pay. Thus, the proxy imposes inefficiency in the system that effectively increases the transaction cost.
In order to address a flurry of activity that often occurs at the end of an auction, an auction may be held open until no further bids are cleared for a period of time, even if advertised to end at a certain time. This is common to both live and automated auctions. However, this lack of determinism may upset coordinated schedules, thus impairing efficient business use of the auction system.
In order to facilitate management of bids and bidding, some of the Internet auction sites have provided a non-Hypertext Markup Language (HTML) browser based software “applet” to track auctions. For example, ONSALE.COM has made available a Marimba Castanet® applet called Bidwatch to track auction progress for particular items or classes of items, and to facilitate bidding thereon. This system, however, lacks real-time performance under many circumstances, having a stated refresh period of 10 seconds, with a long latency for confirmation of a bid, due to constraints on software execution, quality of service in communications streams, and bid confirmation dialogue. Thus, it is possible to lose a bid even if an attempt was made prior to another bidder. The need to quickly enter the bid, at risk of being too late, makes the process potentially error prone.
Proxy bidding, as discussed above, is a known technique for overcoming the constraints of Internet communications and client processing limitations, since it bypasses the client and telecommunications links and may execute solely on the host system or local thereto. However, proxy bidding undermines some of the efficiencies gained by a live market.
U.S. Pat. No. 5,890,138 to Godin, et al. (Mar. 30, 1999), expressly incorporated herein by reference in its entirety, relates to an Internet auction system. The system implements a declining price auction process, removing a user from the auction process once an indication to purchase has been received. See, Rockoff, T. E., Groves, M.; “Design of an Internet-based System for Remote Dutch Auctions”, Internet Research, v 5, n 4, pp. 10-16, MCB University Press, Jan. 1, 1995.
A known computer site for auctioning a product on-line comprises at least one web server computer designed for serving a host of computer browsers and providing the browsers with the capability to participate in various auctions, where each auction is of a single product, at a specified time, with a specified number of the product available for sale. The web server cooperates with a separate database computer, separated from the web server computer by a firewall. The database computer is accessible to the web server computer to allow selective retrieval of product information, which includes a product description, the quantity of the product to be auctioned, a start price of the product, and an image of the product. The web server computer displays, updated during an auction, the current price of the product, the quantity of the product remaining available for purchase and the measure of the time remaining in the auction. The current price is decreased in a predetermined manner during the auction. Each user is provided with an input instructing the system to purchase the product at a displayed current price, transmitting an identification and required financial authorization for the purchase of the product, which must be confirmed within a predetermined time. In the known system, a certain fall-out rate in the actual purchase confirmation may be assumed, and therefore some overselling allowed. Further, after a purchase is indicated, the user's screen is not updated, obscuring the ultimate lowest selling price from the user. However, if the user maintains a second browser, he can continue to monitor the auction to determine whether the product could have been purchased at a lower price, and if so, fail to confirm the committed purchase and purchase the same goods at a lower price while reserving the goods to avoid risk of loss. Thus, the system is flawed, and may fail to produce an efficient transaction or optimal price.
An Internet declining price auction system may provide the ability to track the price demand curve, providing valuable marketing information. For example, in trying to determine the response at different prices, companies normally have to conduct market surveys. In contrast, with a declining price auction, substantial information regarding price and demand is immediately known. The relationship between participating bidders and average purchasers can then be applied to provide a conventional price demand curve for the particular product.
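As a hedged illustration of how purchase records from a declining price auction might be turned into a price demand curve, the following sketch accumulates the quantity purchased at or above each price. The function demand_curve and the record format are assumptions. The scaling from participating bidders to average purchasers mentioned above is not modeled.

```python
def demand_curve(purchases):
    """Build a cumulative demand curve from declining price auction records.

    purchases -- iterable of (price, quantity) pairs, one per purchase (hypothetical format)
    Returns a list of (price, quantity_demanded_at_or_above_price),
    ordered from the highest price to the lowest.
    """
    quantity_by_price = {}
    for price, quantity in purchases:
        quantity_by_price[price] = quantity_by_price.get(price, 0) + quantity

    curve = []
    cumulative = 0
    # A buyer who purchased at a higher price would also have bought at any lower price.
    for price in sorted(quantity_by_price, reverse=True):
        cumulative += quantity_by_price[price]
        curve.append((price, cumulative))
    return curve
```

For example, purchases of 1 and 3 units at a price of 90, 2 units at 80 and 1 unit at 70 yield demand of 4 units at 90, 6 units at 80 and 7 units at 70.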
U.S. Pat. No. 5,835,896, Fisher, et al., issued Nov. 10, 1998, expressly incorporated herein by reference in its entirety, provides a method and system for processing and transmitting electronic auction information over the Internet, between a central transaction server system and remote bidder terminals. Those bids are recorded by the system and the bidders are updated with the current auction status information. When appropriate, the system closes the auction from further bidding and notifies the winning bidders and losers as to the auction outcome. The transaction server posts information from a database describing a lot available for purchase, receives a plurality of bids, stored in a bid database, in response to the information, and automatically categorizes the bids as successful or unsuccessful. Each bid is validated, and an electronic mail message is sent informing the bidder of the bid status. This system employs HTTP, and thus does not automatically update remote terminal screens, requiring the e-mail notification feature.
The auction rules may be flexible, for example including Dutch-type auctions, for example by implementing a price markdown feature with scheduled price adjustments, and English-type (progressive) auctions, with price increases corresponding to successively higher bids. In the Dutch type auction, the price markdown feature may be responsive to bidding activity over time, amount of bids received, and number of items bid for. Likewise, in the progressive auction, the award price may be dependent on the quantity desired, and typically implements a lowest successful bid price rule. Bids that are below a preset maximum posted selling price are maintained in reserve by the system. If a certain sales volume is not achieved in a specified period of time, the price is reduced to liquidate demand above the price point, with the new price becoming the posted price. On the other hand, if a certain sales volume is exceeded in a specified period of time, the system may automatically increase the price. These automatic price changes allow the seller to respond quickly to market conditions while keeping the price of the merchandise as high as possible, to the seller's benefit. A “Proxy Bidding” feature allows a bidder to place a bid for the maximum amount they are willing to pay, keeping this value a secret, displaying only the amount necessary to win the item up to the amount of the currently high bids or proxy bids of other bidders. This feature allows bidders to participate in the electronic auction without revealing to the other bidders the extent to which they are willing to increase their bids, while maintaining control of their maximum bid without closely monitoring the bidding. The feature assures proxy bidders the lowest possible price up to a specified maximum without requiring frequent inquiries as to the state of the bidding.
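The proxy bidding behavior described above, in which only the amount necessary to win is displayed, may be sketched for a single item as follows. This is an illustrative assumption about how such a feature could be computed, not the method of the cited patent. The function resolve_proxy_bids, the bid increment, and the tie rule (the earlier entry leads) are hypothetical.

```python
def resolve_proxy_bids(max_bids, opening_price, increment):
    """Return (leader, displayed_price) for a single-item ascending auction.

    max_bids -- dict mapping bidder -> secret maximum bid (hypothetical shape)
    The leader's displayed price is just enough to beat the runner-up,
    and never exceeds the leader's own maximum.
    """
    eligible = {b: m for b, m in max_bids.items() if m >= opening_price}
    if not eligible:
        return None, None
    # Stable sort: among equal maxima the earlier entry leads (assumption).
    ranked = sorted(eligible.items(), key=lambda kv: kv[1], reverse=True)
    leader, leader_max = ranked[0]
    if len(ranked) == 1:
        # With no competition the proxy bids only the opening price.
        return leader, opening_price
    runner_up_max = ranked[1][1]
    return leader, min(leader_max, runner_up_max + increment)
```

For example, with secret maximum bids of 150 and 120, an opening price of 100 and an increment of 5, the displayed price is 125, and the leader's maximum of 150 is not revealed.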
A “Floating Closing Time” feature may also be implemented whereby the auction for a particular item is automatically closed if no new bids are received within a predetermined time interval, assuming an increasing price auction. Bidders thus have an incentive to place bids expeditiously, rather than waiting until near the anticipated close of the auction.
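A floating closing time of this kind may be sketched as follows, assuming that times are expressed as numbers of seconds and that the quiet interval is measured from the opening of the auction or from the most recent accepted bid. The function floating_close and its parameters are hypothetical.

```python
def floating_close(open_time, bid_times, quiet_interval):
    """Return the time at which the auction closes.

    The auction closes once `quiet_interval` passes with no new bid;
    bids arriving after that moment are ignored.
    """
    last_activity = open_time
    for t in sorted(bid_times):
        if t < open_time:
            continue  # bid placed before the auction opened
        if t - last_activity >= quiet_interval:
            break  # the auction had already closed when this bid arrived
        last_activity = t
    return last_activity + quiet_interval
```

For example, with the auction opening at time 0, a quiet interval of 30 seconds and bids at 10, 25 and 100 seconds, the auction closes at 55 seconds and the bid at 100 seconds arrives too late.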
U.S. Pat. No. 5,905,975, Ausubel, issued May 18, 1999, expressly incorporated herein by reference in its entirety, relates to computer implemented methods and apparatus for auctions. The proposed system provides intelligent systems for the auctioneer and for the user. The auctioneer's system contains information from a user system based on bid information entered by the user. With this information, the auctioneer's system determines whether the auction can be concluded or not and appropriate messages are transmitted. At any point in the auction, bidders are provided the opportunity to submit not only their current bids, but also to enter future bids, or bidding rules which may have the opportunity to become relevant at future times or prices, into the auction system's database. Participants may revise their executory bids, by entering updated bids. Thus, at one extreme, a bidder who wishes to economize on his time may choose to enter his entire set of bidding rules into the computerized system at the start of the auction, effectively treating this as a sealed-bid auction. At the opposite extreme, a bidder who wishes to closely participate in the auction may choose to constantly monitor the auction's progress and to submit all of his bids in real time. See also, U.S. patent application Ser. No. 08/582,901 filed Jan. 4, 1996, which provides a method for auctioning multiple, identical objects and close substitutes.
Secure Networks
A number of references relate to secure networks, which are an aspect of various embodiments of the present invention. These references are incorporated herein by reference in their entirety, including U.S. Pat. Nos. 5,933,498 (Schneck, et al., Aug. 3, 1999); 5,978,918 (Scholnick, et al., Nov. 2, 1999); 6,005,943 (Cohen, et al., Dec. 21, 1999); 6,009,526 (Choi, Dec. 28, 1999); 6,021,202 (Anderson, et al., Feb. 1, 2000); 6,021,491 (Renaud, Feb. 1, 2000); 6,021,497 (Bouthillier, et al., Feb. 1, 2000); 6,023,762 (Dean, et al., Feb. 8, 2000); 6,029,245 (Scanlan, Feb. 22, 2000); 6,049,875 (Suzuki, et al., Apr. 11, 2000); 6,055,508 (Naor, et al., Apr. 25, 2000); 6,065,119 (Sandford, II, et al., May 16, 2000); 6,073,240 (Kurtzberg, et al., Jun. 6, 2000); 6,075,860 (Ketcham, Jun. 13, 2000); and 6,075,861 (Miller, II, Jun. 13, 2000).
Cryptographic Technology
U.S. Pat. No. 5,956,408 (Arnold, Sep. 21, 1999), expressly incorporated herein by reference, relates to an apparatus and method for secure distribution of data. Data, including program and software updates, is encrypted by a public key encryption system using the private key of the data sender. The sender also digitally signs the data. The receiver decrypts the encrypted data, using the public key of the sender, and verifies the digital signature on the transmitted data. The program interacts with basic information stored within the confines of the receiver. As a result of the interaction, the software updates are installed within the confines of the user, and the basic information stored within the confines of the user is changed.
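The sign-then-verify step described above may be illustrated with a toy sketch. It is not the scheme of the cited patent and is not secure: it uses textbook RSA without padding, the two primes are publicly known Mersenne primes chosen only so that the example is self-contained, and the names `sign` and `verify` are hypothetical. Confidentiality of the update is omitted; only the signature check is shown. Python 3.8 or later is assumed for the modular inverse via `pow`.

```python
import hashlib

# Toy key material for illustration only (publicly known Mersenne primes).
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, held by the sender


def digest_as_int(data: bytes) -> int:
    """Hash the data and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Sender side: transform the digest with the private key."""
    return pow(digest_as_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Receiver side: undo the transform with the sender's public key."""
    return pow(signature, E, N) == digest_as_int(data)
```

In this sketch a receiver would install an update only if `verify(update, signature)` returns true. A practical system would use a vetted cryptographic library with a standard padding scheme.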
U.S. Pat. Nos. 5,982,891 (Ginter, et al., Nov. 9, 1999); 5,949,876 (Ginter, et al., Sep. 7, 1999); and 5,892,900 (Ginter, et al., Apr. 6, 1999), expressly incorporated herein by reference, relate to systems and methods for secure transaction management and electronic rights protection. Electronic appliances, such as computers, help to ensure that information is accessed and used only in authorized ways, and maintain the integrity, availability, and/or confidentiality of the information. Such electronic appliances provide a distributed virtual distribution environment (VDE) that may enforce a secure chain of handling and control, for example, to control and/or meter or otherwise monitor use of electronically stored or disseminated information. Such a virtual distribution environment may be used to protect rights of various participants in electronic commerce and other electronic or electronic-facilitated transactions. Distributed and other operating systems, environments and architectures, such as, for example, those using tamper-resistant hardware-based processors, may establish security at each node. These techniques may be used to support an all-electronic information distribution, for example, utilizing the “electronic highway.”
U.S. Pat. No. 6,009,177 (Sudia, Dec. 28, 1999), expressly incorporated herein by reference, relates to a cryptographic system and method with a key escrow feature that uses a method for verifiably splitting users' private encryption keys into components and for sending those components to trusted agents chosen by the particular users, and provides a system that uses modern public key certificate management, enforced by a chip device that also self-certifies. The methods for key escrow and receiving an escrow certificate are also applied herein to a more generalized case of registering a trusted device with a trusted third party and receiving authorization from that party enabling the device to communicate with other trusted devices. Further preferred embodiments provide for rekeying and upgrading of device firmware using a certificate system, and encryption of stream-oriented data.
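The notion of splitting a key into components held by separate trusted agents may be illustrated with a simple XOR-based split. This is a sketch under stated assumptions, not the method of the cited patent: it omits the verifiability and certificate features described above, it requires all components for recovery, and the names `split_key` and `recombine` are hypothetical.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_key(key: bytes, n_agents: int) -> list:
    """Split `key` into n_agents components, one per escrow agent.

    All but the last component are random; the last is chosen so that
    the XOR of every component equals the original key.
    """
    if n_agents < 2:
        raise ValueError("at least two agents are needed")
    shares = [secrets.token_bytes(len(key)) for _ in range(n_agents - 1)]
    shares.append(reduce(xor_bytes, shares, key))
    return shares


def recombine(shares) -> bytes:
    """Recover the key by XOR-ing every component together."""
    return reduce(xor_bytes, shares)
```

Because every component but the last is uniformly random, any proper subset of the components reveals nothing about the key in this sketch.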
U.S. Pat. No. 6,052,467 (Brands, Apr. 18, 2000), expressly incorporated herein by reference, relates to a system for ensuring that the blinding of secret-key certificates is restricted, even if the issuing protocol is performed in parallel mode. A cryptographic method is disclosed that enables the issuer in a secret-key certificate issuing protocol to issue triples consisting of a secret key, a corresponding public key, and a secret-key certificate of the issuer on the public key, in such a way that receiving parties can blind the public key and the certificate, but cannot blind a predetermined non-trivial predicate of the secret key even when executions of the issuing protocol are performed in parallel.
U.S. Pat. No. 6,052,780 (Glover, Apr. 18, 2000), expressly incorporated herein by reference, relates to a computer system and process for accessing an encrypted and self-decrypting digital information product while restricting access to decrypted digital information. Some problems with digital information protection systems may be overcome by providing a mechanism that allows a content provider to encrypt digital information without requiring either a hardware or platform manufacturer or a content consumer to provide support for the specific form of corresponding decryption. This mechanism can be provided in a manner that allows the digital information to be copied easily for back-up purposes and to be transferred easily for distribution, but which should not permit copying of the digital information in decrypted form. In particular, the encrypted digital information is stored as an executable computer program that includes a decryption program that decrypts the encrypted information to provide the desired digital information, upon successful completion of an authorization procedure by the user. In combination with other mechanisms that track distribution, enforce royalty payments and control access to decryption keys, an improved method is provided for identifying and detecting sources of unauthorized copies. Suitable authorization procedures also enable the digital information to be distributed for a limited number of uses and/or users, thus enabling per-use fees to be charged for the digital information.
See also, U.S. Pat. Nos. 4,200,770 (Cryptographic apparatus and method); 4,218,582 (Public key cryptographic apparatus and method); 4,264,782 (Method and apparatus for transaction and identity verification); 4,306,111 (Simple and effective public-key cryptosystem); 4,309,569 (Method of providing digital signatures); 4,326,098 (High security system for electronic signature verification); 4,351,982 (RSA Public-key data encryption system having large random prime number generating microprocessor or the like); 4,365,110 (Multiple-destinational cryptosystem for broadcast networks); 4,386,233 (Crytographic key notarization methods and apparatus); 4,393,269 (Method and apparatus incorporating a one-way sequence for transaction and identity verification); 4,399,323 (Fast real-time public key cryptography); 4,405,829 (Cryptographic communications system and method); 4,438,824 (Apparatus and method for cryptographic identity verification); 4,453,074 (Protection system for intelligent cards); 4,458,109 (Method and apparatus providing registered mail features in an electronic communication system); 4,471,164 (Stream cipher operation using public key cryptosystem); 4,514,592 (Cryptosystem); 4,528,588 (Method and apparatus for marking the information content of an information carrying signal); 4,529,870 (Cryptographic identification, financial transaction, and credential device); 4,558,176 (Computer systems to inhibit unauthorized copying, unauthorized usage, and automated cracking of protected software); 4,567,600 (Method and apparatus for maintaining the privacy of digital messages conveyed by public transmission); 4,575,621 (Portable electronic transaction device and system therefor); 4,578,531 (Encryption system key distribution method and apparatus); 4,590,470 (User authentication system employing encryption functions); 4,595,950 (Method and apparatus for marking the information content of an information carrying signal); 4,625,076 (Signed document transmission system); 
4,633,036 (Method and apparatus for use in public-key data encryption system); 5,991,406 (System and method for data recovery); 6,026,379 (System, method and article of manufacture for managing transactions in a high availability system); 6,026,490 (Configurable cryptographic processing engine and method); 6,028,932 (Copy prevention method and apparatus for digital video system); 6,028,933 (Encrypting method and apparatus enabling multiple access for multiple services and multiple transmission modes over a broadband communication network); 6,028,936 (Method and apparatus for authenticating recorded media); 6,028,937 (Communication device which performs two-way encryption authentication in challenge response format); 6,028,939 (Data security system and method); 6,029,150 (Payment and transactions in electronic commerce system); 6,029,195 (System for customized electronic identification of desirable objects); 6,029,247 (Method and apparatus for transmitting secured data); 6,031,913 (Apparatus and method for secure communication based on channel characteristics); 6,031,914 (Method and apparatus for embedding data, including watermarks, in human perceptible images); 6,034,618 (Device authentication system which allows the authentication function to be changed); 6,035,041 (Optimal-resilience, proactive, public-key cryptographic system and method); 6,035,398 (Cryptographic key generation using biometric data); 6,035,402 (Virtual certificate authority); 6,038,315 (Method and system for normalizing biometric variations to authenticate users from a public database and that ensures individual biometric data privacy); 6,038,316 (Method and system for protection of digital information); 6,038,322 (Group key distribution); 6,038,581 (Scheme for arithmetic operations in finite field and group operations over elliptic curves realizing improved computational speed); 6,038,665 (System and method for backing up computer files over a wide area computer network); 6,038,666 (Remote 
identity verification technique using a personal identification device); 6,041,122 (Method and apparatus for hiding cryptographic keys utilizing autocorrelation timing encoding and computation); 6,041,123 (Centralized secure communications system); 6,041,357 (Common session token system and protocol); 6,041,408 (Key distribution method and system in secure broadcast communication); 6,041,410 (Personal identification fob); 6,044,131 (Secure digital x-ray image authentication method); 6,044,155 (Method and system for securely archiving core data secrets); 6,044,157 (Microprocessor suitable for reproducing AV data while protecting the AV data from illegal copy and image information processing system using the microprocessor); 6,044,205 (Communications system for transferring information between memories according to processes transferred with the information); 6,044,349 (Secure and convenient information storage and retrieval method and apparatus); 6,044,350 (Certificate meter with selectable indemnification provisions); 6,044,388 (Pseudorandom number generator); 6,044,462 (Method and apparatus for managing key revocation); 6,044,463 (Method and system for message delivery utilizing zero knowledge interactive proof protocol); 6,044,464 (Method of protecting broadcast data by fingerprinting a common decryption function); 6,044,466 (Flexible and dynamic derivation of permissions); 6,044,468 (Secure transmission using an ordinarily insecure network communication protocol such as SNMP); 6,047,051 (Implementation of charging in a telecommunications system); 6,047,066 (Communication method and device); 6,047,067 (Electronic-monetary system); 6,047,072 (Method for secure key distribution over a nonsecure communications network); 6,047,242 (Computer system for protecting software and a method for protecting software); 6,047,268 (Method and apparatus for billing for transactions conducted over the internet); 6,047,269 (Self-contained payment system with circulating digital 
vouchers); 6,047,374 (Method and apparatus for embedding authentication information within digital data); 6,047,887 (System and method for connecting money modules); 6,049,610 (Method and apparatus for digital signature authentication); 6,049,612 (File encryption method and system); 6,049,613 (Method and apparatus for encrypting, decrypting, and providing privacy for data values); 6,049,671 (Method for identifying and obtaining computer software from a network computer); 6,049,785 (Open network payment system for providing for authentication of payment orders based on a confirmation electronic mail message); 6,049,786 (Electronic bill presentment and payment system which deters cheating by employing hashes and digital signatures); 6,049,787 (Electronic business transaction system with notarization database and means for conducting a notarization procedure); 6,049,838 (Persistent distributed capabilities); 6,049,872 (Method for authenticating a channel in large-scale distributed systems); 6,049,874 (System and method for backing up computer files over a wide area computer network); 6,052,466 (Encryption of data packets using a sequence of private keys generated from a public key exchange); 6,052,467 (System for ensuring that the blinding of secret-key certificates is restricted, even if the issuing protocol is performed in parallel mode); 6,052,469 (Interoperable cryptographic key recovery system with verification by comparison); 6,055,314 (System and method for secure purchase and delivery of video content programs); 6,055,321 (System and method for hiding and extracting message data in multimedia data); 6,055,508 (Method for secure accounting and auditing on a communications network); 6,055,512 (Networked personal customized information and facility services); 6,055,636 (Method and apparatus for centralizing processing of key and certificate life cycle management); 6,055,639 (Synchronous message control system in a Kerberos domain); 6,056,199 (Method and apparatus 
for storing and reading data); 6,057,872 (Digital coupons for pay televisions); 6,058,187 (Secure telecommunications data transmission); 6,058,188 (Method and apparatus for interoperable validation of key recovery information in a cryptographic system); 6,058,189 (Method and system for performing secure electronic monetary transactions); 6,058,193 (System and method of verifying cryptographic postage evidencing using a fixed key set); 6,058,381 (Many-to-many payments system for network content materials); 6,058,383 (Computationally efficient method for trusted and dynamic digital objects dissemination); 6,061,448 (Method and system for dynamic server document encryption); 6,061,454 (System, method, and computer program for communicating a key recovery block to enable third party monitoring without modification to the intended receiver); 6,061,692 (System and method for administering a meta database as an integral component of an information server); 6,061,789 (Secure anonymous information exchange in a network); 6,061,790 (Network computer system with remote user data encipher methodology); 6,061,791 (Initial secret key establishment including facilities for verification of identity); 6,061,792 (System and method for fair exchange of time-independent information goods over a network); 6,061,794 (System and method for performing secure device communications in a peer-to-peer bus architecture); 6,061,796 (Multi-access virtual private network); 6,061,799 (Removable media for password based authentication in a distributed system); 6,064,723 (Network-based multimedia communications and directory system and method of operation); 6,064,738 (Method for encrypting and decrypting data using chaotic maps); 6,064,740 (Method and apparatus for masking modulo exponentiation calculations in an integrated circuit); 6,064,741 (Method for the computer-aided exchange of cryptographic keys between a user computer unit U and a network computer unit N); 6,064,764 (Fragile watermarks for 
detecting tampering in images); 6,064,878 (Method for separately permissioned communication); 6,065,008 (System and method for secure font subset distribution); 6,067,620 (Stand alone security device for computer networks); 6,069,647 (Conditional access and content security method); 6,069,952 (Data copyright management system); 6,069,954 (Cryptographic data integrity with serial bit processing and pseudo-random generators); 6,069,955 (System for protection of goods against counterfeiting); 6,069,969 (Apparatus and method for electronically acquiring fingerprint images); 6,069,970 (Fingerprint sensor and token reader and associated methods); 6,070,239 (System and method for executing verifiable programs with facility for using non-verifiable programs from trusted sources); 6,072,870 (System, method and article of manufacture for a gateway payment architecture utilizing a multichannel, extensible, flexible architecture); 6,072,874 (Signing method and apparatus using the same); 6,072,876 (Method and system for depositing private key used in RSA cryptosystem); 6,073,125 (Token key distribution system controlled acceptance mail payment and evidencing system); 6,073,160 (Document communications controller); 6,073,172 (Initializing and reconfiguring a secure network interface); 6,073,234 (Device for authenticating user's access rights to resources and method); 6,073,236 (Authentication method, communication method, and information processing apparatus); 6,073,237 (Tamper resistant method and apparatus); 6,073,238 (Method of securely loading commands in a smart card); 6,073,242 (Electronic authority server); 6,075,864 (Method of establishing secure, digitally signed communications using an encryption key based on a blocking set cryptosystem); 6,075,865 (Cryptographic communication process and apparatus); 6,076,078 (Anonymous certified delivery); 6,076,162 (Certification of cryptographic keys for chipcards); 6,076,163 (Secure user identification based on constrained 
polynomials); 6,076,164 (Authentication method and system using IC card); 6,076,167 (Method and system for improving security in network applications); 6,078,663 (Communication apparatus and a communication system); 6,078,665 (Electronic encryption device and method); 6,078,667 (Generating unique and unpredictable values); 6,078,909 (Method and apparatus for licensing computer programs using a DSA signature); 6,079,018 (System and method for generating unique secure values for digitally signing documents); 6,079,047 (Unwrapping system and method for multiple files of a container); 6,081,597 (Public key cryptosystem method and apparatus); 6,081,598 (Cryptographic system and method with fast decryption); 6,081,610 (System and method for verifying signatures on documents); 6,081,790 (System and method for secure presentment and payment over open networks); 6,081,893 (System for supporting secured log-in of multiple users into a plurality of computers using combined presentation of memorized password and transportable passport record); and 6,192,473 (System and method for mutual authentication and secure communications between a postage security device and a meter server), each of which is expressly incorporated herein by reference.
See also, U.S. Pat. Nos. 6,028,937 (Tatebayashi et al.), 6,026,167 (Aziz), 6,009,171 (Ciacelli et al.) (Content Scrambling System, or “CSS”), 5,991,399 (Graunke et al.), 5,948,136 (Smyers) (IEEE 1394-1995), and 5,915,018 (Aucsmith), expressly incorporated herein by reference, and Jim Wright and Jeff Robillard (Philsar Semiconductor), “Adding Security to Portable Designs”, Portable Design, March 2000, pp. 16-20.
See also, Stefik, U.S. Pat. Nos. 5,715,403 (System for controlling the distribution and use of digital works having attached usage rights where the usage rights are defined by a usage rights grammar); 5,638,443 (System for controlling the distribution and use of composite digital works); 5,634,012 (System for controlling the distribution and use of digital works having a fee reporting mechanism); and 5,629,980 (System for controlling the distribution and use of digital works), expressly incorporated herein by reference.
Watermarking
U.S. Pat. No. 5,699,427 (Chow, et al., Dec. 16, 1997), expressly incorporated herein by reference, relates to a method to deter document and intellectual property piracy through individualization, and a system for identifying the authorized receiver of any particular copy of a document. More specifically, each particular copy of a document is fingerprinted by applying a set of variations to a document, where each variation is a change in data contents, but does not change the meaning or perusal experience of the document. A database associating a set of variants to a receiver is maintained. Thus any variant or copy of that variant can be traced to an authorized receiver.
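The individualization scheme described above may be illustrated with a small sketch. It is offered under stated assumptions and is not the implementation of the cited patent: the class name `FingerprintRegistry`, the use of interchangeable wordings as the variations, and exact-match tracing are all hypothetical.

```python
from itertools import product


class FingerprintRegistry:
    """Issue individually varied copies and trace a copy to its receiver."""

    def __init__(self, template, slots):
        # `template` has one positional placeholder per slot; each slot
        # lists wordings assumed to be interchangeable in meaning.
        self.template = template
        self.slots = slots
        self._codes = product(*(range(len(s)) for s in slots))
        self._receiver_by_text = {}

    def issue(self, receiver):
        """Produce the next unused variant and record who received it."""
        code = next(self._codes)  # StopIteration once variants run out
        words = [slot[i] for slot, i in zip(self.slots, code)]
        text = self.template.format(*words)
        self._receiver_by_text[text] = receiver
        return text

    def trace(self, text):
        """Return the authorized receiver of this variant, if known."""
        return self._receiver_by_text.get(text)
```

A recovered copy can then be looked up with `trace` to identify the receiver to whom that variant was issued. Tracing here requires an exact textual match, which a practical system would likely relax.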
See also, U.S. Pat. Nos. 4,734,564 (Transaction system with off-line risk assessment); 4,812,628 (Transaction system with off-line risk assessment); 4,926,325 (Apparatus for carrying out financial transactions via a facsimile machine); 5,235,166 (Data verification method and magnetic media therefor); 5,254,843 (Securing magnetically encoded data using timing variations in encoded data); 5,341,429 (Transformation of ephemeral material); 5,428,683 (Method and apparatus for fingerprinting and authenticating magnetic media); 5,430,279 (Data verification method and magnetic media therefor); 5,521,722 (Image handling facilitating computer aided design and manufacture of documents); 5,546,462 (Method and apparatus for fingerprinting and authenticating various magnetic media); 5,606,609 (Electronic document verification system and method); 5,613,004 (Steganographic method and device); 5,616,904 (Data verification method and magnetic media therefor); 5,636,292 (Steganography methods employing embedded calibration data); 5,646,997 (Method and apparatus for embedding authentication information within digital data); 5,659,726 (Data embedding); 5,664,018 (Watermarking process resilient to collusion attacks); 5,687,236 (Steganographic method and device); 5,710,834 (Method and apparatus responsive to a code signal conveyed through a graphic image); 5,727,092 (Compression embedding); 5,734,752 (Digital watermarking using stochastic screen patterns); 5,740,244 (Method and apparatus for improved fingerprinting and authenticating various magnetic media); 5,745,569 (Method for stega-cipher protection of computer code); 5,745,604 (Identification/authentication system using robust, distributed coding); 5,748,763 (Image steganography system featuring perceptually adaptive and globally scalable signal embedding); 5,748,783 (Method and apparatus for robust information coding); 5,761,686 (Embedding encoded information in an iconic version of a text image); 5,765,152 (System and method for 
managing copyrighted electronic media); 5,768,426 (Graphics processing system employing embedded code signals); 5,778,102 (Compression embedding); 5,790,703 (Digital watermarking using conjugate halftone screens); 5,819,289 (Data embedding employing degenerate clusters of data having differences less than noise value); 5,822,432 (Method for human-assisted random key generation and application for digital watermark system); 5,822,436 (Photographic products and methods employing embedded information); 5,832,119 (Methods for controlling systems using control signals embedded in empirical data); 5,841,886 (Security system for photographic identification); 5,841,978 (Network linking method using steganographically embedded data objects); 5,848,155 (Spread spectrum watermark for embedded signalling); 5,850,481 (Steganographic system); 5,862,260 (Methods for surveying dissemination of proprietary empirical data); 5,878,137 (Method for obtaining authenticity identification devices for using services in general, and device obtained thereby); 5,889,868 (Optimization methods for the insertion, protection, and detection of digital watermarks in digitized data); 5,892,900 (Systems and methods for secure transaction management and electronic rights protection); 5,905,505 (Method and system for copy protection of on-screen display of text); 5,905,800 (Method and system for digital watermarking); 5,915,027 (Digital watermarking); 5,920,628 (Method and apparatus for fingerprinting and authenticating various magnetic media); 5,930,369 (Secure spread spectrum watermarking for multimedia data); 5,933,498 (System for controlling access and distribution of digital property); 5,943,422 (Steganographic techniques for securely delivering electronic digital rights management control information over insecure communication channels); 5,946,414 (Encoding data in color images using patterned color modulated image regions); 5,949,885 (Method for protecting content using watermarking); 5,974,548 
(Media-independent document security method and apparatus); 5,995,625 (Electronic cryptographic packing); 6,002,772 (Data management system); 6,004,276 (Open architecture cardiology information system); 6,006,328 (Computer software authentication, protection, and security system); 6,006,332 (Rights management system for digital media); 6,018,801 (Method for authenticating electronic documents on a computer network); 6,026,193 (Video steganography); 6,044,464 (Method of protecting broadcast data by fingerprinting a common decryption function); 6,047,374 (Method and apparatus for embedding authentication information within digital data); 6,049,627 (Covert digital identifying indicia for digital image); 6,061,451 (Apparatus and method for receiving and decrypting encrypted data and protecting decrypted data from illegal use); 6,064,737 (Anti-piracy system for wireless telephony); 6,064,764 (Fragile watermarks for detecting tampering in images); 6,069,914 (Watermarking of image data using MPEG/JPEG coefficients); 6,076,077 (Data management system); 6,081,793 (Method and system for secure computer moderated voting), each of which is expressly incorporated herein by reference.
Role-Based Access
U.S. Pat. No. 6,023,765 (Kuhn, Feb. 8, 2000; Implementation of role-based access control in multi-level secure systems), expressly incorporated herein by reference, relates to a system and method for implementation of role-based access control in multi-level secure systems. Role-based access control (RBAC) is implemented on a multi-level secure (MLS) system by establishing a relationship between privileges within the RBAC system and pairs of levels and compartments within the MLS system. The advantages provided by RBAC, that is, reducing the overall number of connections that must be maintained, and, for example, greatly simplifying the process required in response to a change of job status of individuals within an organization, are then realized without loss of the security provided by MLS. A trusted interface function is developed to ensure that the RBAC rules permitting individuals' access to objects are followed rigorously, and provides a proper mapping of the roles to corresponding pairs of levels and compartments. No other modifications are necessary. Access requests from subjects are mapped by the interface function to pairs of levels and compartments, after which access is controlled entirely by the rules of the MLS system.
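The mapping of roles to pairs of levels and compartments may be illustrated with a minimal sketch. The role names, the table `ROLE_LABELS`, the numeric levels, and the simplified "no read up" rule used for the MLS check are assumptions made for illustration and are not taken from the cited patent.

```python
# Hypothetical role table: each role maps to (level, compartment) pairs.
ROLE_LABELS = {
    "teller": {(1, "accounts")},
    "auditor": {(2, "accounts"), (2, "audit")},
}


def labels_for(role):
    """Interface function: translate a role into MLS label pairs."""
    return ROLE_LABELS.get(role, set())


def may_read(role, obj_level, obj_compartment):
    """Apply a simplified MLS rule to the labels mapped from the role.

    Assumption: reading is allowed when some label of the subject is in
    the object's compartment at an equal or higher level.
    """
    return any(
        level >= obj_level and compartment == obj_compartment
        for level, compartment in labels_for(role)
    )
```

In this sketch, a change in job status requires editing only the individual's role assignment, while the access decision itself is made entirely on levels and compartments.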
See also, U.S. Pat. Nos. 6,073,242 (Electronic authority server); 6,073,240 (Method and apparatus for realizing computer security); 6,064,977 (Web server with integrated scheduling and calendaring); 6,055,637 (System and method for accessing enterprise-wide resources by presenting to the resource a temporary credential); 6,044,466 (Flexible and dynamic derivation of permissions); 6,041,349 (System management/network correspondence display method and system therefore); 6,014,666 (Declarative and programmatic access control of component-based server applications using roles); 5,991,877 (Object-oriented trusted application framework); 5,978,475 (Event auditing system); 5,949,866 (Communications system for establishing a communication channel on the basis of a functional role or task); 5,925,126 (Method for security shield implementation in computer system's software); 5,911,143 (Method and system for advanced role-based access control in distributed and centralized computer systems); 5,797,128 (System and method for implementing a hierarchical policy for computer system administration); 5,761,288 (Service context sensitive features and applications); 5,751,909 (Database system with methods for controlling object interaction by establishing database contracts between objects); 5,748,890 (Method and system for authenticating and auditing access by a user to non-natively secured applications); 5,621,889 (Facility for detecting intruders and suspect callers in a computer installation and a security system including such a facility); 5,535,383 (Database system with methods for controlling object interaction by establishing database contracts between objects); 5,528,516 (Apparatus and method for event correlation and problem reporting); 5,481,613 (Computer network cryptographic key distribution system); 5,347,578 (Computer system security); 5,265,221 (Access restriction facility method and apparatus), each of which is expressly incorporated herein by reference.
Computer System Security
A number of references relate to computer system security, which is a part of various embodiments of the invention. The following references relevant to this issue are incorporated herein by reference: U.S. Pat. Nos. 5,881,225 (Worth, Mar. 9, 1999); 5,937,068 (Audebert, Aug. 10, 1999); 5,949,882 (Angelo, Sep. 7, 1999); 5,953,419 (Lohstroh, et al., Sep. 14, 1999); 5,956,400 (Chaum, et al., Sep. 21, 1999); 5,958,050 (Griffin, et al., Sep. 28, 1999); 5,978,475 (Schneier, et al., Nov. 2, 1999); 5,991,878 (McDonough, et al., Nov. 23, 1999); 6,070,239 (McManis, May 30, 2000); and 6,079,021 (Abadi, et al., Jun. 20, 2000).
Computer Security Devices
A number of references relate to computer security devices, which are a part of various embodiments of the invention. The following references relevant to this issue are incorporated herein by reference: U.S. Pat. Nos. 5,982,520 (Weiser, et al., Nov. 9, 1999); 5,991,519 (Benhammou, et al., Nov. 23, 1999); 5,999,629 (Heer, et al., Dec. 7, 1999); 6,034,618 (Tatebayashi, et al., Mar. 7, 2000); 6,041,412 (Timson, et al., Mar. 21, 2000); 6,061,451 (Muratani, et al., May 9, 2000); and 6,069,647 (Sullivan, et al., May 30, 2000).
Virtual Private Network
A number of references relate to virtual private networks, which are a part of various embodiments of the invention. The following references relevant to this issue are incorporated herein by reference: U.S. Pat. Nos. 6,079,020 (Liu, Jun. 20, 2000; Method and apparatus for managing a virtual private network); 6,081,900 (Secure intranet access); 6,081,533 (Method and apparatus for an application interface module in a subscriber terminal unit); 6,078,946 (System and method for management of connection oriented networks); 6,078,586 (ATM virtual private networks); 6,075,854 (Fully flexible routing service for an advanced intelligent network); 6,075,852 (Telecommunications system and method for processing call-independent signalling transactions); 6,073,172 (Initializing and reconfiguring a secure network interface); 6,061,796 (Multi-access virtual private network); 6,061,729 (Method and system for communicating service information in an advanced intelligent network); 6,058,303 (System and method for subscriber activity supervision); 6,055,575 (Virtual private network system and method); 6,052,788 (Firewall providing enhanced network security and user transparency); 6,047,325 (Network device for supporting construction of virtual local area networks on arbitrary local and wide area computer networks); 6,032,118 (Virtual private network service provider for asynchronous transfer mode network); 6,029,067 (Virtual private network for mobile subscribers); 6,016,318 (Virtual private network system over public mobile data network and virtual LAN); 6,009,430 (Method and system for provisioning databases in an advanced intelligent network); 6,005,859 (Proxy VAT-PSTN origination); 6,002,767 (System, method and article of manufacture for a modular gateway server architecture); 6,002,756 (Method and system for implementing intelligent telecommunication services utilizing self-sustaining, fault-tolerant object oriented architecture), each of which is expressly incorporated herein by reference.
See also, U.S. Pat. Nos. 6,081,900 (Secure intranet access); 6,081,750 (Ergonomic man-machine interface incorporating adaptive pattern recognition based control system); 6,081,199 (Locking device for systems access to which is time-restricted); 6,079,621 (Secure card for E-commerce and identification); 6,078,265 (Fingerprint identification security system); 6,076,167 (Method and system for improving security in network applications); 6,075,455 (Biometric time and attendance system with epidermal topographical updating capability); 6,072,894 (Biometric face recognition for applicant screening); 6,070,141 (System and method of assessing the quality of an identification transaction using an identification quality score); 6,068,184 (Security card and system for use thereof); 6,064,751 (Document and signature data capture system and method); 6,056,197 (Information recording method for preventing alteration, information recording apparatus, and information recording medium); 6,052,468 (Method of securing a cryptographic key); 6,045,039 (Cardless automated teller transactions); 6,044,349 (Secure and convenient information storage and retrieval method and apparatus); 6,044,155 (Method and system for securely archiving core data secrets); 6,041,410 (Personal identification fob); 6,040,783 (System and method for remote, wireless positive identity verification); 6,038,666 (Remote identity verification technique using a personal identification device); 6,038,337 (Method and apparatus for object recognition); 6,038,315 (Method and system for normalizing biometric variations to authenticate users from a public database and that ensures individual biometric data privacy); 6,037,870 (Detector system for access control, and a detector assembly for implementing such a system); 6,035,406 (Plurality-factor security system); 6,035,402 (Virtual certificate authority); 6,035,398 (Cryptographic key generation using biometric data); 6,031,910 (Method and system for the secure transmission 
and storage of protectable information); 6,026,166 (Digitally certifying a user identity and a computer system in combination); 6,018,739 (Biometric personnel identification system); 6,016,476 (Portable information and transaction processing system and method utilizing biometric authorization and digital certificate security); 6,012,049 (System for performing financial transactions using a smartcard); 6,012,039 (Tokenless biometric electronic rewards system); 6,011,858 (Memory card having a biometric template stored thereon and system for using same); 6,009,177 (Enhanced cryptographic system and method with key escrow feature); 6,006,328 (Computer software authentication, protection, and security system); 6,003,135 (Modular security device); 6,002,770 (Method for secure data transmission between remote stations); 5,999,637 (Individual identification apparatus for selectively recording a reference pattern based on a correlation with comparative patterns); 5,999,095 (Electronic security system); 5,995,630 (Biometric input with encryption); 5,991,431 (Mouse adapted to scan biometric data); 5,991,429 (Facial recognition system for security access and identification); 5,991,408 (Identification and security using biometric measurements); 5,987,155 (Biometric input device with peripheral port); 5,987,153 (Automated verification and prevention of spoofing for biometric data); 5,986,746 (Topographical object detection system); 5,984,366 (Unalterable self-verifying articles); 5,982,894 (System including separable protected components and associated methods); 5,979,773 (Dual smart card access control electronic data storage and retrieval system and methods); 5,978,494 (Method of selecting the best enroll image for personal identification); 5,974,146 (Real time bank-centric universal payment system); 5,970,143 (Remote-auditing of computer generated outcomes, authenticated billing and access control, and software metering system using cryptographic and other protocols); 
5,966,446 (Time-bracketing infrastructure implementation); 5,963,908 (Secure logon to notebook or desktop computers); 5,963,657 (Economical skin-pattern-acquisition and analysis apparatus for access control; systems controlled thereby); 5,954,583 (Secure access control system); 5,952,641 (Security device for controlling the access to a personal computer or to a computer terminal); 5,951,055 (Security document containing encoded data block); 5,949,881 (Apparatus and method for cryptographic companion imprinting); 5,949,879 (Auditable security system for the generation of cryptographically protected digital data); 5,949,046 (Apparatus for issuing integrated circuit cards); 5,943,423 (Smart token system for secure electronic transactions and identification); 5,935,071 (Ultrasonic biometric imaging and identity verification system); 5,933,515 (User identification through sequential input of fingerprints); 5,933,498 (System for controlling access and distribution of digital property); 5,930,804 (Web-based biometric authentication system and method); 5,923,763 (Method and apparatus for secure document timestamping); 5,920,477 (Human factored interface incorporating adaptive pattern recognition based controller apparatus); 5,920,384 (Optical imaging device); 5,920,058 (Holographic labeling and reading machine for authentication and security applications); 5,915,973 (System for administration of remotely-proctored, secure examinations and methods therefor); 5,913,196 (System and method for establishing identity of a speaker); 5,913,025 (Method and apparatus for proxy authentication); 5,912,974 (Apparatus and method for authentication of printed documents); 5,912,818 (System for tracking and dispensing medical items); 5,910,988 (Remote image capture with centralized processing and storage); 5,907,149 (Identification card with delimited usage); 5,901,246 (Ergonomic man-machine interface incorporating adaptive pattern recognition based control system); 5,898,154 (System and 
method for updating security information in a time-based electronic monetary system); 5,897,616 (Apparatus and methods for speaker verification/identification/classification employing non-acoustic and/or acoustic models and databases); 5,892,902 (Intelligent token protected system with network authentication); 5,892,838 (Biometric recognition using a classification neural network); 5,892,824 (Signature capture/verification systems and methods); 5,890,152 (Personal feedback browser for obtaining media files); 5,889,474 (Method and apparatus for transmitting subject status information over a wireless communications network); 5,881,226 (Computer security system); 5,878,144 (Digital certificates containing multimedia data extensions); 5,876,926 (Method, apparatus and system for verification of human medical data); 5,875,108 (Ergonomic man-machine interface incorporating adaptive pattern recognition based control system); 5,872,849 (Enhanced cryptographic system and method with key escrow feature); 5,872,848 (Method and apparatus for witnessed authentication of electronic documents); 5,872,834 (Telephone with biometric sensing device); 5,870,723 (Tokenless biometric transaction authorization method and system); 5,869,822 (Automated fingerprint identification system); 5,867,802 (Biometrically secured control system for preventing the unauthorized use of a vehicle); 5,867,795 (Portable electronic device with transceiver and visual image display); 5,867,578 (Adaptive multi-step digital signature system and method of operation thereof); 5,862,260 (Methods for surveying dissemination of proprietary empirical data); 5,862,246 (Knuckle profile identity verification system); 5,862,223 (Method and apparatus for a cryptographically-assisted commercial network system designed to facilitate and support expert-based commerce); 5,857,022 (Enhanced cryptographic system and method with key escrow feature); 5,850,451 (Enhanced cryptographic system and method with key escrow feature); 
5,850,442 (Secure world wide electronic commerce over an open network); 5,848,231 (System configuration contingent upon secure input); 5,844,244 (Portable identification carrier); 5,841,907 (Spatial integrating optical correlator for verifying the authenticity of a person, product or thing); 5,841,886 (Security system for photographic identification); 5,841,865 (Enhanced cryptographic system and method with key escrow feature); 5,841,122 (Security structure with electronic smart card access thereto with transmission of power and data between the smart card and the smart card reader performed capacitively or inductively); 5,838,812 (Tokenless biometric transaction authorization system); 5,832,464 (System and method for efficiently processing payments via check and electronic funds transfer); 5,832,119 (Methods for controlling systems using control signals embedded in empirical data); 5,828,751 (Method and apparatus for secure measurement certification); 5,825,880 (Multi-step digital signature method and system); 5,825,871 (Information storage device for storing personal identification information); 5,815,577 (Methods and apparatus for securely encrypting data in conjunction with a personal computer); 5,815,252 (Biometric identification process and system utilizing multiple parameters scans for reduction of false negatives); 5,805,719 (Tokenless identification of individuals); 5,802,199 (Use sensitive identification system); 5,799,088 (Non-deterministic public key encryption system); 5,799,086 (Enhanced cryptographic system and method with key escrow feature); 5,799,083 (Event verification system); 5,790,674 (System and method of providing system integrity and positive audit capabilities to a positive identification system); 5,790,668 (Method and apparatus for securely handling data in a database of biometrics and associated data); 5,789,733 (Smart card with contactless optical interface); 5,787,187 (Systems and methods for biometric identification using the acoustic 
properties of the ear canal); 5,784,566 (System and method for negotiating security services and algorithms for communication across a computer network); 5,784,461 (Security system for controlling access to images and image related services); 5,774,551 (Pluggable account management interface with unified login and logout and multiple user authentication services); 5,771,071 (Apparatus for coupling multiple data sources onto a printed document); 5,770,849 (Smart card device with pager and visual image display); 5,768,382 (Remote-auditing of computer generated outcomes and authenticated billing and access control system using cryptographic and other protocols); 5,767,496 (Apparatus for processing symbol-encoded credit card information); 5,764,789 (Tokenless biometric ATM access system); 5,763,862 (Dual card smart card reader); 5,761,298 (Communications headset with universally adaptable receiver and voice transmitter); 5,757,916 (Method and apparatus for authenticating the location of remote users of networked computing systems); 5,757,431 (Apparatus for coupling multiple data sources onto a printed document); 5,751,836 (Automated, non-invasive iris recognition system and method); 5,751,809 (Apparatus and method for securing captured data transmitted between two sources); 5,748,738 (System and method for electronic transmission, storage and retrieval of authenticated documents); 5,745,573 (System and method for controlling access to a user secret); 5,745,555 (System and method using personal identification numbers and associated prompts for controlling unauthorized use of a security device and unauthorized access to a resource); 5,742,685 (Method for verifying an identification card and recording verification of same); 5,742,683 (System and method for managing multiple users with different privileges in an open metering system); 5,737,420 (Method for secure data transmission between remote stations); 5,734,154 (Smart card with integrated reader and visual image 
display); 5,719,950 (Biometric, personal authentication system); 5,712,914 (Digital certificates containing multimedia data extensions); 5,712,912 (Method and apparatus for securely handling a personal identification number or cryptographic key using biometric techniques); 5,706,427 (Authentication method for networks); 5,703,562 (Method for transferring data from an unsecured computer to a secured computer); 5,696,827 (Secure cryptographic methods for electronic transfer of information); 5,682,142 (Electronic control system/network); 5,682,032 (Capacitively coupled identity verification and escort memory apparatus); 5,680,460 (Biometric controlled key generation); 5,668,878 (Secure cryptographic methods for electronic transfer of information); 5,666,400 (Intelligent recognition); 5,659,616 (Method for securely using digital signatures in a commercial cryptographic system); 5,647,364 (Ultrasonic biometric imaging and identity verification system); 5,647,017 (Method and system for the verification of handwritten signatures); 5,646,839 (Telephone-based personnel tracking system); 5,636,282 (Method for dial-in access security using a multimedia modem); 5,633,932 (Apparatus and method for preventing disclosure through user-authentication at a printing node); 5,615,277 (Tokenless security system for authorizing access to a secured computer system); 5,613,012 (Tokenless identification system for authorization of electronic transactions and electronic transmissions); 5,608,387 (Personal identification devices and access control systems); 5,594,806 (Knuckle profile identity verification system); 5,592,408 (Identification card and access control device); 5,588,059 (Computer system and method for secure remote communication sessions); 5,586,171 (Selection of a voice recognition data base responsive to video data); 5,583,950 (Method and apparatus for flash correlation); 5,583,933 (Method and apparatus for the secure communication of data); 5,578,808 (Data card that can be 
used for transactions involving separate card issuers); 5,572,596 (Automated, non-invasive iris recognition system and method); 5,561,718 (Classifying faces); 5,559,885 (Two stage read-write method for transaction cards); 5,557,765 (System and method for data recovery); 5,553,155 (Low cost method employing time slots for thwarting fraud in the periodic issuance of food stamps, unemployment benefits or other governmental human services); 5,544,255 (Method and system for the capture, storage, transport and authentication of handwritten signatures); 5,534,855 (Method and system for certificate based alias detection); 5,533,123 (Programmable distributed personal security); 5,526,428 (Access control apparatus and method); 5,523,739 (Metal detector for control of access combined in an integrated form with a transponder detector); 5,497,430 (Method and apparatus for image recognition using invariant feature signals); 5,485,519 (Enhanced security for a secure token code); 5,485,312 (Optical pattern recognition system and method for verifying the authenticity of a person, product or thing); 5,483,601 (Apparatus and method for biometric identification using silhouette and displacement images of a portion of a person's hand); 5,478,993 (Process as safety concept against unauthorized use of a payment instrument in cashless payment at payment sites); 5,475,839 (Method and structure for securing access to a computer system); 5,469,506 (Apparatus for verifying an identification card and identifying a person by means of a biometric characteristic); 5,457,747 (Anti-fraud verification system using a data card); 5,455,407 (Electronic-monetary system); 5,453,601 (Electronic-monetary system); 5,448,045 (System for protecting computers via intelligent tokens or smart cards); 5,432,864 (Identification card verification system); 5,414,755 (System and method for passive voice verification in a telephone network); 5,412,727 (Anti-fraud voter registration and voting system using a data 
card); 5,363,453 (Non-minutiae automatic fingerprint identification system and methods); 5,347,580 (Authentication method and system with a smartcard); 5,345,549 (Multimedia based security systems); 5,341,428 (Multiple cross-check document verification system); 5,335,288 (Apparatus and method for biometric identification); 5,291,560 (Biometric personal identification system based on iris analysis); 5,283,431 (Optical key security access system); 5,280,527 (Biometric token for authorizing access to a host system); 5,272,754 (Secure computer interface); 5,245,329 (Access control system with mechanical keys which store data); 5,229,764 (Continuous biometric authentication matrix); 5,228,094 (Process of identifying and authenticating data characterizing an individual); 5,224,173 (Method of reducing fraud in connection with employment, public license applications, social security, food stamps, welfare or other government benefits); 5,208,858 (Method for allocating useful data to a specific originator); 5,204,670 (Adaptable electric monitoring and identification system); 5,191,611 (Method and apparatus for protecting material on storage media and for transferring material on storage media to various recipients); 5,163,094 (Method for identifying individuals from analysis of elemental shapes derived from biosensor data); 5,155,680 (Billing system for computing software); 5,131,038 (Portable authentification system); 5,073,950 (Finger profile identification system); 5,067,162 (Method and apparatus for verifying identity using image correlation); 5,065,429 (Method and apparatus for protecting material on storage media); 5,056,147 (Recognition procedure and an apparatus for carrying out the recognition procedure); 5,056,141 (Method and apparatus for the identification of personnel); 5,036,461 (Two-way authentication system between user's smart card and issuer-specific plug-in application modules in multi-issued transaction device); 5,020,105 (Field initialized authentication 
system for protective security of electronic information networks); 4,993,068 (Unforgeable personal identification system); 4,972,476 (Counterfeit proof ID card having a scrambled facial image); 4,961,142 (Multi-issuer transaction device with individual identification verification plug-in application modules for each issuer); 4,952,928 (Adaptable electronic monitoring and identification system); 4,941,173 (Device and method to render secure the transfer of data between a videotex terminal and a server); 4,926,480 (Card-computer moderated systems); 4,896,363 (Apparatus and method for matching image characteristics such as fingerprint minutiae); 4,890,323 (Data communication systems and methods); 4,868,376 (Intelligent portable interactive personal data system); 4,827,518 (Speaker verification system using integrated circuit cards); 4,819,267 (Solid state key for controlling access to computer systems and to computer software and/or for secure communications); 4,752,676 (Reliable secure, updatable “cash” card system); 4,736,203 (3D hand profile identification apparatus); 4,731,841 (Field initialized authentication system for protective security of electronic information networks); 4,564,018 (Ultrasonic system for obtaining ocular measurements), each of which is expressly incorporated herein by reference.
Content-Based Query Servers
U.S. Pat. No. 5,987,459 (Swanson, et al., Nov. 16, 1999), expressly incorporated herein by reference, relates to an image and document management system that builds content-based retrieval support directly into the compressed files. The system minimizes a weighted sum of the expected size of the compressed files and the expected query response time. Object searching of documents stored by the system is possible on a scalable resolution basis. The system includes a novel object representation based on embedded prototypes that provides for high-quality browsing of retrieved images at low bit rates.
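The weighted-sum criterion can be illustrated with a minimal sketch. The candidate encodings, their numbers, and the weight `alpha` are hypothetical and are not taken from the Swanson patent; the sketch only shows how a single weight trades expected file size against expected query response time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Encoding:
    name: str             # hypothetical label for a candidate file layout
    expected_size: float  # expected compressed size, arbitrary units
    expected_time: float  # expected query response time, arbitrary units


def weighted_cost(enc: Encoding, alpha: float) -> float:
    """Weighted sum of expected size and expected query response time."""
    return alpha * enc.expected_size + (1.0 - alpha) * enc.expected_time


def best_encoding(candidates, alpha: float) -> Encoding:
    """Return the candidate with the smallest weighted cost."""
    return min(candidates, key=lambda enc: weighted_cost(enc, alpha))


# Illustrative candidates only; the values are assumptions.
candidates = [
    Encoding("compact", expected_size=10.0, expected_time=8.0),
    Encoding("balanced", expected_size=14.0, expected_time=4.0),
    Encoding("indexed", expected_size=20.0, expected_time=1.0),
]
```

With `alpha` near 1 the size term dominates and the "compact" candidate is selected; with `alpha` near 0 the response-time term dominates and the "indexed" candidate is selected.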
U.S. Pat. No. 6,038,560 (Wical, Mar. 14, 2000), expressly incorporated herein by reference, relates to a concept knowledge base search and retrieval system, which includes factual knowledge base queries and concept knowledge base queries. A knowledge base stores associations among terminology/categories that have a lexical, semantic or usage association. Document theme vectors identify the content of documents through themes as well as through classification of the documents in categories that reflect what the documents are primarily about. The factual knowledge base queries identify, in response to an input query, documents relevant to the input query through expansion of the query terms as well as through expansion of themes. The concept knowledge base query does not identify specific documents in response to a query, but specifies terminology that identifies the potential existence of documents in a particular area.
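The difference between the two query types can be sketched as follows. The knowledge base contents, the document themes, and the one-hop expansion rule are all hypothetical simplifications for illustration; the Wical patent's theme vectors and classification are not reproduced.

```python
# Hypothetical knowledge base: each term maps to its associated terms.
KNOWLEDGE_BASE = {
    "vpn": {"tunneling", "encryption"},
    "encryption": {"cryptography"},
}

# Hypothetical themes assigned to each stored document.
DOCUMENT_THEMES = {
    "doc1": {"tunneling", "routing"},
    "doc2": {"cryptography"},
    "doc3": {"payments"},
}


def expand(terms):
    """Add directly associated terms to the query terms (one hop only)."""
    expanded = set(terms)
    for term in terms:
        expanded |= KNOWLEDGE_BASE.get(term, set())
    return expanded


def factual_query(terms):
    """Return documents whose themes overlap the expanded query terms."""
    expanded = expand(terms)
    return sorted(doc for doc, themes in DOCUMENT_THEMES.items()
                  if themes & expanded)


def concept_query(terms):
    """Return associated terminology only, without identifying documents."""
    return sorted(expand(terms) - set(terms))
```

In this sketch, a factual query returns document identifiers, while a concept query returns only related terminology that suggests where documents may exist.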
U.S. Pat. No. 6,067,466 (Selker, et al., May 23, 2000), expressly incorporated herein by reference, relates to a diagnostic tool using a predictive instrument. A method is provided for evaluating a medical condition of a patient including the steps of monitoring one or more clinical features of a patient; based on the monitored features, computing a primary probability of a medical outcome or diagnosis; computing a plurality of conditional probabilities for a selected diagnostic test, the computed conditional probabilities including a first probability of the medical outcome or diagnosis assuming the selected diagnostic test produces a first outcome and a second probability of the medical outcome or diagnosis assuming the selected diagnostic test produces a second outcome; and displaying the computed primary probability as well as the plurality of computed conditional probabilities to a user as an aid to determining whether to administer the selected diagnostic test to the patient.
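The relationship between the primary probability and the two conditional probabilities can be sketched with Bayes' rule. This is an assumption made for illustration: the sketch characterizes the diagnostic test by a hypothetical sensitivity and specificity, whereas the Selker patent relies on its own predictive instrument, which is not reproduced here.

```python
def posterior_given_test(prior, sensitivity, specificity):
    """Return (P(outcome | positive test), P(outcome | negative test)).

    prior       -- primary probability of the medical outcome
    sensitivity -- P(positive test | outcome present), assumed known
    specificity -- P(negative test | outcome absent), assumed known
    """
    for value in (prior, sensitivity, specificity):
        if not 0.0 < value < 1.0:
            raise ValueError("probabilities must lie strictly between 0 and 1")
    # Total probability of a positive test result.
    p_positive = sensitivity * prior + (1.0 - specificity) * (1.0 - prior)
    p_negative = 1.0 - p_positive
    post_positive = sensitivity * prior / p_positive
    post_negative = (1.0 - sensitivity) * prior / p_negative
    return post_positive, post_negative
```

For example, with a primary probability of 0.2, a sensitivity of 0.9, and a specificity of 0.8, the two conditional probabilities are roughly 0.53 and 0.03. Displaying all three values shows how far the test could move the estimate in either direction.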
E-Commerce Systems
U.S. Pat. No. 5,946,669 (Polk, Aug. 31, 1999), expressly incorporated herein by reference, relates to a method and apparatus for payment processing using debit-based electronic funds transfer and disbursement processing using addendum-based electronic data interchange. This disclosure describes a payment and disbursement system, wherein an initiator authorizes a payment and disbursement to a collector and the collector processes the payment and disbursement through an accumulator agency. The accumulator agency processes the payment as a debit-based transaction and processes the disbursement as an addendum-based transaction. The processing of a debit-based transaction generally occurs by electronic funds transfer (EFT) or by financial electronic data interchange (FEDI). The processing of an addendum-based transaction generally occurs by electronic data interchange (EDI).
U.S. Pat. No. 6,005,939 (Fortenberry, et al., Dec. 21, 1999), expressly incorporated herein by reference, relates to a method and apparatus for storing an Internet user's identity and access rights to World Wide Web resources. A method and apparatus for obtaining user information to conduct secure transactions on the Internet without having to re-enter the information multiple times is described. The method and apparatus can also provide a technique by which secured access to the data can be achieved over the Internet. A passport containing user-defined information at various security levels is stored in a secure server apparatus, or passport agent, connected to a computer network. A user process instructs the passport agent to release all or portions of the passport to a recipient node and forwards a key to the recipient node to unlock the passport information.
U.S. Pat. No. 6,016,484 (Williams, et al., Jan. 18, 2000), expressly incorporated herein by reference, relates to a system, method and apparatus for network electronic payment instrument and certification of payment and credit collection utilizing a payment. The electronic monetary system emulates a wallet or a purse of the kind customarily used for keeping money, credit cards and other forms of payment organized. Access to the instruments in the wallet or purse is restricted by a password to avoid unauthorized payments. A certificate form must be completed in order to obtain an instrument. The certificate form obtains the information necessary for creating a certificate granting authority to utilize an instrument, a payment holder and a complete electronic wallet. Electronic approval results in the generation of an electronic transaction to complete the order. If a user selects a particular certificate, a particular payment instrument holder will be generated based on the selected certificate. In addition, the issuing agent for the certificate defines a default bitmap for the instrument associated with a particular certificate, and the default bitmap will be displayed when the certificate definition is completed. Finally, the number associated with a particular certificate will be utilized to determine if a particular party can issue a certificate.
U.S. Pat. No. 6,029,150 (Kravitz, Feb. 22, 2000), expressly incorporated herein by reference, relates to a system and method of payment in an electronic payment system wherein a plurality of customers have accounts with an agent. A customer obtains an authenticated quote from a specific merchant, the quote including a specification of goods and a payment amount for those goods. The customer sends to the agent a single communication including a request for payment of the payment amount to the specific merchant and a unique identification of the customer. The agent issues to the customer an authenticated payment advice based only on the single communication, a secret shared between the customer and the agent, and status information that the agent knows about the merchant and/or the customer. The customer forwards a portion of the payment advice to the specific merchant. The specific merchant provides the goods to the customer in response to receiving the portion of the payment advice.
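The role of the shared secret in the exchange between customer and agent can be sketched with a keyed message authentication code. HMAC-SHA256, the message layout, and all function and field names are assumptions chosen for illustration; the Kravitz patent's actual authentication mechanism, and the portion of the advice forwarded to the merchant, are not reproduced.

```python
import hashlib
import hmac


def _tag(shared_secret: bytes, message: bytes) -> str:
    """Keyed tag over a message; HMAC-SHA256 is an illustrative choice."""
    return hmac.new(shared_secret, message, hashlib.sha256).hexdigest()


def make_request(customer_id, merchant_id, amount_cents, shared_secret):
    """Customer side: build the single communication and tag it."""
    message = f"{customer_id}|{merchant_id}|{amount_cents}".encode()
    return {"message": message, "tag": _tag(shared_secret, message)}


def issue_advice(request, shared_secret, merchant_in_good_standing):
    """Agent side: verify the request, check status, then issue an advice.

    Returns None when the tag does not match or the status check fails.
    """
    expected = _tag(shared_secret, request["message"])
    if not hmac.compare_digest(expected, request["tag"]):
        return None  # request was not produced with the shared secret
    if not merchant_in_good_standing:
        return None  # status information known to the agent blocks payment
    advice = b"advice|" + request["message"]
    return {"advice": advice, "tag": _tag(shared_secret, advice)}


def customer_accepts(advice, shared_secret):
    """Customer side: confirm that the advice really came from the agent."""
    expected = _tag(shared_secret, advice["advice"])
    return hmac.compare_digest(expected, advice["tag"])
```

The agent needs only the single request, the shared secret, and its own status information to decide, which mirrors the flow described above.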
U.S. Pat. No. 6,047,269 (Biffar, Apr. 4, 2000), expressly incorporated herein by reference, relates to a self-contained payment system for creating and facilitating the transfer of circulating digital vouchers representing value. A digital voucher has an identifying element and a dynamic log. The identifying element includes information such as the transferable value, a serial number and a digital signature. The dynamic log records the movement of the voucher through the system and accordingly grows over time. This allows the system operator not only to reconcile the vouchers before redeeming them, but also to recreate the history of movement of a voucher should an irregularity like a duplicate voucher be detected. These vouchers are used within a self-contained system including a large number of remote devices that are linked to a central system. The central system can be linked to an external system. The external system, as well as the remote devices, is connected to the central system by any one or a combination of networks. The networks, for example the Internet, cellular networks, telecommunication networks, cable networks or proprietary networks, must be able to transport digital information. Vouchers can also be transferred from one remote device to another remote device. These remote devices can communicate with each other through a number of methods. For example, the Internet is one choice for a non-face-to-face transaction, while tone signals or light signals are likely methods for face-to-face or close-proximity transactions. In addition, at the time of a transaction a digital receipt can be created which will facilitate a fast replacement of vouchers stored in a lost remote device.
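The way a dynamic log supports duplicate detection can be sketched as follows. The class and field names are hypothetical, and the digital signature in the identifying element is omitted for brevity; this is an illustration of the idea, not the Biffar patent's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Voucher:
    serial: str       # identifying element: serial number
    value_cents: int  # identifying element: transferable value
    log: list = field(default_factory=list)  # dynamic log, grows per transfer

    def transfer(self, sender, receiver):
        """Record one movement of the voucher between holders."""
        self.log.append((sender, receiver))


class CentralSystem:
    def __init__(self):
        self.redeemed = {}  # serial -> log of the first redeemed copy

    def redeem(self, voucher):
        """Redeem a voucher; on a duplicate serial, return both histories."""
        if voucher.serial in self.redeemed:
            return {"ok": False,
                    "first": self.redeemed[voucher.serial],
                    "duplicate": list(voucher.log)}
        self.redeemed[voucher.serial] = list(voucher.log)
        return {"ok": True}
```

When a second voucher with the same serial number is presented, comparing the two logs shows where the histories diverge, which is the point at which the copy was made.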
Micropayments
U.S. Pat. No. 5,999,919 (Jarecki, et al., Dec. 7, 1999), expressly incorporated herein by reference, relates to an efficient micropayment system. Existing software proposals for electronic payments can be divided into “on-line” schemes, which require participation of a trusted party (the bank) in every transaction and are secure against overspending, and “off-line” schemes, which do not require a third party and guarantee only that overspending is detected when vendors submit their transaction records to the bank (usually at the end of the day). A new “hybrid” scheme is proposed which combines the advantages of both “on-line” and “off-line” electronic payment schemes. It allows for control of overspending at a cost of only a modest increase in communication compared to the off-line schemes. The protocol is based on probabilistic polling. During each transaction, with some small probability, the vendor forwards information about this transaction to the bank. This enables the bank to maintain an accurate approximation of a customer's spending. The frequency of polling messages is related to the monetary value of transactions and the amount of overspending the bank is willing to risk. For transactions of high monetary value, the cost of polling approaches that of the on-line schemes, but for micropayments, the cost of polling is a small increase over the traffic incurred by the off-line schemes.
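Probabilistic polling can be sketched as follows. The rule that sets the polling probability to the transaction amount divided by a `risk_limit`, the scaling of reported amounts, and all names are assumptions made for illustration; they are not the specific protocol of the Jarecki patent.

```python
import random


def poll_probability(amount, risk_limit):
    """Probability that the vendor reports a transaction of this amount."""
    return min(1.0, amount / risk_limit)


class Bank:
    def __init__(self):
        self.estimated_spending = {}  # customer -> estimated total spending

    def report(self, customer, amount, probability):
        # Scale each reported amount by 1/probability so that the expected
        # contribution of every transaction equals its true amount.
        previous = self.estimated_spending.get(customer, 0.0)
        self.estimated_spending[customer] = previous + amount / probability


def vendor_transaction(bank, customer, amount, risk_limit, rng):
    """Vendor side: complete the sale, polling the bank only sometimes."""
    p = poll_probability(amount, risk_limit)
    if rng.random() < p:
        bank.report(customer, amount, p)
        return True   # this transaction was reported to the bank
    return False      # no message sent, as in an off-line scheme
```

Under these assumptions, a transaction at or above `risk_limit` is always reported, as in an on-line scheme, while a micropayment generates a message only occasionally. Passing a seeded `random.Random` instance as `rng` makes a simulation repeatable.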
Micropayments are often preferred where the amount of the transaction does not justify the costs of complete financial security. In the micropayment scheme, typically a direct communication between creditor and debtor is not required; rather, the transaction produces a result which eventually results in an economic transfer, but which may remain outstanding subsequent to transfer of the underlying goods or services. The theory underlying this micropayment scheme is that the monetary units are small enough that the risk of failure in transaction closure is relatively insignificant for both parties, but that a user gets few chances to default before credit is withdrawn. On the other hand, the transaction costs of non-real-time transactions of small monetary units are substantially less than those of secure, unlimited or potentially high value, real-time verified transactions, allowing and facilitating such types of commerce. Thus, the rights management system may employ applets local to the client system, which communicate with other applets and/or the server and/or a vendor/rights-holder to validate a transaction, at low transactional costs.
The following U.S. patents, expressly incorporated herein by reference, define aspects of micropayment, digital certificate, and on-line payment systems: U.S. Pat. Nos. 5,930,777 (Barber, Jul. 27, 1999, Method of charging for pay-per-access information over a network); 5,857,023 (Jan. 5, 1999, Demers et al., Space efficient method of redeeming electronic payments); 5,815,657 (Sep. 29, 1998, Williams, System, method and article of manufacture for network electronic authorization utilizing an authorization instrument); 5,793,868 (Aug. 11, 1998, Micali, Certificate revocation system); 5,717,757 (Feb. 10, 1998, Micali, Certificate issue lists); 5,666,416 (Sep. 9, 1997, Micali, Certificate revocation system); 5,677,955 (Doggett et al., Electronic funds transfer instruments); 5,839,119 (Nov. 17, 1998, Krsul et al., Method of electronic payments that prevents double-spending); 5,915,093 (Berlin et al.); 5,937,394 (Wong, et al.); 5,933,498 (Schneck et al.); 5,903,880 (Biffar); 5,903,651 (Kocher); 5,884,277 (Khosla); 5,960,083 (Sep. 28, 1999, Micali, Certificate revocation system); 5,963,924 (Oct. 5, 1999, Williams et al., System, method and article of manufacture for the use of payment instrument holders and payment instruments in network electronic commerce); 5,996,076 (Rowney et al., System, method and article of manufacture for secure digital certification of electronic commerce); 6,016,484 (Jan. 18, 2000, Williams et al., System, method and article of manufacture for network electronic payment instrument and certification of payment and credit collection utilizing a payment); 6,018,724 (Arent); 6,021,202 (Anderson et al., Method and system for processing electronic documents); 6,035,402 (Vaeth et al.); 6,049,786 (Smorodinsky); 6,049,787 (Takahashi, et al.); 6,058,381 (Nelson, Many-to-many payments system for network content materials); 6,061,448 (Smith, et al.); 5,987,132 (Nov. 16, 1999, Rowney, System, method and article of manufacture for conditionally accepting a payment method utilizing an extensible, flexible architecture); 6,057,872 (Candelore); and 6,061,665 (May 9, 2000, Bahreman, System, method and article of manufacture for dynamic negotiation of a network payment framework). See also, Rivest and Shamir, “PayWord and MicroMint: Two Simple Micropayment Schemes” (May 7, 1996); Micro Payment Transfer Protocol (MPTP) Version 0.1 (22 Nov. 95) et seq., www.w3.org/pub/WWW/TR/WD-mptp; Common Markup for Web Micropayment Systems, www.w3.org/TR/WD-Micropayment-Markup (9 Jun. 99); “Distributing Intellectual Property: a Model of Microtransaction Based Upon Metadata and Digital Signatures”, Olivia, Maurizio, olivia.modlang.denison.edu/~olivia/RFC/09/, all of which are expressly incorporated herein by reference.
See, also: 4,977,595 (Dec. 11, 1990, Method and apparatus for implementing electronic cash); 5,224,162 (Jun. 29, 1993, Electronic cash system); 5,237,159 (Aug. 17, 1993, Electronic check presentment system); 5,392,353 (2/1995, Morales, TV Answer, Inc. Interactive satellite broadcast network); 5,511,121 (Apr. 23, 1996, Efficient electronic money); 5,621,201 (4/1997, Langhans et al., Visa International Automated purchasing control system); 5,623,547 (Apr. 22, 1997, Value transfer system); 5,679,940 (10/1997, Templeton et al., TeleCheck International, Inc. Transaction system with on/off line risk assessment); 5,696,908 (12/1997, Muehlberger et al., Southeast Phonecard, Inc. Telephone debit card dispenser and method); 5,754,939 (5/1998, Herz et al., System for generation of user profiles for a system for customized electronic identification of desirable objects); 5,768,385 (Jun. 16, 1998, Untraceable electronic cash); 5,799,087 (Aug. 25, 1998, Electronic-monetary system); 5,812,668 (Sep. 22, 1998, System, method and article of manufacture for verifying the operation of a remote transaction clearance system utilizing a multichannel, extensible, flexible architecture); 5,828,840 (Oct. 27, 1998, Server for starting client application on client if client is network terminal and initiating client application on server if client is non network terminal); 5,832,089 (Nov. 3, 1998, Off-line compatible electronic cash method and system); 5,850,446 (Dec. 15, 1998, System, method and article of manufacture for virtual point of sale processing utilizing an extensible, flexible architecture); 5,889,862 (Mar. 30, 1999, Method and apparatus for implementing traceable electronic cash); 5,889,863 (Mar. 30, 1999, System, method and article of manufacture for remote virtual point of sale processing utilizing a multichannel, extensible, flexible architecture); 5,898,154 (Apr. 
27, 1999, System and method for updating security information in a time-based electronic monetary system); 5,901,229 (May 4, 1999, Electronic cash implementing method using a trustee); 5,920,629 (Jul. 6, 1999, Electronic-monetary system); 5,926,548 (Jul. 20, 1999, Method and apparatus for implementing hierarchical electronic cash); 5,943,424 (Aug. 24, 1999, System, method and article of manufacture for processing a plurality of transactions from a single initiation point on a multichannel, extensible, flexible architecture); 5,949,045 (Sep. 7, 1999, Micro-dynamic simulation of electronic cash transactions); 5,952,638 (Sep. 14, 1999, Space efficient method of electronic payments); 5,963,648 (Oct. 5, 1999, Electronic-monetary system); 5,978,840 (System, method and article of manufacture for a payment gateway system architecture for processing encrypted payment transactions utilizing a multichannel, extensible, flexible architecture); 5,983,208 (Nov. 9, 1999, System, method and article of manufacture for handling transaction results in a gateway payment architecture utilizing a multichannel, extensible, flexible architecture); 5,987,140 (Nov. 16, 1999, System, method and article of manufacture for secure network electronic payment and credit collection); 6,002,767 (Dec. 14, 1999, System, method and article of manufacture for a modular gateway server architecture); 6,003,765 (Dec. 21, 1999, Electronic cash implementing method with a surveillance institution, and user apparatus and surveillance institution apparatus for implementing the same); 6,021,399 (Feb. 1, 2000, Space efficient method of verifying electronic payments); 6,026,379 (Feb. 15, 2000, System, method and article of manufacture for managing transactions in a high availability system); 6,029,150 (Feb. 22, 2000, Payment and transactions in electronic commerce system); 6,029,151 (Feb. 22, 2000, Method and system for performing electronic money transactions); 6,047,067 (Apr. 
4, 2000, Electronic-monetary system); 6,047,887 (Apr. 11, 2000, System and method for connecting money modules); 6,055,508 (Apr. 25, 2000, Method for secure accounting and auditing on a communications network); 6,065,675 (May 23, 2000, Processing system and method for a heterogeneous electronic cash environment); 6,072,870 (Jun. 6, 2000, System, method and article of manufacture for a gateway payment architecture utilizing a multichannel, extensible, flexible architecture), each of which is expressly incorporated herein by reference.
Neural Networks
The resources relating to Neural Networks, listed in the Neural Networks References Appendix, each of which is expressly incorporated herein by reference, provide a sound basis for understanding the field of neural networks (and the subset called artificial neural networks, as distinguished from biological systems) and how these might be used to solve problems. A review of these references will provide a state of knowledge appropriate for an understanding of aspects of the invention which rely on Neural Networks, and avoids a prolix discussion of no benefit to those already possessing an appropriate state of knowledge.
Wavelets
The resources listed in the Wavelets References Appendix, relating to wavelets and wavelet-based analysis, each of which is expressly incorporated herein by reference, provide a sound basis for understanding the mathematical basis for wavelet theory and analysis using wavelet transforms and decomposition, and how these might be used to solve problems or extract useful information from a signal. A review of these references will assure a background in this field for an understanding of aspects of the invention which rely on wavelet theory.
Telematics
The resources relating to telematics listed in the Telematics Appendix, each of which is expressly incorporated herein by reference, provide a background in the theory and practice of telematics, as well as some of the underlying technologies. A review of these references is therefore useful in understanding practical issues and the context of functions and technologies which may be used in conjunction with the advances set forth herein.
Game Theory
The resources listed in the Game Theory References Appendix, relating to Game Theory, each of which is expressly incorporated herein by reference, provide a basis for understanding Game Theory and its implications for the design, control, and analysis of systems and networks. A review of these references will assure a background in this field for an understanding of aspects of the invention which relate to Game Theory.
Use of Game Theory to Control Ad Hoc Networks
The resources relating to ad hoc networks and game theory listed in the Game Theory and Ad Hoc Networks References Appendix, each of which is expressly incorporated herein by reference, provide a sound basis for understanding the implications of game theory for the design, control and analysis of communications networks, and in particular, ad hoc networks. A review of these references will assure a background in this field for an understanding of aspects of the invention which rely on these topics.
The following patents are expressly incorporated herein by reference: U.S. Pat. Nos. 6,640,145, 6,418,424, 6,400,996, 6,081,750, 5,920,477, 5,903,454, 5,901,246, 5,875,108, 5,867,386, 5,774,357, 6,429,812, and 6,252,544.
This patent builds upon and extends aspects of U.S. Pat. No. 6,252,544 (Hoffberg), Jun. 26, 2001, and 6,429,812, Aug. 6, 2002, which are expressly incorporated herein by reference in their entirety. See, also, U.S. Pat. No. 6,397,141 (Binnig, May 28, 2002, Method and device for signalling local traffic delays), expressly incorporated herein by reference, which relates to a method and an apparatus for signalling local traffic disturbances wherein a decentralised communication between vehicles is performed by exchanging their respective vehicle data. Through repeated evaluation of these individual vehicle data, each reference vehicle may determine a group of vehicles having relevance for itself from within a maximum group of vehicles and compare the group behavior of the relevant group with its own behavior. The results of this comparison are indicated in the reference vehicle, whereby a homogeneous flow of traffic may be generated, and the occurrence of accidents is reduced. See, also U.S. Pat. Nos. 4,706,086 (November, 1987 Panizza 340/902), and 5,428,544 (June, 1995 Shyu 701/117), 6,473,688 (Kohno, et al., Oct. 29, 2002, Traffic information transmitting system, traffic information collecting and distributing system and traffic information collecting and distributing method), 6,304,758 (October, 2001, Iierbig et al., 701/117); 6,411,221 (January, 2002, Horber, 701/117); 6,384,739 (May, 2002, Robert, Jr., 701/117); 6,401,027 (June, 2002, Xa et al., 701/117); 6,411,889 (June, 2002, Mizunuma et al., 701/117), 6,359,571 (Endo, et al., Mar. 19, 2002, Broadcasting type information providing system and travel environment information collecting device); 6,338,011 (Furst, et al., Jan. 
8, 2002, Method and apparatus for sharing vehicle telemetry data among a plurality of users over a communications network); 5,131,020 (July, 1992, Liebesny et al., 455/422); 5,164,904 (November, 1992, Sumner, 701/117); 5,539,645 (July, 1996, Mandhyan et al., 701/119); 5,594,779 (January, 1997, Goodman, 455/4); 5,689,252 (November, 1997, Ayanoglu et al., 340/991); 5,699,056 (December, 1997, Yoshida, 340/905); 5,864,305 (January, 1999, Rosenquist, 340/905); 5,889,473 (March, 1999, Wicks, 340/825); 5,919,246 (July, 1999, Waizmann et al., 701/209); 5,982,298 (November, 1999, Lappenbusch et al., 340/905); 4,860,216 (August, 1989, Linsenmayer, 342/159); 5,302,955 (April, 1994, Schutte et al., 342/59); 5,809,437 (September, 1998, Breed, 701/29); 6,115,654 (September, 2000, Eid et al., 701/34); 6,173,159 (January, 2001, Wright et al., 455/66); and Japanese Patent Document Nos. JP 9-236650 (September, 1997); 10-84430 (March, 1998); 5-151496 (June, 1993); and 11-183184 (July, 1999), each of which is expressly incorporated herein by reference. See also: Martin E. Liggins, II, et al., “Distributed Fusion Architectures and Algorithms for Target Tracking”, Proceedings of the IEEE, vol. 85, No. 1, (XP-002166088) January, 1997, pp. 95-106; D. M. Hosmer, “Data-Linked Associate Systems”, 1994 IEEE International Conference on Systems, Man, and Cybernetics. Humans, Information and Technology (Cat. No. 94CH3571-5), Proceedings of IEEE International Conference on Systems, Man and Cybernetics, San Antonio, Tex., vol. 3, (XP-002166089) (1994), pp. 2075-2079.
One aspect of the invention provides a communications system, method and infrastructure. According to one preferred embodiment, an ad hoc, self organizing, cellular radio system (sometimes known as a “mesh network”) is provided. Advantageously, high gain antennas are employed, preferably electronically steerable antennas, to provide efficient communications and to increase communications bandwidth, both between nodes and for the system comprising a plurality of nodes communicating with each other. See, U.S. Pat. No. 6,507,739 (Gross, et al., Jan. 14, 2003), expressly incorporated herein by reference.
In general, time-critical, e.g., voice communications require tight routing to control communications latency. On the other hand, non-time critical communications generally are afforded more leeway in terms of communications pathways, including a number of “hops”, retransmission latency, and out-of-order packet communication tolerance, between the source and destination or fixed infrastructure, and quality of communication pathway. Further, it is possible to establish redundant pathways, especially where communications bandwidth is available, multiple paths possible, and no single available path meets the entire communications requirements or preferences.
Technologies for determining a position of a mobile device are also well known. Most popular are radio triangulation techniques, including artificial satellite and terrestrial transmitters or receivers, dead reckoning and inertial techniques. Advantageously, a satellite-based or augmented satellite system is employed, although other suitable geolocation systems are applicable.
Navigation systems are also well known. These systems generally combine a position sensing technology with a geographic information system (GIS), e.g., a mapping database, to assist navigation functions. Systems which integrate GPS, GLONASS, LORAN or other positioning systems into vehicular guidance systems are well known, and indeed navigational purposes were prime motivators for the creation of these systems.
Environmental sensors are well known. For example, sensing technologies for temperature, weather, object proximity, location and identification, vehicular traffic and the like are well developed. In particular, known systems for analyzing vehicular traffic patterns include both stationary and mobile sensors, and networks thereof. Most often, such networks provide a stationary or centralized system for analyzing traffic information, which is then broadcast to vehicles.
Encryption technologies are well known and highly developed. These are generally classified as being symmetric key, for example the Data Encryption Standard (DES), and the more recent Advanced Encryption Standard (AES), in which the same key is used for encryption and decryption, and asymmetric key cryptography, in which different and complementary keys are used to encrypt and decrypt, in which the former and the latter are not derivable from each other (or one from the other) and therefore can be used for authentication and digital signatures. The use of asymmetric keys allows a so-called public key infrastructure, in which one of the keys is published, to allow communications to be directed to a possessor of a complementary key, and/or the identity of the sender of a message to be verified. Typical asymmetric encryption systems include the Rivest-Shamir-Adleman algorithm (RSA), the Diffie-Hellman algorithm (DH), elliptic curve encryption algorithms, and the so-called Pretty Good Privacy (PGP) algorithm.
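By way of non-limiting illustration only, the complementary-key relationship described above may be sketched with a toy RSA key pair in Python. The small primes, the message value, and the function names are assumptions chosen for readability and do not appear in the cited art; keys of this size provide no security, and a practical system would rely on a vetted cryptographic library.

```python
# Toy RSA sketch: tiny, insecure parameters chosen only for illustration.
p, q = 61, 53                 # hypothetical small primes
n = p * q                     # public modulus
phi = (p - 1) * (q - 1)
e = 17                        # public exponent, coprime with phi
d = pow(e, -1, phi)           # private exponent (modular inverse; Python 3.8+)


def encrypt(message: int) -> int:
    """Encrypt with the published key (e, n)."""
    return pow(message, e, n)


def decrypt(ciphertext: int) -> int:
    """Decrypt with the complementary private key (d, n)."""
    return pow(ciphertext, d, n)


def sign(message: int) -> int:
    """Sign with the private key so that any holder of (e, n) can verify."""
    return pow(message, d, n)


def verify(message: int, signature: int) -> bool:
    """Check a signature using only the published key."""
    return pow(signature, e, n) == message
```

In this sketch, a message encrypted with the published key is recovered only with the private exponent, while a value produced with the private exponent is checked with the published key, which corresponds to the authentication and digital signature uses noted above.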
One embodiment of the invention provides a system that analyzes both a risk and an associated reliability. Another embodiment of the invention communicates the risk and associated reliability in a manner for efficient human comprehension, especially in a distracting environment. See, U.S. Pat. Nos. 6,201,493; 5,977,884; 6,118,403; 5,982,325; 5,485,161; WO0077539, each of which is expressly incorporated herein by reference, and the Uniden GPSRD (see Uniden GPSRD User's Manual, expressly incorporated herein by reference). See, also U.S. Pat. Nos. 5,650,770; 5,450,329; 5,504,482; 5,504,491; 5,539,645; 5,929,753; 5,983,161; 6,084,510; 6,255,942; 6,225,901; 5,959,529; 5,752,976; 5,748,103; 5,720,770; 6,005,517; 5,805,055; 6,147,598; 5,687,215; 5,838,237; 6,044,257; 6,144,336; 6,285,867; 6,340,928; 6,356,822; 6,353,679 each of which is expressly incorporated herein by reference.
Statistical Analysis
It is understood that the below analysis and analytical tools, as well as those known in the art, may be used individually, in sub-combination, or in appropriate combination, to achieve the goals of the invention. These techniques may be implemented in dedicated or reprogrammable/general purpose hardware, and may be employed for low level processing of signals, such as in digital signal processors, within an operating system or dynamic linked libraries, or within application software. Likewise, these techniques may be applicable, for example, to low level data processing, system-level data processing, or user interface data processing.
A risk and reliability communication system may be useful, for example, to allow a user to evaluate a set of events in statistical context. Most indicators present data by means of a logical indicator or magnitude, as a single value. Scientific displays may provide a two-dimensional display of a distribution, but these typically require significant user focus to comprehend, especially where a multimodal distribution is represented. User displays of a magnitude or binary value typically do not provide any information about a likelihood of error. Thus, while a recent positive warning of the existence of an event may be a reliable indicator of the actual existence of the event, the failure to warn of an event does not necessarily mean that the event does not exist. Further, as events age, their reliability often decreases.
A Bayesian network is a representation of the probabilistic relationships among distinctions about the world. Each distinction, sometimes called a variable, can take on one of a mutually exclusive and exhaustive set of possible states. Associated with each variable in a Bayesian network is a set of probability distributions. Using conditional probability notation, the set of probability distributions for a variable can be denoted by p(xi|πi, ξ), where “p” refers to the probability distribution, “πi” denotes the parents of variable Xi, and “ξ” denotes the knowledge of the expert. The Greek letter “ξ” indicates that the Bayesian network reflects the knowledge of an expert in a given field. Thus, this expression reads as follows: the probability distribution for variable Xi given the parents of Xi and the knowledge of the expert. For example, X1 is the parent of X2. The probability distributions specify the strength of the relationships between variables. For instance, if X1 has two states (true and false), then associated with X1 is a single probability distribution p(x1|ξ), and associated with X2 are two probability distributions p(x2|x1=t, ξ) and p(x2|x1=f, ξ).
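As a minimal sketch of the two-variable example above, the following Python fragment stores one distribution for the parent variable and one distribution of the child for each parent state, and derives the joint, marginal, and posterior probabilities from them. The numeric values and function names are hypothetical and stand in for whatever probabilities an expert would supply.

```python
# Hypothetical two-node network X1 -> X2; all numbers are illustrative.
p_x1 = {True: 0.3, False: 0.7}            # distribution for the parent X1
p_x2_given_x1 = {                         # one distribution of X2 per state of X1
    True: {True: 0.9, False: 0.1},
    False: {True: 0.2, False: 0.8},
}


def joint(x1: bool, x2: bool) -> float:
    """p(x1, x2) as the product of the parent and conditional distributions."""
    return p_x1[x1] * p_x2_given_x1[x1][x2]


def marginal_x2(x2: bool) -> float:
    """p(x2), obtained by summing the joint over the states of X1."""
    return sum(joint(x1, x2) for x1 in (True, False))


def posterior_x1(x1: bool, x2: bool) -> float:
    """p(x1 | x2) by Bayes' rule."""
    return joint(x1, x2) / marginal_x2(x2)
```

With these assumed values, observing X2 as true raises the probability that X1 is true above its prior of 0.3, which is the kind of inference the network is intended to support.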
A Bayesian network is expressed as a directed acyclic graph where the variables correspond to nodes and the relationships between the nodes correspond to arcs. The arcs in a Bayesian network convey dependence between nodes. When there is an arc between two nodes, the probability distribution of the first node depends upon the value of the second node when the direction of the arc points from the second node to the first node. Missing arcs in a Bayesian network convey conditional independencies. However, two variables indirectly connected through intermediate variables are conditionally dependent given lack of knowledge of the values (“states”) of the intermediate variables. In other words, sets of variables X and Y are said to be conditionally independent, given a set of variables Z, if the probability distribution for X given Z does not depend on Y. If Z is empty, however, X and Y are said to be “independent” as opposed to conditionally independent. If X and Y are not conditionally independent, given Z, then X and Y are said to be conditionally dependent given Z.
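The conditional independence statement above may be checked numerically. The following sketch assumes a hypothetical three-variable chain X -> Z -> Y with illustrative Boolean probabilities; the variable names, numbers, and helper functions are assumptions made for the example only.

```python
from itertools import product

# Hypothetical chain X -> Z -> Y over Boolean variables (illustrative numbers).
p_x = {True: 0.4, False: 0.6}
p_z_given_x = {True: {True: 0.8, False: 0.2}, False: {True: 0.1, False: 0.9}}
p_y_given_z = {True: {True: 0.7, False: 0.3}, False: {True: 0.2, False: 0.8}}


def joint(x: bool, z: bool, y: bool) -> float:
    """Joint probability factored along the arcs of the chain."""
    return p_x[x] * p_z_given_x[x][z] * p_y_given_z[z][y]


def prob(**fixed: bool) -> float:
    """Sum the joint over all assignments consistent with the fixed values."""
    total = 0.0
    for x, z, y in product((True, False), repeat=3):
        assignment = {"x": x, "z": z, "y": y}
        if all(assignment[name] == value for name, value in fixed.items()):
            total += joint(x, z, y)
    return total


def conditional(target: dict, given: dict) -> float:
    """p(target | given); both arguments map a variable name to a value."""
    return prob(**target, **given) / prob(**given)


# X and Y are dependent when Z is unknown, but independent once Z is given.
p_x_given_y = conditional({"x": True}, {"y": True})
p_x_given_z = conditional({"x": True}, {"z": True})
p_x_given_zy = conditional({"x": True}, {"z": True, "y": True})
```

Here `p_x_given_zy` equals `p_x_given_z`, so Y adds nothing once Z is known, whereas `p_x_given_y` differs from the prior on X because the intermediate variable is unobserved.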
The variables used for each node may be of different types. Specifically, variables may be of two types: discrete or continuous. A discrete variable is a variable that has a finite or countable number of states, whereas a continuous variable is a variable that has an effectively infinite number of states. An example of a discrete variable is a Boolean variable. Such a variable can assume only one of two states: “true” or “false.” An example of a continuous variable is a variable that may assume any real value between −1 and 1. Discrete variables have an associated probability distribution. Continuous variables, however, have an associated probability density function (“density”). Where an event is a set of possible outcomes, the density p(x) for a variable “x” and events “a” and “b” is defined as:
$$p(x) = \lim_{a \to b} \left[ \frac{p(a \le x \le b)}{\left| a - b \right|} \right]$$
where p(a≦x≦b) is the probability that x lies between a and b. Conventional systems for generating Bayesian networks cannot use continuous variables in their nodes.
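The limiting definition of the density may be illustrated numerically. The sketch below assumes, purely for the example, a standard normal variable, and compares the ratio of interval probability to interval width against the known density as the interval shrinks; the function names are hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Cumulative distribution of a standard normal variable."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(x: float) -> float:
    """Closed-form density of a standard normal variable."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def interval_ratio(x: float, width: float) -> float:
    """p(a <= x <= b) / |a - b| for an interval of the given width centered on x."""
    a, b = x - width / 2.0, x + width / 2.0
    return (normal_cdf(b) - normal_cdf(a)) / abs(a - b)


# The ratio approaches the density as the interval endpoints converge.
errors = [abs(interval_ratio(0.5, w) - normal_pdf(0.5)) for w in (1.0, 0.1, 0.01)]
```

The entries of `errors` decrease as the width shrinks, consistent with the density being the limit of the probability per unit interval.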
There are two conventional approaches for constructing Bayesian networks. Using the first approach (“the knowledge-based approach”), first the distinctions of the world that are important for decision making are determined. These distinctions correspond to the variables of the domain of the Bayesian network. The “domain” of a Bayesian network is the set of all variables in the Bayesian network. Next the dependencies among the variables (the arcs) and the probability distributions that quantify the strengths of the dependencies are determined.
In the second approach (called “the data-based approach”), the variables of the domain are first determined. Next, data is accumulated for those variables, and an algorithm is applied that creates a Bayesian network from this data. The accumulated data comes from real world instances of the domain, that is, real world instances of decision making in a given field. Conventionally, this second approach exists for domains containing only discrete variables.
U.S. application Ser. No. 08/240,019 filed May 9, 1994 entitled “Generating Improved Belief Networks” describes a system and method for generating Bayesian networks (also known as “belief networks”) that utilize both expert data received from an expert (“expert knowledge”) and data received from real world instances of decisions made (“empirical data”). By utilizing both expert knowledge and empirical data, the network generator provides an improved Bayesian network that may be more accurate than conventional Bayesian networks or provide other advantages, e.g., ease of implementation and lower reliance on “expert” estimations of probabilities. Likewise, it is known to initiate a network using estimations of the probabilities (and often the relevant variables), and subsequently use accumulated data to refine the network to increase its accuracy and precision.
Expert knowledge consists of two components: an equivalent sample size or sizes (“sample size”), and the prior probabilities of all possible Bayesian-network structures (“priors on structures”). The effective sample size is the effective number of times that the expert has rendered a specific decision. For example, a doctor with 20 years of experience diagnosing a specific illness may have an effective sample size in the hundreds. The priors on structures refers to the confidence of the expert that there is a relationship between variables (e.g., the expert is 70% sure that two variables are related). The priors on structures can be decomposed for each variable-parent pair known as the “prior probability” of the variable-parent pair. Empirical data is typically stored in a database. The database may contain a list of the observed state of some or all of the variables in the Bayesian network. Each data entry constitutes a case. When one or more variables are unobserved in a case, the case containing the unobserved variable is said to have “missing data.” Thus, missing data refers to when there are cases in the empirical data database that contain no observed value for one or more of the variables in the domain. An assignment of one state to each variable in a set of variables is called an “instance” of that set of variables. Thus, a “case” is an instance of the domain. The “database” is the collection of all cases.
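One simple way to picture how an equivalent sample size lets expert knowledge and empirical data be combined is a pseudo-count update, sketched below in Python. This is an assumption made for illustration and is not asserted to be the method of the referenced application: the expert's prior probability is weighted as if it had been observed in `equivalent_sample_size` cases, and cases with missing data for the variable are skipped. All names and values are hypothetical.

```python
def posterior_estimate(prior_prob, equivalent_sample_size, cases, variable):
    """Blend an expert prior with observed cases using pseudo-counts.

    prior_prob             -- expert's probability that the variable is true
    equivalent_sample_size -- weight of the expert's opinion, in cases
    cases                  -- list of dicts mapping variable names to observed states
    variable               -- name of the Boolean variable to estimate
    """
    # Cases where the variable is unobserved ("missing data") are ignored.
    observed = [case[variable] for case in cases if case.get(variable) is not None]
    successes = sum(1 for value in observed if value)
    return (prior_prob * equivalent_sample_size + successes) / (
        equivalent_sample_size + len(observed)
    )


# Hypothetical database of five cases, two of which lack the variable.
database = [
    {"fever": True},
    {"fever": True},
    {"fever": None},
    {"fever": False},
    {},
]
estimate = posterior_estimate(0.5, 10, database, "fever")
```

With a large equivalent sample size the estimate stays close to the expert's prior; as cases accumulate, the empirical frequency dominates.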
Therefore, it is seen that Bayesian networks can be used to probabilistically model a problem, in a mathematical form. This model may then be analyzed to produce one or more outputs representative of the probability that a given fact is true, or a probability density distribution that a variable is at a certain value.
A review of certain statistical methods is provided below for the convenience of the reader, and is not intended to limit the scope of methods, of statistical or other type, which may be employed in conjunction with the system and method according to the present invention. It is understood that these mathematical models and methods may be implemented in known manner on general purpose computing platforms, for example as a compiled application in a real-time operating system such as RT Linux, QNX, versions of Microsoft Windows, or the like. Further, these techniques may be implemented as applets operating under Matlab or other scientific computing platform. Alternately, the functions may be implemented natively in an embedded control system or on a microcontroller.
It is also understood that, while the mathematical methods are capable of producing precise and accurate results, various simplifying presumptions and truncations may be employed to increase the tractability of the problem to be solved. Further, the outputs generally provided according to preferred embodiments of the present invention are relatively low precision, and therefore higher order approximation of the analytic solution, in the case of a rapidly convergent calculation, will often be sufficient.
A time domain process demonstrates a Markov property if the conditional probability density of the current event, given all present and past events, depends only on the j most recent events. If the current event depends solely on the most recent past event, then the process is a first order Markov process. There are three key problems in hidden Markov model (HMM) use: evaluation, estimation, and decoding. The evaluation problem is: given an observation sequence and a model, what is the probability that the observed sequence was generated by the model (Pr(O|λ))? If this can be evaluated for all competing models for an observation sequence, then the model with the highest probability can be chosen for recognition.
Pr(O|λ) can be calculated several ways. The naive way is to sum the probability over all the possible state sequences in a model for the observation sequence:
$$\Pr(O \mid \lambda) = \sum_{\text{all } S} \prod_{t=1}^{T} a_{s_{t-1} s_t}\, b_{s_t}(O_t)$$
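The direct summation above may be sketched in Python as follows. The two-state model, its probabilities, and the function name are hypothetical; the sketch also assumes that the factor for t=1 is the initial state probability and that any state may end the sequence.

```python
from itertools import product


def naive_likelihood(pi, a, b, obs):
    """Sum the probability of obs over every possible state sequence."""
    num_states = len(pi)
    total = 0.0
    for states in product(range(num_states), repeat=len(obs)):
        # Assumed convention: the first factor is the initial probability
        # of the first state times its output probability.
        prob = pi[states[0]] * b[states[0]][obs[0]]
        for t in range(1, len(obs)):
            prob *= a[states[t - 1]][states[t]] * b[states[t]][obs[t]]
        total += prob
    return total


# Hypothetical two-state, two-symbol model.
pi = [0.6, 0.4]                      # initial state probabilities
a = [[0.7, 0.3], [0.4, 0.6]]         # state transition probabilities
b = [[0.9, 0.1], [0.2, 0.8]]         # output probabilities per state
likelihood = naive_likelihood(pi, a, b, [0, 1])
```

The loop visits N^T state sequences for N states and T observations, so this form is practical only for very short sequences.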
However, this method is exponential in time, so the more efficient forward-backward algorithm is used in practice. The following algorithm defines the forward variable α and uses it to generate Pr(O|λ) (π are the initial state probabilities, a are the state transition probabilities, and b are the output probabilities).
Initialization, for all states i:
$$\alpha_1(i) = \pi_i\, b_i(O_1)$$
(if $i \in S_I$, $\pi_i = 1/n_I$; otherwise $\pi_i = 0$).
Calculating α( ) along the time axis, for t=2, . . . , T, and all states j, compute
$$\alpha_t(j) = \left[ \sum_i \alpha_{t-1}(i)\, a_{ij} \right] b_j(O_t)$$
Final probability is given by
$$\Pr(O \mid \lambda) = \sum_{i \in S_F} \alpha_T(i)$$
The first step initializes the forward variable with the initial probability for all states, while the second step inductively steps the forward variable through time. The final step gives the desired result Pr(O|λ), and it can be shown by constructing a lattice of states and transitions through time that the computation is only order O(N²T). The backward algorithm, using a process similar to the above, can also be used to compute Pr(O|λ) and defines the convenience variable β.
The estimation problem concerns how to adjust λ to maximize Pr(O|λ) given an observation sequence O. Given an initial model, which can have flat probabilities, the forward-backward algorithm allows us to evaluate this probability. All that remains is to find a method to improve the initial model. Unfortunately, an analytical solution is not known, but an iterative technique can be employed.
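The three steps of the forward computation may be sketched in Python as below. The model values and the function name are hypothetical, and the optional `final_states` argument is an assumption of this sketch: when it is omitted, every state is treated as a permissible final state.

```python
def forward_likelihood(pi, a, b, obs, final_states=None):
    """Compute Pr(O | model) with the forward variable in O(N^2 T) time."""
    num_states = len(pi)
    if final_states is None:
        final_states = range(num_states)   # assumption: any state may be final
    # Step 1: initialize the forward variable from the initial probabilities.
    alpha = [pi[i] * b[i][obs[0]] for i in range(num_states)]
    # Step 2: step the forward variable inductively through time.
    for t in range(1, len(obs)):
        alpha = [
            sum(alpha[i] * a[i][j] for i in range(num_states)) * b[j][obs[t]]
            for j in range(num_states)
        ]
    # Step 3: sum the forward variable over the final states.
    return sum(alpha[i] for i in final_states)


# Hypothetical two-state, two-symbol model.
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [[0.9, 0.1], [0.2, 0.8]]
likelihood = forward_likelihood(pi, a, b, [0, 1])
```

Each time step costs N² multiplications, since every state j sums over every predecessor state i, which gives the stated order of computation for T observations.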
Using the actual evidence from the training data, a new estimate for the respective output probability can be assigned:
$$\bar{b}_j(k) = \frac{\sum_{t:\, O_t = v_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$$
where $\gamma_t(i)$ is defined as the posterior probability of being in state i at time t given the observation sequence and the model. Similarly, the evidence can be used to develop a new estimate of the probability of a state transition ($\bar{a}_{ij}$) and initial state probabilities ($\bar{\pi}_i$).
Thus all the components of model (λ) can be re-estimated. Since either the forward or backward algorithm can be used to evaluate Pr(O|λ) versus the previous estimation, the above technique can be used iteratively to converge the model to some limit. While the technique described only handles a single observation sequence, it is easy to extend to a set of observation sequences.
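A single re-estimation pass for the output probabilities may be sketched as follows. The sketch assumes a hypothetical two-state model, treats every state as a permissible final state (so the backward variable is initialized to one), and updates only the output probabilities; the transition and initial probabilities would be re-estimated analogously. Function names are illustrative.

```python
def forward(pi, a, b, obs):
    """Forward variable alpha[t][i] for every time step."""
    n = len(pi)
    alpha = [[pi[i] * b[i][obs[0]] for i in range(n)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([sum(prev[i] * a[i][j] for i in range(n)) * b[j][obs[t]]
                      for j in range(n)])
    return alpha


def backward(a, b, obs):
    """Backward variable beta[t][i], assuming any state may be final."""
    n = len(a)
    beta = [[1.0] * n]
    for t in range(len(obs) - 2, -1, -1):
        nxt = beta[0]
        beta.insert(0, [sum(a[i][j] * b[j][obs[t + 1]] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def state_posteriors(pi, a, b, obs):
    """gamma[t][i]: probability of state i at time t given obs and the model."""
    alpha, beta = forward(pi, a, b, obs), backward(a, b, obs)
    likelihood = sum(alpha[-1])
    return [[alpha[t][i] * beta[t][i] / likelihood for i in range(len(pi))]
            for t in range(len(obs))]


def reestimate_outputs(pi, a, b, obs, num_symbols):
    """New output probabilities: gamma mass on symbol k over total gamma mass."""
    gamma = state_posteriors(pi, a, b, obs)
    new_b = []
    for j in range(len(pi)):
        total = sum(gamma[t][j] for t in range(len(obs)))
        new_b.append([sum(gamma[t][j] for t in range(len(obs)) if obs[t] == k) / total
                      for k in range(num_symbols)])
    return new_b


# Hypothetical two-state, two-symbol model and observation sequence.
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 0]
gamma = state_posteriors(pi, a, b, obs)
new_b = reestimate_outputs(pi, a, b, obs, num_symbols=2)
```

Repeating the pass with `new_b` in place of `b`, and comparing the resulting likelihood with the previous one, corresponds to the iterative use described above.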
The Hidden Markov Model is a finite set of states, each of which is associated with a (generally multidimensional) probability distribution [jedlik.phy.bme.hu/˜gerjanos/HMM/node4.html-r4#r4]. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state an outcome or observation can be generated, according to the associated probability distribution. Only the outcome, not the state, is visible to an external observer; the states are therefore “hidden” from the outside, hence the name Hidden Markov Model.
In order to define an HMM completely, the following elements are needed.
        The number of states of the model, N.
        The number of observation symbols in the alphabet, M. If the observations are continuous then M is infinite.
        A set of state transition probabilities Λ={aij}, aij=p{qt+1=j|qt=i}, 1≦i, j≦N
where qt denotes the current state.
Transition probabilities should satisfy the normal stochastic constraints, aij≧0, 1≦i, j≦N
and
$$\sum_{j=1}^{N} a_{ij} = 1, \quad 1 \le i \le N$$
        A probability distribution in each of the states, B={bj(k)}.
bj(k)=p{ot=vk|qt=j}, 1≦j≦N, 1≦k≦M
where vk denotes the kth observation symbol in the alphabet, and ot the current parameter vector.
The following stochastic constraints must be satisfied.
$$b_j(k) \ge 0, \quad 1 \le j \le N, \; 1 \le k \le M \quad \text{and} \quad \sum_{k=1}^{M} b_j(k) = 1, \quad 1 \le j \le N$$
If the observations are continuous then we will have to use a continuous probability density function, instead of a set of discrete probabilities. In this case we specify the parameters of the probability density function. Usually the probability density is approximated by a weighted sum of M Gaussian distributions 𝒩,
$$b_j(o_t) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}(\mu_{jm}, \Sigma_{jm}, o_t)$$
where,
cjm=weighting coefficients
μjm=mean vectors
Σjm=covariance matrices
cjm should satisfy the stochastic constraints, cjm≧0, 1≦j≦N, 1≦m≦M, and
$$\sum_{m=1}^{M} c_{jm} = 1, \quad 1 \le j \le N$$
        The initial state distribution, π={πi}, where πi=p{q1=i}, 1≦i≦N
Therefore we can use the compact notation λ=(Λ,B,π) to denote an HMM with discrete probability distributions, and λ=(Λ, cjm, μjm, Σjm, π) to denote one with continuous densities.
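As an illustration of the discrete notation λ=(Λ,B,π), the sketch below stores the three components as plain nested lists and checks the stochastic constraints stated above. The function names and the numeric values are hypothetical and serve only as an example.

```python
import math

def check_stochastic_rows(name, matrix, tol=1e-9):
    """Raise ValueError unless every row is non-negative and sums to 1."""
    for row in matrix:
        if any(p < 0.0 for p in row):
            raise ValueError(f"{name}: negative probability")
        if not math.isclose(sum(row), 1.0, abs_tol=tol):
            raise ValueError(f"{name}: row does not sum to 1")

def validate_hmm(A, B, pi):
    """Check shapes and constraints of a discrete HMM; return (N, M)."""
    n = len(pi)
    if len(A) != n or any(len(row) != n for row in A):
        raise ValueError("A must be N x N")
    if len(B) != n or any(len(row) != len(B[0]) for row in B):
        raise ValueError("B must be N x M")
    check_stochastic_rows("A", A)      # sum_j a_ij = 1 for every i
    check_stochastic_rows("B", B)      # sum_k b_j(k) = 1 for every j
    check_stochastic_rows("pi", [pi])  # sum_i pi_i = 1
    return n, len(B[0])

# Hypothetical 2-state, 2-symbol model.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
n_states, n_symbols = validate_hmm(A, B, pi)
```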
For the sake of mathematical and computational tractability, the following assumptions are made in the theory of HMMs.
(1) The Markov Assumption
As given in the definition of HMMs, transition probabilities are defined as, aij=p{qt+1=j|qt=i}.
In other words it is assumed that the next state is dependent only upon the current state. This is called the Markov assumption and the resulting model is in fact a first order HMM.
However, the next state may in general depend on the past k states, and it is possible to obtain such a model, called a kth order HMM, by defining the transition probabilities as follows.
$$a_{i_1 i_2 \ldots i_k j} = p\{q_{t+1}=j \mid q_t=i_1, q_{t-1}=i_2, \ldots, q_{t-k+1}=i_k\}, \quad 1 \le i_1, i_2, \ldots, i_k, j \le N$$
But it is seen that a higher order HMM will have a higher complexity. Even though the first order HMMs are the most common, some attempts have been made to use the higher order HMMs too.
(2) The Stationarity Assumption
Here it is assumed that state transition probabilities are independent of the actual time at which the transitions take place. Mathematically, p{qt1+1=j|qt1=i}=p{qt2+1=j|qt2=i}, for any t1 and t2.
(3) The Output Independence Assumption
This is the assumption that the current output (observation) is statistically independent of the previous outputs (observations). We can formulate this assumption mathematically, by considering a sequence of observations, O=o1, o2, . . . , oT.
Then according to the assumption for an HMM λ,
$$p\{O \mid q_1, q_2, \ldots, q_T, \lambda\} = \prod_{t=1}^{T} p(o_t \mid q_t, \lambda).$$
However, unlike the other two, this assumption has very limited validity. In some cases it does not hold well enough, and it then becomes a severe weakness of HMMs.
A Hidden Markov Model (HMM) is a Markov chain, where each state generates an observation. You only see the observations, and the goal is to infer the hidden state sequence. HMMs are very useful for time-series modeling, since the discrete state-space can be used to approximate many non-linear, non-Gaussian systems.
HMMs and some common variants (e.g., input-output HMMs) can be concisely explained using the language of Bayesian Networks, as we now demonstrate.
Consider the Bayesian network in FIG. 1, which represents a hidden Markov model (HMM). (Circles denote continuous-valued random variables, squares denote discrete-valued, clear means hidden, shaded means observed.) This encodes the joint distribution P(Q,Y)=P(Q1)P(Y1|Q1)P(Q2|Q1)P(Y2|Q2) . . . .
For a sequence of length T, we simply “unroll” the model for T time steps. In general, such a dynamic Bayesian network (DBN) can be specified by just drawing two time slices (this is sometimes called a 2TBN)—the structure (and parameters) are assumed to repeat.
The Markov property states that the future is independent of the past given the present, i.e., Qt+1 ⊥ Qt−1 | Qt. We can parameterize this Markov chain using a transition matrix, Mij=P(Qt+1=j|Qt=i), and a prior distribution, πi=P(Q1=i).
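The factorization of the joint distribution can be made concrete with a short sketch. It assumes discrete observations so that P(Yt|Qt) can be stored as a table; the table name `emit`, the function name, and all numeric values are hypothetical.

```python
def joint_probability(M, emit, prior, states, outputs):
    """P(Q, Y) = P(Q1) P(Y1|Q1) * prod_t P(Qt|Qt-1) P(Yt|Qt)."""
    p = prior[states[0]] * emit[states[0]][outputs[0]]
    for t in range(1, len(states)):
        # One transition term and one observation term per time step.
        p *= M[states[t - 1]][states[t]] * emit[states[t]][outputs[t]]
    return p

# Hypothetical 2-state chain with 2 observation symbols.
M = [[0.7, 0.3], [0.4, 0.6]]      # M[i][j] = P(Q_{t+1}=j | Q_t=i)
emit = [[0.9, 0.1], [0.2, 0.8]]   # emit[i][y] = P(Y_t=y | Q_t=i)
prior = [0.6, 0.4]                # prior[i] = P(Q_1=i)
p = joint_probability(M, emit, prior, states=[0, 0, 1], outputs=[0, 1, 0])
```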
We have assumed that this is a homogeneous Markov chain, i.e., the parameters do not vary with time. This assumption can be made explicit by representing the parameters as nodes: see FIG. 2: P1 represents π, P2 represents the transition matrix, and P3 represents the parameters for the observation model. If we think of these parameters as random variables (as in the Bayesian approach), parameter estimation becomes equivalent to inference. If we think of the parameters as fixed, but unknown, quantities, parameter estimation requires a separate learning procedure (usually EM). In the latter case, we typically do not represent the parameters in the graph; shared parameters (as in this example) are implemented by specifying that the corresponding CPDs are “tied”.
An HMM is a hidden Markov model because we don't see the states of the Markov chain, Qt, but just a function of them, namely Yt. For example, if Yt is a vector, we might define P(Yt=y|Qt=i)=N(y; μi, σi). A richer model, widely used in speech recognition, is to model the output (conditioned on the hidden state) as a mixture of Gaussians. This is shown in FIG. 3.
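A mixture-of-Gaussians output model for one hidden state might be evaluated as in the sketch below. To keep the example short it assumes scalar observations, so each component has a variance rather than a covariance matrix; the function names and parameter values are hypothetical.

```python
import math

def gaussian_pdf(y, mean, variance):
    """Density of a scalar Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (y - mean) ** 2 / variance) / math.sqrt(2.0 * math.pi * variance)

def mixture_density(y, weights, means, variances):
    """Weighted sum of Gaussian components; the weights should sum to 1."""
    return sum(w * gaussian_pdf(y, m, v)
               for w, m, v in zip(weights, means, variances))

# Hypothetical two-component output density for a single hidden state.
density = mixture_density(0.5, weights=[0.3, 0.7], means=[0.0, 2.0], variances=[1.0, 0.5])
```

In a full model, one such set of weights, means, and variances would be kept per hidden state.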
Some popular variations on the basic HMM theme are illustrated in FIGS. 4A, 4B and 4C, which represent, respectively, an input-output HMM, a factorial HMM, and a coupled HMM. (In the input-output model, the CPD P(Q|U) could be a softmax function, or a neural network.) Software is available to handle inference and learning in general Bayesian networks, making all of these models trivial to implement.
It is noted that the parameters may also vary with time. This does not violate the presumptions inherent in an HMM, but rather merely complicates the analysis since a static simplifying presumption may not be made. A discrete-time, discrete-space dynamical system governed by a Markov chain emits a sequence of observable outputs: one output (observation) for each state in a trajectory of such states. From the observable sequence of outputs, we may infer the most likely dynamical system. The result is a model for the underlying process. Alternatively, given a sequence of outputs, we can infer the most likely sequence of states. We might also use the model to predict the next observation or more generally a continuation of the sequence of observations.
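The generative process described here, a Markov chain that emits one observation per visited state, can be sketched as follows. The sketch assumes time-invariant parameters and discrete outputs; the model values, the function name, and the seed are hypothetical.

```python
import random

def sample_hmm(A, B, pi, length, rng):
    """Draw a state trajectory and one observation per visited state."""
    def draw(probs):
        # Pick an index according to the given probability row.
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]

    states, outputs = [], []
    state = draw(pi)                    # initial state from pi
    for _ in range(length):
        states.append(state)
        outputs.append(draw(B[state]))  # emit from the current state
        state = draw(A[state])          # move to the next state
    return states, outputs

# Hypothetical 2-state, 2-symbol model.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
states, outputs = sample_hmm(A, B, pi, length=10, rng=random.Random(0))
```

Only `outputs` would be available to an external observer; recovering `states` or the model parameters from it is the inference task described above.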
The Evaluation Problem and the Forward Algorithm
We have a model λ=(Λ,B,π) and a sequence of observations O=o1, o2, . . . , oT, and p{O|λ} must be found. We can calculate this quantity using simple probabilistic arguments, but this calculation involves a number of operations on the order of N^T. This is very large even if the length of the sequence, T, is moderate. Therefore we have to look for another method for this calculation. Fortunately there exists one which has a considerably lower complexity and makes use of an auxiliary variable, αt(i), called the forward variable.
The forward variable is defined as the probability of the partial observation sequence o1, o2, . . . , ot, when it terminates at the state i. Mathematically,
αt(i)=p{o1,o2, . . . ,ot,qt=i|λ}  (1.1)
Then it is easy to see that the following recursive relationship holds.
$$\alpha_{t+1}(j) = b_j(o_{t+1}) \sum_{i=1}^{N} \alpha_t(i)\, a_{ij}, \quad 1 \le j \le N, \; 1 \le t \le T-1 \qquad (1.2)$$
where,
$$\alpha_1(j) = \pi_j\, b_j(o_1), \quad 1 \le j \le N$$
Using this recursion we can calculate αT(i), 1≦i≦N and then the required probability is given by,
$$p\{O \mid \lambda\} = \sum_{i=1}^{N} \alpha_T(i). \qquad (1.3)$$
The complexity of this method, known as the forward algorithm, is proportional to N²T, which is linear with respect to T, whereas the direct calculation mentioned earlier has an exponential complexity.
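The recursion in eqs. 1.2 and 1.3 can be sketched as below, together with the direct summation over all state paths for comparison. This is a minimal illustration assuming a discrete HMM stored as nested lists; the function names and the numeric values are hypothetical.

```python
from itertools import product

def forward(A, B, pi, obs):
    """Return (alpha, likelihood) for a discrete HMM using eqs. 1.2 and 1.3."""
    n = len(pi)
    # Initialization: alpha_1(j) = pi_j * b_j(o_1)
    alpha = [[pi[j] * B[j][obs[0]] for j in range(n)]]
    # Induction: alpha_{t+1}(j) = b_j(o_{t+1}) * sum_i alpha_t(i) * a_ij
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([B[j][o] * sum(prev[i] * A[i][j] for i in range(n))
                      for j in range(n)])
    # Termination: p{O | lambda} = sum_i alpha_T(i)
    return alpha, sum(alpha[-1])

def brute_force_likelihood(A, B, pi, obs):
    """Sum the joint probability over all N**T state paths (exponential cost)."""
    n, total = len(pi), 0.0
    for path in product(range(n), repeat=len(obs)):
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total

# Hypothetical 2-state, 2-symbol model used only for illustration.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
obs = [0, 1, 0]

alpha, likelihood = forward(A, B, pi, obs)
direct = brute_force_likelihood(A, B, pi, obs)
```

Both routines return the same value; the forward pass does so with N² multiplications per time step, while the direct sum visits every one of the N^T paths. For long sequences a practical implementation would typically rescale α or work with logarithms to avoid numerical underflow, which this sketch omits.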
In a similar way we can define the backward variable βt(i) as the probability of the partial observation sequence ot+1, ot+2, . . . , oT, given that the current state is i. Mathematically,
βt(i)=p{ot+1,ot+2, . . . ,oT|qt=i,λ}  (1.4)
As in the case of αt(i) there is a recursive relationship which can be used to calculate βt(i) efficiently.
$$\beta_t(i) = \sum_{j=1}^{N} \beta_{t+1}(j)\, a_{ij}\, b_j(o_{t+1}), \quad 1 \le t \le T-1 \qquad (1.5)$$
where, βT(i)=1, 1≦i≦N
Further we can see that,
αt(i)βt(i)=p{O,qt=i|λ}, 1≦i≦N, 1≦t≦T  (1.6)
Therefore this gives another way to calculate p{O|λ}, by using both forward and backward variables as given in eqn. 1.7. See, jedlik.phy.bme.hu/˜gerjanos/HMM/, expressly incorporated herein by reference.
$$p\{O \mid \lambda\} = \sum_{i=1}^{N} p\{O, q_t = i \mid \lambda\} = \sum_{i=1}^{N} \alpha_t(i)\, \beta_t(i) \qquad (1.7)$$
Eqn. 1.7 is very useful, especially in deriving the formulas required for gradient based training.
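A property of eqn. 1.7 worth seeing numerically is that the sum over i of αt(i)βt(i) gives the same value at every time step t. The sketch below checks this on a small discrete model; the function names and the numeric values are hypothetical.

```python
def forward_variables(A, B, pi, obs):
    """alpha[t][i] per eq. 1.2 (t is 0-based here)."""
    n = len(pi)
    alpha = [[pi[j] * B[j][obs[0]] for j in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([B[j][o] * sum(prev[i] * A[i][j] for i in range(n))
                      for j in range(n)])
    return alpha

def backward_variables(A, B, obs):
    """beta[t][i] per eq. 1.5, starting from beta_T(i) = 1."""
    n = len(A)
    beta = [[1.0] * n]
    # Walk backwards over o_T, o_{T-1}, ..., o_2.
    for o_next in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o_next] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta

# Hypothetical 2-state, 2-symbol model.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
obs = [0, 1, 0]

alpha = forward_variables(A, B, pi, obs)
beta = backward_variables(A, B, obs)
# Eq. 1.7 evaluated at every t; all entries should agree.
per_time = [sum(a * b for a, b in zip(alpha[t], beta[t])) for t in range(len(obs))]
```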
The Decoding Problem and the Viterbi Algorithm
While the estimation and evaluation processes described above are sufficient for the development of an HMM system, the Viterbi algorithm provides a quick means of evaluating a set of HMMs in practice as well as providing a solution for the decoding problem. In decoding, the goal is to recover the state sequence given an observation sequence. The Viterbi algorithm can be viewed as a special form of the forward-backward algorithm where only the maximum path at each time step is taken instead of all paths. This optimization reduces computational load and allows the recovery of the most likely state sequence. The steps of the Viterbi algorithm are:
        Initialization. For all states i, δ1(i)=πibi(O1); ψ1(i)=0
        Recursion. From t=2 to T and for all states j, δt(j)=maxi[δt−1(i)aij]bj(Ot); ψt(j)=arg maxi[δt−1(i)aij]
        Termination. P=maxs∈Sp[δT(s)]; sT=arg maxs∈Sp[δT(s)]
        Recovering the state sequence. From t=T−1 to 1, st=ψt+1(st+1)
In many HMM system implementations, the Viterbi algorithm is used for evaluation at recognition time. Note that since Viterbi only guarantees the maximum of Pr(O, S|λ) over all state sequences S (as a result of the first order Markov assumption) instead of the sum over all possible state sequences, the resultant scores are only an approximation.
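The relationship between the Viterbi score and the full likelihood can be written explicitly. The following is a short restatement in the notation used above, not an additional result:

```latex
\max_{S} \Pr(O, S \mid \lambda) \;\le\; \sum_{S} \Pr(O, S \mid \lambda) \;=\; \Pr(O \mid \lambda)
```

Equality holds only when a single state sequence carries all of the probability mass, so the Viterbi score is in general a lower bound on the likelihood computed by the forward algorithm.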
So far the discussion has assumed some method of quantization of feature vectors into classes. However, instead of using vector quantization, the actual probability densities for the features may be used. Baum-Welch, Viterbi, and the forward-backward algorithms can be modified to handle a variety of characteristic densities. In this context, however, the densities will be assumed to be Gaussian. Specifically,
$$b_j(O_t) = \frac{1}{\sqrt{(2\pi)^n\, |\sigma_j|}}\; e^{-\frac{1}{2}\,(O_t - \mu_j)^{\top}\, \sigma_j^{-1}\, (O_t - \mu_j)}$$
Initial estimates of μ and σ may be calculated by dividing the evidence evenly among the states of the model and calculating the mean and variance in the normal way. Whereas flat densities were used for the initialization step before, the evidence is used here. Now all that is needed is a way to provide new estimates for the output probability. We wish to weight the influence of a particular observation for each state based on the likelihood of that observation occurring in that state. Adapting the solution from the discrete case yields
$$\bar{\mu}_j = \frac{\sum_{t=1}^{T} \gamma_t(j)\, O_t}{\sum_{t=1}^{T} \gamma_t(j)} \quad \text{and} \quad \bar{\sigma}_j = \frac{\sum_{t=1}^{T} \gamma_t(j)\, (O_t - \bar{\mu}_j)(O_t - \bar{\mu}_j)^{\top}}{\sum_{t=1}^{T} \gamma_t(j)}$$
For convenience, the current mean μj is used to calculate the new covariance estimate instead of the re-estimated mean. While this is not strictly proper, the values are approximately equal in contiguous iterations and seem not to make an empirical difference. See, www-white.media.mitedu/˜testame/asl/asl-tr375, expressly incorporated herein by reference. Since only one stream of data is being used and only one mixture (Gaussian density) is being assumed, the algorithms above can proceed normally, incorporating these changes for the continuous density case.
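The weighted re-estimation of the Gaussian parameters for one state can be sketched as follows. To stay short, the example assumes scalar observations, so the covariance reduces to a variance, and it uses the re-estimated mean inside the variance as in the displayed formula rather than the shortcut of reusing the previous mean. The function name and the values are hypothetical.

```python
def reestimate_gaussian(gamma_j, observations):
    """Posterior-weighted mean and variance for one state (scalar case)."""
    weight = sum(gamma_j)
    # Each observation contributes in proportion to gamma_t(j).
    mean = sum(g * o for g, o in zip(gamma_j, observations)) / weight
    variance = sum(g * (o - mean) ** 2
                   for g, o in zip(gamma_j, observations)) / weight
    return mean, variance

# Hypothetical posteriors gamma_t(j) for one state j and three observations.
gamma_j = [0.5, 0.5, 1.0]
observations = [1.0, 3.0, 5.0]
new_mean, new_variance = reestimate_gaussian(gamma_j, observations)
```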
We want to find the most likely state sequence for a given sequence of observations, O=o1, o2, . . . , oT and a model, λ=(Λ,B,π).
The solution to this problem depends upon the way “most likely state sequence” is defined. One approach is to find the most likely state qt at each time t and to concatenate all such qt's. But sometimes this method does not give a physically meaningful state sequence; for example, the concatenated sequence may contain a transition whose probability is zero. Therefore we seek another method which has no such problems.
In this method, commonly known as the Viterbi algorithm, the whole state sequence with the maximum likelihood is found. In order to facilitate the computation we define an auxiliary variable,
$$\delta_t(i) = \max_{q_1 q_2 \ldots q_{t-1}} p\{q_1, q_2, \ldots, q_{t-1}, q_t = i, o_1, o_2, \ldots, o_t \mid \lambda\},$$
which gives the highest probability that the partial observation sequence and state sequence up to time t can have, when the current state is i.
It is easy to observe that the following recursive relationship holds.
$$\delta_{t+1}(j) = b_j(o_{t+1}) \left[ \max_{1 \le i \le N} \delta_t(i)\, a_{ij} \right], \quad 1 \le j \le N, \; 1 \le t \le T-1 \qquad (1.8)$$
where,
$$\delta_1(j) = \pi_j\, b_j(o_1), \quad 1 \le j \le N$$
So the procedure to find the most likely state sequence starts from calculation of δT(j), 1≦j≦N using recursion in 1.8, while always keeping a pointer to the “winning state” in the maximum finding operation. Finally the state j* is found where
$$j^{*} = \arg\max_{1 \le j \le N} \delta_T(j),$$
and starting from this state, the sequence of states is back-tracked as the pointer in each state indicates. This gives the required set of states.
This whole algorithm can be interpreted as a search in a graph whose nodes are formed by the states of the HMM at each time instant t, 1≦t≦T.
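The recursion of eqn. 1.8 with back-pointers can be sketched as below. This is a minimal illustration assuming a discrete HMM stored as nested lists and using plain probabilities rather than logarithms; the function name and the numeric values are hypothetical.

```python
def viterbi(A, B, pi, obs):
    """Return (most likely state path, its joint probability with obs)."""
    n = len(pi)
    # Initialization: delta_1(j) = pi_j * b_j(o_1)
    delta = [pi[j] * B[j][obs[0]] for j in range(n)]
    backpointers = []
    for o in obs[1:]:
        step, new_delta = [], []
        for j in range(n):
            # "Winning" predecessor for state j at this time step.
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            step.append(best_i)
            new_delta.append(B[j][o] * delta[best_i] * A[best_i][j])
        backpointers.append(step)
        delta = new_delta
    # Termination: j* = argmax_j delta_T(j)
    last = max(range(n), key=lambda j: delta[j])
    # Back-track along the stored pointers.
    path = [last]
    for step in reversed(backpointers):
        path.append(step[path[-1]])
    path.reverse()
    return path, delta[last]

# Hypothetical 2-state, 2-symbol model.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
best_path, best_prob = viterbi(A, B, pi, obs=[0, 1, 0])
```

Each column of the trellis keeps one pointer per state, so the search over the graph described above costs on the order of N²T operations instead of enumerating every path.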
The Learning Problem
Generally, the learning problem is how to adjust the HMM parameters, so that the given set of observations (called the training set) is represented by the model in the best way for the intended application. Thus it would be clear that the “quantity” we wish to optimize during the learning process can be different from application to application. In other words there may be several optimization criteria for learning, out of which a suitable one is selected depending on the application.
There are two main optimization criteria found in the ASR literature: Maximum Likelihood (ML) and Maximum Mutual Information (MMI). The solutions to the learning problem under each of those criteria are described below.
Maximum Likelihood (ML) Criterion
In ML we try to maximize the probability of a given sequence of observations O^w, belonging to a given class w, given the HMM λw of the class w, with respect to the parameters of the model λw. This probability is the total likelihood of the observations and can be expressed mathematically as Ltot=p{O^w|λw}.
However since we consider only one class w at a time we can drop the subscript and superscript ‘w’s. Then the ML criterion can be given as,
Ltot=p{O|λ}  (1.9)
However there is no known way to analytically solve for the model λ=(Λ,B,π) that maximizes the quantity Ltot. But we can choose model parameters such that it is locally maximized, using an iterative procedure, like the Baum-Welch method or a gradient based method, which are described below.
Baum-Welch Algorithm
This method can be derived using simple “occurrence counting” arguments or using calculus to maximize the auxiliary quantity
$$Q(\lambda, \bar{\lambda}) = \sum_{q} p\{q \mid O, \lambda\}\, \log\left[ p\{O, q, \bar{\lambda}\} \right]$$
over $\bar{\lambda}$ [jedlik.phy.bme.hu/˜gerjanos/HMM/node11.html-r4#r4], [jedlik.phy.bme.hu/˜gerjanos/HMM/node11.html-r21#r21, p 344-346]. A special feature of the algorithm is the guaranteed convergence.
To describe the Baum-Welch algorithm (also known as the Forward-Backward algorithm), we need to define two more auxiliary variables, in addition to the forward and backward variables defined in a previous section. These variables can, however, be expressed in terms of the forward and backward variables.
The first of these variables is defined as the probability of being in state i at time t and in state j at time t+1. Formally,
\[ \xi_t(i,j) = p\{q_t = i,\; q_{t+1} = j \mid O, \lambda\} \tag{1.10} \]
This is the same as,
\[ \xi_t(i,j) = \frac{p\{q_t = i,\; q_{t+1} = j,\; O \mid \lambda\}}{p\{O \mid \lambda\}} \tag{1.11} \]
Using forward and backward variables this can be expressed as,
\[ \xi_t(i,j) = \frac{\alpha_t(i)\, a_{ij}\, \beta_{t+1}(j)\, b_j(o_{t+1})}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, \beta_{t+1}(j)\, b_j(o_{t+1})} \tag{1.12} \]
The second variable is the a posteriori probability,
\[ \gamma_t(i) = p\{q_t = i \mid O, \lambda\} \tag{1.13} \]
that is, the probability of being in state i at time t, given the observation sequence and the model. In terms of the forward and backward variables this can be expressed as,
\[ \gamma_t(i) = \left[ \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{i=1}^{N} \alpha_t(i)\, \beta_t(i)} \right] \tag{1.14} \]
One can see that the relationship between γt(i) and ξt(i,j) is given by,
\[ \gamma_t(i) = \sum_{j=1}^{N} \xi_t(i,j), \quad 1 \le i \le N, \quad 1 \le t \le T-1 \tag{1.15} \]
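The two auxiliary variables can be sketched numerically as follows. This is a minimal, unscaled illustration that assumes the standard forward and backward recursions for the variables defined in the earlier section; the helper names and the toy model are hypothetical. The denominators of eqns. 1.12 and 1.14 both equal the total likelihood, so a single value is used for both.

```python
import numpy as np


def forward(A, B, pi, obs):
    """alpha[t, i] for a discrete HMM (no scaling, short sequences only)."""
    alpha = np.zeros((len(obs), len(pi)))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, len(obs)):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha


def backward(A, B, obs):
    """beta[t, i], with beta at the last time step set to 1."""
    beta = np.ones((len(obs), A.shape[0]))
    for t in range(len(obs) - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta


def xi_and_gamma(A, B, pi, obs):
    """Return xi[t, i, j] (eqn. 1.12) and gamma[t, i] (eqn. 1.14)."""
    alpha = forward(A, B, pi, obs)
    beta = backward(A, B, obs)
    L_tot = alpha[-1].sum()
    # alpha_t(i) * a_ij * b_j(o_{t+1}) * beta_{t+1}(j), for t = 1 .. T-1
    emit_beta = B[:, obs[1:]].T * beta[1:]            # shape (T-1, N)
    xi = alpha[:-1, :, None] * A[None, :, :] * emit_beta[:, None, :] / L_tot
    gamma = alpha * beta / L_tot
    return xi, gamma


# Hypothetical toy model: 2 states, 2 observation symbols.
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
obs = [0, 1, 0, 0, 1]

xi, gamma = xi_and_gamma(A, B, pi, obs)
# Relationship of eqn. 1.15: summing xi over j recovers gamma.
print(np.allclose(xi.sum(axis=2), gamma[:-1]))
```

Because ξ involves the observation at time t+1, the array `xi` has one fewer time step than `gamma`.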
Now it is possible to describe the Baum-Welch learning process, where the parameters of the HMM are updated in such a way as to maximize the quantity p{O|λ}. Assuming a starting model λ=(A,B,π), we calculate the ‘α’s and ‘β’s using the recursions 1.5 and 1.2, and then the ‘ξ’s and ‘γ’s using 1.12 and 1.15. The next step is to update the HMM parameters according to eqns. 1.16 to 1.18, known as the re-estimation formulas.
\[ \bar{\pi}_i = \gamma_1(i), \quad 1 \le i \le N \tag{1.16} \]
\[ \bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \quad 1 \le i \le N, \quad 1 \le j \le N \tag{1.17} \]
\[ \bar{b}_j(k) = \frac{\sum_{t=1,\; o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}, \quad 1 \le j \le N, \quad 1 \le k \le M \tag{1.18} \]
These reestimation formulas can easily be modified to deal with the continuous density case too.
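One pass of the re-estimation formulas 1.16 to 1.18 for the discrete case can be sketched as below. The sketch assumes the standard forward and backward recursions, a single training sequence, and no scaling; the function names, the toy starting model and the observation sequence are hypothetical. Repeating the step should never decrease the likelihood, which the loop records.

```python
import numpy as np


def forward(A, B, pi, obs):
    alpha = np.zeros((len(obs), len(pi)))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, len(obs)):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha


def backward(A, B, obs):
    beta = np.ones((len(obs), A.shape[0]))
    for t in range(len(obs) - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta


def baum_welch_step(A, B, pi, obs):
    """One re-estimation pass; also returns L_tot of the *input* model."""
    obs = np.asarray(obs)
    alpha = forward(A, B, pi, obs)
    beta = backward(A, B, obs)
    L_tot = alpha[-1].sum()
    gamma = alpha * beta / L_tot
    emit_beta = B[:, obs[1:]].T * beta[1:]
    xi = alpha[:-1, :, None] * A[None, :, :] * emit_beta[:, None, :] / L_tot

    new_pi = gamma[0]                                          # eqn. 1.16
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]   # eqn. 1.17
    # eqn. 1.18: numerator sums gamma only over times where o_t = v_k
    new_B = np.stack(
        [gamma[obs == k].sum(axis=0) for k in range(B.shape[1])], axis=1
    ) / gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B, L_tot


# Hypothetical starting model and training sequence.
A = np.array([[0.6, 0.4], [0.5, 0.5]])
B = np.array([[0.7, 0.3], [0.4, 0.6]])
pi = np.array([0.5, 0.5])
obs = [0, 1, 0, 0, 1, 1, 0]

history = []
for _ in range(5):
    pi, A, B, L_tot = baum_welch_step(A, B, pi, obs)
    history.append(L_tot)
print(history)
```

Each re-estimated row of A and B, and the vector π, remains a valid probability distribution by construction.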
Gradient Based Method
In the gradient based method, any parameter Θ of the HMM λ is updated according to the standard formula,
\[ \Theta_{new} = \Theta_{old} - \eta \left[ \frac{\partial J}{\partial \Theta} \right]_{\Theta = \Theta_{old}} \tag{1.19} \]
where J is a quantity to be minimized and η is the step size (learning rate). We define in this case,
\[ J = E_{ML} = -\log(p\{O \mid \lambda\}) = -\log(L_{tot}) \tag{1.20} \]
Since the minimization of J=EML is equivalent to the maximization of Ltot, eqn. 1.19 yields the required optimization criterion, ML. The problem, however, is to find the derivative ∂J/∂Θ for any parameter Θ of the model. This can easily be done by relating J to the model parameters via Ltot. As a key step in doing so, using eqns. 1.7 and 1.9 we can obtain,
\[ L_{tot} = \sum_{i=1}^{N} p\{O,\; q_t = i \mid \lambda\} = \sum_{i=1}^{N} \alpha_t(i)\, \beta_t(i) \tag{1.21} \]
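Eqn. 1.21 holds for every time index t, which is what allows the chain rule to be applied at each t later on. The following minimal sketch checks this numerically, assuming the standard unscaled forward and backward recursions; the helper names and the toy model are hypothetical.

```python
import numpy as np


def forward(A, B, pi, obs):
    alpha = np.zeros((len(obs), len(pi)))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, len(obs)):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha


def backward(A, B, obs):
    beta = np.ones((len(obs), A.shape[0]))
    for t in range(len(obs) - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta


# Hypothetical toy model: 2 states, 2 observation symbols.
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
obs = [0, 1, 1, 0]

alpha = forward(A, B, pi, obs)
beta = backward(A, B, obs)

# sum_i alpha_t(i) * beta_t(i), evaluated separately for each t
L_per_t = (alpha * beta).sum(axis=1)
print(L_per_t)
```

All entries of `L_per_t` are equal (up to rounding) to the total likelihood obtained from the last forward variables alone.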
Differentiating the last equality in eqn. 1.20 with respect to an arbitrary parameter Θ,
\[ \frac{\partial J}{\partial \Theta} = -\frac{1}{L_{tot}} \frac{\partial L_{tot}}{\partial \Theta} \tag{1.22} \]
Eqn. 1.22 gives ∂J/∂Θ if we know ∂Ltot/∂Θ, which can be found using eqn. 1.21. However, this derivative is specific to the actual parameter concerned. Since there are two main parameter sets in the HMM, namely the transition probabilities aij, 1≦i, j≦N, and the observation probabilities bj(k), 1≦j≦N, 1≦k≦M, we can find the derivative ∂Ltot/∂Θ for each of the parameter sets and hence the gradient ∂J/∂Θ.
Gradient with Respect to Transition Probabilities
Using the chain rule,
\[ \frac{\partial L_{tot}}{\partial a_{ij}} = \sum_{t=1}^{T} \frac{\partial L_{tot}}{\partial \alpha_t(j)} \frac{\partial \alpha_t(j)}{\partial a_{ij}} \tag{1.23} \]
By differentiating eqn. 1.21 with respect to αt(j) we get,
\[ \frac{\partial L_{tot}}{\partial \alpha_t(j)} = \beta_t(j) \tag{1.24} \]
and differentiating (a time shifted version of) eqn. 1.2 with respect to aij,
\[ \frac{\partial \alpha_t(j)}{\partial a_{ij}} = b_j(o_t)\, \alpha_{t-1}(i) \tag{1.25} \]
Eqns. 1.23, 1.24 and 1.25 give ∂Ltot/∂aij, and substituting this quantity in eqn. 1.22 (keeping in mind that Θ=aij in this case), we get the required result,
\[ \frac{\partial J}{\partial a_{ij}} = -\frac{1}{L_{tot}} \sum_{t=1}^{T} \beta_t(j)\, b_j(o_t)\, \alpha_{t-1}(i) \tag{1.26} \]
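Eqn. 1.26 can be checked against a finite-difference approximation, as in the minimal sketch below. It assumes the standard unscaled recursions, treats every aij as a free variable (the row-sum constraint is ignored, as in the derivation above), and starts the sum at the second observation because no forward variable exists before the first one. All function names and the toy model are hypothetical.

```python
import numpy as np


def forward(A, B, pi, obs):
    alpha = np.zeros((len(obs), len(pi)))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, len(obs)):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha


def backward(A, B, obs):
    beta = np.ones((len(obs), A.shape[0]))
    for t in range(len(obs) - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta


def neg_log_likelihood(A, B, pi, obs):
    """J = -log(L_tot), eqn. 1.20."""
    return -np.log(forward(A, B, pi, obs)[-1].sum())


def grad_J_wrt_A(A, B, pi, obs):
    """Analytic gradient of J with respect to each a_ij (eqn. 1.26)."""
    alpha = forward(A, B, pi, obs)
    beta = backward(A, B, obs)
    L_tot = alpha[-1].sum()
    dL = np.zeros_like(A)
    for t in range(1, len(obs)):
        # entry (i, j): alpha_{t-1}(i) * b_j(o_t) * beta_t(j)
        dL += np.outer(alpha[t - 1], B[:, obs[t]] * beta[t])
    return -dL / L_tot


def numeric_grad_A(A, B, pi, obs, eps=1e-6):
    """Central finite differences, perturbing one a_ij at a time."""
    grad = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            A_plus, A_minus = A.copy(), A.copy()
            A_plus[i, j] += eps
            A_minus[i, j] -= eps
            grad[i, j] = (neg_log_likelihood(A_plus, B, pi, obs)
                          - neg_log_likelihood(A_minus, B, pi, obs)) / (2 * eps)
    return grad


# Hypothetical toy model: 2 states, 2 observation symbols.
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
obs = [0, 1, 0, 0, 1]

analytic_A = grad_J_wrt_A(A, B, pi, obs)
numeric_A = numeric_grad_A(A, B, pi, obs)
print(np.max(np.abs(analytic_A - numeric_A)))
```

In a practical update via eqn. 1.19 the rows of the transition matrix would also have to be kept non-negative and normalized, for example by re-normalizing or re-parameterizing them; that step is outside this sketch.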
Gradient with Respect to Observation Probabilities
Using the chain rule,
\[ \frac{\partial L_{tot}}{\partial b_j(o_t)} = \frac{\partial L_{tot}}{\partial \alpha_t(j)} \frac{\partial \alpha_t(j)}{\partial b_j(o_t)} \tag{1.27} \]
Differentiating (a time shifted version of) the eqn. 1.2 with respect to bj (ot)
\[ \frac{\partial \alpha_t(j)}{\partial b_j(o_t)} = \frac{\alpha_t(j)}{b_j(o_t)} \tag{1.28} \]
Finally we get the required derivative by substituting for ∂Ltot/∂bj(ot) in eqn. 1.22 (keeping in mind that Θ=bj(ot) in this case), where ∂Ltot/∂bj(ot) is obtained by substituting eqns. 1.28 and 1.24 in eqn. 1.27.
\[ \frac{\partial J}{\partial b_j(o_t)} = -\frac{1}{L_{tot}} \frac{\alpha_t(j)\, \beta_t(j)}{b_j(o_t)} \tag{1.29} \]
Usually this is given the following form, by first substituting for Ltot from eqn. 1.21 and then substituting from eqn. 1.14.
\[ \frac{\partial J}{\partial b_j(o_t)} = -\frac{\gamma_t(j)}{b_j(o_t)} \tag{1.30} \]
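Eqn. 1.30 can likewise be checked numerically. The sketch below makes one assumption beyond the text: because the same parameter bj(k) is shared by every time step at which the symbol vk is observed, the per-time-step terms of eqn. 1.30 are summed over those time steps to obtain the derivative with respect to bj(k). The recursions are the standard unscaled ones, and all names and the toy model are hypothetical.

```python
import numpy as np


def forward(A, B, pi, obs):
    alpha = np.zeros((len(obs), len(pi)))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, len(obs)):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha


def backward(A, B, obs):
    beta = np.ones((len(obs), A.shape[0]))
    for t in range(len(obs) - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta


def neg_log_likelihood(A, B, pi, obs):
    """J = -log(L_tot), eqn. 1.20."""
    return -np.log(forward(A, B, pi, obs)[-1].sum())


def grad_J_wrt_B(A, B, pi, obs):
    """Accumulate -gamma_t(j) / b_j(o_t) (eqn. 1.30) into column o_t."""
    alpha = forward(A, B, pi, obs)
    beta = backward(A, B, obs)
    gamma = alpha * beta / alpha[-1].sum()
    grad = np.zeros_like(B)
    for t, o in enumerate(obs):
        grad[:, o] -= gamma[t] / B[:, o]
    return grad


def numeric_grad_B(A, B, pi, obs, eps=1e-6):
    """Central finite differences, perturbing one b_j(k) at a time."""
    grad = np.zeros_like(B)
    for j in range(B.shape[0]):
        for k in range(B.shape[1]):
            B_plus, B_minus = B.copy(), B.copy()
            B_plus[j, k] += eps
            B_minus[j, k] -= eps
            grad[j, k] = (neg_log_likelihood(A, B_plus, pi, obs)
                          - neg_log_likelihood(A, B_minus, pi, obs)) / (2 * eps)
    return grad


# Hypothetical toy model: 2 states, 2 observation symbols.
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
obs = [0, 1, 0, 0, 1]

analytic_B = grad_J_wrt_B(A, B, pi, obs)
numeric_B = numeric_grad_B(A, B, pi, obs)
print(np.max(np.abs(analytic_B - numeric_B)))
```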
If continuous densities are used, then ∂J/∂cjm, ∂J/∂μjm and ∂J/∂Σjm can be found by further propagating the derivative ∂J/∂bj(ot) using the chain rule.
The same method can be used to propagate the derivative (if necessary) to a front end processor of the HMM. This will be discussed in detail later.
Maximum Mutual Information (MMI) Criterion
In ML we optimize the HMM of only one class at a time, and do not touch the HMMs of the other classes at that time. This procedure does not involve the concept of “discrimination”, which is of great interest in Pattern Recognition. Thus the ML learning procedure gives the HMM system a poor discrimination ability, especially when the estimated parameters (in the training phase) of the HMM system do not match the inputs used in the recognition phase. Mismatches of this type can arise for two reasons. One is that the training and recognition data may have considerably different statistical properties, and the other is the difficulty of obtaining reliable parameter estimates in training.
The MMI criterion, on the other hand, considers the HMMs of all the classes simultaneously during training. The parameters of the correct model are updated to enhance its contribution to the observations, while the parameters of the alternative models are updated to reduce their contributions. This procedure gives the system a high discriminative ability, and thus MMI belongs to the so-called “discriminative training” category.
In order to have a closer look at the MMI criterion, consider a set of HMMs Λ={λv, 1≦v≦V}.
The task is to minimize the conditional uncertainty of a class v of utterances given an observation sequence Ô of that class. This is equivalent to minimizing the conditional information,
\[ I(v \mid \hat{O}, \Lambda) = -\log p\{v \mid \hat{O}, \Lambda\} \tag{1.31} \]
with respect to Λ.
In an information theoretical framework this leads to the minimization of the conditional entropy, defined as the expectation (E(•)) of the conditional information I,
\[ H(V \mid O) = E[I(v \mid \hat{O})] \tag{1.32} \]
where V represents all the classes and O represents all the observation sequences. Then the mutual information between the classes and observations,
\[ H(V, O) = H(V) - H(V \mid O) \tag{1.33} \]
becomes maximized, provided H(V) is constant. This is the reason for calling it the Maximum Mutual Information (MMI) criterion. The other name of the method, Maximum A Posteriori (MAP), has its roots in eqn. 1.31, where the a posteriori probability p{v|Ô,Λ} is maximized.
Even though eqn. 1.31 defines the MMI criterion, it can be rearranged using Bayes' theorem to obtain better insight, as in eqn. 1.34, where w represents an arbitrary class.
\[ \begin{aligned} E_{MMI} &= -\log p\{v \mid \hat{O}, \Lambda\} \\ &= -\log \frac{p\{v, \hat{O} \mid \Lambda\}}{p\{\hat{O} \mid \Lambda\}} \\ &= -\log \frac{p\{v, \hat{O} \mid \Lambda\}}{\sum_{w} p\{w, \hat{O} \mid \Lambda\}} \end{aligned} \tag{1.34} \]
If we use notation analogous to that of eqn. 1.9, we can write the likelihoods,
\[ L_{tot}^{clamped} = p\{v, \hat{O} \mid \lambda\} \tag{1.35} \]
\[ L_{tot}^{free} = \sum_{w} p\{w, \hat{O} \mid \lambda\} \tag{1.36} \]
In the above equations the superscripts clamped and free are used to imply the correct class and all the other classes respectively.
If we substitute eqns. 1.35 and 1.36 in the eqn. 1.34, we get,
\[ E_{MMI} = -\log \frac{L_{tot}^{clamped}}{L_{tot}^{free}} \tag{1.37} \]
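A minimal numerical sketch of eqn. 1.37 follows. It assumes that the joint log-likelihoods log p{w,Ô|Λ} of all classes are already available (for example from one forward pass per class model plus a class prior); the function name and the three example values are hypothetical. The computation is done in the log domain to avoid underflow.

```python
import numpy as np


def mmi_loss(log_joint, correct):
    """E_MMI = -log(L_clamped / L_free) from per-class joint log-likelihoods.

    log_joint[w] is assumed to hold log p{w, O_hat | Lambda} for class w;
    `correct` is the index of the true class v.
    """
    log_joint = np.asarray(log_joint, dtype=float)
    log_clamped = log_joint[correct]              # log L_tot^clamped
    log_free = np.logaddexp.reduce(log_joint)     # log of the sum over all w
    return log_free - log_clamped


# Hypothetical joint likelihoods for three classes.
log_joint = np.log([0.02, 0.01, 0.01])

loss_if_class_0 = mmi_loss(log_joint, correct=0)  # log(0.04 / 0.02)
loss_if_class_1 = mmi_loss(log_joint, correct=1)  # log(0.04 / 0.01)
print(loss_if_class_0, loss_if_class_1)
```

The loss is smaller when the correct class already accounts for most of the total likelihood, which is the behaviour that discriminative training rewards.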
As in the case of ML, re-estimation [ ] or gradient methods can be used to minimize the quantity $E_{MMI}$. In the following, a gradient-based method, which again makes use of eqn. 1.19, is described.
Since $E_{MMI}$ is to be minimized, in this case $J = E_{MMI}$, and therefore $J$ is given directly by eqn. 1.37. The problem then reduces to calculating the gradients $\partial J / \partial \Theta$, where $\Theta$ is an arbitrary parameter of the whole set of HMMs, $\Lambda$. This can be done by differentiating eqn. 1.37 with respect to $\Theta$,
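$$\frac{\partial J}{\partial \Theta} = \frac{1}{L_{tot}^{free}} \frac{\partial L_{tot}^{free}}{\partial \Theta} - \frac{1}{L_{tot}^{clamped}} \frac{\partial L_{tot}^{clamped}}{\partial \Theta} \qquad (1.38)$$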
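The step from eqn. 1.37 to eqn. 1.38 can be spelled out as follows. This is only a restatement using the standard identity for the derivative of a logarithm; it introduces no new quantities.

```latex
\begin{aligned}
J &= -\log \frac{L_{tot}^{clamped}}{L_{tot}^{free}}
   = \log L_{tot}^{free} - \log L_{tot}^{clamped} \\
\frac{\partial J}{\partial \Theta}
  &= \frac{\partial \log L_{tot}^{free}}{\partial \Theta}
   - \frac{\partial \log L_{tot}^{clamped}}{\partial \Theta}
   = \frac{1}{L_{tot}^{free}} \frac{\partial L_{tot}^{free}}{\partial \Theta}
   - \frac{1}{L_{tot}^{clamped}} \frac{\partial L_{tot}^{clamped}}{\partial \Theta}
\end{aligned}
```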
The same technique as in the ML case can be used to compute the gradients of the likelihoods with respect to the parameters. As a first step, the likelihoods from eqns. 1.35 and 1.36 are expressed in terms of the forward and backward variables, using the form of eqn. 1.7.
$$L_{tot}^{clamped} = \sum_{i \in \text{class } v} \alpha_t(i)\, \beta_t(i) \qquad (1.39)$$

$$L_{tot}^{free} = \sum_{w} \sum_{i \in \text{class } w} \alpha_t(i)\, \beta_t(i) \qquad (1.40)$$
The required gradients can then be found by differentiating eqns. 1.39 and 1.40. As in the ML case, we consider two cases: one for the transition probabilities and another for the observation probabilities.
Gradient with Respect to Transition Probabilities
Using the chain rule for any of the likelihoods, free or clamped,
$$\frac{\partial L_{tot}^{(\cdot)}}{\partial a_{ij}} = \sum_{t=1}^{T} \frac{\partial L_{tot}^{(\cdot)}}{\partial \alpha_t(j)} \frac{\partial \alpha_t(j)}{\partial a_{ij}} \qquad (1.41)$$
Differentiating eqns. 1.39 and 1.40 with respect to $\alpha_t(j)$ gives one result each for the free and clamped cases. Together with the common result in eqn. 1.25, these provide substitutions for both terms on the right-hand side of eqn. 1.41. The substitution yields two separate results, one for each case.
$$\frac{\partial L_{tot}^{clamped}}{\partial a_{ij}} = \delta_{kv} \sum_{t=1}^{T} \beta_t(j)\, b_j(o_t)\, \alpha_{t-1}(i), \quad i \in \text{class } k \qquad (1.42)$$

where $\delta_{kv}$ is a Kronecker delta.
$$\frac{\partial L_{tot}^{free}}{\partial a_{ij}} = \sum_{t=1}^{T} \beta_t(j)\, b_j(o_t)\, \alpha_{t-1}(i) \qquad (1.43)$$
Substituting eqns. 1.42 and 1.43 into eqn. 1.38 (keeping in mind that $\Theta = a_{ij}$ in this case) gives the required result,
$$\frac{\partial J}{\partial a_{ij}} = \left[ \frac{1}{L_{tot}^{free}} - \frac{\delta_{kv}}{L_{tot}^{clamped}} \right] \sum_{t=1}^{T} \beta_t(j)\, b_j(o_t)\, \alpha_{t-1}(i), \quad i \in \text{class } k \qquad (1.44)$$
Gradient with Respect to Observation Probabilities
Using the chain rule for any of the likelihoods, free or clamped,
$$\frac{\partial L_{tot}^{(\cdot)}}{\partial b_j(o_t)} = \frac{\partial L_{tot}^{(\cdot)}}{\partial \alpha_t(j)} \frac{\partial \alpha_t(j)}{\partial b_j(o_t)} \qquad (1.45)$$
Differentiating eqns. 1.39 and 1.40 with respect to $\alpha_t(j)$ gives one result each for the free and clamped cases. Together with the common result in eqn. 1.28, these provide substitutions for both terms on the right-hand side of eqn. 1.45. The substitution yields two separate results, one for each case.
$$\frac{\partial L_{tot}^{clamped}}{\partial b_j(o_t)} = \delta_{kv} \frac{\alpha_t(j)\, \beta_t(j)}{b_j(o_t)}, \quad j \in \text{class } k \qquad (1.46)$$
where $\delta_{kv}$ is a Kronecker delta, and
$$\frac{\partial L_{tot}^{free}}{\partial b_j(o_t)} = \frac{\alpha_t(j)\, \beta_t(j)}{b_j(o_t)} \qquad (1.47)$$
Substituting eqns. 1.46 and 1.47 into eqn. 1.38, we get the required result,
$$\frac{\partial J}{\partial b_j(o_t)} = \left[ \frac{1}{L_{tot}^{free}} - \frac{\delta_{kv}}{L_{tot}^{clamped}} \right] \frac{\alpha_t(j)\, \beta_t(j)}{b_j(o_t)}, \quad j \in \text{class } k \qquad (1.48)$$
This equation can be written more compactly by defining,
$$\gamma_t(j)^{clamped} = \delta_{kv} \frac{\alpha_t(j)\, \beta_t(j)}{L_{tot}^{clamped}}, \quad j \in \text{class } k \qquad (1.49)$$
where δkv is a Kronecker delta, and
$$\gamma_t(j)^{free} = \frac{\alpha_t(j)\, \beta_t(j)}{L_{tot}^{free}} \qquad (1.50)$$
With these variables, eqn. 1.48 can be expressed in the following form.
$$\frac{\partial J}{\partial b_j(o_t)} = \frac{1}{b_j(o_t)} \left[ \gamma_t(j)^{free} - \gamma_t(j)^{clamped} \right] \qquad (1.51)$$
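The equivalence between the bracketed form of eqn. 1.48 and the form written with the free and clamped occupation terms can be checked numerically. The sketch below follows eqn. 1.48, so the free term is normalized by the free likelihood. The function names and all numerical values are hypothetical and chosen only for illustration.

```python
def observation_gradient(alpha_tj, beta_tj, b_j_ot, l_free, l_clamped,
                         in_correct_class):
    """dJ/db_j(o_t) written with the free and clamped occupation terms."""
    delta_kv = 1.0 if in_correct_class else 0.0      # Kronecker delta
    gamma_free = alpha_tj * beta_tj / l_free
    gamma_clamped = delta_kv * alpha_tj * beta_tj / l_clamped
    return (gamma_free - gamma_clamped) / b_j_ot

def observation_gradient_direct(alpha_tj, beta_tj, b_j_ot, l_free, l_clamped,
                                in_correct_class):
    """The same gradient in the bracketed form of eqn. 1.48."""
    delta_kv = 1.0 if in_correct_class else 0.0
    return (1.0 / l_free - delta_kv / l_clamped) * alpha_tj * beta_tj / b_j_ot

# Hypothetical values for one state j at one time t
args = dict(alpha_tj=0.03, beta_tj=0.5, b_j_ot=0.4, l_free=0.1, l_clamped=0.06)
grad_correct = observation_gradient(in_correct_class=True, **args)
grad_other = observation_gradient(in_correct_class=False, **args)
print(grad_correct, grad_other)
```

Because the free likelihood is never smaller than the clamped one, the gradient is non-positive for states of the correct model and non-negative for states of competing models, so a descent step raises the former probabilities and lowers the latter.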
This equation completely defines the update of the observation probabilities. If, however, continuous densities are used, then this derivative can be propagated further using the chain rule, in exactly the same way as described for the ML case. Similar comments also apply to preprocessors.
Training
We assume that the preprocessing part of the system produces a sequence of observation vectors O={o1, o2, . . . , oN}.
Starting from a certain set of values, the parameters of each of the HMMs $\lambda_i$, $1 \le i \le N$, can be updated as given by eqn. 1.19, with the required gradients given by eqns. 1.44 and 1.48. However, for this particular case of isolated recognition, the likelihoods in the last two equations are calculated in a specific way.
First consider the clamped case. Since we have an HMM for each class of units in isolated recognition, we can select the model $\lambda_l$ of the class $l$ to which the current observation sequence $O^l$ belongs. Then, starting from eqn. 1.39,
$$\begin{aligned} L_{tot}^{clamped} = L_l^l &= \sum_{i \in \lambda_l} \alpha_t(i)\, \beta_t(i) \\ &= \sum_{i \in \lambda_l} \alpha_T(i) \end{aligned} \qquad (1.52)$$

where the second line follows from eqn. 1.3.
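The replacement of the sum of forward-backward products by the sum of the final forward variables relies on that sum being the same at every time step. The following sketch checks this numerically for a small discrete HMM. The forward and backward recursions are written in their standard textbook form, which is assumed here to match recursions 1.5 and 1.2, and the model values are hypothetical.

```python
def forward(pi, A, B, obs):
    """Forward variables alpha[t][i] of a discrete HMM (pi, A, B)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha

def backward(A, B, obs):
    """Backward variables beta[t][i], with beta at the final step set to 1."""
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta

# Hypothetical two-state model with two output symbols
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 1, 0]

alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
# sum_i alpha_t(i) * beta_t(i), evaluated separately at every t
per_t = [sum(a * b for a, b in zip(alpha[t], beta[t])) for t in range(len(obs))]
likelihood = sum(alpha[-1])          # sum_i alpha_T(i)
print(per_t, likelihood)
```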
Similarly for the free case, starting from eqn. 1.40,
$$\begin{aligned} L_{tot}^{free} = \sum_{m=1}^{N} L_m^l &= \sum_{m=1}^{N} \left[ \sum_{i \in \lambda_m} \alpha_t(i)\, \beta_t(i) \right] \\ &= \sum_{m=1}^{N} \sum_{i \in \lambda_m} \alpha_T(i) \end{aligned} \qquad (1.53)$$

where $L_m^l$ represents the likelihood of the current observation sequence, belonging to class $l$, in the model $\lambda_m$. With the likelihoods defined in eqns. 1.52 and 1.53, the gradient equations 1.44 and 1.48 take the forms,
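$$\frac{\partial J}{\partial a_{ij}} = \left[ \frac{1}{\sum_{m=1}^{N} L_m^l} - \frac{\delta_{kl}}{L_l^l} \right] \sum_{t=1}^{T} \beta_t(j)\, b_j(o_t)\, \alpha_{t-1}(i), \quad i, j \in \lambda_k \qquad (1.54)$$

$$\frac{\partial J}{\partial b_j(o_t)} = \left[ \frac{1}{\sum_{m=1}^{N} L_m^l} - \frac{\delta_{kl}}{L_l^l} \right] \frac{\alpha_t(j)\, \beta_t(j)}{b_j(o_t)}, \quad j \in \lambda_k \qquad (1.55)$$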
Now we can summarize the training procedure as follows.
(1) Initialize each HMM, $\lambda_i = (A_i, B_i, \pi_i)$, $1 \le i \le N$, with values generated randomly or using an initialization algorithm such as segmental K-means [jedlik.phy.bme.hu/˜gerjanos/HMM/node19.html-r4#r4].
(2) Take an observation sequence and                Calculate the forward and backward probabilities for each HMM, using the recursions 1.5 and 1.2.        Using the equations 1.52 and 1.53 calculate the likelihoods.        Using the equations 1.54 and 1.55 calculate the gradients with respect to parameters for each model.        Update parameters in each of the models using the eqn. 1.19.        
(3) Go to step (2), unless all the observation sequences are considered.
(4) Repeat steps (2) to (3) until a convergence criterion is satisfied.
This procedure can easily be modified for continuous density HMMs by propagating the gradients, via the chain rule, to the parameters of the continuous probability distributions. It is also worth mentioning that the preprocessors can be trained simultaneously by back propagating the gradients one stage further.
Recognition
Compared with training, recognition is much simpler; the procedure is given below.
(1) Take an observation sequence to be recognized and
    Calculate the forward and backward probabilities for each HMM, using the recursions 1.5 and 1.2.
    As in equation 1.53, calculate the likelihoods $L_m^l$, 1≦m≦N.
    The recognized class l*, to which the observation sequence belongs, is given by

$$l^* = \arg\max_{1 \le m \le N} L_m^l.$$
(2) Go to step (1), unless all the observation sequences to be recognized have been considered.
The recognition rate in this case can be calculated as the ratio of the number of correctly recognized speech units to the total number of speech units (observation sequences) to be recognized.
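By way of illustration only, the recognition procedure may be sketched in Python as follows. The sketch assumes discrete-observation HMMs and uses the standard forward recursion to obtain each likelihood; the function names (`forward_likelihood`, `recognize`, `recognition_rate`), the two toy models, and the test sequences are hypothetical and are not taken from the cited work.

```python
def forward_likelihood(obs, A, B, pi):
    """Likelihood P(obs | model) of a discrete HMM via the forward recursion."""
    n_states = len(pi)
    # Initialization: alpha_1(j) = pi_j * b_j(o_1)
    alpha = [pi[j] * B[j][obs[0]] for j in range(n_states)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][o]
            for j in range(n_states)
        ]
    # Termination: sum the forward variable over the final states
    return sum(alpha)


def recognize(obs, models):
    """Return the index of the most likely model and the list of likelihoods."""
    likelihoods = [forward_likelihood(obs, *model) for model in models]
    best = max(range(len(models)), key=lambda m: likelihoods[m])
    return best, likelihoods


def recognition_rate(sequences, labels, models):
    """Fraction of sequences whose recognized class equals the true label."""
    correct = sum(
        1 for obs, lab in zip(sequences, labels)
        if recognize(obs, models)[0] == lab
    )
    return correct / len(sequences)


# Two hypothetical 2-state models (A, B, pi) over the binary alphabet {0, 1}.
model_0 = ([[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.8, 0.2]], [0.5, 0.5])
model_1 = ([[0.9, 0.1], [0.1, 0.9]], [[0.1, 0.9], [0.2, 0.8]], [0.5, 0.5])
models = [model_0, model_1]

sequences = [[0, 0, 0, 1], [1, 1, 0, 1]]
labels = [0, 1]
rate = recognition_rate(sequences, labels, models)
```

For long observation sequences a practical implementation would typically scale the forward variable or work with log-likelihoods to avoid numerical underflow; that refinement is omitted here for brevity.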
Use of Fourier Transform in Pre-Processing
The Hartley Transform is an integral transform which shares some features with the Fourier Transform, but which (in the discrete case) multiplies the kernel by

$$\cos\left(\frac{2\pi kn}{N}\right) - \sin\left(\frac{2\pi kn}{N}\right) \qquad (1.56)$$

instead of

$$e^{-2\pi i kn/N} = \cos\left(\frac{2\pi kn}{N}\right) - i\,\sin\left(\frac{2\pi kn}{N}\right). \qquad (1.57)$$
The Hartley transform produces real output for a real input, and is its own inverse. It therefore can have computational advantages over the discrete Fourier transform, although analytic expressions are usually more complicated for the Hartley transform.
The discrete version of the Hartley transform can be written explicitly as
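$$\mathcal{H}[a] \equiv \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} a_n \left[\cos\left(\frac{2\pi kn}{N}\right) - \sin\left(\frac{2\pi kn}{N}\right)\right] \qquad (1.58)$$

$$= \Re\,\mathcal{F}[a] - \Im\,\mathcal{F}[a], \qquad (1.59)$$

where $\mathcal{F}$ denotes the Fourier Transform. The Hartley transform obeys the convolution property

$$\mathcal{H}[a * b]_k = \frac{1}{2}\left(A_k B_k - \bar{A}_k \bar{B}_k + A_k \bar{B}_k + \bar{A}_k B_k\right), \qquad (1.60)$$

where

$$\bar{a}_0 \equiv a_0 \qquad (1.61)$$

$$\bar{a}_{n/2} \equiv a_{n/2} \qquad (1.62)$$

$$\bar{a}_k \equiv a_{n-k} \qquad (1.63)$$

(Arndt). Like the fast Fourier Transform algorithm, there is a “fast” version of the Hartley transform algorithm. A decimation in time algorithm makes use of

$$\mathcal{H}_n^{\text{left}}[a] = \mathcal{H}_{n/2}[a^{\text{even}}] + \chi\,\mathcal{H}_{n/2}[a^{\text{odd}}] \qquad (1.64)$$

$$\mathcal{H}_n^{\text{right}}[a] = \mathcal{H}_{n/2}[a^{\text{even}}] - \chi\,\mathcal{H}_{n/2}[a^{\text{odd}}] \qquad (1.65)$$

where χ denotes the sequence with elements

$$a_n \cos\left(\frac{\pi n}{N}\right) - \bar{a}_n \sin\left(\frac{\pi n}{N}\right). \qquad (1.66)$$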
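A decimation in frequency algorithm makes use of

$$\mathcal{H}_n^{\text{even}}[a] = \mathcal{H}_{n/2}[a^{\text{left}} + a^{\text{right}}] \qquad (1.67)$$

$$\mathcal{H}_n^{\text{odd}}[a] = \mathcal{H}_{n/2}[a^{\text{left}} - a^{\text{right}}] \qquad (1.68)$$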
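By way of illustration only, a direct (non-fast) discrete Hartley transform using the cos − sin kernel may be sketched in Python as follows. The sketch assumes a 1/√N scaling factor, under which applying the transform twice returns the original sequence; the function name `dht` and the sample signal are hypothetical.

```python
import math


def dht(a):
    """Discrete Hartley transform with the cos - sin kernel and 1/sqrt(N) scaling."""
    n_pts = len(a)
    scale = 1.0 / math.sqrt(n_pts)  # assumed normalization (makes the transform self-inverse)
    out = []
    for k in range(n_pts):
        acc = 0.0
        for n, a_n in enumerate(a):
            angle = 2.0 * math.pi * k * n / n_pts
            acc += a_n * (math.cos(angle) - math.sin(angle))
        out.append(scale * acc)
    return out


signal = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 4.0]
spectrum = dht(signal)      # real-valued, same length as the input
recovered = dht(spectrum)   # applying the transform a second time inverts it
```

This direct form costs on the order of N² operations and is intended only to make the real-input, real-output and self-inverse properties concrete; the decimation algorithms above reduce that cost.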
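The discrete Fourier transform

$$A_k \equiv \mathcal{F}[a] = \sum_{n=0}^{N-1} e^{-2\pi i kn/N}\, a_n \qquad (1.69)$$

can be written

$$\begin{bmatrix} A_k \\ A_{-k} \end{bmatrix} = \sum_{n=0}^{N-1} \underbrace{\begin{bmatrix} e^{-2\pi i kn/N} & 0 \\ 0 & e^{2\pi i kn/N} \end{bmatrix}}_{F} \begin{bmatrix} a_n \\ a_n \end{bmatrix} \qquad (15)$$

$$= \sum_{n=0}^{N-1} \underbrace{\frac{1}{2}\begin{bmatrix} 1-i & 1+i \\ 1+i & 1-i \end{bmatrix}}_{T^{-1}} \underbrace{\begin{bmatrix} \cos\left(\frac{2\pi kn}{N}\right) & \sin\left(\frac{2\pi kn}{N}\right) \\ -\sin\left(\frac{2\pi kn}{N}\right) & \cos\left(\frac{2\pi kn}{N}\right) \end{bmatrix}}_{H} \underbrace{\frac{1}{2}\begin{bmatrix} 1+i & 1-i \\ 1-i & 1+i \end{bmatrix}}_{T} \begin{bmatrix} a_n \\ a_n \end{bmatrix}, \quad \text{so } F = T^{-1} H T. \qquad (16)$$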
See, mathworld.wolfram.com/HartleyTransform.html.
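By way of illustration only, the factorization F = T⁻¹HT may be checked numerically with the following Python sketch, which builds the 2×2 matrices for a given k, n and N and compares both sides entry by entry. The helper names (`matmul2`, `fourier_block`, `hartley_block`, `max_deviation`) and the choice N = 8 are hypothetical.

```python
import cmath
import math


def matmul2(p, q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(p[r][m] * q[m][c] for m in range(2)) for c in range(2)]
            for r in range(2)]


# Constant similarity matrices T and T^{-1} from the factorization.
T = [[(1 + 1j) / 2, (1 - 1j) / 2], [(1 - 1j) / 2, (1 + 1j) / 2]]
T_INV = [[(1 - 1j) / 2, (1 + 1j) / 2], [(1 + 1j) / 2, (1 - 1j) / 2]]


def fourier_block(k, n, n_pts):
    """Diagonal matrix F with entries exp(-i*theta) and exp(+i*theta)."""
    theta = 2.0 * math.pi * k * n / n_pts
    return [[cmath.exp(-1j * theta), 0.0], [0.0, cmath.exp(1j * theta)]]


def hartley_block(k, n, n_pts):
    """Real matrix H built from cos(theta) and sin(theta)."""
    theta = 2.0 * math.pi * k * n / n_pts
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]


def max_deviation(k, n, n_pts):
    """Largest entrywise difference between F and T^{-1} H T."""
    lhs = fourier_block(k, n, n_pts)
    rhs = matmul2(matmul2(T_INV, hartley_block(k, n, n_pts)), T)
    return max(abs(lhs[r][c] - rhs[r][c]) for r in range(2) for c in range(2))


# Worst-case deviation over all index pairs for a window of 8 samples.
worst = max(max_deviation(k, n, 8) for k in range(8) for n in range(8))
```

The deviation `worst` is expected to be at the level of floating-point rounding error, which reflects that the complex Fourier kernel and the real Hartley-type kernel differ only by a fixed change of basis.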
A fixed pre-processing stage based on the Hartley transform may be considered, on some bases, inferior to one based on the Fourier transform. One explanation for this lies in the respective symmetries and shift invariance properties of the two transforms. We therefore expect improved performance from the Fourier transform even when the pre-processing is adaptive. However, a training procedure which preserves the symmetries of the weight distributions must be used. The main argument for using the Hartley transform is that it avoids complex weights. A Fourier transform, however, can be implemented as a neural network containing only real weights, with a network structure slightly modified from that of the usual MLP. We can easily derive the equations which give the forward and backward passes.
The forward pass is given by

$$\left[\sum_{i=0}^{N-1} x_t(i)\cos\left(\frac{2\pi ij}{N}\right)\right]^2 + \left[\sum_{i=0}^{N-1} x_t(i)\sin\left(\frac{2\pi ij}{N}\right)\right]^2 = \tilde{X}_t^2(j) \qquad (2.1)$$

where N denotes the window length, and $\tilde{X}_t(j) = |X_t(j)|$.
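By way of illustration only, the forward pass of eqn. 2.1 may be sketched in Python as follows. The sketch computes the squared magnitude using only real cosine and sine weights and compares it with the squared magnitude of a directly computed complex DFT; the function names and the sample frame are hypothetical.

```python
import cmath
import math


def power_spectrum_real(x):
    """Squared DFT magnitude per eqn. 2.1, using only real cos/sin weights."""
    n_pts = len(x)
    out = []
    for freq in range(n_pts):  # freq plays the role of the index j
        c = sum(x[i] * math.cos(2.0 * math.pi * i * freq / n_pts) for i in range(n_pts))
        s = sum(x[i] * math.sin(2.0 * math.pi * i * freq / n_pts) for i in range(n_pts))
        out.append(c * c + s * s)
    return out


def power_spectrum_complex(x):
    """Reference value: |X(j)|^2 from a direct complex DFT."""
    n_pts = len(x)
    return [
        abs(sum(x[i] * cmath.exp(-1j * 2.0 * math.pi * i * freq / n_pts)
                for i in range(n_pts))) ** 2
        for freq in range(n_pts)
    ]


frame = [0.2, -1.0, 0.7, 0.0, 1.5, -0.3, 0.9, 0.4]
real_path = power_spectrum_real(frame)
complex_path = power_spectrum_complex(frame)
```

The two results agree to within rounding error, which is why the magnitude spectrum can be produced by a network layer with real weights followed by squaring and summation.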
If we use the notation
            θ      ij        =                  2        ⁢        π        ⁢                                  ⁢        ij            N        ,and error is denoted by J, then we can find
      ∂    J        ∂          θ      ij      simply by using the chain rule,
                                          ∂            J                                ∂                          θ              ij                                      =                              ∑                          t              =              1                        T                    ⁢                                                    ∂                J                                            ∂                                                                            X                      ~                                                              t                      ⁢                                                                                                            2                                    ⁡                                      (                    j                    )                                                                        ⁢                                          ∂                                                                            X                      ~                                        t                    2                                    ⁡                                      (                    j                    )                                                                              ∂                                  θ                  ij                                                                                        (        2.2        )            
We assume that $\partial J / \partial \tilde{X}_t^2(j)$ is known, and $\partial \tilde{X}_t^2(j) / \partial \theta_{ij}$ can be found simply by differentiating eqn. 2.1 with respect to θij. Thus we get

$$\frac{\partial \tilde{X}_t^2(j)}{\partial \theta_{ij}} = 2\,x_t(i)\cos(\theta_{ij}) \sum_{k=0}^{N-1} x_t(k)\sin(\theta_{kj}) - 2\,x_t(i)\sin(\theta_{ij}) \sum_{k=0}^{N-1} x_t(k)\cos(\theta_{kj}) \qquad (2.3)$$
Eqns. 2.2 and 2.3 define the backward pass. Note that θij can be further back propagated as usual.
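By way of illustration only, the backward pass may be sketched in Python as follows for a single frame. The sketch treats every θij as a free parameter, takes the inner sums over the full window so that they match the sums of eqn. 2.1, and assumes a hypothetical error J equal to the sum of the squared magnitudes; the function names and sample values are likewise hypothetical.

```python
import math


def forward(x, theta):
    """X2[j] = (sum_i x[i] cos(theta[i][j]))^2 + (sum_i x[i] sin(theta[i][j]))^2."""
    n_pts = len(x)
    cos_sum = [sum(x[i] * math.cos(theta[i][j]) for i in range(n_pts))
               for j in range(n_pts)]
    sin_sum = [sum(x[i] * math.sin(theta[i][j]) for i in range(n_pts))
               for j in range(n_pts)]
    x2 = [c * c + s * s for c, s in zip(cos_sum, sin_sum)]
    return x2, cos_sum, sin_sum


def backward(x, theta, dj_dx2):
    """dJ/dtheta[i][j] for one frame, combining eqns. 2.2 and 2.3."""
    n_pts = len(x)
    _, cos_sum, sin_sum = forward(x, theta)
    grad = [[0.0] * n_pts for _ in range(n_pts)]
    for i in range(n_pts):
        for j in range(n_pts):
            # Eqn. 2.3: local derivative of X2[j] with respect to theta[i][j].
            local = (2.0 * x[i] * math.cos(theta[i][j]) * sin_sum[j]
                     - 2.0 * x[i] * math.sin(theta[i][j]) * cos_sum[j])
            # Eqn. 2.2 restricted to one frame: chain with the upstream gradient.
            grad[i][j] = dj_dx2[j] * local
    return grad


N = 4
x = [0.5, -1.0, 2.0, 0.25]
theta = [[2.0 * math.pi * i * j / N for j in range(N)] for i in range(N)]
# Hypothetical error J = sum_j X2[j], so dJ/dX2[j] = 1 for every j.
grad = backward(x, theta, [1.0] * N)
```

For a sequence of T frames, the per-frame gradients would be summed over t as in eqn. 2.2. A finite-difference comparison against `forward` is a convenient way to validate such an implementation.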
Training Procedure which Preserves Symmetry
We can use a training procedure which preserves the symmetrical distribution of weights in the Hartley or Fourier transform stages. In addition to improving shift invariance, this approach can reduce the number of parameters. The procedure starts by noting which weights are equal at initialization. The forward and backward passes are then performed as usual. In the update step, however, the same weight update is applied to all of the weights in a set of equal weights, namely the average of the individual weight updates computed for that set. In this way any symmetry existing in the initial weight distribution is preserved. At the same time, the number of parameters is reduced, because only one parameter is needed to represent each class of equal weights.
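By way of illustration only, the symmetry-preserving update may be sketched in Python as follows. The sketch assumes a plain gradient-descent step with a hypothetical learning rate `lr`; the function names, the initial weights, and the gradient values are hypothetical.

```python
def find_equal_groups(weights, tol=1e-12):
    """Group the indices of weights that are equal (within tol) at initialization."""
    groups = []
    for idx, w in enumerate(weights):
        for members in groups:
            if abs(weights[members[0]] - w) <= tol:
                members.append(idx)
                break
        else:
            groups.append([idx])
    return groups


def tied_update(weights, grads, groups, lr):
    """Gradient step in which every group of equal weights receives the mean update."""
    new_weights = list(weights)
    for members in groups:
        # Average the individual updates of all weights in this class.
        mean_grad = sum(grads[idx] for idx in members) / len(members)
        for idx in members:
            new_weights[idx] = weights[idx] - lr * mean_grad
    return new_weights


# Hypothetical initial weights with one repeated value (indices 1 and 3).
weights = [1.0, 0.0, -1.0, 0.0]
groups = find_equal_groups(weights)      # [[0], [1, 3], [2]]
grads = [0.4, 0.2, -0.1, 0.6]            # per-weight gradients from a backward pass
updated = tied_update(weights, grads, groups, lr=0.5)
```

Because the groups are determined once, at initialization, weights that start equal remain equal after every update, and only one stored value per group is strictly required.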
See, “A Hybrid ANN-HMM ASR system with NN based adaptive preprocessing”, Narada Dilp Warakagoda, M. Sc. thesis (Norges Tekniske Høgskole, Institutt for Teleteknikk Transmisjonsteknikk), jedlik.phy.bme.hu/˜gerjanos/HMM/hoved.html.
As an alternative to the Hartley transform, a Wavelet transform may be applied.
The fast Fourier transform (FFT) and the discrete wavelet transform (DWT) are both linear operations that generate a data structure that contains log2 n segments of various lengths, usually filling and transforming it into a different data vector of length 2^n.
The mathematical properties of the matrices involved in the transforms are similar as well. The inverse transform matrix for both the FFT and the DWT is the transpose of the original. As a result, both transforms can be viewed as a rotation in function space to a different domain. For the FFT, this new domain contains basis functions that are sines and cosines. For the wavelet transform, this new domain contains more complicated basis functions called wavelets, mother wavelets, or analyzing wavelets.
Both transforms have another similarity. The basis functions are localized in frequency, making mathematical tools such as power spectra (how much power is contained in a frequency interval) and scalegrams (to be defined later) useful at picking out frequencies and calculating power distributions.
The most interesting dissimilarity between these two kinds of transforms is that individual wavelet functions are localized in space. Fourier sine and cosine functions are not. This localization feature, along with wavelets' localization of frequency, makes many functions and operators using wavelets “sparse” when transformed into the wavelet domain. This sparseness, in turn, results in a number of useful applications such as data compression, detecting features in images, and removing noise from time series.
One way to see the time-frequency resolution differences between the Fourier transform and the wavelet transform is to look at the basis function coverage of the time-frequency plane.
In a windowed Fourier transform (WFT), where the window is simply a square wave, the square wave window truncates the sine or cosine function to fit a window of a particular width. Because a single window is used for all frequencies in the WFT, the resolution of the analysis is the same at all locations in the time-frequency plane.
An advantage of wavelet transforms is that the windows vary. In order to isolate signal discontinuities, one would like to have some very short basis functions. At the same time, in order to obtain detailed frequency analysis, one would like to have some very long basis functions. A way to achieve this is to have short high-frequency basis functions and long low-frequency ones. This happy medium is exactly what you get with wavelet transforms.
One thing to remember is that wavelet transforms do not have a single set of basis functions like the Fourier transform, which utilizes just the sine and cosine functions. Instead, wavelet transforms have an infinite set of possible basis functions. Thus wavelet analysis provides immediate access to information that can be obscured by other time-frequency methods such as Fourier analysis.
Wavelet transforms comprise an infinite set. The different wavelet families make different trade-offs between how compactly the basis functions are localized in space and how smooth they are.
Some of the wavelet bases have fractal structure. The Daubechies wavelet family is one example.
Within each family of wavelets (such as the Daubechies family) are wavelet subclasses distinguished by the number of coefficients and by the level of iteration. Wavelets are classified within a family most often by the number of vanishing moments. This is an extra set of mathematical relationships for the coefficients that must be satisfied, and is directly related to the number of coefficients. For example, within the Coiflet wavelet family are Coiflets with two vanishing moments, and Coiflets with three vanishing moments.
The Discrete Wavelet Transform
Dilations and translations of the “Mother function,” or “analyzing wavelet” Φ(x) define an orthogonal basis, our wavelet basis:
$$\Phi_{(s,l)}(x) = 2^{-s/2}\,\Phi\!\left(2^{-s}x - l\right). \qquad (3)$$
The variables s and l are integers that scale and dilate the mother function Φ(x) to generate wavelets, such as a Daubechies wavelet family. The scale index s indicates the wavelet's width, and the location index l gives its position. Notice that the mother functions are rescaled, or “dilated” by powers of two, and translated by integers. What makes wavelet bases especially interesting is the self-similarity caused by the scales and dilations. Once we know about the mother functions, we know everything about the basis. Note that the scaling-by-two is a feature of the Discrete Wavelet Transform (DWT), and is not, itself, compelled by Wavelet theory. That is, while it is computationally convenient to employ a binary tree, in theory, if one could define a precise wavelet that corresponds to a feature of a data set to be processed, this wavelet could be directly extracted. Clearly, the utility of the DWT is its ability to handle general cases without detailed pattern searching, and therefore the more theoretical wavelet transform techniques based on precise wavelet matching are often reserved for special cases. On the other hand, by carefully selecting wavelet basis functions, or combinations of basis functions, a very sparse representation of a complex and multidimensional data space may be obtained. The utility, however, may depend on being able to operate in the wavelet transform domain (or subsequent transforms of the sparse representation coefficients) for subsequent analysis. Note that, while wavelets are generally represented as two dimensional functions of amplitude and time, it is clear that wavelet theory extends into n-dimensional space.
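As a hedged illustration of how dilation and translation generate a basis, the sketch below builds Φ(s,l)(x) = 2^(-s/2) Φ(2^(-s) x - l) from a mother function and approximates inner products numerically. The Haar function is assumed here purely because it is easy to integrate; the helper names and the integration interval are hypothetical choices.

```python
def haar_mother(x):
    """Haar mother function: +1 on [0, 0.5), -1 on [0.5, 1), else 0."""
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0


def wavelet(s, l, x, mother=haar_mother):
    """Phi_(s,l)(x) = 2^(-s/2) * Phi(2^(-s) * x - l)."""
    return 2.0 ** (-s / 2.0) * mother(2.0 ** (-s) * x - l)


def inner_product(s1, l1, s2, l2, lo=0.0, hi=8.0, n=8192):
    """Midpoint-rule approximation of the inner product on [lo, hi)."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        total += wavelet(s1, l1, x) * wavelet(s2, l2, x)
    return total * dx


norm_sq = inner_product(1, 0, 1, 0)       # same wavelet with itself
cross_scale = inner_product(0, 0, 1, 0)   # different scales
cross_shift = inner_product(0, 0, 0, 1)   # different locations
```

With this choice of mother function, the wavelet at a given scale and location has unit norm and is orthogonal to the wavelets at other scales and locations, up to rounding.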
Thus, the advantageous application of wavelet theory is in cases where a modest number of events, for example having associated limited time and space parameters, are represented in a large data space. If the events could be extracted with fair accuracy, the data space could be replaced with a vector quantized model (VQM), wherein the extracted events correspond to real events, and wherein the VQM is highly compressed as compared to the raw data space. Further, while there may be some data loss as a result of the VQM expression, if the real data corresponds to the wavelet used to model it, then the VQM may actually serve as a form of error correction. Clearly, in some cases, especially where events are overlapping, the possibility for error occurs. Further, while the DWT is often useful in denoising data, in some cases, noise may be inaccurately represented as an event, while in the raw data space, it might have been distinguished. Thus, one aspect of a denoised DWT representation is that there is an implicit presumption that all remaining elements of the representation matrix are signal.
A particular advantage of a DWT approach is that it facilitates a multiresolution analysis of data sets. That is, if the raw data set is decomposed with the basis function, transformed according to a regular progression, e.g., powers of 2, then at each level of decomposition, a level of scale is revealed and presented. It is noted that the transform need not be a simple power of two, and itself may be a function or complex and/or multidimensional function. Typically, non-standard analyses are reserved for instances where there is, or is believed to be, a physical basis for the application of such functions instead of binary splitting of the data space.
Proceeding with the DWT analysis, we span our data domain at different resolutions, see www.eso.org/projects/esomidas/doc/user/98NOV/volb/node308.html, using the analyzing wavelet in a scaling equation:
$$W(x) = \sum_{k=1}^{N-2} (-1)^{k}\, c_{k+1}\, \Phi(2x + k), \qquad (4)$$
where W(x) is the scaling function for the mother function Φ(x), and c_k are the wavelet coefficients. The wavelet coefficients must satisfy linear and quadratic constraints of the form
$$\sum_{k=0}^{N-1} c_k = 2, \qquad \sum_{k=0}^{N-1} c_k\, c_{k+2l} = 2\,\delta_{l,0},$$
where δ is the delta function and l is the location index.
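The linear and quadratic constraints can be checked numerically for a concrete coefficient set. The sketch below assumes the four-coefficient Daubechies filter, scaled so that the coefficients sum to 2; coefficients outside the range 0..N-1 are treated as zero. The function names are illustrative only.

```python
import math


def daubechies4():
    """Four-coefficient Daubechies filter, scaled to sum to 2."""
    r3 = math.sqrt(3.0)
    return [(1 + r3) / 4, (3 + r3) / 4, (3 - r3) / 4, (1 - r3) / 4]


def linear_constraint(c):
    """sum_k c_k, expected to equal 2."""
    return sum(c)


def quadratic_constraint(c, l):
    """sum_k c_k * c_{k+2l}, with out-of-range coefficients taken as zero."""
    n = len(c)
    return sum(c[k] * c[k + 2 * l] for k in range(n) if 0 <= k + 2 * l < n)


c = daubechies4()
lin = linear_constraint(c)          # close to 2
quad0 = quadratic_constraint(c, 0)  # close to 2 (l = 0)
quad1 = quadratic_constraint(c, 1)  # close to 0 (l = 1)
```

The two-coefficient Haar set {1, 1} satisfies the same constraints and can be substituted for `c`.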
One of the most useful features of wavelets is the ease with which one can choose the defining coefficients for a given wavelet system to be adapted for a given problem. In Daubechies' original paper, I. Daubechies, “Orthonormal Bases of Compactly Supported Wavelets,” Comm. Pure Appl. Math., Vol 41, 1988, pp. 906-966, she developed specific families of wavelet systems that were very good for representing polynomial behavior. The Haar wavelet is even simpler, and it is often used for educational purposes. (That is, while it may be limited to certain classes of problems, the Haar wavelet often produces comprehensible output which can be generated into graphically pleasing results).
It is helpful to think of the coefficients {c0, . . . , cn} as a filter. The filter or coefficients are placed in a transformation matrix, which is applied to a raw data vector. The coefficients are ordered using two dominant patterns, one that works as a smoothing filter (like a moving average), and one pattern that works to bring out the data's “detail” information. These two orderings of the coefficients are called a quadrature mirror filter pair in signal processing parlance. A more detailed description of the transformation matrix can be found in W. Press et al., Numerical Recipes in Fortran, Cambridge University Press, New York, 1992, pp. 498-499, 584-602.
To complete our discussion of the DWT, let's look at how the wavelet coefficient matrix is applied to the data vector. The matrix is applied in a hierarchical algorithm, sometimes called a pyramidal algorithm. The wavelet coefficients are arranged so that odd rows contain an ordering of wavelet coefficients that act as the smoothing filter, and the even rows contain an ordering of wavelet coefficients with different signs that act to bring out the data's detail. The matrix is first applied to the original, full-length vector. Then the vector is smoothed and decimated by half and the matrix is applied again. Then the smoothed, halved vector is smoothed, and halved again, and the matrix applied once more. This process continues until a trivial number of “smooth-smooth-smooth . . . ” data remain. That is, each matrix application brings out a higher resolution of the data while at the same time smoothing the remaining data. The output of the DWT consists of the remaining “smooth (etc.)” components, and all of the accumulated “detail” components.
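The pyramidal procedure can be sketched in a few lines. This illustration assumes the Haar filter pair with orthonormal scaling (division by the square root of 2) and an input whose length is a power of two; it operates on pairs directly rather than building the full coefficient matrix, and the function names are hypothetical.

```python
import math


def haar_step(v):
    """One pass: smoothing and detail outputs, each half the input length."""
    r2 = math.sqrt(2.0)
    smooth = [(v[i] + v[i + 1]) / r2 for i in range(0, len(v), 2)]
    detail = [(v[i] - v[i + 1]) / r2 for i in range(0, len(v), 2)]
    return smooth, detail


def haar_pyramid(data):
    """Return (final smooth component, detail vectors from finest to coarsest)."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    smooth, details = list(data), []
    while len(smooth) > 1:
        smooth, d = haar_step(smooth)  # smooth, decimate by half, repeat
        details.append(d)
    return smooth, details


data = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
smooth, details = haar_pyramid(data)
```

The output matches the description above: one remaining "smooth" value plus the accumulated detail vectors, whose lengths halve at each level.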
The Fast Wavelet Transform
The DWT matrix is not sparse in general, so we face the same complexity issues that we had previously faced for the discrete Fourier transform. Wickerhauser, Adapted Wavelet Analysis from Theory to Software, A K Peters, Boston, 1994, pp. 213-214, 237, 273-274, 387. We solve it as we did for the FFT, by factoring the DWT into a product of a few sparse matrices using self-similarity properties. The result is an algorithm that requires only order n operations to transform an n-sample vector. This is the “fast” DWT of Mallat and Daubechies.
Wavelet Packets
The wavelet transform is actually a subset of a far more versatile transform, the wavelet packet transform. M. A. Cody, “The Wavelet Packet Transform,” Dr. Dobb's Journal, Vol 19, April 1994, pp. 44-46, 50-54.
Wavelet packets are particular linear combinations of wavelets. V. Wickerhauser, Adapted Wavelet Analysis from Theory to Software, A K Peters, Boston, 1994, pp. 213-214, 237, 273-274, 387. They form bases which retain many of the orthogonality, smoothness, and localization properties of their parent wavelets. The coefficients in the linear combinations are computed by a recursive algorithm making each newly computed wavelet packet coefficient sequence the root of its own analysis tree.
Adapted Waveforms
Because we have a choice among an infinite set of basis functions, we may wish to find the best basis function for a given representation of a signal. Wickerhauser, Id. A basis of adapted waveform is the best basis function for a given signal representation. The chosen basis carries substantial information about the signal, and if the basis description is efficient (that is, very few terms in the expansion are needed to represent the signal), then that signal information has been compressed.
According to Wickerhauser, Id., some desirable properties for adapted wavelet bases are
1. speedy computation of inner products with the other basis functions;
2. speedy superposition of the basis functions;
3. good spatial localization, so researchers can identify the position of a signal that is contributing a large component;
4. good frequency localization, so researchers can identify signal oscillations; and
5. independence, so that not too many basis elements match the same portion of the signal.
For adapted waveform analysis, researchers seek a basis in which the coefficients, when rearranged in decreasing order, decrease as rapidly as possible. To measure rates of decrease, they use tools from classical harmonic analysis, including calculation of information cost functions. An information cost function is defined as the expense of storing the chosen representation. Examples of such functions include the number above a threshold, concentration, entropy, logarithm of energy, Gauss-Markov calculations, and the theoretical dimension of a sequence. Multiresolution analysis results from the embedded subsets generated by the interpolations at different scales.
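Three of the listed cost functions can be sketched as follows. The definitions used here (a count of coefficients whose magnitude exceeds a threshold, the Shannon entropy of the normalized coefficient energies, and the sum of log energies) are common formulations and are assumptions of this sketch rather than definitions taken from the text.

```python
import math


def count_above_threshold(coeffs, eps):
    """Number of coefficients with magnitude greater than eps."""
    return sum(1 for c in coeffs if abs(c) > eps)


def entropy_cost(coeffs):
    """Shannon entropy of the normalized energies p_i = c_i^2 / sum(c^2)."""
    energy = sum(c * c for c in coeffs)
    if energy == 0.0:
        return 0.0
    h = 0.0
    for c in coeffs:
        p = c * c / energy
        if p > 0.0:
            h -= p * math.log(p)
    return h


def log_energy_cost(coeffs):
    """Sum of log(c_i^2) over the nonzero coefficients."""
    return sum(math.log(c * c) for c in coeffs if c != 0.0)


concentrated = [1.0, 0.0, 0.0, 0.0]  # one term carries all the energy
spread = [0.5, 0.5, 0.5, 0.5]        # energy spread evenly
```

A lower cost indicates a more compact representation: `entropy_cost(concentrated)` is lower than `entropy_cost(spread)`, so a best-basis search under this cost would prefer the basis that yields the concentrated coefficients.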
A function ƒ(x) is projected at each step j onto the subset Vj. This projection is defined by the scalar product cj(k) of ƒ(x) with the scaling function φ(x), which is dilated and translated:
$$c_j(k) = \left\langle f(x),\; 2^{-j}\,\phi\!\left(2^{-j}x - k\right) \right\rangle$$
As φ(x) is a scaling function which has the property:
$$\frac{1}{2}\,\phi\!\left(\frac{x}{2}\right) = \sum_{n} h(n)\,\phi(x - n)$$
or $\hat{\phi}(2\nu) = \hat{h}(\nu)\,\hat{\phi}(\nu)$, where $\hat{h}(\nu)$ is the Fourier transform of the function $\sum_{n} h(n)\,\delta(x - n)$. We get:
$$\hat{h}(\nu) = \sum_{n} h(n)\, e^{-2\pi i n \nu}.$$
The property of the scaling function of φ(x) is that it permits us to compute directly the set cj+1(k) from cj(k). If we start from the set c0(k) we compute all the sets cj (k), with j>0, without directly computing any other scalar product:
$$c_{j+1}(k) = \sum_{n} h(n - 2k)\, c_j(n)$$
At each step, the number of scalar products is divided by 2. Step by step the signal is smoothed and information is lost. The remaining information can be restored using the complementary subspace Wj+1 of Vj+1 in Vj. This subspace can be generated by a suitable wavelet function ψ(x) with translation and dilation.
$$\frac{1}{2}\,\psi\!\left(\frac{x}{2}\right) = \sum_{n} g(n)\,\phi(x - n) \quad\text{or}\quad \hat{\psi}(2\nu) = \hat{g}(\nu)\,\hat{\phi}(\nu)$$
We compute the scalar products $\left\langle f(x),\, 2^{-(j+1)}\,\psi\!\left(2^{-(j+1)}x - k\right) \right\rangle$ with:
$$w_{j+1}(k) = \sum_{n} g(n - 2k)\, c_j(n)$$
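One analysis step, producing both the smoothed set and the wavelet coefficients from the same input, can be sketched as below. The sketch assumes filters indexed from 0, periodic wrapping at the signal boundary, and the Haar pair h = (1/2, 1/2), g = (1/2, -1/2), which matches the normalization of the two-scale relations for a box-shaped scaling function; none of these choices is mandated by the text.

```python
def analysis_step(c, h, g):
    """Compute c_{j+1}(k) = sum_n h(n - 2k) c_j(n) and
    w_{j+1}(k) = sum_n g(n - 2k) c_j(n), with periodic wrapping."""
    n = len(c)
    smooth, detail = [], []
    for k in range(n // 2):
        smooth.append(sum(h[i] * c[(2 * k + i) % n] for i in range(len(h))))
        detail.append(sum(g[i] * c[(2 * k + i) % n] for i in range(len(g))))
    return smooth, detail


HAAR_H = [0.5, 0.5]    # low-pass filter (assumed example)
HAAR_G = [0.5, -0.5]   # high-pass filter (assumed example)

c1, w1 = analysis_step([2.0, 4.0, 6.0, 8.0], HAAR_H, HAAR_G)
```

Each output has half as many samples as the input, which is the halving of scalar products described in the text.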
With this analysis, we have built the first part of a filter bank. In order to restore the original data, Mallat uses the properties of orthogonal wavelets, but the theory has been generalized to a large class of filters by introducing two other filters $\tilde{h}$ and $\tilde{g}$ named conjugated to h and g.
The restoration, that is, the inverse transform after filtering in the transform domain, is performed with:
$$c_j(k) = 2 \sum_{l} \left[ c_{j+1}(l)\,\tilde{h}(k + 2l) + w_{j+1}(l)\,\tilde{g}(k + 2l) \right]$$
In order to get an exact restoration, two conditions are required for the conjugate filters:
Dealiasing condition:
$$\hat{h}\!\left(\nu + \tfrac{1}{2}\right)\hat{\tilde{h}}(\nu) + \hat{g}\!\left(\nu + \tfrac{1}{2}\right)\hat{\tilde{g}}(\nu) = 0$$
Exact restoration:
$$\hat{h}(\nu)\,\hat{\tilde{h}}(\nu) + \hat{g}(\nu)\,\hat{\tilde{g}}(\nu) = 1$$
In the decomposition, the function is successively convolved with the two filters H (low frequencies) and G (high frequencies). Each resulting function is decimated by suppression of one sample out of two. The high frequency signal is left, and we iterate with the low frequency signal. In the reconstruction, we restore the sampling by inserting a 0 between each sample, then we convolve with the conjugate filters $\tilde{H}$ and $\tilde{G}$, we add the resulting functions and we multiply the result by 2. We iterate up to the smallest scale.
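The decomposition and reconstruction loop described in words above can be sketched as a round trip. The sketch assumes the Haar filter pair (1/2, 1/2) and (1/2, -1/2) for both analysis and synthesis and a causal, truncated convolution; it follows the verbal procedure (insert zeros, convolve, add, multiply by 2) and is not a general implementation for arbitrary conjugate filters.

```python
def decompose(c):
    """Filter with H and G, keeping one sample out of two."""
    smooth = [0.5 * (c[i] + c[i + 1]) for i in range(0, len(c), 2)]
    detail = [0.5 * (c[i] - c[i + 1]) for i in range(0, len(c), 2)]
    return smooth, detail


def upsample(v):
    """Insert a 0 after each sample."""
    out = []
    for x in v:
        out.extend([x, 0.0])
    return out


def convolve(u, f):
    """Causal convolution of u with filter f, truncated to len(u)."""
    return [sum(f[i] * u[n - i] for i in range(len(f)) if n - i >= 0)
            for n in range(len(u))]


def reconstruct(smooth, detail):
    """Upsample, convolve with the synthesis filters, add, multiply by 2."""
    a = convolve(upsample(smooth), [0.5, 0.5])
    b = convolve(upsample(detail), [0.5, -0.5])
    return [2.0 * (x + y) for x, y in zip(a, b)]


def forward(c, levels):
    """Iterate the decomposition on the low frequency signal."""
    details = []
    for _ in range(levels):
        c, w = decompose(c)
        details.append(w)
    return c, details


def inverse(smooth, details):
    """Iterate the reconstruction back up to the smallest scale."""
    for w in reversed(details):
        smooth = reconstruct(smooth, w)
    return smooth


data = [3.0, 1.0, 0.0, 4.0, 8.0, 6.0, 9.0, 9.0]
coarse, details = forward(data, 3)
restored = inverse(coarse, details)
```

For this filter pair the restored vector matches the input, and the single coarse value is the mean of the data.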
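Orthogonal wavelets correspond to the restricted case where:
$$\hat{g}(\nu) = e^{-2\pi i \nu}\,\hat{h}^{*}\!\left(\nu + \tfrac{1}{2}\right), \qquad \hat{\tilde{h}}(\nu) = \hat{h}^{*}(\nu), \qquad \hat{\tilde{g}}(\nu) = \hat{g}^{*}(\nu)$$
and
$$\left|\hat{h}(\nu)\right|^{2} + \left|\hat{h}\!\left(\nu + \tfrac{1}{2}\right)\right|^{2} = 1$$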
We can easily see that this set satisfies the dealiasing condition and exact restoration condition. Daubechies wavelets are the only compact solutions. For biorthogonal wavelets we have the relations:
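The two conditions can also be verified numerically for the orthogonal case. The sketch below evaluates ĥ(ν) as the sum of h(n) exp(-2πinν), derives ĝ and the conjugate filters from the orthogonal relations, and returns the left-hand sides of the dealiasing and exact restoration conditions. The Haar and four-coefficient Daubechies filters, scaled to sum to 1, are assumed as test cases, and the function names are illustrative.

```python
import cmath
import math


def h_hat(h, nu):
    """h_hat(nu) = sum_n h(n) * exp(-2*pi*i*n*nu), with n starting at 0."""
    return sum(hn * cmath.exp(-2j * math.pi * n * nu) for n, hn in enumerate(h))


def g_hat(h, nu):
    """Orthogonal case: g_hat(nu) = exp(-2*pi*i*nu) * conj(h_hat(nu + 1/2))."""
    return cmath.exp(-2j * math.pi * nu) * h_hat(h, nu + 0.5).conjugate()


def dealiasing(h, nu):
    """Left-hand side of the dealiasing condition; expected to be 0."""
    return (h_hat(h, nu + 0.5) * h_hat(h, nu).conjugate()
            + g_hat(h, nu + 0.5) * g_hat(h, nu).conjugate())


def exact_restoration(h, nu):
    """Left-hand side of the exact restoration condition; expected to be 1."""
    return (h_hat(h, nu) * h_hat(h, nu).conjugate()
            + g_hat(h, nu) * g_hat(h, nu).conjugate())


R3 = math.sqrt(3.0)
HAAR = [0.5, 0.5]
D4 = [(1 + R3) / 8, (3 + R3) / 8, (3 - R3) / 8, (1 - R3) / 8]
```

Evaluating `dealiasing` and `exact_restoration` at several frequencies for either filter gives values of 0 and 1 respectively, up to floating-point rounding.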
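$$\hat{g}(\nu) = e^{-2\pi i \nu}\,\hat{\tilde{h}}^{*}\!\left(\nu + \tfrac{1}{2}\right), \qquad \hat{\tilde{g}}(\nu) = e^{2\pi i \nu}\,\hat{h}^{*}\!\left(\nu + \tfrac{1}{2}\right)$$
and
$$\hat{h}(\nu)\,\hat{\tilde{h}}(\nu) + \hat{h}^{*}\!\left(\nu + \tfrac{1}{2}\right)\hat{\tilde{h}}^{*}\!\left(\nu + \tfrac{1}{2}\right) = 1$$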
These relations also satisfy the dealiasing condition and exact restoration condition. A large class of compact wavelet functions can be derived. Many sets of filters were proposed, especially for coding. The choice of these filters must be guided by the regularity of the scaling and the wavelet functions. The complexity is proportional to N. The algorithm provides a pyramid of N elements.
The 2D algorithm is based on separate variables leading to prioritizing of x and y directions. The scaling function is defined by: φ(x, y)=φ(x)φ(y)
The passage from a resolution to the next one is done by:
$$f_{j+1}(k_x, k_y) = \sum_{l_x=-\infty}^{+\infty} \sum_{l_y=-\infty}^{+\infty} h(l_x - 2k_x)\, h(l_y - 2k_y)\, f_j(l_x, l_y)$$
The detail signal is obtained from three wavelets:
a vertical wavelet: ψ1(x, y)=φ(x)ψ(y)
a horizontal wavelet: ψ2 (x, y)=ψ(x)φ(y)
a diagonal wavelet: ψ3 (x, y)=ψ(x)ψ(y)
which leads to three sub-images:
$$C^{1}_{j+1}(k_x, k_y) = \sum_{l_x=-\infty}^{+\infty} \sum_{l_y=-\infty}^{+\infty} g(l_x - 2k_x)\, h(l_y - 2k_y)\, f_j(l_x, l_y)$$
$$C^{2}_{j+1}(k_x, k_y) = \sum_{l_x=-\infty}^{+\infty} \sum_{l_y=-\infty}^{+\infty} h(l_x - 2k_x)\, g(l_y - 2k_y)\, f_j(l_x, l_y)$$
$$C^{3}_{j+1}(k_x, k_y) = \sum_{l_x=-\infty}^{+\infty} \sum_{l_y=-\infty}^{+\infty} g(l_x - 2k_x)\, g(l_y - 2k_y)\, f_j(l_x, l_y)$$
TABLE 1
Wavelet transform representation of an image (two dimensional matrix)

F(2)                      | H.D. j = 2                | Horizontal Details j = 1 | Horizontal Details j = 0
V.D. j = 2                | D.D. j = 2                |                          |
Vertical Details j = 1    | Diagonal Details j = 1    |                          |
Vertical Details j = 0    | Diagonal Details j = 0    |                          |
The wavelet transform can be interpreted as the decomposition on frequency sets with a spatial orientation.
The à trous algorithm
The discrete approach of the wavelet transform can be done with a special version of the so-called à trous algorithm (with holes). One assumes that the sampled data {c0(k)} are the scalar products at pixels k of the function ƒ(x) with a scaling function φ(x) which corresponds to a low pass filter.
The first filtering is then performed by a twice magnified scale, leading to the set {c1(k)}. The signal difference {c0(k)}−{c1(k)} contains the information between these two scales and is the discrete set associated with the wavelet transform corresponding to φ(x). The associated wavelet is therefore ψ(x):
$$\frac{1}{2}\,\psi\!\left(\frac{x}{2}\right) = \phi(x) - \frac{1}{2}\,\phi\!\left(\frac{x}{2}\right)$$
The distance between samples increases by a factor 2 from the scale (i−1) (i>0) to the next one, and ci(k) is given by:
$$c_i(k) = \sum_{l} h(l)\, c_{i-1}\!\left(k + 2^{i-1} l\right)$$
and the discrete wavelet transform wi(k) by:
$$w_i(k) = c_{i-1}(k) - c_i(k)$$
The coefficients {h(k)} derive from the scaling function φ(x):
$$\frac{1}{2}\,\phi\!\left(\frac{x}{2}\right) = \sum_{l} h(l)\, \phi(x - l)$$
The algorithm allowing one to rebuild the data frame is evident: the last smoothed array cnp is added to all the differences wi:
$$c_0(k) = c_{n_p}(k) + \sum_{j=1}^{n_p} w_j(k)$$
If we choose the linear interpolation for the scaling function φ:
$$\phi(x) = 1 - |x| \ \text{ if } x \in [-1, 1], \qquad \phi(x) = 0 \ \text{ if } x \notin [-1, 1]$$
we have:
$$\frac{1}{2}\,\phi\!\left(\frac{x}{2}\right) = \frac{1}{4}\,\phi(x + 1) + \frac{1}{2}\,\phi(x) + \frac{1}{4}\,\phi(x - 1)$$
c1 is obtained by:
$$c_1(k) = \frac{1}{4}\, c_0(k - 1) + \frac{1}{2}\, c_0(k) + \frac{1}{4}\, c_0(k + 1)$$
and cj+1 is obtained from cj by
$$c_{j+1}(k) = \frac{1}{4}\, c_j(k - 2^j) + \frac{1}{2}\, c_j(k) + \frac{1}{4}\, c_j(k + 2^j)$$
The wavelet coefficients at the scale j are:
$$C_{j+1}(k) = -\frac{1}{4}\, c_j(k - 2^j) + \frac{1}{2}\, c_j(k) - \frac{1}{4}\, c_j(k + 2^j)$$
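The recursion above can be sketched in a few lines of Python. This is an illustrative sketch only, not the implementation of any particular embodiment: the function names a_trous_linear and rebuild are hypothetical, and the periodic (wrap-around) treatment of the array ends is an assumption, since boundary handling is not specified here.

```python
def a_trous_linear(c0, n_scales):
    """A trous transform with the (1/4, 1/2, 1/4) filter.

    Returns (planes, smooth): planes[j] holds c_j - c_{j+1} and smooth is
    the last smoothed array. Indices wrap around (periodic boundary),
    which is an assumption of this sketch.
    """
    n = len(c0)
    c = [float(v) for v in c0]
    planes = []
    for j in range(n_scales):
        step = 2 ** j  # distance between filter taps at this scale
        c_next = [
            0.25 * c[(k - step) % n] + 0.5 * c[k] + 0.25 * c[(k + step) % n]
            for k in range(n)
        ]
        # Wavelet plane: difference between two successive smoothings.
        planes.append([a - b for a, b in zip(c, c_next)])
        c = c_next
    return planes, c


def rebuild(planes, smooth):
    """Add the last smoothed array to all the wavelet planes."""
    out = list(smooth)
    for plane in planes:
        out = [a + b for a, b in zip(out, plane)]
    return out


signal = [0, 0, 1, 4, 9, 4, 1, 0]
planes, smooth = a_trous_linear(signal, 2)
restored = rebuild(planes, smooth)
```

Because each plane is the difference of two successive smoothed arrays, adding all planes to the last smoothed array returns the input up to rounding error.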
The above à trous algorithm is easily extensible to two dimensional space. This leads to a convolution with a mask of 3×3 pixels for the wavelet connected to linear interpolation. The coefficients of the mask are:
$$\begin{pmatrix} \frac{1}{16} & \frac{1}{8} & \frac{1}{16} \\ \frac{1}{8} & \frac{1}{4} & \frac{1}{8} \\ \frac{1}{16} & \frac{1}{8} & \frac{1}{16} \end{pmatrix}$$
At each scale j, we obtain a set {wj(k,l)} (we will call it wavelet plane in the following), which has the same number of pixels as the image.
If we choose a B3-spline for the scaling function, the coefficients of the convolution mask in one dimension are
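$$\left(\frac{1}{16},\ \frac{1}{4},\ \frac{3}{8},\ \frac{1}{4},\ \frac{1}{16}\right),$$
and in two dimensions:
$$\begin{pmatrix} \frac{1}{256} & \frac{1}{64} & \frac{3}{128} & \frac{1}{64} & \frac{1}{256} \\ \frac{1}{64} & \frac{1}{16} & \frac{3}{32} & \frac{1}{16} & \frac{1}{64} \\ \frac{3}{128} & \frac{3}{32} & \frac{9}{64} & \frac{3}{32} & \frac{3}{128} \\ \frac{1}{64} & \frac{1}{16} & \frac{3}{32} & \frac{1}{16} & \frac{1}{64} \\ \frac{1}{256} & \frac{1}{64} & \frac{3}{128} & \frac{1}{64} & \frac{1}{256} \end{pmatrix}$$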
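The two dimensional coefficients listed above are consistent with taking the outer product of the one dimensional B3-spline filter with itself. The following sketch checks this with exact fractions; the variable names b3_1d and b3_2d are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# One dimensional B3-spline coefficients quoted in the text.
b3_1d = [
    Fraction(1, 16),
    Fraction(1, 4),
    Fraction(3, 8),
    Fraction(1, 4),
    Fraction(1, 16),
]

# Two dimensional mask as the outer product of the 1-D filter with itself.
b3_2d = [[a * b for b in b3_1d] for a in b3_1d]

# Print the mask row by row for comparison with the matrix in the text.
for row in b3_2d:
    print("  ".join(str(value) for value in row))
```

A separable mask of this kind may be applied as two successive one dimensional convolutions, one along each axis, which is usually cheaper than a single 5×5 convolution.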
The Wavelet transform using the Fourier transform
We start with the set of scalar products $c_0(k) = \langle f(x), \phi(x - k) \rangle$. If φ(x) has a cut-off frequency $\nu_c \le \frac{1}{2}$, the data are correctly sampled. The data at the resolution j=1 are:
$$c_1(k) = \left\langle f(x),\ \frac{1}{2}\,\phi\!\left(\frac{x}{2} - k\right) \right\rangle$$
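and we can compute the set c1(k) from c0(k) with a discrete filter $\hat{h}(\nu)$:
$$\hat{h}(\nu) = \begin{cases} \dfrac{\hat{\phi}(2\nu)}{\hat{\phi}(\nu)} & \text{if } |\nu| < \nu_c \\ 0 & \text{if } \nu_c \le |\nu| < \frac{1}{2} \end{cases}$$
and
$$\forall \nu,\ \forall n: \quad \hat{h}(\nu + n) = \hat{h}(\nu)$$
where n is an integer. So:
$$\hat{c}_{j+1}(\nu) = \hat{c}_j(\nu)\, \hat{h}(2^j \nu)$$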
The cut-off frequency is reduced by a factor 2 at each step, allowing a reduction of the number of samples by this factor.
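The wavelet coefficients at the scale j+1 are:
$$w_{j+1}(k) = \left\langle f(x),\ 2^{-(j+1)}\, \psi\!\left(2^{-(j+1)} x - k\right) \right\rangle$$
and they can be computed directly from cj(k) by:
$$\hat{w}_{j+1}(\nu) = \hat{c}_j(\nu)\, \hat{g}(2^j \nu)$$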
where g is the following discrete filter:
$$\hat{g}(\nu) = \begin{cases} \dfrac{\hat{\psi}(2\nu)}{\hat{\phi}(\nu)} & \text{if } |\nu| < \nu_c \\ 1 & \text{if } \nu_c \le |\nu| < \frac{1}{2} \end{cases}$$
and
$$\forall \nu,\ \forall n: \quad \hat{g}(\nu + n) = \hat{g}(\nu)$$
The frequency band is also reduced by a factor 2 at each step. Applying the sampling theorem, we can build a pyramid of $N + \frac{N}{2} + \dots + 1 \approx 2N$ elements. For an image analysis the number of elements is about $\frac{4}{3} N^2$. The overdetermination is not very high.
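As a worked check of these counts, assume that N is a power of two, N = 2^n, and that each level halves the number of samples along each axis (both assumptions are made only for this illustration). The sums are then geometric series:
$$N + \frac{N}{2} + \dots + 1 = \sum_{i=0}^{n} 2^i = 2N - 1 \approx 2N$$
$$N^2 + \frac{N^2}{4} + \dots + 1 = \sum_{i=0}^{n} 4^i = \frac{4N^2 - 1}{3} \approx \frac{4}{3} N^2$$
For example, with N = 8 the one dimensional pyramid holds 8 + 4 + 2 + 1 = 15 elements, and the two dimensional pyramid holds 64 + 16 + 4 + 1 = 85 elements, against 64 pixels in the original image.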
The B-spline functions are compact in the direct space. They correspond to the autoconvolution of a square function. In the Fourier space we have:
$$\hat{B}_l(\nu) = \left(\frac{\sin \pi\nu}{\pi\nu}\right)^{l+1}$$
B3(x) is a set of 4 polynomials of degree 3. We choose the scaling function φ which has a B3(x) profile in the Fourier space:
$$\hat{\phi}(\nu) = \frac{3}{2}\, B_3(4\nu)$$
In the direct space we get:
$$\phi(x) = \frac{3}{8} \left[\frac{\sin\frac{\pi x}{4}}{\frac{\pi x}{4}}\right]^4$$
This function is quite similar to a Gaussian one and converges rapidly to 0. For 2-D the scaling function is defined by
$$\hat{\phi}(u, v) = \frac{3}{2}\, B_3(4r), \quad \text{with } r = \sqrt{u^2 + v^2}.$$
It is an isotropic function.
The wavelet transform algorithm with np scales is the following one:
1. We start with a B3-Spline scaling function and we derive ψ, h and g numerically.
2. We compute the corresponding image FFT. We name T0 the resulting complex array;
3. We set j to 0. We iterate:
4. We multiply Tj by $\hat{g}(2^j u, 2^j v)$. We get the complex array Wj+1. The inverse FFT gives the wavelet coefficients at the scale $2^j$;
5. We multiply Tj by $\hat{h}(2^j u, 2^j v)$. We get the array Tj+1. Its inverse FFT gives the image at the scale $2^{j+1}$. The frequency band is reduced by a factor 2.
6. We increment j.
7. If j<np, we go back to 4.
8. The set {w1, w2, . . . , wnp, cnp} describes the wavelet transform.
If the wavelet is the difference between two resolutions, we have:
$$\hat{\psi}(2\nu) = \hat{\phi}(\nu) - \hat{\phi}(2\nu)$$
and:
$$\hat{g}(\nu) = 1 - \hat{h}(\nu)$$
then the wavelet coefficients $\hat{w}_j(\nu)$ can be computed by $\hat{c}_{j-1}(\nu) - \hat{c}_j(\nu)$.
The Reconstruction
If the wavelet is the difference between two resolutions, an evident reconstruction for a wavelet transform W={w1, w2, . . . , wnp, cnp} is:
$$\hat{c}_0(\nu) = \hat{c}_{n_p}(\nu) + \sum_{j} \hat{w}_j(\nu)$$
But this is a particular case and other wavelet functions can be chosen. The reconstruction can be done step by step, starting from the lowest resolution. At each scale, we have the relations:
$$\hat{c}_{j+1}(\nu) = \hat{h}(2^j \nu)\, \hat{c}_j(\nu), \qquad \hat{w}_{j+1}(\nu) = \hat{g}(2^j \nu)\, \hat{c}_j(\nu)$$
We look for cj knowing cj+1, wj+1, h and g. We restore $\hat{c}_j(\nu)$ with a least mean square estimator:
$$\hat{p}_h(2^j \nu)\, \left|\hat{c}_{j+1}(\nu) - \hat{h}(2^j \nu)\, \hat{c}_j(\nu)\right|^2 + \hat{p}_g(2^j \nu)\, \left|\hat{w}_{j+1}(\nu) - \hat{g}(2^j \nu)\, \hat{c}_j(\nu)\right|^2$$
is minimum. $\hat{p}_h(\nu)$ and $\hat{p}_g(\nu)$ are weight functions which permit a general solution to the restoration of $\hat{c}_j(\nu)$. Differentiating with respect to $\hat{c}_j(\nu)$, we get:
$$\hat{c}_j(\nu) = \hat{c}_{j+1}(\nu)\, \hat{\tilde{h}}(2^j \nu) + \hat{w}_{j+1}(\nu)\, \hat{\tilde{g}}(2^j \nu)$$
where the conjugate filters have the expression:
$$\hat{\tilde{h}}(\nu) = \frac{\hat{p}_h(\nu)\, \hat{h}^*(\nu)}{\hat{p}_h(\nu)\, |\hat{h}(\nu)|^2 + \hat{p}_g(\nu)\, |\hat{g}(\nu)|^2}$$
$$\hat{\tilde{g}}(\nu) = \frac{\hat{p}_g(\nu)\, \hat{g}^*(\nu)}{\hat{p}_h(\nu)\, |\hat{h}(\nu)|^2 + \hat{p}_g(\nu)\, |\hat{g}(\nu)|^2}$$
It is easy to see that these filters satisfy the exact reconstruction equation. In fact, the above pair of equations gives the general solution to this equation. In this analysis, the Shannon sampling condition is always respected. No aliasing exists, so that the dealiasing condition is not necessary (i.e., it is satisfied as a matter of course).
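The exact reconstruction property can be checked directly, assuming the common denominator of the conjugate filters is nonzero at the frequency considered (an assumption added here for the illustration). Substituting the conjugate filter expressions gives:
$$\hat{h}(\nu)\, \hat{\tilde{h}}(\nu) + \hat{g}(\nu)\, \hat{\tilde{g}}(\nu) = \frac{\hat{p}_h(\nu)\, |\hat{h}(\nu)|^2 + \hat{p}_g(\nu)\, |\hat{g}(\nu)|^2}{\hat{p}_h(\nu)\, |\hat{h}(\nu)|^2 + \hat{p}_g(\nu)\, |\hat{g}(\nu)|^2} = 1$$
so that, with $\hat{c}_{j+1}(\nu) = \hat{h}(2^j \nu)\, \hat{c}_j(\nu)$ and $\hat{w}_{j+1}(\nu) = \hat{g}(2^j \nu)\, \hat{c}_j(\nu)$,
$$\hat{c}_{j+1}(\nu)\, \hat{\tilde{h}}(2^j \nu) + \hat{w}_{j+1}(\nu)\, \hat{\tilde{g}}(2^j \nu) = \hat{c}_j(\nu) \left[\hat{h}(2^j \nu)\, \hat{\tilde{h}}(2^j \nu) + \hat{g}(2^j \nu)\, \hat{\tilde{g}}(2^j \nu)\right] = \hat{c}_j(\nu)$$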
The denominator is reduced if we choose:
$$\hat{g}(\nu) = \sqrt{1 - |\hat{h}(\nu)|^2}$$
This corresponds to the case where the wavelet is the difference between the square of two resolutions:
$$|\hat{\psi}(2\nu)|^2 = |\hat{\phi}(\nu)|^2 - |\hat{\phi}(2\nu)|^2$$
The reconstruction algorithm is:
1. We compute the FFT of the image at the low resolution.
2. We set j to np. We iterate:
3. We compute the FFT of the wavelet coefficients at the scale j.
4. We multiply the wavelet coefficients $\hat{w}_j$ by $\hat{\tilde{g}}$.
5. We multiply the image at the lower resolution $\hat{c}_j$ by $\hat{\tilde{h}}$.
6. The inverse Fourier Transform of the addition of $\hat{w}_j \hat{\tilde{g}}$ and $\hat{c}_j \hat{\tilde{h}}$ gives the image cj−1.
7. j=j−1 and we go back to 3.
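The forward and reconstruction steps can be sketched for a one dimensional signal using NumPy. This is a simplified illustration under stated assumptions rather than the algorithm of any embodiment: the low-pass filter cos²(πν) (the transfer function of the coefficients 1/4, 1/2, 1/4) is chosen here for convenience instead of the B3-spline profile, the weights are taken as p_h = p_g = 1, no reduction of sampling is performed between scales, and the function names h_hat, g_hat, forward and inverse are hypothetical.

```python
import numpy as np


def h_hat(nu):
    """Low-pass filter cos^2(pi*nu), periodic in nu with period 1 (assumed)."""
    return np.cos(np.pi * nu) ** 2


def g_hat(nu):
    """High-pass filter when the wavelet is the difference of two resolutions."""
    return 1.0 - h_hat(nu)


def forward(c0, n_scales):
    """Return ([w_1^, ..., w_np^], c_np^) in Fourier space."""
    nu = np.fft.fftfreq(len(c0))  # frequencies in cycles per sample
    c_hat = np.fft.fft(c0)
    w_hats = []
    for j in range(n_scales):
        w_hats.append(c_hat * g_hat(2 ** j * nu))  # wavelet plane j+1
        c_hat = c_hat * h_hat(2 ** j * nu)         # smoothed array j+1
    return w_hats, c_hat


def inverse(w_hats, c_hat):
    """Least-squares reconstruction with conjugate filters, p_h = p_g = 1."""
    nu = np.fft.fftfreq(len(c_hat))
    for j in reversed(range(len(w_hats))):
        h = h_hat(2 ** j * nu)
        g = g_hat(2 ** j * nu)
        denom = np.abs(h) ** 2 + np.abs(g) ** 2  # never zero for this filter
        c_hat = (c_hat * np.conj(h) + w_hats[j] * np.conj(g)) / denom
    return np.fft.ifft(c_hat).real


x = np.sin(0.3 * np.arange(32)) + 0.05 * np.arange(32)
w_hats, c_hat = forward(x, 3)
restored = inverse(w_hats, c_hat)
```

With these filters the denominator stays at or above 1/2, so the division is well conditioned. A pyramidal variant with a cut-off frequency, as described in the text, would additionally reduce the number of samples at each scale.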
The use of a scaling function with a cut-off frequency allows a reduction of sampling at each scale, and limits the computing time and the memory size.
Thus, it is seen that the DWT is in many respects comparable to the DFT, and, where convenient, may be employed in place thereof. While substantial work has been done in the application of wavelet analysis and filtering to image data, it is noted that the wavelet transform analysis is not so limited. In particular, one embodiment of the present invention applies the transform to describe statistical events represented within a multidimensional data-space. By understanding the multi-resolution interrelationships of various events and probabilities of events, in a time-space representation, a higher level analysis is possible than with other common techniques. Likewise, because aspects of the analysis are relatively content dependent, they may be accelerated by digital signal processing techniques or array processors, without need to apply artificial intelligence. On the other hand, the transformed (and possibly filtered) data set is advantageously suitable for intelligent analysis, either by machine or human.
Generally, there will be no need to perform an inverse transform on the data set. On the other hand, the wavelet analysis may be useful for characterizing and analyzing only a limited range of events. Advantageously, if an event is recognized with high reliability within a transform domain, the event may be extracted from the data representation and an inverse transform performed to provide the data set absent the recognized feature or event. This allows a number of different feature-specific transforms to be conducted, and analyzed. This analysis may be in series, that is, having a defined sequence of transforms, feature extractions, and inverse transforms. On the other hand, the process may be performed in parallel. That is, the data set is subjected to various “tests”, which are conducted by optimally transforming the data to determine if a particular feature (event) is present, determined with high reliability. As each feature is identified, the base data set may be updated for the remaining “tests”, which will likely simplify the respective analysis, or improve the reliability of the respective determination. As each event or feature is extracted, the data set becomes simpler and simpler, until only noise remains.
It should be noted that, in some instances, a high reliability determination of the existence of an event cannot be concluded. In those cases, it is also possible to perform a contingent analysis, leading to a plurality of possible results for each contingency. Thus, a putative feature is extracted or not extracted from the data set and both results passed on for further analysis. Where one of the contingencies is inconsistent with a subsequent high reliability determination, that entire branch of analysis may be truncated. Ideally, the output consists of a data representation with probabilistic representation of the existence of events or features represented within the data set. As discussed below, this may form the basis for a risk-reliability output space representation of the data, useable directly by a human (typically in the form of a visual output) and/or for further automated analysis.
It is also noted that the data set is not temporally static, and therefore the analysis may be conducted in real time based on a stream of data.
The Process to be Estimated
The Kalman filter addresses the general problem of trying to estimate the state $x \in \mathbb{R}^n$ of a discrete-time controlled process that is governed by the linear stochastic difference equation
$$x_k = A x_{k-1} + B u_k + w_{k-1}, \qquad (3.1)$$
with a measurement $z_k \in \mathbb{R}^m$ that is
$$z_k = H x_k + v_k. \qquad (3.2)$$
The random variables wk and vk represent the process and measurement noise (respectively). They are assumed to be independent (of each other), white, and with normal probability distributions
$$p(w) \sim N(0, Q), \qquad (3.3)$$
$$p(v) \sim N(0, R). \qquad (3.4)$$
In practice, the process noise covariance Q and measurement noise covariance R matrices might change with each time step or measurement; however, here we assume they are constant.
Kalman, Rudolf Emil, “A New Approach to Linear Filtering and Prediction Problems”, Transactions of the ASME, Journal of Basic Engineering, 82D:35-45 (1960) (describes the namesake Kalman filter, which is a set of mathematical equations that provides an efficient computational (recursive) solution of the least-squares method. The filter is very powerful in several aspects: it supports estimations of past, present, and even future states, and it can do so even when the precise nature of the modeled system is unknown.)
The n×n matrix A in the difference equation (3.1) relates the state at the previous time step k−1 to the state at the current step k, in the absence of either a driving function or process noise. Note that in practice A might change with each time step, but here we assume it is constant. The n×l matrix B relates the optional control input $u \in \mathbb{R}^l$ to the state x. The m×n matrix H in the measurement equation (3.2) relates the state to the measurement zk. In practice H might change with each time step or measurement, but here we assume it is constant.
The Computational Origins of the Filter
We define $\hat{x}_k^- \in \mathbb{R}^n$ (note the “super minus”) to be our a priori state estimate at step k given knowledge of the process prior to step k, and $\hat{x}_k \in \mathbb{R}^n$ to be our a posteriori state estimate at step k given measurement zk. We can then define a priori and a posteriori estimate errors as
$$e_k^- \equiv x_k - \hat{x}_k^- \quad \text{and} \quad e_k \equiv x_k - \hat{x}_k.$$
The a priori estimate error covariance is then
$$P_k^- = E\left[e_k^- e_k^{-T}\right], \qquad (3.5)$$
and the a posteriori estimate error covariance is
$$P_k = E\left[e_k e_k^T\right]. \qquad (3.6)$$
In deriving the equations for the Kalman filter, we begin with the goal of finding an equation that computes an a posteriori state estimate $\hat{x}_k$ as a linear combination of an a priori estimate $\hat{x}_k^-$ and a weighted difference between an actual measurement zk and a measurement prediction $H\hat{x}_k^-$ as shown below in (3.7). Some justification for (3.7) is given in “The Probabilistic Origins of the Filter” found below. See, www.cs.unc.edu/˜welch/kalman/kalman_filter/kalman-1.htm, expressly incorporated herein by reference.
$$\hat{x}_k = \hat{x}_k^- + K\left(z_k - H\hat{x}_k^-\right) \qquad (3.7)$$
The difference $(z_k - H\hat{x}_k^-)$ in (3.7) is called the measurement innovation, or the residual. The residual reflects the discrepancy between the predicted measurement $H\hat{x}_k^-$ and the actual measurement zk. A residual of zero means that the two are in complete agreement.
The n×m matrix K in (3.7) is chosen to be the gain or blending factor that minimizes the a posteriori error covariance (3.6). This minimization can be accomplished by first substituting (3.7) into the above definition for ek, substituting that into (3.6), performing the indicated expectations, taking the derivative of the trace of the result with respect to K, setting that result equal to zero, and then solving for K. For more details see [Maybeck79; Brown92; Jacobs93]. One form of the resulting K that minimizes (3.6) is given by
$$K_k = P_k^- H^T (H P_k^- H^T + R)^{-1} = \frac{P_k^- H^T}{H P_k^- H^T + R}. \tag{3.8}$$
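The minimization described above can be written out in a few lines. The following is a sketch under the usual assumptions, which are not restated in this passage: the measurement model $z_k = H x_k + v_k$ of (3.2), measurement noise $v_k$ with covariance R, and $v_k$ uncorrelated with the a priori error $e_k^-$.

```latex
\begin{aligned}
e_k &= x_k - \hat{x}_k
     = (I - K H)\, e_k^- - K v_k, \\
P_k &= E[e_k e_k^T]
     = (I - K H)\, P_k^- (I - K H)^T + K R K^T, \\
\frac{\partial\, \mathrm{tr}(P_k)}{\partial K}
    &= -2\, P_k^- H^T + 2\, K \left( H P_k^- H^T + R \right) = 0, \\
K   &= P_k^- H^T \left( H P_k^- H^T + R \right)^{-1}.
\end{aligned}
```

The last line agrees with (3.8). The derivative step uses the symmetry of $P_k^-$ and R together with the identities $\partial\,\mathrm{tr}(KM)/\partial K = M^T$ and $\partial\,\mathrm{tr}(K S K^T)/\partial K = 2KS$ for symmetric S.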
Looking at (3.8) we see that as the measurement error covariance R approaches zero, the gain K weights the residual more heavily. Specifically,
$$\lim_{R_k \to 0} K_k = H^{-1}.$$
On the other hand, as the a priori estimate error covariance Pk− approaches zero, the gain K weights the residual less heavily. Specifically,
$$\lim_{P_k^- \to 0} K_k = 0.$$
Another way of thinking about the weighting by K is that as the measurement error covariance R approaches zero, the actual measurement zk is “trusted” more and more, while the predicted measurement H{circumflex over (x)}k− is trusted less and less. On the other hand, as the a priori estimate error covariance Pk− approaches zero the actual measurement zk is trusted less and less, while the predicted measurement H{circumflex over (x)}k− is trusted more and more.
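The two limits can be checked numerically in the scalar case, where (3.8) reduces to K = P⁻H / (H P⁻ H + R). The sketch below is illustrative only; the function name `scalar_gain` and the scale factor H = 2 are assumptions introduced for this example and do not appear in the referenced text.

```python
def scalar_gain(p_prior, h, r):
    """Scalar form of (3.8): K = P^- H / (H P^- H + R)."""
    return p_prior * h / (h * p_prior * h + r)


H = 2.0  # hypothetical scalar measurement factor

# With P^- fixed, shrinking R drives K toward 1/H: the measurement is trusted more.
for r in (1.0, 1e-3, 1e-9):
    print(f"R = {r:g}   K = {scalar_gain(1.0, H, r):.6f}")

# With R fixed, shrinking P^- drives K toward 0: the prediction is trusted more.
for p in (1.0, 1e-3, 1e-9):
    print(f"P- = {p:g}   K = {scalar_gain(p, H, 1.0):.6f}")
```

In the first loop the gain climbs toward 1/H = 0.5, and in the second it falls toward zero, matching the two limits stated above.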
The Probabilistic Origins of the Filter
The justification for (3.7) is rooted in the probability of the a priori estimate $\hat{x}_k^-$ conditioned on all prior measurements $z_k$ (Bayes' rule). For now let it suffice to point out that the Kalman filter maintains the first two moments of the state distribution,
$$E[x_k] = \hat{x}_k$$
$$E[(x_k - \hat{x}_k)(x_k - \hat{x}_k)^T] = P_k.$$
The a posteriori state estimate (3.7) reflects the mean (the first moment) of the state distribution; it is normally distributed if the conditions of (3.3) and (3.4) are met. The a posteriori estimate error covariance (3.6) reflects the variance of the state distribution (the second central moment). In other words,
$$p(x_k \mid z_k) \sim N\left(E[x_k],\, E[(x_k - \hat{x}_k)(x_k - \hat{x}_k)^T]\right) = N(\hat{x}_k, P_k).$$
For more details on the probabilistic origins of the Kalman filter, see [Maybeck79; Brown92; Jacobs93].
The Discrete Kalman Filter Algorithm
The Kalman filter estimates a process by using a form of feedback control: the filter estimates the process state at some time and then obtains feedback in the form of (noisy) measurements. As such, the equations for the Kalman filter fall into two groups: time update equations and measurement update equations. The time update equations are responsible for projecting forward (in time) the current state and error covariance estimates to obtain the a priori estimates for the next time step. The measurement update equations are responsible for the feedback—i.e. for incorporating a new measurement into the a priori estimate to obtain an improved a posteriori estimate.
The time update equations can also be thought of as predictor equations, while the measurement update equations can be thought of as corrector equations. Indeed the final estimation algorithm resembles that of a predictor-corrector algorithm for solving numerical problems as shown below in FIG. 5, which shows the ongoing discrete Kalman filter cycle. The time update projects the current state estimate ahead in time. The measurement update adjusts the projected estimate by an actual measurement at that time.
The specific equations for the time and measurement updates are presented below:
Discrete Kalman filter time update equations.
$$\hat{x}_k^- = A \hat{x}_{k-1} + B u_k \tag{3.9}$$
$$P_k^- = A P_{k-1} A^T + Q \tag{3.10}$$
Again notice how the time update equations (3.9) and (3.10) project the state and covariance estimates forward from time step k−1 to step k. A and B are from (3.1), while Q is from (3.3). Initial conditions for the filter are discussed in the earlier references.
Discrete Kalman filter measurement update equations.
$$K_k = P_k^- H^T (H P_k^- H^T + R)^{-1} \tag{3.11}$$
$$\hat{x}_k = \hat{x}_k^- + K_k (z_k - H \hat{x}_k^-) \tag{3.12}$$
$$P_k = (I - K_k H) P_k^- \tag{3.13}$$
The first task during the measurement update is to compute the Kalman gain, Kk. Notice that the equation given here as (3.11) is the same as (3.8). The next step is to actually measure the process to obtain zk, and then to generate an a posteriori state estimate by incorporating the measurement as in (3.12). Again (3.12) is simply (3.7) repeated here for completeness. The final step is to obtain an a posteriori error covariance estimate via (3.13). All of the Kalman filter equations can be algebraically manipulated into several forms. Equation (3.8) represents the Kalman gain in one popular form.
After each time and measurement update pair, the process is repeated with the previous a posteriori estimates used to project or predict the new a priori estimates. This recursive nature is one of the very appealing features of the Kalman filter—it makes practical implementations much more feasible than (for example) an implementation of a Wiener filter [Brown92] which is designed to operate on all of the data directly for each estimate. The Kalman filter instead recursively conditions the current estimate on all of the past measurements. FIG. 6 offers a complete picture of the operation of the filter, combining the high-level diagram of FIG. 5 with the equations (3.9) to (3.13).
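The predict and correct cycle of (3.9) to (3.13) can be sketched in a few lines of plain Python. This is a minimal illustration rather than the implementation of any cited reference: the constant-velocity model, the matrices A, H, Q, the value of R, the initial conditions, and the helper names are all assumptions chosen for the example. The control term B u_k is taken as zero, and the measurement is scalar so that the matrix inverse in (3.11) becomes a division. Measurements are noise-free so that the run is reproducible.

```python
def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def time_update(x_post, p_post, A, Q):
    """Equations (3.9)-(3.10), assuming no control input (B u_k = 0)."""
    x_prior = matmul(A, x_post)
    apat = matmul(matmul(A, p_post), transpose(A))
    p_prior = [[u + v for u, v in zip(ru, rv)] for ru, rv in zip(apat, Q)]
    return x_prior, p_prior


def measurement_update(x_prior, p_prior, z, H, R):
    """Equations (3.11)-(3.13) for one scalar measurement (H is 1 x n)."""
    pht = matmul(p_prior, transpose(H))          # P^- H^T, an n x 1 column
    s = matmul(H, pht)[0][0] + R                 # H P^- H^T + R, a scalar here
    K = [[row[0] / s] for row in pht]            # (3.11)
    residual = z - matmul(H, x_prior)[0][0]      # measurement innovation
    x_post = [[xi[0] + ki[0] * residual] for xi, ki in zip(x_prior, K)]  # (3.12)
    kh = matmul(K, H)                            # n x n
    n = len(kh)
    i_minus_kh = [[(1.0 if i == j else 0.0) - kh[i][j] for j in range(n)]
                  for i in range(n)]
    p_post = matmul(i_minus_kh, p_prior)         # (3.13)
    return x_post, p_post


# Hypothetical constant-velocity model: state = [position, velocity], unit time step.
A = [[1.0, 1.0], [0.0, 1.0]]
H = [[1.0, 0.0]]                    # only the position is measured
Q = [[1e-4, 0.0], [0.0, 1e-4]]
R = 0.25

x, P = [[0.0], [0.0]], [[10.0, 0.0], [0.0, 10.0]]
for k in range(1, 51):
    z = 0.5 * k                     # noise-free position of a target moving 0.5 per step
    x, P = time_update(x, P, A, Q)
    x, P = measurement_update(x, P, z, H, R)

print("position estimate:", x[0][0])
print("velocity estimate:", x[1][0])
```

Each pass through the loop reuses the previous a posteriori pair (x, P) as the starting point for the next prediction, which is the recursion described above. The velocity is never measured directly, yet it is recovered through the coupling in A.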
Filter Parameters and Tuning
In the actual implementation of the filter, the measurement noise covariance R is usually measured prior to operation of the filter. Measuring the measurement error covariance R is generally practical (possible) because we need to be able to measure the process anyway (while operating the filter) so we should generally be able to take some off-line sample measurements in order to determine the variance of the measurement noise.
The determination of the process noise covariance Q is generally more difficult as we typically do not have the ability to directly observe the process we are estimating. Sometimes a relatively simple (poor) process model can produce acceptable results if one “injects” enough uncertainty into the process via the selection of Q. Certainly in this case one would hope that the process measurements are reliable.
In either case, whether or not we have a rational basis for choosing the parameters, often times superior filter performance (statistically speaking) can be obtained by tuning the filter parameters Q and R. The tuning is usually performed off-line, frequently with the help of another (distinct) Kalman filter in a process generally referred to as system identification.
Under conditions where Q and R are in fact constant, both the estimation error covariance Pk and the Kalman gain Kk will stabilize quickly and then remain constant (see the filter update equations in FIG. 6). If this is the case, these parameters can be pre-computed by either running the filter off-line, or for example by determining the steady-state value of Pk as described in [Grewal93].
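Both pre-computation routes can be illustrated for the scalar case A = H = 1. The sketch below iterates the covariance recursion off-line until it settles, then compares the result with the closed-form fixed point of the same recursion, P⁻ = (Q + √(Q² + 4QR)) / 2, which follows from setting P_k = P_{k−1} in (3.10), (3.11) and (3.13). The function name and the values Q = 0.01 and R = 0.1 are assumptions made for the example.

```python
import math


def steady_state_scalar(q, r, p0=1.0, tol=1e-12, max_iter=10_000):
    """Iterate the scalar covariance recursion (A = H = 1) until P_k stops changing."""
    p = p0
    gain = 0.0
    for step in range(1, max_iter + 1):
        p_prior = p + q                      # (3.10)
        gain = p_prior / (p_prior + r)       # (3.11)
        p_next = (1.0 - gain) * p_prior      # (3.13)
        if abs(p_next - p) < tol:
            return p_next, gain, step
        p = p_next
    return p, gain, max_iter


q, r = 0.01, 0.1                             # hypothetical constant noise variances
p_ss, k_ss, steps = steady_state_scalar(q, r)

# Closed-form fixed point of the same recursion, for comparison.
p_prior_closed = (q + math.sqrt(q * q + 4.0 * q * r)) / 2.0
k_closed = p_prior_closed / (p_prior_closed + r)

print("iterations:", steps)
print("gain (iterated):   ", k_ss)
print("gain (closed form):", k_closed)
```

Once the gain is known, the on-line filter can skip (3.10), (3.11) and (3.13) and apply only the state update with the fixed gain. How quickly the recursion settles depends on the ratio of Q to R; a very small Q relative to R gives a small gain and a slower approach.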
It is frequently the case however that the measurement error (in particular) does not remain constant. For example, when observing otherwise similar transmitters, the noise in measurements of nearby transmitters will generally be smaller than that in far-away transmitters. Also, the process noise Q is sometimes changed dynamically during filter operation (becoming Qk) in order to adjust to different dynamics. For example, in the case of tracking the head of a user of a 3D virtual environment we might reduce the magnitude of Qk if the user seems to be moving slowly, and increase the magnitude if the dynamics start changing rapidly. In such cases Qk might be chosen to account for both uncertainty about the user's intentions and uncertainty in the model.
The Extended Kalman Filter (EKF)
The Process to be Estimated
As described above, the Kalman filter addresses the general problem of trying to estimate the state $x \in \mathbb{R}^n$ of a discrete-time controlled process that is governed by a linear stochastic difference equation. But what happens if the process to be estimated and (or) the measurement relationship to the process is non-linear? Some of the most interesting and successful applications of Kalman filtering have been such situations. A Kalman filter that linearizes about the current mean and covariance is referred to as an extended Kalman filter or EKF.
In something akin to a Taylor series, we can linearize the estimation around the current estimate using the partial derivatives of the process and measurement functions to compute estimates even in the face of non-linear relationships. To do so, we must begin by modifying some of the analysis presented above. Let us assume that our process again has a state vector $x \in \mathbb{R}^n$, but that the process is now governed by the non-linear stochastic difference equation
$$x_k = f(x_{k-1}, u_k, w_{k-1}), \tag{4.1}$$
with a measurement $z \in \mathbb{R}^m$ that is
$$z_k = h(x_k, v_k), \tag{4.2}$$
where the random variables $w_k$ and $v_k$ again represent the process and measurement noise as in (3.3) and (3.4). In this case the non-linear function ƒ in the difference equation (4.1) relates the state at the previous time step k−1 to the state at the current time step k. It includes as parameters any driving function $u_k$ and the zero-mean process noise $w_k$. The non-linear function h in the measurement equation (4.2) relates the state $x_k$ to the measurement $z_k$. See, www.cs.unc.edu/~welch/kalman/kalman_filter/kalman-2.html, expressly incorporated herein by reference.
In practice of course one does not know the individual values of the noise $w_k$ and $v_k$ at each time step. However, one can approximate the state and measurement vector without them as
$$\tilde{x}_k = f(\hat{x}_{k-1}, u_k, 0) \tag{4.3}$$
and
$$\tilde{z}_k = h(\tilde{x}_k, 0), \tag{4.4}$$
where $\hat{x}_k$ is some a posteriori estimate of the state (from a previous time step k).
It is important to note that a fundamental flaw of the EKF is that the distributions (or densities in the continuous case) of the various random variables are no longer normal after undergoing their respective nonlinear transformations. The EKF is simply an ad hoc state estimator that only approximates the optimality of Bayes' rule by linearization. Some interesting work has been done by Julier et al. in developing a variation to the EKF, using methods that preserve the normal distributions throughout the non-linear transformations [Julier96].
The Computational Origins of the Filter
To estimate a process with non-linear difference and measurement relationships, we begin by writing new governing equations that linearize an estimate about (4.3) and (4.4),
$$x_k \approx \tilde{x}_k + A(x_{k-1} - \hat{x}_{k-1}) + W w_{k-1}, \tag{4.5}$$
$$z_k \approx \tilde{z}_k + H(x_k - \tilde{x}_k) + V v_k, \tag{4.6}$$
where $x_k$ and $z_k$ are the actual state and measurement vectors, $\tilde{x}_k$ and $\tilde{z}_k$ are the approximate state and measurement vectors from (4.3) and (4.4), $\hat{x}_k$ is an a posteriori estimate of the state at step k, and the random variables $w_k$ and $v_k$ represent the process and measurement noise as in (3.3) and (3.4).
A is the Jacobian matrix of partial derivatives of ƒ with respect to x, that is,
$$A_{(i,j)} = \frac{\partial f_{(i)}}{\partial x_{(j)}}(\hat{x}_{k-1}, u_k, 0),$$
W is the Jacobian matrix of partial derivatives of ƒ with respect to w,
$$W_{(i,j)} = \frac{\partial f_{(i)}}{\partial w_{(j)}}(\hat{x}_{k-1}, u_k, 0),$$
H is the Jacobian matrix of partial derivatives of h with respect to x,
$$H_{(i,j)} = \frac{\partial h_{(i)}}{\partial x_{(j)}}(\tilde{x}_k, 0),$$
V is the Jacobian matrix of partial derivatives of h with respect to v,
$$V_{(i,j)} = \frac{\partial h_{(i)}}{\partial v_{(j)}}(\tilde{x}_k, 0).$$
Note that for simplicity in the notation we do not use the time step subscript k with the Jacobians A, W, H, and V, even though they are in fact different at each time step.
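Because the Jacobians must be re-evaluated at every step, it can help to check an analytic Jacobian against a numerical one. The sketch below approximates the matrix of partial derivatives by central differences. The function names and the range-and-bearing measurement function are hypothetical, chosen only because the analytic Jacobian of that function is easy to write down; they are not taken from the referenced text.

```python
import math


def numerical_jacobian(func, point, eps=1e-6):
    """Central-difference estimate of J[i][j] = d func_i / d x_j at `point`."""
    rows = len(func(point))
    jac = [[0.0] * len(point) for _ in range(rows)]
    for j in range(len(point)):
        up = list(point)
        down = list(point)
        up[j] += eps
        down[j] -= eps
        f_up, f_down = func(up), func(down)
        for i in range(rows):
            jac[i][j] = (f_up[i] - f_down[i]) / (2.0 * eps)
    return jac


def range_bearing(state):
    """Hypothetical noise-free measurement h(x, 0): range and bearing to (px, py)."""
    px, py = state
    return [math.hypot(px, py), math.atan2(py, px)]


H_numeric = numerical_jacobian(range_bearing, [3.0, 4.0])

# Analytic Jacobian at (3, 4), where the range is 5:
# [[px/r, py/r], [-py/r^2, px/r^2]]
H_analytic = [[0.6, 0.8], [-0.16, 0.12]]

for row_n, row_a in zip(H_numeric, H_analytic):
    print(row_n, row_a)
```

The same helper can be applied to ƒ to check A, or to a function of the noise argument to check W and V, provided the evaluation point matches the one given in the definitions above.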
Now we define a new notation for the prediction error,
$$\tilde{e}_{x_k} \equiv x_k - \tilde{x}_k, \tag{4.7}$$
and the measurement residual,
$$\tilde{e}_{z_k} \equiv z_k - \tilde{z}_k. \tag{4.8}$$
Remember that in practice one does not have access to $x_k$ in (4.7); it is the actual state vector, i.e. the quantity one is trying to estimate. On the other hand, one does have access to $z_k$ in (4.8); it is the actual measurement that one is using to estimate $x_k$. Using (4.7) and (4.8) we can write governing equations for an error process as
$$\tilde{e}_{x_k} \approx A(x_{k-1} - \hat{x}_{k-1}) + \varepsilon_k, \tag{4.9}$$
$$\tilde{e}_{z_k} \approx H \tilde{e}_{x_k} + \eta_k, \tag{4.10}$$
where $\varepsilon_k$ and $\eta_k$ represent new independent random variables having zero mean and covariance matrices $W Q W^T$ and $V R V^T$, with Q and R as in (3.3) and (3.4) respectively.
Notice that the equations (4.9) and (4.10) are linear, and that they closely resemble the difference and measurement equations (3.1) and (3.2) from the discrete Kalman filter. This motivates us to use the actual measurement residual $\tilde{e}_{z_k}$ in (4.8) and a second (hypothetical) Kalman filter to estimate the prediction error $\tilde{e}_{x_k}$ given by (4.9). This estimate, call it $\hat{e}_k$, could then be used along with (4.7) to obtain the a posteriori state estimates for the original non-linear process as
$$\hat{x}_k = \tilde{x}_k + \hat{e}_k. \tag{4.11}$$
The random variables of (4.9) and (4.10) have approximately the following probability distributions:
$$p(\tilde{e}_{x_k}) \sim N(0, E[\tilde{e}_{x_k} \tilde{e}_{x_k}^T])$$
$$p(\varepsilon_k) \sim N(0, W Q_k W^T)$$
$$p(\eta_k) \sim N(0, V R_k V^T)$$
Given these approximations and letting the predicted value of $\hat{e}_k$ be zero, the Kalman filter equation used to estimate $\hat{e}_k$ is
$$\hat{e}_k = K_k \tilde{e}_{z_k}. \tag{4.12}$$
By substituting (4.12) back into (4.11) and making use of (4.8) we see that we do not actually need the second (hypothetical) Kalman filter:
$$\hat{x}_k = \tilde{x}_k + K_k \tilde{e}_{z_k} = \tilde{x}_k + K_k (z_k - \tilde{z}_k) \tag{4.13}$$
Equation (4.13) can now be used for the measurement update in the extended Kalman filter, with {tilde over (x)}k and {tilde over (z)}k coming from (4.3) and (4.4), and the Kalman gain Kk coming from (3.11) with the appropriate substitution for the measurement error covariance.
The complete set of EKF equations is shown below. Note that we have substituted {circumflex over (x)}k− for {tilde over (x)}k to remain consistent with the earlier “super minus” a priori notation, and that we now attach the subscript k to the Jacobians A, W, H, and V, to reinforce the notion that they are different at (and therefore must be recomputed at) each time step.
EKF time update equations.
$$\hat{x}_k^- = f(\hat{x}_{k-1}, u_k, 0) \tag{4.14}$$
$$P_k^- = A_k P_{k-1} A_k^T + W_k Q_{k-1} W_k^T \tag{4.15}$$
As with the basic discrete Kalman filter, the time update equations (4.14) and (4.15) project the state and covariance estimates from the previous time step k−1 to the current time step k. Again ƒ in (4.14) comes from (4.3), Ak and Wk are the process Jacobians at step k, and Qk is the process noise covariance (3.3) at step k.
EKF measurement update equations.
$$K_k = P_k^- H_k^T (H_k P_k^- H_k^T + V_k R_k V_k^T)^{-1} \tag{4.16}$$
$$\hat{x}_k = \hat{x}_k^- + K_k (z_k - h(\hat{x}_k^-, 0)) \tag{4.17}$$
$$P_k = (I - K_k H_k) P_k^- \tag{4.18}$$
As with the basic discrete Kalman filter, the measurement update equations (4.16), (4.17) and (4.18) correct the state and covariance estimates with the measurement $z_k$. Again h in (4.17) comes from (4.4), $H_k$ and $V_k$ are the measurement Jacobians at step k, and $R_k$ is the measurement noise covariance (3.4) at step k. (Note we now subscript R allowing it to change with each measurement.)
The basic operation of the EKF is the same as the linear discrete Kalman filter as shown in FIG. 5. FIG. 7 offers a complete picture of the operation of the EKF, combining the high-level diagram of FIG. 5 with the equations (4.14) through (4.18).
An important feature of the EKF is that the Jacobian Hk in the equation for the Kalman gain Kk serves to correctly propagate or “magnify” only the relevant component of the measurement information. For example, if there is not a one-to-one mapping between the measurement zk and the state via h, the Jacobian Hk affects the Kalman gain so that it only magnifies the portion of the residual zk−h({circumflex over (x)}k−,0) that does affect the state. Of course if over all measurements there is not a one-to-one mapping between the measurement zk and the state via h, then as you might expect the filter will quickly diverge. In this case the process is unobservable.
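A scalar example makes the role of the Jacobian and the observability caveat concrete. The sketch below applies (4.14) to (4.18) to an assumed model that is not part of the referenced text: a constant state, ƒ(x) = x, observed through the non-linear function h(x) = x², with W = V = 1. All names and numeric values are illustrative, and the measurements are noise-free so that the run is reproducible.

```python
def ekf_step(x_post, p_post, z, q, r):
    """One EKF cycle for the assumed model f(x) = x, h(x) = x**2 (W = V = 1)."""
    # Time update (4.14)-(4.15): the process Jacobian A = df/dx is 1.
    x_prior = x_post
    p_prior = p_post + q
    # Measurement update (4.16)-(4.18): H = dh/dx evaluated at the a priori estimate.
    h_jac = 2.0 * x_prior
    gain = p_prior * h_jac / (h_jac * p_prior * h_jac + r)
    x_new = x_prior + gain * (z - x_prior ** 2)
    p_new = (1.0 - gain * h_jac) * p_prior
    return x_new, p_new


def run(x0, true_x=2.0, steps=100, q=1e-5, r=0.01):
    """Run the filter from initial guess x0 against noise-free measurements."""
    x_hat, p = x0, 1.0
    for _ in range(steps):
        z = true_x ** 2
        x_hat, p = ekf_step(x_hat, p, z, q, r)
    return x_hat, p


x_hat, p = run(1.0)
x_neg, _ = run(-1.0)
print("estimate from a positive initial guess:", x_hat)
print("estimate from a negative initial guess:", x_neg)
```

Because h(x) = x² maps +2 and −2 to the same measurement, the filter settles on whichever root lies on the side of its initial guess. This is a small instance of the situation described above in which the mapping between measurement and state is not one-to-one.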
The Process Model
In a simple example we attempt to estimate a scalar random constant, a voltage for example. Let's assume that we have the ability to take measurements of the constant, but that the measurements are corrupted by a 0.1 volt RMS white measurement noise (e.g. our analog to digital converter is not very accurate). In this example, our process is governed by the linear difference equation
$$x_k = A x_{k-1} + B u_k + w_k = x_{k-1} + w_k,$$
with a measurement $z \in \mathbb{R}^1$ that is
$$z_k = H x_k + v_k = x_k + v_k.$$
The state does not change from step to step so A=1. There is no control input so u=0. Our noisy measurement is of the state directly so H=1. (Notice that we dropped the subscript k in several places because the respective parameters remain constant in our simple model.)
The Filter Equations and Parameters
Our time update equations are {circumflex over (x)}k−={circumflex over (x)}k−1, Pk−=Pk−1+Q, and our measurement update equations are
$$K_k = P_k^- (P_k^- + R)^{-1} = \frac{P_k^-}{P_k^- + R},$$
$$\hat{x}_k = \hat{x}_k^- + K_k (z_k - \hat{x}_k^-),$$
$$P_k = (1 - K_k) P_k^-. \tag{5.1}$$
Presuming a very small process variance, we let Q=1e-5. (We could certainly let Q=0 but assuming a small but non-zero value gives us more flexibility in “tuning” the filter as we will demonstrate below.) Let's assume that from experience we know that the true value of the random constant has a standard normal probability distribution, so we will “seed” our filter with the guess that the constant is 0. In other words, before starting we let {circumflex over (x)}k−1=0.
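The scalar filter of (5.1) is short enough to write out in full. The sketch below uses Q = 1e-5 and the initial estimate of 0 from the text, with R = 0.01 as the variance corresponding to 0.1 volt RMS noise. The true voltage of −0.4, the initial covariance of 1, the random seed, and the function name are assumptions made only for this illustration.

```python
import random


def estimate_constant(measurements, q=1e-5, r=0.01, x0=0.0, p0=1.0):
    """Scalar Kalman filter with A = H = 1; returns (estimate, covariance) per step."""
    x_hat, p = x0, p0
    history = []
    for z in measurements:
        # Time update: the state is modeled as constant, so only P grows by Q.
        p_prior = p + q
        # Measurement update, equations (5.1).
        gain = p_prior / (p_prior + r)
        x_hat = x_hat + gain * (z - x_hat)
        p = (1.0 - gain) * p_prior
        history.append((x_hat, p))
    return history


rng = random.Random(0)            # fixed seed so the run is repeatable
true_voltage = -0.4               # hypothetical value of the random constant
zs = [true_voltage + rng.gauss(0.0, 0.1) for _ in range(50)]

history = estimate_constant(zs)
final_estimate, final_cov = history[-1]
print("final estimate:", final_estimate)
print("final error covariance:", final_cov)
```

The error covariance falls below R after the first measurement and keeps shrinking, so later measurements move the estimate less and less. Increasing Q keeps the gain larger and makes the filter more responsive to each new measurement, at the price of a noisier estimate.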
Similarly we need to choose an initial value for $P_{k-1}$, call it $P_0$. If we were absolutely certain that our initial state estimate $\hat{x}_0 = 0$ was correct, we would let $P_0 = 0$. However, given the uncertainty in our initial estimate $\hat{x}_0$, choosing $P_0 = 0$ would cause the filter to initially and always believe $\hat{x}_k = 0$. As it turns out, the alternative choice is not critical. We could choose almost any $P_0 \neq 0$ and the filter would eventually converge. It is convenient, for example, to start with $P_0 = 1$.
Brown92 Brown, R. G. and P. Y. C. Hwang. 1992. Introduction to Random Signals and Applied Kalman Filtering, Second Edition, John Wiley & Sons, Inc.
Gelb74 Gelb, A. 1974. Applied Optimal Estimation, MIT Press, Cambridge, Mass.
Grewal93 Grewal, Mohinder S., and Angus P. Andrews (1993). Kalman Filtering Theory and Practice. Upper Saddle River, N.J. USA, Prentice Hall.
Jacobs93 Jacobs, O. L. R. 1993. Introduction to Control Theory, 2nd Edition. Oxford University Press.
Julier96 Julier, Simon and Jeffrey Uhlmann. “A General Method of Approximating Nonlinear Transformations of Probability Distributions,” Robotics Research Group, Department of Engineering Science, University of Oxford [cited 14 Nov. 1995]. Available from www.robots.ox.ac.uk/~siju/work/publications/Unscented.zip.
Kalman60 Kalman, R. E. 1960. “A New Approach to Linear Filtering and Prediction Problems,” Transactions of the ASME–Journal of Basic Engineering, pp. 35-45 (March 1960).
Lewis86 Lewis, Richard. 1986. Optimal Estimation with an Introduction to Stochastic Control Theory, John Wiley & Sons, Inc.
Maybeck79 Maybeck, Peter S. 1979. Stochastic Models, Estimation, and Control, Volume 1, Academic Press, Inc.
Sorenson70 Sorenson, H. W. 1970. “Least-Squares estimation: from Gauss to Kalman,” IEEE Spectrum, vol. 7, pp. 63-68, July 1970.
See, also:
“A New Approach for Filtering Nonlinear Systems” by S. J. Julier, J. K. Uhlmann, and H. F. Durrant-Whyte, Proceedings of the 1995 American Control Conference, Seattle, Wash., Pages: 1628-1632. Available from www.robots.ox.ac.uk/~siju/work/publications/ACC95_pr.zip
Simon Julier's home page at www.robots.ox.ac.uk/~siju/.
“Fuzzy Logic Simplifies Complex Control Problems”, Tom Williams, Computer Design, Mar. 1, 1991.
“Neural Network And Fuzzy Systems: A Dynamical Systems Approach To Machine Intelligence”, Bart Kosko; Prentice Hall 1992; Englewood Cliffs, N.J.; pp. 13, 18, 19.
B. Krogh et al., “Integrated Path Planning and Dynamic Steering Control for Autonomous Vehicles,” 1986.
Brockstein, A., “GPS-Kalman-Augmented Inertial Navigation System Performance,” Naecom '76 Record, pp. 864-868, 1976.
Brooks, R., “Solving the Find-Path Problem by Good Representation of Free Space,” IEEE Transactions on Systems, Man, and Cybernetics, pp. 190-197, March-April, 1983.
Brown, R., “Kalman Filtering Study Guide-A Guided Tour,” Iowa State University, pp. 1-19, 1984.
Brown, R., Random Signal Analysis & Kalman Filtering, Chapter 5, pp. 181-209, no date.
D. Kuan et al., “Model-based Geometric Reasoning for Autonomous Road Following,” pp. 416-423, 1987.
D. Kuan, “Autonomous Robotic Vehicle Road Following,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 647-658, 1988.
D. Touretzky et al., “What's Hidden in the Hidden Layers?,” Byte, pp. 227-233, August 1989.
Data Fusion in Pathfinder and Travtek, Roy Sumner, VNIS '91 conference, October 20-23, Dearborn, Mich.
Database Accuracy Effects on Vehicle Positioning as Measured by the Certainty Factor, R. Borcherts, C. Collier, E. Koch, R. Bennet, VNIS '91 conference from October 20-23, Dearborn, Mich.
Daum, F., et al., “Decoupled Kalman Filters for Phased Array Radar Tracking,” IEEE Transactions on Automatic Control, pp. 269-283, March 1983.
Denavit, J. et al., “A Kinematic Notation for Lower-Pair Mechanisms Based on Matrices,” pp. 215-221, June, 1955.
Dickmanns, E. et al., “Guiding Land Vehicles Along Roadways by Computer Vision”, The Tools for Tomorrow, Oct. 23, 1985.
Edward J. Krakiwsky, “A Kalman Filter for Integrating Dead Reckoning, Map Matching and GPS Positioning”, IEEE Plans '88 Position Location and Navigation Symposium Record, Kissimmee, Fla. USA, Nov. 29-Dec. 2, 1988, pp. 39-46.
Fuzzy Systems and Applications, United Signals and Systems, Inc., Bart Kosko with Fred Watkins, Jun. 5-7, 1991.
IEEE Journal of Robotics & Automation, vol. 4, No. 4, August 1988, IEEE (New York) J. LeM “Domain-dependent reasoning for visual navigation of roadways,” pp. 419-427 (Nissan) Mar. 24, 1988.
J. Crowley, “Part 3: Knowledge Based Supervision of Robotics Systems,” 1989 IEEE Conference on Robotics and Automation, pp. 37-42, 1989.
Kaczmarek, K. W., “Cellular Networking: A Carrier's Perspective”, 39th IEEE Vehicular Technology Conference, May 1, 1989, vol. 1, pp. 1-6.
Knowledge Representation in Fuzzy Logic, Lotfi A. Zadeh, IEEE Transactions on Knowledge and Data Engineering, vol. 1, No. 1, March 1989.
Sennott, J. et al., “A Queuing Model for Analysis of A Bursty Multiple-Access Communication Channel,” IEEE, pp. 317-321, 1981.
Sheridan, T. “Three Models of Preview Control,” IEEE Transactions on Human Factors in Electronics, pp. 91-102, June 1966.
Sheth, P., et al., “A Generalized Symbolic Notation for Mechanism,” Transactions of the ASME, pp. 102-112, February 1971.
Sorenson, W., “Least-Squares estimation: From Gauss to Kalman,” IEEE Spectrum, pp. 63-68, July 1970.
“Automobile Navigation System Using Beacon Information” pp. 139-145.
W. Uttal, “Teleoperators,” Scientific American, pp. 124-129, December 1989.
Wareby, Jan, “Intelligent Signaling: FAR & SS7”, Cellular Business, pp. 58, 60 and 62, July 1990.
Wescon/87 Conference Record, vol. 31, 1987, (LA, US) M. T. Allison et al “The next generation navigation system”, pp. 941-947.
Ekaterina L.-Rundblad, Alexei Maidan, Peter Novak, Valeriy Labunets, Fast Color Wavelet-Haar Hartley-Prometheus Transforms For Image Processing, www.prometheus-inc.com/asi/algebra2003/papers/katya2.pdf.
Richard Tolimieri and Myoung An, Group Filters And Image Processing, www.prometheus-inc.com/asi/algebra2003/papers/tolimieri.pdf.
Daniel N. Rockmore, Recent Progress And Applications In Group FFTs, www.prometheus-inc.com/asi/algebra2003/papers/rockmore.pdf.
Thomas Theußl and Robert F. Tobler and Eduard Gröller, “The Multi-Dimensional Hartley Transform as a Basis for Volume Rendering”, citeseer.nj.nec.com/450842.html.
See also, U.S. patent Nos. (expressly incorporated herein by reference):    U.S. Pat. Nos. 3,582,926; 4,291,749; 4,314,232; 4,337,821; 4,401,848; 4,407,564; 4,419,730; 4,441,405; 4,451,887; 4,477,874; 4,536,739; 4,582,389; 4,636,782; 4,653,003; 4,707,788; 4,731,769; 4,740,779; 4,740,780; 4,752,824; 4,787,039; 4,795,223; 4,809,180; 4,818,048; 4,827,520; 4,837,551; 4,853,687; 4,876,594; 4,914,705; 4,967,178; 4,988,976; 4,995,258; 4,996,959; 5,006,829; 5,043,736; 5,051,735; 5,070,323; 5,070,931; 5,119,504; 5,198,797; 5,203,499; 5,214,413; 5,214,707; 5,235,633; 5,257,190; 5,274,560; 5,278,532; 5,293,115; 5,299,132; 5,334,974; 5,335,276; 5,335,743; 5,345,817; 5,351,041; 5,361,165; 5,371,510; 5,400,045; 5,404,443; 5,414,439; 5,416,318; 5,422,565; 5,432,904; 5,440,428; 5,442,553; 5,450,321; 5,450,329; 5,450,613; 5,475,399; 5,479,482; 5,483,632; 5,486,840; 5,493,658; 5,494,097; 5,497,271; 5,497,339; 5,504,622; 5,506,595; 5,511,724; 5,519,403; 5,519,410; 5,523,559; 5,525,977; 5,528,248; 5,528,496; 5,534,888; 5,539,869; 5,547,125; 5,553,661; 5,555,172; 5,555,286; 5,555,502; 5,559,520; 5,572,204; 5,576,724; 5,579,535; 5,627,547; 5,638,305; 5,648,769; 5,650,929; 5,653,386; 5,654,715; 5,666,102; 5,670,953; 5,689,252; 5,691,695; 5,702,165; 5,712,625; 5,712,640; 5,714,852; 5,717,387; 5,732,368; 5,734,973; 5,742,226; 5,752,754; 5,758,311; 5,777,394; 5,781,872; 5,919,239; 6,002,326; 6,013,956; 6,078,853; 6,104,101; and 6,449,535.    M. Krebs, “Cars That Tell You Where To Go,” The New York Times, Dec. 15, 1996, section 11, p. 1.    L. Kraar, “Knowledge Engineering,” Fortune, Oct. 28, 1996, pp. 163-164.    S. Heuchert, “Eyes Forward: An ergonomic solution to driver information overload,” Society of Automobile Engineering, September 1996, pp. 27-31.    J. Braunstein, “Airbag Technology Take Off,” Automotive & Transportation Interiors, August 1996, p. 16.    I. Adcock, “No Longer Square,” Automotive & Transportation Interiors, August 1996, p. 38
One embodiment of the present invention advances the art by explicitly communicating reliability or risk information to the user. Therefore, in addition to communicating an event or predicted event, the system also computes or determines a reliability of the information and outputs this information. The reliability referred to herein generally is unavailable to the original detection device, though such device may generate its own reliability information for a sensor reading.
Therefore, the user interface according to this embodiment is improved by outputting information relating to both the event and a reliability or risk with respect to that information.
According to a preferred embodiment of the invention, a vehicle travel information system is provided, for example integrated with a vehicular navigation system. In a symmetric peer-to-peer model, each vehicle includes both environmental event sensors and a user interface, but the present invention is not dependent on both aspects being present in a device. As the vehicle travels, and as time advances, its context sphere is altered. For any context sphere, certain events or sensed conditions will be most relevant. These most relevant events or sensed conditions, to the extent known by the system, are then output through a user interface. However, often, the nature or existence of a relevant or potentially relevant event is unreliable, or reliance thereon entails risk.
In the case of a vehicle traveling along a roadway, there are two particular risks to analyze: first, that the recorded event may not exist (false positive), and second, that an absence of indication of an event is in error (false negative). The degree of risk may be conveyed to the user, for example, by an indication of color (e.g., red, yellow, green) or magnitude (e.g., a bar graph or dial).
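By way of non-limiting illustration, the color and bar-graph indications mentioned above might be derived from a numeric risk value as sketched below. The function names, the 0-to-1 risk scale, and the 0.33 and 0.66 breakpoints are assumptions made for this example only and are not specified in the present disclosure.

```python
def risk_to_color(risk: float) -> str:
    """Map a risk value in [0, 1] to a traffic-light color.

    The breakpoints are illustrative assumptions, not values from the text.
    """
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must lie in [0, 1]")
    if risk < 0.33:
        return "green"
    if risk < 0.66:
        return "yellow"
    return "red"


def risk_to_bar(risk: float, width: int = 10) -> str:
    """Render the same risk value as a fixed-width text bar graph."""
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must lie in [0, 1]")
    filled = round(risk * width)
    return "#" * filled + "-" * (width - filled)
```

A display routine could call either function, or both, on the same risk value, so that the event and its risk are presented as separate pieces of information.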
In many cases, the degree of risk is calculable, and thus may be readily available. For example, if the sensed event is a detection of police radar, reliability may be inferred from the time since the event was last recorded. If a car is traveling along a highway, and receives a warning of traffic enforcement radar from a car one mile ahead, there is a high degree of certainty that the traffic enforcement radar will actually exist as the vehicle proceeds along the highway. Further, if the traffic radar is in a fixed location, there is a high degree of certainty that there is no traffic enforcement radar closer than one mile. On the other hand, if a warning of traffic radar at a given location is two hours old, then the risk of reliance on this information is high, and the warning should be deemed general and advisory of the nature of risks in the region. Preferably, as such a warning ages, the temporal proximity of the warning is spread from its original focus.
By contrast, if the warning relates to a pothole in a certain lane on the highway, the temporal range of risk is much broader: even a week later, the reliability of the continued existence at that location remains high. However, over the course of a year, the reliability wanes. On the other hand, while there may be a risk of other potholes nearby, the particular detected pothole would not normally move.
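The contrast between a short-lived radar warning and a long-lived pothole warning may be sketched as a per-event-type decay, as follows. This is a minimal sketch assuming an exponential (half-life) model; the 5 minute figure for a radar trap is the example given elsewhere in this description, whereas the 60 day pothole half-life and the names `HALF_LIFE_MINUTES` and `persistence` are hypothetical values and identifiers chosen only for illustration.

```python
# Half-life, in minutes, of the continued existence of each event type.
HALF_LIFE_MINUTES = {
    "radar_trap": 5.0,            # example figure used in this description
    "pothole": 60.0 * 24 * 60,    # assumed: roughly two months
}


def persistence(event_type: str, age_minutes: float) -> float:
    """Estimate the probability that an event of the given type still exists.

    Assumes simple exponential decay; a deployed system might use a
    different model for each event type.
    """
    if age_minutes < 0:
        raise ValueError("age_minutes must be non-negative")
    half_life = HALF_LIFE_MINUTES[event_type]
    return 0.5 ** (age_minutes / half_life)
```

Under these assumed values, a week-old pothole report retains most of its reliability, while a two-hour-old radar report retains essentially none, which matches the qualitative behavior described above.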
The algorithm may also be more complex. For example, if a traffic accident occurs at a particular location, there are generally acceptable predictions of the effect of the accident on road traffic for many hours thereafter. These include rubbernecking, migrations of the traffic pattern, and secondary accidents. These considerations may be programmed, and the set of events and datapoints used to predict spatial and temporal effects, as well as the reliability of the existence of such effects. This, in turn, may be used to advise a traveler to take a certain route to a destination.
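One way such predictions might feed a route recommendation is to weight each route's predicted delay by the estimated probability that the accident's effect is still present, and to advise the route with the lowest expected travel time. The sketch below assumes this expected-value criterion; the `Route` fields and the `advise` function are hypothetical and are not drawn from the present disclosure.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Route:
    name: str
    base_minutes: float    # travel time with no incident effects
    delay_minutes: float   # predicted extra time if the effect is present
    p_effect: float        # estimated probability the effect is still present

    def expected_minutes(self) -> float:
        """Probability-weighted travel time for this route."""
        return self.base_minutes + self.p_effect * self.delay_minutes


def advise(routes: Iterable[Route]) -> Route:
    """Return the route with the lowest expected travel time."""
    return min(routes, key=Route.expected_minutes)
```

For instance, a highway with a 20 minute base time, a 30 minute predicted delay, and a 0.6 probability of that delay has an expected time of about 38 minutes, so an unaffected 30 minute side road would be advised instead; if the probability falls to 0.2, the highway is again preferred.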
Eventually, the reliability of the information is inferred to be so low as to cause an expiration of the event, although preferably a statistical database is maintained to indicate geographic regional issues broadly.
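The expiration step, together with the retained regional statistics, might be sketched as follows. The 0.05 expiry threshold, the dictionary layout of an event record, and the use of a simple counter keyed by region and event type are all assumptions made for this illustration.

```python
from collections import Counter
from typing import Dict, List

EXPIRY_THRESHOLD = 0.05  # assumed cutoff below which an event is expired


def expire_events(events: List[Dict], regional_stats: Counter,
                  threshold: float = EXPIRY_THRESHOLD) -> List[Dict]:
    """Drop events whose reliability has fallen below the threshold.

    Each expired event is tallied by (region, type) so that broad
    regional tendencies remain available after the event itself is gone.
    """
    live = []
    for event in events:
        if event["reliability"] < threshold:
            regional_stats[(event["region"], event["type"])] += 1
        else:
            live.append(event)
    return live
```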
Therefore, the system and method according to the present invention provide an output that can be considered “two dimensional” (or higher dimensional): the nature of the warning, and the reliability of the warning. In conjunction, the system may therefore output a reliability of an absence of warning. In order to conserve communications bandwidth, it is preferred that an absence of warning is inferred from the existence of a communications channel with a counterpart, along with a failure of a detection of an event triggering a warning. Alternately, such communications may be explicit.
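The inference of an absence of warning from a live but silent communications channel may be illustrated with Bayes' rule. The sketch below is one possible formulation and is not taken from the present disclosure: it assumes that the counterpart never reports a non-existent event, that `prior_event_prob` is a known prior probability of an event in the region, and that `detection_prob` is the counterpart's probability of detecting and reporting an event that does exist. All names are hypothetical.

```python
from typing import Optional


def absence_reliability(link_alive: bool, prior_event_prob: float,
                        detection_prob: float) -> Optional[float]:
    """Probability that no event exists, given a silent counterpart.

    Returns None when there is no communications channel, because silence
    then carries no information about the presence or absence of an event.
    """
    if not link_alive:
        return None
    p_clear = 1.0 - prior_event_prob
    # Silence occurs if there is no event, or if an event was missed.
    p_silent = p_clear + prior_event_prob * (1.0 - detection_prob)
    if p_silent == 0.0:
        return 0.0  # silence is impossible under these inputs
    return p_clear / p_silent
```

With no explicit message exchanged, the output still has both dimensions: the nature of the indication (no warning) and a reliability for it.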
The present invention can provide a mobile warning system having a user interface for conveying an event warning and an associated reliability or risk of reliance on the warning.
Preferably, the reliability or risk of reliance is assessed based on the time elapsed between the original sensing of an event or condition and the user's arrival in proximity to it. The reliability may also be based on the nature of the event or sensed condition. An intrinsic reliability of the original sensed event or condition may also be relayed, as distinct from the reliability or risk of reliance assuming the event or condition to have been accurately sensed.
In order to determine risk, statistical and probabilistic techniques may often be used. Alternately, non-linear techniques, such as neural networks, may be employed. In employing a probabilistic scheme, a sensor reading at time zero and the associated intrinsic probability of error are stored. A model is associated with the sensor reading to determine a decay pattern. Thus, in the case of traffic enforcement radar, the half-life for a “radar trap” for K band radar being fixed in one location is, for example, about 5 minutes. Thereafter, the enforcement officer may give a ticket, and proceed up the road. Thus, for times less than three minutes, the probability of the traffic enforcement radar remaining in fixed position is high. For this same time-period, the probability that the traffic enforcement officer has moved up the road against the direction of traffic flow is low. A car following 3 miles behind a reliable sensor at 60 mph would therefore have a highly reliable indication of prospective conditions. As the time increases, so does the risk; a car following ten miles behind a sensor would only have a general warning of hazards, and a general indication of the lack thereof. However, over time, a general (and possibly diurnal or other cyclic time-sensitive variation) risk of travel within a region may be established, to provide a baseline.
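The probabilistic scheme described above may be sketched as follows: the age of a reading is derived from the following distance and speed, and the stored intrinsic reliability is decayed toward a regional baseline. The exponential form, the function names, and the treatment of the baseline as a floor are assumptions made for this example; only the 5 minute half-life and the 3 mile, 10 mile, and 60 mph figures come from the text.

```python
def age_minutes(distance_miles: float, speed_mph: float) -> float:
    """Time for a following vehicle to reach the location of the reading."""
    if speed_mph <= 0:
        raise ValueError("speed_mph must be positive")
    return distance_miles * 60.0 / speed_mph


def reliability(p0: float, age_min: float, half_life_min: float = 5.0,
                baseline: float = 0.0) -> float:
    """Decay an intrinsic reliability p0 toward a regional baseline.

    p0 is the reliability of the reading at time zero (one minus the
    intrinsic probability of error). The default half-life is the radar
    trap example from the text.
    """
    decay = 0.5 ** (age_min / half_life_min)
    return baseline + (p0 - baseline) * decay
```

Under these assumptions, a car 3 miles behind the sensor at 60 mph sees a reading that is 3 minutes old, and a car 10 miles behind sees one that is 10 minutes old, that is, two half-lives, so only a quarter of the original reliability above baseline remains.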
It is noted that the risks are not limited to traffic enforcement radar or laser. Rather, the scheme according to the present invention is generalized to all sorts of risks. For example, a sensor may detect or predict sun glare. In this case, a model would be quite accurate for determining changes over time, and assuming a reliable model is employed, this condition could generally be accurately predicted.
Another example is road flooding. This may be detected, for example, through the use of optical sensors, tire drag sensors, “splash” sensors, or other known sensors. In this case, the relevant time-constant for onset and decay will be variable, although for a given location, the dynamics may be modeled with some accuracy, based on sensed actual conditions, regional rainfall, ground saturation, and particular storm pattern. Therefore, a puddle or hydroplaning risk may be communicated to the driver in terms of location, likely magnitude, and confidence.
It is noted that these three independent parameters need not all be conveyed to the user. For example, the geographic proximity to an event location may be used to trigger an output. Therefore, no independent output of location may be necessary in this case. In some cases, the magnitude of the threat is relevant, in other cases it is not. In many present systems (e.g., radar detection), threat magnitude is used as a surrogate for risk. However, it is well understood that there are high magnitude artifacts, and low magnitude true threats, and thus this paradigm has limited basis for use. The use of risk or confidence as an independent factor may be express or intermediate. Thus, a confidence threshold may be internally applied before communicating an event to the user. In determining or predicting risk or confidence, it may be preferred to provide a central database. Therefore, generally more complex models may be employed, supported by a richer data set derived from many measurements over an extended period of time. The central database may either directly perform the necessary computations, or convey an appropriate model, preferably limited to the context (e.g., geography, time, general environmental conditions), for local calculation of risk.
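As an illustration of using geographic proximity as the trigger and confidence as an internally applied threshold, the following sketch outputs an event only when the vehicle is within a trigger radius and the confidence meets a minimum. The planar coordinates in miles, the 1 mile radius, the 0.5 confidence threshold, and the name `should_alert` are assumptions made for this example.

```python
import math
from typing import Tuple


def should_alert(vehicle_xy: Tuple[float, float],
                 event_xy: Tuple[float, float],
                 confidence: float,
                 radius_miles: float = 1.0,
                 min_confidence: float = 0.5) -> bool:
    """Decide whether to present an event to the user.

    Proximity triggers the output, so no separate location display is
    needed; the confidence threshold is applied internally and need not
    be shown to the user.
    """
    distance = math.dist(vehicle_xy, event_xy)
    return distance <= radius_miles and confidence >= min_confidence
```

Here confidence, rather than signal magnitude, gates the output, so that a high magnitude artifact with low confidence is suppressed while a low magnitude event with high confidence is still reported.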
The incorporated references relate, for example, to methods and apparatus which may be used as part of, or in conjunction with, the present invention. Therefore, it is understood that the present invention may integrate other systems, or be integrated in other systems, having complementary, synergistic or otherwise related functions. For example, common sensors, antennas, processors, memory, communications hardware, subsystems and the like may provide a basis for combination, even if the functions are separate.
The techniques according to the present invention may be applied to other circumstances. Therefore, it is understood that the present invention has, as an object, to provide a user interface harnessing the power of statistical methods. Therefore, it is seen that, as an aspect of the present invention, a user interface, a method of providing a user interface, computer software for generating a human-computer interface, and a system providing such a user interface each present a prediction of a state as well as an indication of a statistical reliability of the prediction.
Within a vehicular environment, the statistical analysis according to the present invention may also be used to improve the performance and the user interface of other systems. In particular, modern vehicles have a number of indicators and warnings. In most known systems, warnings are provided at pre-established thresholds. According to the present invention, a risk analysis may be performed on sensor and other data to provide further information for the user, e.g., an indication of the reliability of the sensor data, or the reliability under the circumstances of the sensor data as a basis for decision. (For example, a temperature sensor alone does not indicate whether an engine is operating normally.)