Certain commercial specifications and protocols are relevant to the present description, each of which is incorporated herein by reference. They include:    (1) “High-Definition Multimedia Interface Specification, Version 1.2”;    (2) “High-bandwidth Digital Content Protection System”;    (3) “RTP: A Transport Protocol for Real-Time Applications”;    (4) “RTP Payload Format for JPEG 2000 Video Streams”;    (5) “The Secure Real-time Transport Protocol”; and    (6) “Avocent Audio Visual Protocol Specification,” a copy of which is included in U.S. Patent Application No. 60/842,706, filed Sep. 7, 2006.
Certain papers and books provide background material for the present description, the teachings of which will be assumed to be known to the reader. They include:    (1) O'Reilly, "802.11 Wireless Networks: The Definitive Guide," Sebastopol, Calif., 2005;    (2) Castro, Elizabeth, "HTML for the World Wide Web, Fifth Edition, with XHTML and CSS: Visual QuickStart Guide," Berkeley: Peachpit Press, 2003;    (3) Castro, Elizabeth, "Perl and CGI for the World Wide Web, Second Edition: Visual QuickStart Guide," Berkeley: Peachpit Press, 2001;    (4) "Intrastream Synchronization for Continuous Media Streams: A Survey of Playout Schedulers";    (5) "Multipoint Multimedia Teleconference System with Adaptive Synchronization";    (6) "The Design of the OpenBSD Cryptographic Framework"; and    (7) "Survey of Error Recovery Techniques for IP-Based Audio-Visual Multicast Applications."
Certain manuals and datasheets, including those relevant to the operation of the main processor board, describe in more detail various aspects of the present description. Each of them is incorporated herein by reference. They include:
For the processor:    (1) Intel IXP45X and Intel IXP46X Product Line of Network Processors Developer's Manual (located at http://www.intel.com/design/network/products/npfamily/docs/ixp4xx.htm);    (2) Intel IXP45X and Intel IXP46X Product Line of Network Processors Data Sheet (located at the same link immediately above)
For the audio visual subsystem:    (3) HDMI Display Interface—Preliminary Technical Data—AD9398    (4) 800 Mbit High Performance HDMI/DVI Transmitter—Preliminary Technical Data—AD9889    (5) ADV7401 Integrated Multi-Format SDTV/HDTV Video Decoder and RGB Graphics Digitizer    (6) Multiformat 216 MHz Video Encoder with Six NSV 12-Bit DACs—ADV7320/ADV7321    (7) JPEG2000 Video Codec ADV202    (8) ADV202 JPEG2000 Video Processor User's Guide    (9) Getting Started with the ADV202 Programming Guide    (10) ADV202 Output Formats and Attribute Data Format    (11) Using the ADV202 in a Multi-chip Application    (12) Front Panel Processor (LPC2103)
For the Radio Chips:    (13) AR5002 STA Programmer's Guide    (14) Boa Documentation;
For the DDR:    (15) Micron Double Data Rate (DDR) SDRAM Data Sheet
The present description relates to a multipoint extender (wired or wireless) for transmitting high-definition content from one source to one or more destinations incorporating High-Definition Multimedia Interface (HDMI) technology. In its preferred embodiment, the system can wirelessly transmit high resolution computer graphics, high-definition video, stereo audio, and control data to eight wireless or LAN connected receivers. The multipoint system includes a transmitter and receivers that provide high-definition media support suitable for applications such as professional audio-visual installations. Audio-video synchronization is provided at each display. All wirelessly connected receivers remain synchronized with each other. Interchangeable modules can be added for analog signals including composite video, component video, computer graphics, or digital HDMI/DVI signals.
In FIG. 1, the system 10 provides a wire replacement for a High Definition Multimedia Interface (in the art sometimes abbreviated HDMI, or simply HD) source connecting to an HDMI display. The system is composed of a transmitter 11 and one or more receivers 12. The transmitter will support communication with up to 8 receiver units. Each receiver utilizes a virtual clock that is synchronized by the transmitter, thereby allowing all receivers to remain in sync with one another.
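The virtual-clock scheme described above can be sketched as follows. The description does not specify the synchronization algorithm, so the sync-message fields and the simple offset update are illustrative assumptions:

```python
class VirtualClock:
    """Receiver-side virtual clock slaved to the transmitter's time base.

    Hypothetical sketch: the transmitter is assumed to periodically send
    its current clock value; the receiver tracks an offset relative to
    its own local clock.
    """

    def __init__(self):
        self.offset = 0.0  # transmitter time minus local time

    def on_sync_message(self, tx_time, local_time):
        # Naive offset update; a real design would filter out network
        # jitter (e.g. a moving average or a PLL-style control loop).
        self.offset = tx_time - local_time

    def now(self, local_time):
        # Local time translated into the transmitter's time base.
        return local_time + self.offset


clk = VirtualClock()
clk.on_sync_message(tx_time=1000.0, local_time=990.0)
print(clk.now(local_time=995.0))  # 1005.0
```

Because every receiver slaves its virtual clock to the same transmitter, the receivers stay in sync with one another even though they never exchange time directly.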
The audio and video data is encoded by the transmitter 11 that is connected to the HDMI source 13 and transmits the data to a receiver 12. The receiver units 12 can be connected to respective HDMI display devices 14.
The transmitter 11 and receiver 12 are based around JPEG2000 codecs that will support the compression of high-definition video along with standard-definition content. JPEG2000 codecs are not new and are well known to the artisan as video compression standards. It is not necessary to repeat their details here.
Video, audio, and control data (such as RS-232, infra-red, HDMI, or Consumer Electronics Control codes), which are supported at the hardware level but could alternatively be supported at the software or firmware level, are input into the transmitter 11. In the transmitter, the video and audio data is converted to the digital domain, time stamped, compressed, encrypted, packetized, and transmitted over a 10/100 Mbit Ethernet link or over a wireless link to the receiver. The order in which encryption occurs is not a requirement.
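The per-frame transmit stages just described can be sketched in Python. The `compress`, `encrypt`, and `packetize` functions below are toy stand-ins, not the real JPEG2000 codec, cipher, or RTP packetizer; only the stage ordering mirrors the description:

```python
def compress(p):
    # Toy stand-in for the JPEG2000 codec (collapses repeated bytes).
    p["data"] = p["data"].replace(b"AA", b"A")
    return p

def encrypt(p):
    # Toy XOR "cipher"; the real system would use a proper scheme.
    p["data"] = bytes(b ^ 0x5A for b in p["data"])
    return p

def packetize(p):
    # Split the frame into fixed-size payloads for the network.
    return [p["data"][i:i + 4] for i in range(0, len(p["data"]), 4)]

def send_frame(frame, ts, stages=(compress, encrypt, packetize)):
    """Apply the description's per-frame stages in order:
    time-stamp, compress, encrypt, packetize."""
    result = {"ts": ts, "data": frame}
    for stage in stages:
        result = stage(result)
    return result


packets = send_frame(b"AAAABB", ts=42)
print(len(packets))  # 1
```

Because the stages are passed in as a tuple, reordering encryption relative to the other stages (which the text says is not a requirement) only means changing the `stages` argument.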
In an example embodiment, the above-described control data is not interpreted at the transmitter, but is forwarded as-is.
At the other end, the receiver 12 decrypts, decompresses, converts/manipulates the data back into the desired format, and outputs to the connected HDMI-supported audio/video equipment (such as video display 14). The control data that is received by the receiver 12 allows for several methods to control the external A/V equipment. In some examples, this is done through RS232 serial, Infra-Red (IR)/IR-Blaster and CEC commands.
The present set of transmitter 11 and receiver 12 systems will allow different A/V modules to be installed, such as the HDMI/Euro-Block module.
The block diagrams and detailed design information for the HW appliances can be found below.
The transmitter is designed to accept an input source 13, such as either an HDMI source (audio and video) or a DVI source, either of which will accept PC graphics or video. The rear of the transmitter is shown in FIG. 3 and includes a 10/100 Ethernet port 17, RS-232 ports 24, and a USB client port 19.
The front of the transmitter is shown in FIG. 2 and includes a Vacuum Fluorescent Display (VFD) 15 or similar display to communicate to the user the state of the device along with pertinent information, an InfraRed receiver and blaster pair 16, five control buttons (UP, DOWN, LEFT, RIGHT, and SELECT) 23, and a USB host port 18 to supply power for an optional light. The transmitter 11 also includes a single lane PCI Express port 20, and ports for HDMI or DVI source video 21. Ports 22 receive balanced, stereo audio input.
The receiver 12 is shown in more detail in FIG. 4. It includes an HDMI video connector 25, Ethernet port 26, USB host port 27, USB client port 28, RS232 port 29, and InfraRed transmitter and receiver pair 30. The I/R pair 30 (as well as the I/R pair 16 on the transmitter) supports signals at 38 kHz within a ±2 kHz window.
In general, the system incorporates a high-definition source or standard-definition source connected to the transmitter 11 (FIG. 1) via an HDMI connection to the HDMI port 21. Located at a distance from the transmitter (which could be in excess of 100 feet) are N receivers 12 (where N<=8 in the preferred embodiment, but more or fewer as appropriate), each connected via HDMI to a respective display device 14. Wireless transmission from the transmitter to the receivers is performed by 802.11-certified radios.
HD-JPEG2000 is employed to compress the video and can optionally run at 20 MBps to meet commercial quality requirements.
The processor subsystem of the receiver 12 is based on a PCI Bus architecture, using, for example, an Intel IXP455 processor with DDR memory. The JPEG2000 codecs and WIFI radio communicate via the PCI bus. For that reason, the transmitter includes an 802.11a WIFI mini-PCI radio mounted on a PCI bus. The processor subsystem design allows use of the Linux OS (Kernel Version 2.6.x) which takes advantage of an OS with PCI support, network stacks, USB stacks, Radio drivers and an off the shelf Web server.
The video codecs interface to the PCI bus. A PCI card in the receiver and transmitter supports the video processing circuits, the codecs, and the digital front ends. This will allow the front ends and the video and audio codecs to be tested prior to being fully implemented in hardware.
An FPGA optimizes the interface between the digital interface and the microprocessor (uP). The FPGA handles (1) Configuration of the digital front ends, (2) audio time stamping, (3) video time stamping, and (4) configuration and read/write of compressed video data to and from the PCI bus.
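The FPGA's time-stamping duty can be illustrated with a small conversion from a free-running hardware counter to a media timestamp. The 27 MHz reference and 90 kHz media clock below are illustrative assumptions, not values taken from the description:

```python
def media_timestamp(counter_value, counter_hz, clock_rate=90000):
    """Convert a free-running hardware counter into a media timestamp.

    90 kHz is the conventional RTP video clock; counter_hz stands in
    for the FPGA's reference clock (the actual rate is an assumption).
    Integer math avoids drift from floating-point rounding.
    """
    return (counter_value * clock_rate) // counter_hz


# One second of a hypothetical 27 MHz counter -> 90,000 timestamp ticks.
print(media_timestamp(27_000_000, 27_000_000))  # 90000
```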
Firmware on the transmitter 11 provides a number of functions. The firmware assigns each unit an electronic product identification number that is programmed at production time and cannot be modified by the user. It assigns each unit an electronic OEM identification number that is programmed at production time and cannot be modified by the user. Each unit also has a country code stored in nonvolatile memory.
Each unit has a unique MAC address for each Wireless port (radio) in the device. The MAC address is programmed by the radio manufacturer and cannot be changed at production time or by the user. Each unit also has a unique MAC address for the Ethernet port in the device that is programmed at production time and cannot be modified by the user. For the purpose of management, components can be identified by their Ethernet MAC address.
New operational firmware can be received via the RJ45 connection port. In the event of an interrupted or corrupted transfer, the unit remains sufficiently operational to communicate and receive a valid firmware application. Alternatively, upgrades to firmware can be made via the wireless link. A firmware upgrade operation for the transmitter 11 ideally contains the upgrade code for all processors and sub-processors contained within the transmitter 11, rather than upgrading the parts of the transmitter individually.
The firmware also loads a test image for use in testing video transmission to a receiver 12 and a test tone for use in testing audio transmission to the receiver 12.
The firmware also allows the transmitter 11 to respond to ICMP pings.
The system is configured by the use of web pages: the user configures the system through an intuitive menu system spread across multiple web pages.
The factory default configuration will set the IP address, subnet mask, and a given password. The user will need to connect the PC's Ethernet port to the device. The user will be required to set up the PC with a valid IP address (other than the factory default IP address) per the subnet mask. The user will then enter the IP address of the unit into a web browser and follow the instructions for configuring the device(s). Examples of the type of information that will be provided and configurable are described below:
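The requirement that the PC hold a valid address on the device's subnet (and one different from the device's own) can be checked as in this sketch. The addresses shown are placeholders, not the actual factory defaults:

```python
import ipaddress

def same_subnet(pc_ip, device_ip, mask):
    """Return True if pc_ip is a usable address on the device's subnet
    and differs from the device's own (factory-default) address."""
    net = ipaddress.ip_network(f"{device_ip}/{mask}", strict=False)
    pc = ipaddress.ip_address(pc_ip)
    return pc in net and pc_ip != device_ip


# Placeholder addresses for illustration only.
print(same_subnet("192.168.1.10", "192.168.1.1", "255.255.255.0"))  # True
print(same_subnet("10.0.0.5", "192.168.1.1", "255.255.255.0"))      # False
```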
A. Transmitter Management Software—Web Pages
The Transmitter unit will allow configuration of the system by the use of a series of web pages. Information regarding specific modes or configuration will be consolidated onto specific web pages; however, the flow of the pages will be architected to facilitate easy configuration of the system. The information below is an example of the type of information that can be displayed or input by the user.

Status Page
- Display the Transmitter's name
- Display the Transmitter's unique ID
- Display the Receivers in range, their signal strength, and their status
- Display which Receiver is the master Receiver
- Display the 802.11a wireless channel being utilized
- Display the country code

Version Page
- Display the AV subsystem firmware version
- Display the front panel firmware version
- Display the system hardware version
- Display the front panel hardware version
- Display the Transmitter ID
- Display the Transmitter's name
- Display the Receivers joined to the system and their IDs, names, and FW versions

Setup Page
- Set the Transmitter's name
- Set the A/V connection type (Wired, Wireless)
- Set the IP address of the Ethernet connection
- Set the subnet mask of the Ethernet connection
- Set the video quality
- Set which Receivers will be part of the A/V system by entering their unique IDs
- Set which Receiver will be the Master Receiver
- Set the wireless channel to utilize
- Set the transmit power
- Enable/disable the IR Blaster, RS-232, and Wireless Radio
B. Receiver Management Software—Webpages
The Receiver unit will allow configuration of the system by the use of a series of web pages. The information below is an example of the type of information that can be displayed or input by the user.

Status Page
- Display the Receiver's name
- Display the Receiver's unique ID
- Display the system Transmitter's name, ID, signal strength, and status (connected or not connected)
- Display the 802.11a wireless channel being utilized
- Display the country code

Version Page
- Display the AV subsystem firmware version
- Display the system hardware version
- Display the Receiver's ID
- Display the Receiver's name

Setup Page
- Set the Receiver's name
- Set the A/V connection type (Wired, Wireless)
- Set the IP address of the Ethernet connection
- Set the subnet mask of the Ethernet connection
- Set the ID of the Transmitter that the Receiver will communicate with
- Set the transmit power
- Enable/disable the IR Blaster, RS-232, and Wireless Radio
Within a system 10, a single receiver 12 of the set of receivers (such as are shown in FIG. 1) is designated as the master receiver. Data received by the serial port of the transmitter 11 is passed unmodified to the master receiver 12, which sends it out its RS232 port at the same data rate as received by the transmitter 11. Likewise, data received by the serial port of the master receiver is passed unmodified to the transmitter, which sends it out its RS232 port at the same data rate as received by the master receiver.
IR data received by the transmitter 11 is sent unmodified out the transmitter IR blaster port 16 (unless disabled) as well as the master receiver IR blaster port 30. IR data received by the master receiver is sent unmodified out the master receiver IR blaster port (unless disabled) as well as the transmitter IR blaster port.
A number of transmitters should be able to operate within a set radius (such as 200 feet) from each other without interfering with each other, although directional antennas may be required depending on the topology specified. Higher density operations may require a wired connection between transmitter and receiver or the use of directional antennas and attenuation material.
Video information is received at the transmitter 11, encrypted, and sent by wireless or wireline link to the appropriate receiver 12. Closed Captioning information located in lines 5-21 of active video inserted at an analog input of the transmitter is recoverable at the output of the receiver.
For audio information, if a load is present on the Euro-block connector of the transmitter 11, then audio is sampled from that input rather than the HDMI stream. The receiver 12 then outputs video on the HDMI connector and Audio on Euro-blocks and/or RCA connectors. Audio received by the transmitter 11 on the HDMI stream is not transcoded by the transmitter 11. Rather, valid HDMI audio is received by the transmitter 11 and simply output to the receiver 12.
Receivers 12 in the system 10 are synchronized so (1) a viewer cannot perceive audio echoes from the display devices, and (2) a viewer cannot perceive that the video frames are not synchronized with the audio.
Although the relevant protocol would typically cause such an appliance to act as the Consumer Electronics Control (CEC) root device, in this system the CEC line is not connected to any HDMI output. The transmitter 11 is the CEC root. The transmitter 11 generates the CEC physical address of each source device connected to it, as per the HDMI specification, by appending a port number onto its own physical address and placing that value in the Extended Display Identification Data (EDID) for that port.
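The HDMI rule referenced above, by which a child device's CEC physical address is formed from the parent's address and a port number, can be sketched as follows (per the HDMI specification, the parent's first zero nibble is filled in with the port number):

```python
def child_physical_address(parent, port):
    """Derive a source device's CEC physical address from the parent's
    address (a 4-nibble tuple) and the HDMI input port number."""
    addr = list(parent)
    for i, nibble in enumerate(addr):
        if nibble == 0:
            addr[i] = port
            return tuple(addr)
    raise ValueError("address exhausted (device tree too deep)")


# The transmitter as CEC root sits at 0.0.0.0; a source on its port 2
# becomes 2.0.0.0, and a source behind that device's port 3 is 2.3.0.0.
print(child_physical_address((0, 0, 0, 0), 2))  # (2, 0, 0, 0)
print(child_physical_address((2, 0, 0, 0), 3))  # (2, 3, 0, 0)
```

The resulting nibbles are what the transmitter would place in the EDID for each port.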
The system 10 acts as a HDMI Repeater/Distributor as defined in the HDMI specification.
The user interface for configuring the transmitter 11 is via a web-based (HTTP) browser. The user interface permits the user to:
- Set the transmitter's name;
- Set the wireless channel used for the A/V stream;
- Set the A/V connection type (Wired, Wireless);
- Set the IP address of the Ethernet connection;
- Set the subnet mask of the Ethernet connection;
- Set the gateway for the Ethernet connection;
- Display the version information (such as front panel firmware, FPGA version, hardware version, etc.);
- Transmit an internal test signal;
- Set the video quality level;
- Set which Receiver unit(s) will be part of the A/V system by entering their unique ID (i.e., the MAC address);
- Set which Receiver will be the Master Receiver;
- Set transmit power below the region's maximum;
- Enable/disable the IR Blaster (to prevent local loopback);
- Enable/disable the RS-232 pass-through;
- Set the RS-232 parameters;
- Enter a password before changing settings;
- Set a password; and
- Restore factory defaults.
The front panel operation is also governed by firmware that provides for several modes of displaying data. The signal strength mode displays the signal strength of a single Receiver unit's transmission at a time. When in signal strength mode, the display periodically changes which Receiver unit's signal strength is displayed.
The error rate mode of display will display an indication of the error rate of the communication channel for a single Receiver unit at a time. When in error rate mode, the display periodically changes which Receiver unit's error rate is displayed.
The channel mode of display will display the wireless channel being utilized if the AV data stream is using wireless mode.
Finally, in the address mode of display, the IP address of the transmitter 11 will be displayed on the VFD display.
On the receiver side, each receiver 12 supports DDC per the HDMI specification. Like the transmitter 11, the receiver 12 has an 802.11a WIFI mini-PCI radio mounted on a mini-PCI bus. Like the transmitter, 802.11a diversity antennae support wireless communications.
If a load is present on the Euro-block and/or RCA connector, then audio will be routed to this output and not the HDMI stream.
Each receiver 12 can be connected or disconnected without causing damage to any other device in the system 10. This goal (hot-pluggability) is satisfied in the hardware design. When hot-plugging, all receivers 12, the transmitter 11, and any servers, mice, keyboards, monitors, and other network devices remain uncompromised.
The receiver is controlled by an FPGA running firmware. The firmware assigns each receiver an electronic product identification number that is programmed at production time and cannot be modified by the user. Each receiver also has an electronic OEM identification number that is programmed at production time and cannot be modified by the user. Each receiver has a country code stored in nonvolatile memory. Each receiver has a unique MAC address for each wireless port (radio) in the device. Each unit has a unique MAC address for each Ethernet port in the device that is programmed at production time and cannot be modified by the user. For the purpose of management, components are identified by their Ethernet MAC address. The firmware allows the receiver to respond to ICMP pings.
New operational firmware is received by the receiver via an Ethernet RJ45 connection port. In the event of an interrupted or corrupted transfer, the product shall remain sufficiently operational to communicate and receive a valid firmware application. A firmware upgrade operation of the receiver contains the upgrade code for all processors and sub-processors contained within the receiver, so there is no need to upgrade the parts of the receiver individually. Firmware upgrades can also be done by the wireless link.
The user interface for configuring the receiver 12 is via a web-based (HTTP) browser. The user interface permits the user to:
- Set the receiver's name;
- Set the IP address of the Ethernet connection;
- Set the subnet mask of the Ethernet connection;
- Set the gateway for the Ethernet connection;
- Set transmit power below the region's maximum;
- Set which transmitters to accept connections from;
- Enable/disable the IR Blaster (to prevent local loopback); and
- Restore factory defaults.
The front panel operation is also governed by firmware that controls the display of an indication (by use of the 5 LEDs) that the system is booting and/or fully booted.
As shown in FIG. 1, the system 10 provides an audiovisual product that supports point to multi-point transmission of High Definition (HD) video and audio from the transmitter 11 to one or more remote receivers 12. The link between the transmitter and a receiver is either 100 mega-bit Ethernet (wired) or 802.11a (wireless).
The audiovisual subsystem consists of two JPEG2000 video CODECs (ADV202, one for Y and the other for CbCr), an audio CODEC, a multi-format video decoder and graphics digitizer (ADV7401), an HDMI receiver (AD9398), an HDMI transmitter (AD9889), and a video encoder (ADV7321). A Xilinx XC3S500E FPGA is used to interface the AV subsystem to a 32-bit, 33 MHz PCI bus.
In addition to the HDMI and DVI video described, other and/or combined embodiments can support other video standards such as Component Video, Composite Video and S Video.
This section describes the software architecture. The following sections describe the software in terms of the concurrent threads of execution, the calling hierarchy or layers of software, and the strategy for handling incoming events in this event-driven system. Other aspects of the architecture are covered as well, such as error handling and the physical layout of files and folders.
A. The Framework (XFw)
The XFw allows the software engineer to focus on the responsibilities of a specific part of the system. The system can be broken down into Active Objects (AOs) that are responsible for a specific aspect of the system as a whole. The framework provides and handles the common functionality of AOs. Each AO runs in its own thread of execution and only communicates with other AOs via events.
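A minimal sketch of such an Active Object follows, assuming a Python rendering of the framework's thread-plus-event-queue pattern. The class and event names (`AOLogger`, `EVT_BUTTON`, the `None` shutdown sentinel) are hypothetical, not from the XFw itself:

```python
import queue
import threading

class ActiveObject:
    """Sketch of an Active Object: its own thread, an event queue, and
    communication with other objects only via posted events."""

    def __init__(self, name):
        self.name = name
        self.events = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def post(self, event):
        # Other objects call post(); they never call methods directly.
        self.events.put(event)

    def _run(self):
        while True:
            event = self.events.get()
            if event is None:          # shutdown sentinel for this sketch
                break
            self.handle(event)

    def handle(self, event):
        pass  # subclasses dispatch on event type here


class AOLogger(ActiveObject):
    """Toy AO that records every event it handles."""
    def __init__(self):
        self.seen = []
        super().__init__("AOLogger")

    def handle(self, event):
        self.seen.append(event)


ao = AOLogger()
ao.post("EVT_BUTTON")
ao.post(None)
ao.thread.join(timeout=2)
print(ao.seen)  # ['EVT_BUTTON']
```

Because each AO drains its own queue on its own thread, no locking is needed in `handle()`; serialization comes from the queue itself.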
B. Layers
The framework is outside of the layering. Layering is a convenient way to abstract the operating system and hardware to minimize changes in the system when either the hardware or operating system changes. Layering also facilitates testing of Active Objects and Logical Drivers. The relationship between the framework and the layers is shown in FIG. 5.
In FIG. 5, each active object has its own thread of execution. AOs interpret events and process them as prescribed by the design. AOs can post events to logical drivers directly, but logical drivers must Publish or Post events to AOs only through the framework. In general, AOs should never need to change due to change of OS or hardware platform.
The logical driver, shown in FIG. 5, hides the actual hardware/OS design. An AO posts events directly to a Logical Driver. A logical driver communicates with the system in one of two ways: it can publish an event through the framework, or it can post an event directly to an active object by using the framework. Logical drivers tend to have a thread that blocks on an OS primitive and therefore tend to be dependent on the operating system. They provide the highest level of abstraction within the system. The Logical Drivers should not need to change when the hardware changes, but will probably require change to accommodate a change in OS.
The redirector provides a convenient hardware abstraction point. By changing the redirector you can direct the requests to any hardware driver you want. For example, if a redirector is used to make calls to an RS-232 serial port (which is a character driver), it is easy to change the redirector to call another character driver such as an I2C driver. Logical drivers or active objects may call a redirector. However, only one object (thread context) may call a redirector. Redirectors do not have threads and merely provide functionality. A redirector may implement user mode driver functionality (such as MMIO) directly. Finally, the kernel modules, shown in FIG. 5, interact with the hardware. Only redirectors call kernel modules.
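The redirector pattern described above can be sketched as follows; `CharDriverRedirector` and `FakeSerial` are hypothetical names used to show how one character-driver backend (RS-232) could be swapped for another (I2C) without touching the callers:

```python
class CharDriverRedirector:
    """Thread-less redirector: exposes a stable read/write interface and
    forwards to whatever character-driver backend it was built with."""

    def __init__(self, backend):
        self._backend = backend  # any object with read()/write()

    def write(self, data):
        return self._backend.write(data)

    def read(self, n):
        return self._backend.read(n)


class FakeSerial:
    """Stand-in for a real RS-232 character driver (loopback buffer)."""
    def __init__(self):
        self.buf = b""

    def write(self, data):
        self.buf += data
        return len(data)

    def read(self, n):
        out, self.buf = self.buf[:n], self.buf[n:]
        return out


port = CharDriverRedirector(FakeSerial())
port.write(b"hello")
print(port.read(5))  # b'hello'
```

Swapping the hardware would mean constructing the redirector with, say, an I2C backend object; logical drivers and AOs above it would be unchanged.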
An example layer system is shown in FIG. 6.
The system architecture can be illustrated by breaking it down into different functional areas and modeling them as classes. The overall class diagrams for a transmitter and receiver are shown in FIGS. 7 and 8, respectively. There are different types of classes, namely:
1. Active Object Classes: An active object is a state machine with a thread and event queues. It provides an event interface to other objects. It is responsible for modeling the behaviors of one functional area.
2. Passive Object Classes: A passive object is an object without a thread. It provides a function interface to other objects. It does not exhibit state behaviors or process events. Examples include database and utility objects.
3. Logical Driver Classes: A logical driver is an object with a thread and an event queue. Like active objects, it provides an event interface to other objects. However it does not exhibit state behaviors. It serves as an interface to device drivers and network sockets. For example, it is used to convert interrupt status into events.
4. Redirector Classes: A redirector object provides an interface to a device driver or kernel services in order to isolate the upper layers from the actual driver or kernel implementation. It allows us to only swap out the redirector when the underlying hardware or OS is changed.
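The logical driver's duty of converting interrupt status into events (item 3 above) might look like the following sketch; the status bit positions and event names here are illustrative assumptions, not taken from the actual hardware:

```python
# Hypothetical interrupt status bits for an AV capture device.
IRQ_FRAME_READY = 0x01
IRQ_FIFO_ERROR  = 0x02

def irq_to_events(status):
    """Map an interrupt status word read from the device driver into
    framework events that an active object can consume."""
    events = []
    if status & IRQ_FRAME_READY:
        events.append("EVT_FRAME_READY")
    if status & IRQ_FIFO_ERROR:
        events.append("EVT_FIFO_ERROR")
    return events


print(irq_to_events(0x03))  # ['EVT_FRAME_READY', 'EVT_FIFO_ERROR']
```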
Classes that are similar or the same for both the transmitter 11 and the receivers 12 are shared between the two architectures. As seen in FIGS. 7 and 8, most classes fall into this category. Classes that differ significantly between the transmitter 11 and the receivers 12, such as AOAvSessionTx and AOAvSessionRx, are distinct in each architecture.
Active objects are shown as packages in the class diagrams. Each package is composed of the active object itself, as well as any state machine objects implementing the orthogonal regions. This is an overview of the functions of statecharts and class diagrams for active objects:
1. AOSysManager—This is the main active object responsible for overall system control and coordination among active objects. Its tasks include (1) System initialization and reset, (2) Firmware upgrade, and (3) Management of wireless and wired network interface.
2. AOLocalUi—This is the active object that manages the Local User Interface, which includes an LCD front panel and buttons on a transmitter, and signal strength LEDs on a receiver. It handles passing of IR data to the LPC (ARM slave processor). Its tasks include (1) Initialization and firmware upgrade of the ARM slave processor, (2) Button input detection, (3) LCD display and mode control, and (4) LED control.
3. AOAvControlTx—This active object controls the AV subsystem hardware of a transmitter. Its tasks include (1) Hardware initialization, (2) Hot-plug assertion, (3) Video mode detection and configuration, (4) HDCP authentication, and (5) Capturing HDMI control packets (e.g. SPD, ISCR1, ISCR2, ACP) from hardware.
4. AOAvControlRx—This active object controls the AV subsystem hardware of a receiver. Its tasks include (1) Hardware initialization, (2) Hot-plug detection, (3) Video mode configuration, (4) HDCP authentication, (5) Writing HDMI control packets (e.g. SPD, ISCR1, ISCR2, ACP) to hardware.
5. AOAvSessionTx—This active object manages the point-to-multipoint links and sessions from the transmitter to receivers. Its tasks include (1) Link management, including (a) Discovery and validation of new receivers, (b) Assignment of IP addresses to new receivers, and (c) Polling the status of existing receivers; and (2) Session management, including (a) Session establishment, e.g. sending RTP and A/V parameters to receivers, (b) Receiving audio/video frames from the LDAV logical driver and multicasting them to receivers using RTP via the LDRTPAudio/LDRTPVideo logical drivers, and (c) Exchanging CEC, HDCP and HDMI control packets with receivers.
6. AOAvSessionRx—This active object manages the point-to-point link and session from a receiver to the transmitter. Its tasks include (1) Link management, including (a) Probing for a transmitter, (b) Joining a transmitter, and (c) Responding to poll requests from the associated transmitter; and (2) Session management, including (a) Session establishment, e.g. receiving RTP and A/V parameters from the transmitter, (b) Reassembly of audio/video frames from received RTP packets, (c) Time synchronization with the transmitter, (d) Playback of audio/video frames to LDAV logical driver, and (e) Exchanging CEC, HDCP and HDMI control packets with the transmitter.
7. AOCmdCtl—This active object manages the flow of command and control data from a transmitter to a receiver, and vice versa. Its tasks include (1) reading serial data from the serial logical driver, (2) processing local IR data (from LDLocalUi), (3) processing remote IR data (from the master receiver), (4) for transmitter, sending the IR & serial data to the master receiver via AOAvSessionTx unless serial pass through is disabled, (5) for receiver, sending the IR & serial data to the transmitter via AOAvSessionRx, (6) writing received serial data to the serial logical driver, unless serial pass through is disabled, (7) publishing IR data to AOLocalUi, and (8) handling CEC.
The AO applications described herein are by way of example. Alternatives could be XFApp or other suitable applications.
The RTP protocol stack is used to transport the audio and video information over multicast. RTP is a known and popular protocol for transporting real-time data such as audio and video over multicast or unicast network services. It typically runs on top of UDP/IP. It adds a lightweight header (minimum 12 bytes) to each packet to carry the sequence number, timestamp, source identifier, etc. An accompanying protocol, RTCP, provides control functions such as QoS (Quality of Service) reporting and time synchronization.
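The fixed 12-byte RTP header mentioned above can be packed as in the following sketch (fields per RFC 3550; the payload type 96 is an arbitrary dynamic-range value chosen for illustration):

```python
import struct

def rtp_header(seq, timestamp, ssrc, payload_type=96, marker=0):
    """Pack the fixed 12-byte RTP header: version 2, no padding,
    no extension, zero CSRCs."""
    byte0 = 2 << 6                      # V=2, P=0, X=0, CC=0
    byte1 = (marker << 7) | payload_type
    # ! = network byte order; B,B = two bytes; H = seq; I,I = ts, SSRC
    return struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)


hdr = rtp_header(seq=1, timestamp=90000, ssrc=0x1234)
print(len(hdr))  # 12
```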
At least two commercially available RTP stacks are suitable for use in the present embodiments. They are:
ccrtp (http://www.gnu.org/software/ccrtp/)
jrtplib (http://research.edm.luc.ac.be/jori/jrtplib/jrtplib.html)
Although both are suitable, jrtplib is preferred.
Audio and/or video data can be encrypted as they are transmitted across the network, wired or wireless. Various encryption programs are available, including standard cipher algorithms, Secure Socket Layer, OCF (OpenBSD Cryptographic Framework), Secure RTP, IPSec, and Openswan. Other encryption schemes will undoubtedly arise in the future. Presently, Openswan is preferred. There are also several approaches for key management, including fixed shared keys, manually set shared keys, and dynamic keys such as described at http://www.securemulticast.org/msec-index.htm. Either manual or fixed shared keys are preferred.
FIG. 9 illustrates the transmission of a video frame using RTP at the transmitter 11. As shown, the hardware signals an interrupt after it has transferred a compressed video frame via DMA to system memory. The hardware breaks the frame into multiple data blocks, leaving gaps in between for software to insert headers. FIG. 9 shows block i−1 and block i as examples. The device driver wakes up the logical driver, which is pending at select( ). The logical driver converts the interrupt status into a frame-ready event for the active object. If it is in the right state, the active object passes the frame to the LDRTPVideo logical driver. LDRTPVideo requests the crypto driver to encrypt the frame via an ioctl( ). When the ioctl( ) returns, the encryption request has been queued up by the driver. The crypto driver notifies the completion of encryption by waking up LDRTPVideo asynchronously. For each data block in the frame, LDRTPVideo adds an RTP header by calling SendFrameInPlace( ). Since space for the header is already reserved, there is no need to allocate a new buffer and copy the data block. LDRTPVideo sends each RTP packet to the socket by calling sendto( ), which copies the packet to a buffer allocated by the kernel IP stack. After sendto( ) has been called for all packets in a frame, LDRTPVideo sends an event to the active object to signal that the frame has been sent (to the socket). The active object releases ownership of the frame buffer back to the device driver so that the driver can reuse it to store a new frame. The IP stack adds the UDP and IP headers and asks the network driver to transmit the packet. The network driver sets up a DMA transfer to send the packet to hardware.
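The zero-copy technique behind SendFrameInPlace( ) can be sketched as follows. The function name is borrowed from the description above, but the layout and field choices here are hypothetical: each data block is preceded by reserved gap bytes into which the header is written in place, so the payload never moves.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical frame layout: numBlocks data blocks of blockSize bytes,
// each preceded by HEADER_SPACE reserved bytes left by the hardware.
constexpr size_t HEADER_SPACE = 12;

// Write a (simplified) header into the reserved gap in front of each
// block. Only the sequence number is written here, to show the in-place
// idea; a real header would fill all 12 reserved bytes.
void sendFrameInPlace(std::vector<uint8_t>& frame, size_t blockSize,
                      size_t numBlocks, uint16_t firstSeq) {
    for (size_t i = 0; i < numBlocks; ++i) {
        uint8_t* hdr = frame.data() + i * (HEADER_SPACE + blockSize);
        uint16_t seq = static_cast<uint16_t>(firstSeq + i);
        hdr[0] = seq >> 8;    // header lands in the reserved gap;
        hdr[1] = seq & 0xFF;  // the payload bytes are never copied
        // hdr + HEADER_SPACE points at the untouched payload block
    }
}
```

Because the gaps are pre-reserved by the hardware DMA layout, no per-block allocation or memcpy of payload data is needed, which is the point of the design.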
FIG. 10 illustrates the reception of a video frame at the receiver 12. As shown, the LDRTPVideo logical driver waits for packets to arrive by calling select( ) at the socket. When a packet arrives at the network interface, the packet is processed by the kernel IP stack. Since a packet is ready to read, select( ) returns. LDRTPVideo calls recvfrom( ) to read the packet into its buffer. Packet data is copied to the buffer provided by LDRTPVideo. LDRTPVideo calls POPlaybackBuffer::StorePacket( ) to store the received packet into the frame buffer to reassemble a complete frame. For efficiency, no event is used here; the function is called directly. Note that the buffer memory is allocated by the AV subsystem device driver. LDRTPVideo calls the crypto driver to decrypt the frame when the frame is complete. The crypto driver notifies LDRTPVideo asynchronously when decryption is done. LDRTPVideo then marks the frame as playable. The AV subsystem FPGA issues an interrupt when it has completed a frame transfer to the AD202 decoder. This interrupt requests software to update the frame pointer in the FPGA. Note that the frame pointed to by the original frame pointer in the FPGA may still be accessed until the next interrupt. The device driver wakes up the LDAVData logical driver. LDAVData converts the interrupt status into a data request event for the active object. The active object performs time synchronization and gets the frame to play back from the frame buffer. It passes the frame pointer to LDAVData via an event. LDAVData passes the frame pointer to the device driver. The device driver sets up the video frame pointer in the FPGA as the next frame to transfer to the AD202 decoder.
In order to achieve maximum transmission throughput it is preferable to avoid having the processor copy the data. Therefore a DMA controller within the A/V subsystem FPGA transfers the video/audio data into frame buffers owned by the A/V subsystem driver. The FPGA has an array of pointers to 16 video frame buffers and 16 audio frame buffers. The frame pointer arrays will be initialized by the processor during startup. The FPGA will iterate through all the frame buffers before beginning at the first one again. The frame buffers will be memory mapped by the A/V subsystem logical driver in order to avoid having to copy the data into user space. The A/V subsystem driver will notify the logical driver of the arrival of data. If there is an active session the logical driver will then send an event to AOAvSessionTx to transmit the data, using RTP. Note that the marker bit in the RTP header will be set to indicate the start of a frame. The payload type field will indicate whether the packet is video or audio.
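The circular walk through the 16 frame-buffer pointers described above can be modeled minimally as follows. The structure and names are invented for illustration; on the real hardware the pointer array lives in the FPGA and is initialized by the processor at startup.

```cpp
#include <array>
#include <cstdint>

constexpr int NUM_FRAME_BUFFERS = 16;

// Sketch of the frame-pointer array the processor initializes at
// startup. The FPGA walks the array in order, wrapping back to the
// first buffer after the last one.
struct FrameBufferRing {
    std::array<uint8_t*, NUM_FRAME_BUFFERS> buffers{};
    int next = 0;

    // Return the buffer the DMA engine fills next, then advance the
    // index, wrapping after the last buffer.
    uint8_t* acquire() {
        uint8_t* b = buffers[next];
        next = (next + 1) % NUM_FRAME_BUFFERS;
        return b;
    }
};
```

Because the buffers are owned by the A/V subsystem driver and memory mapped into the logical driver, the processor never copies frame data; it only hands pointers around.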
FIG. 11 shows example transmit frame buffers. To prevent IP fragmentation, the maximum data payload should be the maximum UDP payload over Ethernet (1472 bytes, i.e., the 1500-byte MTU less the IP and UDP headers) minus the RTP header and IPSec overhead. In addition, the DMA destination address must allow space for the RTP, IP, and IPSec headers to be inserted by the application. The FPGA allows the maximum data packet size and the reserved header space to be set by the processor.
FIG. 12 illustrates how the FPGA places data in a single video frame buffer. The first data packet of the frame has a header added by the FPGA specifying the timestamp, frame number, data packet count, and the order and size of the chrominance and luminance components.
The receiver 12 audiovisual subsystem data interface includes an audiovisual subsystem driver that owns an array of receive video and audio frame buffers. An example receive frame buffer format is shown in FIG. 13. The receive frame buffers will be memory mapped by the A/V subsystem logical driver in order to avoid having to copy the data into user space. Video and audio packets, minus the RTP header, are written to a frame buffer by the AOAvSessionRx object. When a complete frame has arrived, the FPGA is informed of the new frame, and the processor can start to fill the next frame buffer. A frame-complete interrupt informs the processor that the frame has been transferred to the decoder, which is then ready to receive the next frame.
The receivers 12 operate in synchronism. Continuous audio and video streams are delivered in real time. When using asynchronous networks for data transmission, however, the timing information of the media units produced is lost, and a mechanism is required to ensure continuous and synchronous playback at the receiver side. Inter-stream synchronization between audio and video streams, as well as among different receivers, is also required.
The paper, Laoutaris, "A Survey of Playout Schedulers," presents a number of synchronization schemes. For this embodiment, the synchronization scheme is a time-oriented one and uses an approximated clock. Media units are time-stamped at the transmitter, and the receiver clock is synchronized to the transmitter clock at regular intervals. In RTP, this is achieved by using the timestamp field in RTP headers and sending the transmitter clock regularly via RTCP Sender Report packets.
Packets arriving at the receiver are buffered in order to compensate for varying propagation times between the transmitter and receiver. The jitter buffer should be sized to allow for the largest network delay. The total end-to-end delay of a media unit is fixed and is composed of a variable network delay and a buffering delay introduced by the jitter buffer.
Packets that arrive at the receiver with a timestamp larger than the local clock are buffered. Packets that arrive with timestamps smaller than the local clock are discarded. Packets are extracted from the buffer and played when the local clock equals their timestamp.
The following sections present the formal and concrete design of the synchronization scheme introduced above. They discuss intra-stream and inter-stream synchronization, as well as how to incorporate reliability into RTP.
1. Intra-Stream Synchronization (Between Transmitter and Receivers)
In this scheme, there is no concept of a global clock. Rather, receivers in the network approximate the transmitter clock and use it to derive the playback clock (or virtual clock).
A. Normalized Offset Between Transmitter and Receiver Clocks
The clocks of the transmitter and receivers are 32-bit counters incremented periodically by the encoding and decoding hardware respectively. They are not physically synchronized to one another, so there is offset and drift among them.
Referring to FIG. 14, Ttx(t) and Trx(t) are the step functions representing the transmitter and receiver clock values at time t respectively. The goal of synchronization is to produce an accurate estimate of Ttx(t), denoted by Ttx,estimated(t), at the receiver. To achieve this, the transmitter periodically sends its clock value to the receiver via the "RTP timestamp" field in the Sender Report packets. For example, at some time tn, it sends Ttx(tn) to the receiver.
When the receiver gets the Sender Report packet at time tm, it measures the current offset between the transmitter and receiver clocks, Toffset,measured(tm), by calculating the difference between Ttx(tn) carried by the Sender Report and the current receiver clock value, Trx(tm). That is:
Toffset,measured(tm) = Ttx(tn) − Trx(tm)
Note that all clock and offset values are treated as 32-bit signed integers using 2's complement. That is, 0x7FFFFFFF is the most positive clock value. After one increment, it becomes 0x80000000 which is interpreted as the most negative clock value.
When the offset is positive, we say the transmitter clock is leading the receiver clock. When negative, the transmitter clock is lagging the receiver clock. When zero, the two clocks are in phase. Discontinuity happens when the phase difference between the two clocks crosses 180°. The offset jumps from the most positive to the most negative (or vice versa). For example, consider c=a−b. If a is 0x00001000 and b is 0x80001000, a−b=0x80000000 (most −ve). When b is incremented by just one to be 0x80001001, a−b=0x7FFFFFFF (most +ve). This would cause problems when calculating averages using signed arithmetic. For example, the average of 0x80000000 and 0x7FFFFFFF is zero which incorrectly means in-phase.
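The 2's-complement behavior described above is easy to demonstrate directly. The sketch below reproduces the a − b example: a one-tick change in b flips the computed offset from the most negative value to the most positive.

```cpp
#include <cstdint>

// Clock offsets are 32-bit signed differences; wrap-around is handled
// by ordinary 2's-complement subtraction: subtract unsigned, then
// reinterpret the result as signed.
int32_t clockOffset(uint32_t a, uint32_t b) {
    return static_cast<int32_t>(a - b);
}
```

This is why averaging raw offsets near the 180° discontinuity is dangerous: two valid offsets of 0x80000000 and 0x7FFFFFFF average to zero, which the normalization step below is designed to avoid.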
To avoid the above problem, we ensure the offset does not overflow or underflow. Let the initial offset at time to be Toffset,measured(to); we have:
Toffset,measured(to) = Ttx(tp) − Trx(to)
where tp is the time when the Sender Report is sent. We derive the normalized receiver clock (FIG. 15) and the normalized offset measured at some time tm (FIG. 16) as:
T′rx(t) = Trx(t) + Toffset,measured(to)
T′offset,measured(tm) = Ttx(tn) − T′rx(tm)  (1)
To verify, at initial time to,
T′offset,measured(to) = Ttx(tp) − T′rx(to)
= Ttx(tp) − Trx(to) − Toffset,measured(to)
= Toffset,measured(to) − Toffset,measured(to)
= 0
Now the initial offset is normalized to 0. Over time, drift between the transmitter and receiver clocks will cause the offset to change slowly. Since the rate of change is so slow, it is safe to assume that the offset will not overflow or underflow before synchronization restarts (for a new RTP session). To validate this argument, assume the resolution of the transmitter and receiver clock is 1 ms (1 increment per ms) and the clock drift is 1 s per minute (which is huge). It would take 4 years for the offset to overflow or underflow.
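The 4-year figure can be checked with simple arithmetic: with a 1 ms clock resolution, the offset overflows after drifting 2^31 ms, and at a drift of 1 s per minute that takes roughly 2.1 million minutes. The sketch below just reproduces that calculation; the function name is invented.

```cpp
// Minutes of operation before a 32-bit signed offset (in clock ticks)
// overflows, given the tick resolution and the drift per minute.
double minutesToOverflow(double ticksPerMs, double driftMsPerMinute) {
    const double maxOffsetMs = 2147483648.0 / ticksPerMs;  // 2^31 ticks
    return maxOffsetMs / driftMsPerMinute;
}
```

With ticksPerMs = 1 and driftMsPerMinute = 1000, the result is about 2.15 million minutes, i.e. slightly over 4 years, confirming the claim that normalization is safe for the lifetime of an RTP session.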
Using values of T′offset, measured(tm) for various tm, we can construct the step function T′offset, measured(t) representing the (normalized) measured offset between the transmitter and receiver clocks at time t (FIG. 16).
B. Estimation of Transmitter Clock by Receivers
In equation (1), because of network and processing delays, tn and tm are not identical, and hence the measured offset differs from the actual one, defined as (FIG. 15):
T′offset,actual(t) = Ttx(t) − T′rx(t)
We can represent the measurement error as a constant error ε caused by fixed delays, plus a varying component δ(t) caused by jitter. Now we have the relation:
T′offset,actual(t) = T′offset,measured(t) + ε + δ(t)  (2)
Because of clock drift between the transmitter and receiver, T′offset,actual(t) is not constant, but changes slowly over time. Within a small time window, however, it is almost constant and can be treated as such. Also, δ(t) averages to zero. Using these two properties, we can estimate the actual offset by calculating a running average as follows:
Let averageN,L,S(f(t)) be the running average of f(t) over the last N samples of f(t), with the largest L samples and the smallest S samples ignored to avoid errors caused by extreme jitters. We have:
T′offset,estimated(t) = averageN,L,S(T′offset,measured(t))
= averageN,L,S(T′offset,actual(t) − ε − δ(t))
≈ averageN,L,S(T′offset,actual(t)) − ε
≈ T′offset,actual(t) − ε  (3)
The values of N, L and S are to be determined empirically. Later we prove that the constant error ε can be cancelled out.
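The trimmed running average averageN,L,S can be sketched as follows: keep the last N samples, sort a copy, drop the L largest and S smallest, and average the remainder. The class name and container choices are illustrative.

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>
#include <vector>

// Running average over the last N samples, ignoring the L largest and
// S smallest to suppress extreme jitter, as in average_{N,L,S}(f(t)).
class TrimmedAverage {
public:
    TrimmedAverage(size_t n, size_t l, size_t s) : n_(n), l_(l), s_(s) {}

    void addSample(int32_t v) {
        samples_.push_back(v);
        if (samples_.size() > n_) samples_.pop_front();  // keep last N only
    }

    int32_t value() const {
        std::vector<int32_t> sorted(samples_.begin(), samples_.end());
        std::sort(sorted.begin(), sorted.end());
        if (sorted.size() <= l_ + s_) return 0;  // not enough samples yet
        int64_t sum = 0;
        // skip the S smallest and L largest samples
        for (size_t i = s_; i < sorted.size() - l_; ++i) sum += sorted[i];
        return static_cast<int32_t>(
            sum / static_cast<int64_t>(sorted.size() - l_ - s_));
    }

private:
    size_t n_, l_, s_;
    std::deque<int32_t> samples_;
};
```

Feeding measured offsets into such an averager yields T′offset,estimated(t); the trimming step is what makes occasional extreme-jitter measurements harmless.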
Now we introduce T′offset,used(t) to be the offset function actually used by the receiver. In the simplest case, we use the estimated offset function directly:
T′offset,used(t) = T′offset,estimated(t)  (4)
However, as we shall see later, changes in T′offset,estimated(t) over time (due to clock drift) may cause the playback clock to cross a frame boundary, which would result in a frame skip or repeat. In terms of user experience, it is arguable whether it is better to have frequent but small skips/repeats, or rare but large ones. Introducing T′offset,used(t) gives us the flexibility to update it with T′offset,estimated(t) only when their difference exceeds a certain threshold. For simplicity, we assume (4) holds for the rest of the discussion.
With T′offset,used(t) defined, the receiver estimates the transmitter clock as below (FIG. 17). FIG. 18 demonstrates how the estimated transmitter clock follows the actual one.
Ttx,estimated(t) = T′rx(t) + T′offset,used(t)
= T′rx(t) + T′offset,estimated(t)  (5)
C. Playback Clocks
Once the receiver has an estimate of the transmitter clock, Ttx,estimated(t), it can derive the playback clocks from it. Because of the timing difference between audio and video decoding, there are separate playback clocks, Tplayback,audio(t) and Tplayback,video(t). They are derived from the estimated transmitter clock as follows:
Tplayback,audio(t) = Ttx,estimated(t) − Tdelay,audio  (6a)
Tplayback,video(t) = Ttx,estimated(t) − Tdelay,video  (6b)
where Tdelay,audio and Tdelay,video are constant non-negative playback delays (in transmitter clock units) for audio and video respectively. They allow the receiver to buffer up packets to absorb network and processing jitter.
Now we prove that the playback clock is synchronized to the original transmitter clock within a constant delay. We take audio as an example, but the argument generalizes to video as well.
Tplayback,audio(t) = Ttx,estimated(t) − Tdelay,audio    from (6a)
= T′rx(t) + T′offset,estimated(t) − Tdelay,audio    from (5)
≈ T′rx(t) + T′offset,actual(t) − ε − Tdelay,audio    from (3)
= Ttx(t) − ε − Tdelay,audio    by definition
= Ttx(t) − T′delay,audio  (7)
where T′delay,audio is a constant equal to Tdelay,audio + ε.
D. RTP Timestamps
In each RTP packet, the transmitter puts the transmitter clock value at sampling instant ts, Ttx(ts), into the “timestamp” field of the RTP header. This indicates to the receiver when this packet should be played according to the playback clock, for both audio and video packets. In case time-stamping takes place after video compression, the delay introduced by compression should be compensated to ensure that the packet timestamp represents the sampling instant.
The receiver saves the received RTP packets into the jitter buffers. The decoding hardware interrupts the processor at a fixed frequency equal to the frame rate (according to the receiver clock); we call this the frame interrupt. When handling the frame interrupt, the receiver compares the current playback clocks, Tplayback,audio(ti) and Tplayback,video(ti), to the "timestamp" fields of the received RTP packets to determine which audio and video packets are to be decoded.
Taking audio as an example, the receiver checks whether Tplayback,audio(ti) falls into the playback period of each playable audio frame in the jitter buffer, starting from the oldest. As the phrase is used herein, a frame is playable if all of its packets have been received and the frame is decrypted. An audio frame is a set of consecutive audio packets with the same timestamp. The playback period of a frame is defined as the range:
    1. if a next playable frame is available in the jitter buffer, [timestamp of this frame, timestamp of next playable frame);
    2. otherwise, [timestamp of this frame, ∞).
Denote the playback period of a frame as [Plower bound, Pupper bound). There are three possibilities:
    1. Tplayback,audio(ti) within the range (i.e., Tplayback,audio(ti) − Plower bound >= 0 and Pupper bound − Tplayback,audio(ti) > 0). The receiver plays this audio frame.
    2. Tplayback,audio(ti) earlier than the range (i.e., Plower bound − Tplayback,audio(ti) > 0). This means the receiver clock has been running faster than the transmitter clock and it is not yet time to play this audio frame. The receiver plays silence, white noise, or the previous frame if one is available (depending on which scheme best conceals the error).
    3. Tplayback,audio(ti) later than the range (i.e., Tplayback,audio(ti) − Pupper bound >= 0). This means the receiver clock has been running slower than the transmitter clock and the time to play this audio frame has passed. The receiver skips this audio frame and repeats the check on the next frame in the buffer.
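The three-way check above can be sketched for a single frame as follows. The enum and function names are invented for illustration; signed 32-bit subtraction provides the wrap-around-safe comparisons, matching how clock values are interpreted throughout this design.

```cpp
#include <cstdint>

enum class PlayDecision { Play, TooEarly, TooLate };

// Decide whether the playback clock falls within a frame's playback
// period [lower, upper). All values are 32-bit clock units; signed
// reinterpretation of the unsigned differences handles wrap-around.
PlayDecision checkFrame(uint32_t playbackClock, uint32_t lower, uint32_t upper) {
    int32_t fromLower = static_cast<int32_t>(playbackClock - lower);
    int32_t toUpper   = static_cast<int32_t>(upper - playbackClock);
    if (fromLower >= 0 && toUpper > 0) return PlayDecision::Play;
    if (fromLower < 0)  return PlayDecision::TooEarly;  // play silence or previous frame
    return PlayDecision::TooLate;                       // skip, try the next frame
}
```

A receiver would call this per playable frame starting from the oldest, advancing past TooLate frames until it finds one to play or runs out of frames.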
As stated before, clock values are interpreted as 32-bit signed integers. This automatically handles the wrap-around cases during comparison. Discontinuity in computing differences is not an issue here since the playback clock is very close to the timestamp (far from being 180° out of phase).
In the above example, we observe that packets are sent according to the transmitter clock Ttx(t) and leave the jitter buffer according to the playback clock Tplayback,audio(t). Since the playback clock is synchronized to the transmitter clock within a constant delay T′delay,audio, the number of packets in transit and in the jitter buffer equals the number of packets sent in the duration T′delay,audio, which is a constant. As the network and processing delays vary, the number of packets in the jitter buffer varies. Provided the buffer is large enough, buffer underflow should not happen. In this design, the size of the jitter buffer is determined empirically.
2. Inter-Stream Synchronization
A. Between Audio and Video Streams
Using intra-stream synchronization explained in the previous section, we can synchronize the playback clock of a stream at the receiver to the transmitter clock. As in this design both audio and video streams are time-stamped using the same clock source, inter-stream synchronization is implicitly achieved by virtue of intra-stream synchronization.
For the video stream, all packets of a video frame share the same timestamp as the first packet of the frame. For the audio stream, all packets sampled in the duration of a video frame have the same timestamp as the first video packet of the frame. We call the set of audio packets sharing the same timestamp an audio frame.
As the decoding time for video packets is longer than that for audio packets, the video playback clock Tplayback,video(t) should be ahead of the audio playback clock Tplayback,audio(t) to ensure that video and audio packets with the same timestamps are output by the decoder simultaneously. That is, the audio playback delay should be larger than the video playback delay, and we have:
Tdelay,audio = Tdelay,video + η  (8)
where η is the absolute value of the difference between the video and audio decoding times (in transmitter clock units). η is to be determined empirically and has been suggested to be around 2 video frame periods.
B. Among Receivers
Using intra-stream synchronization, the playback clock of a receiver is synchronized to the transmitter clock within a constant delay. Since there is only one transmitter in a network, if all receivers choose the same playback delay (Tdelay,audio/Tdelay,video), they are effectively synchronized to each other.
As this synchronization scheme is software-based, the underlying hardware clocks of receivers are still not synchronized. For example, frame interrupts may happen at different times on different receivers. Assuming ideal software synchronization (i.e. Ttx,estimated(t)=Ttx(t)), there are still errors caused by phase differences among hardware clocks, which is upper-bounded by the frame period as illustrated in FIG. 19.
At 30 frames per second, the error is limited to 33 ms. This is acceptable since a delay of less than 100 ms will be perceived as reverberation rather than echo.
3. Reliable Multicast
RTP runs on top of UDP, which is an unreliable transport protocol. RTP itself does not provide reliability services. In general, this is acceptable for multimedia streams since the emphasis is on efficiency and on meeting the timing requirements of the majority of packets. A retransmitted but delayed packet would be of little use to the user.
While the loss of a single video frame may not be perceived by the user, the loss of an audio frame may be more noticeable. In order to provide a high-quality service, we extend RTP to support reliable multicast for audio. The basic approach was introduced in a 1996 Internet-Draft, "RTP extension for Scalable Reliable Multicast," which we improve upon and adapt to the present environment.
A. Sequence Numbers
The order of packets is identified by a 16-bit sequence number in the RTP header. It increments by one for each packet sent. When it reaches 0xFFFF, it wraps around to 0. Sequence numbers are interpreted as 16-bit signed values in 2's complement representation. Signed arithmetic then automatically handles wrap-around when comparing sequence numbers. Given two sequence numbers a and b, we say a leads (is ahead of) b if a−b>0 and a lags (is behind) b if a−b<0.
A tricky situation occurs when a and b are offset by about half of the total range. A small change in a or b would cause a−b to jump from the most positive to the most negative value. In other words, it is ambiguous whether a leads b or vice versa. In practice, we only compare sequence numbers within a small window relative to the total range, so the ambiguity does not occur.
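The wrap-around comparison described above can be sketched in a few lines. This is a generic illustration of the 2's-complement technique, not code from the specification:

```cpp
#include <cstdint>

// Unsigned subtraction wraps modulo 2^16; reinterpreting the result as
// signed yields the shortest signed distance from b to a.
inline int16_t seqDiff(uint16_t a, uint16_t b) {
    return static_cast<int16_t>(static_cast<uint16_t>(a - b));
}

inline bool seqLeads(uint16_t a, uint16_t b) { return seqDiff(a, b) > 0; }
inline bool seqLags(uint16_t a, uint16_t b)  { return seqDiff(a, b) < 0; }
```

For example, sequence number 0x0001 correctly leads 0xFFFF even though it is numerically smaller, because the signed difference is +2.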
To assist frame re-assembly, the fixed RTP header is extended to include "frame number" and "packet number" fields. The frame number increments by one for each audio/video frame sent, and the packet number identifies the position of the packet within the frame. Together they locate the space in the frame buffer in which to store a received packet. Like the sequence number, the frame number is interpreted as a signed (32-bit) integer.
B. Detection of Lost Packets
Unlike TCP, in which senders detect packet losses via positive acknowledgements (ACKs), this design places that responsibility on receivers by using negative acknowledgements (NACKs), which reduces communication overhead.
Let the sequence number of the first received packet be N0. When the receiver gets the first packet, it sets the expected sequence number Nexpected to N0+1.
When the receiver gets a packet, it compares its sequence number Nk to Nexpected. There are several possibilities:

1. Nk=Nexpected: This is the normal case in which the sequence number of the received packet matches the expected one; the packet is in order. Nexpected increments by one.

2. Nk leads Nexpected (Nk−Nexpected>0): This indicates some packet(s) are missing, which may be caused by packet loss or out-of-order delivery. The number of missing packets is equal to Nk−Nexpected. We add entries for all missing packets to a linked list of MissingPacket objects defined as:

    class MissingPacket
    {
        short seqNum;     // sequence number of packet
        int nackTime;     // time to send NACKs to transmitter
        int ignoreTime;   // until which to ignore duplicate NACKs
    };

This linked list stores the sequence numbers of missing packets. The purpose of nackTime is to prevent all receivers that miss the same packet from sending NACKs at the same time. The purpose of ignoreTime is to allow the receiver to ignore duplicate NACKs arriving within a short period. Their uses are explained in the next section. Finally, we set Nexpected=Nk+1.

3. Nk lags Nexpected (Nk−Nexpected<0): This indicates that an earlier missing packet or a duplicate packet has arrived. If it is a duplicate, it is discarded. If it is a missing packet, it was either delivered out-of-order or retransmitted. In either case, it is stored to the frame buffer and its entry in the linked list of MissingPacket objects is removed.
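A minimal sketch of the three receive cases follows. MissingPacket mirrors the class in the text; the container and helper names are illustrative:

```cpp
#include <cstdint>
#include <list>

// MissingPacket mirrors the class in the text.
struct MissingPacket {
    uint16_t seqNum;     // sequence number of packet
    int      nackTime;   // time to send NACKs to transmitter
    int      ignoreTime; // until which to ignore duplicate NACKs
};

struct RxState {
    uint16_t nextExpected;             // Nexpected
    std::list<MissingPacket> missing;  // linked list of missing packets
};

static int16_t seqDiff(uint16_t a, uint16_t b) {
    return static_cast<int16_t>(static_cast<uint16_t>(a - b));
}

// Handle an arriving packet with sequence number nk at receiver clock 'now'.
// Returns the number of newly detected missing packets.
int onPacket(RxState& s, uint16_t nk, int now) {
    int16_t d = seqDiff(nk, s.nextExpected);
    if (d == 0) {                               // case 1: in-order packet
        ++s.nextExpected;
        return 0;
    }
    if (d > 0) {                                // case 2: nk leads -> d missing
        for (uint16_t n = s.nextExpected; seqDiff(nk, n) > 0; ++n)
            s.missing.push_back({n, now, now}); // timers set per eq. (9)
        s.nextExpected = static_cast<uint16_t>(nk + 1);
        return d;
    }
    // case 3: nk lags -> late or duplicate packet; drop its entry, if any
    s.missing.remove_if([nk](const MissingPacket& m) { return m.seqNum == nk; });
    return 0;
}
```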
C. NACK Suppression
In a multicast environment, a packet is sent to multiple receivers. If one receiver misses a packet, it is likely that others miss it too. If all receivers send NACKs at the same time, it may cause network congestion and result in more packet losses. Besides, since a successful retransmission after a single NACK is seen by all receivers, multiple NACKs are unnecessary.
The solution is to have receivers wait for random times before sending NACKs. We maintain the timer in MissingPacket::nackTime (called nackTime for brevity), which is more scalable than creating a framework-based timer object for each missing packet. When a missing packet is detected, we initialize nackTime according to:

nackTime = Trx(tc) + multiplier(timeout_count) * Tnack_wait  (9)

where Trx(tc) is the current receiver clock value, timeout_count is the number of times the timer has expired (zero here) and Tnack_wait is the initial duration to wait before sending NACKs. The function multiplier(n) determines by how much the timeout period is increased after each time-out. The function is to be determined, but it is required that multiplier(0)=1 and multiplier(n+1)>=multiplier(n). Possibilities include linear (1+n), exponential (2^n) or constant (1). Incidentally, ignoreTime is initialized to Trx(tc) so that NACKs are not ignored initially (see later). The choice of Tnack_wait is discussed later.
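Equation (9) and the candidate multiplier functions can be sketched as follows; the function names and the default choice of multiplier are illustrative:

```cpp
// Candidate multiplier functions named in the text; multiplier(0) must be 1
// and the sequence must be non-decreasing.
inline int multiplierLinear(int n)      { return 1 + n; }
inline int multiplierExponential(int n) { return 1 << n; }
inline int multiplierConstant(int)      { return 1; }

// Eq. (9): nackTime = Trx(tc) + multiplier(timeout_count) * Tnack_wait.
// The default multiplier here is an arbitrary illustrative choice.
inline int initNackTime(int rxClockNow, int timeoutCount, int tNackWait,
                        int (*multiplier)(int) = multiplierLinear) {
    return rxClockNow + multiplier(timeoutCount) * tNackWait;
}
```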
The receiver checks for time-outs periodically by means of interrupts, such as frame interrupts. It scans the list of MissingPacket objects and for each object compares nackTime against the current receiver clock value Trx(tc). If the timer has expired (i.e. Trx(tc)−nackTime>=0), it multicasts a NACK to request retransmission. Note that one NACK packet can carry multiple sequence numbers to reduce overhead. After sending a NACK, the receiver increments timeout_count and resets nackTime according to (9) to wait for the retransmission. It also sets up ignoreTime as explained in the next paragraph.
If the missing packet is received before time-out, the timer is canceled and its entry in the linked list of MissingPacket objects is removed. If a NACK packet for the same packet is received and is not ignored (see below), the receiver treats it as a time-out: it increments timeout_count and resets nackTime according to (9) to wait for the retransmission. To avoid duplicate NACKs causing nackTime to increase multiple times in a short period, after resetting nackTime the receiver sets ignoreTime halfway between the current time and the new nackTime. If a NACK is received before ignoreTime (i.e. Trx(tc)−ignoreTime<0), it is ignored. As a special case, ignoreTime can be set to the current time so that no NACKs are ignored.
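The duplicate-NACK handling described above might look like this sketch, where clock values are plain integers and all names are illustrative (the fields mirror MissingPacket in the text):

```cpp
struct NackTimer {
    int nackTime;    // when to (re)send a NACK
    int ignoreTime;  // until which overheard duplicate NACKs are ignored
};

// Called when another receiver's NACK for the same packet is overheard.
// Returns true if it is treated as a time-out, false if it is ignored.
bool onForeignNack(NackTimer& t, int now, int newNackTime) {
    if (now - t.ignoreTime < 0)              // still inside the ignore window
        return false;
    t.nackTime = newNackTime;                // reset per eq. (9)
    t.ignoreTime = (now + newNackTime) / 2;  // halfway to the new nackTime
    return true;
}
```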
The original RFC draft requires each receiver to set its initial wait time (Tnack_wait) to a random number within a certain range [C1, C2], where C1 and C2 are constants. The purpose is to avoid receivers sending NACKs simultaneously. In this design, time-outs are polled by means of interrupts. Assuming frame interrupts are used, the resolution of the timeout is limited by the period of frame interrupts, which is 16 ms at 30 frames/sec. With such coarse resolution, in order to provide enough randomness, the range [C1, C2] would have to be large. That means a longer wait time before sending NACKs and hence calls for a longer playback delay, which is undesirable.
Fortunately, because the hardware clocks of different receivers are not synchronized to each other, there is a random phase difference between the frame interrupts on any two receivers. As a result, there is randomness in the time when a receiver checks for time-outs and sends NACKs. Therefore, we can choose [C1, C2] to be a small range that still provides enough randomness. C1 and C2 are to be determined empirically. Possible values are C1=frame period (in receiver clock units) and C2=2*C1.
D. Retransmission Suppression
Even with NACKs suppression, multiple NACKs from different receivers for the same missing packet may still reach the transmitter within a short period. It is unnecessary to retransmit the packet multiple times. The solution is to start a timer after retransmitting a packet. If NACKs for the same packet arrive before time-out, they are ignored.
First we introduce the class ReTxPacket:
class ReTxPacket
{
    short seqNum;     // sequence number of packet
    int ignoreTime;   // until which to ignore duplicate NACKs
};
After a packet is re-sent, the transmitter adds an entry for it in the linked list of ReTxPacket objects. The entry contains its sequence number and the time until which retransmission requests for the same packet are to be ignored. The time is initialized to:

ignoreTime = Ttx(tc) + Tignore  (10)

where Ttx(tc) is the current transmitter clock value and Tignore is the ignoring duration, a constant to be determined empirically.
When a NACK is received, the transmitter checks the sequence number(s) it contains against those in the linked list of ReTxPacket objects. For each matched object, it checks whether ignoreTime has expired. If not (i.e. Ttx(tc)−ignoreTime<0), the retransmission request for that packet is ignored.
The transmitter loops through the linked list of ReTxPacket objects periodically to purge entries with expired ignoreTime.
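The transmitter-side suppression, i.e. equation (10) plus the periodic purge, can be sketched as follows. ReTxPacket mirrors the class in the text; the wrapper class and the Tignore value are placeholders, since the text leaves Tignore to be determined empirically:

```cpp
#include <cstdint>
#include <list>

// ReTxPacket mirrors the class in the text; everything else is illustrative.
struct ReTxPacket {
    uint16_t seqNum;     // sequence number of packet
    int      ignoreTime; // until which to ignore duplicate NACKs
};

constexpr int kTIgnore = 50;  // Tignore: placeholder, empirical in practice

class ReTxSuppressor {
    std::list<ReTxPacket> sent_;
public:
    // Record a retransmission at transmitter clock value 'now' (eq. (10)).
    void onRetransmit(uint16_t seq, int now) {
        sent_.push_back({seq, now + kTIgnore});
    }
    // Should a NACK for 'seq' arriving at transmitter clock 'now' be ignored?
    bool shouldIgnore(uint16_t seq, int now) const {
        for (const ReTxPacket& p : sent_)
            if (p.seqNum == seq && now - p.ignoreTime < 0)
                return true;
        return false;
    }
    // Periodic purge of entries whose ignoreTime has expired.
    void purge(int now) {
        sent_.remove_if([now](const ReTxPacket& p) {
            return now - p.ignoreTime >= 0;
        });
    }
};
```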
Next, the transmitter A/V subsystem control is described with respect to FIG. 20. The HDMI receiver supports the following input pixel encodings: 4:4:4 YCrCb 8 bit; 4:2:2 YCrCb 8, 10, and 12 bit; and RGB 8 bit.
The output of the HDMI receiver is connected to the digital interface of a multi-format video decoder and graphics digitizer. Regardless of the input pixel encoding, the HDMI receiver colorspace converter must be used to set the pixel encoding to 4:4:4 YCbCr 24 bit, as this is required by the Component Processor of the video decoder. The colorspace converter of the video decoder is used to convert the output to the 4:2:2 YCrCb 16 bit encoding necessary for the JPEG2000 encoder.
HDMI carries auxiliary data that describe the active audio and video streams. This includes the following data.                Auxiliary Video Information (AVI) InfoFrame        Audio InfoFrame        Source Product Description (SPD) InfoFrame        Audio Content Protection (ACP) Packets        International Standard Recording Code ISRC1/ISRC2 Packets        
The auxiliary data needs to be sent from the source to the sink. In our product this data is treated as out-of-band information and will be sent as control packets over the wired or wireless link. The format of infoFrames and infoPackets can be found in the CEA-861B specification.
When the video source is DVI, separate inputs are used for audio. An audio CODEC is used to generate an I2S digital audio stream.
The host processor is required to perform initial configuration of the A/V Subsystem. In addition, configuration is required whenever the video resolution or audio format of the source changes. At a high level, the following is required.
The HDMI Receiver of FIG. 20 performs the following:                1. Set audio PLL and VCO range        2. Set HSYNC and VSYNC source, polarity and timing        3. If the input encoding is not 4:4:4 YCrCb enable the color space converter and program the coefficients        4. Enable BT656—Start of Active Video (SAV) and End of Active Video (EAV) controls        5. Enable SPDIF audio output        6. Monitor the New Data Flags (NDFs) to detect changes in the auxiliary data        7. Read the AVI, audio and SPD Infoframes along with ACP and ISRC1/C2 packets, to send to the receiver        
The Video Decoder of FIG. 20 performs the following:                1. Set the global registers        2. Set the primary mode to HDMI support        3. Set video standard        4. Set the color space converter to output 4:2:2 YCrCb        5. Set the Component Processor registers        
The JPEG2000 Encoder of FIG. 20 performs the following:                1. Set PLL registers        2. Set bus mode        3. Load encode firmware        4. Set encode parameters        5. Set dimension registers (If custom mode)        6. Start program        
In order to support custom formats the dimension registers must be set using the information available in the AVI InfoFrame.
The Audio CODEC (TLV320AIC33) of FIG. 20 performs the following:                1. Set the PLL        2. Select the input source        3. Setup the Programmable Gain Amplifier (PGA)        4. Select I2S output        5. Setup the DAC and outputs to provide local audio        
The receiver A/V subsystem control will now be described with respect to FIG. 21.
In the receiver, the host processor sends video frames to the JPEG2000 decoder via the A/V subsystem FPGA. An HDMI transmitter receives uncompressed video from the decoder and outputs an HDMI stream. Audio frames are sent to the A/V subsystem FPGA, which after processing forwards the audio data to the HDMI transmitter as well as the audio CODEC. When the source is DVI, audio is supplied via separate audio connectors from the audio CODEC.
The host processor performs the following configurations for the JPEG2000 Decoder (ADV202):                Set PLL registers        Set bus mode        Load decode firmware        Set decode parameters        Set dimension registers        Start program        For custom formats, the values for programming the dimension registers come from the AVI InfoFrame sent by the transmitter at the start of a session and whenever a resolution change is detected.
HDMI Transmitter (ADV9889)                Set audio type to S/PDIF        Set audio registers (N and CTS parameters)        Set input pixel encoding to 4:2:2 YCrCb 16 bit with embedded syncs        Set color space converter to set the output pixel encoding to be the same as the video source        
Audio CODEC (TLV320AIC33)                Set the PLL        Set the volume control and effects        Select the analog output        
The Display Data Channel will now be described.
The enhanced display data channel (E-DDC) is used by the Source to read the Sink's Enhanced Extended Display Identification Data (E-EDID) in order to discover the Sink's configuration and/or capabilities. HDMI Sources are expected to read the Sink's E-EDID and to deliver only the audio and video formats that are supported by the Sink. All Sinks contain an EIA/CEA-861B compliant E-EDID data structure accessible through the E-DDC.
Extended EDID (E-EDID) supports up to 256 Segments. A segment is a 256 byte segment of EDID containing one or two EDID blocks. A normal HDMI system will have only two EDID blocks and so will only use segment 0. The first EDID block is always a base EDID version 3 structure 128 bytes in length. This structure contains a Vendor Specific data block defined for HDMI systems and holds the 2-byte Source Physical Address field used for CEC message addressing. The second EDID block is not used by HDMI devices.
The HDMI transmitter reads EDID segment 0 of the connected display device when the Hot-Plug-Detect is asserted and generates an EDID Ready interrupt. The System processor can read the EDID segment via the I2C bus and send it via an out-of-band packet to the transmitter.
The system acts as a Repeater with a Duplicator function, i.e., a single-input, multiple-output device in which more than one output is active. The transmitter needs to determine the video standard and audio format to use based on the EDID data from all the receivers in the system. The video standard used must be suitable for the lowest-resolution display.
The HDMI/DVI source also prevents all protected audiovisual data from being copied. Content protection is provided by High-bandwidth Digital Content Protection (HDCP) specification version 1.10.
The HDCP Authentication protocol is an exchange between an HDCP Transmitter and an HDCP Receiver that affirms to the HDCP Transmitter that the HDCP Receiver is authorized to receive HDCP Content. This affirmation is in the form of the HDCP Receiver demonstrating knowledge of a set of secret device keys. Each HDCP Device is provided with a unique set of secret device keys, referred to as the Device Private Keys, from the Digital Content Protection LLC. The communication exchange, which allows for the receiver to demonstrate knowledge of such secret device keys, also provides for both HDCP Devices to generate a shared secret value that cannot be determined by eavesdroppers on this exchange. By having this shared secret formation melded into the demonstration of authorization, the shared secret can then be used as a symmetric key to encrypt HDCP Content intended only for the Authorized Device. Thus, a communication path is established between the HDCP Transmitter and HDCP Receiver that only Authorized Devices can access.
Through a process defined in the HDCP Adopter's License, the Digital Content Protection LLC may determine that a set of Device Private Keys has been compromised. If so, it places the corresponding KSV on a revocation list that the HDCP Transmitter checks during authentication. Revocation lists are provided as part of the source media (i.e. on the DVD). Other authorized HDCP receivers are not affected by this revocation because they have different sets of Device Private Keys.
An HDMI Transmitter at the source (e.g. a DVD player) can initiate authentication at any time. The HDMI Receiver responds by sending a response message containing the receiver's Key Selection Vector (KSV). The HDCP Transmitter verifies that the HDCP Receiver's KSV has not been revoked.
The Receiver must gather the authentication data of all downstream sinks and report it back to the Transmitter. The required data is as follows.                KSV lists—The KSV's from all attached displays.        Device Count—The number of displays/repeaters in the connection topology.        Depth—The number of connection layers in the topology.        Max_Cascade_Exceeded—A flag that is set if Depth exceeds a maximum value (for example, 7).        Max_Devs_Exceeded—A flag that is set if the Device Count exceeds a maximum value (for example, 127).        
The HDMI transmitter EDID and HDCP controller is a state machine implemented in hardware. Its purpose is to retrieve the EDID and Key Selection Vectors from downstream receivers. The following steps illustrate the sequence in which the state machine performs the EDID and HDCP handling. This process takes place every time a Hot Plug Detect is sensed (as described in step 1). It also takes place every time the transmitter requests a re-authorization, in which case software begins the re-authorization at step 4.

1. Hot Plug Detect goes high (toggled by the attached display).
2. The AD9889 automatically reads EDID segment 0. The EDID Ready flag (0xC5[4]) is set once the EDID has been read successfully, and an EDID Ready interrupt is sent to the system.
3. After receiving the EDID Ready interrupt, the system software evaluates the EDID. (EDID data is stored at I2C address 0x7E, beginning at offset 0.)
4. Once the Receiver has set the video/audio mode, it sets the HDCP Desired bit (0xAF[7]) high. The HDMI/DVI bit (0xAF[1]) should not be changed after setting the HDCP Desired bit.
5. The receiver's BKSV is reported in the BKSV registers (0xBF-0xC3), the BKSV flag is set (0xC7[7]) (generating an interrupt) and the BKSV count (0xC7[6:0]) is set to 0 by the AD9889.
6. Upon receiving the BKSV flag interrupt, the system software reads the BKSV and clears the BKSV Flag interrupt.
7. Once the BKSV flag is cleared, the AD9889 begins HDCP encryption and checks whether the receiver is a repeater.
    a. If not a repeater, HDCP initialization is complete and the AD9889 begins HDCP management; the firmware will know this state is reached when the HDCP Controller State (0xC8) reaches state '4'. Process complete.
        i. One BKSV should be stored by software at this point.
        ii. DEVICE_COUNT=1
        iii. DEPTH=0
        iv. MAX_DEVS_EXCEEDED=0
        v. MAX_CASCADE_EXCEEDED=0
    b. If the Receiver is a repeater, the AD9889 must complete the HDCP repeater authentication; the HDCP Controller State will not reach state '4'. Continue to step 8.
8. The AD9889 reads up to 13 KSVs from the downstream repeater (the AD9889 can only process 13 at a time).
9. The AD9889 signals a BKSV Flag interrupt with the BKSV count (which can be up to 13).
10. System software reads bStatus from EDID memory space (device 0x7E) registers 0xF9 (LSB) and 0xFA (MSB). (Read only the first time through this loop.)
11. bStatus[6:0] contains DEVICE_COUNT.
12. bStatus[7] contains MAX_DEVS_EXCEEDED.
13. bStatus[10:8] contains DEPTH.
14. bStatus[11] contains MAX_CASCADE_EXCEEDED.
15. If either of the 'MAX . . . ' values is set to 1, the routine can exit and simply forward those status flags upstream. (Depth and Device_Count do not matter when the maximums are exceeded.) Process complete.
16. System firmware reads the BKSV Count to see how many valid BKSVs are in EDID memory.
17. System firmware reads the BKSVs from the EDID memory. This list of BKSVs is stored in a separate memory space inside the AD9889 that can be accessed using the I2C device address 0x7E (instead of 0x72 or 0x7A). When reading from this I2C device, up to 13 BKSVs from downstream devices will be in the memory, starting at offset 0x00, then 0x05, etc.
18. System software clears the BKSV Flag interrupt.
19. If more KSVs remain (there are more than 13 downstream devices), go back to step 9. Software will know whether more keys remain from the DEVICE_COUNT field read in step 11.
20. System software now has a full list of all downstream KSVs plus the directly attached BKSV.
Once the Authentication is complete the AD9889 will manage the ongoing HDCP link authentication every 128 frames. A failed authentication will generate an HDCP/EDID Controller Error interrupt and restart the HDCP authentication.
The system firmware should periodically check the state of the "Encryption ON" status bit (0xB8[6]) while sending protected audio or video to ensure that HDCP is enabled. This should be checked no less than once every two seconds. Checking this bit protects against a third party tampering with the AD9889's register settings to defeat HDCP.
The Transmitter must consolidate all downstream Receiver KSV lists into a single list. The list is represented by a contiguous set of bytes, with each KSV occupying 5 bytes stored in little-endian order. The total length of the KSV list is 5 bytes times the total number of downstream sinks.
The Transmitter must also compare all DEPTH parameters from each attached Receiver. The maximum reported DEPTH will be incremented by 1 and reported to the source. If the new DEPTH is greater than 7, then MAX_CASCADE_EXCEEDED shall be set to 1.
The Transmitter must also collect the DEVICE_COUNT parameters from each Receiver; these numbers are added together for a total DEVICE_COUNT to be reported to the source. If the total is greater than a maximum value (for example, 127), then MAX_DEVS_EXCEEDED shall be set to 1.
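The consolidation rules of the last three paragraphs (KSV list concatenation, DEPTH = max + 1, summed DEVICE_COUNT, and the two overflow flags) can be sketched as follows. The types are illustrative; the limits 7 and 127 follow the examples in the text:

```cpp
#include <cstdint>
#include <vector>

// Per-receiver authentication data gathered downstream (illustrative types).
struct ReceiverAuth {
    std::vector<uint8_t> ksvList;  // 5 bytes per KSV, little-endian
    int deviceCount;               // DEVICE_COUNT reported by this receiver
    int depth;                     // DEPTH reported by this receiver
};

struct Consolidated {
    std::vector<uint8_t> ksvList;
    int deviceCount = 0;
    int depth = 0;
    bool maxDevsExceeded = false;
    bool maxCascadeExceeded = false;
};

Consolidated consolidate(const std::vector<ReceiverAuth>& rx) {
    Consolidated out;
    int maxDepth = 0;
    for (const ReceiverAuth& r : rx) {
        // KSV lists are concatenated into one contiguous byte string.
        out.ksvList.insert(out.ksvList.end(), r.ksvList.begin(), r.ksvList.end());
        out.deviceCount += r.deviceCount;   // DEVICE_COUNTs are summed
        if (r.depth > maxDepth) maxDepth = r.depth;
    }
    out.depth = maxDepth + 1;               // one more connection layer here
    out.maxCascadeExceeded = out.depth > 7;       // example limit in the text
    out.maxDevsExceeded = out.deviceCount > 127;  // example limit in the text
    return out;
}
```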
The authentication data must be forwarded to the source by the HDMI Receiver in the Transmitter. An issue here is that the AD9398 doesn't provide documented registers for doing this. This issue will be solved with a new ADI HDMI Receiver (AD9399) that will be used in the production hardware.
In addition to HDCP a source may use the ACP packet to convey content-related information regarding the active audio stream. ACP packets received from the source need to be sent to all receivers in the system. The content of the ACP packet is used to program the HDMI transmitter.
This section describes the local user interface, and in particular, the set of messages between the main board and the front panel controller, including the firmware design of the front panel controller. The term "front panel controller" refers to the following components:                LPC2103 ARM7 processor        Noritake MN11216A Vacuum Fluorescent Display        IR Receiver        IR Transmitter        Buttons physically attached to the front of the unit        LEDs visible on the front of the unit
These components may be located on the physical front panel board or on the main board. They are all controlled by the LPC2103 processor and compose the "logical" front panel of the unit.
The front panel uses a simple control loop in the "main" function to control the system. This loop checks for event indications from the Interrupt Service Routines (ISRs). The only hardware device that is updated outside an ISR is the vacuum fluorescent display. Display update timing is controlled by timer0, but the updates are carried out in the foreground.
These ISRs run independently of the main loop:                Timer0—display update timer        Timer1—Capture incoming IR data (external capture mode)        Timer2—Drive outgoing IR data (drive external match pin)        UART0—IXP455 communication        
As shown in FIG. 22, UART interrupts are masked during display updates. The UART has a 16 byte deep FIFO which will prevent data loss while the interrupts are masked. The timer capture and match interrupts supporting IR are not masked during display updates.
During normal operation without IR traffic, the display update will block for 200 µs of every 500 µs.
Flash updates need to be done at the end of a full display update to prevent the display from scrolling. A 256 byte flash update requires 1 ms, during which time all ISRs must be masked. Empirical testing shows that a 2 ms delay between the end of one display update and the start of the next is not noticeable on the display. Blocking the UART ISR for a millisecond may cause data loss depending on the baud rate and how many bytes are in the FIFO when interrupts are masked. Blocking the timer capture and match interrupts for a millisecond will disrupt IR traffic. The flash update can be held off until the IR transmitter is idle, but there is no way to process received IR data during a flash write.
Inter-processor communication is shown in FIG. 23. It is done using a UART on the LPC2103 and another UART on the IXP455. The basic message format is:                Leader <STX>        Size        Command        Payload        Trailer <ETX>        
The values of STX and ETX may also occur within the binary data of IR messages. The message processor therefore validates each message it removes from the RX queue in order to assemble messages correctly.
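A sketch of the framing and validation follows. The constants, and the assumption that the Size field counts the command byte plus payload bytes, are illustrative; the text does not pin down the Size field's exact meaning:

```cpp
#include <cstdint>
#include <vector>

// Illustrative framing constants; actual values come from the protocol.
constexpr uint8_t STX = 0x02;
constexpr uint8_t ETX = 0x03;

// Build a framed message: <STX> Size Command Payload <ETX>.
// Size is assumed here to count the command byte plus payload bytes.
std::vector<uint8_t> frameMessage(uint8_t command,
                                  const std::vector<uint8_t>& payload) {
    std::vector<uint8_t> msg;
    msg.push_back(STX);
    msg.push_back(static_cast<uint8_t>(1 + payload.size()));
    msg.push_back(command);
    msg.insert(msg.end(), payload.begin(), payload.end());
    msg.push_back(ETX);
    return msg;
}

// Validate a candidate message pulled from the RX queue. Because STX/ETX
// byte values can also appear inside binary IR payloads, the Size field is
// checked against the trailer position rather than trusting the first ETX.
bool validateMessage(const std::vector<uint8_t>& msg) {
    if (msg.size() < 4 || msg.front() != STX || msg.back() != ETX)
        return false;
    return msg[1] == msg.size() - 3;  // leader, size, trailer excluded
}
```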
The transmitter units contain a Noritake vacuum fluorescent display (VFD) on the front panel. The display characteristics are as follows:                Pixel based 112×16 VFD        Smallest updatable unit is a grid, 6×16 pixels        The maximum time to update the complete display is 10 ms per the Noritake specification. We are currently updating the entire display every 20 ms without any noticeable dimming or flicker in the display.        Grid updates are only interrupted by timers 1 and 2 when the IR is running in raw mode. These ISRs must be extremely fast; interrupting the display for too long can result in visible problems.
The physical display is updated a single grid at a time. Grid updates are controlled by timer0, match register 0, which expires and interrupts every 500 µs. A grid update requires approximately 200 µs.
As shown in FIGS. 24 and 25, when timer0 expires it signals the main loop to begin a grid update. The match register reloads automatically each time it expires, so the only actions required in the ISR are sending the signal and clearing the interrupt.
The IXP455 can update the front panel display by sending text strings to update part of the display or by sending a full frame buffer. Text based updates will be processed by the front panel processor and written to the frame buffer in a 5×7 LCD font. Full frame buffer updates will not be processed by the front panel; they will be displayed as received.
The front panel provides two frame buffers to the IXP455. The IXP455 may write to either frame buffer at any time. The IXP455 may direct the front panel to change which frame buffer is used to update the physical display.
The IXP455 may update the front panel frame buffers by:                Sending a “Display Text String” command to update a portion of the display.        Sending a “Full Display” command followed by a full update to one of the frame buffers.        
The front panel on the receiver unit contains five (5) LEDs in place of the VFD. During boot-up of the IXP455, the LEDs will display an "active" pattern to indicate that the unit is alive. Once the Linux kernel has booted and the main application is running, the LEDs will be controlled by the IXP455 via FP_LED messages.
The IR Subsystem of the local user interface involves receipt and transmission protocols. IR will be received using a commercial IR receiver, such as those marketed by Sharp Corporation. The receiver demodulates the incoming signal and outputs a waveform representing the received signal.
The output signal from the receiver is connected to an external capture pin on one of the LPC2103's timers. The timer is configured to interrupt on any edge transition on the capture pin. The time of the first edge transition is not recorded but the timer itself will be reset. On subsequent edge transitions the timer value will be recorded and the timer will be reset. The time values will be recorded until it is determined that an entire IR command has been received at which time the timer values will be sent to the IXP455 in an “IR Received” message.
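The capture bookkeeping just described (the first edge only resets the timer; every later edge records the elapsed time and resets it again) can be sketched as follows, with the hardware timer simulated by a counter and all names illustrative:

```cpp
#include <cstdint>
#include <vector>

// Sketch of the capture-ISR bookkeeping; tick() stands in for the hardware
// timer advancing between edges.
class IrCapture {
    bool firstEdge_ = true;
    uint32_t timer_ = 0;                 // simulated free-running capture timer
    std::vector<uint32_t> durations_;    // inter-edge times for "IR Received"
public:
    void tick(uint32_t n) { timer_ += n; }
    void onEdge() {                      // capture interrupt on any transition
        if (!firstEdge_)
            durations_.push_back(timer_);
        firstEdge_ = false;
        timer_ = 0;                      // timer is reset on every edge
    }
    const std::vector<uint32_t>& durations() const { return durations_; }
};
```

The first edge's absolute time is deliberately discarded, matching the text: only the intervals between edges carry information.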
IR is transmitted using an IR LED. A timer running in match mode and an external GPIO pin are used to drive the LED. Upon receipt of an IR message from the main processor, the GPIO pin enables the LED and the timer's match register is loaded with the first time value from the IR message. When a match interrupt occurs, the GPIO pin toggles, via the timer logic, and the match register is loaded with the next value from the IR message. This toggle-and-reload operation continues until all timer values contained in the IR message have been used.
The front panel provides the IXP455 with its (1) Firmware Version, (2) Hardware Version, (3) Processor ID, and (4) Boot Loader Version.
The infrared system provides an extension of infrared remote control using an IP-based network. The method described extends the range of an infrared (IR) remote control using a wired or wireless IP-based network. Although the method is described in terms of an IP-based network, the range of an IR remote control could be extended using other types of networks.
Infrared remote controls use infrared light to control electronic devices. The devices to be controlled normally need to be in line of sight with the remote control and at a short distance from it, normally within 15 feet. This means that an IR remote control will not work with devices that are in another room, too far from the receiver, or behind obstructions.
Therefore, the described method outlines a mechanism to solve these problems and to extend the range of an infrared remote control.
As shown in FIG. 36, the infrared data from the remote control is detected by an infrared receiver that converts the infrared information into an electrical signal that a microcontroller can read. The microcontroller extracts the timing information. The timing information is transmitted using a wired or wireless IP network to another microprocessor that uses the timing data to reconstruct and retransmit the infrared data using an infrared LED. The microcontroller can be substituted by a microprocessor, or a combination of a microcontroller and a microprocessor.
With a fixed carrier infrared receiver, the infrared receiver converts the infrared light signal into an electrical signal that can be read by the microcontroller, and the carrier is always known. The carrier is removed from the incoming IR signal and only the data is sent to the microprocessor. The data pin from the IR receiver is connected to a general IO pin. The microprocessor looks for low-to-high and high-to-low transitions in the data and measures the time between those events. This timing data is packed and sent to another microcontroller using an IP-based network. The second microcontroller decodes the data packets from the network and extracts the timing information. With the timing information, the microcontroller reconstructs the infrared data and adds the known carrier to it. The data with the carrier is sent to the infrared transmitter circuit and on to the electronic device to be controlled. Normally the infrared transmitter consists of an infrared LED and a small transistor amplifier.
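For the fixed-carrier case, the far end re-adds the known carrier to the received mark durations. A sketch follows, assuming a common 38 kHz IR carrier (about a 26 µs period) and a convention that even-indexed durations are marks (carrier on); neither assumption comes from the text:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed 38 kHz carrier, ~26 us per cycle (illustrative; the actual fixed
// carrier is whatever the fixed-carrier IR receiver was built for).
constexpr uint32_t kCarrierPeriodUs = 26;

// For each mark duration (in microseconds), the number of carrier cycles
// the transmitting LED should emit when re-adding the known carrier.
// Even-indexed entries are assumed to be marks; odd-indexed are spaces.
std::vector<uint32_t> markCycles(const std::vector<uint32_t>& durationsUs) {
    std::vector<uint32_t> cycles;
    for (std::size_t i = 0; i < durationsUs.size(); i += 2)
        cycles.push_back(durationsUs[i] / kCarrierPeriodUs);
    return cycles;
}
```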
With a universal-carrier infrared receiver, the method is similar to the fixed-carrier case, but the infrared receiver does not remove the carrier. The output from the IR receiver is connected to a general I/O pin. The microprocessor looks for low-to-high and high-to-low transitions in the data and measures the time between those events. This timing data is packed and sent to another microcontroller over an IP-based network. The second microcontroller, or a process within the primary microcontroller, decodes the data packets from the network and extracts the data and carrier timing information. With the data and carrier timing information the microcontroller then reconstructs the data with the carrier. The carrier does not have to be added because it is embedded in the received data. The data with the carrier is sent to the infrared transmitter circuit and on to the electronic device to be controlled.
With a universal-carrier infrared receiver with carrier detect, the method is again similar to the fixed-carrier case, but the infrared receiver computes the carrier frequency and removes the carrier from the incoming data. The infrared receiver extracts the carrier from the incoming IR signal, computes the carrier frequency, and sends the carrier information and the data to the microcontroller. The microcontroller takes the data from the IR receiver, looks for low-to-high and high-to-low transitions in the data, and measures the time between these events. This timing data and the carrier are packed and sent to another microcontroller over an IP-based network. The second microcontroller decodes the data packets from the network and extracts the data timing information and the carrier. With the data timing information the microcontroller reconstructs the infrared data and adds the carrier to it. The data with the carrier is sent to the infrared transmitter circuit.
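For the carrier-detect variant, the timing data and the measured carrier frequency must both travel over the IP network. A possible packing is sketched below; the on-wire layout (2-byte carrier in kHz, 2-byte count, 4-byte durations, all big-endian) is purely an assumption for illustration, since the description does not specify a packet format.

```python
# Hypothetical packet layout for carrying IR timing data plus the
# measured carrier frequency across the IP network.
import struct

def pack_ir(carrier_khz, timings_us):
    """Big-endian: 2-byte carrier (kHz), 2-byte entry count, then one
    4-byte duration (microseconds) per timing entry."""
    header = struct.pack(">HH", carrier_khz, len(timings_us))
    body = b"".join(struct.pack(">I", t) for t in timings_us)
    return header + body

def unpack_ir(data):
    """Inverse of pack_ir: recover the carrier and timing list."""
    carrier_khz, count = struct.unpack_from(">HH", data, 0)
    timings = [struct.unpack_from(">I", data, 4 + 4 * i)[0]
               for i in range(count)]
    return carrier_khz, timings
```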
The system of FIG. 1 is one where extending infrared remote control signals is particularly useful. In FIG. 1, video signals from the transmitter are sent via a wireless IP network to receivers that are connected to respective displays; in this way video from the source can be displayed on multiple displays. Transmitter 11 and receivers 12 have infrared receivers and can receive control signals from an infrared remote. Thus, using the extension method described above, infrared signals can be received at the transmitter and transmitted to the receivers via the IP network.
Now, the system's web user interface is described. The system provides a Web User Interface System that allows the user to configure system settings; display hardware and firmware versions, connection status, signal strength, and the like; and update firmware. The web interface authorizes a single user, e.g., an audio and video (AV) system integrator, to configure hardware and software settings of the Tx and Rx(s) via HTML Web pages. A Web browser communicates with the embedded Web server using a 10/100 Ethernet or an 802.11a link connected to the Tx or Rx(s) directly, through a router, or through a web proxy via the Tx/Rx. The Ethernet link also transmits audio, video and control data.
FIG. 26 shows the basic blocks of the web interface.
Each Tx or Rx unit contains an embedded Web server. When the AV system integrator enters the URL of the IP address of a Tx or Rx, the embedded Web server in the Tx or Rx serves up the default page of the web interface. The system integrator can then log in as the authorized user. Once the Web server has authenticated the access, the system integrator interacts with the system through HTML Web pages.
The web interface allows the AV system integrator to enter configuration parameters to configure the device through HTML Web pages. In addition, the system integrator can query hardware and firmware version as well as device status from the system. The Web pages therefore contain dynamic content. The system uses the Common Gateway Interface (CGI) standard to serve up dynamic Web pages.
The Web browser requests the URL corresponding to a CGI script or program, which follows the CGI protocol to retrieve input data from a dynamic Web page as well as to compose dynamic Web page content.
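The mechanism above can be sketched with a minimal CGI-style program: the Web server passes request data through environment variables, and the program emits a header block followed by dynamically composed HTML. The parameter name ("page") and page content here are illustrative only and are not part of the actual web interface.

```python
# Minimal sketch of a CGI program of the kind described above. Per the
# CGI standard, the query string arrives in the QUERY_STRING environment
# variable, and the response starts with a header block.
import os
from urllib.parse import parse_qs

def render(environ):
    params = parse_qs(environ.get("QUERY_STRING", ""))
    page = params.get("page", ["status"])[0]      # default page name assumed
    body = "<html><body><h1>Device %s</h1></body></html>" % page
    return "Content-Type: text/html\r\n\r\n" + body

if __name__ == "__main__":
    # When invoked by the Web server, emit the dynamic page on stdout.
    print(render(os.environ), end="")
```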
The web interface incorporates the following components:
    An embedded Web server, such as the open-source "Boa Web Server";
    HTML Web pages; and
    CGI programs that use the Common Gateway Interface (CGI) standard to interface with the AV core system software. The following websites provide basic information on CGI programs: http://cgi.resourceindex.com/Documentation and http://hoohoo.ncsa.uiuc.edu/cgi
The design of the Web User Interface System should abstract the interfaces between the CGI programs and the AV core system, so that when Web pages are expanded and CGI programs are added, the interfaces remain unchanged.
The Flash Memory Strategy is now described.
The flash memory map is split into three regions, shown in FIG. 27:
    The boot region, which contains the boot loader firmware;
    The compressed kernel region; and
    The compressed CRAMFS image region, which contains the application, modules, libraries, FPGA bit stream and utilities.
Persistent storage is required for the kernel, application, front panel and FPGA update files and configuration. Additionally, the receiver upgrade package must be stored to provide for the ability to upgrade receivers from the transmitter.
The kernel is simply stored compressed in flash. When booted, the kernel extracts an initial CRAMFS (initrd) image from flash for use as its root file system. The CRAMFS image is stored as a compressed ext2 file system. The root file system contains the application, utilities, libraries and required update files.
Application upgrades are handled by downloading the compressed CRAMFS image which is then written to FLASH by the application.
Boot itself may be field upgradeable if an application containing boot is downloaded to the appliance and the application reprograms the boot sector.
Dynamic memory allocation is needed for creating objects and events during runtime. However, dynamic allocation from the heap is limited to boot-up initialization, not only to prevent potential memory leaks, but also to keep the code deterministic in terms of memory usage.
The framework provides a memory pool class that may be utilized by any part of the system that requires objects to be created/destroyed dynamically. A memory pool is nothing more than a fixed number of fixed sized blocks set aside at system start. When an object is created, one of the blocks is used for the object. By knowing the type and maximum number of all objects in the system we can predetermine the memory requirements for the system and we can guarantee memory for each object.
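The fixed-block pool idea above can be illustrated as follows. This is a sketch of the concept only, not the framework's actual memory pool class: all blocks are reserved up front, so memory use is fully determined at start-up and allocation can never grow the heap.

```python
# Sketch of a fixed-size-block memory pool: a fixed number of fixed-size
# blocks is set aside at system start, making memory usage deterministic.
class MemoryPool:
    def __init__(self, block_size, block_count):
        self.block_size = block_size
        # All blocks are preallocated here; nothing is allocated later.
        self.free = [bytearray(block_size) for _ in range(block_count)]

    def alloc(self):
        """Hand out one preallocated block; never grows the pool."""
        if not self.free:
            raise MemoryError("pool exhausted")
        return self.free.pop()

    def release(self, block):
        """Return a block to the pool for reuse."""
        self.free.append(block)
```

Because the type and maximum number of objects are known up front, the total memory requirement is simply `block_size * block_count` per pool.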
The Video Packet Formats are now described. As shown in FIG. 28, a video frame consists of an A/V subsystem Video Header, an ADV202 Header (assuming an ADV202 Raw Format output), a JPEG2000 Header, attribute data and compressed video data.
To avoid IP fragmentation and the associated performance degradation, video frames are packetized into RTP packets such that each can fit into a single IP packet. Since an RTP header, UDP header, IPSec ESP header and IP header will be added to each packet, the maximum payload size in each RTP packet will equal the MTU of the network (1500 bytes) minus the total length of those headers, which is to be determined.
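The packetization arithmetic can be sketched as below. The header lengths used are typical values and are assumptions only; as noted above, the exact total header length is to be determined (the ESP overhead in particular varies with the cipher and padding).

```python
# Back-of-the-envelope RTP payload budget for the packetization rule
# above. Header sizes are typical values, assumed for illustration.
MTU = 1500          # bytes, per the text
IP_HDR = 20         # IPv4 header without options
ESP_OVERHEAD = 24   # IPSec ESP header + trailer (varies with cipher)
UDP_HDR = 8
RTP_HDR = 12

def max_rtp_payload():
    """Largest RTP payload that still fits one IP packet."""
    return MTU - (IP_HDR + ESP_OVERHEAD + UDP_HDR + RTP_HDR)

def packets_for_frame(frame_bytes):
    """Number of RTP packets needed for one video frame."""
    payload = max_rtp_payload()
    return (frame_bytes + payload - 1) // payload   # ceiling division
```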
A. A/V Subsystem Video Header is shown in FIG. 29. It is generated by the A/V subsystem FPGA.
The ADV202 will insert an ADV202-specific header at the beginning of the code stream. FIG. 30 contains information about this ADV202 header.
The JPEG2000 Header is shown in FIG. 31. Information about JPEG2000 markers can be found in the ISO/IEC15444-1 standard. The JPEG2000 compliant header contains main and tile headers from the JPEG2000 standard. It consists of all the parameters needed to decode an image correctly, including the quantization stepsizes. FIG. 31 lists the most common markers that are inserted into the compressed codestream by the ADV202.
The A/V subsystem FPGA will append an Audio Header to each audio frame to allow the receiver to synchronize the audio with the correct video frame. Like the video frame, the audio frame is packetized into RTP packets such that each fits in an IP packet. FIG. 32 shows an example audio frame format.
The Audio Header of FIG. 32 contains the fields shown in FIG. 33.
Next, we describe playback synchronization and error control models.
A. Adaptive Synchronization Algorithm
The adaptive synchronization algorithm uses minimal knowledge of network traffic characteristics. The algorithm is immune to clock offset and drift between the transmitter clock and the receiver clock while it ensures the QoS in terms of end-to-end delay, delay jitter, and loss ratio.
The details of the algorithm are summarized here; for a full explanation of the implementation see the IEEE journal article "Multipoint Multimedia Teleconference System with Adaptive Synchronization," Vol. 14, No. 7.
Instead of having a fixed playout point, the application is allowed to adjust it depending on network conditions. This means fewer packets are discarded because they arrive late. Also, instead of discarding all data that arrives late, we allow packets that arrive only slightly late to be played back. This adds a small amount of distortion but is better than missing data.
The synchronization scheme requires the user to specify the maximum acceptable jitter, JMax, and the maximum acceptable packet loss ratio caused by synchronization measures, LMax. At the transmitter, each packet carries a timestamp ti,g indicating its generation time. At the receiver, a playback clock (PBC) and three event counters, namely the wait counter Cw, the nonwait counter Cnw, and the discard counter Cd, with associated thresholds Tw, Tnw, and Td, respectively, are maintained.
The PBC is nothing but a virtual clock at the receiver which emulates the clock at the sender. The motivation for the PBC is that once the source clock can be reproduced at the sink, the synchronization problem may be readily solved. At the receiver, the PBC is initiated according to the time stamp carried by the first received object, updated by the receiver clock, and adjusted based on the contents of the three counters. The vicinity of a packet's arrival time in reference to the PBC time is partitioned by the wait boundary (Bw) and discard boundary (Bd) into three regions: the wait region, the nonwait region, and the discard region, shown in FIG. 34.
The arrival time ti,ar, in reference to the PBC, of the ith packet may fall into one of the three regions with respect to its associated two boundaries. The synchronization algorithm conforms to the following rules.
    1. If the packet with time stamp ti,g arrives before Bi,w (within the wait region), then it is played back at Bi,w (waiting until the wait boundary).
    2. If the packet with time stamp ti,g arrives after Bi,w but before Bi,d (within the nonwait region), then it is played back immediately.
    3. If the packet with time stamp ti,g arrives after Bi,d (within the discard region), then it is discarded.
The PBC is synchronized to the transmitter clock using the following algorithm.
    1. Upon receiving the first successfully arrived packet, set the initial PBC time equal to the time stamp carried by this packet.
    2. Upon receiving the ith packet, compare its time stamp ti,g with its arrival time ti,ar (the current PBC time PBC(t)): if ti,g > ti,ar, increase the wait counter by one and do not play back the object until Bw; else if ti,ar < ti,g + Emax, increase the nonwait counter by one and play back the packet immediately; otherwise (i.e., ti,ar ≥ ti,g + Emax) increase the discard counter by one and discard the packet.
    3. Check the most recently increased counter: if it overflows, continue; otherwise go to Step 2.
    4. When the nonwait counter or the discard counter overflows: if the wait counter is not full, decrease the PBC: PBC(t) = PBC(t) − Δ, and go to Step 5; otherwise go to Step 5. When the wait counter overflows: if the nonwait counter is not full, increase the PBC: PBC(t) = PBC(t) + Δ.
    5. Reset all counters and go to Step 2.
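The per-packet decision and the PBC adjustment above can be sketched as follows. This is a simplified illustration under stated assumptions: times are plain numbers on the receiver's virtual clock, Emax is the distance from the wait boundary to the discard boundary, and Step 4's overflow handling is reduced to a single check.

```python
# Sketch of the adaptive synchronization rules: classify each packet
# into the wait / nonwait / discard region, and shift the PBC when a
# counter overflows (Step 4 of the algorithm above).
WAIT, NONWAIT, DISCARD = "wait", "nonwait", "discard"

def classify(t_gen, t_arrival, e_max):
    """Rules 1-3: wait if early, play immediately if slightly late,
    discard if past the discard boundary."""
    if t_gen > t_arrival:
        return WAIT
    if t_arrival < t_gen + e_max:
        return NONWAIT
    return DISCARD

def adjust_pbc(pbc, counters, thresholds, delta):
    """Step 4: when the wait counter overflows (and nonwait is not
    full), increase the PBC; when nonwait or discard overflows (and
    wait is not full), decrease the PBC."""
    if counters[WAIT] >= thresholds[WAIT] and counters[NONWAIT] < thresholds[NONWAIT]:
        return pbc + delta
    if ((counters[NONWAIT] >= thresholds[NONWAIT] or
         counters[DISCARD] >= thresholds[DISCARD]) and
            counters[WAIT] < thresholds[WAIT]):
        return pbc - delta
    return pbc
```

After `adjust_pbc` is applied, all counters are reset (Step 5) and classification resumes.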
The thresholds of the three counters shown in FIG. 34 are critical to the performance of the synchronization algorithm. In particular, the threshold of the wait counter, Tw, governs how sensitive the synchronization scheme is to network improvement. Its value should span a time interval on the order of at least ten seconds; otherwise the synchronization would be too sensitive to network improvement, and consequently too-frequent downward shifts of the PBC would likely occur.
B. Inter-Stream Synchronization
When inter-stream synchronization is needed, a Group Playback Clock (Group PBC) is required. The Group PBC is set to the slowest of all PBCs. This Group PBC dominates the playback of all media in the synchronization group. Meanwhile, each medium in the intermedia synchronization does its own synchronization as if it were not in the group, but the discard decision is made in reference to the Group PBC.
An example of a group playback clock schematic is shown in FIG. 35.
C. Error Control
Continuous media streams such as audio and video share a number of characteristics.
    Strict timing requirements—If the data is not delivered before a certain point in time, it has to be discarded.
    Some tolerance of loss—The amount of loss that can be tolerated depends on the medium, the encoding techniques and human perception.
    Periodicity—Video or audio should be delivered at a fixed rate. When transmitting continuous data across a network this periodicity is normally lost.
Data transmitted across networks is normally subject to delay, delay jitter, re-sequencing of packets, and loss of packets.
The RTP protocol uses the packet sequence number to reorder packets in a stream. Buffering packets at the receiver overcomes problems related to network delay jitter. However, as RTP uses UDP multicasting to deliver continuous video and audio streams, packet loss will occur.
There are several methods for dealing with packet loss of video or audio data in order to provide an acceptable quality of service (QoS):
    Automatic Repeat Request (ARQ);
    Forward Error Correction (FEC);
    Hybrid Error Control (ARQ/FEC);
    Interleaving; and
    Error Concealment.
Using ARQ, a lost packet is retransmitted by the sender. Loss of data can be detected by the sender or by the receiver. Detection by the sender requires that every receiver send an ACK for each received packet; clearly, when multicasting to a number of receivers, this consumes significant bandwidth. Detection by the receiver is more efficient in this case. The receiver sends a NAK if a packet sequence number is missed in the stream. If all receivers miss the same packet, this can result in multiple NAKs being sent to the sender for the same packet. This can be avoided by multicasting the NAK instead of unicasting it, so that other receivers in the group realize the packet has already been re-requested.
FEC transmits, along with the original data, some redundant data, called parities, to allow reconstruction of lost packets at the receiver. The redundant data is derived from the original data using Reed-Solomon codes or a scheme based on the XOR operation. The FEC transmitter sends k data packets along with h redundant parity packets. Unless the network drops more than h of the h+k packets sent, the receiver can reconstruct the original k information packets. RFC 2733 specifies an RTP payload format for generic forward error correction.
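The XOR scheme mentioned above can be illustrated for the simplest case h = 1: one parity packet protects k data packets, so any single lost packet can be rebuilt at the receiver. This is a sketch of the principle, not the RFC 2733 wire format; Reed-Solomon codes generalize it to h > 1.

```python
# Illustrative XOR-based FEC: the parity is the byte-wise XOR of all
# data packets, so XOR-ing the parity with the surviving packets
# reproduces the one missing packet.
def xor_parity(packets):
    """Return the byte-wise XOR of equal-length packets."""
    parity = bytearray(len(packets[0]))
    for p in packets:
        for i, b in enumerate(p):
            parity[i] ^= b
    return bytes(parity)

def recover(received, parity):
    """Rebuild the single missing packet (marked None) from the rest.
    Returns (index_of_missing_packet, rebuilt_bytes)."""
    missing = [i for i, p in enumerate(received) if p is None]
    assert len(missing) == 1, "XOR parity can repair only one loss"
    survivors = [p for p in received if p is not None]
    return missing[0], bytearray(xor_parity(survivors + [parity]))
```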
The disadvantage of FEC is that the redundant data consumes bandwidth and the difficulty in choosing the right amount of redundancy for various network conditions. A solution to this is to send redundant data when a retransmission is required instead of the original packet. This is known as Hybrid Error Control.
With ARQ and ARQ/FEC it is important that the retransmitted data or parity packet is received before the playout point; otherwise the packet will be discarded. This requires a jitter buffer at the receiver large enough to provide a delay equal to the network delay plus the retransmission of a lost packet. The strict delay requirements of interactive systems usually eliminate the possibility of retransmissions. However, in a non-interactive system such as ours, a playout delay of 500 ms to 1 second should not cause a problem.
When the packet size is smaller than the frame size and end-to-end delay is unimportant, interleaving is a useful technique for reducing the effects of loss. Packets are re-sequenced before transmission so that originally adjacent packets are separated by a guaranteed distance in the transmitted stream, and they are returned to their original order at the receiver. Interleaving disperses the effect of packet losses. Multiple small gaps in the reconstructed stream are less noticeable to the user than the large gap that would occur in a non-interleaved stream.
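Block interleaving of the kind described above can be sketched as follows: packets are written into a rows-by-cols matrix row by row and read out column by column, so packets that were adjacent in the original stream end up `rows` positions apart on the wire. The matrix dimensions here are illustrative.

```python
# Sketch of block interleaving: a burst of consecutive losses on the
# wire becomes several isolated losses after de-interleaving.
def interleave(packets, rows, cols):
    """Write row-by-row, read column-by-column."""
    assert len(packets) == rows * cols
    return [packets[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(packets, rows, cols):
    """Restore the original packet order at the receiver."""
    assert len(packets) == rows * cols
    out = [None] * (rows * cols)
    i = 0
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = packets[i]
            i += 1
    return out
```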
Error concealment techniques may be used by the receiver and do not require assistance from the sender. These techniques are useful when sender-based recovery schemes fail to correct all loss, or when the sender of a stream is unable to participate in the recovery. Error concealment schemes rely on producing a replacement for a lost packet that is similar to the original. Insertion-based schemes are the simplest to implement and repair losses by inserting a fill-in packet. For audio data this fill-in is usually very simple: silence, white noise, or a repetition of the previous packet. Silence and noise insertion perform poorly; however, repetition, or repetition with fading, is a good compromise compared to the more complex regenerative concealment methods.
When the system is non-interactive and the transmission is multicast, latency is less important than quality. Bandwidth efficiency is a concern, as the transmission link may be wireless. Interleaving, coupled with error concealment by repeating the previous packet when a packet is lost, is seen to be an effective way of reducing the effect of packet loss.
If interleaving and error concealment do not provide acceptable QoS an ARQ or ARQ/FEC scheme can be substituted.
We now describe the hardware design of the main board. The main board's purpose is to serve as the main platform for both the transmitter 11 and receivers 12. This dual purpose is accomplished by providing an expansion connector, which is used to connect an HDMI receiver for the transmitter and an HDMI transmitter for the receiver.
The main board also provides for the addition of a Zigbee wireless connection, in order to allow easy control of the appliance. An example main board is shown in FIG. 37, and in this example it is based on the Intel XScale IXP455 processor running at 266 MHz. The following features of the IXP455 are implemented in order to complete the board:
    32 Mbytes of flash;
    64 Mbytes of DDR memory running at 266 MHz (133 MHz clock), expandable to 128 Mbytes;
    1 Mini-PCI slot (Type 3 connector, 3.3 V compatible only) running at 33 MHz, with an option for a second PCI slot for a wireless radio;
    1 10/100 Ethernet port;
    1 front panel serial connection;
    1 Linux debug port;
    1 USB 2.0 compliant host port;
    1 USB 1.1 compliant device port;
    I2C bus; and
    SPI port.
In addition to the features directly tied to the IXP455 processor peripherals, the following functions are implemented in order to complete the system:
    ADV202 JPEG2000 codecs for real-time video compression of up to 1080i;
    ADV202-to-PCI interface implemented inside an FPGA to allow reuse of the video section in multiple platforms;
    I2S, SPDIF and audio AD/DA interfaces via the FPGA;
    Serial port for serial pass-through;
    Serial port for the Zigbee expansion board;
    Serial port to interface to the IR remote controller;
    DMA engine to move compressed video and audio from the codecs into and out of system memory; and
    Second I2C bus implemented in the FPGA to allow buffered access to video chip configuration and audio.
The details of the memory map for the IXP455 processor can be found in two documents, Intel IXP45X and Intel IXP46X Product Line of Network Processors Developer's Manual (referenced above) and Intel IXP45X and Intel IXP46X Product Line of Network Processors Data Sheet (also referenced above), and will not be repeated herein. Some of the address spaces contain individual control registers.
The memory controller accounts for the fact that most devices on the bus are 32 bits wide; for example, one word contains addresses zero through three. The processor is big-endian, so the most significant byte of a thirty-two-bit word is stored in memory first, followed by the second. For example, if the value 0xFFEEDDCC were read from the PCI bus and stored in DDR starting at location 0x1000, then FF would be stored at 0x1000, followed by EE at 0x1001, DD at 0x1002 and CC at 0x1003.
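The byte ordering described above can be checked directly: in a big-endian layout, the most significant byte of 0xFFEEDDCC occupies the lowest address.

```python
# Demonstration of the big-endian byte ordering described above.
import struct

word = 0xFFEEDDCC
mem = struct.pack(">I", word)   # ">" selects big-endian, as on this processor
# mem[0] corresponds to address 0x1000, mem[1] to 0x1001, and so on,
# so 0xFF lands at the lowest address and 0xCC at the highest.
```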
In terms of the ADV202 codecs, the FPGA memory map will be the same as that of the PCI card.
The processor supports DDR1 at 266 MHz (133 MHz clock). The device is configurable for 128 Mbit, 256 Mbit, 512 Mbit and 1 Gbit DDR memories as long as they are partitioned on 32 Mbyte boundaries.
FIG. 38 details how the DDR memory fits into the memory map of the processor as designed on the main board.
The Expansion Bus memory is mapped for flash, expansion card i/o, and FPGA parallel programming. FIG. 39 shows the expansion bus flash memory configuration.
Expansion Card I/O memory usage is shown in FIG. 40. Twelve data bits (0 . . . 11), CS_N, RD_N, WR_N and ALE go to the expansion bus connector, giving 4K bytes of memory-mapped space for I/O expansion. The particular chip select used must be configured in multiplexed address and data mode. Twelve address bits are latched on the falling edge of ALE, after which data lines 0 through 7 are available for eight-bit reads and writes. This address space is reserved for future projects, such as those where switching logic is required.
The expansion bus of the IXP455 is a general purpose bus broken into eight chip selects that each cover a 32 Mbyte chunk of memory. Each chip select is programmable in terms of its timing, data width, multiplexed or non multiplexed address and data. The following tables show how to set up the expansion bus for the Expansion Connector and to program the FPGA.
Chip select 0 flash is set up by the Boot Configuration register and Redboot. The Chip select for the Expansion connector and the FPGA writes should be set up as shown in FIGS. 42 and 43. FIG. 42 shows the set up for the Expansion bus from the IXP455 to the Expansion Connector and FIG. 43 shows the set up for the Expansion bus from the IXP455 to the FPGA programming.
Each chip select is individually configurable, so that different memory spaces on the expansion bus may have different data widths and timing. The timing and control register for chip select 0 is shown below; the timing and control registers for the other seven chip selects are essentially identical.
The IXP455 is set up as the arbiter on the PCI Bus. There are three slots on the PCI bus as detailed in FIG. 44. The PCI bus uses a standard 32 bit, 33 Mhz, Mini PCI interface and signals.
The processor implements one Ethernet port, which uses NPE B. NPE C must also be enabled in order for the encryption engine to be enabled. The Ethernet PHY is connected via an MII interface to the IXP455. The Ethernet port is strapped to address 0 and supports 10/100 full or half duplex with auto-negotiation. The I/O lines on NPE C must be pulled up in order for the MII controller to operate properly. NPE A must be soft-disabled by writing a one to bit 11 and bit 19 of the EXP_UNIT_FUSE_RESET register.
The board supports four serial ports. Two serial ports are available via the processor and two are available across the PCI bus via the PCI interface as detailed in FIG. 45.
The USB host controller supports the EHCI register Interface, Host Function, Low Speed Interface, and Full Speed Interface. The signaling levels are compliant with the 2.0 specification.
The USB controller on the IXP455 supports USB 1.1 low and full speed; however, the board pull-ups are enabled for full-speed mode. Signaling levels are also compliant with the USB 1.1 specification.
An 8K (64K bit) Serial EEPROM is connected to the IXP455 I2C bus. It is hard wired to address seven. The device supports byte and 32 bit page writes.
A rechargeable battery is provided to maintain at least 100 hours of real time clock after main board power down. During power down, the Real time clock draws 10 uA max on the VBAT pin for the battery.
The main board uses a programmable clock synthesizer to produce the clocks required for the processor, PCI, DDR, expansion bus, ADV202s, FPGA, and Ethernet. The programmable clock synthesizer has eight different registers that allow a different set of clock frequencies to be produced depending on the state of three configuration inputs to the chip. These registers are set in the design by a set of external resistors. The clock synthesizer also produces a spread-spectrum clock with a −2%, 34 kHz modulation on the processor and PCI clocks in order to reduce emissions. Using the configuration registers of the clock synthesizer, the spread spectrum may be turned on or off. The input to the synthesizer is a 25 MHz clock produced by an external oscillator. FIG. 46 shows the configuration of the clock synthesizer's registers.
The main board clock register setup is shown in FIG. 47.
FIG. 48 defines the processor GPIO pins on the IXP455 and their functions.
The IXP455 processor has many options that are only available at boot time. The processor reads the values on the expansion bus address lines during boot in order to determine how these options are set. The address lines are internally pulled up with 47 kΩ resistors; if an address pin is left floating, the processor will read a value of one on that address line. Any line that needs to be a zero at boot is pulled down with a 4.7 kΩ resistor. The options are read into Configuration Register 0. FIG. 49 shows the boot configuration as defined on the main board. FIG. 50 shows the user-defined values for board revision. New revisions can be added if the software flow needs to change, e.g., if there is a change in memory size.
Continuing with the hardware schematic of FIG. 37, each of the ADV202 parts has a combined address and data bus. Each bus contains the following signal sets (indicated directions are relative to the FPGA):
    Data[31:0]—Bidirectional;
    Address[3:0]—Output;
    CS_n, WE_n, RD_n, DACK_n*—Outputs; and
    INTR_n, ACK_n*, DREQ_n*—Inputs.
These signals are used by the ADV202 to negotiate DMA transactions. The FPGA assumes worst-case bus timing and pulls/pushes data from the ADV202s as fast as possible.
Another signal, SCOMM5, is a single FPGA output routed to the SCOMM pins of both ADV202s.
Expansion Support is via the expansion board connections of FIG. 37. Those connections are shown in detail in FIGS. 51A and 51B.
This section describes the so-called Avocent Audio Visual Protocol (AAVP), which is intended for communication among audio visual products. It is used to establish and manage an IP-based network for the transmission of real-time audio-visual data over a wired or wireless medium. Rather than being a monolithic protocol, AAVP is a collection of related protocols organized into planes and layers.
FIG. 52 illustrates the model of this protocol. The transmitter shown is the source of audio-visual data generated from the program source. The receiver is the sink of such data. A transmitter and one or more receivers form a network. This protocol describes the messages exchanged between the transmitter and receivers over the network.
The functions of this protocol can be divided into a Control Plane and a Data Plane. In the Control Plane, there are four types of messages, namely REQ (Request), CFM (Confirm), IND (Indication) and RESP (Response). A receiver sends a REQ to the transmitter, which responds with a CFM. Conversely, the transmitter can send an IND to a receiver, which in some cases responds with a RESP. The purpose of the Control Plane is to establish real-time transport sessions in the Data Plane to transfer data packets across the network.
The physical medium of the network can be wired (e.g. Ethernet) or wireless (e.g. 802.11a).
This section explains the functions of Control Plane and Data Plane. The Control Plane is further divided into Link Control and Session Control. Each plane is implemented as multiple layers. The concept of planes and layers is illustrated in FIG. 53.
1. Control Plane
A. Link Control                This establishes the communication link between a transmitter and a receiver. It enables the transmitter and receiver to communicate with each other. The protocol used for this is called the Avocent Link Initiation Protocol (ALIP) and is described in more detail below. A receiver probes for any transmitters by sending a probe request message. A transmitter responds with a probe confirm message. The receiver then joins the network by sending a join request message. The transmitter can accept or deny the request based on MAC address filtering. The transmitter may assign an IP address to the receiver if the receiver does not have a manual IP address. The above process applies to both wired and wireless media. For a wired medium, this is straightforward. For a wireless medium, it may involve scanning different radio channels for a transmitter.
B. Session Control                This establishes an audio-visual session between a transmitter and a receiver. It enables the receiver to receive and play back audio-visual data coming from the transmitter. The protocol used for this is called Avocent Session Initiation Protocol (ASIP) and is described in greater detail below. The transmitter informs the receiver of the UDP ports used by the RTP streams, the multicast address, as well as the audio/video parameters.        
2. Data Plane                This transfers RTP and RTCP packets to convey real-time audio-visual data. The RTP protocol is extended to support reliable multicast which may be necessary for audio data as audio packet loss is more perceptible.        
The Link Protocol: ALIP
ALIP is used to establish and maintain the communication link between a transmitter and a receiver in a network. It also passes control and information messages between transmitter and receivers. If there are multiple receivers, there is one ALIP link between the transmitter and each receiver. The concept of ALIP links is illustrated in FIG. 54. Some ALIP messages are carried by UDP packets since they may be broadcast before an IP address is assigned.
The ALIP message format is as follows. Each ALIP message is made up of a header and a payload. The payload follows immediately after the header. All multi-byte fields are transmitted in network byte order (big-endian). The message header has a fixed length of 12 bytes and consists of the fields shown in FIG. 55.
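Packing such a header can be sketched as follows. The real field layout is given in FIG. 55; the field names and widths used here (type, payload length, sequence number, reserved) are purely hypothetical placeholders chosen only to total the stated 12 bytes, and all fields are packed in network byte order as the text requires.

```python
# Hypothetical sketch of a 12-byte ALIP-style message header in network
# byte order (big-endian). Field layout is assumed, not taken from FIG. 55.
import struct

HEADER_LEN = 12
HEADER_FMT = ">HHII"   # big-endian: 2 + 2 + 4 + 4 = 12 bytes

def pack_header(msg_type, payload_len, seq, reserved=0):
    """Build the fixed-length header; the payload follows immediately."""
    return struct.pack(HEADER_FMT, msg_type, payload_len, seq, reserved)

def unpack_header(data):
    """Return (msg_type, payload_len, seq, reserved) from a message."""
    return struct.unpack(HEADER_FMT, data[:HEADER_LEN])
```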
The message payload contains fields specific to the message type. They are listed in the next section. Type codes are shown in the parentheses following the type names. The status codes are shown in FIG. 56 and are used in various ALIP messages to indicate either a reason for a request or the failure reason in a response.
1. Message Type: ALIP_PROBE_REQ (0x0001)
A receiver broadcasts this message to probe for a transmitter. On a wireless network, it uses this message to discover a transmitter on a particular radio channel. If no response is received within the timeout period ALIP_PROBE_REQ_TO, it retries on the same radio channel for ALIP_PROBE_RETRY_CNT time(s). When the retries fail, it scans the next radio channel. On a wired network, it always retries on the same physical medium. This message is broadcast because the receiver does not know the IP address of a transmitter yet.
ALIP_PROBE_REQ_TO=200 ms (or other suitable timing)
ALIP_PROBE_RETRY_CNT=2
Destination IP address=255.255.255.255
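The probe/retry/scan behavior described above can be sketched as a channel-scanning loop. Here `send_probe` is a hypothetical stand-in for broadcasting ALIP_PROBE_REQ and waiting up to ALIP_PROBE_REQ_TO for an ALIP_PROBE_CFM; it returns True when a confirm arrives.

```python
# Sketch of the wireless receiver's probe logic: on each radio channel,
# probe once plus ALIP_PROBE_RETRY_CNT retries, then move to the next
# channel when all attempts time out.
ALIP_PROBE_RETRY_CNT = 2

def scan_channels(channels, send_probe):
    """Return the first channel on which a transmitter confirmed,
    or None when every channel's attempts are exhausted."""
    for ch in channels:
        for _attempt in range(1 + ALIP_PROBE_RETRY_CNT):
            if send_probe(ch):   # True when ALIP_PROBE_CFM is received
                return ch
    return None
```

On a wired network the same loop degenerates to retrying on the single physical medium.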
2. Message Type: ALIP_PROBE_CFM (0x0002)
A transmitter broadcasts this message in response to ALIP_PROBE_REQ. It allows a receiver to learn about the presence and properties of the transmitter. The parameters of this message are shown in FIG. 57. It is broadcast because the transmitter does not know the IP address of the receiver.
Destination IP address=255.255.255.255
3. Message Type: ALIP_JOIN_REQ (0x0003)
The configuration of this message type is shown in FIG. 58. A receiver broadcasts this message to request to join the network. This message is broadcast because the receiver may not have an IP address yet. If no response is received within the timeout period ALIP_JOIN_REQ_TO, it retries for ALIP_JOIN_RETRY_CNT time(s). When the retries fail, it regards the request as failed.
ALIP_JOIN_REQ_TO=200 ms (or other suitable timing)
ALIP_JOIN_RETRY_CNT=2
Destination IP address=255.255.255.255
4. Message Type: ALIP_JOIN_CFM (0x0004)
The configuration of this message type is shown in FIG. 59. A transmitter broadcasts this message in response to ALIP_JOIN_REQ. This message is broadcast because the receiver may not have an IP address yet.
Destination IP address=255.255.255.255
5. Message Type: ALIP_POLL_REQ (0x0005)
The configuration of this message type is shown in FIG. 60. A receiver sends this message periodically (once every ALIP_POLL_PERIOD) to poll the transmitter in the network it has joined. If it does not receive a response within the timeout period ALIP_POLL_REQ_TO, it retries for ALIP_POLL_RETRY_CNT time(s). If a response is still not received after retries, the receiver detaches itself from the network.
On the other hand, a transmitter uses this message to check whether a receiver that previously joined the network is still active. If it does not receive this message from a receiver within the period 2*ALIP_POLL_PERIOD, it detaches that receiver from the network.
ALIP_POLL_PERIOD=2 s (or other suitable timing)
ALIP_POLL_REQ_TO=200 ms (or other suitable timing)
ALIP_POLL_RETRY_CNT=2 (or other suitable count)
Destination IP address=Transmitter IP address
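The transmitter-side liveness rule (detach a receiver not heard from within 2*ALIP_POLL_PERIOD) amounts to keeping a last-poll timestamp per receiver and periodically reaping stale entries. A minimal sketch, with the table keyed by receiver IP address as an assumption:

```python
ALIP_POLL_PERIOD = 2.0   # seconds (or other suitable timing)

class ReceiverTable:
    """Transmitter-side tracking of joined receivers (a sketch)."""

    def __init__(self):
        self.last_poll = {}              # receiver IP -> time of last ALIP_POLL_REQ

    def on_poll_req(self, rx_ip, now):
        """Record a poll; the ALIP_POLL_CFM reply is sent elsewhere."""
        self.last_poll[rx_ip] = now

    def reap(self, now):
        """Detach receivers not heard from within 2 * ALIP_POLL_PERIOD."""
        dead = [ip for ip, t in self.last_poll.items()
                if now - t > 2 * ALIP_POLL_PERIOD]
        for ip in dead:
            del self.last_poll[ip]
        return dead
```

The receiver side is symmetric: it sends ALIP_POLL_REQ every ALIP_POLL_PERIOD and detaches itself after ALIP_POLL_RETRY_CNT unanswered retries.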
6. Message Type: ALIP_POLL_CFM (0x0006)
The configuration of this message type is shown in FIG. 61. A transmitter sends this message as a response to ALIP_POLL_REQ from a receiver.
Destination IP address=Receiver IP address
Using the above message types, a normal link establishment is shown in FIG. 62.
A scenario where the transmitter provides no response to a probe request is shown in FIG. 63.
A scenario where the transmitter provides no response to a poll request is shown in FIG. 64.
A scenario where the receiver provides no poll request is shown in FIG. 65.
A scenario where a join request fails because a receiver MAC address is not approved is shown in FIG. 66.
The Avocent Session Initiation Protocol (ASIP) is now described. ASIP is used to establish an audio-visual session between a transmitter and a receiver. It enables the receiver to receive and play back audio-visual data coming from the transmitter. If there are multiple receivers, there is one ASIP session between the transmitter and each receiver. The concept of ASIP sessions is illustrated in FIG. 67. ASIP messages are carried by TCP streams, which provide reliable transport. TCP can be used because ASIP messages are unicast and the IP addresses of the transmitter and receiver in a session are known to each other (thanks to ALIP).
The ASIP message format is as follows. Each ASIP message is made up of a header and a payload. The payload follows immediately after the header. All multi-byte fields are transmitted in network byte order (big-endian). The message header has a fixed length of 12 bytes and consists of the fields shown in FIG. 68.
The message payload contains fields specific to the message type. They are listed in the next section. Type codes are shown in the parentheses following the type names. The status codes are shown in FIG. 69 and are used in various ASIP messages to indicate either a reason for a request or the failure reason in a response.
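Because ASIP rides on a TCP stream, which has no message boundaries, a receiver must frame messages itself: read the fixed 12-byte header, then read exactly the number of payload bytes the header announces. The field layout below is an assumption (the actual fields are in FIG. 68); only the presence of a payload-length field is required for this framing to work.

```python
import struct

# Hypothetical 12-byte layout; the real ASIP header fields are in FIG. 68.
# Assumed: magic (2 B) | version (2 B) | msg_type (2 B) | payload_len (2 B) | sequence (4 B)
ASIP_HEADER = struct.Struct("!HHHHI")

def read_asip_message(recv):
    """Read exactly one ASIP message from a TCP byte stream.

    recv(n) must return up to n bytes, like socket.recv. TCP may deliver a
    message in arbitrary fragments, so we loop until each part is complete.
    """
    def read_exact(n):
        buf = b""
        while len(buf) < n:
            chunk = recv(n - len(buf))
            if not chunk:
                raise ConnectionError("peer closed mid-message")
            buf += chunk
        return buf

    header = read_exact(ASIP_HEADER.size)
    magic, version, msg_type, plen, seq = ASIP_HEADER.unpack(header)
    return msg_type, read_exact(plen)
```

ALIP, by contrast, needs no such framing: its messages are carried one per UDP datagram.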
1. Message Type: ASIP_INIT_SESS_REQ (0x0001)
The configuration of this message type is shown in FIG. 70. A receiver sends this message to initiate a session to play the default program source. If it does not receive the response within the timeout period ASIP_INIT_SESS_REQ_TO, it regards the request as failed. After request failure, the retry strategy is application specific.
ASIP_INIT_SESS_REQ_TO=1 s
Destination IP address=Transmitter IP address
The format of the 128-byte “EDID” field is defined in Section 3.1 (page 9 of 32) of “VESA Enhanced Extended Display Identification Data Standard Release A, Revision 1 Feb. 9, 2000” published by VESA (Video Electronics Standards Association).
The format of the 128-byte “EDID Extension” is defined in Section 7.5 (page 79 of 117) of “A DTV Profile for Uncompressed High Speed Digital Interfaces EIA/CEA-861-B May 2002” published by CEA (Consumer Electronics Association) and EIA (Electronic Industries Alliance).
2. Message Type: ASIP_INIT_SESS_CFM (0x0002)
The configuration of this message type is shown in FIG. 71. A transmitter sends this message to a receiver in response to ASIP_INIT_SESS_REQ. It carries time synchronization data, audio/video parameters and RTP settings for the receiver to configure itself for play-back. If not already started, play-back starts immediately after this message has been sent.
The format of the 15-byte AVI is defined in Section 6.1.3 (page 60 of 117) of “A DTV Profile for Uncompressed High Speed Digital Interfaces EIA/CEA-861-B May 2002” published by CEA and EIA. For reference, it is also listed in FIG. 72.
The format of the 12-byte AAI is defined in Section 6.3 (page 65 of 117) of “A DTV Profile for Uncompressed High Speed Digital Interfaces EIA/CEA-861-B May 2002” published by CEA and EIA. For reference, it is also listed in FIG. 73.
3. Message Type: ASIP_TEARDOWN_REQ (0x0003)
The configuration of this message type is shown in FIG. 74. A receiver sends this message to tear down an established ASIP session. If no response is received from the transmitter within the period ASIP_TEARDOWN_REQ_TO, it regards the request as successful.
4. Message Type: ASIP_TEARDOWN_CFM (0x0004)
The configuration of this message type is shown in FIG. 75. A transmitter sends this message in response to ASIP_TEARDOWN_REQ. After receiving a teardown request, the transmitter removes the receiver from the active list. If there are no more active receivers, it stops the encoding hardware.
5. Message Type: ASIP_TEARDOWN_IND (0x0005)
The configuration of this message type is shown in FIG. 76. A transmitter sends this message to forcibly tear down a session with a receiver.
6. Message Type: ASIP_ANNOUNCE_IND (0x0006)
A transmitter sends this message to notify a receiver that an A/V source has become available. It wakes the receiver if it has entered the sleep state so that it may initiate a session again.
7. Message Type: ASIP_AUTH_REQ (0x0007)
A transmitter sends this message to request the HDCP authentication data of a receiver.
8. Message Type: ASIP_AUTH_CFM (0x0008)
The configuration of this message type is shown in FIG. 77. A receiver sends this message in response to ASIP_AUTH_REQ. It contains HDCP authentication data of the receiver. In case the receiver is an HDMI repeater, this message contains the authentication data of all HDMI sinks attached to it.
9. Message Type: ASIP_SDP_IND (0x0009)
The configuration of this message type is shown in FIG. 78. A transmitter sends this message to notify a receiver of the Source Product Description of the source.
The 28-byte SPD contains the fields shown in FIG. 79.
10. Message Type: ASIP_ISCR1_IND (0x000A)
The configuration of this message type is shown in FIG. 80. A transmitter sends this message to notify a receiver of the International Standard Recording Code (ISRC) 1 of the current track.
The 18-byte ISRC1 contains the fields shown in FIG. 81.
11. Message Type: ASIP_ISCR2_IND (0x000B)
The configuration of this message type is shown in FIG. 82. A transmitter sends this message to notify a receiver of the International Standard Recording Code (ISRC) 2 of the current track.
The 18-byte ISRC2 contains the fields shown in FIG. 83.
12. Message Type: ASIP_ACP_IND (0x000C)
The configuration of this message type is shown in FIG. 84. A transmitter sends this message to notify a receiver of the Audio Content Protection of the source.
The 30-byte ACP contains the fields shown in FIG. 85.
13. Message Type: ASIP_AVMUTE_IND (0x000D)
Using the above message types, a normal session establishment and tear down is shown in FIG. 87.
A scenario where there was authentication failure is shown in FIG. 88.
A scenario where a new session is established due to changes in the source is shown in FIG. 89.
A scenario where there is an interruption of the program source is shown in FIG. 90.
A scenario showing control messages in a session is shown in FIG. 91.
The Real-time Transport Protocol (RTP) and Real-time Transport Control Protocol (RTCP) employed by the present system 10 are now described. An RTP session is an association among a set of participants communicating with RTP. While RTP carries real-time data across the network, RTCP monitors the quality of service and conveys information about the participants in an on-going RTP session.
In this protocol, two RTP sessions are used, with one for audio data and the other for video data. All participants in each RTP session communicate using a multicast IP address and two adjacent UDP ports. The even (2n) port is used by RTP and the odd (2n+1) port is used by RTCP. The purpose of the Control Plane (ALIP and ASIP) is to establish the audio and video RTP sessions in the Data Plane. The concept of RTP sessions is illustrated in FIG. 92.
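The even/odd port pairing above means only the even base port of each session needs to be configured; the RTCP port is always derived from it. A trivial helper makes the rule explicit (the base port values in the usage lines are hypothetical, not mandated by the protocol):

```python
def session_ports(base):
    """Return (rtp_port, rtcp_port) for one RTP session.

    The protocol pairs an even UDP port 2n for RTP with the adjacent odd
    port 2n+1 for RTCP.
    """
    if base % 2:
        raise ValueError("RTP port must be even")
    return base, base + 1

# One session for audio and one for video, on hypothetical example bases:
audio_rtp, audio_rtcp = session_ports(5004)
video_rtp, video_rtcp = session_ports(5006)
```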
The format of the RTP header customized for this protocol is shown in FIG. 93 and the fields for the header are explained in FIG. 94.
Some custom RTCP packets are employed as well.
1. Custom Packet: Synchronization
To synchronize the clocks between a transmitter and a receiver, the RTCP packet shown in FIG. 95 is used. A description of the fields is shown in FIG. 96. A custom packet is defined instead of using the pre-defined Sender Report because the custom packet is smaller and more efficient to process. The packet type (PT) of the custom RTCP packets is 205 and the subtype for Sync is 0.
2. Custom Packet: Retransmission
In order to support reliable multicast in RTP, the custom RTCP packets shown in FIG. 97 are defined. An explanation of the fields for Single NACKs is shown in FIG. 98 and for Sequential NACKs in FIG. 99. The packet type (PT) of the custom RTCP packets is 205, and the subtypes for Single NACKs and Sequential NACKs are 1 and 2, respectively. Collectively, they are called NACKs.
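A receiver generating NACKs must classify gaps in the RTP sequence-number space: an isolated missing packet maps to a Single NACK, while a run of consecutive missing packets maps to one Sequential NACK covering the run. The sketch below shows only this gap detection (ignoring 16-bit sequence-number wraparound); the actual on-the-wire NACK layouts are those of FIGS. 97-99.

```python
def build_nacks(received, expected_next):
    """Classify missing RTP sequence numbers as Single or Sequential NACKs.

    received: set of sequence numbers seen so far.
    expected_next: one past the highest sequence number expected.
    Returns ("single", seq) for isolated losses and
    ("sequential", first, last) for runs of consecutive losses.
    """
    nacks = []
    run_start = None
    for seq in range(expected_next):
        missing = seq not in received
        if missing and run_start is None:
            run_start = seq                       # a loss run begins
        elif not missing and run_start is not None:
            run_end = seq - 1                     # the loss run just ended
            nacks.append(("single", run_start) if run_start == run_end
                         else ("sequential", run_start, run_end))
            run_start = None
    if run_start is not None:                     # run extends to the end
        end = expected_next - 1
        nacks.append(("single", run_start) if run_start == end
                     else ("sequential", run_start, end))
    return nacks
```

Grouping consecutive losses into one Sequential NACK keeps the RTCP feedback compact when a burst of packets is dropped.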
While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.