1. Field of the Invention
The present invention relates to voice data transmission systems and methods in which voice data is converted into a packet and then transmitted in packet form, and more particularly, to a voice data transmission system and method which can minimize a transmission delay time and thereby remove unnatural reception voice such as a head-part truncated voice.
2. Description of the Related Art
FIG. 1 shows, in network form, an example of systems of the type referred to wherein voice data is converted into a packet or packets and then transmitted on a packet basis. In the drawing, the illustrated network includes multiplex lines 1, packet exchanges 2A to 2C, packet terminals 3A to 3C, exchanges 4A to 4C, and telephone sets 5A to 5C.
FIG. 2 shows in block diagram form the interior arrangement of one of the packet exchanges 2A to 2C. This arrangement includes terminal interfaces TINF respectively connected with the associated packet terminals, a line interface LINF which forms an interface with the multiplex line 1, a controller CONT, a bus access controller ARB, an interrupt control bus BUS1, a control bus BUS2, an access control bus BUS3, and a data bus BUS4. Each of the terminal interfaces TINF, when receiving a calling packet from one of the packet terminals, sends an interrupt command to the controller CONT through the interrupt control bus BUS1. The controller CONT, when confirming the interrupt command, gets access to a memory (not shown) provided within the terminal interface TINF in question through the control bus BUS2 and confirms the calling data, such as the caller number and the window size. Thereafter, the controller CONT outputs to the access control bus BUS3 an access request to the data bus BUS4 in order to transmit a connection request packet to the party packet terminal which forms an opposing node. The controller CONT, when acquiring data bus access authority, then sends a connection request to the line interface LINF through the data bus BUS4. The line interface LINF, when receiving the connection request, prepares a connection request packet having the same format as a data packet and then transmits it to the multiplex line 1. The line interface LINF, when receiving a connection approval or disapproval packet from the opposing node, that is, from the party packet terminal, sends the received packet to the controller CONT. When the controller CONT receives the connection approval packet through the control bus BUS2, the controller prepares a connection table in a memory (not shown) provided in the line interface LINF and the related terminal interface TINF, and then sends the connection approval packet to the associated terminal interface TINF.
The terminal interface TINF, when receiving the connection approval packet, transmits it to the corresponding packet terminal and is thereafter put in its data transmission phase. In the data transmission phase, the terminal interface TINF sends a data packet to the line interface LINF through the data bus BUS4, in which case a header H such as that shown in FIG. 3 is attached to data D with use of the connection table prepared by the controller CONT. The combination of the header H and the data D is sent as the data packet. The line interface LINF, when receiving the data packet, stores it in a buffer provided therein and then transmits it to the multiplex line 1. In the line interface LINF, this data packet operation is repeated throughout the data transmission phase, as in the terminal interface TINF. In the case of a disconnection or the receipt of a connection disapproval packet, the same operation as in the connection request is carried out except that the connection table is deleted.
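As a rough illustration of the data transmission phase described above, the following Python sketch models a connection table and the attachment of a header H to data D. The layout of the header H in FIG. 3 is not given in the text, so the header fields (`dest_node`, `channel`), the class names, and the terminal keys are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    # Hypothetical header fields; the actual layout of header H (FIG. 3)
    # is not specified in the text.
    dest_node: str
    channel: int


class ConnectionTable:
    """Maps a local terminal to the header used for its data packets."""

    def __init__(self):
        self._entries = {}

    def connect(self, terminal, header):
        # Entry prepared when a connection approval packet arrives.
        self._entries[terminal] = header

    def disconnect(self, terminal):
        # Entry deleted on disconnection or connection disapproval.
        self._entries.pop(terminal, None)

    def make_packet(self, terminal, data):
        # Attach header H to data D; fails if no connection exists.
        header = self._entries.get(terminal)
        if header is None:
            raise LookupError(f"no connection for terminal {terminal!r}")
        return (header, data)
```

A terminal interface would call `make_packet` once per block during the data transmission phase, and `disconnect` when the call ends.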
FIG. 4 is a block diagram showing a prior art arrangement of a voice terminal interface which converts a voice signal into one packet or a plurality of packets. In FIG. 4, a signal data processing part is omitted for brevity of explanation. The voice terminal interface of FIG. 4 includes an analog interface 6A and an encoder 7 for encoding an input signal, for example, on a PCM coding basis or on a high-efficiency compression coding basis. The voice terminal interface also includes a memory 8 for storing one or more blocks of codes, a voice presence/silence detector 9, and a packet assembler 10 for converting codes received from the memory 8 into packets as the data part D shown in FIG. 3 and then sending the packets to the data bus BUS4. The voice terminal interface further includes a controller 11, which performs bus access control and informs the packet assembler 10 of such data as a time stamp (not shown) in the header H, and a memory pointer controller 20.
Explanation will next be made of the signal reception section of the voice terminal interface of FIG. 4. The memory 14 functions as a fluctuation absorbing delay buffer for compensating for differences in transmission delay between signals transmitted within the network. A transmission delay time to be compensated for by the memory 14 is set to be larger than a 99% delay within the network, and the memory 14 has a capacity that allows the compensation of, for example, usually N times the blocking time. Thus the storage of N blocks is allowed. A packet disassembler 12 judges whether or not a packet received from the data bus BUS4 is destined for its own address and if so, deletes the header H from the received packet and then writes it in the memory 14. A controller 13, when the memory 14 stores the N blocks therein, outputs a decoding command signal 19 to a decoder 15 to start the decoding operation of the N blocks. When the packet disassembler 12 does not receive a packet from the data bus BUS4, the controller 13, after the contents of the memory 14 have been fully decoded, controls the switch 17 so that a low level of white noise is sent from a white noise generator 16 to an analog interface 6B.
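The reception behavior described above can be sketched in Python as follows. This is a minimal model, not the circuit of FIG. 4: the class name, the `NOISE` placeholder standing in for the output of the white noise generator 16, and the rule that decoding resumes only after N blocks are stored again are assumptions made for illustration.

```python
from collections import deque

NOISE = "NOISE"  # hypothetical stand-in for the white noise generator output


class FluctuationBuffer:
    """Holds received blocks until N are stored, then releases them in order."""

    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        self._blocks = deque()
        self._decoding = False

    def write(self, block):
        # Called after the packet disassembler has removed the header.
        self._blocks.append(block)
        if len(self._blocks) >= self.n_blocks:
            self._decoding = True  # corresponds to the decoding command

    def read(self):
        # Called once per block time on the decoder side.
        if self._decoding and self._blocks:
            return self._blocks.popleft()
        # Buffer not yet filled, or fully decoded: output white noise.
        self._decoding = False
        return NOISE
```

With `n_blocks = 2`, the first written block is held back and only noise is read; once the second block arrives, both are read out in order, after which the output falls back to noise.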
Referring to FIG. 5, there is shown a timing chart for explaining the operation of the voice packet terminal of FIG. 4. In FIG. 5, (A) shows the time series of blocks corresponding to voice-presence parts in an input voice signal, and (B) is a chart showing the voice-presence detection timing of the voice presence/silence detector 9. The presence of a voice in the input voice signal (A) cannot be detected in synchronism with the first of the blocks "1" to "13" because, as shown by the voiceless consonant signal waveforms in FIG. 6, (A) to (C) and by the voiced consonant signal waveforms in FIG. 6, (D) to (E), the head part of a voice at the beginning of an utterance is so small in amplitude that it is technically difficult to judge such a very weak signal as the presence of a voice. Also, from the viewpoint of enhancing noise-resisting properties, it is not preferable to regard such a very weak signal as the presence of a voice. Accordingly, voice detection is delayed by a specific time, which is 40 ms or more, with respect to the actual voice starting time point. In order to prevent voice head part truncation, which occurs when the head part of the voice signal is missing because its beginning part was not transmitted owing to the timing lag in voice detection, a predetermined number of blocks prior to voice detection are regarded as voice presence blocks. These blocks are attached to the voice blocks after voice detection and are then transmitted, as shown in FIG. 5, waveform (C). The waveform (C) of FIG. 5 has the time scale illustrated because the multiplex line 1 has a high bit rate. Voice data converted into packets within the network is transmitted on a packet basis in the order in which the voice data was converted into packets, and buffer queue lengths differ at various points in the network.
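The attachment of blocks preceding voice detection, as described above, can be sketched in Python as follows. The function name, the per-block detection flags, and the parameter `pre_blocks` are hypothetical; the text says only that a predetermined number of blocks prior to voice detection are treated as voice presence blocks.

```python
from collections import deque


def select_blocks(blocks, detected, pre_blocks):
    """Return the blocks to transmit.

    blocks     -- encoded blocks in time order
    detected   -- per-block flags from the (delayed) voice detector
    pre_blocks -- number of blocks before detection treated as voice
    """
    history = deque(maxlen=pre_blocks)  # most recent undetected blocks
    out = []
    for block, is_voice in zip(blocks, detected):
        if is_voice:
            out.extend(history)  # attach the blocks preceding detection
            history.clear()
            out.append(block)
        else:
            history.append(block)
    return out
```

For example, if blocks 1 to 8 are input, the detector fires only on blocks 4 and 5, and `pre_blocks` is 2, then blocks 2 to 5 are transmitted, so the weak head part carried by blocks 2 and 3 is not lost.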
Voice data that changes from moment to moment causes, on the signal reception side, the fluctuation of the transmission delay shown in FIG. 5, waveform (D). When the voice signal (D) is decoded without any compensation for such fluctuation, an underrun or overrun phenomenon occurs in the voice signal, as shown by the marks * in FIG. 5, waveform (E). This is undesirable from the viewpoint of a natural listening sense. For the purpose of absorbing such fluctuation, it is common practice to store N blocks of a voice signal in the memory 14 and then decode them, as shown by the decoded voice signal in FIG. 5, waveform (F). In this figure, reference symbol t1 denotes the delay time (N × the block time) for fluctuation absorption, and t2 denotes the total delay time from a voice absence state to a voice presence state. The time t2 is expressed by the following equation.
t2 = Tα + Tβ + t1
where Tα represents the lag time in the voice detection, and Tβ represents the signal transmission delay within the network.
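To make the equation concrete, the following Python sketch computes t2 as the sum of the lag time in voice detection, the transmission delay within the network, and t1 taken as N times the block time. The function and parameter names are hypothetical, and apart from the 40 ms detection lag mentioned in the text, the numbers in the usage note are assumed values.

```python
def total_delay_ms(detect_lag_ms, network_delay_ms, n_blocks, block_time_ms):
    """Total delay t2 from a voice absence state to a voice presence state."""
    # t1: fluctuation-absorbing delay, N times the block time
    t1 = n_blocks * block_time_ms
    # t2 = (lag time in voice detection) + (network transmission delay) + t1
    return detect_lag_ms + network_delay_ms + t1
```

With a detection lag of 40 ms, an assumed network delay of 30 ms, and an assumed N of 3 blocks of 20 ms each, `total_delay_ms(40, 30, 3, 20)` gives 130 ms, of which 60 ms comes from fluctuation absorption alone.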
As has been explained in the foregoing, in the prior art voice data transmission system, a voice signal is subjected, on the signal transmission side, to detection of its voice part and then to attachment of a predetermined number of blocks to the voice detection block to be transmitted. The voice signal is further subjected, on the signal reception side, to insertion of the fluctuation absorbing delay time t1 to compensate for fluctuation at signal reception time, which causes the total delay time t2 to become large. Further, since a header H or the like is attached to each packet, the length of one packet cannot be made too short from the viewpoint of transmission efficiency. Thus, the delay time t1 for fluctuation absorption, which is greatly affected by the block time, cannot be made correspondingly small. As a result, the prior art system has a problem in that conversation becomes unnatural and an echo controller must be provided for removing any echo.
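The trade-off between packet length and transmission efficiency noted above can be illustrated with a short Python sketch. The efficiency measure (payload bytes divided by payload plus header bytes), the function name, and the 8-byte header size used in the usage note are assumptions; the text does not state the size of the header H.

```python
def packet_efficiency(block_time_ms, bit_rate_bps, header_bytes):
    """Fraction of each packet that carries voice codes rather than header."""
    # Payload produced by the encoder during one block time, in bytes.
    payload_bytes = bit_rate_bps * block_time_ms / 1000 / 8
    return payload_bytes / (payload_bytes + header_bytes)
```

For 64 kbit/s PCM with an assumed 8-byte header, a 20 ms block carries 160 bytes of codes and is about 95% efficient, whereas a 2 ms block carries only 16 bytes and is about 67% efficient. Shortening the block time to reduce t1 therefore costs transmission efficiency.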