The arrangement of a conventional push-to-talk (hereinafter abbreviated as “PTT”) communication system will be described with reference to FIG. 1. The conventional PTT communication system comprises a server apparatus (PTT communication infrastructure) 1, which controls communication between terminals, and speech PTT terminals 2a to 2c, which transmit speech. The server apparatus 1 resides in a mobile communication network established by a mobile communication carrier. The mobile communication carrier provides an access network between the server apparatus 1 and the speech PTT terminals 2a to 2c held by users, so that the server apparatus 1 can exchange general speech data with each speech PTT terminal.
The server apparatus 1 comprises two functional units, i.e., a call control server 1a and a real-time communication server 1b. 
The call control server 1a manages a group (hereinafter referred to as a PTT group) comprising the speech PTT terminals 2a to 2c which perform PTT communication, and handles call operation for the PTT group, talk right arbitration, and the like.
The real-time communication server 1b delivers speech data transmitted from one speech PTT terminal (e.g., the terminal 2a) to other speech PTT terminals (e.g., the terminals 2b and 2c) belonging to the PTT group under the control of the call control server 1a. This allows the users to communicate with other users belonging to the PTT group by using the speech PTT terminals 2a to 2c. 
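The division of roles between the two functional units can be illustrated with a minimal sketch. It is not taken from the conventional system itself: the class names `CallControlServer` and `RealTimeServer`, their methods, and the first-come talk right policy are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CallControlServer:
    """Hypothetical stand-in for the call control server 1a."""
    members: List[str]               # terminals belonging to the PTT group
    talker: Optional[str] = None     # terminal currently holding the talk right

    def request_talk_right(self, terminal: str) -> bool:
        # Assumed policy: grant the talk right to a group member only
        # when it is free or already held by the requester.
        if terminal not in self.members:
            return False
        if self.talker is None or self.talker == terminal:
            self.talker = terminal
            return True
        return False

    def release_talk_right(self, terminal: str) -> None:
        # Only the current holder can release the talk right.
        if self.talker == terminal:
            self.talker = None


@dataclass
class RealTimeServer:
    """Hypothetical stand-in for the real-time communication server 1b."""
    control: CallControlServer
    inbox: Dict[str, List[bytes]] = field(default_factory=dict)

    def deliver(self, sender: str, speech: bytes) -> List[str]:
        # Speech is forwarded only while the sender holds the talk right.
        if self.control.talker != sender:
            return []
        # Fan the speech data out to every other member of the PTT group.
        receivers = [t for t in self.control.members if t != sender]
        for terminal in receivers:
            self.inbox.setdefault(terminal, []).append(speech)
        return receivers


control = CallControlServer(members=["2a", "2b", "2c"])
realtime = RealTimeServer(control)
control.request_talk_right("2a")       # terminal 2a obtains the talk right
realtime.deliver("2a", b"speech")      # forwarded to terminals 2b and 2c
```

In this sketch the real-time server holds no group state of its own and consults the call control server on every delivery, which mirrors the statement that delivery takes place under the control of the call control server 1a.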
Reference 1 (Japanese Patent Laid-Open No. 2001-350473) has proposed a technique of converting digital image information, which comprises a plurality of pixel groups each having a predetermined size, into speech information. In this technique, a predetermined algorithm is sequentially applied, along a predetermined time axis direction, to the predetermined quantization information which each pixel of the image information has, thereby converting the quantization information of each pixel into specific speech information.
Reference 2 (Japanese Patent Laid-Open No. 2002-44249) has proposed a technique in which a communication data conversion server has a personal information database, which stores information concerning the image, character, and speech formats of cellular phones, and a format conversion system, which converts the image, character, and speech formats, so that cellular phones can transmit and receive information even if they have different image, character, and speech formats.
Reference 3 (Japanese Patent Laid-Open No. 2002-135854) has proposed a technique of shortening the effective access time of speech dispatch communication by removing speech from a selected data packet before a dispatch server delivers data to active mobile units in a dispatch group.