A video conference system includes an endpoint device that exchanges audio-visual information with participants in a room, and with their personal user devices such as smartphones and laptops, during a conference session, and that transmits and receives such audio-visual information over a network to and from remote endpoint devices. Identifying the participants and user devices that are in physical proximity to the endpoint device helps set up the conference session. “Pairing” is a means by which the endpoint device and each user device can ensure that they are in physical proximity to each other. Once the endpoint device and a given user device are paired, they may share confidential information during the conference session over a primary, secure (e.g., encrypted) channel between the devices. In one conventional pairing technique, the endpoint device generates an ultrasound signal and transmits it to user devices over a secondary channel as a proximity probe. A disadvantage of this technique is that many user devices are not ultrasound capable, i.e., they are not configured to receive and process the ultrasound signal.