A substantial amount of useful information can be derived from determining the direction in which a user is looking at different points in time, and this information can be used to enhance the user's interaction with a variety of computational systems. It is therefore not surprising that a vast amount of gaze tracking research using a vision-based approach (i.e., tracking the eyes by any of several means) has already been undertaken. However, a user's gaze direction gives semantic information on only one dimension of the user's interest and does not take into account contextual information, which is mostly conveyed by speech. In other words, gaze tracking combined with speech tracking would provide richer and more meaningful information in a variety of user applications.