Annotating social media with geographic information is essential for modern information retrieval. The ability to geospatially index a large volume of social media (e.g., Twitter™) data is valuable for several emerging research directions. Indeed, social media analytics have proven useful for understanding regional flu trends (see the List of Incorporated Cited Literature References, Literature Reference Nos. 2 and 3), linguistic patterns (as described in Literature Reference No. 4), elections (as described in Literature Reference No. 5), social unrest (as described in Literature Reference No. 6), and disaster response (as described in Literature Reference No. 7). These approaches, however, depend on the physical locations of Twitter™ users, which are only sparsely available in public data.
Complex natural language processing techniques may increase the fraction of geocodable users, but they are language dependent and require computationally expensive training steps that have not been tested at scale (see Literature Reference No. 9). Further, recent work in the social sciences has established that online social ties are often formed over short geographic distances (as described in Literature Reference Nos. 10, 11, and 12). Consequently, it is possible to infer the location of a social media (e.g., Twitter™) user by examining the locations of that user's online contacts.
Previous work on geocoding online social network participants has focused on coverage and global error estimates, which do not indicate how reliable the location estimate for any individual user is. Thus, a continuing need exists for a per-user error estimate, which would make it possible to discard location estimates when the expected error is large.
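The idea of inferring a user's location from the locations of that user's contacts, and of discarding estimates whose expected error is large, can be illustrated with a minimal Python sketch. This is not the method of this disclosure. The choice of the contact "medoid" (the contact location with the smallest total great-circle distance to the others) as a simple stand-in for a geometric median, the use of the median contact distance as a per-user error proxy, the 100 km threshold, and the function names haversine_km, estimate_location, and geocode_user are all illustrative assumptions.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to guard against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def estimate_location(contact_locations):
    """Hypothetical contact-based estimate.

    Returns (estimate, spread_km), or None if no contact is geocoded.
    estimate: the contact location minimizing the summed distance to all
    contact locations (a discrete medoid, assumed here for simplicity).
    spread_km: median distance from the estimate to the contacts, used as
    an assumed per-user error proxy.
    """
    if not contact_locations:
        return None
    best = min(
        contact_locations,
        key=lambda c: sum(haversine_km(c, o) for o in contact_locations),
    )
    spread_km = median(haversine_km(best, o) for o in contact_locations)
    return best, spread_km


def geocode_user(contact_locations, max_spread_km=100.0):
    """Return an estimated (lat, lon), or None when the error proxy is too large."""
    result = estimate_location(contact_locations)
    if result is None:
        return None
    estimate, spread_km = result
    return estimate if spread_km <= max_spread_km else None


if __name__ == "__main__":
    # Three contacts near Los Angeles and one in New York: the outlier
    # does not pull the estimate away from the Los Angeles cluster.
    clustered = [(34.05, -118.24), (34.10, -118.30), (33.95, -118.40), (40.71, -74.00)]
    print(geocode_user(clustered))
    # Widely dispersed contacts: the estimate is discarded.
    dispersed = [(34.05, -118.24), (40.71, -74.00), (51.50, -0.13)]
    print(geocode_user(dispersed))
```

Using a median-based estimate and spread keeps a few distant contacts from dominating the result, which mirrors the observation above that most online ties are local while some are not.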