Most electronic mail (e-mail) providers offer a filtering service that removes junk e-mail, known as spam, from a user's mailbox or flags it as such. Some filtering processes rely on rules that, when applied to an e-mail message, identify one or more characteristics of spam. For example, rules may look for names of pharmaceutical products, sexual content, or gibberish in the body of an e-mail message, and may remove messages that contain such content. As many e-mail providers increasingly serve a multi-national set of customers, more languages may appear in the e-mail traffic that the providers manage. Spam-filtering rules are generally language-specific, and adding a separate set of rules for each additional language typically does not scale well. Further, some languages use different character sets, including non-Roman alphabets. Some conventional rules use the reputation of a message's originating internet protocol (IP) address or uniform resource locator (URL) to identify spam. However, such reputation information may be sparse, particularly with respect to foreign countries. It is with respect to these and other considerations that the present improvements have been needed.