1. Field of the Invention
The present invention relates to the field of content syndication and more particularly to processing multi-lingual syndication feeds.
2. Description of the Related Art
Content syndication refers to the selective broadcasting of content fragments over a data communications network to a multiplicity of subscribers. Arising in part due to the overwhelming volume of content available for access throughout the global Internet, content syndication provides the ability for subscribers to identify content which is desirable. Once a subscriber has identified desirable content, an aggregation mechanism can periodically retrieve content fragments that are consistent with the identified content from specified content sources and can combine the retrieved content into a cohesive, singular document for review by the subscriber. The Really Simple Syndication (RSS) format and the Atom format represent two exemplary content syndication implementations.
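By way of illustration only, the aggregation step described above can be sketched in Python. The `Fragment` structure, the keyword-based matching of subscriber interests, and the function names are all assumptions made for this sketch and are not drawn from any particular syndication implementation.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Fragment:
    """One syndicated content fragment (hypothetical structure)."""
    source: str
    title: str
    published: datetime


def aggregate(fragments_by_source, keywords):
    """Select fragments whose title matches any subscriber keyword, newest first.

    Keyword matching on the title is an assumption; a real aggregator may
    identify desirable content in other ways.
    """
    wanted = [kw.lower() for kw in keywords]
    selected = [
        frag
        for fragments in fragments_by_source.values()
        for frag in fragments
        if any(kw in frag.title.lower() for kw in wanted)
    ]
    return sorted(selected, key=lambda f: f.published, reverse=True)


def render(fragments):
    """Combine the selected fragments into a single plain-text document."""
    return "\n".join(
        f"{f.published:%Y-%m-%d} [{f.source}] {f.title}" for f in fragments
    )
```

In such a sketch, a periodic job would call `aggregate` on the fragments most recently retrieved from each specified source and present the output of `render` to the subscriber.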
Notably, the RSS format is a content syndication format that has become increasingly popular. RSS is an XML-based format that allows the syndication of content ranging from lists of hyperlinks to blog postings. To enable the syndication of content, a Web site can publish an RSS feed, or channel. Once a feed becomes available, content browsers can regularly fetch the RSS feed to receive the most recently published content in the channel.
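As a minimal sketch of how a content browser might read a fetched feed, the following Python example parses an RSS 2.0 document with the standard `xml.etree.ElementTree` module. The sample feed, its URLs, and the `parse_rss` helper are hypothetical. The optional channel-level `<language>` element is part of RSS 2.0, but a given feed may omit it, in which case the sketch reports `None`.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content, standing in for a document fetched from a Web site.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Channel</title>
    <link>http://example.com/</link>
    <description>Illustrative feed</description>
    <language>fr</language>
    <item>
      <title>Premier article</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Deuxième article</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text):
    """Return the channel title, declared language, and items of an RSS 2.0 feed."""
    channel = ET.fromstring(xml_text).find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document: missing <channel>")
    items = [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }
        for item in channel.findall("item")
    ]
    return {
        "title": channel.findtext("title", default=""),
        "language": channel.findtext("language"),  # None when not declared
        "items": items,
    }
```

A content browser following this approach would fetch the feed at regular intervals and pass the response body to `parse_rss` to obtain the most recently published items in the channel.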
Many commercial news services now provide news in the RSS format. Additionally, most of the major Web sites for producing personal and professional web logs, known in the art as “blogs”, also make content available in the RSS format. The reason for the explosive growth of RSS is that an application-neutral format for content enables Web sites, Web services and other aggregator applications to be written to merge content from a diverse set of sources into a custom experience containing just what the user prefers. Since individuals can now select the source of their news updates at a fine level of granularity, many content providers now produce content both in a native format and in an RSS format.
Most RSS content is published in the English language. Yet, English is not the native language of the majority of Internet users. It is to be noted, however, that a great deal of interesting content is published through RSS feeds, not in English, but in other languages. For example, because of the simplicity of publishing an RSS news stream, some of the most interesting news-related RSS feeds of the day originate from political hot spots in the world and contain personal views of events in languages such as Arabic, Hebrew, French, Spanish, Italian, German, Chinese, Japanese, Korean and Russian, to name only a few.