Data streamed over electronic networks have become an increasingly vital source of information. Users around the world routinely consume data streams to receive updates on subjects of importance to them, ranging from the personal lives of friends or celebrities to events of national or international significance. Because data streams can draw upon input from individual users virtually anywhere, streaming services are frequently the very first to convey information concerning events as they occur, often scooping professional news services. However, the decentralized nature of data streams that gives them their power also makes them prone to abuse. Unscrupulous people can choke a data stream with useless information, such as automatically generated spam produced for commercial or political purposes or for the sake of sheer mischief. Traditional spam filters that search incoming messages for words associated with spam are too easily defeated by superficial variations in message content, and they suffer from the combinatorial explosion inherent in maintaining vast databases of unwanted terms. Even worse, such methods can misidentify useful messages as spam based on their content, creating a kind of censorship and undermining the very usefulness of the streaming services.
There is thus a need for an efficient, accurate, and difficult-to-evade technology for removing spam from data streaming services.