Bayesian poisoning
{{disputed}}{{expertVerify|November 2006}}

'''Bayesian poisoning''' is a technique used by [[Spam (electronic)|spammers]] in an attempt to degrade the effectiveness of [[spam filter]]s that rely on [[bayesian filtering]]. Bayesian filtering relies on [[Bayesian probability]] to determine whether an incoming mail is spam or is not spam ("ham"). The spammer hopes that adding random or carefully selected words that are unlikely to appear in a normal spam message will cause the spam filter to treat the message as legitimate. Spammers may also hope to raise the filter's false positive rate: a user who trains the filter on a poisoned message is telling it that the words added by the spammer are a good indication of spam.

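As an illustration of how a Bayesian filter turns per-word evidence into a verdict, the sketch below combines per-token spam probabilities with the naïve Bayes rule popularized by Paul Graham, P = prod(p) / (prod(p) + prod(1 - p)). It assumes the token probabilities have already been learned and that tokens are independent; the token table and the function name <code>combined_spam_probability</code> are hypothetical, and real filters such as POPFile or SpamBayes use their own scoring schemes.

```python
from math import prod


def combined_spam_probability(token_probs):
    """Combine per-token spam probabilities P(spam | token) into one score.

    Uses the naive Bayes combination popularized by Paul Graham:
        P = prod(p) / (prod(p) + prod(1 - p))
    which assumes independent tokens and equal spam/ham priors.
    """
    probs = list(token_probs)  # allow any iterable; we traverse it twice
    spam_evidence = prod(probs)
    ham_evidence = prod(1.0 - p for p in probs)
    return spam_evidence / (spam_evidence + ham_evidence)


# Hypothetical probabilities learned from one user's mail.
TOKEN_PROBS = {"viagra": 0.99, "free": 0.90, "meeting": 0.05, "agenda": 0.10}

message = ["viagra", "free"]
score = combined_spam_probability([TOKEN_PROBS[t] for t in message])
print(f"spam probability: {score:.4f}")  # spam probability: 0.9989
```

A poisoning attack tries to push this combined score below the filter's spam threshold by adding tokens whose probabilities are close to 0.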
At the Spam Conference held at MIT in 2004, [[John Graham-Cumming]] presented two possible attacks on [[POPFile]]'s Bayesian engine [http://www.jgc.org/SpamConference011604.pps]. One was unsuccessful; the other worked but was impractical. In doing this he identified two types of poisoning attack: passive (where words are added without any feedback to the spammer) and active (where the spammer gets feedback after the spam has been received).

A spammer practicing Bayesian poisoning will send out emails containing large amounts of legitimate text (gathered from news or literary sources). If spam filters are trained on these emails, they become much more likely to mark incoming non-spam email as spam as well, because of its similarities to the sources used to generate the poisoned messages.

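The mechanism behind this kind of poisoning can be sketched with a toy filter that estimates each word's spam probability from how often it appears in trained spam and ham. In this hypothetical example (the class <code>ToyTokenCounts</code>, the token lists and the equal-prior estimate are illustrative assumptions, not any particular filter's method), the ordinary word "meeting" starts out as a pure ham indicator; after the user trains on spam padded with ordinary text, its spam probability rises to one third.

```python
from collections import Counter


class ToyTokenCounts:
    """Hypothetical per-class document frequencies for a toy Bayesian filter."""

    def __init__(self):
        self.spam = Counter()  # token -> number of spam messages containing it
        self.ham = Counter()   # token -> number of ham messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def train(self, tokens, is_spam):
        # Count each distinct token once per message.
        if is_spam:
            self.spam.update(set(tokens))
            self.n_spam += 1
        else:
            self.ham.update(set(tokens))
            self.n_ham += 1

    def spamminess(self, token):
        # P(spam | token) from per-class frequencies, assuming equal priors.
        s = self.spam[token] / max(self.n_spam, 1)
        h = self.ham[token] / max(self.n_ham, 1)
        if s + h == 0:
            return 0.5  # unseen token: neutral
        return s / (s + h)


filt = ToyTokenCounts()
for _ in range(10):
    filt.train(["meeting", "agenda", "thanks"], is_spam=False)
    filt.train(["cheap", "pills", "offer"], is_spam=True)
before = filt.spamminess("meeting")  # seen only in ham so far

# The user now trains on 10 poisoned spams padded with ordinary text.
for _ in range(10):
    filt.train(["cheap", "pills", "meeting", "agenda"], is_spam=True)
after = filt.spamminess("meeting")

print(before, round(after, 3))  # 0.0 0.333
```

Legitimate mail that uses "meeting" now looks somewhat more like spam, which is the false positive effect described above.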
In Graham-Cumming's experiments, the passive method of adding random words to a small spam was ineffective: only 0.04% of the modified spam messages were delivered. The active attack also added random words to a small spam, but used a web bug to determine whether the spam was received. If it was, a second Bayesian system was trained using the same poison words. After sending 10,000 spams to a single user, he identified a small set of words that could be used to get a spam through.

The simple countermeasure of disabling remote images (web bugs) in email clients removes the feedback that this attack depends on.

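A mail client or filter applying this countermeasure needs to recognise which images in an HTML message would be fetched from a remote server. The sketch below uses Python's standard <code>html.parser</code> module to list such images; the names <code>RemoteImageFinder</code> and <code>remote_images</code> are hypothetical, and a real client would also have to handle CSS backgrounds and other remote references that this example ignores.

```python
from html.parser import HTMLParser

REMOTE_PREFIXES = ("http://", "https://", "//")


class RemoteImageFinder(HTMLParser):
    """Collect <img> sources that a mail client would fetch over the network."""

    def __init__(self):
        super().__init__()
        self.remote = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values may be None.
        if tag != "img":
            return
        src = (dict(attrs).get("src") or "").strip()
        if src.lower().startswith(REMOTE_PREFIXES):
            self.remote.append(src)


def remote_images(html_body):
    """Return remote image URLs; a client blocking web bugs would not load these."""
    finder = RemoteImageFinder()
    finder.feed(html_body)
    finder.close()
    return finder.remote


body = ('<p>Hello</p>'
        '<img src="http://tracker.example/b.gif?id=42" width="1" height="1">'
        '<img src="cid:logo">')
print(remote_images(body))  # ['http://tracker.example/b.gif?id=42']
```

The inline <code>cid:</code> image refers to a part of the message itself, so loading it sends nothing back to the spammer.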
At the [[CEAS]] conference in 2004, Wittel and Wu presented a paper [http://www.ceas.cc/papers-2004/slides/170.pdf] in which they showed that the passive addition of random words to spam was ineffective against [[CRM-114]] but effective against [[SpamBayes]] when 100 words were added to each spam.

They also showed that a smarter passive attack, adding common English words, was still ineffective against CRM-114 but was even more effective against SpamBayes: only 50 words had to be added to a spam to get it past SpamBayes.

However, Wittel and Wu's testing has been criticized because the emails they used contained minimal header information; most Bayesian spam filters make extensive use of headers and other message metadata when determining the likelihood that a message is spam. A discussion of the SpamBayes results and some counter-evidence can be found in the SpamBayes mailing list archive [http://mail.python.org/pipermail/spambayes-dev/2004-September/thread.html#3065].

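To illustrate why header information matters, the hypothetical tokenizer below prefixes each header token with the header's name, so that <code>subject:cheap</code> is a separate feature from the body word <code>cheap</code>; words added to the body cannot change these header features. The choice of headers, the prefix scheme and the regular expressions are assumptions for illustration, and real filters tokenize headers in their own ways.

```python
import email
import re

HEADER_FIELDS = ("From", "Subject", "Received")  # assumed choice of headers


def tokenize_message(raw):
    """Hypothetical tokenizer: header tokens are prefixed with the header name,
    body tokens are plain lower-case words."""
    msg = email.message_from_string(raw)
    tokens = []
    for name in HEADER_FIELDS:
        for value in msg.get_all(name, []):
            # Keep e-mail addresses and host names together as single tokens.
            for word in re.findall(r"[\w.@-]+", str(value).lower()):
                tokens.append(f"{name.lower()}:{word}")
    payload = msg.get_payload()
    if isinstance(payload, str):  # single-part message; multipart ignored here
        tokens.extend(re.findall(r"\w+", payload.lower()))
    return tokens


raw = ("From: Alice <alice@example.com>\n"
       "Subject: Cheap pills\n"
       "\n"
       "Cheap pills, no prescription.\n")
print(tokenize_message(raw))
```

For this message the tokens are <code>from:alice</code>, <code>from:alice@example.com</code>, <code>subject:cheap</code>, <code>subject:pills</code>, followed by the body words <code>cheap</code>, <code>pills</code>, <code>no</code> and <code>prescription</code>.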
All of these attacks are type I attacks: attacks that attempt to get spam delivered. A type II attack attempts to cause false positives by turning previously innocent words into spammy words in the Bayesian database.

Also in 2004, Stern, Mason and Shepherd wrote a technical report at [[Dalhousie University]] [http://www.cs.dal.ca/research/techreports/2004/CS-2004-06.shtml] in which they detailed a passive type II attack. They added common English words to spam messages used for training and testing a spam filter.

In two tests, they showed that these common words decreased the spam filter's precision (the percentage of messages classified as spam that really are spam) from 84% to 67% and from 94% to 84%. Examination of their data shows that the poisoned filter was biased towards classifying messages as spam rather than ham, thus increasing the false positive rate.

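Precision is TP / (TP + FP), where TP is the number of spam messages correctly flagged and FP the number of legitimate messages wrongly flagged. The counts below are hypothetical, chosen only so that the ratios match the first test's reported figures: per 100 messages flagged as spam, a fall in precision from 84% to 67% means the number of legitimate messages among them rises from 16 to 33.

```python
def precision(true_pos, false_pos):
    """Share of messages classified as spam that really are spam."""
    flagged = true_pos + false_pos
    if flagged == 0:
        raise ValueError("no messages were classified as spam")
    return true_pos / flagged


# Hypothetical counts per 100 messages flagged as spam, chosen to match
# the reported precisions of the first test (84% clean, 67% poisoned).
clean = precision(true_pos=84, false_pos=16)
poisoned = precision(true_pos=67, false_pos=33)
print(f"clean: {clean:.0%}, poisoned: {poisoned:.0%}")  # clean: 84%, poisoned: 67%
```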
They proposed two countermeasures: ignoring common words when performing classification, and smoothing probabilities based on the trustworthiness of a word. A word has a trustworthy probability if an attacker is unlikely to be able to guess whether it is part of an individual's vocabulary. Thus common words are untrustworthy and their probability would be smoothed to 0.5 (making them neutral).

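The two countermeasures can be sketched as follows. The linear shrinkage toward 0.5, the <code>trust</code> parameter and the <code>COMMON_WORDS</code> list are assumptions made for illustration; the technical report's exact formulation may differ.

```python
NEUTRAL = 0.5

# Hypothetical list of words an attacker could easily guess a user writes.
COMMON_WORDS = frozenset({"the", "and", "of", "to", "a", "in", "is"})


def drop_common_words(tokens, common_words=COMMON_WORDS):
    """Countermeasure 1: ignore common words when classifying."""
    return [t for t in tokens if t not in common_words]


def smoothed_probability(p, trust):
    """Countermeasure 2 (illustrative form): shrink P(spam | word) toward 0.5.

    trust = 1.0 keeps p unchanged; trust = 0.0 makes the word neutral.
    """
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must be between 0 and 1")
    return trust * p + (1.0 - trust) * NEUTRAL


print(drop_common_words(["the", "cheap", "pills", "and", "offer"]))
# ['cheap', 'pills', 'offer']
print(smoothed_probability(0.9, trust=0.0))  # 0.5: untrustworthy common word
print(smoothed_probability(0.9, trust=1.0))  # 0.9: fully trusted word
```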
At the 2005 CEAS conference, Lowd and Meek presented a paper [http://www.ceas.cc/papers-2005/125.pdf] in which they demonstrated that passive attacks adding random or common words to spam were ineffective against a naïve Bayesian filter. (In fact, as John Graham-Cumming had found in 2004, they showed that adding random words improves spam filtering accuracy.)

They demonstrated that adding hammy words (words that are more likely to appear in ham than in spam) was effective against a naïve Bayesian filter and enabled spam to slip through. They went on to detail two active attacks (attacks that require feedback to the spammer) that were very effective against the spam filters. Preventing all feedback to spammers (such as non-delivery reports, SMTP-level errors or web bugs) trivially defeats an active attack.

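The effect of hammy words follows from writing the naïve Bayes score as log-odds: log(P(spam)/P(ham)) plus, for each token w, log(P(w | spam)/P(w | ham)). Every token whose likelihood is higher under ham contributes a negative term. In the hypothetical word likelihoods below, a spam that scores positive (spam) becomes negative (ham) once a few hammy words are appended; the tables, prior and the function <code>log_odds</code> are illustrative assumptions, not values or code from Lowd and Meek's paper.

```python
import math


def log_odds(tokens, p_w_spam, p_w_ham, prior_spam=0.5):
    """Naive Bayes log-odds that a message is spam; > 0 means spam.

    Repeated tokens count each time; tokens missing from either table are
    skipped (a simplifying assumption).
    """
    score = math.log(prior_spam / (1.0 - prior_spam))
    for t in tokens:
        if t in p_w_spam and t in p_w_ham:
            score += math.log(p_w_spam[t] / p_w_ham[t])
    return score


# Hypothetical per-class word likelihoods P(word | class).
P_W_SPAM = {"cheap": 0.15, "pills": 0.20, "meeting": 0.01, "agenda": 0.005}
P_W_HAM = {"cheap": 0.01, "pills": 0.002, "meeting": 0.10, "agenda": 0.05}

spam_msg = ["cheap", "pills"]
padded_msg = spam_msg + ["meeting", "agenda"] * 2  # append hammy words

print(round(log_odds(spam_msg, P_W_SPAM, P_W_HAM), 2))    # 7.31
print(round(log_odds(padded_msg, P_W_SPAM, P_W_HAM), 2))  # -1.9
```

Each hammy token here lowers the score by about 2.3, so four of them outweigh the evidence from the two spammy tokens.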
They also showed that retraining the filter was effective at preventing all the attack types, even when the retraining data had been poisoned.

The published research shows that adding random words to spam messages is ineffective as a form of attack, but that active attacks are very effective and that adding carefully chosen words can work in some cases. To defend against these attacks, it is vital that spammers receive no feedback and that statistical filters are retrained regularly.

The research also shows that continuing to investigate attacks on statistical filters is worthwhile. Working attacks have been demonstrated and countermeasures are required to ensure that statistical filters remain accurate.

==External links==

* http://www.virusbtn.com/spambulletin/archive/2006/02/sb200602-poison

* [http://www.netscape.com/viewstory/2006/08/21/what-is-the-effect-of-bayesian-poisoning/ What is the effect of Bayesian poisoning]

* [http://www.zdziarski.com/papers/boudville.txt Dispelling more Bayesian filtering myths]

{{computer-stub}}

[[Category:Spam filtering]]
Revision as of 10:54, 24 November 2006