P2P Blacklist Sharing
Can there be any such thing without the fear of poisoning and/or interception by spammers? I fear not, but I would like to at least take a look at the possibility.
A simple method would be for every blog to publish a feed of its spam words. The blogging tool would provide a page where a user can add “Spam Word Sharing” sites and then update manually when needed. The recipient blog grabs each feed, checks when it was last updated and, if new words are found, adds them to its own list of words. The inherent problem with this distributed method is that spammers can read the lists too and then modify the wording they use in their spam. The upside is that spammers cannot possibly look at the spam words of each and every blog unless they write some sort of intelligent spammer tool (which is NOT beyond them by any means).
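To make the "grab the feed, check the time updated, add new words" step concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the feed is taken to be a dictionary with an `updated` ISO timestamp and a `words` list, and `merge_feed` is a made-up helper name, not part of any existing blogging tool. Fetching and parsing the feed over HTTP is left out.

```python
from datetime import datetime


def merge_feed(local_words, last_seen, feed):
    """Merge a peer blog's spam-word feed into the local word set.

    local_words: set of lowercase words/phrases, modified in place.
    last_seen:   datetime of the last feed version merged, or None.
    feed:        hypothetical format {"updated": ISO string, "words": [...]}.
    Returns (added_words, new_last_seen).
    """
    updated = datetime.fromisoformat(feed["updated"])
    # Skip the feed entirely if it has not changed since the last merge.
    if last_seen is not None and updated <= last_seen:
        return set(), last_seen
    # Normalize entries so "Viagra" and "viagra " count as the same word.
    incoming = {w.strip().lower() for w in feed["words"] if w.strip()}
    added = incoming - local_words
    local_words |= added
    return added, updated


local = {"viagra"}
feed = {"updated": "2005-03-01T12:00:00+00:00",
        "words": ["Viagra", "texas holdem", " "]}
added, seen = merge_feed(local, None, feed)
print(sorted(added))
```

Note that this sketch trusts the peer completely, which is exactly the poisoning problem described above. A real version would want a review step before accepting new words.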
Another approach is to have a few centralized sources for the spam words. This would reduce the number of places people have to go to get the information. It would, however, bring up the age-old problem of how these sources announce themselves to one another for synchronization. There are hundreds of ways that such neighbors can be announced programmatically, but they are all very cumbersome to code and easy to break into. This method also makes it easier for spammers to get hold of the list and poison it or work around it.
I have also thought about the Bayesian concept, since I did develop a Perceptron-based Bayesian spam filter for real email, which worked pretty well (it was an educational venture). Traditionally, in weblog comment spam, we tend to concentrate on a large number of words, phrases, IPs etc. (at least I have) without trying to store any intelligence about them. A simple example is the words texas and holdem. Separately they are innocent, but together they are a surefire spam combination, unless your site is about poker, in which case you have a difficult spam problem anyway. So, if spam systems were developed that stored word intelligence and updated it with each spam comment, this intelligence would be smaller in size, easier to transport and much easier to share. The drawbacks of this scheme are poisoning by spammers and rapid changes in content.
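As a rough illustration of "word intelligence", here is a minimal Python sketch that counts single words and adjacent word pairs in spam and non-spam comments and scores new text by summed log-odds. This is a simple naive-Bayes-style stand-in that I am assuming for the example, not the Perceptron filter mentioned above, and the `PairModel` class and the training comments are made up.

```python
import math
from collections import Counter


class PairModel:
    """Counts words and adjacent word pairs seen in spam and in ham."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()
        self.n_spam = 0
        self.n_ham = 0

    @staticmethod
    def features(text):
        words = text.lower().split()
        pairs = [a + " " + b for a, b in zip(words, words[1:])]
        return set(words) | set(pairs)

    def train(self, text, is_spam):
        feats = self.features(text)
        if is_spam:
            self.spam.update(feats)
            self.n_spam += 1
        else:
            self.ham.update(feats)
            self.n_ham += 1

    def score(self, text):
        """Sum of smoothed log-odds per feature; positive leans spam."""
        total = 0.0
        for f in self.features(text):
            if f not in self.spam and f not in self.ham:
                continue  # never seen this feature, so it carries no evidence
            p_spam = (self.spam[f] + 1) / (self.n_spam + 2)
            p_ham = (self.ham[f] + 1) / (self.n_ham + 2)
            total += math.log(p_spam / p_ham)
        return total


model = PairModel()
model.train("play texas holdem now", is_spam=True)
model.train("texas holdem online", is_spam=True)
model.train("greetings from texas", is_spam=False)
model.train("never played holdem", is_spam=False)

# The pair adds evidence beyond the two words taken separately.
print(model.score("texas holdem") > model.score("texas") + model.score("holdem"))
```

What would be shared between blogs in this setup is the two counters and the two totals, rather than a flat word list.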
So, to summarize, we need a “spam information sharing scheme” that is selectively public, is relatively small in size and can easily be integrated into present infrastructures.
What do you think?
