Drupal Spam add-on module


The Spam add-on module for Drupal provides several ways to reduce unwanted commercial advertising thrown at your site through comments and other sources. There are several options for identifying unwanted content and several ways to handle the unwanted content. The module has good documentation compared to the average free open source add-on module.

Download

Download the module from drupal.org/project/spam. Version 6.x-1.2, released June 9, 2011, is a 57 kilobyte download.

Documentation

There is an outline of the module at drupal.org/project/spam with links to the documentation and the issues queue. Go directly to drupal.org/node/498092 for the documentation.

Modules

Version 6.x-1.2 of the Spam module contains the following modules. Go to Administer, Site building, Modules, to activate the modules. Go to Administer, Site configuration, Spam, to change the settings.

  1. Spam API is the first module you activate. It is the core of the Spam module and provides services to the other modules through an API. All the other Spam modules depend on the Spam API module. Activate Spam API.
  2. Switch on at least one other module to detect spam. You can switch on any mixture that works for your site.
    • Spam Bayesian filter provides a Bayesian filter and is used by the Spam URL filter. Bayesian filters are commonly used to filter email. There is one in the Thunderbird email client.
    • Spam Custom filter lets you create custom spam filter rules.
    • Spam Duplicate filter detects duplicate spam.
    • Spam node age filter deletes comments when they are added to old nodes. The problem is that a comment might still be valid, depending on the type of node. You might have pages that are valid for years and other pages that are current for only a few weeks. It might be easier to use a module that switches off new comments on nodes that are archived.
    • Spam Surbl filter uses the external service Surbl to filter URLs. Surbl provides a free service for low volume usage and a paid service for high volume usage. If you like the Surbl service, this is an easy connection to the service.
    • Spam URL filter checks URLs in the content and uses the Spam Bayesian filter to remember which URLs are bad. The Surbl service is an alternative if you do not want to spend the time to teach a Bayesian filter.

 

Spam API

You can switch the Spam filter on and off for comments and for each content type. You can change the message presented to users when a post is blocked. You can change the overall threshold required to block a post.
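
As a rough illustration of how an overall threshold can work, the sketch below takes one score from each active filter and blocks the post when the highest score reaches the threshold. The 1 to 99 score range, the function name, the example threshold of 80, and the choice of taking the maximum are all assumptions made for illustration. They are not taken from the module's code.

```python
def is_blocked(scores, threshold=80):
    """Return True when any filter score reaches the threshold.

    scores: mapping of a (hypothetical) filter name to a spam score, 1 to 99.
    threshold: example site-wide cut-off; a higher value blocks fewer posts.
    """
    if not scores:
        return False  # no filter produced a score, so let the post through
    return max(scores.values()) >= threshold
```

Raising the threshold trades missed spam for fewer valid posts blocked, which is the decision the setting asks you to make.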

The Spam API module creates eight database tables. Clearly there is a big database overhead required to use the Spam module. The overhead occurs only when filtering comments or content. If your content is created by trusted people then leave the Spam module switched off for content and use the Spam module only for comments. You will then minimise the overhead of the Spam module.

If some content types are created by the public or by external organisations, switch the Spam module on for those content types.

Spam Bayesian filter

The Spam Bayesian filter creates one database table and will add little overhead when first used. After a while the database table will grow with everything learned by the Bayesian filter and the filtering will slow down. You might need to clean out the database table on a regular schedule.
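
To show why the table grows and what a clean-out could look like, here is a minimal sketch of a token-based Bayesian filter. It is not the module's implementation. The class, the in-memory counters standing in for the database table, the crude add-one smoothing, and the prune rule are all assumptions for illustration.

```python
import math
import re
from collections import Counter


class TokenTable:
    """In-memory stand-in for the filter's token table (hypothetical layout)."""

    def __init__(self):
        self.spam = Counter()  # token -> times seen in spam
        self.ham = Counter()   # token -> times seen in valid posts

    @staticmethod
    def tokens(text):
        return re.findall(r"[a-z0-9.\-]+", text.lower())

    def learn(self, text, is_spam):
        # Every new token adds a row, which is why the table keeps growing.
        target = self.spam if is_spam else self.ham
        target.update(self.tokens(text))

    def spam_probability(self, text):
        # Sum per-token log-odds with crude add-one smoothing, then map to 0..1.
        spam_total = sum(self.spam.values()) + 1
        ham_total = sum(self.ham.values()) + 1
        log_odds = 0.0
        for tok in self.tokens(text):
            p_spam = (self.spam[tok] + 1) / spam_total
            p_ham = (self.ham[tok] + 1) / ham_total
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-50.0, min(50.0, log_odds))  # avoid overflow in exp()
        return 1 / (1 + math.exp(-log_odds))

    def prune(self, min_count=2):
        """Drop tokens seen fewer than min_count times to keep the table small."""
        for table in (self.spam, self.ham):
            for tok in [t for t, n in table.items() if n < min_count]:
                del table[tok]
```

A scheduled clean-out along the lines of prune() removes tokens that were seen only once or twice, which are usually the bulk of the rows and contribute little to the score.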

I switched on the Bayesian filter and the URL filter, then entered a comment with a bad URL.

Spam Custom filter

The Spam Custom filter creates one database table with one entry for each custom filter.
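
A custom rule is essentially a stored pattern plus a verdict. The sketch below shows one way such rules could be evaluated. The field names, the plain-text versus regular-expression switch, and the first-match-wins order are assumptions for illustration, not the module's schema.

```python
import re
from dataclasses import dataclass


@dataclass
class CustomRule:
    """One stored rule; field names are hypothetical."""
    pattern: str
    is_regex: bool = False
    verdict: str = "spam"  # or "not spam"


def first_matching_verdict(text, rules):
    """Return the verdict of the first rule that matches, or None."""
    for rule in rules:
        if rule.is_regex:
            hit = re.search(rule.pattern, text, re.IGNORECASE) is not None
        else:
            hit = rule.pattern.lower() in text.lower()
        if hit:
            return rule.verdict
    return None
```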

Spam Duplicate filter

The Spam Duplicate filter creates one database table with one entry for each content item.
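
One entry per content item suggests the filter stores something it can compare new posts against. A minimal sketch of that idea, assuming a hash of the normalised text and a hypothetical limit on identical copies, could look like this:

```python
import hashlib


class DuplicateTracker:
    """Counts identical posts; the storage and the limit are hypothetical."""

    def __init__(self, max_copies=2):
        self.max_copies = max_copies
        self.seen = {}  # content hash -> number of times posted

    @staticmethod
    def fingerprint(text):
        # Ignore case and whitespace differences before hashing.
        normalised = " ".join(text.lower().split())
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def record(self, text):
        """Store the post and return True once it exceeds max_copies."""
        key = self.fingerprint(text)
        self.seen[key] = self.seen.get(key, 0) + 1
        return self.seen[key] > self.max_copies
```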

Spam node age filter

The Spam node age filter does not create database tables.
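
The filter needs no tables because the decision depends only on the node's creation date. The sketch below illustrates the check with a per-content-type limit, which would address the problem of some pages staying valid for years. The type names, the limits, and the function are hypothetical and are not settings the module is known to offer.

```python
from datetime import datetime, timedelta

# Hypothetical per-type limits in days; None means comments never expire.
MAX_AGE_DAYS = {"news": 28, "page": None}


def is_too_old(node_type, node_created, now, default_days=365):
    """Return True when a node of this type is too old for new comments."""
    limit = MAX_AGE_DAYS.get(node_type, default_days)
    if limit is None:
        return False
    return now - node_created > timedelta(days=limit)
```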

Spam Surbl filter

The Spam Surbl filter does not create database tables because everything is stored externally.
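
Surbl lists are normally queried over DNS: you look up the domain with a Surbl zone appended, and an answer in the 127.x.x.x range means the domain is listed. The sketch below assumes the public multi.surbl.org zone and hypothetical function names. It shows the general lookup, not necessarily how the module performs it.

```python
import socket

SURBL_ZONE = "multi.surbl.org"  # assumed zone for this sketch


def surbl_query_name(domain, zone=SURBL_ZONE):
    """Build the DNS name to look up for a domain."""
    return f"{domain.strip('.').lower()}.{zone}"


def is_listed(domain, resolver=socket.gethostbyname):
    """Return True when the lookup answers with a 127.x.x.x address."""
    try:
        answer = resolver(surbl_query_name(domain))
    except socket.gaierror:
        return False  # no record: not listed, or the lookup failed
    return answer.startswith("127.")
```

The resolver parameter exists so the check can be tried with a stand-in function instead of a live DNS query.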

Spam URL filter

The Spam URL filter does not create database tables because it uses the Bayesian filter module to remember the bad URLs. The Bayesian filter table will grow with every URL you reject.
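
Before a URL can be remembered, it has to be pulled out of the post. The sketch below shows one way to extract the distinct host names from a comment so that each could be passed to a Bayesian filter as a token. The regular expression and function name are assumptions for illustration, not the module's code.

```python
import re
from urllib.parse import urlparse

# Match http and https URLs up to whitespace, quotes, or angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def extract_hosts(text):
    """Return the distinct host names found in text, in order of appearance."""
    hosts = []
    for url in URL_PATTERN.findall(text):
        host = urlparse(url).hostname  # lower-cased by urlparse
        if host and host not in hosts:
            hosts.append(host)
    return hosts
```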

Tests

Create an unpublished comment. Go into comments administration and mark the comment as spam, then mark the comment as not spam. The comment will be published. You then have to manually change the status back to unpublished.