Wednesday, 7 November 2007

Preventing Spam on Mediawiki at SourceForge

Mail spam is annoying. But spam on a wiki is even more annoying. If you've spent your free time writing documentation, there's no way you want to see junk text or link spam inserted into the middle of your flowing prose. In an earlier article, I described how to install a Mediawiki wiki on SourceForge. After a while you will start to get your first spam. And so the battle begins...

Wikis don't look very impressive as websites, but for an open source project they are a great way of maintaining documentation. Nobody likes writing documentation, which is why it needs to be made as easy as possible. For example, if a user mails the project mailing list with a question about installing the software, it usually means that the documentation needs to be updated, and on a wiki this can be done in about a minute. (On the other hand, if a project has a frequently-asked questions page, it means that they couldn't be bothered improving the documentation.)

Fortunately, spam can be controlled simply by using permissions (thanks to David Wild for some of these suggestions). To begin with, make sure you keep an eye on the Recent Changes feed on your wiki (use an RSS reader). (As an aside, if your Recent Changes feed is broken, a possible cause is that you have installed an extension and included a rogue blank line after the final "?>" - this happened to me.)
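
A minimal sketch of how to avoid the blank-line problem, assuming the extension file contains only PHP: leave out the closing "?>" altogether, so nothing after the code can be sent to the browser ahead of the feed's XML. The file name and the variable below are hypothetical, for illustration only.

```php
<?php
// MyExtension.php (hypothetical extension file included from LocalSettings.php)

// Placeholder setting standing in for whatever the real extension defines.
$wgMyExtensionEnabled = true;

// Deliberately no closing tag at the end of this file. PHP does not
// require one in a PHP-only file, and without it a stray blank line at
// the end cannot be emitted as output before the feed is generated.
```

If you prefer to keep the closing tag, check that the file ends immediately after it, with no extra blank lines.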

Next you need to edit the permissions in LocalSettings.php to disable anonymous edits and disable account creation except by users who already have accounts. Of course, the admin is always allowed to do whatever. I'll explain the rationale for this below, but first here are the settings:
# No anonymous editing allowed
$wgGroupPermissions['*']['edit'] = false;
$wgGroupPermissions['user']['edit'] = true;
$wgGroupPermissions['sysop']['edit'] = true;
# Only users with accounts can create accounts
$wgGroupPermissions['*']['createaccount'] = false;
$wgGroupPermissions['user']['createaccount'] = true;
$wgGroupPermissions['sysop']['createaccount'] = true;

On one of the wikis I am involved with on SourceForge, we had already disabled anonymous edits. However, anyone could create an account. At some point the spammers upgraded their spam software and were then able to create accounts on the wiki. Since the RSS feed for recent changes was not working (for the reason described above), it was three days and about 1000 spam accounts later before I realised the problem. At that stage it was too late to implement the solution I described above, since the spam accounts were already members of the 'user' group. Instead, I created a new group 'human', added the 10 or so real accounts to that group and gave that group permissions while simultaneously removing all permissions from the regular 'user' group.
# Only 'humans' can edit
$wgGroupPermissions['*']['edit'] = false;
$wgGroupPermissions['user']['edit'] = false;
$wgGroupPermissions['human']['edit'] = true;
$wgGroupPermissions['sysop']['edit'] = true;
# Only 'humans' can create accounts
$wgGroupPermissions['*']['createaccount'] = false;
$wgGroupPermissions['user']['createaccount'] = false;
$wgGroupPermissions['human']['createaccount'] = true;
$wgGroupPermissions['sysop']['createaccount'] = true;

Bye-bye spam. Of course, I still had to revert about 30 edits...:-/
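
To see why the 'human' group works, it may help to model how the settings combine. The sketch below is a simplified stand-alone illustration, not MediaWiki's own code: it assumes a right is granted whenever any group the account belongs to grants it, and userCan() is a hypothetical helper written for this example.

```php
<?php
// Same 'edit' settings as above, in a stand-alone script.
$wgGroupPermissions = array();
$wgGroupPermissions['*']['edit'] = false;
$wgGroupPermissions['user']['edit'] = false;
$wgGroupPermissions['human']['edit'] = true;
$wgGroupPermissions['sysop']['edit'] = true;

// Hypothetical helper: returns true if at least one of the account's
// groups grants the requested right.
function userCan($permissions, $groups, $right) {
    foreach ($groups as $group) {
        if (!empty($permissions[$group][$right])) {
            return true;
        }
    }
    return false;
}

// A spam account is only in the implicit '*' and 'user' groups.
var_dump(userCan($wgGroupPermissions, array('*', 'user'), 'edit'));          // bool(false)

// A real contributor has also been added to 'human'.
var_dump(userCan($wgGroupPermissions, array('*', 'user', 'human'), 'edit')); // bool(true)
```

The 1000 spam accounts stay in 'user' and lose the ability to edit, while the handful of real accounts regain it through 'human'.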

Image credit: Spam wall by freezelight


Geoff Hutchison said...

Is there any way to remove the spam from the wiki history? I've reverted edits, but the entries are still in the archive.

I'd really like it to be gone for good!

Noel O'Boyle said...

MediaWiki, possibly wisely, doesn't allow the version history to be edited. It's not even possible to delete a user account (at least with the version we have installed).

Of course, the data is stored in the underlying SQL tables, which can be altered directly, but only a mediawiki guru could say whether your installation will be self-consistent and functional afterwards...feeling lucky? :-)

Unknown said...

On Wikipedia, Oversighters can remove page history. There is an extension for this somewhere; take a look on the MediaWiki wiki.

Unknown said...

Here is a link

Unknown said...

Also to anyone who might be reading this, if you want to remove the section editing tabs for anons add:
$wgDefaultUserOptions['editsection'] = false;
to LocalSettings.php

... said...

Administrators can delete edits so they don't show up in the normal history page.

1. "Nobody likes writing documentation, which is why it needs to be made as easy as possible."
2. "On one of the wikis I am involved with on SourceForge, we had already disabled anonymous edits. However, anyone could create an account."

But #2 conflicts with #1. Restricting the number of people who can make edits makes it much less likely that they will actually make edits. The whole point of a wiki is that regular users can be passing by and make 2 or 3 small changes, which all add up over time. Disabling anonymous editing and making a whitelist of users is extremely counterproductive. You need to use things like CAPTCHAs to prevent spam, while keeping anonymous editing open.

Noel O'Boyle said...

Just to make it clear again, you can only install an ancient version of MediaWiki on SF, so many of the snazzy anti-spam enhancements (such as CAPTCHA) and other extensions don't work.

"The whole point of a wiki is that regular users can be passing by and make 2 or 3 small changes."

I would rather say that the whole point of a wiki is to enable collaborative editing of (HTML) documents. Although regular users might correct spellings, typographical errors, and so on, (and for sure, I've prevented this) it's up to the developers to actually write the documentation in the first place, and believe me, this is not something that comes naturally.

Since I run several MediaWiki installations, a blog, mailing lists and so on, I need to spend my time wisely and to take a zero-tolerance approach to spam. For example, you probably needed a Google account to comment on this blog. This reduces the number of comments, but cuts spam to close to zero.

Anonymous said...

You can also directly edit EditPage.php. It took me a while to figure out an easy way to do this. It lets you block any spam URLs or keywords you choose. Check out the link on my name to read the article on how to alter the file.
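
A different technique from patching EditPage.php, assuming the installed MediaWiki version supports it, is the $wgSpamRegex setting in LocalSettings.php, which rejects a save when the submitted text matches a regular expression. The keywords and domain in this sketch are hypothetical placeholders; substitute whatever actually shows up in your spam.

```php
<?php
// Goes in LocalSettings.php. Edits whose text matches this pattern
// are refused. All keywords and the domain are made-up examples.
$wgSpamRegex = '/\b(cheap-pills|casino-bonus)\b|https?:\/\/[^\s]*\.example\.com/i';
```

Keep the pattern narrow, since anything it matches will also be blocked for legitimate contributors.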

Rajat said...

Here is a full wiki permission guide. The thing that most often goes unnoticed by admins is the talk pages. Since they are not watched, many spammers use them for link spamming.