Opened 6 years ago

Closed 6 years ago

#1588 closed new feature (fixed)

Introduce message sending limit

Reported by: planetcruiser
Owned by: planetcruiser
Priority: critical
Milestone: 0.5.9 - bugfixing
Component: BW Mail
Keywords:
Cc:

Description (last modified by planetcruiser)

Issue:

  • We have abusers that sign up and send hundreds of scam/spam messages within hours

Solution:

  • Limit number of messages people can send per time frame (per minute, per 15 minutes, per hour, per day), possibly in combination with member status and number of friends
  • Make limits easily configurable (default.ini?), so we can fine-tune later
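
To make the per-time-frame idea concrete, here is a minimal sketch of a sliding-window limiter. It is written in Python for brevity rather than in the site's PHP, and everything in it is hypothetical: the `SendLimiter` class, the `LIMITS` mapping, and the numbers are placeholders, not what was deployed. In practice the mapping would be read from a config file so it can be tuned later.

```python
from collections import deque

# Hypothetical limits: window length in seconds -> max messages in that window.
# Placeholder values for illustration only.
LIMITS = {60: 2, 15 * 60: 4, 60 * 60: 8, 24 * 60 * 60: 30}


class SendLimiter:
    """Tracks send timestamps per member and checks them against the limits."""

    def __init__(self, limits=LIMITS):
        self.limits = dict(limits)
        self.longest = max(self.limits)
        self.sent = {}  # member_id -> deque of send timestamps (seconds)

    def allow(self, member_id, now):
        """Return True and record the send if no window is exhausted."""
        history = self.sent.setdefault(member_id, deque())
        # Timestamps older than the longest window can never matter again.
        while history and now - history[0] >= self.longest:
            history.popleft()
        for window, max_messages in self.limits.items():
            recent = sum(1 for t in history if now - t < window)
            if recent >= max_messages:
                return False
        history.append(now)
        return True
```

A rejected send is not recorded, so a member who hits a limit is not penalised further for retrying. Different `limits` mappings could be passed per member status if limits should depend on status or number of friends.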

Clues:

  • We should identify a typical abuser profile and set limits accordingly. E.g. a typical abuser probably has 0 comments/friends, has just signed up, and has not received any messages.
  • Maybe we can have a score system to calculate the likelihood of an abuser (e.g. no comments: 10 points, no messages received: 10 points, newly signed up: 5 points)
  • If a flag mechanism for users exists, we should also consider this
  • Messages could be held for moderation instead of being rejected
  • Akismet-like text recognition could also be implemented - there is so much anti-spam software out there! :)
  • A captcha however is no solution IMHO, because it will only protect us from automated message sending, which we don't have. Solving our simple captcha takes 2 seconds, nothing that will stop a dedicated spammer
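
The score system from the clues could look roughly like the sketch below. The point values are the examples given above; the threshold, the 7-day definition of "newly signed up", and all names (`MemberStats`, `abuse_score`, `is_suspicious`) are assumptions made for illustration, in Python rather than the site's PHP.

```python
from dataclasses import dataclass

# Point values are the examples from the ticket description.
POINTS_NO_COMMENTS = 10
POINTS_NO_MESSAGES_RECEIVED = 10
POINTS_NEW_SIGNUP = 5

# Assumptions for illustration only.
NEW_SIGNUP_DAYS = 7
SUSPICION_THRESHOLD = 20


@dataclass
class MemberStats:
    comments: int
    messages_received: int
    days_since_signup: int


def abuse_score(stats):
    """Add up points for every trait that matches the abuser profile."""
    score = 0
    if stats.comments == 0:
        score += POINTS_NO_COMMENTS
    if stats.messages_received == 0:
        score += POINTS_NO_MESSAGES_RECEIVED
    if stats.days_since_signup < NEW_SIGNUP_DAYS:
        score += POINTS_NEW_SIGNUP
    return score


def is_suspicious(stats):
    """True if the member's score reaches the (assumed) threshold."""
    return abuse_score(stats) >= SUSPICION_THRESHOLD
```

A score like this would probably be used to pick a stricter sending limit or to hold messages for moderation, not to reject messages outright, since every genuine new member starts with the same profile.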

Related tickets:

Change History (19)

comment:1 Changed 6 years ago by planetcruiser

  • Owner set to coroa
  • Status changed from new to assigned

comment:2 Changed 6 years ago by planetcruiser

  • Description modified (diff)

comment:3 Changed 6 years ago by planetcruiser

2 days to go for this milestone release.. any luck?

comment:4 Changed 6 years ago by coroa

  • Milestone changed from 0.5.7 - bugfixing to 0.5.8 - bugfixing

not ready yet ... i'm moving it to the next milestone.

comment:5 follow-up: Changed 6 years ago by coroa

i'm playing with implementing a class along the lines of
https://code.google.com/p/phpspamdetection/source/browse/trunk/SpamChecker.class.php
which is based on the quite readable article http://www.paulgraham.com/spam.html. for training the class we could just use our already existing messages, and there is also another resource provided in the links of the class.

but i'd rather like to have some moderation working to be able to check the performance of the filter,
i know that there is some moderation stuff in the bw directory. does anybody know how it approx. works? or hints in general?

also i'm undecided if the spammer scoring mechanism (number of comments, number of sent messages marked as spam) would better fit into some member entity or an extra function in the messages model?

comment:6 Changed 6 years ago by coroa

ah, and the other spam check mechanisms i found, like akismet, mostly work off-site on their servers, meaning we would have to send them our complete messages. i don't think we should do this with the private messages. (for public forums it is alright, the messages being published anyway, but that's not the case here).

which other projects for spam detection are there, which work preferably offline?

comment:7 follow-up: Changed 6 years ago by crumbking

I just found this one:

http://wiki.apache.org/spamassassin/SpamAssassin

http://wiki.apache.org/spamassassin/

Maybe this is the way to go in the long run?

comment:8 in reply to: ↑ 7 ; follow-up: Changed 6 years ago by planetcruiser

Replying to crumbking:

I just found this one:

http://wiki.apache.org/spamassassin/SpamAssassin

we deploy spamassassin (SA) at ecobytes and it works great. but this is more for high-traffic scenarios and not trivial to set up and to maintain. it's a service plugged into our mail transport agent (postfix). i have no idea if you can wire this up to a php application. and if yes, i would assume it's going to be quite a task. i think this is overkill for bw.

part of SA is bayesian filtering. this might be sufficient for bw. http://nasauber.de/opensource/b8/ offers an implementation of it. we could feed b8 with messages that have been marked as spam by other users.

comment:9 in reply to: ↑ 5 ; follow-up: Changed 6 years ago by planetcruiser

Replying to coroa:

i'm playing with implementing a class along the lines of
https://code.google.com/p/phpspamdetection/source/browse/trunk/SpamChecker.class.php
which is based on the quite readable article http://www.paulgraham.com/spam.html. for training the class we could just use our already existing messages, and there is also another resource provided in the links of the class.

ah, looks like another bayesian filter, although it's not called like that. ;) how does it perform?

but i'd rather like to have some moderation working to be able to check the performance of the filter,
i know that there is some moderation stuff in the bw directory. does anybody know how it approx. works? or hints in general?

i doubt anyone here knows. maybe ask the dev list. what do you mean by "moderation" anyway? volunteers deciding if something is spam or not?

also i'm undecided if the spammer scoring mechanism (number of comments, number of sent messages marked as spam) would better fit into some member entity or an extra function in the messages model?

i'd split it up nicely. for example:

  • $member->getNumberOfComments(), calls $commentsModel->getCommentsByUserID($member->id) - might already exist, i didn't check the code
  • $member->getSpamMessages(), calls $messages->getSpamMessagesByUserId($member->id)
  • $spamAnalyser->classify($textBlob): own module for text analyser so we can use it for references, blog comments and forum posts later

and then tie it all together in $message->isSpam() which calls $message->calculateSpamProbability()

i like many small methods which have a good chance of being reused elsewhere, because they do what their name suggests and because they are nicely documented. :)
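
a rough sketch of how that split could fit together, in python for brevity rather than php. the method names mirror the ones above in snake_case; the `KeywordAnalyser` stand-in, the adjustment values and the 0.5 threshold are pure assumptions, not existing code.

```python
class Member:
    def __init__(self, member_id, comments, spam_messages):
        self.id = member_id
        self._comments = comments
        self._spam_messages = spam_messages

    def get_number_of_comments(self):
        return len(self._comments)

    def get_spam_messages(self):
        return list(self._spam_messages)


class KeywordAnalyser:
    """Stand-in for a real text classifier; returns a value in [0, 1]."""

    def __init__(self, suspicious_words):
        self.suspicious_words = set(suspicious_words)

    def classify(self, text):
        words = text.lower().split()
        if not words:
            return 0.0
        hits = sum(1 for word in words if word in self.suspicious_words)
        return hits / len(words)


class Message:
    def __init__(self, sender, text, analyser):
        self.sender = sender
        self.text = text
        self.analyser = analyser

    def calculate_spam_probability(self):
        probability = self.analyser.classify(self.text)
        # Assumed adjustments: earlier spam counts against the sender,
        # received comments count in their favour.
        if self.sender.get_spam_messages():
            probability = min(1.0, probability + 0.3)
        if self.sender.get_number_of_comments() > 0:
            probability = max(0.0, probability - 0.2)
        return probability

    def is_spam(self, threshold=0.5):
        return self.calculate_spam_probability() >= threshold
```

because the analyser only takes a text blob, the same object could later be handed to references, blog comments and forum posts.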

not sure if that answers your question.. :)

comment:10 in reply to: ↑ 9 Changed 6 years ago by coroa

Replying to planetcruiser:

Replying to coroa:

i'm playing with implementing a class along the lines of
https://code.google.com/p/phpspamdetection/source/browse/trunk/SpamChecker.class.php

ah, looks like another bayesian filter, although it's not called like
that. ;) how does it perform?

right, a bayesian filter. performance: hmm .. hard to say, there are no
real sources. the paulgraham site speaks about using a training set of
approx. 4000 ham and 4000 spam, giving him a result of 0.5% undetected
spam mails and 0 false positives. it doesn't say how large the test set
was, and this is probably highly specific to his mails.

i know that there is some moderation stuff in the bw directory. does
anybody know how it approx. works? or hints in general?

what do you mean by "moderation" anyway? volunteers deciding if
something is spam or not?

yes, but only in one direction: i'd like to be able to have a second
opinion on what the spam filter decides to be spam, so we don't have any
final false positives.

not sure if that answers your question.. :)

it did mostly.

comment:11 in reply to: ↑ 8 ; follow-up: Changed 6 years ago by coroa

Replying to planetcruiser:

part of SA is bayesian filtering. this might be sufficient for
bw. http://nasauber.de/opensource/b8/ offers an implementation of
it. we could feed b8 with messages that have been marked as spam by
other users.

just found that b8 is using the technique described by the same graham
article plus quite some improvements [1]. This suggests it being
superior to the phpspamdetection class i proposed.

Fine i'll look into integrating b8 then.

[1] http://nasauber.de/opensource/b8/readme.php#how-does-it-work
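
for reference, the core of the technique from the graham article can be sketched in a few lines. this is a simplified python illustration, not b8's api and not the phpspamdetection class: it counts each token once per message and leaves out graham's extra weighting of ham counts. the 0.4 default for unseen tokens, the 0.01/0.99 clamps and the 15 most interesting tokens follow the article; the class and method names are made up.

```python
from collections import Counter


def tokenize(text):
    return text.lower().split()


class TinyBayes:
    def __init__(self):
        self.spam_tokens = Counter()
        self.ham_tokens = Counter()
        self.spam_count = 0
        self.ham_count = 0

    def train(self, text, is_spam):
        tokens = set(tokenize(text))  # count each token once per message
        if is_spam:
            self.spam_tokens.update(tokens)
            self.spam_count += 1
        else:
            self.ham_tokens.update(tokens)
            self.ham_count += 1

    def token_probability(self, token):
        spam = self.spam_tokens[token]
        ham = self.ham_tokens[token]
        if spam + ham == 0:
            return 0.4  # unseen tokens lean slightly towards ham
        spam_rate = spam / max(self.spam_count, 1)
        ham_rate = ham / max(self.ham_count, 1)
        probability = spam_rate / (spam_rate + ham_rate)
        return min(0.99, max(0.01, probability))

    def classify(self, text, most_interesting=15):
        probabilities = [self.token_probability(t) for t in set(tokenize(text))]
        # Keep the tokens whose probability is farthest from the neutral 0.5.
        probabilities.sort(key=lambda p: abs(p - 0.5), reverse=True)
        spam_product = 1.0
        ham_product = 1.0
        for p in probabilities[:most_interesting]:
            spam_product *= p
            ham_product *= 1.0 - p
        return spam_product / (spam_product + ham_product)
```

training would be a loop over existing messages calling `train(text, is_spam)`, with the spam flag coming from messages other users have marked as spam.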

comment:12 in reply to: ↑ 11 ; follow-up: Changed 6 years ago by planetcruiser

Replying to coroa:

Fine i'll look into integrating b8 then.

what's the status here? feedback needed or did you just not find time yet? release is planned on sunday, but maybe i should move this to monday or tuesday, because my weekend is a little busy

comment:13 in reply to: ↑ 12 Changed 6 years ago by coroa

Replying to planetcruiser:

Replying to coroa:

Fine i'll look into integrating b8 then.

what's the status here? feedback needed or did you just not find time yet? release is planned on sunday, but maybe i should move this to monday or tuesday, because my weekend is a little busy

sorry, i've been busy the last two weeks. i'm moving back to berlin on wednesday. the week after, i should have more time again.

comment:14 Changed 6 years ago by planetcruiser

  • Milestone changed from 0.5.8 - bugfixing to 0.5.9 - bugfixing

see last comment

comment:15 follow-up: Changed 6 years ago by planetcruiser

any luck here? otherwise i would just raise the sending limits a little and implement a check for "is reply to a message?"

comment:16 in reply to: ↑ 15 Changed 6 years ago by planetcruiser

Replying to planetcruiser:

any luck here? otherwise i would just raise the sending limits a little and implement a check for "is reply to a message?"

ok, i will do that now so we can release today

comment:17 Changed 6 years ago by planetcruiser

  • Owner changed from coroa to planetcruiser
  • Status changed from assigned to accepted

comment:18 Changed 6 years ago by planetcruiser

  • Description modified (diff)

added related ticket #1595

comment:19 Changed 6 years ago by planetcruiser

  • Resolution set to fixed
  • Status changed from accepted to closed

limits raised to 10/50 via:
https://gitorious.org/bewelcome/rox/commit/c8229ba589681c556de11e30465b331258939341

reply check via:
https://gitorious.org/bewelcome/rox/commit/1998de2c5bd5104b6ca9fcaa689024b189afa836

deployed and tested on alpha - finally closing this one. :]

if we are to implement a more sophisticated spam detection, this should go into a separate ticket.
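
for anyone picking this up in a separate ticket: the idea behind the "is reply to a message?" check can be sketched as below. this is python for illustration only and not the code from the commits above; the `Message` fields, `is_reply` and `counts_towards_limit` are hypothetical names, and treating replies as exempt from the limit is one plausible reading of the check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    id: int
    sender_id: int
    recipient_id: int
    parent_id: Optional[int] = None  # id of the message being answered, if any


def is_reply(message, messages_by_id):
    """True if `message` answers a message its sender received from the recipient."""
    parent = messages_by_id.get(message.parent_id)
    return (
        parent is not None
        and parent.recipient_id == message.sender_id
        and parent.sender_id == message.recipient_id
    )


def counts_towards_limit(message, messages_by_id):
    """Assumed policy: only unsolicited messages use up the sending limit."""
    return not is_reply(message, messages_by_id)
```

checking sender and recipient of the parent keeps someone from claiming an arbitrary message id as the parent to get around the limit.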
