Add layered spam defense with Bayesian filter and first-post confirmation by sdeibel · Pull Request #963 · ASKBOT/askbot-devel

sdeibel · 2026-04-09T03:41:16Z

Adds a multi-layered spam defense system:

- Dual Bayesian classifier (spam + ham models) with lazy loading, thread-safe operation, and fail-open design
- First-post email confirmation for watched users: post is held until the user clicks a confirmation link
- Optional moderator queue after email confirmation
- Silent deletion mode for obvious spam (spam-only, no ham match)
- Incremental learning: spam model updated when posts are marked as spam, ham model updated when posts are approved by moderators
- Management commands for training and cleanup

All features are opt-in via livesettings with safe defaults (disabled). Replaces the existing spam checker calls in ask/answer/comment views with the dual Bayesian check while preserving the original spam checker as a fallback.
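For readers unfamiliar with the idea, here is a minimal sketch of a two-model check with fail-open behavior. It is not the code from this PR: `TokenModel`, `is_spam`, and the `margin` parameter are hypothetical names, and a plain naive Bayes comparison with Laplace smoothing stands in for whatever scoring the PR actually uses.

```python
import math
from collections import Counter


class TokenModel:
    """Token counts for one class (spam or ham). Hypothetical stand-in."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def learn(self, text):
        # Incremental learning: fold the tokens of one more sample into the model.
        tokens = text.lower().split()
        self.counts.update(tokens)
        self.total += len(tokens)

    def log_likelihood(self, text, vocab_size):
        # Laplace-smoothed log-likelihood of the text under this model.
        return sum(
            math.log((self.counts[tok] + 1) / (self.total + vocab_size))
            for tok in text.lower().split()
        )


def is_spam(text, spam_model, ham_model, margin=1.0):
    """Flag text only when the spam model beats the ham model by `margin`.

    Fail-open: any error in the classifier lets the post through.
    """
    try:
        vocab_size = len(set(spam_model.counts) | set(ham_model.counts)) or 1
        spam_score = spam_model.log_likelihood(text, vocab_size)
        ham_score = ham_model.log_likelihood(text, vocab_size)
        return spam_score - ham_score > margin
    except Exception:
        return False
```

In this sketch, calling `learn()` on the spam model when a post is marked as spam, and on the ham model when a moderator approves a post, corresponds to the incremental learning item above. The margin keeps text unknown to both models from being flagged.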


evgenyfadeev

Needs some discussion before acting on this PR.

evgenyfadeev · 2026-04-12T18:25:14Z

+settings.register(
+    livesettings.BooleanValue(
+        SPAM_DEFENSE,
+        'FIRST_POST_MODERATE_AFTER_CONFIRMATION',


There is already a measure by which all posts of "watched" users are pre-moderated when the moderation mode is "premoderation". It's not clear how this would be compatible with the proposed "FIRST_POST_MODERATE_AFTER_CONFIRMATION": what happens if the moderation mode is "premoderation" and this setting is "False"?
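One possible way to resolve the conflict, sketched under the assumption that the site-wide moderation mode should take precedence over the new setting. The function name and parameters are hypothetical and do not come from the PR or from askbot.

```python
def needs_moderation(moderation_mode, user_is_watched, moderate_after_confirmation):
    """Decide whether a confirmed first post still goes to the moderator queue.

    Assumption: "premoderation" always applies to watched users, so the new
    setting can add moderation but never remove it.
    """
    if moderation_mode == 'premoderation' and user_is_watched:
        return True
    return user_is_watched and moderate_after_confirmation
```

Under this rule, "premoderation" combined with the setting at "False" still queues the post, so the setting only matters in the other moderation modes.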

evgenyfadeev · 2026-04-12T18:44:44Z

+    # Spam only, no ham match
+    if is_first_post and user.is_watched():
+        if askbot_settings.BAYESIAN_SPAM_SILENT_DELETE:
+            user.delete()


This might be too eager: letting the machine delete user accounts is risky unless the spam classification is ultra-reliable. Also, as I noted in PR 964, perhaps it would be better to delete accounts automatically after some time rather than instantly?
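A minimal sketch of the delayed-deletion idea, assuming flagged accounts are recorded with a timestamp and a periodic job removes those whose grace period has elapsed. `GRACE_PERIOD`, its value, and the function name are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical grace period; a real value would be a site setting.
GRACE_PERIOD = timedelta(days=7)


def accounts_due_for_deletion(flagged, now=None):
    """Return ids of flagged accounts whose grace period has elapsed.

    `flagged` maps a user id to the time the account was flagged as spam.
    """
    now = now or datetime.now(timezone.utc)
    return sorted(
        user_id
        for user_id, flagged_at in flagged.items()
        if now - flagged_at >= GRACE_PERIOD
    )
```

During the grace period a moderator could still clear the flag, which limits the damage of a false positive.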

evgenyfadeev · 2026-04-12T18:46:43Z

+    confirmation = PostConfirmation(post=post, user=user)
+    confirmation.save()
+
+    post.approved = False


Not a huge issue - a minor nitpick. I mostly commented this for myself - I can later resolve these.

I think this method should not modify the post - the function is called "_send_first_post_confirmation" and modification of post attributes would be an unexpected side-effect.

evgenyfadeev · 2026-04-12T18:46:57Z

+    # Also mark thread unapproved for questions
+    if post.post_type == 'question':
+        post.thread.approved = False
+        post.thread.save(update_fields=['approved'])


same comment here.

evgenyfadeev · 2026-04-12T18:47:05Z

+    revision = post.get_latest_revision()
+    if revision:
+        revision.approved = False
+        revision.save(update_fields=['approved'])


same comment here.

evgenyfadeev · 2026-04-12T18:48:00Z

+        recipient_list=[user.email],
+    )
+
+    request.user.message_set.create(


This is also a non-email-related side effect (but the function could be renamed "notify ...", and then it would be OK).
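The split suggested in these comments could look roughly like the sketch below: one function changes post state, another only notifies. `Post` here is a hypothetical stand-in for the real model, and `send_mail` and `flash` are injected callables rather than the actual Django or askbot APIs.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """Hypothetical stand-in for the real post model."""
    approved: bool = True


def hold_post(post):
    """State change only: mark the post as awaiting confirmation."""
    post.approved = False


def notify_first_post_confirmation(email, send_mail, flash):
    """Notification only: email the user and show an on-site message."""
    send_mail(to=email, subject='Please confirm your first post')
    flash('Check your email to confirm your post.')
```

The caller then runs `hold_post(post)` followed by `notify_first_post_confirmation(...)`, so neither function has a side effect that its name does not announce.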

evgenyfadeev · 2026-04-12T18:55:19Z

Q: Is the idea behind having users confirm their first post that the added friction would reduce the amount of spam? Email address confirmation is already implemented, so it seems this measure aims to make it harder to make or automate the first post. It is easily bypassed if spammers can already automate email confirmations.

Q: How does this Bayesian filter compare to what is used on askbot hosting? You were using it for a while, so I'd guess you'd know. I'm curious how the efficacy of a simpler classifier compares with a transformer-based model. I've used a small model: a pre-trained BERT transformer with a grafted classifier head, which I later fine-tuned on several thousand samples of spam and ham. The fine-tuning took 30 minutes on CPU, and the accuracy on my test set was around 99%. I don't have problems sharing it along with the weights.

Issue: AFAIK this change does not allow using alternative spam checkers (or ham checkers, which do not exist in the master branch and would be a new feature), unlike the pre-existing implementation. The spam checker must be replaceable by configuration.

Issue: Spam and ham classifications serve distinct purposes, and the actions taken on them differ (spam: delete and save in spam samples; ham: accept or place on the moderation queue). I think it would be good to decouple these two concerns.
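To illustrate "replaceable by configuration": a common pattern is to store the dotted path of the checker in a setting and import it at call time. The sketch below uses only the standard library; `load_checker` and the idea of a path-valued setting are assumptions for illustration, not askbot's existing mechanism.

```python
from importlib import import_module


def load_checker(dotted_path):
    """Import a callable from a dotted path such as 'package.module.function'."""
    module_path, _, name = dotted_path.rpartition('.')
    if not module_path:
        raise ValueError('expected a dotted path, got %r' % dotted_path)
    return getattr(import_module(module_path), name)
```

With this in place, the Bayesian check would be one configurable value among others, and a site could point the setting at a different checker without touching the views. Django's `django.utils.module_loading.import_string` provides the same lookup.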

sdeibel · 2026-04-14T17:07:04Z

Q1: Yes, because bots were getting through signup and submitting spam which we then had to moderate. I was trying to reduce the moderator burden. Bots don't seem to be written to do the second confirmation. Yes, they probably could be later, particularly when powered by AI, but my focus was just to get our site working and manageable so I didn't go further than that.

Q2: Sorry, I don't feel like I have enough data to compare the two Bayesian filters. So far seems to be working, but our site is not that high traffic when it comes to real people posting real content, and the rest is so far going away on its own. I should probably add more monitoring, although again I was trying to make it low maintenance and figured people with problems would contact us by email.

Issue 1: Good point. I was focusing on getting our site working. If there's a good way to support other spam solutions, that would be fine. I thought that was already there, but maybe ended up implementing this too independently.

Issue 2: Hmm, I'm seeing these as one system. It's based on how I've been filtering email for 30 years, and it has always worked: feed both filters and have them work together to decide what is spam and what is not. Without the ham filter it really doesn't work well, and I suspect it also wouldn't without the spam filter, of course.

Sorry for my slow replies; I'm very busy at the moment.

sdeibel wants to merge 1 commit into ASKBOT:master from sdeibel:pr/05-spam-defense
