NoBully

NoBully is a browser-based cyberbullying and harmful-language detection project. A browser extension collects text from web pages and sends it to a local FastAPI backend, which runs the text through trained machine learning models. Depending on the result, the extension allows the page, blurs detected harmful words, or blocks the page when the configured safety thresholds are exceeded.

The project has three main parts:

backend/ contains the Python API, model inference code, training utilities, moderation dashboard, and saved model files.
extension/ contains the Chromium browser extension that scans pages and communicates with the backend.
backend/modpage/ contains a local dashboard for reviewing recent moderation events.

How it works

The browser extension runs on visited pages.
The content script collects visible text from the page.
The background service worker sends the text to the backend /analyze endpoint.
The backend loads the trained BERT, LSTM, and polish-layer models and runs the text through them.
The backend returns a safety result with toxicity, severity, flagged words, and a block decision.
The extension blurs matched harmful words or replaces the page with a block screen.
The backend stores recent moderation events in memory for the dashboard.

Examples

Architecture

browser page
   |
   v
extension/content.js
   |
   v
extension/background.js
   |
   v
backend/brain/api_server.py
   |
   v
backend/brain/execute.py
   |
   v
backend/brain/models/

Main features

Real-time visible-page scanning through a Chromium extension.
Local FastAPI backend for text analysis.
Custom trained BERT model for toxicity classification.
LSTM classifier used together with the BERT model.
Polish layer that helps reduce false positives in harmless contexts.
Configurable severity and negative-word thresholds.
Word blurring for detected harmful terms.
Full-page blocking when content passes the configured danger threshold.
Popup UI for enabling/disabling the filter, changing the API URL, testing the API, and seeing the last result.
Local moderation dashboard for scan history and basic statistics.

Repository structure

NoBully/
├── README.md
├── backend/
│   ├── README.md
│   ├── requirements.txt
│   ├── brain/
│   │   ├── README.md
│   │   ├── api_server.py
│   │   ├── execute.py
│   │   ├── filterHTML.js
│   │   ├── getdataCPP.cpp
│   │   ├── polish.py
│   │   ├── train.py
│   │   └── models/
│   │       ├── README.md
│   │       ├── lstm_classifier.pt
│   │       ├── polish_layer.pt
│   │       └── bred_bert/
│   │           ├── README.md
│   │           ├── config.json
│   │           ├── model.safetensors
│   │           ├── tokenizer.json
│   │           └── tokenizer_config.json
│   └── modpage/
│       ├── README.md
│       ├── dashboard.html
│       ├── dashboard.js
│       └── icons/
│           ├── README.md
│           ├── icon16.png
│           ├── icon32.png
│           ├── icon48.png
│           └── icon128.png
└── extension/
    ├── README.md
    ├── background.js
    ├── content.js
    ├── manifest.json
    ├── popup.css
    ├── popup.html
    └── popup.js

Requirements

Python 3.11 or newer
pip
A Chromium-based browser such as Chrome, Edge, Brave, or Chromium
Enough disk space and memory to load the included PyTorch models

Backend setup

From the repository root:

cd backend
python -m venv .venv

On Windows PowerShell:

.venv\Scripts\Activate.ps1

On macOS or Linux:

source .venv/bin/activate

Install dependencies:

pip install -r requirements.txt

Start the backend:

python -m uvicorn brain.api_server:app --reload --host 127.0.0.1 --port 8000

The backend should be available at:

http://127.0.0.1:8000

Check that it is running:

http://127.0.0.1:8000/health

Expected response:

{"status": "ok"}
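
The same check can be scripted. The following is a minimal sketch that uses only the Python standard library. The function names `parse_health` and `check_health` are illustrative and not part of the project, and the sketch assumes the endpoint returns the JSON body shown above.

```python
import json
import urllib.error
import urllib.request

HEALTH_URL = "http://127.0.0.1:8000/health"  # URL from the setup steps above


def parse_health(body: bytes) -> bool:
    """Return True if the body is a JSON object whose "status" is "ok"."""
    try:
        data = json.loads(body.decode("utf-8"))
    except ValueError:  # covers both invalid UTF-8 and invalid JSON
        return False
    return isinstance(data, dict) and data.get("status") == "ok"


def check_health(url: str = HEALTH_URL, timeout: float = 3.0) -> bool:
    """Return True if the backend answers the health check, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return parse_health(response.read())
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat as offline.
        return False
```

Calling `check_health()` returns `False` when the backend is not running or does not respond within the timeout, which makes it usable in a startup script before launching the browser.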

Extension setup

Open your Chromium browser.
Go to chrome://extensions or the equivalent extensions page.
Enable developer mode.
Click Load unpacked.
Select the extension/ folder.
Click the NoBully extension icon.
Make sure the API URL is set to:

http://127.0.0.1:8000/analyze

Click the test button in the popup to confirm that the backend is online.

Using the project

Start the backend first, then load or enable the extension. After that, open any website. The extension will scan visible text, send it to the backend, and apply the result.

When harmful content is detected, NoBully can:

blur matched words on the page
store the result as the latest analysis
block the page if the result exceeds the threshold
add the scan result to the local moderation history
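
How one result maps to a page action can be sketched as follows. This is an illustration in Python, not the extension's actual JavaScript. It assumes the `blocked` and `blur_words` fields of the `/analyze` response drive the outcome, and the function name `choose_action` is hypothetical.

```python
def choose_action(result: dict) -> str:
    """Map an /analyze result to "block", "blur", or "allow".

    Assumption: a block decision takes priority over blurring.
    """
    if result.get("blocked"):
        return "block"   # replace the page with a block screen
    if result.get("blur_words"):
        return "blur"    # blur the matched words, keep the page visible
    return "allow"       # nothing to do
```

Because the backend returns the block decision, a client written this way does not need to compare severity values against thresholds itself.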

Backend API

GET `/health`

Checks whether the backend is online.

POST `/analyze`

Analyzes a page snapshot or plain text.

Example request:

{
  "text": "text to analyze",
  "url": "https://example.com",
  "title": "Example page",
  "fast_mode": true,
  "severity_threshold_percent": 65,
  "negative_word_threshold": 30
}

Example response fields:

{
  "blocked": false,
  "block_reasons": [],
  "severity_percent": 0,
  "toxicity_percent": 0,
  "flagged_words": [],
  "negative_word_count": 0,
  "negative_word_matches": [],
  "blur_words": [],
  "threshold_percent": 65,
  "negative_word_threshold": 30,
  "analyzed_character_count": 0,
  "analyzed_word_count": 0,
  "chunks_analyzed": 0,
  "message": "Page allowed",
  "moderator_warning_sent": false,
  "moderator_warning_error": null
}
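
For testing the endpoint without the extension, a small client can be sketched with the Python standard library. The helper names `build_payload`, `analyze`, and `summarize` are illustrative. The threshold values 65 and 30 are copied from the example request and are not necessarily the project's defaults. Which request fields the backend actually requires is an assumption here.

```python
import json
import urllib.request

ANALYZE_URL = "http://127.0.0.1:8000/analyze"  # API URL used in this README


def build_payload(text: str, url: str, title: str,
                  fast_mode: bool = True,
                  severity_threshold_percent: int = 65,
                  negative_word_threshold: int = 30) -> dict:
    """Build a request body with the fields from the example request."""
    return {
        "text": text,
        "url": url,
        "title": title,
        "fast_mode": fast_mode,
        "severity_threshold_percent": severity_threshold_percent,
        "negative_word_threshold": negative_word_threshold,
    }


def analyze(text: str, url: str, title: str,
            api_url: str = ANALYZE_URL, timeout: float = 30.0) -> dict:
    """POST the payload as JSON and return the decoded response."""
    body = json.dumps(build_payload(text, url, title)).encode("utf-8")
    request = urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize(result: dict) -> str:
    """Format the main response fields as one line."""
    status = "BLOCKED" if result.get("blocked") else "allowed"
    severity = result.get("severity_percent", 0)
    toxicity = result.get("toxicity_percent", 0)
    flagged = len(result.get("flagged_words", []))
    return f"{status}: severity {severity}%, toxicity {toxicity}%, {flagged} flagged word(s)"
```

With the backend running, `summarize(analyze("text to analyze", "https://example.com", "Example page"))` prints a one-line verdict for the given text.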

GET `/dashboard`

Serves the local moderation dashboard. Open it in a browser at:

http://127.0.0.1:8000/dashboard

GET `/moderator/api/history`

Returns recent moderation events stored in memory.

GET `/moderator/api/stats`

Returns basic dashboard statistics.

DELETE `/moderator/api/history`

Clears the in-memory moderation history.

Model files

The backend uses saved models from backend/brain/models/.

bred_bert/ stores the transformer model and tokenizer files.
lstm_classifier.pt stores the LSTM classifier checkpoint.
polish_layer.pt stores the additional correction layer used to reduce false positives.

Do not delete these files unless you plan to retrain or replace the models.

Training

The training code is in backend/brain/train.py. It trains the BERT model, trains the LSTM model, and then trains the polish layer.

The training script expects dataset folders such as:

backend/brain/data/
backend/brain/curated_data/

These folders may not be included in the repository. Add the required CSV files before running training.

Run training from the backend/brain folder. Starting at the repository root:

cd backend/brain
python train.py

The polish layer can also be trained separately:

python polish.py

Moderator dashboard

The moderation dashboard is served by the backend at:

http://127.0.0.1:8000/dashboard

It shows recent page scans, block/safe/caution status, toxicity values, flagged word counts, and summary statistics. The history is stored in memory, so it resets when the backend process restarts.
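
Why a restart clears the history can be seen in a small sketch of an in-memory event store. This is not the backend's implementation. The class name `ModerationHistory`, the 200-event cap, and the shape of the statistics are assumptions made for illustration. Only the `blocked` field is taken from the `/analyze` response.

```python
from collections import deque


class ModerationHistory:
    """Bounded in-memory event log; its contents vanish when the process exits."""

    def __init__(self, max_events: int = 200) -> None:
        # A deque with maxlen silently drops the oldest event once full.
        self._events = deque(maxlen=max_events)

    def add(self, event: dict) -> None:
        """Record one scan result."""
        self._events.append(event)

    def recent(self) -> list:
        """Return the stored events, newest first."""
        return list(reversed(self._events))

    def stats(self) -> dict:
        """Return the total number of events and how many were blocked."""
        blocked = sum(1 for event in self._events if event.get("blocked"))
        return {"total": len(self._events), "blocked": blocked}

    def clear(self) -> None:
        """Drop all events, as a DELETE on the history endpoint would."""
        self._events.clear()
```

Since nothing is written to disk, a store like this needs no cleanup or migration, at the cost of losing every event when the process stops.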

Configuration

The extension popup lets the user configure:

whether the filter is enabled
the backend API URL
the severity threshold percentage
the negative-word threshold

Default API URL:

http://127.0.0.1:8000/analyze

Default severity threshold:

Default negative-word threshold:

Troubleshooting

The popup says the API is offline

Make sure the backend is running:

cd backend
python -m uvicorn brain.api_server:app --reload --host 127.0.0.1 --port 8000

Then check:

http://127.0.0.1:8000/health

The extension does not scan pages

Reload the extension from the browser extensions page, refresh the website, and make sure the filter is enabled in the popup.

The backend cannot find model files

Make sure these paths exist:

backend/brain/models/bred_bert/
backend/brain/models/lstm_classifier.pt
backend/brain/models/polish_layer.pt
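
The path check can also be scripted. The sketch below reports which of the three paths are missing. It assumes it is run from the repository root unless another root is passed, and the function name `missing_model_paths` is illustrative.

```python
from pathlib import Path

# Paths listed in this README, relative to the repository root.
REQUIRED_MODEL_PATHS = (
    "backend/brain/models/bred_bert",
    "backend/brain/models/lstm_classifier.pt",
    "backend/brain/models/polish_layer.pt",
)


def missing_model_paths(repo_root: str = ".") -> list:
    """Return the required model paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [path for path in REQUIRED_MODEL_PATHS if not (root / path).exists()]
```

An empty list means all three paths are present. The check only confirms that the paths exist, not that the model files inside them are complete or loadable.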

The dashboard is empty

Open some pages while the extension and backend are running. The dashboard only shows events from the current backend session.
