test: add regression tests for lexical baseline models by federetyk · Pull Request #42 · techwolf-ai/workrb

federetyk · 2026-02-22T14:26:53Z

Addresses #41

Description

This PR adds regression tests for all configuration variants of the four lexical baseline models introduced in #36: BM25Model, TfIdfModel, EditDistanceModel, and RandomRankingModel. Each variant is evaluated on the English-only split of the JobTitleSimilarityRanking task, and the resulting metrics are compared against pre-recorded expected values with a small tolerance window. The full regression suite runs in ~12 seconds on a mid-range laptop CPU.

These tests complement the existing unit tests in test_lexical_baselines.py, which verify output shapes and types but do not exercise the evaluation pipeline or assert metric correctness. This PR was suggested by @Mattdl in #36.
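Below is a minimal sketch of the comparison the regression tests perform: every expected metric must be present, and each must lie within an absolute tolerance of its recorded value. The helper name `find_metric_mismatches` and the metric keys are hypothetical. The `abs=` spelling in the commit messages suggests the real test uses `pytest.approx`, but that is an inference, not something this PR states. The numbers are illustrative placeholders, not recorded workrb results.

```python
from __future__ import annotations

import math
from typing import Mapping, Optional


def find_metric_mismatches(
    actual: Mapping[str, float],
    expected: Mapping[str, float],
    abs_tol: float,
) -> dict[str, tuple[Optional[float], float]]:
    """Return {metric: (actual, expected)} for each expected metric that is
    missing from `actual` or differs from its expected value by more than abs_tol."""
    mismatches: dict[str, tuple[Optional[float], float]] = {}
    for name, want in expected.items():
        got = actual.get(name)
        # rel_tol=0.0 disables the relative check, so only the absolute window applies.
        if got is None or not math.isclose(got, want, rel_tol=0.0, abs_tol=abs_tol):
            mismatches[name] = (got, want)
    return mismatches


# Illustrative placeholder values only; these are NOT recorded workrb metrics.
expected = {"map": 0.50, "rp@5": 0.40, "rp@10": 0.45, "mrr": 0.60}
actual = {"map": 0.5004, "rp@5": 0.40, "rp@10": 0.452, "mrr": 0.60}

print(find_metric_mismatches(actual, expected, abs_tol=1e-3))  # {'rp@10': (0.452, 0.45)}
print(find_metric_mismatches(actual, expected, abs_tol=1e-6))
```

Setting `rel_tol=0.0` keeps the check purely absolute, which matches the abs-only tolerance described in the commits. Returning the mismatches instead of asserting on the first one lets a failing test report every drifted metric at once.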

Changes:

Add tests/test_lexical_baselines_regression.py
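The nine configurations listed in the commit message can be generated as a parameter grid, which keeps test IDs aligned with the variant names. This is a sketch. The keyword names (`lowercase`, `analyzer`, `seed`) and the ID format are hypothetical and may not match the real constructors of BM25Model, TfIdfModel, EditDistanceModel, and RandomRankingModel.

```python
from __future__ import annotations

from itertools import product
from typing import Any

# HYPOTHETICAL keyword names; check the real model constructors before reuse.
CASINGS = {"lower": True, "cased": False}  # casing label -> lowercase flag


def build_variants() -> list[tuple[str, str, dict[str, Any]]]:
    """Return (case_id, model_class_name, kwargs) for the nine configurations."""
    variants: list[tuple[str, str, dict[str, Any]]] = []
    for casing, lower in CASINGS.items():
        variants.append((f"bm25-{casing}", "BM25Model", {"lowercase": lower}))
    # TfIdf: word- and char-level analyzers, each lowercased or cased.
    for analyzer, (casing, lower) in product(("word", "char"), CASINGS.items()):
        variants.append(
            (f"tfidf-{analyzer}-{casing}", "TfIdfModel",
             {"analyzer": analyzer, "lowercase": lower})
        )
    for casing, lower in CASINGS.items():
        variants.append((f"editdistance-{casing}", "EditDistanceModel", {"lowercase": lower}))
    # A fixed seed is what makes random-ranking metrics reproducible at all.
    variants.append(("random-seed42", "RandomRankingModel", {"seed": 42}))
    return variants


for case_id, model_name, kwargs in build_variants():
    print(f"{case_id:22} {model_name:18} {kwargs}")
```

In pytest, this list could feed `@pytest.mark.parametrize` with `ids=[case_id for case_id, _, _ in build_variants()]`, so a failure names the exact variant that drifted.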

Checklist

Added new tests for new functionality
Tested locally with example tasks
Code follows project style guidelines
Documentation updated
No new warnings introduced

Evaluate all 9 lexical baseline model variants on the JobTitleSimilarityRanking task (English, test split, 105 queries × 2,619 targets) and assert that MAP, RP@5, RP@10, and MRR match pre-recorded expected values within an abs=1e-6 tolerance. Covers BM25 (lower/cased), TfIdf (word-lower/word-cased/char-lower/char-cased), EditDistance (lower/cased), and RandomRanking (seed=42). Addresses techwolf-ai#41
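For readers unfamiliar with the four metrics, here is a plain-Python sketch of how MAP, MRR, and RP@k can be computed from one ranked target list per query. The RP@k definition used here (relevant hits in the top k divided by min(k, number of relevant targets)) is an assumption. workrb's own implementation may differ, for example in how tied scores or queries without relevant targets are handled. All function names are hypothetical.

```python
from __future__ import annotations

from statistics import mean
from typing import Sequence


def rank_targets(scores: Sequence[float], targets: Sequence[str]) -> list[str]:
    """Order targets by descending score; sorted() is stable, so ties keep input order."""
    order = sorted(range(len(targets)), key=lambda i: -scores[i])
    return [targets[i] for i in order]


def average_precision(ranking: Sequence[str], relevant: set[str]) -> float:
    """Mean of precision@rank taken at the rank of each relevant target."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, target in enumerate(ranking, start=1):
        if target in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def reciprocal_rank(ranking: Sequence[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant target, or 0.0 if none is retrieved."""
    for rank, target in enumerate(ranking, start=1):
        if target in relevant:
            return 1.0 / rank
    return 0.0


def r_precision_at_k(ranking: Sequence[str], relevant: set[str], k: int) -> float:
    """ASSUMED definition: relevant hits in the top k / min(k, |relevant|)."""
    if not relevant:
        return 0.0
    hits = sum(1 for target in ranking[:k] if target in relevant)
    return hits / min(k, len(relevant))


def evaluate(queries: list[tuple[list[str], set[str]]], ks=(5, 10)) -> dict[str, float]:
    """Average each per-query metric over all (ranking, relevant) pairs."""
    results = {"map": mean(average_precision(r, rel) for r, rel in queries)}
    for k in ks:
        results[f"rp@{k}"] = mean(r_precision_at_k(r, rel, k) for r, rel in queries)
    results["mrr"] = mean(reciprocal_rank(r, rel) for r, rel in queries)
    return results


# Toy data (not from the JobTitleSimilarityRanking task).
q1 = rank_targets([0.9, 0.8, 0.1, 0.05], ["a", "b", "c", "d"])  # ['a', 'b', 'c', 'd']
q2 = rank_targets([0.7, 0.2, 0.2], ["x", "y", "z"])             # ['x', 'y', 'z']
print(evaluate([(q1, {"b", "d"}), (q2, {"x"})]))
# {'map': 0.75, 'rp@5': 1.0, 'rp@10': 1.0, 'mrr': 0.75}
```

Because every metric depends on the ranking rather than on raw scores, how ties are broken can change the recorded values; a regression test implicitly pins that behavior too.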


Mattdl

Looks great! Closing PR.

federetyk added 2 commits February 22, 2026 14:59

test: widen regression test tolerance to abs=1e-3 for cross-platform stability

9a32710

BM25 and TfIdf scoring may produce slightly different floating-point results across Python versions and numpy builds. The previous abs=1e-6 tolerance was too tight for reproducibility.
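One common source of the cross-build drift this commit describes is reduction order. Floating-point addition is not associative, so a library that sums the same terms in a different order (for instance through a different vectorized code path) can return a slightly different result. The snippet below only illustrates that effect with plain Python floats. It is not a measurement of BM25 or TfIdf scores in workrb.

```python
import math

# Floating-point addition is not associative: same terms, different grouping.
values = [0.1, 0.2, 0.3]
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = (values[2] + values[1]) + values[0]

print(left_to_right, right_to_left)        # 0.6000000000000001 0.6
print(left_to_right == right_to_left)      # False
print(abs(left_to_right - right_to_left))  # about 1.1e-16

# math.fsum returns the correctly rounded sum, independent of input order.
print(math.fsum(values) == math.fsum(reversed(values)))  # True

# An absolute tolerance absorbs drift of this size.
print(math.isclose(left_to_right, right_to_left, rel_tol=0.0, abs_tol=1e-6))  # True
```

The gap here is about 1e-16, far below either tolerance. The PR does not quantify how large the drift becomes once it propagates through scoring, ranking, and metric averaging; it only reports that 1e-6 was too tight in practice.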

Mattdl approved these changes Feb 23, 2026


Mattdl merged commit 97d799b into techwolf-ai:main Feb 23, 2026
2 checks passed

Mattdl mentioned this pull request Feb 23, 2026

[FEATURE] Add Regression Tests for Lexical Baseline Models #41

Closed

7 tasks

federetyk deleted the feat/regression-tests-lexical-baselines branch February 23, 2026 23:21

Mattdl merged 2 commits into techwolf-ai:main from federetyk:feat/regression-tests-lexical-baselines