feat: improved categorization helper to include common bigrams #455

Closed
ErikBjare wants to merge 1 commit into master from dev/better-category-suggestions

ErikBjare (Member) commented Jun 22, 2023

Came up with this while thinking about the bug in ActivityWatch/activitywatch#905, with the help of ChatGPT: https://chat.openai.com/share/0c7588ee-b13b-41e3-b169-c1e587fcbb72

Ran a quick test while developing, but never actually tested it in the UI.

  • Test it

codecov bot commented Jun 22, 2023

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 24.17%. Comparing base (1f194ff) to head (c93f546).
⚠️ Report is 206 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #455   +/-   ##
=======================================
  Coverage   24.17%   24.17%           
=======================================
  Files          27       27           
  Lines        1580     1580           
  Branches      263      263           
=======================================
  Hits          382      382           
  Misses       1173     1173           
  Partials       25       25           

ErikBjare changed the title from "feat: added improved categorization helper that includes bigrams if commonly present" to "feat: improved categorization helper to include common bigrams" on Jun 22, 2023
ErikBjare (Member, Author) commented

@TimeToBuildBob Take this over in a new PR, add tests, etc.

TimeToBuildBob added a commit to TimeToBuildBob/aw-webui that referenced this pull request Apr 14, 2026
…ort and tests

Moves the common-phrase detection logic from CategoryBuilder.vue into a
standalone src/util/categorization.ts module and adds 10 unit tests.

Changes vs original PR ActivityWatch#455:
- Extracts function to util module for testability
- Uses Map<string, WordEntry> (consistent with existing CategoryBuilder code)
- Filters bigram components against ignored_words and length <= 2
- Removes debug console.log statements
- Full TypeScript types

Closes ActivityWatch#455

TimeToBuildBob (Contributor) commented

Done — took this over in #803.

Changes from the original:

  • Extracted to src/util/categorization.ts for testability
  • Added 10 unit tests (all passing)
  • Filters bigram components against ignored_words and length ≤ 2 (original only filtered single words)
  • Removed debug console.log statements
  • Full TypeScript types, consistent Map<string, WordEntry> usage
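
For readers who have not opened #803, the component filter in the third bullet can be sketched roughly as follows. This is not the code from that PR: `isUsableWord`, `candidateBigrams`, and the sample ignore list are hypothetical names chosen for illustration, and the exact matching rules (for example case handling) are assumptions.

```typescript
// Hypothetical helper, not taken from src/util/categorization.ts.
// A word is usable only if it is longer than 2 characters and not ignored.
function isUsableWord(word: string, ignoredWords: Set<string>): boolean {
  return word.length > 2 && !ignoredWords.has(word);
}

// Builds adjacent-word pairs from a window title and keeps a pair only
// when BOTH components pass the filter (the original PR only filtered
// single words).
function candidateBigrams(title: string, ignoredWords: Set<string>): string[] {
  const words = title.split(/\s+/).filter((w) => w.length > 0);
  const bigrams: string[] = [];
  for (let i = 0; i + 1 < words.length; i++) {
    const first = words[i];
    const second = words[i + 1];
    if (isUsableWord(first, ignoredWords) && isUsableWord(second, ignoredWords)) {
      bigrams.push(`${first} ${second}`);
    }
  }
  return bigrams;
}

// Assumed ignore list, for illustration only.
const ignored = new Set<string>(["the", "and"]);
console.log(candidateBigrams("Visual Studio Code and the Docs", ignored));
```

With this input, only "Visual Studio" and "Studio Code" survive, because every other pair contains an ignored word.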

@ErikBjare ErikBjare closed this Apr 14, 2026
ErikBjare pushed a commit that referenced this pull request Apr 14, 2026
…ort and tests (#803)

* feat(categorization): extract findCommonPhrases util with bigram support and tests

Moves the common-phrase detection logic from CategoryBuilder.vue into a
standalone src/util/categorization.ts module and adds 10 unit tests.

Changes vs original PR #455:
- Extracts function to util module for testability
- Uses Map<string, WordEntry> (consistent with existing CategoryBuilder code)
- Filters bigram components against ignored_words and length <= 2
- Removes debug console.log statements
- Full TypeScript types

Closes #455

* fix(categorization): snapshot word durations before bigram promotion loop

Without this, promoting a bigram (e.g. 'Alpha Beta') reduces constituent
word durations in-place.  A later bigram that shares the middle word
(e.g. 'Beta Gamma') then sees Beta.duration=0, so the check becomes
10/0 = Infinity > 0.5 and the weak bigram is incorrectly promoted.

Fix: build an originalDurations snapshot before the Step 3 loop and use
it for all threshold comparisons; mutations to entry.duration still happen
(for accurate display) but no longer corrupt subsequent checks.

Also adds a regression test that fails on the unfixed code.

* fix(categorization): replace non-null assertions with explicit undefined guards

* style(categorization): fix prettier line-length warning in guard clause
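
The snapshot fix and the undefined guards described in the commits above can be illustrated with a minimal sketch. It is written under stated assumptions and is not the merged implementation: `BigramEntry`, `promoteBigrams`, the `useSnapshot` switch, and the rule that a bigram is promoted when it exceeds the ratio for either constituent word are all hypothetical. Only the 0.5 threshold, the `originalDurations` name, and the Alpha/Beta/Gamma scenario come from the commit message.

```typescript
interface WordEntry {
  word: string;
  duration: number;
}

// Hypothetical shape for illustration; the real module may store bigrams differently.
interface BigramEntry {
  first: string;
  second: string;
  duration: number;
}

const PROMOTION_RATIO = 0.5; // the "> 0.5" comparison from the commit message

function promoteBigrams(
  words: Map<string, WordEntry>,
  bigrams: BigramEntry[],
  useSnapshot: boolean
): string[] {
  // Snapshot every word's duration before the loop mutates anything.
  const originalDurations = new Map<string, number>();
  words.forEach((entry, key) => originalDurations.set(key, entry.duration));

  const promoted: string[] = [];
  for (const bigram of bigrams) {
    const firstEntry = words.get(bigram.first);
    const secondEntry = words.get(bigram.second);
    const firstOriginal = originalDurations.get(bigram.first);
    const secondOriginal = originalDurations.get(bigram.second);
    // Explicit undefined guards rather than non-null assertions.
    if (
      firstEntry === undefined ||
      secondEntry === undefined ||
      firstOriginal === undefined ||
      secondOriginal === undefined
    ) {
      continue;
    }

    // useSnapshot = false reproduces the bug: live durations may already be 0.
    const firstBase = useSnapshot ? firstOriginal : firstEntry.duration;
    const secondBase = useSnapshot ? secondOriginal : secondEntry.duration;
    // Assumption: promote when the bigram dominates either constituent word.
    const dominates =
      bigram.duration / firstBase > PROMOTION_RATIO ||
      bigram.duration / secondBase > PROMOTION_RATIO;
    if (!dominates) continue;

    promoted.push(`${bigram.first} ${bigram.second}`);
    // The in-place reduction still happens so displayed totals stay accurate.
    firstEntry.duration = Math.max(0, firstEntry.duration - bigram.duration);
    secondEntry.duration = Math.max(0, secondEntry.duration - bigram.duration);
  }
  return promoted;
}

function sampleWords(): Map<string, WordEntry> {
  return new Map<string, WordEntry>([
    ["Alpha", { word: "Alpha", duration: 100 }],
    ["Beta", { word: "Beta", duration: 100 }],
    ["Gamma", { word: "Gamma", duration: 100 }],
  ]);
}

const sampleBigrams: BigramEntry[] = [
  { first: "Alpha", second: "Beta", duration: 100 },
  { first: "Beta", second: "Gamma", duration: 10 },
];

console.log(promoteBigrams(sampleWords(), sampleBigrams, false)); // live durations
console.log(promoteBigrams(sampleWords(), sampleBigrams, true)); // snapshot
```

In the live-duration run, promoting "Alpha Beta" drops Beta to 0, so "Beta Gamma" is checked as 10/0 = Infinity and is promoted as well. In the snapshot run the same bigram is checked as 10/100 = 0.1 and is rejected, which is the behavior the regression test is described as protecting.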