Skip to content

Add multi-language support#413

Open
dimon222 wants to merge 23 commits intoWFCD:masterfrom
dimon222:feature/multilang_support
Open

Add multi-language support#413
dimon222 wants to merge 23 commits intoWFCD:masterfrom
dimon222:feature/multilang_support

Conversation

@dimon222
Copy link
Member

@dimon222 dimon222 commented Feb 26, 2026

Add support of all languages except of Thai, Japanese and Turkish.
Add boilerplate for future testing framework to monitor improvements/regressions of OCR

Summary by CodeRabbit

  • New Features

    • Major multi-language OCR support (many new locales), centralized language processor, numbers-only OCR mode, and a headless CLI test runner with bundled test maps/data and result reporting.
  • Bug Fixes

    • Safer, locale-aware name resolution and OCR filtering, improved error handling, resilience to missing data, and reduced false matches.
  • Documentation

    • Extensive test framework docs, coverage guides, sample maps/data, and runnable test scripts.
  • Chores

    • Version bumped to 9.8.0.0.
  • UI

    • Persistent SnapIt overlay, ordered error-report packaging, safer UI-thread handling, expanded locale selector.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 26, 2026

📝 Walkthrough

Walkthrough

Adds a language-processing framework with per-locale processors and a factory, integrates language-aware normalization into Data, OCR, and Tesseract services, introduces a headless OCR test harness with test models and runner, expands locale UI options, bumps assembly version to 9.8.0.0, and adds test docs and scripts.

Changes

Cohort / File(s) Summary
Entrypoint & Test Harness
WFInfo/CustomEntrypoint.cs, WFInfo/Tests/TestProgram.cs, WFInfo/Tests/OCRTestRunner.cs, WFInfo/Tests/TestModels.cs, WFInfo/Tests/OCRTestRunner.cs
Adds test execution detection in the entrypoint and a headless test harness (TestProgram + OCRTestRunner) with test models, suite execution, result persistence, and headless/mock service implementations.
Language Processing Core
WFInfo/LanguageProcessing/LanguageProcessor.cs, WFInfo/LanguageProcessing/LanguageProcessorFactory.cs
Introduces abstract LanguageProcessor base with normalization, whitelist, and Levenshtein utilities; adds LanguageProcessorFactory for thread-safe, cached per-locale processor creation and lookup.
Language Processors (many locales)
WFInfo/LanguageProcessing/... (English, Korean, Japanese, Chinese, Cyrillic, European, Thai, Turkish, Polish, etc.)
Adds concrete processors for multiple locales (en, ko, ja, zh-hans, zh-hant, ru, uk, de, es, pt, fr, it, th, tr, pl) implementing locale, blueprint removals, whitelists, normalization, and distance logic (including Korean/Thai-aware algorithms).
Data & OCR Integration
WFInfo/Data.cs, WFInfo/Ocr.cs, WFInfo/Services/TesseractService.cs
Integrates LanguageProcessorFactory into Data and OCR paths, adds per-locale GetLocaleNameData overload, locale-specific fallback files, stronger i18n resilience, language-driven part validation and filtering, Tesseract whitelist integration, and new Tesseract API methods (SetNumbersOnlyMode/ResetToDefaultMode).
Tests, Docs & Test Data
tests/*, tests/data/*, tests/map.json, tests/run_tests.bat
Adds test map and three scenario JSONs, run_tests.bat script, comprehensive test documentation (README, BUILD_INSTRUCTIONS, COVERAGE_FEATURES, EXTERNAL_DATA_STRUCTURE, TEST_EXECUTION_FLOW, usage example), and coverage/feature docs.
Settings & Assembly
WFInfo/Settings/SettingsWindow.xaml, WFInfo/Properties/AssemblyInfo.cs
Expands locale ComboBox with many new languages and bumps AssemblyVersion/AssemblyFileVersion to 9.8.0.0.
UI & Misc
WFInfo/SnapItOverlay.xaml.cs, WFInfo/errorDialogue.xaml.cs, WFInfo/Main.cs
Minor UI behavior and dispatcher-safety refactors: persistent rectangle handling, prioritized ZIP file ordering, and safer RunOnUIThread usage.
Build / Packaging
WFInfo/WFInfo.csproj, .gitattributes
Adds MSBuild targets for Update XML and publish ZIP; simplifies .gitattributes to a single rule.
Tests / Utilities
TestCharacterRanges.cs, tests/* docs
Adds test helper for character range inspection and multiple test/supporting docs and scripts.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as WFInfo.exe<br/>(CustomEntrypoint)
    participant TestProg as TestProgram.RunTests
    participant Suite as OCRTestRunner.RunTestSuite
    participant LPFactory as LanguageProcessorFactory
    participant LP as LanguageProcessor<br/>(locale)
    participant Tesseract as TesseractService
    participant Data as Data.cs
    participant Results as TestSuiteResult

    CLI->>TestProg: args (map.json)
    TestProg->>LPFactory: Initialize(settings)
    TestProg->>Suite: RunTestSuite(map)
    Suite->>Suite: Load scenarios from map.json
    loop per scenario
        Suite->>LPFactory: GetProcessor(language)
        LPFactory-->>LP: locale-specific processor
        Suite->>Tesseract: Init with LP.CharacterWhitelist
        Suite->>Tesseract: OCR image -> raw text
        Tesseract->>LP: NormalizeForPatternMatching(raw text)
        LP-->>Data: normalized text
        Data->>LP: CalculateLevenshteinDistance(candidates)
        LP-->>Data: best match
        Suite->>Results: record TestResult
    end
    Suite-->>TestProg: TestSuiteResult
    TestProg->>Results: Save JSON + print summary
Loading
sequenceDiagram
    participant App as WFInfo Main
    participant LPFactory as LanguageProcessorFactory
    participant LP as LanguageProcessor
    participant OCR as OCR module
    participant Tesseract as TesseractService
    participant Data as Data.cs

    App->>LPFactory: Initialize(settings)
    App->>Tesseract: Init()
    Tesseract->>LPFactory: GetCurrentProcessor()
    LPFactory-->>LP: processor instance
    Tesseract->>LP: request CharacterWhitelist
    LP-->>Tesseract: whitelist
    App->>OCR: Run OCR flow
    OCR->>Tesseract: Extract text
    Tesseract-->>OCR: raw OCR output
    OCR->>LP: IsPartNameValid(text)
    alt valid
        OCR->>Data: GetLocaleNameData(text, useLevenshtein)
        Data->>LP: CalculateLevenshteinDistance
        LP-->>Data: distances -> matched name
        Data-->>OCR: localized name
    else invalid
        OCR->>OCR: skip item
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐇 I hopped through code in search of strings and signs,

Tamed accents, kanji, hangul—aligned the lines.
Tests now march in map‑made rows, heads held high,
Whitelists hum, processors whisper "locale: hi!"
A carrot for CI, and tiny victory signs.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 42.62% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Add multi-language support' directly and clearly summarizes the primary change throughout the changeset, which adds comprehensive language processor implementations and multi-language OCR support.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

🟡 Minor comments (11)
tests/README.md-8-8 (1)

8-8: ⚠️ Potential issue | 🟡 Minor

Category vocabulary is inconsistent (fissure vs reward).

Line 8 and Lines 203-205 present fissure as a distinct category, while Lines 69-72 only define reward, inventory, and snapit (and says fissure is included in reward). Please normalize this so map.json authors know exactly which category values are valid.

Also applies to: 69-72, 203-205

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/README.md` at line 8, The README has inconsistent category vocabulary:
sometimes using "fissure" as its own category and other times saying fissures
are included under "reward"; normalize and document the single canonical
category set for map.json authors. Pick one canonical value (e.g., use "reward"
and treat fissures as a subtype, or use "fissure" as a distinct category), then
update the "Category Testing" section and the other occurrences that mention
reward/inventory/snapit/fissure so they all list the same canonical category
names and explain whether "fissure" is a subtype of "reward" or a standalone
category; ensure the README's category examples and any mention of map.json
"category" values (reward, inventory, snapit, fissure) are consistent and
unambiguous.
tests/usage_example.md-17-20 (1)

17-20: ⚠️ Potential issue | 🟡 Minor

Align the docs with the currently shipped test entrypoint and language scope.

Line [19] uses WFInfo.Tests.exe, while the provided runner script invokes WFInfo.exe.
Line [50] says “all 15 supported languages,” which conflicts with the stated exclusion of Thai/Japanese/Turkish in this PR.

Also applies to: 50-50

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/usage_example.md` around lines 17 - 20, Update the documentation to
reference the actual shipped test runner name and correct language support:
replace any occurrence of "WFInfo.Tests.exe" with "WFInfo.exe" (the provided
runner script), and change the phrase "all 15 supported languages" to the
accurate supported-language statement reflecting the PR's exclusions (explicitly
note that Thai, Japanese, and Turkish are excluded) so the examples and
language-count text match the current release behavior.
tests/TEST_EXECUTION_FLOW.md-79-88 (1)

79-88: ⚠️ Potential issue | 🟡 Minor

Add a language identifier to the fenced code block.

The block starting on Line 79 is missing a language tag, which triggers markdownlint MD040.

Suggested fix
-```
+```text
 1. User runs: WFInfo.exe map.json data/ results.json
 2. CustomEntrypoint.Main() detects test arguments
 3. Redirects to TestProgram.RunTests(args)
 ...
 8. Exit with appropriate code (0=success, 1=partial failure, etc.)
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @tests/TEST_EXECUTION_FLOW.md around lines 79 - 88, The fenced code block
containing the numbered steps (the test execution flow) is missing a language
identifier which triggers markdownlint MD040; update the opening to include a language tag (e.g., "text") so it becomestext, leaving the content
unchanged; target the fenced block that lists steps 1–8 (the example starting
with "User runs: WFInfo.exe map.json data/ results.json") to resolve the lint
warning.


</details>

</blockquote></details>
<details>
<summary>WFInfo/Tests/OCRTestRunner.cs-171-283 (1)</summary><blockquote>

`171-283`: _⚠️ Potential issue_ | _🟡 Minor_

**Test environment modifies global state without cleanup.**

`SetupTestEnvironment` modifies `ApplicationSettings.GlobalSettings` and `OCR.uiScaling` without saving or restoring the original values. This can cause test interference when running multiple tests sequentially.

Consider saving and restoring original settings, or implementing a test-scoped settings mechanism.


<details>
<summary>💡 Suggested approach</summary>

```csharp
private (string locale, WFtheme theme, double scaling) SaveCurrentEnvironment()
{
    var settings = ApplicationSettings.GlobalSettings;
    return (settings.Locale, settings.ThemeSelection, OCR.uiScaling);
}

private void RestoreEnvironment((string locale, WFtheme theme, double scaling) saved)
{
    var settings = ApplicationSettings.GlobalSettings;
    settings.Locale = saved.locale;
    settings.ThemeSelection = saved.theme;
    OCR.uiScaling = saved.scaling;
    _tesseractService.ReloadEngines();
}
```

Then wrap test execution with save/restore calls.
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Tests/OCRTestRunner.cs` around lines 171 - 283, SetupTestEnvironment
mutates global state (ApplicationSettings.GlobalSettings and OCR.uiScaling)
without restoring it; save the current values before changing them and restore
after the test finishes to avoid test interference. Implement helper methods
(e.g., SaveCurrentEnvironment and RestoreEnvironment) that capture
settings.Locale, settings.ThemeSelection and OCR.uiScaling, call
SaveCurrentEnvironment at the start of the test/setup, run the changes in
SetupTestEnvironment, and ensure RestoreEnvironment is called in a
finally/teardown path (including calling _tesseractService.ReloadEngines after
restore) so the original global state is always reinstated.
```

</details>

</blockquote></details>
<details>
<summary>WFInfo/Ocr.cs-1270-1270 (1)</summary><blockquote>

`1270-1270`: _⚠️ Potential issue_ | _🟡 Minor_

**Character whitelist for number detection includes non-numeric characters.**

This whitelist is used for OCR number detection (`tessedit_char_whitelist`) but includes many alphabetic characters like `âàéèêëïîôùûçÀÉÈÊËÏÎÔÙÛÇäöüßÄÖÜßñáéíóúüÁÉÍÓÚÜçãõẽẽÇÃÕĘĘ`. This appears to be incorrectly copied from a language character set.

For counting item quantities, only digits should be needed.


<details>
<summary>🔧 Suggested fix</summary>

```diff
                 //set OCR to numbers only
-                _tesseractService.FirstEngine.SetVariable("tessedit_char_whitelist", "0123456789âàéèêëïîôùûçÀÉÈÊËÏÎÔÙÛÇäöüßÄÖÜßñáéíóúüÁÉÍÓÚÜçãõẽẽÇÃÕĘĘ");
+                _tesseractService.FirstEngine.SetVariable("tessedit_char_whitelist", "0123456789");
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Ocr.cs` at line 1270, The tessedit_char_whitelist used for numeric OCR
currently contains many non-digit characters; update the call to
_tesseractService.FirstEngine.SetVariable("tessedit_char_whitelist", ...) so the
whitelist contains only the numeric digits (e.g. "0123456789") required for
quantity detection; locate the existing call to SetVariable in WFInfo/Ocr.cs and
replace the long language charset string with the digits-only whitelist
(optionally include decimal/comma if your counts may contain them).
```

</details>

</blockquote></details>
<details>
<summary>WFInfo/Tests/OCRTestRunner.cs-337-369 (1)</summary><blockquote>

`337-369`: _⚠️ Potential issue_ | _🟡 Minor_

**Unused filtered image variable.**

`filteredImage` is created on line 344 but never used. The method then performs OCR on the original `image` instead.


<details>
<summary>🔧 Either use the filtered image or remove unused variable</summary>

```diff
         private List<PartMatchResult> ProcessSnapIt(Bitmap image)
         {
             var results = new List<PartMatchResult>();
             
             try
             {
-                // Use existing SnapIt logic - simulate process
-                var filteredImage = OCR.ScaleUpAndFilter(image, WFtheme.AUTO, out _, out _);
-                
                 // Since FindAllParts is private, we'll simulate basic OCR on whole image
                 var ocrText = OCR.GetTextFromImage(image, _tesseractService.FirstEngine);
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Tests/OCRTestRunner.cs` around lines 337 - 369, In ProcessSnapIt,
filteredImage is created but never used; update the OCR call to use the
processed bitmap instead of the original image (replace
OCR.GetTextFromImage(image, _tesseractService.FirstEngine) with
OCR.GetTextFromImage(filteredImage, _tesseractService.FirstEngine)), ensuring
any disposal or ownership semantics for filteredImage are respected, or if you
intend not to use the filtered result, remove the filteredImage assignment
entirely and its out parameters to eliminate the unused variable.
```

</details>

</blockquote></details>
<details>
<summary>WFInfo/LanguageProcessing/PolishLanguageProcessor.cs-21-21 (1)</summary><blockquote>

`21-21`: _⚠️ Potential issue_ | _🟡 Minor_

**Incomplete Polish character whitelist.**

The character ranges are missing several Polish characters:
- `ś` (U+015B) - only `Ś` (U+015A) is included
- `ź` (U+0179) and `Ź` (U+017A) 
- `ż` (U+017B) and `Ż` (U+017C)
- `ó` (U+00F3) and `Ó` (U+00D3)


<details>
<summary>🔧 Suggested fix</summary>

```diff
-        public override string CharacterWhitelist => "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" + GenerateCharacterRange(0x0104, 0x0107) + GenerateCharacterRange(0x0118, 0x0119) + GenerateCharacterRange(0x0141, 0x0144) + GenerateCharacterRange(0x015A, 0x015A); // Polish with ranges
+        public override string CharacterWhitelist => "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" + 
+            GenerateCharacterRange(0x0104, 0x0107) +  // Ą, ą, Ć, ć
+            GenerateCharacterRange(0x0118, 0x0119) +  // Ę, ę
+            GenerateCharacterRange(0x0141, 0x0144) +  // Ł, ł, Ń, ń
+            GenerateCharacterRange(0x015A, 0x015B) +  // Ś, ś
+            GenerateCharacterRange(0x0179, 0x017C) +  // Ź, ź, Ż, ż
+            GenerateCharacterRange(0x00D3, 0x00D3) +  // Ó
+            GenerateCharacterRange(0x00F3, 0x00F3);   // ó
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/PolishLanguageProcessor.cs` at line 21, The
CharacterWhitelist in PolishLanguageProcessor currently omits several Polish
characters; update the override of CharacterWhitelist (and the use of
GenerateCharacterRange) to include the missing code points for ś (U+015B), ź/Ź
(U+0179/U+017A), ż/Ż (U+017B/U+017C) and ó/Ó (U+00F3/U+00D3) — either by
expanding the existing GenerateCharacterRange calls to cover those code points
or by appending those specific characters explicitly to the string so the
whitelist contains both upper- and lower-case forms; modify the
CharacterWhitelist property accordingly.
```

</details>

</blockquote></details>
<details>
<summary>WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs-72-88 (1)</summary><blockquote>

`72-88`: _⚠️ Potential issue_ | _🟡 Minor_

**Redundant uppercase replacements after `ToLowerInvariant()`.**

Same issue as Polish processor - uppercase replacements will never match after `ToLowerInvariant()`.


<details>
<summary>🔧 Suggested fix</summary>

```diff
         private static string NormalizeTurkishCharacters(string input)
         {
-            // Convert Turkish characters to standard equivalents for comparison
             return input.ToLowerInvariant()
-                .Replace('ğ', 'g')
-                .Replace('Ğ', 'G')
-                .Replace('ş', 's')
-                .Replace('Ş', 'S')
-                .Replace('ç', 'c')
-                .Replace('Ç', 'C')
-                .Replace('ö', 'o')
-                .Replace('Ö', 'O')
-                .Replace('ü', 'u')
-                .Replace('Ü', 'U')
-                .Replace('ı', 'i')
-                .Replace('İ', 'I');
+                .Replace('ğ', 'g')
+                .Replace('ş', 's')
+                .Replace('ç', 'c')
+                .Replace('ö', 'o')
+                .Replace('ü', 'u')
+                .Replace('ı', 'i');
         }
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs` around lines 72 - 88,
The NormalizeTurkishCharacters method performs .ToLowerInvariant() before doing
uppercase Replace calls (e.g., 'Ğ'->'G', 'Ş'->'S'), which makes those
replacements dead; update NormalizeTurkishCharacters to either remove all
uppercase Replace calls and perform only lowercase character replacements, or
move the .ToLowerInvariant() call to the end so replacements can handle both
cases—apply the change in the NormalizeTurkishCharacters method to ensure
correct normalization of Turkish characters.
```

</details>

</blockquote></details>
<details>
<summary>WFInfo/LanguageProcessing/PolishLanguageProcessor.cs-72-92 (1)</summary><blockquote>

`72-92`: _⚠️ Potential issue_ | _🟡 Minor_

**Redundant uppercase replacements after `ToLowerInvariant()`.**

The method calls `ToLowerInvariant()` first, which converts all characters to lowercase. The subsequent replacements for uppercase Polish characters (e.g., `'Ą'`, `'Ę'`, `'Ć'`) will never match.


<details>
<summary>🔧 Suggested fix - remove redundant replacements</summary>

```diff
         private static string NormalizePolishCharacters(string input)
         {
-            // Convert Polish characters to standard equivalents for comparison
             return input.ToLowerInvariant()
-                .Replace('ą', 'a')
-                .Replace('Ą', 'A')
-                .Replace('ę', 'e')
-                .Replace('Ę', 'E')
-                .Replace('ć', 'c')
-                .Replace('Ć', 'C')
-                .Replace('ł', 'l')
-                .Replace('Ł', 'L')
-                .Replace('ś', 's')
-                .Replace('Ś', 'S')
-                .Replace('ź', 'z')
-                .Replace('Ź', 'Z')
-                .Replace('ż', 'z')
-                .Replace('Ż', 'Z')
-                .Replace('ó', 'o')
-                .Replace('Ó', 'O');
+                .Replace('ą', 'a')
+                .Replace('ę', 'e')
+                .Replace('ć', 'c')
+                .Replace('ł', 'l')
+                .Replace('ś', 's')
+                .Replace('ź', 'z')
+                .Replace('ż', 'z')
+                .Replace('ó', 'o');
         }
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/PolishLanguageProcessor.cs` around lines 72 - 92,
The NormalizePolishCharacters method calls ToLowerInvariant() so the subsequent
.Replace calls for uppercase Polish letters (e.g.,
'Ą','Ę','Ć','Ł','Ś','Ź','Ż','Ó') are redundant; remove all uppercase
.Replace(...) invocations and keep only the lowercase replacements (or
alternatively remove ToLowerInvariant() and keep both cases if case preservation
is required) — update the NormalizePolishCharacters method to use only the
lowercase replacements that match after ToLowerInvariant() so the normalization
logic is correct and non-redundant.
```

</details>

</blockquote></details>
<details>
<summary>WFInfo/Tests/TestProgram.cs-145-152 (1)</summary><blockquote>

`145-152`: _⚠️ Potential issue_ | _🟡 Minor_

**Potential `NullReferenceException` when accessing `MissingParts` and `ExtraParts`.**

The code accesses `.Count` on `MissingParts` and `ExtraParts` without null checks. If `TestResult` doesn't initialize these lists, this will throw.


<details>
<summary>🛡️ Proposed fix with null checks</summary>

```diff
 foreach (var failed in failedTests)
 {
     Console.WriteLine($"  {failed.TestCaseName}: {failed.ErrorMessage}");
-    if (failed.MissingParts.Count > 0)
+    if (failed.MissingParts?.Count > 0)
         Console.WriteLine($"    Missing: {string.Join(", ", failed.MissingParts)}");
-    if (failed.ExtraParts.Count > 0)
+    if (failed.ExtraParts?.Count > 0)
         Console.WriteLine($"    Extra: {string.Join(", ", failed.ExtraParts)}");
 }
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Tests/TestProgram.cs` around lines 145 - 152, The loop printing failed
tests can throw NullReferenceException because it uses failed.MissingParts.Count
and failed.ExtraParts.Count without null checks; update the loop in
TestProgram.cs to guard those accesses (e.g., replace checks with
(failed.MissingParts?.Count ?? 0) > 0 and (failed.ExtraParts?.Count ?? 0) > 0)
and pass a non-null sequence to string.Join (e.g., string.Join(", ",
failed.MissingParts ?? Enumerable.Empty<string>()) and similarly for
ExtraParts), or alternatively ensure TestResult initializes MissingParts and
ExtraParts in its constructor; modify the code around the foreach (var failed in
failedTests) block accordingly.
```

</details>

</blockquote></details>
<details>
<summary>WFInfo/Tests/TestProgram.cs-168-176 (1)</summary><blockquote>

`168-176`: _⚠️ Potential issue_ | _🟡 Minor_

**Provide a working mock implementation for `FirstEngine` in `TestTesseractService`.**

The `OCRTestRunner` calls `FirstEngine` three times (lines 313, 347, 378), so returning `NotImplementedException` will crash the tests. `SecondEngine` and `Engines` remain unused and don't need implementation.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Tests/TestProgram.cs` around lines 168 - 176, Replace the throwing
property with a cached, usable Tesseract engine: add a private field (e.g.
_firstEngine) and implement FirstEngine to lazily instantiate and return a
Tesseract.TesseractEngine (for example new
Tesseract.TesseractEngine("./tessdata", "eng", Tesseract.EngineMode.Default)) so
calls from OCRTestRunner succeed; keep the instance cached and return the same
object on subsequent accesses to avoid repeated construction.
```

</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🧹 Nitpick comments (19)</summary><blockquote>

<details>
<summary>WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs (1)</summary><blockquote>

`77-77`: **Remove duplicated replacement in kana normalization.**

Line [77] applies `.Replace('ヶ', 'ケ')` twice.


<details>
<summary>🧹 Small cleanup</summary>

```diff
-            result = result.Replace('ヶ', 'ケ').Replace('ヵ', 'カ').Replace('ヶ', 'ケ');
+            result = result.Replace('ヶ', 'ケ').Replace('ヵ', 'カ');
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs` at line 77, In
JapaneseLanguageProcessor.cs remove the duplicated Replace call so the kana
normalization only replaces 'ヶ' with 'ケ' once; locate the chain that currently
calls .Replace('ヶ', 'ケ').Replace('ヵ', 'カ').Replace('ヶ', 'ケ') and change it to a
single .Replace('ヶ', 'ケ').Replace('ヵ', 'カ') (within the method in class
JapaneseLanguageProcessor where the Replace chain is used).
```

</details>

</blockquote></details>
<details>
<summary>WFInfo/LanguageProcessing/CyrillicLanguageProcessor.cs (1)</summary><blockquote>

`112-113`: **Ukrainian blueprint normalization line is currently dead code.**

After `ToLower(_culture)` on Line 106, replacing `"(Кресленник)"` never matches. Consider removing or changing to lowercase literal for clarity.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/CyrillicLanguageProcessor.cs` around lines 112 -
113, The Replace call using the mixed-case literal
normalized.Replace("(Кресленник)", "(кресленник)") is dead because normalized is
already lowercased via ToLower(_culture); update the CyrillicLanguageProcessor
normalization by either removing that Replace line entirely or changing the
search literal to the lowercase form "(кресленник)" so it actually matches;
locate the usage of normalized and the Replace call (the normalized.Replace(...)
statement after ToLower(_culture)) and apply the change in that method.
```

</details>

</blockquote></details>
<details>
<summary>WFInfo/LanguageProcessing/EnglishLanguageProcessor.cs (1)</summary><blockquote>

`2-2`: **Unused import.**

`System.Text.RegularExpressions` is imported but not used in this file.


<details>
<summary>🧹 Remove unused import</summary>

```diff
 using System;
-using System.Text.RegularExpressions;
 using WFInfo.Settings;
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/EnglishLanguageProcessor.cs` at line 2, Remove the
unused using directive for System.Text.RegularExpressions from the top of the
EnglishLanguageProcessor.cs file; locate the using statement (using
System.Text.RegularExpressions;) and delete it so the file only keeps required
namespaces for the EnglishLanguageProcessor class.
```

</details>

</blockquote></details>
<details>
<summary>WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs (1)</summary><blockquote>

`2-2`: **Unused import.**

`System.Text.RegularExpressions` is imported but not used.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs` at line 2, Remove the
unused using directive for System.Text.RegularExpressions from the
TurkishLanguageProcessor.cs file; locate the top-of-file import statement "using
System.Text.RegularExpressions;" in the TurkishLanguageProcessor class and
delete it (or alternatively, if regex is actually needed later, use it in the
relevant method such as any regex-based text processing method), ensuring no
unused using remains.
```

</details>

</blockquote></details>
<details>
<summary>WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs (1)</summary><blockquote>

`3-3`: **Unused import.**

`System.Text.RegularExpressions` is imported but not used in this file.


<details>
<summary>🧹 Remove unused import</summary>

```diff
 using System;
 using System.Text;
-using System.Text.RegularExpressions;
 using WFInfo.Settings;
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs` at line 3, Remove the
unused using directive System.Text.RegularExpressions from the top of
ThaiLanguageProcessor.cs; locate the using statement "using
System.Text.RegularExpressions;" and delete it (or let your IDE
auto-organize/remove unused usings) so the file no longer imports an unused
namespace.
```

</details>

</blockquote></details>
<details>
<summary>tests/EXTERNAL_DATA_STRUCTURE.md (1)</summary><blockquote>

`9-21`: **Add language identifier to fenced code block.**

The file structure diagram code block lacks a language identifier. While this is a visual representation rather than code, adding `text` or `plaintext` satisfies linting rules and improves consistency.


<details>
<summary>📝 Suggested fix</summary>

```diff
-```
+```text
 tests/
 ├── map.json              # Main test map (scenario references)
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @tests/EXTERNAL_DATA_STRUCTURE.md around lines 9 - 21, The fenced code block
showing the tests/ directory tree in EXTERNAL_DATA_STRUCTURE.md is missing a
language identifier; update that fenced block (the triple-backtick block that
begins before "tests/") to include a language tag such as "text" or "plaintext"
(e.g., change totext) so the linting rules pass and formatting is
consistent.


</details>

</blockquote></details>
<details>
<summary>WFInfo/LanguageProcessing/PolishLanguageProcessor.cs (1)</summary><blockquote>

`29-37`: **Consider extracting shared `GenerateCharacterRange` helper.**

This method is duplicated across multiple language processors (also in `EuropeanLanguageProcessor.cs` and `ThaiLanguageProcessor.cs`). Consider moving it to the base `LanguageProcessor` class or a shared utility.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/PolishLanguageProcessor.cs` around lines 29 - 37,
The GenerateCharacterRange method is duplicated across PolishLanguageProcessor,
EuropeanLanguageProcessor and ThaiLanguageProcessor; remove the duplicates and
extract a single shared implementation into the base LanguageProcessor (or a new
internal utility class) as a static or protected helper (e.g., protected static
string GenerateCharacterRange(int start, int end) in LanguageProcessor) so all
subclasses call the shared method; update each processor to call the centralized
method and delete the duplicate implementations to avoid divergence.
```

</details>

</blockquote></details>
<details>
<summary>WFInfo/Tests/OCRTestRunner.cs (1)</summary><blockquote>

`545-548`: **Consider extracting `IDataService` interface to a separate file.**

The interface is defined at the bottom of the test runner file. For better organization and discoverability, consider moving it to `WFInfo/Services/IDataService.cs` or similar location where other service interfaces reside.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Tests/OCRTestRunner.cs` around lines 545 - 548, Move the IDataService
interface out of the bottom of OCRTestRunner.cs into its own file containing the
public interface IDataService so it can be discovered and reused; create a new
file with the appropriate namespace, paste the interface signature (string
GetPartName(string name, out int low, bool suppressLogging, out bool
multipleLowest)), update OCRTestRunner (and any other consumers) to reference
the new interface file via the correct namespace/import, and ensure the
project/compile includes the new file.
```

</details>

</blockquote></details>
<details>
<summary>tests/BUILD_INSTRUCTIONS.md (1)</summary><blockquote>

`45-52`: **Add language identifier to fenced code block.**

The code block ending at line 52 lacks a language identifier. This should be `bash` to match the surrounding context.


<details>
<summary>📝 Suggested fix</summary>

```diff
-```
+```bash
 # Create new test project
 dotnet new console -n WFInfo.Tests -f net48

 # Copy test files to new project
 # Copy Tests/ folder to WFInfo.Tests/
 # Add necessary references to WFInfo.Tests.csproj
 ```
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @tests/BUILD_INSTRUCTIONS.md around lines 45 - 52, The fenced code block that
shows shell commands is missing a language identifier; update the opening fence
from tobash so the block is marked as bash (i.e., add the "bash"
identifier to the triple-backtick that precedes the lines starting with "#
Create new test project"), ensuring syntax highlighting and consistency with
surrounding blocks.


</details>

</blockquote></details>
<details>
<summary>WFInfo/Ocr.cs (1)</summary><blockquote>

`1006-1023`: **Zone fragmentation threshold logic has redundant fallback.**

When `zones.Count > 12`, the code creates a new single-zone list but then immediately sets `snapThreads = 1` in the next condition (`zones.Count > 8`). The comment says "Keep the zones but process them single-threaded" but the zones were just replaced with a single zone.

Consider simplifying:


<details>
<summary>♻️ Suggested simplification</summary>

```diff
                 zones = DivideSnapZones(filteredImage, filteredImageClean, rowHits, colHits);
-                // Fallback to single-threaded for large layouts to avoid threading issues
-                if (zones.Count > 12) // Too many zones means fragmentation is occurring
+                if (zones.Count > 12) // Too many zones - fragmentation; fallback to single zone
                 {
-                    // Fallback to single-threaded for large layouts to avoid threading issues
                     zones = new List<Tuple<Bitmap, Rectangle>>();
                     zones.Add( Tuple.Create(filteredImageClean, new Rectangle(0, 0, filteredImageClean.Width, filteredImageClean.Height) ) );
                     snapThreads = 1;
-                    // Keep the zones but process them single-threaded
                 }
-                else if (zones.Count > 8) // Large but reasonable number of zones
+                else if (zones.Count > 8) // Large layout - use single thread
                 {
-                    // Large but reasonable number of zones
                     snapThreads = 1;
                 }
                 else
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Ocr.cs` around lines 1006 - 1023, The current logic mutates zones then
re-reads zones.Count which can be confusing and leads to redundant/ambiguous
behavior; capture the original count into a local (e.g. int originalZoneCount =
zones.Count) and base the branching on that value: if originalZoneCount > 12
then replace zones with the single full-image tuple (using filteredImageClean
and Rectangle) and set snapThreads = 1; else if originalZoneCount > 8 set
snapThreads = 1; else set snapThreads = 4. This preserves the intended "replace
zones only when >12" behavior and avoids reading the modified zones.Count when
deciding snapThreads.
```

</details>

</blockquote></details>
<details>
<summary>WFInfo/Tests/TestModels.cs (2)</summary><blockquote>

`69-87`: **Dictionary and List properties should be initialized.**

Similar to `TestResult`, the collection properties in `TestSuiteResult` should be initialized to prevent null reference issues.


<details>
<summary>♻️ Initialize collection properties</summary>

```diff
 public class TestSuiteResult
 {
     public string TestSuiteName { get; set; }
     public DateTime StartTime { get; set; }
     public DateTime EndTime { get; set; }
-    public List<TestResult> TestResults { get; set; }
+    public List<TestResult> TestResults { get; set; } = new List<TestResult>();
     public int TotalTests { get; set; }
     public int PassedTests { get; set; }
     public int FailedTests { get; set; }
     public double OverallAccuracy { get; set; }
     public double PassRate { get; set; }
-    public Dictionary<string, double> LanguageAccuracy { get; set; }
-    public Dictionary<string, double> ThemeAccuracy { get; set; }
-    public Dictionary<string, double> CategoryAccuracy { get; set; }
-    public Dictionary<string, TestCoverage> CategoryCoverage { get; set; }
-    public Dictionary<string, TestCoverage> LanguageCoverage { get; set; }
+    public Dictionary<string, double> LanguageAccuracy { get; set; } = new Dictionary<string, double>();
+    public Dictionary<string, double> ThemeAccuracy { get; set; } = new Dictionary<string, double>();
+    public Dictionary<string, double> CategoryAccuracy { get; set; } = new Dictionary<string, double>();
+    public Dictionary<string, TestCoverage> CategoryCoverage { get; set; } = new Dictionary<string, TestCoverage>();
+    public Dictionary<string, TestCoverage> LanguageCoverage { get; set; } = new Dictionary<string, TestCoverage>();
     public TestCoverage OverallCoverage { get; set; }
     public string ErrorMessage { get; set; }
 }
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Tests/TestModels.cs` around lines 69 - 87, The collection properties
on TestSuiteResult are not initialized which can cause NREs; update the
TestSuiteResult class to initialize TestResults (List<TestResult>),
LanguageAccuracy, ThemeAccuracy, CategoryAccuracy (Dictionary<string,double>),
CategoryCoverage, LanguageCoverage (Dictionary<string,TestCoverage>) either via
inline property initializers or in a parameterless constructor so they default
to empty collections; leave scalar properties like
TotalTests/OverallCoverage/ErrorMessage as-is unless you also want to set a
sensible default.
```

</details>

---

`46-58`: **Collection properties should be initialized to prevent `NullReferenceException`.**

`ExpectedParts`, `ActualParts`, `MissingParts`, and `ExtraParts` are not initialized. Callers accessing these without initialization will throw. Consider initializing them inline.


<details>
<summary>♻️ Initialize collection properties</summary>

```diff
 public class TestResult
 {
     public string TestCaseName { get; set; }
     public string ImagePath { get; set; }
     public bool Success { get; set; }
-    public List<PartMatchResult> ExpectedParts { get; set; }
-    public List<PartMatchResult> ActualParts { get; set; }
-    public List<string> MissingParts { get; set; }
-    public List<string> ExtraParts { get; set; }
+    public List<PartMatchResult> ExpectedParts { get; set; } = new List<PartMatchResult>();
+    public List<PartMatchResult> ActualParts { get; set; } = new List<PartMatchResult>();
+    public List<string> MissingParts { get; set; } = new List<string>();
+    public List<string> ExtraParts { get; set; } = new List<string>();
     public double AccuracyScore { get; set; }
     public long ProcessingTimeMs { get; set; }
     public string ErrorMessage { get; set; }
 }
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Tests/TestModels.cs` around lines 46 - 58, The collection properties
on the TestResult class (ExpectedParts, ActualParts, MissingParts, ExtraParts)
are not initialized and can be null; update the TestResult definition to
initialize these properties inline to empty lists (e.g., ExpectedParts = new
List<PartMatchResult>(), ActualParts = new List<PartMatchResult>(), MissingParts
= new List<string>(), ExtraParts = new List<string>), keeping the property types
and names unchanged so callers can safely enumerate or add items without null
checks.
```

</details>

</blockquote></details>
<details>
<summary>WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs (2)</summary><blockquote>

`85-93`: **`GenerateCharacterRange` is duplicated across processor classes.**

This utility method is also duplicated in `ChineseLanguageProcessor.cs`. Consider moving it to the base `LanguageProcessor` class or a shared utility.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs` around lines 85 - 93,
The GenerateCharacterRange method is duplicated (also present in
ChineseLanguageProcessor); move this utility into the shared base so there's a
single implementation: add GenerateCharacterRange as a protected static method
on the LanguageProcessor base class (or a shared utility class used by both
processors), delete the duplicate in EuropeanLanguageProcessor.cs and
ChineseLanguageProcessor.cs, and update any references to call
LanguageProcessor.GenerateCharacterRange (or the new utility) to ensure
consumers still compile and behavior is unchanged.
```

</details>

---

`57-77`: **Redundant uppercase diacritic handling.**

Lines 69-76 handle uppercase accented characters, but line 60 already converts the input to lowercase via `ToLowerInvariant()`. The uppercase replacements will never match.


<details>
<summary>♻️ Remove redundant uppercase handling</summary>

```diff
 protected static string NormalizeEuropeanCharacters(string input)
 {
     // Convert common European diacritics to standard equivalents for comparison
     return input.ToLowerInvariant()
         .Replace('à', 'a').Replace('á', 'a').Replace('â', 'a').Replace('ã', 'a').Replace('ä', 'a').Replace('å', 'a')
         .Replace('è', 'e').Replace('é', 'e').Replace('ê', 'e').Replace('ë', 'e')
         .Replace('ì', 'i').Replace('í', 'i').Replace('î', 'i').Replace('ï', 'i')
         .Replace('ò', 'o').Replace('ó', 'o').Replace('ô', 'o').Replace('õ', 'o').Replace('ö', 'o')
         .Replace('ù', 'u').Replace('ú', 'u').Replace('û', 'u').Replace('ü', 'u')
         .Replace('ñ', 'n')
         .Replace('ç', 'c')
-        .Replace('ÿ', 'y')
-        .Replace('À', 'A').Replace('Á', 'A').Replace('Â', 'A').Replace('Ã', 'A').Replace('Ä', 'A').Replace('Å', 'A')
-        .Replace('È', 'E').Replace('É', 'E').Replace('Ê', 'E').Replace('Ë', 'E')
-        .Replace('Ì', 'I').Replace('Í', 'I').Replace('Î', 'I').Replace('Ï', 'I')
-        .Replace('Ò', 'O').Replace('Ó', 'O').Replace('Ô', 'O').Replace('Õ', 'O').Replace('Ö', 'O')
-        .Replace('Ù', 'U').Replace('Ú', 'U').Replace('Û', 'U').Replace('Ü', 'U')
-        .Replace('Ñ', 'N')
-        .Replace('Ç', 'C')
-        .Replace('Ÿ', 'Y');
+        .Replace('ÿ', 'y');
 }
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs` around lines 57 - 77,
The NormalizeEuropeanCharacters method lowercases the input with
ToLowerInvariant(), so the subsequent Replace calls for uppercase accented
characters (the replacements using 'À','Á',...,'Ÿ') are unreachable; remove
those redundant uppercase Replace chains from NormalizeEuropeanCharacters while
keeping the initial ToLowerInvariant() and all lowercase diacritic replacements
intact to preserve behavior; locate the uppercase replacement sequences in the
NormalizeEuropeanCharacters method and delete them.
```

</details>

</blockquote></details>
<details>
<summary>WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs (2)</summary><blockquote>

`21-21`: **Large character whitelist may impact performance.**

`GenerateCharacterRange(0x4E00, 0x9FAF)` generates a string with ~20,000 characters. If this whitelist is used frequently (e.g., in Tesseract configuration), consider caching the generated string as a static readonly field.


<details>
<summary>♻️ Cache the character whitelist</summary>

```diff
+private static readonly string ChineseCharacterRange = GenerateCharacterRange(0x4E00, 0x9FAF);
+private static readonly string _characterWhitelist = ChineseCharacterRange + "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

-public override string CharacterWhitelist => GenerateCharacterRange(0x4E00, 0x9FAF) + "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
+public override string CharacterWhitelist => _characterWhitelist;
```
</details>


Also applies to: 92-92

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs` at line 21, The
CharacterWhitelist property in ChineseLanguageProcessor currently builds a
~20k-char string each access via GenerateCharacterRange(0x4E00, 0x9FAF) which
can hurt performance; change this to compute and store the whitelist once (e.g.,
as a static readonly string field) and have the CharacterWhitelist getter return
that cached field combined with the ASCII alphanumerics, ensuring
GenerateCharacterRange is only called during static initialization.
```

</details>

---

`11-147`: **Significant code duplication between Simplified and Traditional Chinese processors.**

Both processors share nearly identical implementations for:
- `GenerateCharacterRange` (also duplicated in `EuropeanLanguageProcessor.cs`)
- `NormalizeForPatternMatching`
- `IsPartNameValid`
- `NormalizeChineseCharacters`

Consider extracting a `ChineseLanguageProcessorBase` class to reduce duplication and maintenance burden, similar to how `EuropeanLanguageProcessorBase` consolidates European languages.


<details>
<summary>♻️ Proposed base class extraction</summary>

```csharp
public abstract class ChineseLanguageProcessorBase : LanguageProcessor
{
    protected ChineseLanguageProcessorBase(IReadOnlyApplicationSettings settings) : base(settings) { }

    protected static string GenerateCharacterRange(int start, int end)
    {
        var chars = new char[end - start + 1];
        for (int i = 0; i <= end - start; i++)
        {
            chars[i] = (char)(start + i);
        }
        return new string(chars);
    }

    public override string NormalizeForPatternMatching(string input)
    {
        if (string.IsNullOrEmpty(input)) return input;
        string normalized = input.ToLower(_culture).Trim();
        normalized = normalized.Replace("prime", " prime ");
        normalized = RemoveAccents(normalized);
        var parts = normalized.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", parts);
    }

    public override bool IsPartNameValid(string partName)
    {
        return !string.IsNullOrEmpty(partName) && partName.Replace(" ", "").Length >= 4;
    }

    protected static string NormalizeChineseCharacters(string input)
    {
        return NormalizeFullWidthCharacters(input).ToLowerInvariant();
    }
}
```

Then both processors simply inherit and provide locale-specific overrides for `Locale`, `BlueprintRemovals`, and `CharacterWhitelist`.
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs` around lines 11 - 147,
There’s duplicated logic in SimplifiedChineseLanguageProcessor and
TraditionalChineseLanguageProcessor (GenerateCharacterRange,
NormalizeForPatternMatching, IsPartNameValid, NormalizeChineseCharacters);
extract a ChineseLanguageProcessorBase (similar to
EuropeanLanguageProcessorBase) that implements GenerateCharacterRange,
NormalizeForPatternMatching, IsPartNameValid, and NormalizeChineseCharacters,
have both SimplifiedChineseLanguageProcessor and
TraditionalChineseLanguageProcessor inherit that base and only override Locale,
BlueprintRemovals, and CharacterWhitelist (using the base
GenerateCharacterRange) to remove the duplicated implementations.
```

</details>

</blockquote></details>
<details>
<summary>WFInfo/LanguageProcessing/LanguageProcessorFactory.cs (2)</summary><blockquote>

`68-88`: **Consider caching the supported locales array.**

`GetSupportedLocales()` allocates a new array on every call. Since this is static data, caching it would reduce allocations.


<details>
<summary>♻️ Proposed optimization</summary>

```diff
+private static readonly string[] _supportedLocales = new[]
+{
+    "en", "ko", "ja", "zh-hans", "zh-hant", "th", "ru", "uk", "tr", "pl", "fr", "de", "es", "pt", "it"
+};

 public static string[] GetSupportedLocales()
 {
-    return new[]
-    {
-        "en",        // English
-        "ko",        // Korean
-        "ja",        // Japanese
-        "zh-hans",   // Simplified Chinese
-        "zh-hant",   // Traditional Chinese
-        "th",        // Thai
-        "ru",        // Russian
-        "uk",        // Ukrainian
-        "tr",        // Turkish
-        "pl",        // Polish
-        "fr",        // French
-        "de",        // German
-        "es",        // Spanish
-        "pt",        // Portuguese
-        "it"         // Italian
-    };
+    return _supportedLocales;
 }
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/LanguageProcessorFactory.cs` around lines 68 - 88,
GetSupportedLocales currently allocates a new string[] on each call; add a
private static readonly cached field in LanguageProcessorFactory (e.g. private
static readonly string[] _supportedLocales = { ... }) and change
GetSupportedLocales() to return that cached array (return _supportedLocales;) so
callers reuse the same instance and avoid repeated allocations; ensure the field
is initialized inline (static readonly) for thread-safe one-time setup.
```

</details>

---

`22-28`: **Consider clearing the processor cache when re-initializing with new settings.**

If `Initialize()` is called multiple times (e.g., when settings change), cached processors will retain old settings since the cache isn't cleared. This could cause stale behavior.


<details>
<summary>♻️ Proposed fix to clear cache on re-initialization</summary>

```diff
 public static void Initialize(IReadOnlyApplicationSettings settings)
 {
     if (settings == null)
         throw new ArgumentNullException(nameof(settings));

+    lock (_lock)
+    {
+        _processors.Clear();
+    }
     _settings = settings;
 }
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/LanguageProcessorFactory.cs` around lines 22 - 28,
The Initialize method sets _settings but does not clear the processor cache, so
when Initialize(IReadOnlyApplicationSettings settings) is called with new
settings any cached processors will remain stale; update Initialize to
reset/clear the processor cache (e.g., call Clear() on the cache collection or
reassign a new dictionary/list) after setting _settings so any cached entries
(the processor cache field such as _processors or _processorCache) are dropped
and new processors will be created with the new settings.
```

</details>

</blockquote></details>
<details>
<summary>WFInfo/Tests/TestProgram.cs (1)</summary><blockquote>

`137-143`: **Consider using LINQ for filtering failed tests.**

The manual loop can be simplified with LINQ's `Where()` clause for better readability.


<details>
<summary>♻️ Simplified with LINQ</summary>

```diff
 Console.WriteLine("Failed Tests:");
-var failedTests = new System.Collections.Generic.List<TestResult>();
-foreach (var test in results.TestResults)
-{
-    if (!test.Success)
-        failedTests.Add(test);
-}
+var failedTests = results.TestResults.Where(t => !t.Success).ToList();
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Tests/TestProgram.cs` around lines 137 - 143, Replace the manual loop
that builds failedTests by using LINQ to filter results.TestResults;
specifically, set failedTests to results.TestResults.Where(t =>
!t.Success).ToList() (or use System.Linq extension methods) so you remove the
foreach and make the intent clearer while preserving the failedTests
List<TestResult> variable.
```

</details>

</blockquote></details>

</blockquote></details>

---

<details>
<summary>ℹ️ Review info</summary>

**Configuration used**: defaults

**Review profile**: CHILL

**Plan**: Pro

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between b167a2c3b03e02500c062c49eaf644622ff648f4 and 8e8a880519fbcffb9a11d3b01a5cf194fa66a6a1.

</details>

<details>
<summary>📒 Files selected for processing (31)</summary>

* `WFInfo/CustomEntrypoint.cs`
* `WFInfo/Data.cs`
* `WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs`
* `WFInfo/LanguageProcessing/CyrillicLanguageProcessor.cs`
* `WFInfo/LanguageProcessing/EnglishLanguageProcessor.cs`
* `WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs`
* `WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs`
* `WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs`
* `WFInfo/LanguageProcessing/LanguageProcessor.cs`
* `WFInfo/LanguageProcessing/LanguageProcessorFactory.cs`
* `WFInfo/LanguageProcessing/PolishLanguageProcessor.cs`
* `WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs`
* `WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs`
* `WFInfo/Ocr.cs`
* `WFInfo/Properties/AssemblyInfo.cs`
* `WFInfo/Services/TesseractService.cs`
* `WFInfo/Settings/SettingsWindow.xaml`
* `WFInfo/Tests/OCRTestRunner.cs`
* `WFInfo/Tests/TestModels.cs`
* `WFInfo/Tests/TestProgram.cs`
* `tests/BUILD_INSTRUCTIONS.md`
* `tests/COVERAGE_FEATURES.md`
* `tests/EXTERNAL_DATA_STRUCTURE.md`
* `tests/README.md`
* `tests/TEST_EXECUTION_FLOW.md`
* `tests/data/test1.json`
* `tests/data/test2.json`
* `tests/data/test3.json`
* `tests/map.json`
* `tests/run_tests.bat`
* `tests/usage_example.md`

</details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
WFInfo/CustomEntrypoint.cs (1)

89-91: Tighten test-mode detection to avoid accidental activation.

Line 91 treats any first argument containing "map" as test mode, which can misroute normal launches when paths/args include that substring.

Suggested adjustment
-            if (args.Length >= 2 &&
-                (args[0].EndsWith(".json", StringComparison.OrdinalIgnoreCase) ||
-                args[0].IndexOf("map", StringComparison.OrdinalIgnoreCase) >= 0))
+            if (args.Length >= 2 &&
+                args[0].EndsWith(".json", StringComparison.OrdinalIgnoreCase))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/CustomEntrypoint.cs` around lines 89 - 91, The current test-mode
detection in the if condition (inspecting args[0]) treats any argument
containing "map" as test mode; update the check in CustomEntrypoint.cs so it
only triggers for explicit test-mode tokens (e.g., args[0].Equals("map",
StringComparison.OrdinalIgnoreCase) or args[0].Equals("-map",
StringComparison.OrdinalIgnoreCase) or args[0].Equals("--map",
StringComparison.OrdinalIgnoreCase)) or for well-defined prefixes like
args[0].StartsWith("map:", StringComparison.OrdinalIgnoreCase) rather than using
IndexOf("map") >= 0; modify the if condition that references args to use these
exact-token/prefix checks so normal paths containing "map" are not misdetected.
WFInfo/LanguageProcessing/CyrillicLanguageProcessor.cs (2)

68-76: Duplicate GenerateCharacterRange method in both classes.

The identical helper method is defined in both RussianLanguageProcessor (lines 68-76) and UkrainianLanguageProcessor (lines 138-146). Consider moving it to the LanguageProcessor base class as a protected static method to eliminate duplication.

♻️ Proposed refactor

Move to LanguageProcessor.cs:

protected static string GenerateCharacterRange(int start, int end)
{
    var chars = new char[end - start + 1];
    for (int i = 0; i <= end - start; i++)
    {
        chars[i] = (char)(start + i);
    }
    return new string(chars);
}

Then remove the private copies from both Cyrillic processors.

Also applies to: 138-146

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/CyrillicLanguageProcessor.cs` around lines 68 - 76,
Duplicate private method GenerateCharacterRange exists in
RussianLanguageProcessor and UkrainianLanguageProcessor; move a single protected
static GenerateCharacterRange(int start, int end) implementation into the
LanguageProcessor base class and remove the private copies from both Cyrillic
processors, then update any internal calls in RussianLanguageProcessor and
UkrainianLanguageProcessor to use the inherited protected static method (no
signature changes required).

93-93: Ukrainian-specific ranges are redundant.

The ranges 0x0490-0x0491, 0x0406-0x0407, 0x0456-0x0457, and 0x0492-0x0493 are all within 0x0400-0x04FF (Cyrillic block). These additional GenerateCharacterRange calls add duplicate characters to the whitelist.

Consider simplifying to just the two main ranges plus symbols:

♻️ Proposed simplification
-        public override string CharacterWhitelist => GenerateCharacterRange(0x0400, 0x04FF) + GenerateCharacterRange(0x0500, 0x052F) + GenerateCharacterRange(0x0490, 0x0491) + GenerateCharacterRange(0x0406, 0x0407) + GenerateCharacterRange(0x0456, 0x0457) + GenerateCharacterRange(0x0492, 0x0493) + "0123456789:-()"; // Cyrillic + Ukrainian specific
+        public override string CharacterWhitelist => GenerateCharacterRange(0x0400, 0x04FF) + GenerateCharacterRange(0x0500, 0x052F) + "0123456789:-()"; // Cyrillic + Cyrillic Supplement (includes Ukrainian-specific characters)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/CyrillicLanguageProcessor.cs` at line 93, The
CharacterWhitelist property in CyrillicLanguageProcessor.cs currently includes
redundant Ukrainian-specific ranges that are already covered by
GenerateCharacterRange(0x0400, 0x04FF); remove the extra GenerateCharacterRange
calls for 0x0490-0x0491, 0x0406-0x0407, 0x0456-0x0457, and 0x0492-0x0493 so
CharacterWhitelist becomes GenerateCharacterRange(0x0400, 0x04FF) +
GenerateCharacterRange(0x0500, 0x052F) + "0123456789:-()"; update the
CharacterWhitelist getter in the CyrillicLanguageProcessor class accordingly to
avoid duplicate characters.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@WFInfo/LanguageProcessing/CyrillicLanguageProcessor.cs`:
- Around line 106-112: In CyrillicLanguageProcessor, the
Replace("(Кресленник)","(кресленник)") call is dead because input is already
lowercased by input.ToLower(_culture) into the variable normalized; remove that
Replace line (or if the intent was to transform the original casing before
lowercasing, move the Replace to operate on input before calling
input.ToLower(_culture)). Target the block where normalized is assigned (the
input.ToLower(_culture).Trim() line and the subsequent Replace) and either
delete the Replace or relocate it before the ToLower(_culture) call.

---

Nitpick comments:
In `@WFInfo/CustomEntrypoint.cs`:
- Around line 89-91: The current test-mode detection in the if condition
(inspecting args[0]) treats any argument containing "map" as test mode; update
the check in CustomEntrypoint.cs so it only triggers for explicit test-mode
tokens (e.g., args[0].Equals("map", StringComparison.OrdinalIgnoreCase) or
args[0].Equals("-map", StringComparison.OrdinalIgnoreCase) or
args[0].Equals("--map", StringComparison.OrdinalIgnoreCase)) or for well-defined
prefixes like args[0].StartsWith("map:", StringComparison.OrdinalIgnoreCase)
rather than using IndexOf("map") >= 0; modify the if condition that references
args to use these exact-token/prefix checks so normal paths containing "map" are
not misdetected.

In `@WFInfo/LanguageProcessing/CyrillicLanguageProcessor.cs`:
- Around line 68-76: Duplicate private method GenerateCharacterRange exists in
RussianLanguageProcessor and UkrainianLanguageProcessor; move a single protected
static GenerateCharacterRange(int start, int end) implementation into the
LanguageProcessor base class and remove the private copies from both Cyrillic
processors, then update any internal calls in RussianLanguageProcessor and
UkrainianLanguageProcessor to use the inherited protected static method (no
signature changes required).
- Line 93: The CharacterWhitelist property in CyrillicLanguageProcessor.cs
currently includes redundant Ukrainian-specific ranges that are already covered
by GenerateCharacterRange(0x0400, 0x04FF); remove the extra
GenerateCharacterRange calls for 0x0490-0x0491, 0x0406-0x0407, 0x0456-0x0457,
and 0x0492-0x0493 so CharacterWhitelist becomes GenerateCharacterRange(0x0400,
0x04FF) + GenerateCharacterRange(0x0500, 0x052F) + "0123456789:-()"; update the
CharacterWhitelist getter in the CyrillicLanguageProcessor class accordingly to
avoid duplicate characters.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8e8a880 and c7fc11c.

📒 Files selected for processing (4)
  • .gitattributes
  • WFInfo/CustomEntrypoint.cs
  • WFInfo/LanguageProcessing/CyrillicLanguageProcessor.cs
  • WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs
🚧 Files skipped from review as they are similar to previous changes (1)
  • WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

♻️ Duplicate comments (2)
tests/run_tests.bat (2)

19-20: ⚠️ Potential issue | 🟠 Major

Quote SET assignments to avoid metacharacter injection and path breakage.

Line [19] and Line [20] still use unquoted set forms. In cmd, this can misparse values containing &, |, (, ) and break valid paths.

Suggested hardening patch
-set TEST_IMAGES_DIR=%~1
-if "%TEST_IMAGES_DIR%"=="" set TEST_IMAGES_DIR=data
+set "TEST_IMAGES_DIR=%~1"
+if "%TEST_IMAGES_DIR%"=="" set "TEST_IMAGES_DIR=data"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/run_tests.bat` around lines 19 - 20, Change the unquoted SET
assignments to quoted form to prevent cmd metacharacter injection and
path-breaking: replace set TEST_IMAGES_DIR=%~1 with set "TEST_IMAGES_DIR=%~1"
and replace set TEST_IMAGES_DIR=data (used when TEST_IMAGES_DIR is empty) with
set "TEST_IMAGES_DIR=data" (or change the if-block to if "%TEST_IMAGES_DIR%"==""
set "TEST_IMAGES_DIR=data"); reference the TEST_IMAGES_DIR environment variable
and the SET statements in the batch script when making this change.

32-36: ⚠️ Potential issue | 🟠 Major

Make results filename generation locale-safe and always quote the output path.

Line [32]/Line [36] depend on %date%/%time% substring positions (locale-dependent), and the output filename is unquoted at execution. Single-digit hours can introduce a leading space and split arguments.

Suggested robust patch
-echo Output: test_results_%date:~-4,4%%date:~-10,2%%date:~-7,2%_%time:~0,2%%time:~3,2%%time:~6,2%.json
+set "HH=%time:~0,2%"
+if "%HH:~0,1%"==" " set "HH=0%HH:~1,1%"
+set "RESULT_FILE=test_results_%date:~-4,4%%date:~-10,2%%date:~-7,2%_%HH%%time:~3,2%%time:~6,2%.json"
+echo Output: %RESULT_FILE%
@@
-..\bin\Release\net48\WFInfo.exe map.json "%TEST_IMAGES_DIR%" test_results_%date:~-4,4%%date:~-10,2%%date:~-7,2%_%time:~0,2%%time:~3,2%%time:~6,2%.json
+"..\bin\Release\net48\WFInfo.exe" "map.json" "%TEST_IMAGES_DIR%" "%RESULT_FILE%"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/run_tests.bat` around lines 32 - 36, The current echo and WFInfo.exe
invocation rely on locale-specific %DATE%/%TIME% substrings and leave the output
filename unquoted; replace those lines to generate a locale-safe timestamp
(e.g., call PowerShell: for /f "usebackq delims=" %%T in (`powershell -NoProfile
-Command "Get-Date -Format 'yyyyMMdd_HHmmss'"`) do set TIMESTAMP=%%T) and then
use "%TIMESTAMP%" when building filenames, and always wrap the output
path/filename and "%TEST_IMAGES_DIR%" in quotes in both the echo and the
WFInfo.exe call so single-digit hours or spaces don't split arguments (update
the echo line and the ..\bin\Release\net48\WFInfo.exe map.json
"%TEST_IMAGES_DIR%" "test_results_%TIMESTAMP%.json" accordingly).
🧹 Nitpick comments (4)
WFInfo/errorDialogue.xaml.cs (1)

50-74: Consider using standard foreach loops and clearer path construction.

The .Where(...).ToList().ForEach(...) pattern creates unnecessary intermediate lists. Standard foreach loops are more idiomatic in C# and avoid the allocation. Additionally, using Path.GetDirectoryName(startPath) would be clearer than the \..\ navigation.

♻️ Suggested refactor
-                    // Priority files: debug.log and settings JSON files
-                    var priorityFiles = new List<string>
-                    {
-                        startPath + @"\..\debug.log",
-                        startPath + @"\..\settings.json"
-                    };
-
-                    // Other data files
-                    var otherDataFiles = new List<string>
-                    {
-                        startPath + @"\..\eqmt_data.json",
-                        startPath + @"\..\market_data.json",
-                        startPath + @"\..\market_items.json",
-                        startPath + @"\..\name_data.json",
-                        startPath + @"\..\relic_data.json"
-                    };
-
-                    // Add priority files first
-                    priorityFiles.Where(path => File.Exists(path)).ToList().ForEach(filename => zip.AddFile(filename, ""));
-
-                    // Add other data files
-                    otherDataFiles.Where(path => File.Exists(path)).ToList().ForEach(filename => zip.AddFile(filename, ""));
-
-                    // Add debug folder files last
-                    files.Select(file => file.FullName).ToList().ForEach(filename => zip.AddFile(filename, ""));
+                    var appPath = Path.GetDirectoryName(startPath);
+                    
+                    // Priority files: debug.log and settings JSON files
+                    var priorityFiles = new[]
+                    {
+                        Path.Combine(appPath, "debug.log"),
+                        Path.Combine(appPath, "settings.json")
+                    };
+
+                    // Other data files
+                    var otherDataFiles = new[]
+                    {
+                        Path.Combine(appPath, "eqmt_data.json"),
+                        Path.Combine(appPath, "market_data.json"),
+                        Path.Combine(appPath, "market_items.json"),
+                        Path.Combine(appPath, "name_data.json"),
+                        Path.Combine(appPath, "relic_data.json")
+                    };
+
+                    // Add priority files first
+                    foreach (var path in priorityFiles.Where(File.Exists))
+                        zip.AddFile(path, "");
+
+                    // Add other data files
+                    foreach (var path in otherDataFiles.Where(File.Exists))
+                        zip.AddFile(path, "");
+
+                    // Add debug folder files last
+                    foreach (var file in files)
+                        zip.AddFile(file.FullName, "");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/errorDialogue.xaml.cs` around lines 50 - 74, Replace the LINQ
.Where(...).ToList().ForEach(...) allocations with simple foreach loops: build
the parent directory using Path.GetDirectoryName(startPath) and construct each
candidate path with Path.Combine (referencing priorityFiles and otherDataFiles
construction) then use if (File.Exists(path)) zip.AddFile(path, ""); do the same
for the files collection by iterating files (the FileInfo list) and calling
zip.AddFile(file.FullName, ""); update the code locations that call zip.AddFile
to use these foreach loops and Path.Combine instead of string "\..\”
concatenation.
WFInfo/SnapItOverlay.xaml.cs (1)

81-92: Reuse ResetRectangle() inside closeOverlay() to prevent state drift.

closeOverlay() duplicates the same rectangle reset/removal logic already in ResetRectangle(). Consolidating this avoids future divergence.

♻️ Proposed refactor
 public void closeOverlay()
 {
-    rectangle.Width = 0;
-    rectangle.Height = 0;
-    rectangle.RenderTransform = new TranslateTransform(0, 0);
-    rectangle.Visibility = Visibility.Hidden;
-    
-    // Properly clean up canvas by removing the rectangle
-    if (canvas.Children.Contains(rectangle))
-    {
-        canvas.Children.Remove(rectangle);
-    }
+    ResetRectangle();
     
     Topmost = false;
     isEnabled = false;

     // Force immediate hide without delay to prevent rectangle persistence
     Hide();
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/SnapItOverlay.xaml.cs` around lines 81 - 92, The closeOverlay method
duplicates rectangle cleanup logic; replace its manual reset/remove steps with a
single call to ResetRectangle() to avoid drift—update closeOverlay() to call
ResetRectangle() (which handles setting rectangle.Width/Height, RenderTransform,
Visibility and removing from canvas) and remove the duplicated block that
manipulates rectangle and canvas directly.
WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs (1)

64-64: Remove duplicate kana replacement call.

Line 64 repeats .Replace('ヶ', 'ケ') twice; keep one for clarity.

🧹 Proposed cleanup
-            result = result.Replace('ヶ', 'ケ').Replace('ヵ', 'カ').Replace('ヶ', 'ケ');
+            result = result.Replace('ヶ', 'ケ').Replace('ヵ', 'カ');
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs` at line 64, The
chained string replacements on the variable result contain a duplicated call to
.Replace('ヶ', 'ケ'); update the code in JapaneseLanguageProcessor (where result
is manipulated) to remove the redundant .Replace('ヶ', 'ケ') call so each kana
mapping appears only once (keep one .Replace('ヶ', 'ケ') alongside the existing
.Replace('ヵ', 'カ')).
WFInfo/LanguageProcessing/LanguageProcessor.cs (1)

295-299: Add a null/empty guard in NormalizeFullWidthCharacters.

At Line 297, input.Length will throw if this helper ever receives null. This utility is shared enough to warrant defensive handling.

🛡️ Proposed fix
 protected static string NormalizeFullWidthCharacters(string input)
 {
+    if (string.IsNullOrEmpty(input)) return input;
     var result = new System.Text.StringBuilder(input.Length);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/LanguageProcessor.cs` around lines 295 - 299,
NormalizeFullWidthCharacters should guard against null/empty inputs before using
input.Length; add a check at the start of the method (e.g., if
string.IsNullOrEmpty(input)) and return the original input (or String.Empty when
input is null) so you don't call input.Length on null. This keeps the current
behavior for empty strings and prevents NullReferenceException while leaving the
rest of the method (the StringBuilder usage and loop over chars) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/README.md`:
- Line 78: Replace the incorrect Japanese script string in the language list
entry for "Japanese (`japanese`)" by changing the displayed text from "日本语" to
the correct "日本語" in README.md so the language label uses the proper kanji;
update the single line that contains "**Japanese** (`japanese`) - 日本语" to use
"**Japanese** (`japanese`) - 日本語".

In `@tests/run_tests.bat`:
- Around line 10-15: Replace the silent "goto :eof" exits in the setup failure
branches with explicit non-zero exit codes so CI sees failures; locate the
failing-paths that currently end with "goto :eof" (the branches that echo
"ERROR: map.json not found..." and the other setup failure branch) and change
them to use "exit /b 2" for the first failure and "exit /b 3" for the other so
the script returns distinct non-zero exit codes instead of possibly 0.

In `@tests/usage_example.md`:
- Line 50: The language list in the sentence "automated regression testing for
all supported languages (English, Korean, Japanese, Chinese
Simplified/Traditional, French, Ukrainian, Italian, German, Spanish, Portuguese,
Polish, Russian)" incorrectly includes Japanese; update that parenthetical (or
the surrounding sentence) so it excludes Japanese and matches the PR objective
"except Thai, Japanese, and Turkish" — e.g., remove "Japanese" from the language
list or add a clarifying clause "except Thai, Japanese, and Turkish" to the
sentence.

In `@WFInfo/CustomEntrypoint.cs`:
- Around line 89-104: The flag-style map arguments (-map, --map, map) are being
allowed but not normalized before calling TestProgram.RunTests, so when args[0]
is a flag the test runner treats the flag as the JSON path; update the code in
CustomEntrypoint to detect if args[0] is one of the map flags and, if so,
remove/shift that element (e.g., replace args with args.Skip(1).ToArray() or
construct a new array where args[1] becomes args[0] and the rest follow) so that
TestProgram.RunTests receives the actual JSON map path as args[0]; ensure this
normalization covers "map:" prefix handling as well before invoking
TestProgram.RunTests.

In `@WFInfo/Data.cs`:
- Around line 212-224: In ReloadItems, stop silently falling back to English
when item["i18n"][_settings.Locale]["name"] is missing: instead of using
Main.AddLog + English fallback, validate presence of item["i18n"],
item["i18n"][_settings.Locale], and the "name" field and throw a clear exception
(or propagate one) that includes itemId and _settings.Locale so upstream can
detect/regressions; update the block around
item["i18n"][_settings.Locale]["name"] and tempMarketItems assignments in
ReloadItems to remove the English substitution and surface errors (use the
existing Main.AddLog only for context if desired, but do not append englishName
to tempMarketItems).
- Around line 42-43: ReloadItems and GetWfmItemList are writing/reading a single
fallback file (fallback_names.json) for all locales which causes English data to
overwrite localized data; change the fallback implementation to be locale-scoped
by including the locale in the fallback file name or path (e.g.,
fallback_names.{locale}.json) and update the wfmItemsFallbackPath usage so reads
and writes in GetWfmItemList and ReloadItems use the locale-specific path;
ensure the logic that writes English then locale-specific data writes each to
its own locale file and the fallback load (currently around
GetWfmItemList/ReloadItems) only reads the matching locale file so English
fallbacks never get loaded as localized data.

In `@WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs`:
- Around line 44-52: The override chain creates infinite recursion because
DefaultLevenshteinDistance calls LevenshteinDistanceWithPreprocessing which in
turn calls the virtual DefaultLevenshteinDistance override; fix by making the
preprocessing helper call a non-virtual core implementation instead of the
virtual method: add or use a private/core method (e.g., ComputeLevenshteinCore
or call base.DefaultLevenshteinDistance) from
LevenshteinDistanceWithPreprocessing so that CalculateLevenshteinDistance ->
DefaultLevenshteinDistance -> LevenshteinDistanceWithPreprocessing ->
ComputeLevenshteinCore executes without re-entering the override; update
references to BlueprintRemovals and NormalizeEuropeanCharacters accordingly.

In `@WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs`:
- Around line 41-43: The Japanese normalization path is calling RemoveAccents
which can strip Japanese combining marks (e.g., dakuten) and corrupt tokens; in
the JapaneseLanguageProcessor (the Normalize/NormalizeText method where
`normalized` is set) remove or gate the call to RemoveAccents so it is not
applied for Japanese processing—either delete the RemoveAccents(normalized)
invocation or wrap it behind a language check (e.g., only call RemoveAccents for
non-Japanese languages); ensure any downstream code still receives the original
normalized string without accent stripping.

In `@WFInfo/LanguageProcessing/PolishLanguageProcessor.cs`:
- Line 21: The CharacterWhitelist in PolishLanguageProcessor is missing several
Polish letters (Óó, Śś, Źź, Żż); update the CharacterWhitelist property to
include those characters or add the appropriate GenerateCharacterRange calls so
the whitelist contains full Polish diacritics (e.g., include Ó/ó, Ś/ś, Ź/ź, Ż/ż)
alongside the existing ranges — modify the CharacterWhitelist getter in
PolishLanguageProcessor and use GenerateCharacterRange or literal additions to
ensure all Polish letters are present.

In `@WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs`:
- Around line 39-40: The code in ThaiLanguageProcessor that calls
RemoveAccents(normalized) during normalization strips Thai combining marks and
should be removed or skipped for Thai text; update the normalization logic in
the method where normalized is set to stop calling RemoveAccents(normalized) for
Thai (either delete that call or guard it with a language check), and add a
brief comment explaining that Thai tone/vowel marks must be preserved to avoid
breaking OCR/token matching.

In `@WFInfo/Ocr.cs`:
- Around line 1292-1293: The code calls _tesseractService.SetNumbersOnlyMode()
but only resets the FirstEngine whitelist on the success path, which can leave
OCR stuck in numeric mode on exceptions; move the whitelist-reset logic into a
finally block so Reset/Restore of the FirstEngine whitelist always runs
regardless of exceptions, update the block that currently restores the whitelist
(the counterpart to _tesseractService.SetNumbersOnlyMode()) to be executed from
a finally and apply the same change to the other occurrence noted (the second
SetNumbersOnlyMode/restore pair).
- Around line 258-264: The distance filter currently uses >
Math.Max(part.Length, 6) which enforces ~100% instead of the intended 50%
cutoff; update the condition(s) that reference Math.Max(part.Length, 6) (e.g.,
the checks using firstProximity[i], part, correctName in the WFInfo.Ocr logic)
to compare against half the string length with a sensible floor, e.g. use >
Math.Max((int)Math.Ceiling(part.Length * 0.5), 3) (or Math.Max(part.Length/2, 3)
if you prefer integer math) so any Levenshtein distance greater than 50% of part
length is rejected; apply the same change to both occurrences noted in the diff.

---

Duplicate comments:
In `@tests/run_tests.bat`:
- Around line 19-20: Change the unquoted SET assignments to quoted form to
prevent cmd metacharacter injection and path-breaking: replace set
TEST_IMAGES_DIR=%~1 with set "TEST_IMAGES_DIR=%~1" and replace set
TEST_IMAGES_DIR=data (used when TEST_IMAGES_DIR is empty) with set
"TEST_IMAGES_DIR=data" (or change the if-block to if "%TEST_IMAGES_DIR%"=="" set
"TEST_IMAGES_DIR=data"); reference the TEST_IMAGES_DIR environment variable and
the SET statements in the batch script when making this change.
- Around line 32-36: The current echo and WFInfo.exe invocation rely on
locale-specific %DATE%/%TIME% substrings and leave the output filename unquoted;
replace those lines to generate a locale-safe timestamp (e.g., call PowerShell:
for /f "usebackq delims=" %%T in (`powershell -NoProfile -Command "Get-Date
-Format 'yyyyMMdd_HHmmss'"`) do set TIMESTAMP=%%T) and then use "%TIMESTAMP%"
when building filenames, and always wrap the output path/filename and
"%TEST_IMAGES_DIR%" in quotes in both the echo and the WFInfo.exe call so
single-digit hours or spaces don't split arguments (update the echo line and the
..\bin\Release\net48\WFInfo.exe map.json "%TEST_IMAGES_DIR%"
"test_results_%TIMESTAMP%.json" accordingly).

---

Nitpick comments:
In `@WFInfo/errorDialogue.xaml.cs`:
- Around line 50-74: Replace the LINQ .Where(...).ToList().ForEach(...)
allocations with simple foreach loops: build the parent directory using
Path.GetDirectoryName(startPath) and construct each candidate path with
Path.Combine (referencing priorityFiles and otherDataFiles construction) then
use if (File.Exists(path)) zip.AddFile(path, ""); do the same for the files
collection by iterating files (the FileInfo list) and calling
zip.AddFile(file.FullName, ""); update the code locations that call zip.AddFile
to use these foreach loops and Path.Combine instead of string "\..\”
concatenation.

In `@WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs`:
- Line 64: The chained string replacements on the variable result contain a
duplicated call to .Replace('ヶ', 'ケ'); update the code in
JapaneseLanguageProcessor (where result is manipulated) to remove the redundant
.Replace('ヶ', 'ケ') call so each kana mapping appears only once (keep one
.Replace('ヶ', 'ケ') alongside the existing .Replace('ヵ', 'カ')).

In `@WFInfo/LanguageProcessing/LanguageProcessor.cs`:
- Around line 295-299: NormalizeFullWidthCharacters should guard against
null/empty inputs before using input.Length; add a check at the start of the
method (e.g., if string.IsNullOrEmpty(input)) and return the original input (or
String.Empty when input is null) so you don't call input.Length on null. This
keeps the current behavior for empty strings and prevents NullReferenceException
while leaving the rest of the method (the StringBuilder usage and loop over
chars) unchanged.

In `@WFInfo/SnapItOverlay.xaml.cs`:
- Around line 81-92: The closeOverlay method duplicates rectangle cleanup logic;
replace its manual reset/remove steps with a single call to ResetRectangle() to
avoid drift—update closeOverlay() to call ResetRectangle() (which handles
setting rectangle.Width/Height, RenderTransform, Visibility and removing from
canvas) and remove the duplicated block that manipulates rectangle and canvas
directly.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c7fc11c and 42db901.

📒 Files selected for processing (20)
  • WFInfo/CustomEntrypoint.cs
  • WFInfo/Data.cs
  • WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs
  • WFInfo/LanguageProcessing/CyrillicLanguageProcessor.cs
  • WFInfo/LanguageProcessing/EnglishLanguageProcessor.cs
  • WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs
  • WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs
  • WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs
  • WFInfo/LanguageProcessing/LanguageProcessor.cs
  • WFInfo/LanguageProcessing/PolishLanguageProcessor.cs
  • WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs
  • WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs
  • WFInfo/Ocr.cs
  • WFInfo/Services/TesseractService.cs
  • WFInfo/SnapItOverlay.xaml.cs
  • WFInfo/Tests/TestProgram.cs
  • WFInfo/errorDialogue.xaml.cs
  • tests/README.md
  • tests/run_tests.bat
  • tests/usage_example.md
🚧 Files skipped from review as they are similar to previous changes (5)
  • WFInfo/Tests/TestProgram.cs
  • WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs
  • WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs
  • WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs
  • WFInfo/LanguageProcessing/CyrillicLanguageProcessor.cs

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
WFInfo/Ocr.cs (1)

266-270: ⚠️ Potential issue | 🟠 Major

Reward-screen market data lookup can still null-dereference and abort processing.

If GetValue(correctName) returns null, Line [269] will throw before you can skip safely.

🛡️ Proposed guard
-                    JObject job = (JObject)Main.dataBase.marketData.GetValue(correctName);
+                    JObject job = Main.dataBase.marketData.GetValue(correctName) as JObject;
+                    if (job == null)
+                    {
+                        Main.AddLog($"MARKET DATA: No market data found for '{correctName}', skipping item");
+                        continue;
+                    }
                     JObject primeSet = (JObject)Main.dataBase.marketData.GetValue(primeSetName);
                     string ducats = job["ducats"].ToObject<string>();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Ocr.cs` around lines 266 - 270, The code can null-dereference when
Main.dataBase.marketData.GetValue(correctName) or GetValue(primeSetName) returns
null; update the WFInfo.Ocr logic around Data.GetSetName, the job and primeSet
variables to guard: after obtaining job and primeSet check for null (and also
check job["ducats"] exists) and skip/return when missing, and replace
int.Parse(ducats, Main.culture) with a safe parse (int.TryParse with
Main.culture) so missing or non-numeric ducats do not throw; ensure all
downstream uses of job/primeSet only run when those checks pass.
♻️ Duplicate comments (2)
WFInfo/Data.cs (1)

102-103: ⚠️ Potential issue | 🟠 Major

Fallback storage is still not request-locale scoped.

GetWfmItemList("en") and GetWfmItemList(_settings.Locale) both use the same wfmItemsFallbackPath, so English payloads can still overwrite localized fallback content.

🔧 Suggested fix
- private readonly string wfmItemsFallbackPath;
+ private readonly string wfmItemsFallbackDirectory;
...
- wfmItemsFallbackPath = applicationDirectory + $@"\fallback_names.{_settings.Locale}.json";
+ wfmItemsFallbackDirectory = applicationDirectory;
...
private async Task<(JObject Data, bool IsFallback)> GetWfmItemList(string locale)
{
+    string fallbackPath = Path.Combine(wfmItemsFallbackDirectory, $"fallback_names.{locale}.json");
     try
     {
         ...
-        File.WriteAllText(wfmItemsFallbackPath, body);
+        File.WriteAllText(fallbackPath, body);
         return (data, false);
     }
     catch (Exception ex)
     {
-        Main.AddLog("Failed to fetch/parse " + wfmItemsUrl + ", using file " + wfmItemsFallbackPath + Environment.NewLine + ex.ToString());
-        if (File.Exists(wfmItemsFallbackPath))
+        Main.AddLog("Failed to fetch/parse " + wfmItemsUrl + ", using file " + fallbackPath + Environment.NewLine + ex.ToString());
+        if (File.Exists(fallbackPath))
         {
-            string response = File.ReadAllText(wfmItemsFallbackPath);
+            string response = File.ReadAllText(fallbackPath);
             JObject data = JsonConvert.DeserializeObject<JObject>(response);
             return (data, true);
         }
         ...
     }
}

Based on learnings: Do not fall back to English locale data when a specific locale's fallback file is missing, because it breaks locale OCR matching.

Also applies to: 425-453

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Data.cs` around lines 102 - 103, The fallback file path currently uses
the global _settings.Locale (wfmItemsFallbackPath) causing
GetWfmItemList(requestedLocale) to share the same file for different requests;
change the logic so the fallback path is computed per-request using the
requested locale parameter (e.g., derive a locale-specific fallback path inside
GetWfmItemList or a helper that accepts the locale), ensure any references to
wfmItemsFallbackPath use the per-request value, and remove any implicit
default-to-English behavior so a missing locale fallback does not fall back to
"en".
WFInfo/Ocr.cs (1)

704-708: ⚠️ Potential issue | 🟠 Major

SnapIt distance cutoff still uses a ~100% threshold instead of the documented 50%.

Line [707] contradicts the comment in Lines [704]-[706], so weak matches are still accepted.

🎯 Proposed fix
-                if (levenDist == 9999 || levenDist > Math.Max(part.Name.Length, 6) || string.IsNullOrEmpty(name))
+                int maxDistance = Math.Max((int)Math.Ceiling(part.Name.Length * 0.5), 3);
+                if (levenDist == 9999 || levenDist > maxDistance || string.IsNullOrEmpty(name))
                 {
                     foundParts.RemoveAt(i); // remove invalid part from list
                     i--; // Adjust index since we removed an item
                     resultCount--;
                     continue;
                 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Ocr.cs` around lines 704 - 708, The cutoff condition incorrectly uses
Math.Max(part.Name.Length, 6) (allowing ~100% distance) instead of a 50%
threshold; update the if condition that checks levenDist (and currently compares
against Math.Max(part.Name.Length, 6)) to compare against a half-length
threshold, e.g. replace Math.Max(part.Name.Length, 6) with
Math.Max(part.Name.Length / 2, 6) (or
Math.Max((int)Math.Ceiling(part.Name.Length / 2.0), 6) for rounding) so the
check becomes: if (levenDist == 9999 || levenDist > Math.Max(part.Name.Length /
2, 6) || string.IsNullOrEmpty(name)) to enforce the documented ~50% cutoff for
weak matches.
🧹 Nitpick comments (6)
WFInfo/SnapItOverlay.xaml.cs (1)

46-59: Consider simplifying by keeping the rectangle in the canvas.

The current approach of removing and re-adding the rectangle works correctly, but is heavier than necessary. Simply resetting the properties and hiding the rectangle (without removing it from the canvas) would achieve the same visual result with less overhead.

♻️ Optional simplification
 private void ResetRectangle()
 {
     // Reset rectangle properties to ensure it doesn't persist from previous session
     rectangle.Width = 0;
     rectangle.Height = 0;
     rectangle.RenderTransform = new TranslateTransform(0, 0);
     rectangle.Visibility = Visibility.Hidden;
-            
-    // Remove rectangle from canvas to ensure clean state
-    if (canvas.Children.Contains(rectangle))
-    {
-        canvas.Children.Remove(rectangle);
-    }
 }

If you adopt this, you can also remove the re-add logic in canvas_MouseDown (lines 67-70).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/SnapItOverlay.xaml.cs` around lines 46 - 59, ResetRectangle currently
removes the rectangle from the canvas; instead keep the rectangle as a
persistent child and only reset its properties and hide it: leave
rectangle.Width/Height, RenderTransform and Visibility resets as-is but remove
the conditional canvas.Children.Remove(rectangle) call; then remove the re-add
logic in canvas_MouseDown (the block that checks/executes
canvas.Children.Add(rectangle)) since the rectangle will always remain in the
canvas, ensuring you only toggle Visibility and update transforms when showing
it.
tests/run_tests.bat (1)

55-55: Avoid unconditional pause to keep CI/non-interactive runs from hanging.

Line [55] should be gated for interactive usage only.

♻️ Suggested CI-safe pause
-pause
+if not defined CI pause
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/run_tests.bat` at line 55, Replace the unconditional "pause" with an
interactive-only guard so CI/non-interactive runs don't hang: detect an
interactive environment (e.g. check a CI env var like %CI% or %GITHUB_ACTIONS%
or use "if exist CON" / checks) and only call the "pause" command when running
interactively (for example: if "%CI%"=="" pause). Locate the lone "pause" token
in the batch script and wrap it with that conditional or an equivalent guard.
WFInfo/Data.cs (1)

975-978: Normalize the OCR input once before iterating localized market items.

processor.NormalizeForPatternMatching(name) is invariant inside the loop and can be computed once to reduce repeated work on this hot path.

♻️ Suggested refactor
- foreach (var marketItem in marketItems)
+ string normalizedName = processor.NormalizeForPatternMatching(name);
+ foreach (var marketItem in marketItems)
  {
      ...
-     string normalizedName = processor.NormalizeForPatternMatching(name);
      string normalizedStored = processor.NormalizeForPatternMatching(split[2]);
      int val = processor.SimpleLevenshteinDistance(normalizedName, normalizedStored);
      ...
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Data.cs` around lines 975 - 978, Compute normalized OCR input once
before iterating localized market items instead of calling
processor.NormalizeForPatternMatching(name) inside the loop; precompute a
variable (e.g., normalizedName = processor.NormalizeForPatternMatching(name))
outside the loop that contains the comparison logic and then reuse that variable
when calling processor.SimpleLevenshteinDistance(normalizedName,
normalizedStored). Update the loop that currently computes normalizedName
per-iteration (the block with processor.NormalizeForPatternMatching(split[2])
and processor.SimpleLevenshteinDistance) to only normalize the stored value
per-item and reuse the precomputed normalizedName to avoid redundant work on
this hot path.
WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs (1)

469-549: Avoid rebuilding and resorting the replacement map on every normalization call.

This method allocates a very large list and sorts it each invocation; moving it to a static pre-sorted field will reduce allocation and CPU churn.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs` around lines 469 - 549,
The replacement map is being rebuilt and resorted on every call (the local
variable replacements and the foreach using replacements.OrderByDescending),
causing allocations and CPU overhead; move that large list into a static,
precomputed readonly field (e.g., a static readonly
List<KeyValuePair<string,string>> PrecomputedReplacements) that is initialized
once and sorted by descending Key.Length at type initialization, then change the
method to iterate that static list (replace the use of
replacements.OrderByDescending with PrecomputedReplacements) so normalization
only does the Replace loop and no longer rebuilds or sorts the map per-call.
KoreanProcessorTest.cs (1)

9-63: Console output checks here are not automated tests.

These checks only print status and won’t fail CI on regressions. Consider moving them into the existing test project with assertions so regressions are caught automatically.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@KoreanProcessorTest.cs` around lines 9 - 63, Current Main in
KoreanProcessorTest.cs only prints results and won't fail CI; replace these
manual console checks with automated unit tests in the test project that assert
expected outcomes. Create test cases that call NormalizeKoreanCharacters (via
public wrapper or make it internal and use InternalsVisibleTo),
CalculateLevenshteinDistance, and DecomposeHangul and verify outputs with
assertions (e.g., no exception for the duplicate-keys input, expected distances
for Korean/Korean/Latin/mixed pairs, and correct decomposition for '가'); use the
existing MockApplicationSettings to instantiate KoreanLanguageProcessor in test
setup and add these tests to the test project's test runner so regressions fail
CI.
WFInfo/LanguageProcessing/LanguageProcessor.cs (1)

254-258: Harden preprocessing against null blueprintRemovals/entries.

A null array or null element will throw in the shared base path.

🧰 Proposed hardening
-            foreach (string removal in blueprintRemovals)
+            foreach (string removal in blueprintRemovals ?? Array.Empty<string>())
             {
+                if (string.IsNullOrEmpty(removal)) continue;
                 s = System.Text.RegularExpressions.Regex.Replace(s, System.Text.RegularExpressions.Regex.Escape(removal), "", System.Text.RegularExpressions.RegexOptions.IgnoreCase | System.Text.RegularExpressions.RegexOptions.CultureInvariant);
                 t = System.Text.RegularExpressions.Regex.Replace(t, System.Text.RegularExpressions.Regex.Escape(removal), "", System.Text.RegularExpressions.RegexOptions.IgnoreCase | System.Text.RegularExpressions.RegexOptions.CultureInvariant);
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/LanguageProcessor.cs` around lines 254 - 258, The
loop over blueprintRemovals can throw if the array or an entry is null; before
iterating in the LanguageProcessor method, null-check blueprintRemovals (e.g.,
if null, treat as empty or return) and inside the foreach skip null/empty
entries using string.IsNullOrEmpty(removal) to continue, then call
System.Text.RegularExpressions.Regex.Escape and Regex.Replace only for non-null
strings so the replacements on s and t are safe (update the foreach that
references blueprintRemovals, s, t, and the Regex.Replace calls).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@KoreanProcessorTest.cs`:
- Around line 199-223: In MockApplicationSettings remove the repeated duplicate
property declarations so each property name appears only once: keep a single
definition each for UseCustomRevertLayer, UseCustomUnsetLayer,
UseCustomInitialLayer, and UseCustomInheritLayer; delete the extra repeated
lines and ensure the remaining properties have correct accessibility and return
values consistent with the class/interface (e.g., implement the intended getter
logic) so the class compiles.

In `@tests/run_tests.bat`:
- Around line 34-35: The PowerShell-based timestamp assignment (the for /f
invoking powershell) can fail leaving TIMESTAMP empty and causing
test_results_.json; after that for /f line check ERRORLEVEL and whether
TIMESTAMP is defined (if not defined "%TIMESTAMP%" or use if not defined
TIMESTAMP) and assign a safe fallback (for example build a sanitized fallback
using %DATE% and %TIME% or a unique value using %RANDOM%) and use that ensured
TIMESTAMP for the echo and output filename; update the script to set TIMESTAMP
when the powershell command fails and ensure both places that reference
TIMESTAMP use the guaranteed value.
- Around line 8-10: The script checks for "map.json" and uses a relative
executable path which makes it CWD-dependent; change the script to resolve its
own directory first (e.g., via %~dp0) and use that base for all path checks and
invocations so the if not exist "map.json" check and the later relative
executable call use %~dp0\map.json and %~dp0\<exe> instead of the current
working directory; update the ERROR echo to reference the anchored path so
failures are accurate.

In `@WFInfo/errorDialogue.xaml.cs`:
- Around line 50-90: The loop that adds FileInfo items from files to the ZIP
root can collide with entries from priorityFiles and otherDataFiles because all
are added with an empty directory path; before the foreach over files, filter
out any FileInfo whose Name (or Path.GetFileName(file.FullName)) matches any
Path.GetFileName value in priorityFiles or otherDataFiles (or whose full path
equals those targets) so only unique root filenames are added; update the code
around the files foreach (referencing variables files, priorityFiles,
otherDataFiles and zip.AddFile) to perform that exclusion check prior to calling
zip.AddFile.

In `@WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs`:
- Line 21: The whitelist in the CharacterWhitelist property
(ChineseLanguageProcessor.CharacterWhitelist) stops at 0x9FAF and omits many
valid CJK ideographs; update the whitelist to include the full primary CJK block
(0x4E00–0x9FFF) and add other relevant ranges (CJK Unified Ideographs Extension
A 0x3400–0x4DBF, CJK Compatibility Ideographs 0xF900–0xFAFF and the
supplementary extensions 0x20000–0x2CEAF) or otherwise switch to a Unicode-aware
classification that supports supplementary planes; apply the same fix to the
other occurrence of the whitelist in this file (mentioned at the second
location) and ensure GenerateCharacterRange (or its caller) can represent
supplementary-plane characters (surrogate pairs) if needed.

In `@WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs`:
- Line 94: The German CharacterWhitelist property currently only includes
uppercase umlauts (via GenerateCharacterRange(0x00C4,0x00C4), 0x00D6, 0x00DC)
and thus strips valid lowercase umlauts; update the CharacterWhitelist returned
by the CharacterWhitelist property to also include the lowercase umlauts by
adding GenerateCharacterRange(0x00E4,0x00E4),
GenerateCharacterRange(0x00F6,0x00F6) and GenerateCharacterRange(0x00FC,0x00FC)
alongside the existing ranges (keep GenerateCharacterRange(0x00DF,0x00DF) for ß)
so German OCR output retains ä, ö, ü.

In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs`:
- Around line 445-453: The similarity lookup in KoreanLanguageProcessor.cs
(inside the loop over Korean[groupType] where variables groupType,
similarityGroup, indexA, indexB are used) only checks similarityGroup.Value but
ignores similarityGroup.Key, so declared key pairs are missed; update the check
to also inspect the dictionary key (similarityGroup.Key) for the pair
(indexA,indexB) in either order—e.g., parse or compare the key representation
used for declared pairs and return the same similarity score when the key
matches, in addition to the existing check against similarityGroup.Value.

In `@WFInfo/Ocr.cs`:
- Around line 1015-1021: Before replacing the existing zones list in the
high-fragmentation branch, dispose the Bitmaps held in the current zones to
avoid leaking GDI resources: iterate the existing zones (the Tuple<Bitmap,
Rectangle> entries returned by DivideSnapZones) and call Dispose() on each
Tuple.Item1 (the Bitmap) before assigning a new List and setting snapThreads =
1; ensure this cleanup also handles null checks and swallows/disposes safely if
an exception occurs.

---

Outside diff comments:
In `@WFInfo/Ocr.cs`:
- Around line 266-270: The code can null-dereference when
Main.dataBase.marketData.GetValue(correctName) or GetValue(primeSetName) returns
null; update the WFInfo.Ocr logic around Data.GetSetName, the job and primeSet
variables to guard: after obtaining job and primeSet check for null (and also
check job["ducats"] exists) and skip/return when missing, and replace
int.Parse(ducats, Main.culture) with a safe parse (int.TryParse with
Main.culture) so missing or non-numeric ducats do not throw; ensure all
downstream uses of job/primeSet only run when those checks pass.

---

Duplicate comments:
In `@WFInfo/Data.cs`:
- Around line 102-103: The fallback file path currently uses the global
_settings.Locale (wfmItemsFallbackPath) causing GetWfmItemList(requestedLocale)
to share the same file for different requests; change the logic so the fallback
path is computed per-request using the requested locale parameter (e.g., derive
a locale-specific fallback path inside GetWfmItemList or a helper that accepts
the locale), ensure any references to wfmItemsFallbackPath use the per-request
value, and remove any implicit default-to-English behavior so a missing locale
fallback does not fall back to "en".

In `@WFInfo/Ocr.cs`:
- Around line 704-708: The cutoff condition incorrectly uses
Math.Max(part.Name.Length, 6) (allowing ~100% distance) instead of a 50%
threshold; update the if condition that checks levenDist (and currently compares
against Math.Max(part.Name.Length, 6)) to compare against a half-length
threshold, e.g. replace Math.Max(part.Name.Length, 6) with
Math.Max(part.Name.Length / 2, 6) (or
Math.Max((int)Math.Ceiling(part.Name.Length / 2.0), 6) for rounding) so the
check becomes: if (levenDist == 9999 || levenDist > Math.Max(part.Name.Length /
2, 6) || string.IsNullOrEmpty(name)) to enforce the documented ~50% cutoff for
weak matches.

---

Nitpick comments:
In `@KoreanProcessorTest.cs`:
- Around line 9-63: Current Main in KoreanProcessorTest.cs only prints results
and won't fail CI; replace these manual console checks with automated unit tests
in the test project that assert expected outcomes. Create test cases that call
NormalizeKoreanCharacters (via public wrapper or make it internal and use
InternalsVisibleTo), CalculateLevenshteinDistance, and DecomposeHangul and
verify outputs with assertions (e.g., no exception for the duplicate-keys input,
expected distances for Korean/Korean/Latin/mixed pairs, and correct
decomposition for '가'); use the existing MockApplicationSettings to instantiate
KoreanLanguageProcessor in test setup and add these tests to the test project's
test runner so regressions fail CI.

In `@tests/run_tests.bat`:
- Line 55: Replace the unconditional "pause" with an interactive-only guard so
CI/non-interactive runs don't hang: detect an interactive environment (e.g.
check a CI env var like %CI% or %GITHUB_ACTIONS% or use "if exist CON" / checks)
and only call the "pause" command when running interactively (for example: if
"%CI%"=="" pause). Locate the lone "pause" token in the batch script and wrap it
with that conditional or an equivalent guard.

In `@WFInfo/Data.cs`:
- Around line 975-978: Compute normalized OCR input once before iterating
localized market items instead of calling
processor.NormalizeForPatternMatching(name) inside the loop; precompute a
variable (e.g., normalizedName = processor.NormalizeForPatternMatching(name))
outside the loop that contains the comparison logic and then reuse that variable
when calling processor.SimpleLevenshteinDistance(normalizedName,
normalizedStored). Update the loop that currently computes normalizedName
per-iteration (the block with processor.NormalizeForPatternMatching(split[2])
and processor.SimpleLevenshteinDistance) to only normalize the stored value
per-item and reuse the precomputed normalizedName to avoid redundant work on
this hot path.

In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs`:
- Around line 469-549: The replacement map is being rebuilt and resorted on
every call (the local variable replacements and the foreach using
replacements.OrderByDescending), causing allocations and CPU overhead; move that
large list into a static, precomputed readonly field (e.g., a static readonly
List<KeyValuePair<string,string>> PrecomputedReplacements) that is initialized
once and sorted by descending Key.Length at type initialization, then change the
method to iterate that static list (replace the use of
replacements.OrderByDescending with PrecomputedReplacements) so normalization
only does the Replace loop and no longer rebuilds or sorts the map per-call.

In `@WFInfo/LanguageProcessing/LanguageProcessor.cs`:
- Around line 254-258: The loop over blueprintRemovals can throw if the array or
an entry is null; before iterating in the LanguageProcessor method, null-check
blueprintRemovals (e.g., if null, treat as empty or return) and inside the
foreach skip null/empty entries using string.IsNullOrEmpty(removal) to continue,
then call System.Text.RegularExpressions.Regex.Escape and Regex.Replace only for
non-null strings so the replacements on s and t are safe (update the foreach
that references blueprintRemovals, s, t, and the Regex.Replace calls).

In `@WFInfo/SnapItOverlay.xaml.cs`:
- Around line 46-59: ResetRectangle currently removes the rectangle from the
canvas; instead keep the rectangle as a persistent child and only reset its
properties and hide it: leave rectangle.Width/Height, RenderTransform and
Visibility resets as-is but remove the conditional
canvas.Children.Remove(rectangle) call; then remove the re-add logic in
canvas_MouseDown (the block that checks/executes canvas.Children.Add(rectangle))
since the rectangle will always remain in the canvas, ensuring you only toggle
Visibility and update transforms when showing it.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 42db901 and 3d74a05.

📒 Files selected for processing (17)
  • KoreanProcessorTest.cs
  • WFInfo/CustomEntrypoint.cs
  • WFInfo/Data.cs
  • WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs
  • WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs
  • WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs
  • WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs
  • WFInfo/LanguageProcessing/LanguageProcessor.cs
  • WFInfo/LanguageProcessing/PolishLanguageProcessor.cs
  • WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs
  • WFInfo/Ocr.cs
  • WFInfo/SnapItOverlay.xaml.cs
  • WFInfo/Tests/KoreanProcessorTests.cs
  • WFInfo/errorDialogue.xaml.cs
  • tests/README.md
  • tests/run_tests.bat
  • tests/usage_example.md
✅ Files skipped from review due to trivial changes (1)
  • WFInfo/Tests/KoreanProcessorTests.cs
🚧 Files skipped from review as they are similar to previous changes (4)
  • WFInfo/CustomEntrypoint.cs
  • WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs
  • tests/usage_example.md
  • tests/README.md

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
WFInfo/Ocr.cs (1)

1010-1011: ⚠️ Potential issue | 🟡 Minor

numberTooFewCharacters is now dead and skews noise diagnostics.

After the refactor, filtered lines no longer increment this counter, so the noise heuristic at Line [1210] is no longer meaningful.

Suggested fix
 if (filteredWords.Count == 0)
 {
+    numberTooFewCharacters++;
     continue;
 }

Also applies to: 1105-1108, 1210-1214

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Ocr.cs` around lines 1010 - 1011, The variable numberTooFewCharacters
is no longer being incremented after the refactor, which breaks the noise
heuristic that uses it alongside numberTooLargeButEnoughCharacters; fix by
either removing numberTooFewCharacters and updating the heuristic to only use
numberTooLargeButEnoughCharacters, or (preferred) restore incrementing
numberTooFewCharacters in the same places where lines are filtered for having
too few characters so the noise-count logic (the heuristic that references
numberTooFewCharacters and numberTooLargeButEnoughCharacters) receives correct
values; update all related spots that previously incremented it (see the
filtering logic that now bypasses those increments) and adjust the heuristic
computation accordingly.
♻️ Duplicate comments (3)
tests/run_tests.bat (1)

37-45: ⚠️ Potential issue | 🟡 Minor

Initialize TIMESTAMP before PowerShell call to avoid stale inherited value.

If the environment already defines TIMESTAMP, Line [40] won’t trigger fallback on PowerShell failure, and the script can reuse an old filename suffix.

🛠️ Proposed fix
 REM Generate locale-safe timestamp with fallback
+set "TIMESTAMP="
 for /f "usebackq delims=" %%T in (`powershell -NoProfile -Command "Get-Date -Format 'yyyyMMdd_HHmmss'" 2^>nul`) do set TIMESTAMP=%%T
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/run_tests.bat` around lines 37 - 45, Initialize the TIMESTAMP variable
to an empty value before running the PowerShell for /f call so an inherited
TIMESTAMP won't prevent the fallback branch from executing; specifically, add a
statement to clear TIMESTAMP prior to the for /f `powershell -NoProfile -Command
"Get-Date ..."` invocation so the subsequent check if "%TIMESTAMP%"=="" properly
detects PowerShell failure and triggers the fallback logic that constructs
TIMESTAMP from %DATE% and %TIME%.
WFInfo/errorDialogue.xaml.cs (1)

70-77: ⚠️ Potential issue | 🟡 Minor

Use case-insensitive dedupe for ZIP root filenames.

The collision filter can miss case-variant duplicates because ToHashSet() defaults to case-sensitive comparisons. On Windows, that can still cause root-entry conflicts during extraction.

Suggested fix
- var priorityFileNames = priorityFiles.Select(Path.GetFileName).ToHashSet();
- var otherDataFileNames = otherDataFiles.Select(Path.GetFileName).ToHashSet();
+ var priorityFileNames = priorityFiles
+     .Select(Path.GetFileName)
+     .ToHashSet(StringComparer.OrdinalIgnoreCase);
+ var otherDataFileNames = otherDataFiles
+     .Select(Path.GetFileName)
+     .ToHashSet(StringComparer.OrdinalIgnoreCase);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/errorDialogue.xaml.cs` around lines 70 - 77, The current dedupe uses
ToHashSet() which is case-sensitive and can miss case-variant ZIP root
filenames; change the creation of priorityFileNames and otherDataFileNames to
use a case-insensitive comparer (e.g. StringComparer.OrdinalIgnoreCase) when
calling ToHashSet so the subsequent loop that checks
priorityFileNames.Contains(fileName) and otherDataFileNames.Contains(fileName)
correctly treats "File.txt" and "file.txt" as duplicates; update the code that
builds those sets (symbols: priorityFileNames, otherDataFileNames, files,
FileInfo, Path.GetFileName) to use the case-insensitive HashSet.
WFInfo/Ocr.cs (1)

1022-1053: ⚠️ Potential issue | 🟠 Major

Zone bitmap cleanup is partial; regular multi-zone path still leaks.

Only the fragmentation fallback disposes zone bitmaps. In normal multi-zone execution, cloned zone bitmaps are never released.

Suggested fix direction
 Task.WaitAll(snapTasks);

+foreach (var zone in zones)
+{
+    // Keep shared buffer disposal owned by existing end-of-method cleanup.
+    if (!ReferenceEquals(zone.Item1, filteredImageClean))
+    {
+        zone.Item1?.Dispose();
+    }
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Ocr.cs` around lines 1022 - 1053, The code currently only disposes
zone bitmaps in the "fragmentation fallback" branch, causing cloned bitmaps from
DivideSnapZones to leak in the normal multi-zone path; update the handling
around DivideSnapZones and the zones list so that every Bitmap stored as
zone.Item1 is explicitly disposed after its use (for example, after per-zone OCR
processing or right before replacing the zones list), and ensure you do not
dispose shared originals like filteredImageClean unintentionally; modify the
routines that iterate/process zones (the code using zones returned by
DivideSnapZones and the block that sets snapThreads) to call
zone.Item1?.Dispose() in a finally/cleanup section so all paths release cloned
Bitmaps.
🧹 Nitpick comments (4)
WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs (1)

60-76: Remove unreachable uppercase replacement mappings.

After ToLowerInvariant(), the uppercase .Replace(...) calls never execute meaningfully. This is safe to simplify.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs` around lines 60 - 76,
The code in EuropeanLanguageProcessor (the chain starting with
input.ToLowerInvariant()) performs ToLowerInvariant() then applies uppercase
replacement mappings which are unreachable; remove the redundant uppercase
.Replace calls (those replacing 'À','Á',... 'Ÿ') from the normalization chain
and keep only the lowercase diacritic replacements (and the final
semicolon/return) so Normalize/that method no longer contains unnecessary
uppercase mappings.
WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs (1)

360-363: validKoreanFragments contains unreachable entries.

The filter only consults the list for single-character words, so multi-character fragments in the array are never used.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs` around lines 360 - 363,
validKoreanFragments currently contains multi-character strings but is only
checked when word.Length == 1, making those entries unreachable; either reduce
validKoreanFragments to only single-character fragments or change the predicate
to consult fragments for multi-character words (e.g., replace the (word.Length
== 1 && !validKoreanFragments.Contains(word)) check with a check that uses
validKoreanFragments.Any(f => word.Contains(f)) or similar) so that the
multi-character entries in validKoreanFragments are actually used; update the
logic around the word variable and validKoreanFragments accordingly.
WFInfo/Data.cs (2)

962-1000: Consider lock duration for performance-sensitive scenarios.

The marketItemsLock is held while iterating through marketItems and computing Levenshtein distances (lines 963-1000). For large item databases, this could block other threads attempting to access marketItems.

If OCR processing becomes a bottleneck, consider copying the relevant data outside the lock and performing distance calculations on the copy.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Data.cs` around lines 962 - 1000, Hold the lock only to snapshot the
minimal data needed and do the heavy Levenshtein work outside it: inside the
marketItemsLock (around marketItems) build a local list of the relevant fields
(e.g., a list of tuples of split[0], split[2] for each marketItem where key !=
"version" and split.Length >= 3) and capture the
LanguageProcessorFactory.GetCurrentProcessor() reference if required, then
release the lock and iterate that local copy calling NormalizeForPatternMatching
and SimpleLevenshteinDistance to compute val and update low, lowest,
lowest_unfiltered and multipleLowest; ensure you apply the same pre-filtering
logic (lengthDiff and distance < 50% of string length) on the snapshot so
behavior remains identical.

958-1001: Inconsistent filtering logic between English and non-English branches.

The non-English branch uses different filtering criteria than the English branch:

  1. Length filter (line 978): lengthDiff > split[2].Length / 2 uses integer division and is based only on the stored string length
  2. English branch (line 1008): lengthDiff > Math.Max(prop.Key.Length, name.Length) / 2 uses the maximum of both lengths

This inconsistency could cause different matching behavior depending on locale. For short localized names (e.g., 3 characters), integer division yields a threshold of 1, which may be too restrictive.

♻️ Suggested alignment
-                            // Pre-filter: only check items with reasonable length difference (matching GetLocalizedNameData logic)
-                            int lengthDiff = Math.Abs(split[2].Length - name.Length);
-                            if (lengthDiff > split[2].Length / 2) continue;
+                            // Pre-filter: only check items with reasonable length difference
+                            int lengthDiff = Math.Abs(split[2].Length - name.Length);
+                            if (lengthDiff > Math.Max(split[2].Length, name.Length) / 2) continue;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Data.cs` around lines 958 - 1001, The non-English branch uses an
inconsistent, overly strict length filter (int division against split[2].Length)
— update the lengthDiff check inside the marketItems loop to mirror the English
logic by comparing against Math.Max(englishNameLength, name.Length)/2;
specifically compute englishNameLength from split[0].Length and use
Math.Max(split[0].Length, name.Length) / 2.0 (or equivalent) as the threshold so
the length filter and division semantics match the English branch; keep the rest
of the loop (NormalizeForPatternMatching, SimpleLevenshteinDistance, low/lowest
handling) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/run_tests.bat`:
- Around line 23-29: The TEST_IMAGES_DIR default and existence check are using
the current working directory; change the script to resolve TEST_IMAGES_DIR
relative to the script directory (%~dp0) so relative paths work when run from
elsewhere. Specifically, when TEST_IMAGES_DIR is empty, set it to the
script-dir-relative path for "data" (use the run_tests.bat script's directory
variable) and if TEST_IMAGES_DIR is provided as a relative path, prefix it with
the script directory before the existence check; keep the existence validation
using TEST_IMAGES_DIR so the subsequent echo/exit logic remains the same.

In `@WFInfo/Data.cs`:
- Around line 238-240: ReloadItems currently calls SaveAllJSONs() after updating
only marketItems, which risks overwriting other databases (equipmentData,
relicData, nameData, marketData) with null or stale state; change ReloadItems to
persist only the updated marketItems (e.g., call a new or existing
SaveMarketItems-like routine or modify SaveAllJSONs to accept a target list)
instead of SaveAllJSONs(), and ensure this call occurs after marketItems is
fully updated (and do not touch equipmentData/relicData/nameData/marketData
here); reference ReloadItems, SaveAllJSONs, marketItems and UpdateInner when
making the change.
- Line 982: The code uses processor.SimpleLevenshteinDistance(normalizedName,
normalizedStored) for non-English branches which is inconsistent with the
English branch that uses processor.CalculateLevenshteinDistance; replace the
SimpleLevenshteinDistance call with processor.CalculateLevenshteinDistance so
the same language-aware algorithm and preprocessing are used (call
CalculateLevenshteinDistance with the same inputs used in this
branch—normalizedName/normalizedStored or the raw name/stored variables to match
the English-branch usage).

In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs`:
- Around line 317-318: The global replacement entry {"버", ""} in
KoreanLanguageProcessor should not remove the syllable inside valid names;
change the logic so "버" is stripped only when it appears as a standalone OCR
artifact (e.g., an entire token) or matches an OCR pattern, not when embedded in
a larger string. Locate the replacement map/logic in KoreanLanguageProcessor
(the {"버",""} entry) and replace the unconditional replace with a conditional
rule: either apply a word-boundary/regex removal (e.g., \b버\b) or split on
tokens and drop tokens equal to "버" (and mirror this change for the similar
entries referenced at the other occurrence), so embedded syllables in real part
names are preserved.
- Around line 126-127: The Replace calls on s and t can throw if either is null;
update the code in KoreanLanguageProcessor (where s and t are processed) to
guard against null by using a null-coalescing or explicit check before calling
Replace (e.g. replace s and t with (s ?? "") and (t ?? "") or return/handle null
inputs earlier), then proceed with the existing Replace("설계도","").Replace("
","") and prefixing the leading space so the distance calculation won't crash on
null inputs.

In `@WFInfo/LanguageProcessing/LanguageProcessor.cs`:
- Around line 103-115: IsBlueprintTerm currently does case-sensitive checks and
misses mixed/upper-case OCR variants; update IsBlueprintTerm to perform
case-insensitive matching by normalizing text and each BlueprintRemovals entry
(e.g., use ToLowerInvariant or IndexOf with StringComparison.OrdinalIgnoreCase)
and then check the same patterns (contains, parentheses, starts/ends) against
the normalized text and removal; ensure you reference the BlueprintRemovals
collection and all the existing checks (Contains, StartsWith, EndsWith) so they
use the case-insensitive comparison method.

In `@WFInfo/Ocr.cs`:
- Around line 1096-1099: Null check the processor returned from
LanguageProcessorFactory.GetCurrentProcessor() before calling
processor.ShouldFilterWord(word) in the SnapIt word filtering loop; if
GetCurrentProcessor() returns null, skip filtering or use a safe fallback (e.g.,
treat all words as not filtered) so the OCR flow does not throw. Update the loop
that iterates over words to obtain the processor once, verify processor != null,
and only call ShouldFilterWord when non-null (or replace the call with a
null-safe alternative) to prevent runtime exceptions.

---

Outside diff comments:
In `@WFInfo/Ocr.cs`:
- Around line 1010-1011: The variable numberTooFewCharacters is no longer being
incremented after the refactor, which breaks the noise heuristic that uses it
alongside numberTooLargeButEnoughCharacters; fix by either removing
numberTooFewCharacters and updating the heuristic to only use
numberTooLargeButEnoughCharacters, or (preferred) restore incrementing
numberTooFewCharacters in the same places where lines are filtered for having
too few characters so the noise-count logic (the heuristic that references
numberTooFewCharacters and numberTooLargeButEnoughCharacters) receives correct
values; update all related spots that previously incremented it (see the
filtering logic that now bypasses those increments) and adjust the heuristic
computation accordingly.

---

Duplicate comments:
In `@tests/run_tests.bat`:
- Around line 37-45: Initialize the TIMESTAMP variable to an empty value before
running the PowerShell for /f call so an inherited TIMESTAMP won't prevent the
fallback branch from executing; specifically, add a statement to clear TIMESTAMP
prior to the for /f `powershell -NoProfile -Command "Get-Date ..."` invocation
so the subsequent check if "%TIMESTAMP%"=="" properly detects PowerShell failure
and triggers the fallback logic that constructs TIMESTAMP from %DATE% and
%TIME%.

In `@WFInfo/errorDialogue.xaml.cs`:
- Around line 70-77: The current dedupe uses ToHashSet() which is case-sensitive
and can miss case-variant ZIP root filenames; change the creation of
priorityFileNames and otherDataFileNames to use a case-insensitive comparer
(e.g. StringComparer.OrdinalIgnoreCase) when calling ToHashSet so the subsequent
loop that checks priorityFileNames.Contains(fileName) and
otherDataFileNames.Contains(fileName) correctly treats "File.txt" and "file.txt"
as duplicates; update the code that builds those sets (symbols:
priorityFileNames, otherDataFileNames, files, FileInfo, Path.GetFileName) to use
the case-insensitive HashSet.

In `@WFInfo/Ocr.cs`:
- Around line 1022-1053: The code currently only disposes zone bitmaps in the
"fragmentation fallback" branch, causing cloned bitmaps from DivideSnapZones to
leak in the normal multi-zone path; update the handling around DivideSnapZones
and the zones list so that every Bitmap stored as zone.Item1 is explicitly
disposed after its use (for example, after per-zone OCR processing or right
before replacing the zones list), and ensure you do not dispose shared originals
like filteredImageClean unintentionally; modify the routines that
iterate/process zones (the code using zones returned by DivideSnapZones and the
block that sets snapThreads) to call zone.Item1?.Dispose() in a finally/cleanup
section so all paths release cloned Bitmaps.

---

Nitpick comments:
In `@WFInfo/Data.cs`:
- Around line 962-1000: Hold the lock only to snapshot the minimal data needed
and do the heavy Levenshtein work outside it: inside the marketItemsLock (around
marketItems) build a local list of the relevant fields (e.g., a list of tuples
of split[0], split[2] for each marketItem where key != "version" and
split.Length >= 3) and capture the
LanguageProcessorFactory.GetCurrentProcessor() reference if required, then
release the lock and iterate that local copy calling NormalizeForPatternMatching
and SimpleLevenshteinDistance to compute val and update low, lowest,
lowest_unfiltered and multipleLowest; ensure you apply the same pre-filtering
logic (lengthDiff and distance < 50% of string length) on the snapshot so
behavior remains identical.
- Around line 958-1001: The non-English branch uses an inconsistent, overly
strict length filter (int division against split[2].Length) — update the
lengthDiff check inside the marketItems loop to mirror the English logic by
comparing against Math.Max(englishNameLength, name.Length)/2; specifically
compute englishNameLength from split[0].Length and use Math.Max(split[0].Length,
name.Length) / 2.0 (or equivalent) as the threshold so the length filter and
division semantics match the English branch; keep the rest of the loop
(NormalizeForPatternMatching, SimpleLevenshteinDistance, low/lowest handling)
unchanged.

In `@WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs`:
- Around line 60-76: The code in EuropeanLanguageProcessor (the chain starting
with input.ToLowerInvariant()) performs ToLowerInvariant() then applies
uppercase replacement mappings which are unreachable; remove the redundant
uppercase .Replace calls (those replacing 'À','Á',... 'Ÿ') from the
normalization chain and keep only the lowercase diacritic replacements (and the
final semicolon/return) so Normalize/that method no longer contains unnecessary
uppercase mappings.

In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs`:
- Around line 360-363: validKoreanFragments currently contains multi-character
strings but is only checked when word.Length == 1, making those entries
unreachable; either reduce validKoreanFragments to only single-character
fragments or change the predicate to consult fragments for multi-character words
(e.g., replace the (word.Length == 1 && !validKoreanFragments.Contains(word))
check with a check that uses validKoreanFragments.Any(f => word.Contains(f)) or
similar) so that the multi-character entries in validKoreanFragments are
actually used; update the logic around the word variable and
validKoreanFragments accordingly.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3d74a05 and 983504f.

📒 Files selected for processing (9)
  • WFInfo/Data.cs
  • WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs
  • WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs
  • WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs
  • WFInfo/LanguageProcessing/LanguageProcessor.cs
  • WFInfo/Ocr.cs
  • WFInfo/SnapItOverlay.xaml.cs
  • WFInfo/errorDialogue.xaml.cs
  • tests/run_tests.bat
🚧 Files skipped from review as they are similar to previous changes (1)
  • WFInfo/SnapItOverlay.xaml.cs

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
WFInfo/Ocr.cs (1)

1112-1122: ⚠️ Potential issue | 🟠 Major

Inverted null-check logic causes all words to be filtered when processor is null.

The condition processor != null && !processor.ShouldFilterWord(word) filters out all words when processor is null, because the condition is never true. This is the opposite of the intended safe fallback behavior.

When processor is null, words should be kept (not filtered), since we can't determine which to filter.

🐛 Proposed fix
                     // Filter individual words as intended
                     foreach (var word in words)
                     {
-                        if (processor != null && !processor.ShouldFilterWord(word))
+                        if (processor == null || !processor.ShouldFilterWord(word))
                         {
                             filteredWords.Add(word);
                         }
-                        else if (word.Length <= 3)
+                        else if (processor != null && word.Length <= 3)
                         {
                             numberTooFewCharacters++;
                         }
                     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Ocr.cs` around lines 1112 - 1122, The null-check on processor in the
foreach over words is inverted causing all words to be dropped when processor is
null; change the condition in the loop that currently reads "processor != null
&& !processor.ShouldFilterWord(word)" so that when processor is null words are
treated as not filtered (i.e., keep the word), by either checking "processor ==
null || !processor.ShouldFilterWord(word)" before adding to filteredWords;
update the block that increments numberTooFewCharacters (and preserve the
existing short-word logic) so the behavior of filteredWords,
numberTooFewCharacters, and the use of ShouldFilterWord are correct for both
null and non-null processor cases.
🧹 Nitpick comments (2)
WFInfo/LanguageProcessing/LanguageProcessor.cs (1)

234-264: Consider consolidating duplicate Levenshtein implementations.

SimpleLevenshteinDistance and DefaultLevenshteinDistance (lines 199-229) have identical implementations. Consider having one delegate to the other or extracting a shared private method to reduce code duplication.

♻️ Proposed refactor
         public int SimpleLevenshteinDistance(string s, string t)
         {
-            s = s.ToLower(_culture);
-            t = t.ToLower(_culture);
-            int n = s.Length;
-            int m = t.Length;
-
-            if (n == 0) return m;
-            if (m == 0) return n;
-
-            int[,] d = new int[n + 1, m + 1];
-
-            for (int i = 0; i <= n; i++)
-                d[i, 0] = i;
-
-            for (int j = 0; j <= m; j++)
-                d[0, j] = j;
-
-            for (int i = 1; i <= n; i++)
-            {
-                for (int j = 1; j <= m; j++)
-                {
-                    int cost = (s[i - 1] == t[j - 1]) ? 0 : 1;
-                    d[i, j] = Math.Min(
-                        Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
-                        d[i - 1, j - 1] + cost);
-                }
-            }
-
-            return d[n, m];
+            return DefaultLevenshteinDistance(s, t);
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/LanguageProcessor.cs` around lines 234 - 264,
SimpleLevenshteinDistance duplicates DefaultLevenshteinDistance; remove the
duplication by extracting the shared logic into a single private helper (e.g.,
ComputeLevenshtein or LevenshteinCore) or by having one method call the other
(e.g., DefaultLevenshteinDistance calls SimpleLevenshteinDistance or vice
versa). Ensure the new helper accepts the same parameters (string s, string t)
and uses the existing _culture normalization, then update both
SimpleLevenshteinDistance and DefaultLevenshteinDistance to delegate to that
helper so behavior remains identical and no business logic is changed.
WFInfo/Ocr.cs (1)

711-721: Inconsistent Levenshtein distance threshold between methods.

The floor value differs between ProcessRewardScreen (Line 260: floor of 3) and ProcessSnapIt (Line 715: floor of 6). For a 6-character string:

  • Reward screen: allows distance ≤ 3
  • SnapIt: allows distance ≤ 6

This makes SnapIt significantly more permissive for short part names. Please verify this is intentional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Ocr.cs` around lines 711 - 721, The Levenshtein distance floor is
inconsistent between ProcessRewardScreen and ProcessSnapIt causing different
acceptance for short names; locate the check using levenDist (in ProcessSnapIt)
and the analogous check in ProcessRewardScreen and make their floor logic
consistent—either by using the same Math.Max(part.Name.Length / 2, 3) (or
whichever minimum you choose) in both methods or by extracting a shared
helper/constant (e.g., GetMaxAllowedLevenshtein or MaxLevenThreshold) and use it
when comparing levenDist and 9999 so both methods apply the same minimum floor
for short part names.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs`:
- Line 446: The mapping entry new KeyValuePair<string, string>("류", "hyu") is
incorrect; update the romanization so "류" maps to "ryu" (replace "hyu" with
"ryu" for the ("류", ...) pair) and verify the "휴" entry maps to "hyu" so "hyu"
is used for ("휴", ...); adjust the two KeyValuePair entries accordingly to
correct the romanization.

In `@WFInfo/LanguageProcessing/LanguageProcessor.cs`:
- Around line 18-43: The static field normalizedBlueprintRemovals is shared
across all concrete LanguageProcessor types causing cross-language corruption;
change the caching to be per-concrete-type (e.g., replace the static string[]
normalizedBlueprintRemovals with a static ConcurrentDictionary<Type,string[]> or
ConditionalWeakTable<Type,string[]>) and initialize the entry keyed by GetType()
inside the protected LanguageProcessor constructor using that key; ensure you
populate the per-type normalized array from the instance's BlueprintRemovals
(using ToLowerInvariant) and then read from the per-type cache wherever
normalizedBlueprintRemovals was previously used so each concrete processor
(different BlueprintRemovals) gets its own normalized copy.

In `@WFInfo/Ocr.cs`:
- Around line 1084-1095: The disposal loop currently disposes every zone bitmap
including filteredImageClean, causing a use-after-dispose in GetItemCounts() and
a double-dispose later; modify the logic in the foreach over zones so it does
not dispose filteredImageClean (either by checking reference equality to
filteredImageClean and skipping disposal, or by ensuring filteredImageClean is
removed from zones before this loop), and keep the final disposal of
filteredImageClean at its intended single disposal site (the existing disposal
at Line 1222) so GetItemCounts() can safely use filteredImageClean; references
to update: the zones collection, the foreach disposal block, the
filteredImageClean variable, and GetItemCounts().

---

Duplicate comments:
In `@WFInfo/Ocr.cs`:
- Around line 1112-1122: The null-check on processor in the foreach over words
is inverted causing all words to be dropped when processor is null; change the
condition in the loop that currently reads "processor != null &&
!processor.ShouldFilterWord(word)" so that when processor is null words are
treated as not filtered (i.e., keep the word), by either checking "processor ==
null || !processor.ShouldFilterWord(word)" before adding to filteredWords;
update the block that increments numberTooFewCharacters (and preserve the
existing short-word logic) so the behavior of filteredWords,
numberTooFewCharacters, and the use of ShouldFilterWord are correct for both
null and non-null processor cases.

---

Nitpick comments:
In `@WFInfo/LanguageProcessing/LanguageProcessor.cs`:
- Around line 234-264: SimpleLevenshteinDistance duplicates
DefaultLevenshteinDistance; remove the duplication by extracting the shared
logic into a single private helper (e.g., ComputeLevenshtein or LevenshteinCore)
or by having one method call the other (e.g., DefaultLevenshteinDistance calls
SimpleLevenshteinDistance or vice versa). Ensure the new helper accepts the same
parameters (string s, string t) and uses the existing _culture normalization,
then update both SimpleLevenshteinDistance and DefaultLevenshteinDistance to
delegate to that helper so behavior remains identical and no business logic is
changed.

In `@WFInfo/Ocr.cs`:
- Around line 711-721: The Levenshtein distance floor is inconsistent between
ProcessRewardScreen and ProcessSnapIt causing different acceptance for short
names; locate the check using levenDist (in ProcessSnapIt) and the analogous
check in ProcessRewardScreen and make their floor logic consistent—either by
using the same Math.Max(part.Name.Length / 2, 3) (or whichever minimum you
choose) in both methods or by extracting a shared helper/constant (e.g.,
GetMaxAllowedLevenshtein or MaxLevenThreshold) and use it when comparing
levenDist and 9999 so both methods apply the same minimum floor for short part
names.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 983504f and be71c1a.

📒 Files selected for processing (7)
  • WFInfo/Data.cs
  • WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs
  • WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs
  • WFInfo/LanguageProcessing/LanguageProcessor.cs
  • WFInfo/Ocr.cs
  • WFInfo/errorDialogue.xaml.cs
  • tests/run_tests.bat
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/run_tests.bat

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
WFInfo/Data.cs (1)

1063-1084: ⚠️ Potential issue | 🟡 Minor

Potential null reference if GetLocaleNameData returns null.

For non-English locales, GetLocaleNameData(name, false) could return null if no match is found. This would cause a NullReferenceException when LevenshteinDistance is called at lines 1071 and 1084.

🛡️ Proposed fix to handle null resolvedName
             // Resolve OCR text to English once before loops to avoid repeated expensive database searches
             // Only resolve for non-English locales to avoid regression in English
             string resolvedName = _settings.Locale == "en" ? name : GetLocaleNameData(name, false);
+            // Fall back to original name if resolution failed
+            if (string.IsNullOrEmpty(resolvedName))
+            {
+                resolvedName = name;
+            }
             foreach (KeyValuePair<string, JToken> prop in nameData)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Data.cs` around lines 1063 - 1084, GetLocaleNameData(name, false) can
return null for non-English locales, causing LevenshteinDistance to throw when
passed resolvedName; update the logic around resolvedName (set by
GetLocaleNameData) to guard against null by falling back to the original OCR
string (name) before using it in comparisons and LevenshteinDistance calls
(references: _settings.Locale, resolvedName, GetLocaleNameData,
LevenshteinDistance, name). Ensure resolvedName is non-null prior to the first
foreach and again before the second LevenshteinDistance loop so both usages use
the fallback if needed.
🧹 Nitpick comments (3)
WFInfo/LanguageProcessing/LanguageProcessor.cs (2)

25-28: Empty static constructor can be removed.

The static constructor at lines 25-28 is empty and serves no purpose. Static field initializers (_normalizedBlueprintRemovalsCache) run before the static constructor regardless.

♻️ Suggested removal
-        static LanguageProcessor()
-        {
-            // Will be initialized when first concrete instance is created
-        }
-
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/LanguageProcessor.cs` around lines 25 - 28, Remove
the empty static constructor from the LanguageProcessor class: delete the static
LanguageProcessor() { /* ... */ } block since it does nothing and static field
initializers like _normalizedBlueprintRemovalsCache already run before type
initialization; ensure no static initialization logic is left only in that
constructor before removal.

136-145: Minor redundancy in blueprint term checks.

The Contains($"({normalizedRemoval})") check on line 139 is redundant since Contains(normalizedRemoval) on line 138 would already match any string containing the removal term, including when wrapped in parentheses.

♻️ Simplified condition
             for (int i = 0; i < normalizedBlueprintRemovals.Length; i++)
             {
                 string normalizedRemoval = normalizedBlueprintRemovals[i];
                 if (normalizedText.Contains(normalizedRemoval) ||
-                    normalizedText.Contains($"({normalizedRemoval})") ||
                     normalizedText.StartsWith($"({normalizedRemoval}") ||
                     normalizedText.EndsWith($"{normalizedRemoval})"))
                 {
                     return true;
                 }
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/LanguageProcessor.cs` around lines 136 - 145, The
conditional contains a redundant check: remove the
Contains($"({normalizedRemoval})") clause inside the loop that iterates
normalizedBlueprintRemovals so the logic uses
normalizedText.Contains(normalizedRemoval) and the surrounding
StartsWith/EndsWith parenthesis checks only; update the conditional in the
method that references normalizedText, normalizedRemoval, and
normalizedBlueprintRemovals to eliminate the duplicate
Contains($"({normalizedRemoval})") term while keeping the other checks intact.
WFInfo/Data.cs (1)

1044-1048: Consider documenting or generalizing the Gara/Ivara workaround.

This hardcoded special case handles OCR confusion between "Gara" and "Ivara" for English. If similar OCR ambiguities exist in other languages, they won't be handled by this workaround since it's only in the English path. Consider either:

  1. Adding a comment explaining why this specific workaround exists
  2. Moving this logic into the language processor for potential extension to other locales
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Data.cs` around lines 1044 - 1048, The hardcoded Gara/Ivara workaround
inside the block that sets lowest and lowest_unfiltered (checking val == low &&
lowest.StartsWith("Gara") && prop.Key.StartsWith("Ivara")) needs clarification
and future-proofing: add a concise comment explaining the OCR ambiguity and why
this specific check exists, and refactor the conditional into a dedicated,
language-aware helper (e.g., a method on the language processor like
ResolveOcrAmbiguity or NormalizeOperatorName) so the logic can be extended for
other locales; update usages to call that helper instead of keeping the
special-case inline.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/run_tests.bat`:
- Around line 26-30: The findstr regex currently used to detect absolute Windows
paths fails because echo outputs the value in quotes; update the pattern used
when testing %TEST_IMAGES_DIR% so it accounts for the leading quote (e.g., match
"^\".:\\") and thereby correctly detect quoted absolute paths like "C:\...";
modify the findstr invocation that uses the pattern ^.\:\\.* to use a pattern
that begins with a quote before the drive letter and backslash so the branch
that prefixes %SCRIPT_DIR% runs only for relative paths.

---

Outside diff comments:
In `@WFInfo/Data.cs`:
- Around line 1063-1084: GetLocaleNameData(name, false) can return null for
non-English locales, causing LevenshteinDistance to throw when passed
resolvedName; update the logic around resolvedName (set by GetLocaleNameData) to
guard against null by falling back to the original OCR string (name) before
using it in comparisons and LevenshteinDistance calls (references:
_settings.Locale, resolvedName, GetLocaleNameData, LevenshteinDistance, name).
Ensure resolvedName is non-null prior to the first foreach and again before the
second LevenshteinDistance loop so both usages use the fallback if needed.

---

Nitpick comments:
In `@WFInfo/Data.cs`:
- Around line 1044-1048: The hardcoded Gara/Ivara workaround inside the block
that sets lowest and lowest_unfiltered (checking val == low &&
lowest.StartsWith("Gara") && prop.Key.StartsWith("Ivara")) needs clarification
and future-proofing: add a concise comment explaining the OCR ambiguity and why
this specific check exists, and refactor the conditional into a dedicated,
language-aware helper (e.g., a method on the language processor like
ResolveOcrAmbiguity or NormalizeOperatorName) so the logic can be extended for
other locales; update usages to call that helper instead of keeping the
special-case inline.

In `@WFInfo/LanguageProcessing/LanguageProcessor.cs`:
- Around line 25-28: Remove the empty static constructor from the
LanguageProcessor class: delete the static LanguageProcessor() { /* ... */ }
block since it does nothing and static field initializers like
_normalizedBlueprintRemovalsCache already run before type initialization; ensure
no static initialization logic is left only in that constructor before removal.
- Around line 136-145: The conditional contains a redundant check: remove the
Contains($"({normalizedRemoval})") clause inside the loop that iterates
normalizedBlueprintRemovals so the logic uses
normalizedText.Contains(normalizedRemoval) and the surrounding
StartsWith/EndsWith parenthesis checks only; update the conditional in the
method that references normalizedText, normalizedRemoval, and
normalizedBlueprintRemovals to eliminate the duplicate
Contains($"({normalizedRemoval})") term while keeping the other checks intact.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 983504f and 804402e.

📒 Files selected for processing (7)
  • WFInfo/Data.cs
  • WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs
  • WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs
  • WFInfo/LanguageProcessing/LanguageProcessor.cs
  • WFInfo/Ocr.cs
  • WFInfo/errorDialogue.xaml.cs
  • tests/run_tests.bat
🚧 Files skipped from review as they are similar to previous changes (1)
  • WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (5)
WFInfo/Main.cs (1)

265-266: Good defensive null-check, but inconsistent with other dispatcher usages.

The null-check prevents NullReferenceException in headless/test scenarios. However, similar dispatcher calls elsewhere in this file lack the same protection:

  • Line 98: UpdateMarketStatusAsync - MainWindow.INSTANCE.Dispatcher.InvokeAsync(...)
  • Line 545: LoggedIn() - MainWindow.INSTANCE.Dispatcher.Invoke(...)
  • Line 615: SignOut() - MainWindow.INSTANCE.Dispatcher.Invoke(...)

Consider applying the same pattern consistently to avoid NullReferenceException in those code paths during headless execution.

♻️ Suggested fix for UpdateMarketStatusAsync (Line 98)
         // Use async UI dispatcher call
-        await MainWindow.INSTANCE.Dispatcher.InvokeAsync(() =>
+        if (MainWindow.INSTANCE?.Dispatcher != null)
         {
-            MainWindow.INSTANCE.UpdateMarketStatus(msg);
-        });
+            await MainWindow.INSTANCE.Dispatcher.InvokeAsync(() =>
+            {
+                MainWindow.INSTANCE.UpdateMarketStatus(msg);
+            });
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Main.cs` around lines 265 - 266, Several Dispatcher.Invoke/InvokeAsync
calls lack the defensive null-check used elsewhere and can throw in
headless/test runs; update the calls in UpdateMarketStatusAsync, LoggedIn, and
SignOut to use the same pattern as the existing safe call around
MainWindow.INSTANCE.Dispatcher (e.g., check MainWindow.INSTANCE?.Dispatcher !=
null or use the safe navigation to the Dispatcher before calling
Invoke/InvokeAsync) so each call only executes when the Dispatcher is present;
locate the calls by the method names UpdateMarketStatusAsync, LoggedIn, and
SignOut and apply the null-guard consistently.
WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs (1)

52-80: Consider extracting shared logic to reduce duplication.

ShouldFilterWord implementations are nearly identical between SimplifiedChineseLanguageProcessor and TraditionalChineseLanguageProcessor. Consider extracting to a shared base class or helper method.

♻️ Optional refactor to reduce duplication
+    /// <summary>
+    /// Base class for Chinese language processors with shared logic
+    /// </summary>
+    public abstract class ChineseLanguageProcessorBase : LanguageProcessor
+    {
+        protected ChineseLanguageProcessorBase(IReadOnlyApplicationSettings settings) : base(settings) { }
+        
+        public override bool ShouldFilterWord(string word)
+        {
+            if (string.IsNullOrEmpty(word)) return true;
+            
+            bool hasCJK = ContainsCJK(word);
+            bool hasLatin = word.Any(c => (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'));
+            
+            if (hasCJK && !hasLatin) return false;
+            if (hasLatin && !hasCJK) return word.Length <= 2;
+            if (hasCJK && hasLatin && word.Length <= 2) return true;
+            
+            return false;
+        }
+        
+        public static bool ContainsCJK(string text) { /* existing impl */ }
+    }

Also applies to: 150-178

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs` around lines 52 - 80,
The ShouldFilterWord logic is duplicated across
SimplifiedChineseLanguageProcessor and TraditionalChineseLanguageProcessor;
extract the shared logic into a single protected method (e.g., a protected bool
FilterWordCore(string word) or into a static helper class) and have both
ShouldFilterWord implementations delegate to it; reuse the existing ContainsCJK
helper from ChineseLanguageProcessor and preserve the exact rule checks for pure
CJK, pure Latin, and mixed cases so behavior remains unchanged while removing
duplicate code.
tests/README.md (1)

17-27: Add language specifier to fenced code block.

The directory structure code block lacks a language specifier. While this doesn't affect functionality, adding one improves syntax highlighting and satisfies linting.

📝 Suggested fix
-```
+```text
 tests/
 ├── map.json           # Lists scenarios to run
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/README.md` around lines 17 - 27, The fenced code block in
tests/README.md presenting the directory tree is missing a language specifier;
update the opening fence (the line with ```) to include a language token such as
text or console (e.g., change ``` to ```text) so the block is labeled for syntax
highlighting; look for the directory-tree fenced block (the one containing
"tests/ ├── map.json ...") and add the language specifier to the opening fence.
WFInfo/Services/TesseractService.cs (1)

226-268: Consider defensive handling for unsupported locales in checksum lookup.

Line 255 calls traineddata_checksums.GetValue(Locale).ToObject<string>() which will throw if the locale is not in the checksums dictionary. While the UI dropdown gates valid locale selection, a defensive check would improve robustness for edge cases (manual config edits, future locale additions).

🛡️ Optional defensive guard
+            var checksumToken = traineddata_checksums.GetValue(Locale);
+            if (checksumToken == null)
+            {
+                Main.AddLog($"No traineddata checksum configured for locale '{Locale}'. Skipping download.");
+                return;
+            }
+
-            if (!File.Exists(app_data_traineddata_path) || CustomEntrypoint.GetMD5hash(app_data_traineddata_path) != traineddata_checksums.GetValue(Locale).ToObject<string>())
+            if (!File.Exists(app_data_traineddata_path) || CustomEntrypoint.GetMD5hash(app_data_traineddata_path) != checksumToken.ToObject<string>())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Services/TesseractService.cs` around lines 226 - 268, The
getLocaleTessdata method assumes traineddata_checksums contains Locale and calls
traineddata_checksums.GetValue(Locale).ToObject<string>(), which will throw for
unsupported locales; change it to first check for the key (e.g., use
traineddata_checksums.TryGetValue / traineddata_checksums[Locale] != null or
JToken token = traineddata_checksums.GetValue(Locale)) and only compute
MD5/compare if the token exists, otherwise skip download (or log a warning) so
unsupported/unknown Locale values are handled defensively; reference
getLocaleTessdata, traineddata_checksums, Locale, and the MD5 comparison block
to apply this guard.
WFInfo/Ocr.cs (1)

1029-1165: Consider removing or wiring the new OCR scoring helpers.

AnalyzeOCRQuality, SelectBestPSM, and CalculateVariance are currently unused, which adds maintenance surface without runtime benefit.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Ocr.cs` around lines 1029 - 1165, These three helpers
(AnalyzeOCRQuality, SelectBestPSM, CalculateVariance) are unused and should
either be removed or wired into the OCR flow: either delete them to avoid dead
code, or call them from the PSM selection path by (1) collecting modeResults
(Dictionary<PageSegMode, List<Tuple<string, Rectangle>>>) after running OCR for
each PageSegMode, (2) computing modeScores by calling
AnalyzeOCRQuality(textLines, mode, imageWidth, imageHeight) for each mode, and
(3) replacing the current PSM pick with SelectBestPSM(modeResults, modeScores);
keep CalculateVariance as-is because AnalyzeOCRQuality uses it, and add or
update unit tests for the new selection behavior if you choose to wire them in.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@WFInfo/Data.cs`:
- Around line 983-1009: The current non-English branch assumes split[2] is
localized to _settings.Locale but may be stale if cached marketItems were built
under a different locale; modify the logic in the block that builds
marketItemsSnapshot (inside the lock using marketItems, marketItemsLock,
processor.NormalizeForPatternMatching, and split[]) to first detect the cache
locale/version (e.g., a metadata entry in marketItems like "locale" or a version
string) and if it does not match _settings.Locale either trigger/flag a reload
of market_items.json or skip using split[2] for localized matching and fall back
to English split[0]; ensure the code uses the detected metadata to decide
whether to use processor-normalized split[2] or to use split[0] so matching
remains correct until a fresh locale-specific cache is loaded.
- Around line 443-445: Before writing the body to the fallback cache, validate
the deserialized payload (the JObject returned by
JsonConvert.DeserializeObject<JObject>(body)) to ensure it matches the expected
shape (non-null and contains required keys/structure for an item payload); only
call File.WriteAllText(localeSpecificFallbackPath, body) when the validation
passes; if validation fails, do not overwrite the existing fallback file—return
without changing the cache or return the previous cached data—so update the code
around the variables data, body, and localeSpecificFallbackPath to perform this
check and skip the write when invalid.

In `@WFInfo/Ocr.cs`:
- Around line 1022-1025: The catch block in WFInfo/Ocr.cs that currently
swallows exceptions must be changed to capture and log the exception instead of
silently returning empty results: change the bare catch to catch(Exception ex)
and use the module's existing logging facility (e.g., _logger.Error /
Logger.Error / Trace.WriteLine) to emit a descriptive message including
ex.Message and ex.ToString() and the context (method name where OCR extraction
occurs), then either rethrow or return a clearly documented failure result;
update the OCR extraction method (the try/catch around the OCR extraction code)
to preserve telemetry for debugging.

In `@WFInfo/Services/TesseractService.cs`:
- Around line 120-186: The CreateEngine method sets an unsupported Tesseract
variable "smooth_scaling_factor" in two places (CJK branch and English branch);
remove both engine.SetVariable("smooth_scaling_factor", "...") calls (or replace
them with a supported parameter if smoothing is required) and keep all other
SetVariable calls unchanged so the engine no longer attempts to set an
unrecognized option.

In `@WFInfo/Tests/OCRTestRunner.cs`:
- Around line 230-253: CompareResults uses HashSet-based membership which drops
multiplicity and never treats ExtraParts as a failure; update it to perform
multiset-aware comparisons by counting occurrences from result.ExpectedParts and
result.ActualParts (e.g., dictionaries/counters) so duplicates are detected as
missing or extra items, populate result.MissingParts and result.ExtraParts based
on differing counts rather than set membership, compute matched by summing
min(expectedCount, actualCount) across keys to derive AccuracyScore, and make
result.Success require both MissingParts.Count == 0 and ExtraParts.Count == 0
and no ErrorMessage.
- Around line 69-72: Resolve scenarioPath against a canonical testMapDir and
reject any resulting path that escapes that base: compute string baseDir =
Path.GetFullPath(testMapDir), then build jsonFull =
Path.GetFullPath(Path.Combine(baseDir, scenarioPath + ".json")) and imageFull =
Path.GetFullPath(Path.Combine(baseDir, scenarioPath + ".png")); verify each path
is either equal to baseDir or starts with baseDir + Path.DirectorySeparatorChar
(use case-insensitive comparison on Windows) and throw an exception or fail the
test if the check fails; then assign jsonPath = jsonFull, imagePath = imageFull
and keep testName = Path.GetFileName(scenarioPath).

---

Nitpick comments:
In `@tests/README.md`:
- Around line 17-27: The fenced code block in tests/README.md presenting the
directory tree is missing a language specifier; update the opening fence (the
line with ```) to include a language token such as text or console (e.g., change
``` to ```text) so the block is labeled for syntax highlighting; look for the
directory-tree fenced block (the one containing "tests/ ├── map.json ...") and
add the language specifier to the opening fence.

In `@WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs`:
- Around line 52-80: The ShouldFilterWord logic is duplicated across
SimplifiedChineseLanguageProcessor and TraditionalChineseLanguageProcessor;
extract the shared logic into a single protected method (e.g., a protected bool
FilterWordCore(string word) or into a static helper class) and have both
ShouldFilterWord implementations delegate to it; reuse the existing ContainsCJK
helper from ChineseLanguageProcessor and preserve the exact rule checks for pure
CJK, pure Latin, and mixed cases so behavior remains unchanged while removing
duplicate code.

In `@WFInfo/Main.cs`:
- Around line 265-266: Several Dispatcher.Invoke/InvokeAsync calls lack the
defensive null-check used elsewhere and can throw in headless/test runs; update
the calls in UpdateMarketStatusAsync, LoggedIn, and SignOut to use the same
pattern as the existing safe call around MainWindow.INSTANCE.Dispatcher (e.g.,
check MainWindow.INSTANCE?.Dispatcher != null or use the safe navigation to the
Dispatcher before calling Invoke/InvokeAsync) so each call only executes when
the Dispatcher is present; locate the calls by the method names
UpdateMarketStatusAsync, LoggedIn, and SignOut and apply the null-guard
consistently.

In `@WFInfo/Ocr.cs`:
- Around line 1029-1165: These three helpers (AnalyzeOCRQuality, SelectBestPSM,
CalculateVariance) are unused and should either be removed or wired into the OCR
flow: either delete them to avoid dead code, or call them from the PSM selection
path by (1) collecting modeResults (Dictionary<PageSegMode, List<Tuple<string,
Rectangle>>>) after running OCR for each PageSegMode, (2) computing modeScores
by calling AnalyzeOCRQuality(textLines, mode, imageWidth, imageHeight) for each
mode, and (3) replacing the current PSM pick with SelectBestPSM(modeResults,
modeScores); keep CalculateVariance as-is because AnalyzeOCRQuality uses it, and
add or update unit tests for the new selection behavior if you choose to wire
them in.

In `@WFInfo/Services/TesseractService.cs`:
- Around line 226-268: The getLocaleTessdata method assumes
traineddata_checksums contains Locale and calls
traineddata_checksums.GetValue(Locale).ToObject<string>(), which will throw for
unsupported locales; change it to first check for the key (e.g., use
traineddata_checksums.TryGetValue / traineddata_checksums[Locale] != null or
JToken token = traineddata_checksums.GetValue(Locale)) and only compute
MD5/compare if the token exists, otherwise skip download (or log a warning) so
unsupported/unknown Locale values are handled defensively; reference
getLocaleTessdata, traineddata_checksums, Locale, and the MD5 comparison block
to apply this guard.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 983504f and 6cb476c.

📒 Files selected for processing (15)
  • WFInfo/CustomEntrypoint.cs
  • WFInfo/Data.cs
  • WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs
  • WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs
  • WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs
  • WFInfo/LanguageProcessing/LanguageProcessor.cs
  • WFInfo/Main.cs
  • WFInfo/Ocr.cs
  • WFInfo/Services/TesseractService.cs
  • WFInfo/Tests/OCRTestRunner.cs
  • WFInfo/Tests/TestModels.cs
  • WFInfo/Tests/TestProgram.cs
  • WFInfo/errorDialogue.xaml.cs
  • tests/README.md
  • tests/run_tests.bat
🚧 Files skipped from review as they are similar to previous changes (3)
  • WFInfo/CustomEntrypoint.cs
  • WFInfo/errorDialogue.xaml.cs
  • WFInfo/Tests/TestModels.cs

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (6)
WFInfo/Main.cs (1)

266-267: Consider centralizing dispatcher invocation logic.

The same guard+invoke pattern is repeated across multiple methods. A shared helper would reduce duplication and keep behavior consistent.

♻️ Proposed refactor
+        private static bool TryInvokeMainWindow(Action<MainWindow> action)
+        {
+            var window = MainWindow.INSTANCE;
+            if (window?.Dispatcher == null) return false;
+            window.Dispatcher.Invoke(() => action(window));
+            return true;
+        }

         public static void RunOnUIThread(Action act)
         {
-            if (MainWindow.INSTANCE?.Dispatcher != null)
-                MainWindow.INSTANCE.Dispatcher.Invoke(act);
+            _ = TryInvokeMainWindow(_ => act());
         }

         public static void StatusUpdate(string message, int severity)
         {
-            if (MainWindow.INSTANCE?.Dispatcher != null)
-                MainWindow.INSTANCE.Dispatcher.Invoke(() => { MainWindow.INSTANCE.ChangeStatus(message, severity); });
+            _ = TryInvokeMainWindow(window => window.ChangeStatus(message, severity));
         }

         public void LoggedIn()
         {
-            if (MainWindow.INSTANCE?.Dispatcher != null)
-                MainWindow.INSTANCE.Dispatcher.Invoke(() => { MainWindow.INSTANCE.LoggedIn(); });
+            _ = TryInvokeMainWindow(window => window.LoggedIn());
             // start the AFK timer...
         }

         public static void SignOut()
         {
-            if (MainWindow.INSTANCE?.Dispatcher != null)
-                MainWindow.INSTANCE.Dispatcher.Invoke(() => { MainWindow.INSTANCE.SignOut(); });
+            _ = TryInvokeMainWindow(window => window.SignOut());
         }

Also applies to: 303-304, 546-547, 617-618

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Main.cs` around lines 266 - 267, Create a single helper to centralize
the guard+invoke pattern (e.g., a static method InvokeOnUI(Action act) placed on
MainWindow or a UI helper class) and replace all occurrences of "if
(MainWindow.INSTANCE?.Dispatcher != null)
MainWindow.INSTANCE.Dispatcher.Invoke(act);" with that helper; the helper should
check MainWindow.INSTANCE and its Dispatcher, and call Dispatcher.Invoke (or
BeginInvoke if non-blocking behavior is preferred) to preserve current semantics
so all call sites like the current uses of
MainWindow.INSTANCE.Dispatcher.Invoke(act) use the consistent centralized logic.
WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs (1)

419-423: Move replacement table to static precomputed data.

Line [419]-Line [483] rebuilds and re-sorts a large mapping on every call to NormalizeKoreanCharacters. This is avoidable overhead in a hot path.

♻️ Refactor direction
+        private static readonly KeyValuePair<string, string>[] KoreanReplacements =
+            BuildKoreanReplacements()
+                .OrderByDescending(r => r.Key.Length)
+                .ToArray();
@@
-            var replacements = new List<KeyValuePair<string, string>>
-            {
-                // huge mapping...
-            };
-            
             string result = input;
-            foreach (var replacement in replacements.OrderByDescending(r => r.Key.Length))
+            foreach (var replacement in KoreanReplacements)
             {
                 result = result.Replace(replacement.Key, replacement.Value);
             }

Also applies to: 480-483

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs` around lines 419 - 423,
The replacements list is being rebuilt and sorted on every call to
NormalizeKoreanCharacters causing unnecessary allocation and CPU work; extract
the mapping into a static readonly precomputed field (e.g., a static readonly
List<KeyValuePair<string,string>> or an immutable array) and initialize and sort
it once (static constructor or inline static initializer), then have
NormalizeKoreanCharacters reference that static replacements collection instead
of creating a local one; update any other local usages (the variable named
replacements around lines 419 and similar block at 480-483) to use the static
field.
WFInfo/Data.cs (1)

989-993: Drop the unused locale-resolution call in the non-English path.

Line [991]-Line [993] computes resolvedName via GetLocaleNameData(...) for non-English locales, but that value is not used in the non-English branch (Line [995]+). This adds unnecessary lookup/lock overhead in a hot path.

♻️ Proposed cleanup
-            // Resolve OCR text to English once before loops to avoid repeated expensive database searches
-            // Only resolve for non-English locales to avoid regression in English
-            string resolvedName = _settings.Locale == "en" ? name : GetLocaleNameData(name, false);
-            resolvedName = resolvedName ?? name; // Fallback to original OCR string if resolution fails
+            // Keep the original OCR text for direct matching paths.

If you still want resolvedName, compute it only inside the English branch where it is actually used.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Data.cs` around lines 989 - 993, The code performs a locale-resolution
call into GetLocaleNameData for non-English paths but never uses the result;
remove that unnecessary lookup by only assigning resolvedName (with fallback to
name) inside the English branch where resolvedName is actually consumed, keep
the existing conditional check on _settings.Locale and preserve the fallback
behavior (resolvedName = name) and null-safety around GetLocaleNameData when
computed.
WFInfo/Ocr.cs (1)

779-812: Consider consolidating repetitive try-catch blocks.

The three consecutive try-catch blocks for IsPartVaulted, IsPartMastered, and PartsOwned follow an identical pattern. While functionally correct, this could be refactored to reduce duplication.

♻️ Optional: Extract helper method
private static T SafeCall<T>(Func<T> func, T defaultValue, string operationName, string itemName)
{
    try
    {
        return func();
    }
    catch (Exception ex)
    {
        Main.AddLog($"ERROR: {operationName} failed for '{itemName}': {ex.Message}");
        return defaultValue;
    }
}

// Usage:
bool vaulted = SafeCall(() => Main.dataBase.IsPartVaulted(name), false, "IsPartVaulted", name);
bool mastered = SafeCall(() => Main.dataBase.IsPartMastered(name), false, "IsPartMastered", name);
string partsOwned = SafeCall(() => Main.dataBase.PartsOwned(name), "0", "PartsOwned", name);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Ocr.cs` around lines 779 - 812, Replace the three repetitive try/catch
blocks around Main.dataBase.IsPartVaulted(name),
Main.dataBase.IsPartMastered(name) and Main.dataBase.PartsOwned(name) with a
single reusable helper (eg. SafeCall<T>) that accepts a Func<T>, a default
value, an operation name and the item name; have SafeCall invoke the func, catch
Exception, call Main.AddLog($"ERROR: {operationName} failed for '{itemName}':
{ex.Message}") and return the default on error, then call SafeCall for vaulted,
mastered and partsOwned (default false/false/"0") to remove duplication and keep
the same logging/behavior.
WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs (1)

28-44: Consider extracting shared normalization logic to reduce duplication.

NormalizeForPatternMatching is nearly identical in both SimplifiedChineseLanguageProcessor and TraditionalChineseLanguageProcessor. Similarly, IsPartNameValid and NormalizeChineseCharacters are duplicated.

♻️ Optional: Create shared base class
public abstract class ChineseLanguageProcessorBase : LanguageProcessor
{
    protected ChineseLanguageProcessorBase(IReadOnlyApplicationSettings settings) : base(settings) { }

    public override string CharacterWhitelist => 
        GenerateCharacterRange(0x4E00, 0x9FFF) + 
        GenerateCharacterRange(0x3400, 0x4DBF) + 
        GenerateCharacterRange(0xF900, 0xFAFF) + 
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ";

    public override string NormalizeForPatternMatching(string input)
    {
        if (string.IsNullOrEmpty(input)) return input;
        string normalized = input.ToLower(_culture).Trim();
        normalized = normalized.Replace("prime", " prime ");
        normalized = RemoveAccents(normalized);
        var parts = normalized.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", parts);
    }

    public override bool IsPartNameValid(string partName) =>
        !string.IsNullOrEmpty(partName) && partName.Replace(" ", "").Length >= 4;

    public override bool ShouldFilterWord(string word) =>
        SimplifiedChineseLanguageProcessor.FilterWordCore(word);

    protected static string NormalizeChineseCharacters(string input) =>
        NormalizeFullWidthCharacters(input).ToLowerInvariant();
}

Also applies to: 134-150

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs` around lines 28 - 44,
The SimplifiedChineseLanguageProcessor and TraditionalChineseLanguageProcessor
duplicate NormalizeForPatternMatching, IsPartNameValid, and
NormalizeChineseCharacters; extract these shared behaviors into a new abstract
base class (e.g., ChineseLanguageProcessorBase : LanguageProcessor) that
provides a constructor accepting IReadOnlyApplicationSettings, implements
CharacterWhitelist, NormalizeForPatternMatching, IsPartNameValid,
ShouldFilterWord (delegating to any existing core filter method like
FilterWordCore), and a protected NormalizeChineseCharacters helper; then have
both SimplifiedChineseLanguageProcessor and TraditionalChineseLanguageProcessor
inherit from ChineseLanguageProcessorBase and remove the duplicated
implementations so they only contain language-specific overrides or additions.
WFInfo/LanguageProcessing/LanguageProcessor.cs (1)

336-344: Consider potential memory allocation for large Unicode ranges.

GenerateCharacterRange allocates a char[] for the entire range. For very large ranges (e.g., CJK blocks with ~20,000+ characters), this creates significant memory pressure. Since whitelists are typically created once at startup, this is acceptable but worth noting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/LanguageProcessor.cs` around lines 336 - 344,
GenerateCharacterRange currently allocates a char[] for the entire Unicode range
which can spike memory for very large ranges; replace or augment it by either
(A) adding a sanity check constant (e.g., MaxGeneratedRangeSize) and throw
ArgumentOutOfRangeException when (end - start + 1) exceeds that limit to prevent
accidental huge allocations in GenerateCharacterRange, or (B) add a streaming
alternative method GenerateCharacterRangeIterator(int start, int end) that
yields chars with yield return (no large array allocation) and update callers to
use the iterator for large ranges; reference the existing GenerateCharacterRange
method name when making the change so callers can be located and migrated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@WFInfo/Data.cs`:
- Around line 444-457: The code validates the JSON payload but still returns
(data, false) on invalid shape, which lets malformed data reach ReloadItems;
instead, after logging the invalid payload (use Main.AddLog and mention
wfmItemsUrl and localeSpecificFallbackPath), throw an exception (e.g.,
InvalidDataException or HttpRequestException) so the outer catch path will run
and the fallback file will be used, or alternatively return a clearly failing
sentinel (such as null with a flag that the caller interprets as error) — update
the return behavior in this method to not return the malformed data but to
trigger the fallback handling in ReloadItems.

In `@WFInfo/Tests/OCRTestRunner.cs`:
- Around line 186-198: The OCR engines are only reinitialized when
localeChanged, so changes to HDR (testCase.HDR) for the same locale leave the
previous HeadlessHDRDetector in place; update the condition to also trigger
reinitialization when the HDR setting changed (e.g., compute hdrChanged by
comparing the new HDR value against the currently active detector/settings) and
call OCR.InitForTest(...) with new HeadlessHDRDetector(testCase.HDR) when
localeChanged || hdrChanged, and still call
Main.dataBase.ReloadItems().GetAwaiter().GetResult() after reinitialization.

In `@WFInfo/Tests/TestProgram.cs`:
- Around line 20-24: The Main invocation path currently prints usage and returns
as success; update the code in TestProgram.cs (within the Main method around the
args.Length check that calls PrintUsage()) to set a non-zero exit code before
returning—e.g., assign Environment.ExitCode = 1 or call Environment.Exit(1)—so
invalid invocations are reported as failure instead of success.

---

Nitpick comments:
In `@WFInfo/Data.cs`:
- Around line 989-993: The code performs a locale-resolution call into
GetLocaleNameData for non-English paths but never uses the result; remove that
unnecessary lookup by only assigning resolvedName (with fallback to name) inside
the English branch where resolvedName is actually consumed, keep the existing
conditional check on _settings.Locale and preserve the fallback behavior
(resolvedName = name) and null-safety around GetLocaleNameData when computed.

In `@WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs`:
- Around line 28-44: The SimplifiedChineseLanguageProcessor and
TraditionalChineseLanguageProcessor duplicate NormalizeForPatternMatching,
IsPartNameValid, and NormalizeChineseCharacters; extract these shared behaviors
into a new abstract base class (e.g., ChineseLanguageProcessorBase :
LanguageProcessor) that provides a constructor accepting
IReadOnlyApplicationSettings, implements CharacterWhitelist,
NormalizeForPatternMatching, IsPartNameValid, ShouldFilterWord (delegating to
any existing core filter method like FilterWordCore), and a protected
NormalizeChineseCharacters helper; then have both
SimplifiedChineseLanguageProcessor and TraditionalChineseLanguageProcessor
inherit from ChineseLanguageProcessorBase and remove the duplicated
implementations so they only contain language-specific overrides or additions.

In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs`:
- Around line 419-423: The replacements list is being rebuilt and sorted on
every call to NormalizeKoreanCharacters causing unnecessary allocation and CPU
work; extract the mapping into a static readonly precomputed field (e.g., a
static readonly List<KeyValuePair<string,string>> or an immutable array) and
initialize and sort it once (static constructor or inline static initializer),
then have NormalizeKoreanCharacters reference that static replacements
collection instead of creating a local one; update any other local usages (the
variable named replacements around lines 419 and similar block at 480-483) to
use the static field.

In `@WFInfo/LanguageProcessing/LanguageProcessor.cs`:
- Around line 336-344: GenerateCharacterRange currently allocates a char[] for
the entire Unicode range which can spike memory for very large ranges; replace
or augment it by either (A) adding a sanity check constant (e.g.,
MaxGeneratedRangeSize) and throw ArgumentOutOfRangeException when (end - start +
1) exceeds that limit to prevent accidental huge allocations in
GenerateCharacterRange, or (B) add a streaming alternative method
GenerateCharacterRangeIterator(int start, int end) that yields chars with yield
return (no large array allocation) and update callers to use the iterator for
large ranges; reference the existing GenerateCharacterRange method name when
making the change so callers can be located and migrated.

In `@WFInfo/Main.cs`:
- Around line 266-267: Create a single helper to centralize the guard+invoke
pattern (e.g., a static method InvokeOnUI(Action act) placed on MainWindow or a
UI helper class) and replace all occurrences of "if
(MainWindow.INSTANCE?.Dispatcher != null)
MainWindow.INSTANCE.Dispatcher.Invoke(act);" with that helper; the helper should
check MainWindow.INSTANCE and its Dispatcher, and call Dispatcher.Invoke (or
BeginInvoke if non-blocking behavior is preferred) to preserve current semantics
so all call sites like the current uses of
MainWindow.INSTANCE.Dispatcher.Invoke(act) use the consistent centralized logic.

In `@WFInfo/Ocr.cs`:
- Around line 779-812: Replace the three repetitive try/catch blocks around
Main.dataBase.IsPartVaulted(name), Main.dataBase.IsPartMastered(name) and
Main.dataBase.PartsOwned(name) with a single reusable helper (eg. SafeCall<T>)
that accepts a Func<T>, a default value, an operation name and the item name;
have SafeCall invoke the func, catch Exception, call Main.AddLog($"ERROR:
{operationName} failed for '{itemName}': {ex.Message}") and return the default
on error, then call SafeCall for vaulted, mastered and partsOwned (default
false/false/"0") to remove duplication and keep the same logging/behavior.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 983504f and 63ceea0.

📒 Files selected for processing (15)
  • WFInfo/CustomEntrypoint.cs
  • WFInfo/Data.cs
  • WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs
  • WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs
  • WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs
  • WFInfo/LanguageProcessing/LanguageProcessor.cs
  • WFInfo/Main.cs
  • WFInfo/Ocr.cs
  • WFInfo/Services/TesseractService.cs
  • WFInfo/Tests/OCRTestRunner.cs
  • WFInfo/Tests/TestModels.cs
  • WFInfo/Tests/TestProgram.cs
  • WFInfo/errorDialogue.xaml.cs
  • tests/README.md
  • tests/run_tests.bat
🚧 Files skipped from review as they are similar to previous changes (4)
  • WFInfo/errorDialogue.xaml.cs
  • tests/run_tests.bat
  • WFInfo/Tests/TestModels.cs
  • WFInfo/CustomEntrypoint.cs

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (5)
WFInfo/Tests/OCRTestRunner.cs (1)

189-202: HDR change now triggers OCR reinitialization, but consider conditional data reload.

The implementation now correctly reinitializes OCR engines when either locale or HDR changes, addressing the prior concern. However, ReloadItems() is always called when reinitialization occurs, even if only HDR changed. Per the earlier suggestion, data reload may only be necessary for locale changes.

⚡ Optimize to skip data reload when only HDR changed
             if (localeChanged || hdrChanged)
             {
                 string reason = localeChanged ? $"Locale changed to '{newLocale}'" : $"HDR changed to '{testCase.HDR}'";
                 Main.AddLog($"  {reason}, reinitializing OCR engines...");
                 OCR.InitForTest(
                     new TesseractService(),
                     ApplicationSettings.GlobalReadonlySettings,
                     _windowService,
                     new HeadlessHDRDetector(testCase.HDR));
 
                 // Also re-update Data so Levenshtein uses the right locale for matching
-                Main.dataBase.ReloadItems().GetAwaiter().GetResult();
+                if (localeChanged)
+                    Main.dataBase.ReloadItems().GetAwaiter().GetResult();
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Tests/OCRTestRunner.cs` around lines 189 - 202, The code currently
calls Main.dataBase.ReloadItems() whenever localeChanged || hdrChanged triggers
OCR.InitForTest; change this so ReloadItems() is only invoked when localeChanged
is true (HDR-only changes should skip data reload). Update the conditional
around Main.dataBase.ReloadItems().GetAwaiter().GetResult() (within the block
that handles localeChanged || hdrChanged, near OCR.InitForTest and
HeadlessHDRDetector) to run the reload only if localeChanged, leaving
OCR.InitForTest to run for both locale and HDR changes.
WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs (1)

23-81: Duplicate romanization entries are technically harmless but wasteful.

Several Korean syllables appear twice in koreanReplacements (e.g., "궈" at lines 27, "놰" at line 31, "돼" at line 35, "퀘" at line 67). While the List<KeyValuePair> approach correctly avoids the runtime crash from the previous Dictionary implementation, the duplicate entries are redundant since only the first match applies during iteration.

🧹 Consider deduplicating entries

Remove duplicate entries to reduce memory and avoid confusion:

  • Line 27: "궈" appears twice (keep first "gwo")
  • Line 31: "놰" appears twice with different values ("nwo" vs "nwae")
  • Line 35: "돼" appears twice with different values ("dwae" vs "doe")
  • Line 67: "퀘" appears twice with different values ("kwo" vs "kwae")

For entries with intentionally different mappings, consider which romanization is more accurate for OCR matching purposes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs` around lines 23 - 81,
The koreanReplacements list contains duplicated Korean keys (e.g., "궈", "놰",
"돼", "퀘") which are redundant and can confuse maintenance; open the static
List<KeyValuePair<string,string>> koreanReplacements and remove the later
duplicate entries so each Hangul syllable appears only once, choosing the
preferred romanization for conflicts (keep the first occurrence or replace it
with the more accurate mapping) and ensure the final list has unique keys.
WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs (1)

105-106: Latin-1 supplement range (0x00C0-0x00FF) includes non-letter characters.

The range includes control characters (0x00C0-0x00FF) like × (0x00D7), ÷ (0x00F7), and others that aren't letters. This is likely fine for Tesseract whitelisting since OCR rarely produces these, but if precision is desired, consider listing specific accented characters like the German processor does.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs` around lines 105 -
106, The CharacterWhitelist property in EuropeanLanguageProcessor currently
appends GenerateCharacterRange(0x00C0, 0x00FF) which includes non-letter symbols
(e.g., ×, ÷); change this to a precise set of accented letters instead of the
full Latin-1 supplement range: either replace GenerateCharacterRange(0x00C0,
0x00FF) with an explicit string of only valid accented letters (similar to the
German processor) or implement a helper that filters GenerateCharacterRange to
include only unicode letters, and update CharacterWhitelist accordingly so it
contains only A-Z, a-z, space and the specific accented characters.
WFInfo/LanguageProcessing/LanguageProcessor.cs (1)

51-59: Consider logging fallback to invariant culture.

The catch-all exception silently falls back to InvariantCulture. While this prevents crashes, it may hide configuration issues.

📝 Proposed enhancement
             catch
             {
                 // Fallback to invariant culture for unsupported locales
+                // Consider: Main.AddLog($"Unsupported locale '{locale}', falling back to invariant culture");
                 return CultureInfo.InvariantCulture;
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/LanguageProcessor.cs` around lines 51 - 59, The
catch block that returns CultureInfo.InvariantCulture currently swallows all
exceptions; update it to log the failure and the offending locale before
returning the fallback. In the method containing the call to new
CultureInfo(locale, false) (the code that currently returns
CultureInfo.InvariantCulture), capture the caught exception (e) and write a log
entry including the locale and exception details (using your project's logging
facility or System.Diagnostics) so callers can see why the fallback was used,
then return CultureInfo.InvariantCulture as before.
WFInfo/Data.cs (1)

992-1000: Remove unused locale-resolution work from the non-English path.

resolvedName is computed for non-English at Line 999 but not used in that branch, adding avoidable work on hot OCR matching calls.

♻️ Proposed refactor
-            string resolvedName;
-            if (_settings.Locale == "en")
-            {
-                resolvedName = name; // Use original OCR text for English
-            }
-            else
-            {
-                resolvedName = GetLocaleNameData(name, false) ?? name; // Fallback to original OCR string if resolution fails
-            }
+            string resolvedName = name; // only used by English branch below

If you still want pre-resolution for non-English in the future, move it into the branch where it is actually consumed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Data.cs` around lines 992 - 1000, The non-English branch is computing
resolvedName via GetLocaleNameData(name, false) but never using it, wasting work
on hot OCR matching; remove the unnecessary computation or move it into the
location where resolvedName is actually consumed. Specifically, eliminate the
resolvedName assignment inside the else branch (or defer calling
GetLocaleNameData) and keep the English branch behavior using name; adjust any
code paths that expect resolvedName to ensure they reference name or call
GetLocaleNameData only when needed (referencing _settings.Locale, resolvedName,
GetLocaleNameData, and name to locate the logic).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@WFInfo/Data.cs`:
- Around line 1024-1034: The parsing currently requires split.Length >= 3 which
skips two-field cache rows; change the logic around the split handling in the
block that builds marketItemsSnapshot so that you accept split.Length >= 2,
treat split[0] as the English name and when split.Length == 2 set the
localized/slug field to split[1] or to split[0] as a safe fallback, compute
englishNameLength from split[0], compute lengthDiff using (useLocalizedNames &&
split.Length >= 3 ? split[2].Length : split[0].Length), and set comparisonName
similarly (useLocalizedNames && split.Length >= 3 ? split[2] : split[0]) before
adding Tuple.Create(split[0], comparisonName,
processor.NormalizeForPatternMatching(comparisonName)) to marketItemsSnapshot so
older "name|slug" rows are included for non-localized matching.
- Around line 605-648: The null-recovery blocks that replace failed deserialized
data with new empty JObjects (for
marketData/marketItems/equipmentData/relicData/nameData) hide corruption;
instead, update the code around ParseFileOrMakeNew and the variables marketItems
(inside marketItemsLock), equipmentData, relicData, and nameData to fail fast:
if ParseFileOrMakeNew returns null set parseHasFailed (or throw a descriptive
exception) and stop initialization rather than creating an empty JObject, and
ensure any logging (Main.AddLog) includes the parse context and error before
aborting; keep ParseFileOrMakeNew usage but remove the silent creation of new
JObject so the caller surfaces the error.

In `@WFInfo/Ocr.cs`:
- Around line 2577-2584: The catch in GetTextFromImage silently swallows
exceptions; update the catch to capture the exception (e) and log it using the
same logger/format as GetTextWithBoundsFromImage so OCR failures are recorded,
then still set modeResults[mode] = "" and modeScores[mode] = 0; reference the
GetTextFromImage method and the modeResults/modeScores assignments so you locate
and replace the empty catch with a catch(Exception e) that calls the existing
logging utility (same logger/format used in GetTextWithBoundsFromImage) before
assigning the defaults.

In `@WFInfo/Services/TesseractService.cs`:
- Around line 125-127: Remove the invalid post-init SetVariable calls on engine
(engine.SetVariable("engine_mode"...), engine.SetVariable("oem_engine"...),
engine.SetVariable("enable_smoothing"...)); instead, set the OCR engine mode via
the TesseractEngine constructor/init using the documented parameter
tessedit_ocr_engine_mode (init-only), and if smoothing adjustments are required
use documented parameters such as thresholding_smooth_kernel_size or
textord_wordstats_smooth_factor passed at init or configured via appropriate API
before Init; update any code that assumes engine.SetVariable will change
init-only options and remove the unsupported enable_smoothing usage.

---

Nitpick comments:
In `@WFInfo/Data.cs`:
- Around line 992-1000: The non-English branch is computing resolvedName via
GetLocaleNameData(name, false) but never using it, wasting work on hot OCR
matching; remove the unnecessary computation or move it into the location where
resolvedName is actually consumed. Specifically, eliminate the resolvedName
assignment inside the else branch (or defer calling GetLocaleNameData) and keep
the English branch behavior using name; adjust any code paths that expect
resolvedName to ensure they reference name or call GetLocaleNameData only when
needed (referencing _settings.Locale, resolvedName, GetLocaleNameData, and name
to locate the logic).

In `@WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs`:
- Around line 105-106: The CharacterWhitelist property in
EuropeanLanguageProcessor currently appends GenerateCharacterRange(0x00C0,
0x00FF) which includes non-letter symbols (e.g., ×, ÷); change this to a precise
set of accented letters instead of the full Latin-1 supplement range: either
replace GenerateCharacterRange(0x00C0, 0x00FF) with an explicit string of only
valid accented letters (similar to the German processor) or implement a helper
that filters GenerateCharacterRange to include only unicode letters, and update
CharacterWhitelist accordingly so it contains only A-Z, a-z, space and the
specific accented characters.

In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs`:
- Around line 23-81: The koreanReplacements list contains duplicated Korean keys
(e.g., "궈", "놰", "돼", "퀘") which are redundant and can confuse maintenance; open
the static List<KeyValuePair<string,string>> koreanReplacements and remove the
later duplicate entries so each Hangul syllable appears only once, choosing the
preferred romanization for conflicts (keep the first occurrence or replace it
with the more accurate mapping) and ensure the final list has unique keys.

In `@WFInfo/LanguageProcessing/LanguageProcessor.cs`:
- Around line 51-59: The catch block that returns CultureInfo.InvariantCulture
currently swallows all exceptions; update it to log the failure and the
offending locale before returning the fallback. In the method containing the
call to new CultureInfo(locale, false) (the code that currently returns
CultureInfo.InvariantCulture), capture the caught exception (e) and write a log
entry including the locale and exception details (using your project's logging
facility or System.Diagnostics) so callers can see why the fallback was used,
then return CultureInfo.InvariantCulture as before.

In `@WFInfo/Tests/OCRTestRunner.cs`:
- Around line 189-202: The code currently calls Main.dataBase.ReloadItems()
whenever localeChanged || hdrChanged triggers OCR.InitForTest; change this so
ReloadItems() is only invoked when localeChanged is true (HDR-only changes
should skip data reload). Update the conditional around
Main.dataBase.ReloadItems().GetAwaiter().GetResult() (within the block that
handles localeChanged || hdrChanged, near OCR.InitForTest and
HeadlessHDRDetector) to run the reload only if localeChanged, leaving
OCR.InitForTest to run for both locale and HDR changes.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 983504f and 2774b6a.

📒 Files selected for processing (15)
  • WFInfo/CustomEntrypoint.cs
  • WFInfo/Data.cs
  • WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs
  • WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs
  • WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs
  • WFInfo/LanguageProcessing/LanguageProcessor.cs
  • WFInfo/Main.cs
  • WFInfo/Ocr.cs
  • WFInfo/Services/TesseractService.cs
  • WFInfo/Tests/OCRTestRunner.cs
  • WFInfo/Tests/TestModels.cs
  • WFInfo/Tests/TestProgram.cs
  • WFInfo/errorDialogue.xaml.cs
  • tests/README.md
  • tests/run_tests.bat
🚧 Files skipped from review as they are similar to previous changes (4)
  • WFInfo/CustomEntrypoint.cs
  • tests/run_tests.bat
  • tests/README.md
  • WFInfo/Tests/TestProgram.cs

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

♻️ Duplicate comments (1)
tests/run_tests.bat (1)

40-47: ⚠️ Potential issue | 🟡 Minor

Guard timestamp generation failure before building output filename.

If wmic fails, TIMESTAMP can be empty and produce test_results_.json, which is easy to overwrite across runs.

🛠️ Suggested hardening
+set "TIMESTAMP="
 for /f "tokens=2 delims==" %%I in ('wmic os get localdatetime /value') do set "TIMESTAMP=%%I"
+if not defined TIMESTAMP (
+    echo ERROR: Failed to generate timestamp via WMIC
+    exit /b 4
+)
 set "TIMESTAMP=%TIMESTAMP:~0,8%_%TIMESTAMP:~8,6%"
#!/bin/bash
rg -n "wmic os get localdatetime|if not defined TIMESTAMP|set \"TIMESTAMP=%TIMESTAMP:~0,8%_%TIMESTAMP:~8,6%\"" tests/run_tests.bat -C2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/run_tests.bat` around lines 40 - 47, After calling wmic to set
TIMESTAMP, add a guard that verifies TIMESTAMP is defined/non-empty and, if not,
assigns a safe fallback (e.g. a sanitized %DATE%_%TIME% or a unique value using
%RANDOM% with time) before composing OUTPUT_FILE; update the logic around the
TIMESTAMP and OUTPUT_FILE variables so the if "%OUTPUT_FILE%"=="" block always
builds a filename with a non-empty TIMESTAMP (referencing the TIMESTAMP and
OUTPUT_FILE variables and the wmic call that produces TIMESTAMP).
🧹 Nitpick comments (4)
WFInfo/Tests/TestModels.cs (1)

39-40: Initialize Scenarios to an empty list by default.

This prevents null-dereference paths and simplifies runner logic for partially populated JSON maps.

🛠️ Suggested change
 public class TestMap
 {
     [JsonProperty("scenarios")]
-    public List<string> Scenarios { get; set; }
+    public List<string> Scenarios { get; set; } = new List<string>();
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Tests/TestModels.cs` around lines 39 - 40, The Scenarios property on
the TestModels class should be initialized to an empty list to avoid null
dereferences; update the declaration of public List<string> Scenarios { get;
set; } to instantiate a new List<string> (or set it in the TestModels
constructor) so Scenarios defaults to an empty collection whenever a TestModels
instance is created.
WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs (1)

479-482: Precompute replacement iteration order once.

OrderByDescending runs on every normalization call. Consider storing a pre-sorted static list to remove repeated allocation/sort work on hot OCR paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs` around lines 479 - 482,
The loop currently calls koreanReplacements.OrderByDescending(r => r.Key.Length)
on every normalization, causing repeated sorting/allocation; precompute a single
ordered collection (e.g., a static readonly IList<KeyValuePair<string,string>>
or array named something like koreanReplacementsOrdered) sorted by Key.Length
descending when the processor is initialized, and then in the normalization
method iterate that pre-sorted collection (use koreanReplacementsOrdered in
place of koreanReplacements.OrderByDescending(...)) so
result.Replace(replacement.Key, replacement.Value) uses the cached order without
per-call sorting.
WFInfo/LanguageProcessing/LanguageProcessor.cs (1)

218-247: Minor observation: potential redundant lowercasing.

When callBaseDefault is false (the default), LevenshteinDistanceWithPreprocessing calls DefaultLevenshteinDistance which applies ToLower(_culture). If the normalizer function already lowercases the input (as NormalizeChineseCharacters does), this results in double lowercasing. This is functionally harmless but slightly inefficient.

Consider passing callBaseDefault: true from callers that already normalize case, or document that normalizer should not lowercase if callBaseDefault is false.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/LanguageProcessor.cs` around lines 218 - 247,
LevenshteinDistanceWithPreprocessing can double-lowercase when a provided
normalizer (e.g., NormalizeChineseCharacters) already lowercases and
callBaseDefault is false; update callers of LevenshteinDistanceWithPreprocessing
that pass such normalizers to call it with callBaseDefault: true so the method
calls ComputeLevenshteinCore (which expects already-normalized input) instead of
DefaultLevenshteinDistance, or alternatively document the expectation that
normalizers must not lowercase when relying on DefaultLevenshteinDistance;
locate usages of LevenshteinDistanceWithPreprocessing and change the invocation
to include callBaseDefault: true for those that perform case normalization.
WFInfo/Ocr.cs (1)

111-115: Consider including Japanese locale in CJK check.

IsCJKLocale() checks for Korean and Chinese but not Japanese ("ja"). Japanese text also uses CJK ideographs and may benefit from the same line height and padding adjustments.

However, per the PR description, Japanese is not supported yet, so this may be intentional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Ocr.cs` around lines 111 - 115, The IsCJKLocale() method currently
checks ApplicationSettings.GlobalReadonlySettings.Locale for "ko", "zh-hans",
and "zh-hant" but omits Japanese; update IsCJKLocale() to include "ja" in the
comparison (e.g., add || locale == "ja") so Japanese locales are treated as CJK
for line-height/padding logic; ensure any callers that rely on IsCJKLocale()
will now apply the same adjustments for Japanese text and add/update a unit test
or comment if this behavior is intended.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@WFInfo/Data.cs`:
- Around line 614-619: In UpdateInner, do not silently return when JSON
deserialization yields null (e.g., the marketData null-check that currently
calls Main.AddLog, sets parseHasFailed and returns); instead throw a descriptive
exception (for example InvalidDataException or a custom ParseException) that
includes context (which field failed like "marketData") and any relevant data;
apply the same change to the other similar branches (the checks currently at the
ranges you noted) so each branch replaces Main.AddLog/parseHasFailed/return with
a thrown exception to surface fatal parse errors to callers.

In `@WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs`:
- Line 16: CharacterWhitelist in ChineseLanguageProcessor currently builds a
single large range (GenerateCharacterRange(0x4E00, 0x9FFF)) that exceeds
GenerateCharacterRange's 10,000-character limit and throws
ArgumentOutOfRangeException; split the large CJK ranges into multiple sub-ranges
under the limit and concatenate them (e.g., call GenerateCharacterRange
repeatedly for consecutive smaller spans) when building CharacterWhitelist,
keeping other ranges (0x3400-0x4DBF, 0xF900-0xFAFF) similarly split if needed;
reference the CharacterWhitelist property in ChineseLanguageProcessor and the
GenerateCharacterRange helper and follow the approach noted in
LanguageProcessor.cs comment.

In `@WFInfo/LanguageProcessing/LanguageProcessor.cs`:
- Around line 344-360: Character whitelist generation is throwing
ArgumentOutOfRangeException because ChineseLanguageProcessorBase and
JapaneseLanguageProcessor request ranges (0x4E00-0x9FFF and 0x4E00-0x9FAF)
larger than MaxGeneratedRangeSize; fix by either increasing
MaxGeneratedRangeSize to at least 25000 so GenerateCharacterRange(int start, int
end) can allocate those ranges, or refactor CharacterWhitelist creation in
ChineseLanguageProcessorBase/JapaneseLanguageProcessor to use
GenerateCharacterRangeIterator(int start, int end) and materialize via
string.Concat(iterator) (or a streaming builder) instead of calling
GenerateCharacterRange for large ranges; update references to
CharacterWhitelist, GenerateCharacterRange, GenerateCharacterRangeIterator, and
MaxGeneratedRangeSize accordingly.

In `@WFInfo/Main.cs`:
- Around line 98-102: Capture MainWindow.INSTANCE and its Dispatcher in local
variables before the await to avoid reading them across the async boundary;
e.g., store var wnd = MainWindow.INSTANCE and var disp = wnd?.Dispatcher, check
disp != null, then await disp.InvokeAsync(() => wnd.UpdateMarketStatus(msg));
ensure you reference MainWindow.INSTANCE, Dispatcher and UpdateMarketStatus and
only access INSTANCE once to prevent race conditions during teardown.
- Around line 266-267: RunOnUIThread currently races by checking
MainWindow.INSTANCE then invoking its Dispatcher later; fix it by capturing the
window once and performing safer dispatcher checks: store MainWindow.INSTANCE
into a local variable (e.g., mw), then if mw?.Dispatcher != null &&
!mw.Dispatcher.HasShutdownStarted && !mw.Dispatcher.HasShutdownFinished use
mw.Dispatcher.CheckAccess to run the action directly when already on UI thread
or call mw.Dispatcher.Invoke/BeginInvoke otherwise; apply this pattern to the
calls that reference MainWindow.INSTANCE and Dispatcher.Invoke (the act lambda
sites) to eliminate the lifecycle race and guard against dispatcher shutdown.

In `@WFInfo/Services/TesseractService.cs`:
- Around line 256-266: The empty catch around the traineddata download swallows
all errors; inside the try block that calls
webClient.DownloadFile(traineddata_hotlink, app_data_traineddata_path) and
File.Copy(app_data_traineddata_path, curr_data_traineddata_path, true) (in
TesseractService), replace the empty catch with concrete error handling: catch
(Exception ex) and either log the exception (using the project's logger) with
context about traineddata_hotlink, app_data_traineddata_path and
curr_data_traineddata_path, and/or rethrow or surface the error so callers can
react; ensure you include the exception message/stack in the log to preserve
root-cause information.

In `@WFInfo/Tests/OCRTestRunner.cs`:
- Around line 39-45: Validate that
JsonConvert.DeserializeObject<TestMap>(testMapJson) returned a non-null TestMap
and that testMap.Scenarios is non-null/non-empty before using it; if the
deserialization failed or scenarios is null, log a clear error via Main.AddLog
(or throw a descriptive exception) including testMapPath and abort the run, then
only proceed to use testMap.Scenarios in the foreach. Target the TestMap
deserialization and subsequent use of testMap and testMap.Scenarios in
OCRTestRunner.cs.

---

Duplicate comments:
In `@tests/run_tests.bat`:
- Around line 40-47: After calling wmic to set TIMESTAMP, add a guard that
verifies TIMESTAMP is defined/non-empty and, if not, assigns a safe fallback
(e.g. a sanitized %DATE%_%TIME% or a unique value using %RANDOM% with time)
before composing OUTPUT_FILE; update the logic around the TIMESTAMP and
OUTPUT_FILE variables so the if "%OUTPUT_FILE%"=="" block always builds a
filename with a non-empty TIMESTAMP (referencing the TIMESTAMP and OUTPUT_FILE
variables and the wmic call that produces TIMESTAMP).

---

Nitpick comments:
In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs`:
- Around line 479-482: The loop currently calls
koreanReplacements.OrderByDescending(r => r.Key.Length) on every normalization,
causing repeated sorting/allocation; precompute a single ordered collection
(e.g., a static readonly IList<KeyValuePair<string,string>> or array named
something like koreanReplacementsOrdered) sorted by Key.Length descending when
the processor is initialized, and then in the normalization method iterate that
pre-sorted collection (use koreanReplacementsOrdered in place of
koreanReplacements.OrderByDescending(...)) so result.Replace(replacement.Key,
replacement.Value) uses the cached order without per-call sorting.

In `@WFInfo/LanguageProcessing/LanguageProcessor.cs`:
- Around line 218-247: LevenshteinDistanceWithPreprocessing can double-lowercase
when a provided normalizer (e.g., NormalizeChineseCharacters) already lowercases
and callBaseDefault is false; update callers of
LevenshteinDistanceWithPreprocessing that pass such normalizers to call it with
callBaseDefault: true so the method calls ComputeLevenshteinCore (which expects
already-normalized input) instead of DefaultLevenshteinDistance, or
alternatively document the expectation that normalizers must not lowercase when
relying on DefaultLevenshteinDistance; locate usages of
LevenshteinDistanceWithPreprocessing and change the invocation to include
callBaseDefault: true for those that perform case normalization.

In `@WFInfo/Ocr.cs`:
- Around line 111-115: The IsCJKLocale() method currently checks
ApplicationSettings.GlobalReadonlySettings.Locale for "ko", "zh-hans", and
"zh-hant" but omits Japanese; update IsCJKLocale() to include "ja" in the
comparison (e.g., add || locale == "ja") so Japanese locales are treated as CJK
for line-height/padding logic; ensure any callers that rely on IsCJKLocale()
will now apply the same adjustments for Japanese text and add/update a unit test
or comment if this behavior is intended.

In `@WFInfo/Tests/TestModels.cs`:
- Around line 39-40: The Scenarios property on the TestModels class should be
initialized to an empty list to avoid null dereferences; update the declaration
of public List<string> Scenarios { get; set; } to instantiate a new List<string>
(or set it in the TestModels constructor) so Scenarios defaults to an empty
collection whenever a TestModels instance is created.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 983504f and 8cb060d.

📒 Files selected for processing (15)
  • WFInfo/CustomEntrypoint.cs
  • WFInfo/Data.cs
  • WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs
  • WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs
  • WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs
  • WFInfo/LanguageProcessing/LanguageProcessor.cs
  • WFInfo/Main.cs
  • WFInfo/Ocr.cs
  • WFInfo/Services/TesseractService.cs
  • WFInfo/Tests/OCRTestRunner.cs
  • WFInfo/Tests/TestModels.cs
  • WFInfo/Tests/TestProgram.cs
  • WFInfo/errorDialogue.xaml.cs
  • tests/README.md
  • tests/run_tests.bat
🚧 Files skipped from review as they are similar to previous changes (3)
  • WFInfo/CustomEntrypoint.cs
  • tests/README.md
  • WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
WFInfo/Ocr.cs (2)

1413-1757: ⚠️ Potential issue | 🟠 Major

Temporary count-scan bitmaps leak on early continue paths.

cloneBitmap and cloneBitmapColoured are allocated before several early continue branches (Line 1463, Line 1544, Line 1637, Line 1710) and are not always disposed. This can exhaust GDI handles during noisy scans.

♻️ Proposed disposal pattern
-                        Bitmap cloneBitmap = filteredImageClean.Clone(cloneRect, filteredImageClean.PixelFormat);
-                        Bitmap cloneBitmapColoured = unfilteredImage.Clone(cloneRect, filteredImageClean.PixelFormat);
+                        Bitmap cloneBitmap = null;
+                        Bitmap cloneBitmapColoured = null;
+                        try
+                        {
+                            cloneBitmap = filteredImageClean.Clone(cloneRect, filteredImageClean.PixelFormat);
+                            cloneBitmapColoured = unfilteredImage.Clone(cloneRect, filteredImageClean.PixelFormat);
 
                         // ... existing processing logic (including early continue paths) ...
 
-                        cloneBitmapColoured.Dispose();
-                        cloneBitmap.Dispose();
+                        }
+                        finally
+                        {
+                            cloneBitmapColoured?.Dispose();
+                            cloneBitmap?.Dispose();
+                        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Ocr.cs` around lines 1413 - 1757, The code leaks GDI bitmaps because
cloneBitmap and cloneBitmapColoured are created near the top of the inner loop
(variables cloneBitmap, cloneBitmapColoured) but early continue paths (where
sumBlack checks and Width/Height checks occur) return without disposing them;
fix by ensuring both Bitmaps are disposed on all paths—wrap their creation in
using blocks or add try/finally that disposes cloneBitmap and
cloneBitmapColoured before every continue or on method exit; update the code
paths around the first center-of-mass check (sumBlack < Height), after the
flood-fill check (sumBlack < Height), after selecting topColor (topColor default
check), and after the final width/height validation (Width < 5 || Height < 5) so
no early return leaves undisposed Bitmaps.

320-341: ⚠️ Potential issue | 🟠 Major

Market-data safety check is incomplete; plat/volume still can throw.

You now guard ducats, but job["plat"] and double.Parse(...) are still unchecked. A malformed/missing plat can throw and abort reward processing mid-run.

🛠️ Proposed hardening
 string ducats = job["ducats"].ToObject<string>();
 if (!int.TryParse(ducats, System.Globalization.NumberStyles.Integer, Main.culture, out int ducatValue) || ducatValue == 0)
 {
     hideRewardInfo = true;
 }
 //else if (correctName != "Kuva" || correctName != "Exilus Weapon Adapter Blueprint" || correctName != "Riven Sliver" || correctName != "Ayatan Amber Star")
 primeRewards.Add(correctName);
-string plat = job["plat"].ToObject<string>();
+string plat = job["plat"]?.ToObject<string>();
+if (!double.TryParse(plat, styles, Main.culture, out double platinum))
+{
+    Main.AddLog($"MARKET DATA: Invalid plat value for '{correctName}', skipping");
+    continue;
+}
 string primeSetPlat = null;
 if (primeSet != null)
 {
     primeSetPlat = (string)primeSet["plat"];
 }
-double platinum = double.Parse(plat, styles, Main.culture);
-string volume = job["volume"].ToObject<string>();
+string volume = job["volume"]?.ToObject<string>() ?? "0";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Ocr.cs` around lines 320 - 341, The code currently guards
job["ducats"] but still accesses job["plat"] and job["volume"] directly and
calls double.Parse, which can throw; update the processing around job, plat, and
volume (symbols: job, plat, volume, double.Parse, primeRewards, hideRewardInfo)
to first null-check job["plat"] and job["volume"], then use double.TryParse with
the same styles and Main.culture to safely parse platinum into a double (handle
failure by setting a default or skipping this reward and setting hideRewardInfo
as appropriate), and only add to primeRewards when parsing succeeds and data is
valid; ensure any fallback behavior (skip, log, or default values) is consistent
with existing hideRewardInfo logic so malformed plat/volume won't abort
processing.
♻️ Duplicate comments (2)
WFInfo/Services/TesseractService.cs (1)

256-269: ⚠️ Potential issue | 🟠 Major

Do not continue silently when traineddata download fails and no local file exists.

Current logic logs and continues. On first-run/offline for a new locale, this can defer failure to engine creation with weaker diagnostics. Re-throw when both destination candidates are missing.

🛠️ Suggested fix
                     catch (Exception ex)
                     {
-                        Main.AddLog($"Failed to download traineddata for locale '{Locale}': {ex.Message}. Source: {traineddata_hotlink}, Target: {app_data_traineddata_path}");
-                        // Don't throw during initialization to allow service to continue with existing data
+                        Main.AddLog($"Failed to download traineddata for locale '{Locale}': {ex.Message}. Source: {traineddata_hotlink}, Target: {app_data_traineddata_path}");
+                        // Continue only if some local traineddata is already available.
+                        if (!File.Exists(app_data_traineddata_path) && !File.Exists(curr_data_traineddata_path))
+                            throw;
                     }
#!/bin/bash
# Verify the catch path currently logs and continues without a local-file existence gate.
rg -n -C3 'DownloadFile|File\.Copy|catch \(Exception ex\)|Failed to download traineddata|CreateEngine\(' WFInfo/Services/TesseractService.cs

Expected result: the catch block around download/copy logs an error but does not conditionally rethrow when both traineddata paths are absent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Services/TesseractService.cs` around lines 256 - 269, The catch
currently logs download failures in the webClient.DownloadFile/File.Copy block
but swallows the exception; change it to check existence of both
app_data_traineddata_path and curr_data_traineddata_path after the failed
download and if neither file exists re-throw the caught exception (or throw a
new informative exception including Locale, traineddata_hotlink,
app_data_traineddata_path) so service initialization fails fast instead of
deferring to CreateEngine; keep the Main.AddLog call for diagnostics and
reference webClient.DownloadFile, File.Copy, Locale, traineddata_hotlink,
app_data_traineddata_path, curr_data_traineddata_path and any existing
catch(Exception ex) block when implementing.
WFInfo/Main.cs (1)

266-275: ⚠️ Potential issue | 🟠 Major

RunOnUIThread still allows MainWindow.INSTANCE lifecycle races through callsites.

Even with dispatcher guards, the Action delegate can still dereference MainWindow.INSTANCE later (Line [311], Line [553], Line [623]). Pass the captured window into the callback to close the race.

🛠️ Suggested fix
-        public static void RunOnUIThread(Action act)
+        public static void RunOnUIThread(Action<MainWindow> act)
         {
             var mw = MainWindow.INSTANCE;
             if (mw?.Dispatcher != null && !mw.Dispatcher.HasShutdownStarted && !mw.Dispatcher.HasShutdownFinished)
             {
                 if (mw.Dispatcher.CheckAccess())
-                    act();
+                    act(mw);
                 else
-                    mw.Dispatcher.Invoke(act);
+                    mw.Dispatcher.Invoke(() => act(mw));
             }
         }

-            RunOnUIThread(() => { MainWindow.INSTANCE.ChangeStatus(message, severity); });
+            RunOnUIThread(w => w.ChangeStatus(message, severity));

-            RunOnUIThread(() => { MainWindow.INSTANCE.LoggedIn(); });
+            RunOnUIThread(w => w.LoggedIn());

-            RunOnUIThread(() => { MainWindow.INSTANCE.SignOut(); });
+            RunOnUIThread(w => w.SignOut());

Also applies to: 311-311, 553-553, 623-623

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Main.cs` around lines 266 - 275, The RunOnUIThread method currently
captures MainWindow.INSTANCE lazily and still allows races when callers
dereference MainWindow later; change RunOnUIThread to capture the window
instance into a local variable (mw) and accept an Action<MainWindow> (or
Func/Delegate that receives the window) so the invoked callback receives that
captured mw rather than re-reading MainWindow.INSTANCE; update the method
signature (RunOnUIThread) and all call sites that pass Action to instead pass
Action<MainWindow> and use the provided mw parameter inside those callbacks to
eliminate lifecycle races.
🧹 Nitpick comments (4)
WFInfo/Tests/OCRTestRunner.cs (1)

89-99: Redundant equality check in path traversal validation.

The check !jsonFull.Equals(baseDir, ...) on lines 89 and 95 is always true since a file path (ending in .json or .png) can never equal a directory path. This makes the condition effectively just the StartsWith check, which is correct.

While the logic is safe, the redundant equality check adds confusion. Consider simplifying:

🔧 Simplify path validation
-            if (!jsonFull.Equals(baseDir, StringComparison.OrdinalIgnoreCase) && 
-                !jsonFull.StartsWith(baseDir + Path.DirectorySeparatorChar, StringComparison.OrdinalIgnoreCase))
+            if (!jsonFull.StartsWith(baseDir + Path.DirectorySeparatorChar, StringComparison.OrdinalIgnoreCase))
             {
                 throw new Exception($"Path traversal detected for JSON file: {scenarioPath}");
             }
             
-            if (!imageFull.Equals(baseDir, StringComparison.OrdinalIgnoreCase) && 
-                !imageFull.StartsWith(baseDir + Path.DirectorySeparatorChar, StringComparison.OrdinalIgnoreCase))
+            if (!imageFull.StartsWith(baseDir + Path.DirectorySeparatorChar, StringComparison.OrdinalIgnoreCase))
             {
                 throw new Exception($"Path traversal detected for image file: {scenarioPath}");
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Tests/OCRTestRunner.cs` around lines 89 - 99, The path-traversal
checks in OCRTestRunner.cs use redundant equality comparisons
(jsonFull.Equals(baseDir, ...) and imageFull.Equals(baseDir, ...)) which will
never be true for file paths; remove those equality checks and simplify each
conditional to only verify that jsonFull (and imageFull) StartsWith baseDir +
Path.DirectorySeparatorChar using StringComparison.OrdinalIgnoreCase, keeping
the same exception throw referencing scenarioPath; update the if conditions
around jsonFull, imageFull, baseDir and scenarioPath accordingly.
WFInfo/WFInfo.csproj (1)

133-164: Consider adding existence check before zipping.

The CreatePublishZip target assumes WFInfo.exe exists at $(ExePath). If the build fails or the path is incorrect, the PowerShell command will fail silently or produce an unclear error.

💡 Optional: Add validation before zipping
   <Target Name="CreatePublishZip" AfterTargets="GenerateUpdateXml" Condition="'$(Configuration)'=='Release'">
     <PropertyGroup>
       <ExePath>$(OutputPath)WFInfo.exe</ExePath>
       <ZipPath>$(OutputPath)WFInfo.zip</ZipPath>
     </PropertyGroup>
     
+    <Error Condition="!Exists('$(ExePath)')" Text="Cannot create zip: $(ExePath) not found" />
+    
     <!-- Create zip file using PowerShell with only the executable -->
     <Exec Command="powershell -Command &quot;Compress-Archive -Path '$(ExePath)' -DestinationPath '$(ZipPath)' -Force&quot;" 
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/WFInfo.csproj` around lines 133 - 164, The CreatePublishZip target
assumes WFInfo.exe exists at $(ExePath); add an existence check and fail early
or skip zipping if it's missing. Inside the CreatePublishZip target (referencing
ExePath, ZipPath and the Exec Command that calls Compress-Archive), add an
MSBuild Condition/validation such as an <Error> with
Condition="!Exists('$(ExePath)')" to stop the build with a clear message or wrap
the <Exec> in Condition="Exists('$(ExePath)')" so the zip step only runs when
the file is present; ensure the Message/Fail text mentions $(ExePath) for
clarity.
WFInfo/LanguageProcessing/PolishLanguageProcessor.cs (1)

59-75: Simplify unreachable uppercase replacements in NormalizePolishCharacters.

After ToLowerInvariant() at Line [59], uppercase replacements cannot match. Removing them reduces noise.

♻️ Optional cleanup
         private static string NormalizePolishCharacters(string input)
         {
             // Convert Polish characters to standard equivalents for comparison
             return input.ToLowerInvariant()
                 .Replace('ą', 'a')
-                .Replace('Ą', 'A')
                 .Replace('ę', 'e')
-                .Replace('Ę', 'E')
                 .Replace('ć', 'c')
-                .Replace('Ć', 'C')
                 .Replace('ł', 'l')
-                .Replace('Ł', 'L')
                 .Replace('ś', 's')
-                .Replace('Ś', 'S')
                 .Replace('ź', 'z')
-                .Replace('Ź', 'Z')
                 .Replace('ż', 'z')
-                .Replace('Ż', 'Z')
-                .Replace('ó', 'o')
-                .Replace('Ó', 'O');
+                .Replace('ó', 'o');
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/PolishLanguageProcessor.cs` around lines 59 - 75,
In NormalizePolishCharacters remove the redundant Replace calls that handle
uppercase Polish letters (e.g., .Replace('Ą','A'), .Replace('Ę','E'),
.Replace('Ć','C'), .Replace('Ł','L'), .Replace('Ś','S'), .Replace('Ź','Z'),
.Replace('Ż','Z'), .Replace('Ó','O')) because the method already calls
ToLowerInvariant() at the start so those branches are unreachable; keep the
lowercase replacements (ą,ę,ć,ł,ś,ź,ż,ó) and the initial ToLowerInvariant() and
return the cleaned chain (refer to NormalizePolishCharacters to locate and
update the Replace call chain).
WFInfo/Data.cs (1)

879-883: Reduce lock scope in GetLocaleNameData to avoid contention on OCR-heavy paths.

processor.GetLocalizedNameData(...) can be expensive; holding marketItemsLock for the full operation unnecessarily blocks concurrent readers/writers.

♻️ Suggested refactor
         public string GetLocaleNameData(string s, bool useLevenshtein)
         {
             var processor = LanguageProcessorFactory.GetCurrentProcessor();
-            lock (marketItemsLock)
-            {
-                return processor.GetLocalizedNameData(s, marketItems, useLevenshtein);
-            }
+            JObject snapshot;
+            lock (marketItemsLock)
+            {
+                snapshot = marketItems;
+            }
+            return processor.GetLocalizedNameData(s, snapshot, useLevenshtein);
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Data.cs` around lines 879 - 883, GetLocaleNameData currently holds
marketItemsLock while calling the expensive processor.GetLocalizedNameData,
causing contention; instead, acquire marketItemsLock only to capture a safe
snapshot (e.g., copy or shallow clone) of marketItems, then release the lock and
call LanguageProcessorFactory.GetCurrentProcessor() followed by
processor.GetLocalizedNameData(s, snapshot, useLevenshtein). Ensure you still
read any minimal state under marketItemsLock (using marketItemsLock) but do not
call processor.GetLocalizedNameData while holding the lock.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs`:
- Around line 16-21: CharacterWhitelist currently calls
GenerateCharacterRange(0x4E00, 0x7FFF) which exceeds MaxGeneratedRangeSize and
throws; split that large range into two valid chunks (for example 0x4E00-0x6FFF
and 0x7000-0x7FFF) and combine them with the other existing
GenerateCharacterRange calls (0x8000-0x9FFF, 0x3400-0x4DBF, 0xF900-0xFAFF) so
each GenerateCharacterRange invocation stays <= MaxGeneratedRangeSize; update
the CharacterWhitelist getter to concatenate the new smaller ranges (referencing
the CharacterWhitelist property and GenerateCharacterRange method) and ensure
ordering/spacing remains the same.

In `@WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs`:
- Around line 178-187: The check for mixed Japanese-Latin (hasJapanese &&
hasLatin) is unreachable because the earlier hasJapanese check returns
immediately; in JapaneseLanguageProcessor.cs move the mixed-language check
before the plain hasJapanese check (or remove the redundant plain hasJapanese
branch) so that the mixed-case logic runs; update the same method that contains
the hasJapanese/hasLatin variables (the word-filtering method) to evaluate
hasJapanese && hasLatin first and then fall back to hasJapanese-only and the
non-Japanese length check.
- Around line 217-220: The chained replacements in JapaneseLanguageProcessor use
result.Replace('ヲ', 'ヲ') which is a no-op and duplicates a later mapping to 'ヲ'
-> 'オ'; remove the redundant result.Replace('ヲ', 'ヲ') (or correct it to the
intended source character if it was a typo) so the Replace chain on the local
variable result only contains the correct mappings (e.g. keep
result.Replace('ヲ', 'オ') and the other mappings for 'ヮ','ヰ','ヱ'), then return
result.ToLowerInvariant().

In `@WFInfo/LanguageProcessing/LanguageProcessor.cs`:
- Around line 344-360: The first CJK range in ChineseLanguageProcessorBase is
calling GenerateCharacterRange(0x4E00, 0x7FFF) which produces 12,800 chars and
exceeds MaxGeneratedRangeSize; update the ChineseLanguageProcessorBase ranges to
split that call into smaller ranges (for example GenerateCharacterRange(0x4E00,
0x6FFF) and GenerateCharacterRange(0x7000, 0x7FFF)) so each generated range is
<= MaxGeneratedRangeSize, or replace the large call with repeated calls to
GenerateCharacterRangeIterator to build the same set without exceeding the
limit.

In `@WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs`:
- Around line 188-210: The NormalizeThaiCharacters method currently performs
destructive chained one-way replacements (e.g., 'ซ'→'ศ' then 'ศ'→'ษ') which
loses original info and risks false matches; replace that logic with a
single-pass, non-destructive canonicalization using a lookup map or grouping
(e.g., a Dictionary<char,char> or HashSet groups) that maps each character to a
chosen canonical representative (or leaves it unchanged) instead of chaining
Replace calls, or remove those replacements entirely and rely on the existing
GetThaiCharacterDifference similarity logic at comparison time; update
NormalizeThaiCharacters (and preserve NormalizeFullWidthCharacters usage) so it
iterates characters once, applies the canonical map, strips karan and diacritics
as needed, and returns the normalized string without multi-step cascading
replaces.
- Around line 136-145: The check for mixed Thai-Latin (if (hasThai && hasLatin))
is unreachable because the prior if (hasThai) returns early; in the
ThaiLanguageProcessor method containing the hasThai/hasLatin logic, either
remove the redundant mixed check or move the mixed-case branch above the plain
hasThai branch so mixed words are handled first (i.e., evaluate if (hasThai &&
hasLatin) before the if (hasThai) return false) and ensure the final return
applies only to non-Thai words.
- Around line 113-116: The toneMarks array in ThaiLanguageProcessor.cs contains
duplicate entries; update the definition of toneMarks so it only lists each Thai
tone mark once (remove the repeated '่', '้', '๊', '๋'), keeping the subsequent
logic that computes aIsTone and bIsTone and the early return if both are tone
marks unchanged.

In `@WFInfo/Ocr.cs`:
- Around line 191-199: The catch block in the _tesseractService.Init() call
should not set _tesseractService = null because callers like ProcessSnapIt,
profile scan paths, and tests still dereference it; instead remove the null
assignment and either rethrow the exception or set a dedicated init-failed flag
(e.g. add a private bool _tesseractInitFailed) so callers can explicitly check
initialization state; update places that assume non-null (e.g. ProcessSnapIt and
any test entry points) to check the new flag or handle the exception, and retain
the Main.AddLog(...) call to record the error.
- Around line 2901-2931: ProcessRewardScreenForTest currently dereferences
_tesseractService without checking for null (InitForTest can set it to null);
add an early guard at the top of ProcessRewardScreenForTest that returns an
empty results list if _tesseractService is null (or if _tesseractService.Engines
or FirstEngine are missing), and mirror the same null-guard pattern in the other
test entry point around lines 2954-2968; reference symbols:
ProcessRewardScreenForTest, _tesseractService, InitForTest, GetTextFromImage,
and ensure you check for _tesseractService == null (and safe-check
Engines/FirstEngine) before accessing them.
- Around line 111-115: IsCJKLocale currently reads
ApplicationSettings.GlobalReadonlySettings.Locale (a global static) which can
differ from the injected _settings or be null; change it to use the injected
settings instance (e.g., _settings.Locale or Settings.Locale) and stop using the
global static: make IsCJKLocale non-static if necessary so it can access the
instance field, read the locale from _settings with a null-safe check (default
to empty string) and then return locale == "ko" || locale == "zh-hans" || locale
== "zh-hant" || locale == "ja".

---

Outside diff comments:
In `@WFInfo/Ocr.cs`:
- Around line 1413-1757: The code leaks GDI bitmaps because cloneBitmap and
cloneBitmapColoured are created near the top of the inner loop (variables
cloneBitmap, cloneBitmapColoured) but early continue paths (where sumBlack
checks and Width/Height checks occur) return without disposing them; fix by
ensuring both Bitmaps are disposed on all paths—wrap their creation in using
blocks or add try/finally that disposes cloneBitmap and cloneBitmapColoured
before every continue or on method exit; update the code paths around the first
center-of-mass check (sumBlack < Height), after the flood-fill check (sumBlack <
Height), after selecting topColor (topColor default check), and after the final
width/height validation (Width < 5 || Height < 5) so no early return leaves
undisposed Bitmaps.
- Around line 320-341: The code currently guards job["ducats"] but still
accesses job["plat"] and job["volume"] directly and calls double.Parse, which
can throw; update the processing around job, plat, and volume (symbols: job,
plat, volume, double.Parse, primeRewards, hideRewardInfo) to first null-check
job["plat"] and job["volume"], then use double.TryParse with the same styles and
Main.culture to safely parse platinum into a double (handle failure by setting a
default or skipping this reward and setting hideRewardInfo as appropriate), and
only add to primeRewards when parsing succeeds and data is valid; ensure any
fallback behavior (skip, log, or default values) is consistent with existing
hideRewardInfo logic so malformed plat/volume won't abort processing.

---

Duplicate comments:
In `@WFInfo/Main.cs`:
- Around line 266-275: The RunOnUIThread method currently captures
MainWindow.INSTANCE lazily and still allows races when callers dereference
MainWindow later; change RunOnUIThread to capture the window instance into a
local variable (mw) and accept an Action<MainWindow> (or Func/Delegate that
receives the window) so the invoked callback receives that captured mw rather
than re-reading MainWindow.INSTANCE; update the method signature (RunOnUIThread)
and all call sites that pass Action to instead pass Action<MainWindow> and use
the provided mw parameter inside those callbacks to eliminate lifecycle races.

In `@WFInfo/Services/TesseractService.cs`:
- Around line 256-269: The catch currently logs download failures in the
webClient.DownloadFile/File.Copy block but swallows the exception; change it to
check existence of both app_data_traineddata_path and curr_data_traineddata_path
after the failed download and if neither file exists re-throw the caught
exception (or throw a new informative exception including Locale,
traineddata_hotlink, app_data_traineddata_path) so service initialization fails
fast instead of deferring to CreateEngine; keep the Main.AddLog call for
diagnostics and reference webClient.DownloadFile, File.Copy, Locale,
traineddata_hotlink, app_data_traineddata_path, curr_data_traineddata_path and
any existing catch(Exception ex) block when implementing.

---

Nitpick comments:
In `@WFInfo/Data.cs`:
- Around line 879-883: GetLocaleNameData currently holds marketItemsLock while
calling the expensive processor.GetLocalizedNameData, causing contention;
instead, acquire marketItemsLock only to capture a safe snapshot (e.g., copy or
shallow clone) of marketItems, then release the lock and call
LanguageProcessorFactory.GetCurrentProcessor() followed by
processor.GetLocalizedNameData(s, snapshot, useLevenshtein). Ensure you still
read any minimal state under marketItemsLock (using marketItemsLock) but do not
call processor.GetLocalizedNameData while holding the lock.

In `@WFInfo/LanguageProcessing/PolishLanguageProcessor.cs`:
- Around line 59-75: In NormalizePolishCharacters remove the redundant Replace
calls that handle uppercase Polish letters (e.g., .Replace('Ą','A'),
.Replace('Ę','E'), .Replace('Ć','C'), .Replace('Ł','L'), .Replace('Ś','S'),
.Replace('Ź','Z'), .Replace('Ż','Z'), .Replace('Ó','O')) because the method
already calls ToLowerInvariant() at the start so those branches are unreachable;
keep the lowercase replacements (ą,ę,ć,ł,ś,ź,ż,ó) and the initial
ToLowerInvariant() and return the cleaned chain (refer to
NormalizePolishCharacters to locate and update the Replace call chain).

In `@WFInfo/Tests/OCRTestRunner.cs`:
- Around line 89-99: The path-traversal checks in OCRTestRunner.cs use redundant
equality comparisons (jsonFull.Equals(baseDir, ...) and
imageFull.Equals(baseDir, ...)) which will never be true for file paths; remove
those equality checks and simplify each conditional to only verify that jsonFull
(and imageFull) StartsWith baseDir + Path.DirectorySeparatorChar using
StringComparison.OrdinalIgnoreCase, keeping the same exception throw referencing
scenarioPath; update the if conditions around jsonFull, imageFull, baseDir and
scenarioPath accordingly.

In `@WFInfo/WFInfo.csproj`:
- Around line 133-164: The CreatePublishZip target assumes WFInfo.exe exists at
$(ExePath); add an existence check and fail early or skip zipping if it's
missing. Inside the CreatePublishZip target (referencing ExePath, ZipPath and
the Exec Command that calls Compress-Archive), add an MSBuild
Condition/validation such as an <Error> with Condition="!Exists('$(ExePath)')"
to stop the build with a clear message or wrap the <Exec> in
Condition="Exists('$(ExePath)')" so the zip step only runs when the file is
present; ensure the Message/Fail text mentions $(ExePath) for clarity.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 983504f and 82ae52f.

📒 Files selected for processing (20)
  • WFInfo/CustomEntrypoint.cs
  • WFInfo/Data.cs
  • WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs
  • WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs
  • WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs
  • WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs
  • WFInfo/LanguageProcessing/LanguageProcessor.cs
  • WFInfo/LanguageProcessing/PolishLanguageProcessor.cs
  • WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs
  • WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs
  • WFInfo/Main.cs
  • WFInfo/Ocr.cs
  • WFInfo/Services/TesseractService.cs
  • WFInfo/Tests/OCRTestRunner.cs
  • WFInfo/Tests/TestModels.cs
  • WFInfo/Tests/TestProgram.cs
  • WFInfo/WFInfo.csproj
  • WFInfo/errorDialogue.xaml.cs
  • tests/README.md
  • tests/run_tests.bat
✅ Files skipped from review due to trivial changes (1)
  • tests/README.md
🚧 Files skipped from review as they are similar to previous changes (4)
  • WFInfo/Tests/TestModels.cs
  • WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs
  • tests/run_tests.bat
  • WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs

Comment on lines +16 to +21
public override string CharacterWhitelist =>
GenerateCharacterRange(0x4E00, 0x7FFF) +
GenerateCharacterRange(0x8000, 0x9FFF) +
GenerateCharacterRange(0x3400, 0x4DBF) +
GenerateCharacterRange(0xF900, 0xFAFF) +
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz "; // Full CJK ideographs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Runtime exception: CJK range 0x4E00-0x7FFF exceeds the 10,000 character limit.

The range 0x4E00-0x7FFF contains 12,800 characters (0x7FFF - 0x4E00 + 1), which exceeds MaxGeneratedRangeSize (10,000) and will throw ArgumentOutOfRangeException when CharacterWhitelist is accessed.

🐛 Split the large range into smaller chunks
         public override string CharacterWhitelist => 
-            GenerateCharacterRange(0x4E00, 0x7FFF) + 
+            GenerateCharacterRange(0x4E00, 0x6FFF) + 
+            GenerateCharacterRange(0x7000, 0x7FFF) + 
             GenerateCharacterRange(0x8000, 0x9FFF) + 
             GenerateCharacterRange(0x3400, 0x4DBF) + 
             GenerateCharacterRange(0xF900, 0xFAFF) + 
             "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz "; // Full CJK ideographs

Range sizes after fix:

  • 0x4E00-0x6FFF: 8,704 chars ✓
  • 0x7000-0x7FFF: 4,096 chars ✓
  • 0x8000-0x9FFF: 8,192 chars ✓
  • 0x3400-0x4DBF: 6,592 chars ✓
  • 0xF900-0xFAFF: 512 chars ✓
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
public override string CharacterWhitelist =>
GenerateCharacterRange(0x4E00, 0x7FFF) +
GenerateCharacterRange(0x8000, 0x9FFF) +
GenerateCharacterRange(0x3400, 0x4DBF) +
GenerateCharacterRange(0xF900, 0xFAFF) +
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz "; // Full CJK ideographs
public override string CharacterWhitelist =>
GenerateCharacterRange(0x4E00, 0x6FFF) +
GenerateCharacterRange(0x7000, 0x7FFF) +
GenerateCharacterRange(0x8000, 0x9FFF) +
GenerateCharacterRange(0x3400, 0x4DBF) +
GenerateCharacterRange(0xF900, 0xFAFF) +
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz "; // Full CJK ideographs
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs` around lines 16 - 21,
CharacterWhitelist currently calls GenerateCharacterRange(0x4E00, 0x7FFF) which
exceeds MaxGeneratedRangeSize and throws; split that large range into two valid
chunks (for example 0x4E00-0x6FFF and 0x7000-0x7FFF) and combine them with the
other existing GenerateCharacterRange calls (0x8000-0x9FFF, 0x3400-0x4DBF,
0xF900-0xFAFF) so each GenerateCharacterRange invocation stays <=
MaxGeneratedRangeSize; update the CharacterWhitelist getter to concatenate the
new smaller ranges (referencing the CharacterWhitelist property and
GenerateCharacterRange method) and ensure ordering/spacing remains the same.

Comment on lines +178 to +187
// Keep all Japanese text (Hiragana/Katakana/Kanji characters) since Japanese words are meaningful
// even when split by OCR
if (hasJapanese) return false;

// For mixed Japanese-Latin words, be more lenient
if (hasJapanese && hasLatin) return false;

// For non-Japanese text, use standard filtering (filter very short words)
return word.Length < 2;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Dead code: mixed Japanese-Latin check is unreachable.

Same issue as ThaiLanguageProcessor: line 180 returns when hasJapanese is true, making line 183 (if (hasJapanese && hasLatin)) unreachable.

🔧 Remove unreachable code
             // Keep all Japanese text (Hiragana/Katakana/Kanji characters) since Japanese words are meaningful
             // even when split by OCR
             if (hasJapanese) return false;
             
-            // For mixed Japanese-Latin words, be more lenient
-            if (hasJapanese && hasLatin) return false;
-            
             // For non-Japanese text, use standard filtering (filter very short words)
             return word.Length < 2;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs` around lines 178 -
187, The check for mixed Japanese-Latin (hasJapanese && hasLatin) is unreachable
because the earlier hasJapanese check returns immediately; in
JapaneseLanguageProcessor.cs move the mixed-language check before the plain
hasJapanese check (or remove the redundant plain hasJapanese branch) so that the
mixed-case logic runs; update the same method that contains the
hasJapanese/hasLatin variables (the word-filtering method) to evaluate
hasJapanese && hasLatin first and then fall back to hasJapanese-only and the
non-Japanese length check.

Comment on lines +217 to +220
// Common katakana OCR confusions
result = result.Replace('ヲ', 'ヲ').Replace('ヮ', 'ワ').Replace('ヰ', 'イ').Replace('ヱ', 'エ').Replace('ヲ', 'オ');

return result.ToLowerInvariant();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

No-op replacement and potential error in normalization.

Line 218 contains result.Replace('ヲ', 'ヲ') which replaces a character with itself (no-op). Later on the same line, 'ヲ' is also replaced with 'オ', but the first replacement does nothing and may indicate a copy-paste error where a different character was intended.

🔧 Remove or fix no-op replacement
-            result = result.Replace('ヲ', 'ヲ').Replace('ヮ', 'ワ').Replace('ヰ', 'イ').Replace('ヱ', 'エ').Replace('ヲ', 'オ');
+            result = result.Replace('ヮ', 'ワ').Replace('ヰ', 'イ').Replace('ヱ', 'エ').Replace('ヲ', 'オ');
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Common katakana OCR confusions
result = result.Replace('ヲ', 'ヲ').Replace('ヮ', 'ワ').Replace('ヰ', 'イ').Replace('ヱ', 'エ').Replace('ヲ', 'オ');
return result.ToLowerInvariant();
// Common katakana OCR confusions
result = result.Replace('ヮ', 'ワ').Replace('ヰ', 'イ').Replace('ヱ', 'エ').Replace('ヲ', 'オ');
return result.ToLowerInvariant();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs` around lines 217 -
220, The chained replacements in JapaneseLanguageProcessor use
result.Replace('ヲ', 'ヲ') which is a no-op and duplicates a later mapping to 'ヲ'
-> 'オ'; remove the redundant result.Replace('ヲ', 'ヲ') (or correct it to the
intended source character if it was a typo) so the Replace chain on the local
variable result only contains the correct mappings (e.g. keep
result.Replace('ヲ', 'オ') and the other mappings for 'ヮ','ヰ','ヱ'), then return
result.ToLowerInvariant().

Comment on lines +113 to +116
var toneMarks = new[] {'่', '้', '๊', '๋', '่', '้', '๊', '๋'}; // Different tone marks
bool aIsTone = toneMarks.Contains(a);
bool bIsTone = toneMarks.Contains(b);
if (aIsTone && bIsTone) return 1; // Low cost for tone mark differences
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Tone marks array contains duplicates.

The toneMarks array contains each tone mark twice ('่', '้', '๊', '๋' repeated), which is unnecessary and may indicate a copy-paste error.

🔧 Remove duplicate tone marks
-            var toneMarks = new[] {'่', '้', '๊', '๋', '่', '้', '๊', '๋'}; // Different tone marks
+            var toneMarks = new[] {'่', '้', '๊', '๋'}; // Thai tone marks
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
var toneMarks = new[] {'่', '้', '๊', '๋', '่', '้', '๊', '๋'}; // Different tone marks
bool aIsTone = toneMarks.Contains(a);
bool bIsTone = toneMarks.Contains(b);
if (aIsTone && bIsTone) return 1; // Low cost for tone mark differences
var toneMarks = new[] {'่', '้', '๊', '๋'}; // Thai tone marks
bool aIsTone = toneMarks.Contains(a);
bool bIsTone = toneMarks.Contains(b);
if (aIsTone && bIsTone) return 1; // Low cost for tone mark differences
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs` around lines 113 - 116,
The toneMarks array in ThaiLanguageProcessor.cs contains duplicate entries;
update the definition of toneMarks so it only lists each Thai tone mark once
(remove the repeated '่', '้', '๊', '๋'), keeping the subsequent logic that
computes aIsTone and bIsTone and the early return if both are tone marks
unchanged.

Comment on lines +136 to +145

// Keep all Thai text since Thai words are meaningful even when split by OCR
if (hasThai) return false;

// For mixed Thai-Latin words, be more lenient
if (hasThai && hasLatin) return false;

// For non-Thai text, use standard filtering (filter very short words)
return word.Length < 2;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Dead code: mixed Thai-Latin check is unreachable.

Line 138 returns false when hasThai is true, making the subsequent check on line 141 (if (hasThai && hasLatin)) unreachable.

🔧 Remove unreachable code
             // Keep all Thai text since Thai words are meaningful even when split by OCR
             if (hasThai) return false;
             
-            // For mixed Thai-Latin words, be more lenient
-            if (hasThai && hasLatin) return false;
-            
             // For non-Thai text, use standard filtering (filter very short words)
             return word.Length < 2;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Keep all Thai text since Thai words are meaningful even when split by OCR
if (hasThai) return false;
// For mixed Thai-Latin words, be more lenient
if (hasThai && hasLatin) return false;
// For non-Thai text, use standard filtering (filter very short words)
return word.Length < 2;
}
// Keep all Thai text since Thai words are meaningful even when split by OCR
if (hasThai) return false;
// For non-Thai text, use standard filtering (filter very short words)
return word.Length < 2;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs` around lines 136 - 145,
The check for mixed Thai-Latin (if (hasThai && hasLatin)) is unreachable because
the prior if (hasThai) returns early; in the ThaiLanguageProcessor method
containing the hasThai/hasLatin logic, either remove the redundant mixed check
or move the mixed-case branch above the plain hasThai branch so mixed words are
handled first (i.e., evaluate if (hasThai && hasLatin) before the if (hasThai)
return false) and ensure the final return applies only to non-Thai words.

Comment on lines +188 to +210
private static string NormalizeThaiCharacters(string input)
{
string result = NormalizeFullWidthCharacters(input);

// Basic Thai tone mark normalization
result = result.Normalize(System.Text.NormalizationForm.FormC);

// Common Thai OCR confusions and character variations
result = result.Replace('ซ', 'ศ').Replace('ศ', 'ษ'); // so variations normalization
result = result.Replace('ผ', 'ฝ'); // pho/fo confusion
result = result.Replace('บ', 'ป'); // bo/po confusion
result = result.Replace('ด', 'ต'); // do/to confusion
result = result.Replace('อ', 'โ'); // o/o form variations

// Remove or normalize common diacritic issues
result = result.Replace("์", ""); // Remove karan (silent marker) for comparison

// Normalize similar vowel forms
result = result.Replace('ใ', 'ไ'); // ai vowel variations
result = result.Replace('ำ', 'ํ'); // am vowel variations

return result.ToLowerInvariant();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Thai character normalization is destructive and may cause false matches.

NormalizeThaiCharacters chains multiple one-way replacements (e.g., ซ→ศ then ศ→ษ) which:

  1. Loses the original character information
  2. May cause unrelated words to match incorrectly

This could reduce OCR accuracy rather than improve it. Consider whether these normalizations should be bidirectional similarity mappings (like in GetThaiCharacterDifference) rather than destructive replacements.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs` around lines 188 - 210,
The NormalizeThaiCharacters method currently performs destructive chained
one-way replacements (e.g., 'ซ'→'ศ' then 'ศ'→'ษ') which loses original info and
risks false matches; replace that logic with a single-pass, non-destructive
canonicalization using a lookup map or grouping (e.g., a Dictionary<char,char>
or HashSet groups) that maps each character to a chosen canonical representative
(or leaves it unchanged) instead of chaining Replace calls, or remove those
replacements entirely and rely on the existing GetThaiCharacterDifference
similarity logic at comparison time; update NormalizeThaiCharacters (and
preserve NormalizeFullWidthCharacters usage) so it iterates characters once,
applies the canonical map, strips karan and diacritics as needed, and returns
the normalized string without multi-step cascading replaces.

Comment on lines +111 to +115
private static bool IsCJKLocale()
{
var locale = ApplicationSettings.GlobalReadonlySettings.Locale;
return locale == "ko" || locale == "zh-hans" || locale == "zh-hant" || locale == "ja";
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Use injected settings for locale detection instead of global static state.

IsCJKLocale() reads ApplicationSettings.GlobalReadonlySettings.Locale, which can diverge from _settings (and can be null in test/headless flows). This can misapply CJK heuristics or throw unexpectedly.

💡 Proposed fix
 private static bool IsCJKLocale()
 {
-    var locale = ApplicationSettings.GlobalReadonlySettings.Locale;
+    var locale = _settings?.Locale ?? ApplicationSettings.GlobalReadonlySettings?.Locale;
+    if (string.IsNullOrEmpty(locale))
+        return false;
     return locale == "ko" || locale == "zh-hans" || locale == "zh-hant" || locale == "ja";
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Ocr.cs` around lines 111 - 115, IsCJKLocale currently reads
ApplicationSettings.GlobalReadonlySettings.Locale (a global static) which can
differ from the injected _settings or be null; change it to use the injected
settings instance (e.g., _settings.Locale or Settings.Locale) and stop using the
global static: make IsCJKLocale non-static if necessary so it can access the
instance field, read the locale from _settings with a null-safe check (default
to empty string) and then return locale == "ko" || locale == "zh-hans" || locale
== "zh-hant" || locale == "ja".

Comment on lines +191 to +199
try
{
_tesseractService.Init();
}
catch (Exception ex)
{
Main.AddLog($"ERROR: Failed to initialize TesseractService: {ex.Message}");
_tesseractService = null;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid nulling _tesseractService on init failure; it creates new null-state crash paths.

Setting _tesseractService = null here pushes failures downstream into methods that still dereference it (ProcessSnapIt, profile scan paths, test entry points). Prefer surfacing init failure explicitly (or tracking an init-failed flag) rather than mutating the service reference to null.

Based on learnings: TesseractService properly handles TesseractExceptions from CreateEngine() calls at a higher level, displaying an error popup to the user without requiring additional try/catch blocks around individual CreateEngine() calls.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Ocr.cs` around lines 191 - 199, The catch block in the
_tesseractService.Init() call should not set _tesseractService = null because
callers like ProcessSnapIt, profile scan paths, and tests still dereference it;
instead remove the null assignment and either rethrow the exception or set a
dedicated init-failed flag (e.g. add a private bool _tesseractInitFailed) so
callers can explicitly check initialization state; update places that assume
non-null (e.g. ProcessSnapIt and any test entry points) to check the new flag or
handle the exception, and retain the Main.AddLog(...) call to record the error.

Comment on lines +2901 to +2931
internal static List<string> ProcessRewardScreenForTest(Bitmap screenshot, IWindowInfoService windowService)
{
var results = new List<string>();
windowService.UseImage(screenshot);

List<Bitmap> parts;
try
{
parts = ExtractPartBoxAutomatically(out uiScaling, out _, screenshot);
}
catch (Exception e)
{
Main.AddLog("Test ProcessReward: ExtractPartBoxAutomatically failed: " + e.Message);
return results;
}

int engineCount = Math.Min(parts.Count, _tesseractService.Engines.Length);
string[] checks = new string[parts.Count];
Task[] tasks = new Task[engineCount];
for (int i = 0; i < engineCount; i++)
{
int tempI = i;
tasks[i] = Task.Factory.StartNew(() => { checks[tempI] = GetTextFromImage(parts[tempI], _tesseractService.Engines[tempI]); });
}
Task.WaitAll(tasks);

// Process remaining parts sequentially if more parts than engines
for (int i = engineCount; i < parts.Count; i++)
{
checks[i] = GetTextFromImage(parts[i], _tesseractService.FirstEngine);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Test entry points should guard _tesseractService before engine access.

InitForTest() can set _tesseractService to null on init failure, but both test runners dereference _tesseractService unconditionally. Add early guard/return to avoid NullReferenceException in the test framework.

🧪 Proposed fix
 internal static List<string> ProcessRewardScreenForTest(Bitmap screenshot, IWindowInfoService windowService)
 {
     var results = new List<string>();
+    if (_tesseractService == null)
+    {
+        Main.AddLog("Test ProcessReward: TesseractService is not initialized");
+        return results;
+    }
     windowService.UseImage(screenshot);
     ...
 }

 internal static List<string> ProcessSnapItForTest(Bitmap screenshot, IWindowInfoService windowService)
 {
     var results = new List<string>();
+    if (_tesseractService == null)
+    {
+        Main.AddLog("Test SnapIt: TesseractService is not initialized");
+        return results;
+    }
     windowService.UseImage(screenshot);
     ...
 }

Also applies to: 2954-2968

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Ocr.cs` around lines 2901 - 2931, ProcessRewardScreenForTest currently
dereferences _tesseractService without checking for null (InitForTest can set it
to null); add an early guard at the top of ProcessRewardScreenForTest that
returns an empty results list if _tesseractService is null (or if
_tesseractService.Engines or FirstEngine are missing), and mirror the same
null-guard pattern in the other test entry point around lines 2954-2968;
reference symbols: ProcessRewardScreenForTest, _tesseractService, InitForTest,
GetTextFromImage, and ensure you check for _tesseractService == null (and
safe-check Engines/FirstEngine) before accessing them.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

♻️ Duplicate comments (8)
WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs (2)

224-224: ⚠️ Potential issue | 🟡 Minor

Remove the no-op replacement in normalization chain.

Replace('ヲ', 'ヲ') has no effect and only adds noise.

Small cleanup
-            result = result.Replace('ヲ', 'ヲ').Replace('ヮ', 'ワ').Replace('ヰ', 'イ').Replace('ヱ', 'エ').Replace('ヲ', 'オ');
+            result = result.Replace('ヮ', 'ワ').Replace('ヰ', 'イ').Replace('ヱ', 'エ').Replace('ヲ', 'オ');
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs` at line 224, In the
normalization chain inside JapaneseLanguageProcessor (the result.Replace(...)
sequence), remove the no-op call Replace('ヲ','ヲ') so the chain only contains
meaningful mappings (keep the intended Replace('ヲ','オ') and other replacements);
update the result.Replace(...) call to eliminate the redundant 'ヲ'->'ヲ' entry.

186-190: ⚠️ Potential issue | 🟡 Minor

Mixed-script branch is unreachable.

if (hasJapanese) return false; makes if (hasJapanese && hasLatin) dead code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs` around lines 186 -
190, The first check "if (hasJapanese) return false;" makes the subsequent
mixed-script branch "if (hasJapanese && hasLatin)" unreachable; update the logic
in JapaneseLanguageProcessor so the first condition only handles Japanese-only
text (e.g., change it to check hasJapanese && !hasLatin) and keep the
mixed-script branch to handle hasJapanese && hasLatin; ensure you modify the
conditions around the hasJapanese and hasLatin checks rather than removing the
mixed branch.
WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs (2)

137-142: ⚠️ Potential issue | 🟡 Minor

Mixed Thai-Latin branch is unreachable.

if (hasThai) return false; makes if (hasThai && hasLatin) dead code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs` around lines 137 - 142,
The first if-statement makes the mixed Thai-Latin branch unreachable; inside the
ThaiLanguageProcessor (where hasThai and hasLatin are evaluated) change the
conditions so they are mutually exclusive — e.g., replace the initial check with
a Thai-only condition (hasThai && !hasLatin) to return false, then keep the
mixed case (hasThai && hasLatin) as the subsequent branch, or alternatively
merge them into a single hasThai check if both should behave the same; update
the ifs around hasThai/hasLatin so the mixed branch is reachable.

196-207: ⚠️ Potential issue | 🟠 Major

Cascading replacements in NormalizeThaiCharacters collapse distinct letters.

Replace('ซ','ศ').Replace('ศ','ษ') maps both and to , which can increase false positives.

Safer one-pass canonicalization pattern
-            result = result.Replace('ซ', 'ศ').Replace('ศ', 'ษ'); // so variations normalization
+            var sb = new StringBuilder(result.Length);
+            foreach (char c in result)
+            {
+                sb.Append(c switch
+                {
+                    'ซ' => 'ศ',
+                    'ศ' => 'ศ',
+                    'ษ' => 'ศ',
+                    _ => c
+                });
+            }
+            result = sb.ToString();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs` around lines 196 - 207,
The chained Replace calls in NormalizeThaiCharacters (e.g., result =
result.Replace('ซ','ศ').Replace('ศ','ษ')) cause cascading remaps that collapse
distinct letters; replace this pattern with a single-pass canonicalization using
a mapping table/dictionary applied per character (or a Regex.Replace with a
callback) so each original character is mapped once to its canonical counterpart
(handle 'ซ','ศ','ษ' and other pairs via explicit map entries), and remove the
chained Replace sequences in NormalizeThaiCharacters to prevent unintended
transitive replacements.
WFInfo/Main.cs (1)

266-275: ⚠️ Potential issue | 🟠 Major

RunOnUIThread still allows lifecycle races via callsite lambdas.

The helper captures mw, but callsites still execute actions that dereference MainWindow.INSTANCE later. Pass the captured window into the delegate to avoid re-reading global state after lifecycle changes.

Suggested refactor
-        public static void RunOnUIThread(Action act)
+        public static void RunOnUIThread(Action<MainWindow> act)
         {
-            var mw = MainWindow.INSTANCE;
-            if (mw?.Dispatcher != null && !mw.Dispatcher.HasShutdownStarted && !mw.Dispatcher.HasShutdownFinished)
+            var mw = MainWindow.INSTANCE;
+            var dispatcher = mw?.Dispatcher;
+            if (dispatcher != null && !dispatcher.HasShutdownStarted && !dispatcher.HasShutdownFinished)
             {
-                if (mw.Dispatcher.CheckAccess())
-                    act();
+                if (dispatcher.CheckAccess())
+                    act(mw);
                 else
-                    mw.Dispatcher.Invoke(act);
+                    dispatcher.Invoke(() => act(mw));
             }
         }

-            RunOnUIThread(() => { MainWindow.INSTANCE.ChangeStatus(message, severity); });
+            RunOnUIThread(w => w.ChangeStatus(message, severity));

-            RunOnUIThread(() => { MainWindow.INSTANCE.LoggedIn(); });
+            RunOnUIThread(w => w.LoggedIn());

-            RunOnUIThread(() => { MainWindow.INSTANCE.SignOut(); });
+            RunOnUIThread(w => w.SignOut());
#!/bin/bash
# Verify current callsites still dereference MainWindow.INSTANCE inside RunOnUIThread lambdas.
rg -nP 'RunOnUIThread\s*\(\s*\(\)\s*=>\s*\{\s*MainWindow\.INSTANCE\.' WFInfo/Main.cs -C2

Also applies to: 311-311, 553-553, 623-623

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Main.cs` around lines 266 - 275, RunOnUIThread currently captures
MainWindow.INSTANCE but still lets callers dereference MainWindow.INSTANCE
inside their lambdas, creating lifecycle races; change RunOnUIThread to pass the
captured window into the delegate to avoid re-reading the global state—update
the method signature from Action act to Action<MainWindow> act (or alternately
keep Action and internally invoke a wrapper that calls actWithWindow(mw)),
ensure the implementation uses the captured mw when invoking
(CheckAccess/Invoke) and update all callsites (those that reference
MainWindow.INSTANCE inside their lambdas) to accept the MainWindow parameter and
use that captured instance instead of re-reading MainWindow.INSTANCE.
WFInfo/Ocr.cs (2)

195-199: ⚠️ Potential issue | 🟠 Major

Don’t null _tesseractService on init failure.

Line [198] and Line [3008] convert initialization errors into a latent null-state that later paths still dereference.

🛠️ Proposed direction
+        private static bool _tesseractInitFailed;
@@
             catch (Exception ex)
             {
                 Main.AddLog($"ERROR: Failed to initialize TesseractService: {ex.Message}");
-                _tesseractService = null;
+                _tesseractInitFailed = true;
             }
@@
             catch (Exception ex)
             {
                 Main.AddLog($"ERROR: Failed to initialize TesseractService in test mode: {ex.Message}");
-                _tesseractService = null;
+                _tesseractInitFailed = true;
             }

Then gate OCR entry points with _tesseractInitFailed (or rethrow) instead of relying on null.

Based on learnings: TesseractService properly handles TesseractExceptions from CreateEngine() calls at a higher level, displaying an error popup to the user without requiring additional try/catch blocks around individual CreateEngine() calls.

Also applies to: 3005-3009

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Ocr.cs` around lines 195 - 199, The catch block that handles
TesseractService initialization should not convert the failure into a latent
null-state by setting _tesseractService = null; instead record the failure via a
boolean flag (e.g., _tesseractInitFailed) or rethrow so callers can handle it
explicitly; update the initialization catch (around TesseractService creation)
to set _tesseractInitFailed = true (and log the error) rather than nulling
_tesseractService, and add guards at OCR entry points that currently dereference
_tesseractService to check _tesseractInitFailed (or handle the rethrown
exception) before using TesseractService.

111-115: ⚠️ Potential issue | 🟠 Major

Use injected settings for CJK locale detection.

Line [113] reads locale from global static settings, which can drift from _settings in headless/test flows and apply wrong CJK heuristics.

💡 Proposed fix
 private static bool IsCJKLocale()
 {
-    var locale = ApplicationSettings.GlobalReadonlySettings.Locale;
+    var locale = _settings?.Locale ?? ApplicationSettings.GlobalReadonlySettings?.Locale;
+    if (string.IsNullOrEmpty(locale))
+        return false;
     return locale == "ko" || locale == "zh-hans" || locale == "zh-hant" || locale == "ja";
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Ocr.cs` around lines 111 - 115, IsCJKLocale currently reads
ApplicationSettings.GlobalReadonlySettings.Locale (a static source) which can
diverge from the injected _settings used in headless/tests; change IsCJKLocale
to use the injected settings by either making it an instance method that reads
_settings.Locale or add a parameter (locale) and have callers pass
_settings.Locale. Update the IsCJKLocale implementation to check
_settings.Locale (or the passed-in locale) for "ko", "zh-hans", "zh-hant", or
"ja" and remove the reliance on
ApplicationSettings.GlobalReadonlySettings.Locale so tests and headless flows
use the injected configuration.
WFInfo/Services/TesseractService.cs (1)

174-176: ⚠️ Potential issue | 🔴 Critical

CreateEngine() now depends on factory init, but constructor creates engines first.

Line [174] can throw InvalidOperationException if LanguageProcessorFactory.Initialize(...) has not run yet, and this path executes during constructor engine creation.

🛡️ Safer fallback in engine creation
-            var processor = LanguageProcessorFactory.GetProcessor(Locale);
+            LanguageProcessor processor = null;
+            try
+            {
+                processor = LanguageProcessorFactory.GetProcessor(Locale);
+            }
+            catch (InvalidOperationException)
+            {
+                Main.AddLog($"LanguageProcessorFactory not initialized for locale '{Locale}', using fallback whitelist");
+            }
             var whitelist = processor?.CharacterWhitelist ?? DefaultWhitelist;
             engine.SetVariable("tessedit_char_whitelist", whitelist);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Services/TesseractService.cs` around lines 174 - 176, CreateEngine()
currently calls LanguageProcessorFactory.GetProcessor(Locale) which can throw
InvalidOperationException if the factory hasn't been initialized yet; wrap the
GetProcessor call in a try/catch (catch InvalidOperationException) or otherwise
guard with a factory-initialized check, and fall back to DefaultWhitelist when
the factory is not available so constructor engine creation cannot throw; update
the CreateEngine method to use the safe fallback (use
processor?.CharacterWhitelist ?? DefaultWhitelist) only after catching/avoiding
the factory exception and then call
engine.SetVariable("tessedit_char_whitelist", whitelist).
🧹 Nitpick comments (5)
WFInfo/WFInfo.csproj (1)

135-135: Generate update.xml only for Release builds.

This target currently runs for Debug too. Limiting it to Release reduces accidental non-release artifacts and keeps local build outputs cleaner.

♻️ Proposed tweak
-  <Target Name="GenerateUpdateXml" AfterTargets="Build">
+  <Target Name="GenerateUpdateXml" AfterTargets="Build" Condition="'$(Configuration)'=='Release'">
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/WFInfo.csproj` at line 135, The GenerateUpdateXml MSBuild target
(Target Name="GenerateUpdateXml") currently runs for all configurations;
restrict it to Release-only by adding a configuration condition to the target
(e.g., add Condition="'$(Configuration)' == 'Release'" on the Target element) so
the GenerateUpdateXml target executes only when building the Release
configuration.
WFInfo/LanguageProcessing/LanguageProcessor.cs (1)

34-39: Harden blueprint-removal cache init against null arrays/entries.

A single processor returning null (array or element) can throw during base construction and break processor creation.

Suggested guardrails
             _normalizedBlueprintRemovalsCache.GetOrAdd(concreteType, type => 
             {
-                var blueprintRemovals = BlueprintRemovals;
+                var blueprintRemovals = BlueprintRemovals ?? Array.Empty<string>();
                 var normalized = new string[blueprintRemovals.Length];
                 for (int i = 0; i < blueprintRemovals.Length; i++)
                 {
-                    normalized[i] = blueprintRemovals[i].ToLowerInvariant();
+                    normalized[i] = (blueprintRemovals[i] ?? string.Empty).ToLowerInvariant();
                 }
                 return normalized;
             });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/LanguageProcessor.cs` around lines 34 - 39, The
blueprint-removal cache init fails if BlueprintRemovals or any element is null;
in the LanguageProcessor constructor (the block using BlueprintRemovals and
building normalized), guard against null by treating BlueprintRemovals as an
empty array when null and replace null elements with String.Empty (or skip them)
before calling ToLowerInvariant; update the code that creates normalized
(referencing BlueprintRemovals, normalized, and the for-loop index i) to
null-check the source array and each entry so construction cannot throw
NullReferenceException.
WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs (1)

23-81: Deduplicate repeated transliteration entries in koreanReplacements.

At least one mapping appears more than once ("류" -> "ryu"), which adds unnecessary replace passes and maintenance noise.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs` around lines 23 - 81,
The koreanReplacements list in KoreanLanguageProcessor.cs contains duplicate
transliteration entries (e.g. "류" -> "ryu") which cause redundant replacements
and maintenance noise; remove the duplicate(s) from the static
List<KeyValuePair<string,string>> koreanReplacements so each Hangul key appears
only once (keep the intended canonical mapping), and scan the list in the
KoreanLanguageProcessor class for any other repeated keys to deduplicate them.
Ensure only a single KeyValuePair for "류" -> "ryu" remains and preserve existing
order/behavior otherwise.
WFInfo/Services/TesseractService.cs (1)

247-247: Dispose WebClient in getLocaleTessdata().

Line [247] allocates WebClient without disposal; repeated reload paths should not leak handles/sockets.

♻️ Proposed fix
-            WebClient webClient = CustomEntrypoint.CreateNewWebClient();
+            using WebClient webClient = CustomEntrypoint.CreateNewWebClient();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Services/TesseractService.cs` at line 247, getLocaleTessdata is
creating a WebClient via CustomEntrypoint.CreateNewWebClient() without disposing
it, which leaks handles; wrap the creation in a using statement (or ensure
Dispose in a try/finally) so the WebClient is disposed after use, e.g., replace
the raw allocation with a using (var webClient =
CustomEntrypoint.CreateNewWebClient()) { ... } pattern inside getLocaleTessdata
to guarantee proper resource cleanup.
WFInfo/Data.cs (1)

879-883: Reduce lock scope around localized-name matching.

processor.GetLocalizedNameData(...) can scan many items and perform distance calculations; holding marketItemsLock for that full duration increases contention. Snapshot under lock, then compute outside.

♻️ Proposed refactor
         public string GetLocaleNameData(string s, bool useLevenshtein)
         {
             var processor = LanguageProcessorFactory.GetCurrentProcessor();
+            JObject marketItemsSnapshot;
             lock (marketItemsLock)
             {
-                return processor.GetLocalizedNameData(s, marketItems, useLevenshtein);
+                marketItemsSnapshot = marketItems;
             }
+            return processor.GetLocalizedNameData(s, marketItemsSnapshot, useLevenshtein);
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Data.cs` around lines 879 - 883, The code currently holds
marketItemsLock while calling LanguageProcessorFactory.GetCurrentProcessor() and
processor.GetLocalizedNameData(s, marketItems, useLevenshtein), which can be
long-running; instead take a short snapshot of the shared data under
marketItemsLock (e.g., copy marketItems into a local immutable/list/array) and
then release the lock and call processor.GetLocalizedNameData using that
snapshot (keep calling LanguageProcessorFactory.GetCurrentProcessor() outside
the lock if needed), ensuring only the minimal read/copy is protected by
marketItemsLock.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@TestCharacterRanges.cs`:
- Around line 53-57: The TestApplicationSettings class currently only implements
Locale but must implement all members of IReadOnlyApplicationSettings; either
implement every required property/method on TestApplicationSettings (matching
the interface signatures) by adding the missing properties like ActivationKey,
ActivationKeyKey, ActivationMouseButton, Auto, AutoCSV, AutoCount, AutoDelay,
AutoList, CF_pBMax, CF_pBMin, etc., providing appropriate test-friendly default
values or auto-properties, or change TestApplicationSettings to be abstract so
it does not need to supply implementations; ensure each member name and type
exactly matches the interface contracts and that any methods/properties required
by IReadOnlyApplicationSettings are present (you can use simple defaults or
throw NotImplementedException for members not used in tests).
- Line 12: The method TestCharacterRanges() conflicts with the enclosing class
TestCharacterRanges because in C# a method cannot have the same name as its
class; rename the method (e.g., to RunTestCharacterRanges or
ExecuteTestCharacterRanges) and update any callers, or if you intended a
constructor remove the return type and make it parameterless constructor; modify
the method signature referenced as TestCharacterRanges() accordingly and ensure
references are updated to the new method name.

In `@WFInfo/Data.cs`:
- Around line 466-468: The fallback path currently reads and deserializes JSON
into JObject (response → data) and returns it without validating its shape;
update the fallback branch in the method that uses localeSpecificFallbackPath to
run the same payload validation routine you use for the online payload (e.g.,
reuse the existing ValidatePayload/ValidateResponse logic or parsing into the
same DTO/type) before returning (data, true), and do not catch/desuppress
deserialization or validation exceptions — allow them to bubble up so malformed
fallback JSON fails loudly.

In `@WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs`:
- Line 226: The CharacterWhitelist property is missing the space character
causing multi-word OCR results to collapse; update the overridden
CharacterWhitelist string in the EuropeanLanguageProcessor (the
CharacterWhitelist property) to include a space (' ') alongside the existing
characters so Italian text preserves word boundaries, matching the other
European processors' whitelists.

In `@WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs`:
- Around line 87-129: GetJapaneseCharacterDifference currently allocates
hiraganaKatakanaPairs and similarChars on every call, causing GC pressure; move
these into class-level static readonly fields (e.g., static readonly
Dictionary<char,char> HiraganaKatakanaPairs and static readonly
HashSet<(char,char)> or static readonly (char,char)[] SimilarCharPairs) and
initialize them once (static constructor or inline). Update
GetJapaneseCharacterDifference to reference those static fields (use TryGetValue
on HiraganaKatakanaPairs and check membership in SimilarCharPairs without
creating new arrays) so no per-call allocations occur.

In `@WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs`:
- Around line 25-30: CalculateLevenshteinDistance currently calls
ContainsThai(s) and ContainsThai(t) without guarding against null; update
CalculateLevenshteinDistance to null-check inputs (s and t) before calling
ContainsThai and handle nulls appropriately (e.g., treat null as empty string or
short-circuit to default behavior), and apply the same null-safety change to the
other affected method around lines 150-156 that also calls ContainsThai; ensure
the checks are added where ContainsThai is invoked so ContainsThai is never
iterated over a null value.

In `@WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs`:
- Around line 56-72: NormalizeTurkishCharacters currently calls ToLowerInvariant
which makes the subsequent uppercase Replace(...) calls unreachable and also
mishandles 'İ' (U+0130) by producing an "i" plus a combining dot; fix by
applying Turkish-aware casing and explicit I/İ handling before ASCII-folding: in
NormalizeTurkishCharacters first map 'İ'→'i' and 'I'→'ı' (or call ToLower with
new CultureInfo("tr-TR")) so Unicode special-case mapping for Turkish I is
applied, then perform the ASCII-fold replacements for 'ğ','ş','ç','ö','ü' and
finally map 'ı'→'i'; remove the now-unnecessary uppercase Replace(...) calls.

In `@WFInfo/Main.cs`:
- Around line 98-104: The dispatcher call in UpdateMarketStatusAsync using
MainWindow.INSTANCE.Dispatcher must be guarded against shutdown: before calling
disp.InvokeAsync(() => wnd.UpdateMarketStatus(msg)) check disp != null &&
!disp.HasShutdownStarted && !disp.HasShutdownFinished and ensure wnd is not
null, then await the returned DispatcherOperation.Task and wrap the await in a
try/catch that catches TaskCanceledException and OperationCanceledException (and
optionally InvalidOperationException) so exceptions during dispatcher shutdown
are observed/handled instead of being silently lost; reference
MainWindow.INSTANCE, Dispatcher, InvokeAsync, and UpdateMarketStatus when
applying this change.

In `@WFInfo/Tests/OCRTestRunner.cs`:
- Around line 133-138: After deserializing into TestCase, guard against testCase
being null: check if testCase == null right after
JsonConvert.DeserializeObject<TestCase>(...) and handle it explicitly (for
example, throw a descriptive exception or set the runner result to a
failed/error state and return) instead of dereferencing; ensure you reference
the same symbols used in the diff (testCase, result.Language, result.Theme,
result.Category, result.ExpectedParts, and testCase.Parts) so the code either
assigns defaults or aborts cleanly when the JSON is malformed.

In `@WFInfo/WFInfo.csproj`:
- Around line 50-52: The project currently keeps UpdateAssemblyVersion in
WFInfo.csproj in sync with AssemblyVersion in Properties/AssemblyInfo.cs, which
risks drift; remove the hardcoded <UpdateAssemblyVersion> entry and instead
derive the updater version by running the MSBuild GetAssemblyIdentity task
against the built assembly (the output assembly from the project) and assign its
Version metadata to the property used when generating update.xml; update the
MSBuild target that writes update.xml to use the property produced by
GetAssemblyIdentity (referencing the project output assembly and the
GetAssemblyIdentity task) so the value written to update.xml always matches the
compiled AssemblyVersion declared in AssemblyInfo.cs.

---

Duplicate comments:
In `@WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs`:
- Line 224: In the normalization chain inside JapaneseLanguageProcessor (the
result.Replace(...) sequence), remove the no-op call Replace('ヲ','ヲ') so the
chain only contains meaningful mappings (keep the intended Replace('ヲ','オ') and
other replacements); update the result.Replace(...) call to eliminate the
redundant 'ヲ'->'ヲ' entry.
- Around line 186-190: The first check "if (hasJapanese) return false;" makes
the subsequent mixed-script branch "if (hasJapanese && hasLatin)" unreachable;
update the logic in JapaneseLanguageProcessor so the first condition only
handles Japanese-only text (e.g., change it to check hasJapanese && !hasLatin)
and keep the mixed-script branch to handle hasJapanese && hasLatin; ensure you
modify the conditions around the hasJapanese and hasLatin checks rather than
removing the mixed branch.

In `@WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs`:
- Around line 137-142: The first if-statement makes the mixed Thai-Latin branch
unreachable; inside the ThaiLanguageProcessor (where hasThai and hasLatin are
evaluated) change the conditions so they are mutually exclusive — e.g., replace
the initial check with a Thai-only condition (hasThai && !hasLatin) to return
false, then keep the mixed case (hasThai && hasLatin) as the subsequent branch,
or alternatively merge them into a single hasThai check if both should behave
the same; update the ifs around hasThai/hasLatin so the mixed branch is
reachable.
- Around line 196-207: The chained Replace calls in NormalizeThaiCharacters
(e.g., result = result.Replace('ซ','ศ').Replace('ศ','ษ')) cause cascading remaps
that collapse distinct letters; replace this pattern with a single-pass
canonicalization using a mapping table/dictionary applied per character (or a
Regex.Replace with a callback) so each original character is mapped once to its
canonical counterpart (handle 'ซ','ศ','ษ' and other pairs via explicit map
entries), and remove the chained Replace sequences in NormalizeThaiCharacters to
prevent unintended transitive replacements.

In `@WFInfo/Main.cs`:
- Around line 266-275: RunOnUIThread currently captures MainWindow.INSTANCE but
still lets callers dereference MainWindow.INSTANCE inside their lambdas,
creating lifecycle races; change RunOnUIThread to pass the captured window into
the delegate to avoid re-reading the global state—update the method signature
from Action act to Action<MainWindow> act (or alternately keep Action and
internally invoke a wrapper that calls actWithWindow(mw)), ensure the
implementation uses the captured mw when invoking (CheckAccess/Invoke) and
update all callsites (those that reference MainWindow.INSTANCE inside their
lambdas) to accept the MainWindow parameter and use that captured instance
instead of re-reading MainWindow.INSTANCE.

In `@WFInfo/Ocr.cs`:
- Around line 195-199: The catch block that handles TesseractService
initialization should not convert the failure into a latent null-state by
setting _tesseractService = null; instead record the failure via a boolean flag
(e.g., _tesseractInitFailed) or rethrow so callers can handle it explicitly;
update the initialization catch (around TesseractService creation) to set
_tesseractInitFailed = true (and log the error) rather than nulling
_tesseractService, and add guards at OCR entry points that currently dereference
_tesseractService to check _tesseractInitFailed (or handle the rethrown
exception) before using TesseractService.
- Around line 111-115: IsCJKLocale currently reads
ApplicationSettings.GlobalReadonlySettings.Locale (a static source) which can
diverge from the injected _settings used in headless/tests; change IsCJKLocale
to use the injected settings by either making it an instance method that reads
_settings.Locale or add a parameter (locale) and have callers pass
_settings.Locale. Update the IsCJKLocale implementation to check
_settings.Locale (or the passed-in locale) for "ko", "zh-hans", "zh-hant", or
"ja" and remove the reliance on
ApplicationSettings.GlobalReadonlySettings.Locale so tests and headless flows
use the injected configuration.

In `@WFInfo/Services/TesseractService.cs`:
- Around line 174-176: CreateEngine() currently calls
LanguageProcessorFactory.GetProcessor(Locale) which can throw
InvalidOperationException if the factory hasn't been initialized yet; wrap the
GetProcessor call in a try/catch (catch InvalidOperationException) or otherwise
guard with a factory-initialized check, and fall back to DefaultWhitelist when
the factory is not available so constructor engine creation cannot throw; update
the CreateEngine method to use the safe fallback (use
processor?.CharacterWhitelist ?? DefaultWhitelist) only after catching/avoiding
the factory exception and then call
engine.SetVariable("tessedit_char_whitelist", whitelist).

---

Nitpick comments:
In `@WFInfo/Data.cs`:
- Around line 879-883: The code currently holds marketItemsLock while calling
LanguageProcessorFactory.GetCurrentProcessor() and
processor.GetLocalizedNameData(s, marketItems, useLevenshtein), which can be
long-running; instead take a short snapshot of the shared data under
marketItemsLock (e.g., copy marketItems into a local immutable/list/array) and
then release the lock and call processor.GetLocalizedNameData using that
snapshot (keep calling LanguageProcessorFactory.GetCurrentProcessor() outside
the lock if needed), ensuring only the minimal read/copy is protected by
marketItemsLock.

In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs`:
- Around line 23-81: The koreanReplacements list in KoreanLanguageProcessor.cs
contains duplicate transliteration entries (e.g. "류" -> "ryu") which cause
redundant replacements and maintenance noise; remove the duplicate(s) from the
static List<KeyValuePair<string,string>> koreanReplacements so each Hangul key
appears only once (keep the intended canonical mapping), and scan the list in
the KoreanLanguageProcessor class for any other repeated keys to deduplicate
them. Ensure only a single KeyValuePair for "류" -> "ryu" remains and preserve
existing order/behavior otherwise.

In `@WFInfo/LanguageProcessing/LanguageProcessor.cs`:
- Around line 34-39: The blueprint-removal cache init fails if BlueprintRemovals
or any element is null; in the LanguageProcessor constructor (the block using
BlueprintRemovals and building normalized), guard against null by treating
BlueprintRemovals as an empty array when null and replace null elements with
String.Empty (or skip them) before calling ToLowerInvariant; update the code
that creates normalized (referencing BlueprintRemovals, normalized, and the
for-loop index i) to null-check the source array and each entry so construction
cannot throw NullReferenceException.

In `@WFInfo/Services/TesseractService.cs`:
- Line 247: getLocaleTessdata is creating a WebClient via
CustomEntrypoint.CreateNewWebClient() without disposing it, which leaks handles;
wrap the creation in a using statement (or ensure Dispose in a try/finally) so
the WebClient is disposed after use, e.g., replace the raw allocation with a
using (var webClient = CustomEntrypoint.CreateNewWebClient()) { ... } pattern
inside getLocaleTessdata to guarantee proper resource cleanup.

In `@WFInfo/WFInfo.csproj`:
- Line 135: The GenerateUpdateXml MSBuild target (Target
Name="GenerateUpdateXml") currently runs for all configurations; restrict it to
Release-only by adding a configuration condition to the target (e.g., add
Condition="'$(Configuration)' == 'Release'" on the Target element) so the
GenerateUpdateXml target executes only when building the Release configuration.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 983504f and 039bda4.

📒 Files selected for processing (21)
  • TestCharacterRanges.cs
  • WFInfo/CustomEntrypoint.cs
  • WFInfo/Data.cs
  • WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs
  • WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs
  • WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs
  • WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs
  • WFInfo/LanguageProcessing/LanguageProcessor.cs
  • WFInfo/LanguageProcessing/PolishLanguageProcessor.cs
  • WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs
  • WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs
  • WFInfo/Main.cs
  • WFInfo/Ocr.cs
  • WFInfo/Services/TesseractService.cs
  • WFInfo/Tests/OCRTestRunner.cs
  • WFInfo/Tests/TestModels.cs
  • WFInfo/Tests/TestProgram.cs
  • WFInfo/WFInfo.csproj
  • WFInfo/errorDialogue.xaml.cs
  • tests/README.md
  • tests/run_tests.bat
🚧 Files skipped from review as they are similar to previous changes (4)
  • WFInfo/errorDialogue.xaml.cs
  • WFInfo/LanguageProcessing/PolishLanguageProcessor.cs
  • tests/README.md
  • WFInfo/Tests/TestModels.cs

/// </summary>
public class TestCharacterRanges
{
public static void TestCharacterRanges()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -e
python3 - <<'PY'
import pathlib, re
files = list(pathlib.Path('.').rglob('TestCharacterRanges.cs'))
if files:
    p = files[0]
    src = p.read_text(encoding='utf-8', errors='ignore')
    cls = re.search(r'\bclass\s+(\w+)\b', src)
    if cls:
        cls_name = cls.group(1)
        # Look for method with same name as class (non-constructor pattern)
        m = re.search(rf'\b(public|private|internal|protected)\s+static\s+\w+\s+{cls_name}\s*\(', src)
        print(f"file={p}")
        print(f"class={cls_name}")
        print(f"has_conflicting_member={bool(m)}")
        if m:
            line = src.count('\n', 0, m.start()) + 1
            print(f"conflict_line={line}")
            # Show context around the conflict
            lines = src.split('\n')
            start = max(0, line - 2)
            end = min(len(lines), line + 1)
            print(f"context:")
            for i in range(start, end):
                print(f"  {i+1}: {lines[i]}")
    else:
        print("No class found")
else:
    print("TestCharacterRanges.cs not found")
PY

Repository: WFCD/WFinfo

Length of output: 245


Rename TestCharacterRanges(); current signature is a compile error.

Line 12 declares a method with the same name as its enclosing class with a return type, which is invalid in C# (CS0542). In C#, only constructors may share the class name, and they cannot have explicit return types.

🛠️ Proposed fix
-        public static void TestCharacterRanges()
+        public static void RunCharacterRangeTests()
         {
             Console.WriteLine("Testing character range generation...");
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
public static void TestCharacterRanges()
public static void RunCharacterRangeTests()
{
Console.WriteLine("Testing character range generation...");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TestCharacterRanges.cs` at line 12, The method TestCharacterRanges()
conflicts with the enclosing class TestCharacterRanges because in C# a method
cannot have the same name as its class; rename the method (e.g., to
RunTestCharacterRanges or ExecuteTestCharacterRanges) and update any callers, or
if you intended a constructor remove the return type and make it parameterless
constructor; modify the method signature referenced as TestCharacterRanges()
accordingly and ensure references are updated to the new method name.

Comment on lines +53 to +57
public class TestApplicationSettings : IReadOnlyApplicationSettings
{
public string Locale => "en";
// Add other required properties as needed
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -e
python - <<'PY'
import pathlib, re

iface_file = None
iface_src = None
for p in pathlib.Path('.').rglob('*.cs'):
    s = p.read_text(encoding='utf-8', errors='ignore')
    if 'interface IReadOnlyApplicationSettings' in s:
        iface_file, iface_src = p, s
        break

test_file = next(pathlib.Path('.').rglob('TestCharacterRanges.cs'))
test_src = test_file.read_text(encoding='utf-8', errors='ignore')

print(f"interface_file={iface_file}")
print(f"test_file={test_file}")

iface_members = set(re.findall(r'\b(?:string|int|bool|double|float|decimal|long|short|byte|object|[A-Za-z_][A-Za-z0-9_<>,\[\]\?]*)\s+([A-Za-z_][A-Za-z0-9_]*)\s*\{\s*get;\s*\}', iface_src))
impl_match = re.search(r'class\s+TestApplicationSettings\s*:\s*IReadOnlyApplicationSettings\s*\{([\s\S]*?)\n\s*\}', test_src)
impl_body = impl_match.group(1) if impl_match else ''
impl_members = set(re.findall(r'\b([A-Za-z_][A-Za-z0-9_]*)\s*=>', impl_body))
impl_members |= set(re.findall(r'\b([A-Za-z_][A-Za-z0-9_]*)\s*\{\s*get;', impl_body))

missing = sorted(iface_members - impl_members)
print(f"interface_member_count={len(iface_members)}")
print(f"implemented_member_count={len(impl_members)}")
print(f"missing_sample={missing[:20]}")
PY

Repository: WFCD/WFinfo

Length of output: 500


TestApplicationSettings does not fully implement IReadOnlyApplicationSettings.

The interface defines 76 members, but only Locale is implemented. Missing 75 members including: ActivationKey, ActivationKeyKey, ActivationMouseButton, Auto, AutoCSV, AutoCount, AutoDelay, AutoList, CF_pBMax, CF_pBMin, and others. This is a compile-blocking error—all interface members must be implemented.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TestCharacterRanges.cs` around lines 53 - 57, The TestApplicationSettings
class currently only implements Locale but must implement all members of
IReadOnlyApplicationSettings; either implement every required property/method on
TestApplicationSettings (matching the interface signatures) by adding the
missing properties like ActivationKey, ActivationKeyKey, ActivationMouseButton,
Auto, AutoCSV, AutoCount, AutoDelay, AutoList, CF_pBMax, CF_pBMin, etc.,
providing appropriate test-friendly default values or auto-properties, or change
TestApplicationSettings to be abstract so it does not need to supply
implementations; ensure each member name and type exactly matches the interface
contracts and that any methods/properties required by
IReadOnlyApplicationSettings are present (you can use simple defaults or throw
NotImplementedException for members not used in tests).

Comment on lines +466 to +468
string response = File.ReadAllText(localeSpecificFallbackPath);
JObject data = JsonConvert.DeserializeObject<JObject>(response);
return (data, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate fallback payload shape before returning cached data.

If fallback JSON is malformed or deserializes to an unexpected shape, this currently returns bad data and fails later with less actionable errors. Validate fallback payload the same way as the online payload before returning.

🛠️ Proposed fix
                 if (File.Exists(localeSpecificFallbackPath))
                 {
                     string response = File.ReadAllText(localeSpecificFallbackPath);
                     JObject data = JsonConvert.DeserializeObject<JObject>(response);
+                    if (!(data?["data"] is JArray))
+                    {
+                        throw new InvalidDataException(
+                            $"Invalid fallback payload structure in '{localeSpecificFallbackPath}' for locale '{locale}'.");
+                    }
                     return (data, true);
                 }

Based on learnings: In WFInfo's Data.cs, D1firehail prefers letting JSON deserialization exceptions bubble up rather than gracefully degrading with try-catch blocks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Data.cs` around lines 466 - 468, The fallback path currently reads and
deserializes JSON into JObject (response → data) and returns it without
validating its shape; update the fallback branch in the method that uses
localeSpecificFallbackPath to run the same payload validation routine you use
for the online payload (e.g., reuse the existing
ValidatePayload/ValidateResponse logic or parsing into the same DTO/type) before
returning (data, true), and do not catch/desuppress deserialization or
validation exceptions — allow them to bubble up so malformed fallback JSON fails
loudly.


public override string[] BlueprintRemovals => new[] { "Progetto", "Piano" };

public override string CharacterWhitelist => "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-()" +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add space to Italian CharacterWhitelist.

Line [226] is missing ' ', unlike the other European processors. This can collapse multi-word OCR output and hurt matching quality.

🔤 Proposed fix
-        public override string CharacterWhitelist => "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-()" + 
+        public override string CharacterWhitelist => "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-() " + 
             GenerateCharacterRange(0x00C0, 0x00C0) + // À
             GenerateCharacterRange(0x00C8, 0x00C8) + // È
             GenerateCharacterRange(0x00C9, 0x00C9) + // É
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
public override string CharacterWhitelist => "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-()" +
public override string CharacterWhitelist => "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-() " +
GenerateCharacterRange(0x00C0, 0x00C0) + // À
GenerateCharacterRange(0x00C8, 0x00C8) + // È
GenerateCharacterRange(0x00C9, 0x00C9) + // É
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs` at line 226, The
CharacterWhitelist property is missing the space character causing multi-word
OCR results to collapse; update the overridden CharacterWhitelist string in the
EuropeanLanguageProcessor (the CharacterWhitelist property) to include a space
(' ') alongside the existing characters so Italian text preserves word
boundaries, matching the other European processors' whitelists.

Comment on lines +87 to +129
private int GetJapaneseCharacterDifference(char a, char b)
{
if (a == b) return 0;

// Hiragana-Katakana equivalents (lower cost for similar characters)
var hiraganaKatakanaPairs = new Dictionary<char, char>
{
{'あ', 'ア'}, {'い', 'イ'}, {'う', 'ウ'}, {'え', 'エ'}, {'お', 'オ'},
{'か', 'カ'}, {'き', 'キ'}, {'く', 'ク'}, {'け', 'ケ'}, {'こ', 'コ'},
{'が', 'ガ'}, {'ぎ', 'ギ'}, {'ぐ', 'グ'}, {'げ', 'ゲ'}, {'ご', 'ゴ'},
{'さ', 'サ'}, {'し', 'シ'}, {'す', 'ス'}, {'せ', 'セ'}, {'そ', 'ソ'},
{'ざ', 'ザ'}, {'じ', 'ジ'}, {'ず', 'ズ'}, {'ぜ', 'ゼ'}, {'ぞ', 'ゾ'},
{'た', 'タ'}, {'ち', 'チ'}, {'つ', 'ツ'}, {'て', 'テ'}, {'と', 'ト'},
{'だ', 'ダ'}, {'ぢ', 'ヂ'}, {'づ', 'ヅ'}, {'で', 'デ'}, {'ど', 'ド'},
{'な', 'ナ'}, {'に', 'ニ'}, {'ぬ', 'ヌ'}, {'ね', 'ネ'}, {'の', 'ノ'},
{'は', 'ハ'}, {'ひ', 'ヒ'}, {'ふ', 'フ'}, {'へ', 'ヘ'}, {'ほ', 'ホ'},
{'ば', 'バ'}, {'び', 'ビ'}, {'ぶ', 'ブ'}, {'べ', 'ベ'}, {'ぼ', 'ボ'},
{'ぱ', 'パ'}, {'ぴ', 'ピ'}, {'ぷ', 'プ'}, {'ぺ', 'ペ'}, {'ぽ', 'ポ'},
{'ま', 'マ'}, {'み', 'ミ'}, {'む', 'ム'}, {'め', 'メ'}, {'も', 'モ'},
{'や', 'ヤ'}, {'ゆ', 'ユ'}, {'よ', 'ヨ'},
{'ら', 'ラ'}, {'り', 'リ'}, {'る', 'ル'}, {'れ', 'レ'}, {'ろ', 'ロ'},
{'わ', 'ワ'}, {'ゐ', 'ヰ'}, {'ゑ', 'ヱ'}, {'を', 'ヲ'}, {'ん', 'ン'},
{'っ', 'ッ'}, {'ゃ', 'ャ'}, {'ゅ', 'ュ'}, {'ょ', 'ョ'}
};

// Check if characters are hiragana-katakana equivalents
if (hiraganaKatakanaPairs.TryGetValue(a, out var katakanaEquiv) && katakanaEquiv == b)
return 1; // Low cost for hiragana-katakana equivalents
if (hiraganaKatakanaPairs.TryGetValue(b, out var hiraganaEquiv) && hiraganaEquiv == a)
return 1;

// Similar looking characters (common OCR confusions)
var similarChars = new[]
{
new[] {'シ', 'ツ'}, // shi/tsu confusion
new[] {'ソ', 'ン'}, // so/n confusion
new[] {'ク', 'ワ'}, // ku/wa confusion
new[] {'ヘ', 'へ'}, // he/he (different forms)
new[] {'ベ', 'べ'}, // be/be (different forms)
new[] {'ヲ', 'ヲ'}, // wo/wo (different forms)
new[] {'ヶ', 'ケ'}, // ke/ke variation
new[] {'ヵ', 'カ'}, // ka/ka variation
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid per-cell allocations in GetJapaneseCharacterDifference.

hiraganaKatakanaPairs and similarChars are recreated on every comparison inside the Levenshtein matrix loop. This can significantly increase GC pressure during OCR matching.

Suggested refactor direction
+        private static readonly Dictionary<char, char> HiraganaKatakanaPairs = new Dictionary<char, char>
+        {
+            // ... existing pairs ...
+        };
+
+        private static readonly (char A, char B)[] SimilarChars =
+        {
+            ('シ', 'ツ'),
+            ('ソ', 'ン'),
+            ('ク', 'ワ'),
+            ('ヘ', 'へ'),
+            ('ベ', 'べ'),
+            ('ヶ', 'ケ'),
+            ('ヵ', 'カ'),
+        };

         private int GetJapaneseCharacterDifference(char a, char b)
         {
             if (a == b) return 0;
-
-            var hiraganaKatakanaPairs = new Dictionary<char, char>
-            {
-                // ...
-            };
+            // use static caches above

-            if (hiraganaKatakanaPairs.TryGetValue(a, out var katakanaEquiv) && katakanaEquiv == b)
+            if (HiraganaKatakanaPairs.TryGetValue(a, out var katakanaEquiv) && katakanaEquiv == b)
                 return 1;
-            if (hiraganaKatakanaPairs.TryGetValue(b, out var hiraganaEquiv) && hiraganaEquiv == a)
+            if (HiraganaKatakanaPairs.TryGetValue(b, out var hiraganaEquiv) && hiraganaEquiv == a)
                 return 1;
-
-            var similarChars = new[]
-            {
-                // ...
-            };
-
-            foreach (var pair in similarChars)
+            foreach (var pair in SimilarChars)
             {
-                if ((a == pair[0] && b == pair[1]) || (a == pair[1] && b == pair[0]))
+                if ((a == pair.A && b == pair.B) || (a == pair.B && b == pair.A))
                     return 1;
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs` around lines 87 -
129, GetJapaneseCharacterDifference currently allocates hiraganaKatakanaPairs
and similarChars on every call, causing GC pressure; move these into class-level
static readonly fields (e.g., static readonly Dictionary<char,char>
HiraganaKatakanaPairs and static readonly HashSet<(char,char)> or static
readonly (char,char)[] SimilarCharPairs) and initialize them once (static
constructor or inline). Update GetJapaneseCharacterDifference to reference those
static fields (use TryGetValue on HiraganaKatakanaPairs and check membership in
SimilarCharPairs without creating new arrays) so no per-call allocations occur.

Comment on lines +25 to +30
public override int CalculateLevenshteinDistance(string s, string t)
{
// Check if both inputs contain Thai characters for Thai-aware comparison
bool sHasThai = ContainsThai(s);
bool tHasThai = ContainsThai(t);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Null input can throw in Thai detection path.

CalculateLevenshteinDistance calls ContainsThai directly on s/t; ContainsThai iterates without a null check.

Suggested fix
-            bool sHasThai = ContainsThai(s);
-            bool tHasThai = ContainsThai(t);
+            bool sHasThai = !string.IsNullOrEmpty(s) && ContainsThai(s);
+            bool tHasThai = !string.IsNullOrEmpty(t) && ContainsThai(t);
         private static bool ContainsThai(string input)
         {
+            if (string.IsNullOrEmpty(input)) return false;
             foreach (char c in input)
             {
                 // Thai Unicode range (0x0E00-0x0E7F)
                 if (c >= 0x0E00 && c <= 0x0E7F) return true;
             }
             return false;
         }

Also applies to: 150-156

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs` around lines 25 - 30,
CalculateLevenshteinDistance currently calls ContainsThai(s) and ContainsThai(t)
without guarding against null; update CalculateLevenshteinDistance to null-check
inputs (s and t) before calling ContainsThai and handle nulls appropriately
(e.g., treat null as empty string or short-circuit to default behavior), and
apply the same null-safety change to the other affected method around lines
150-156 that also calls ContainsThai; ensure the checks are added where
ContainsThai is invoked so ContainsThai is never iterated over a null value.

Comment on lines +56 to +72
private static string NormalizeTurkishCharacters(string input)
{
// Convert Turkish characters to standard equivalents for comparison
return input.ToLowerInvariant()
.Replace('ğ', 'g')
.Replace('Ğ', 'G')
.Replace('ş', 's')
.Replace('Ş', 'S')
.Replace('ç', 'c')
.Replace('Ç', 'C')
.Replace('ö', 'o')
.Replace('Ö', 'O')
.Replace('ü', 'u')
.Replace('Ü', 'U')
.Replace('ı', 'i')
.Replace('İ', 'I');
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "TurkishLanguageProcessor.cs" -type f

Repository: WFCD/WFinfo

Length of output: 110


🏁 Script executed:

# Once we find the file, let's examine it
git ls-files | grep -i turkish

Repository: WFCD/WFinfo

Length of output: 108


🌐 Web query:

.NET C# ToLowerInvariant Turkish character İ combining dot U+0307 normalization

💡 Result:

In .NET, ToLowerInvariant() uses culture-insensitive (invariant) Unicode case mapping, not Turkish-specific casing rules. Microsoft explicitly distinguishes culture-sensitive vs culture-insensitive casing and recommends using CultureInfo.InvariantCulture when you need consistent, culture-independent results. (learn.microsoft.com)

Why İ becomes i + U+0307

The Turkish capital dotted I U+0130 (İ) has a Unicode special-casing lowercase mapping (language-insensitive) to two code points: U+0069 (i) + U+0307 (COMBINING DOT ABOVE). (charlottebuff.com)

So in .NET you’ll see:

  • "\u0130".ToLowerInvariant()"\u0069\u0307" (renders as )

This also aligns with Unicode’s canonical decomposition data for U+0130 (it decomposes to I + U+0307), which is why the dot-above is represented as a combining mark. (charactercodes.net)

Why normalization doesn’t “fix” it

There is no single precomposed Unicode character for “Latin small letter i with dot above” that would replace i + U+0307, so Normalize(FormC) will typically keep it as two code points.

What to do (depending on intent)

1) If you actually want Turkish casing, use Turkish culture (when appropriate):

var tr = new CultureInfo("tr-TR");
var s = "\u0130".ToLower(tr);   // "i" (U+0069)

Unicode’s conditional special-casing for Turkish/Azeri maps U+0130 to plain i (without U+0307). (charlottebuff.com)

2) If you must use invariant casing but need to remove the extra dot (common for identifiers/keys):
Remove U+0307 after decomposition:

string FoldForKey(string input)
{
    var lower = input.ToLowerInvariant().Normalize(NormalizationForm.FormD);
    var sb = new StringBuilder(lower.Length);
    foreach (var ch in lower)
        if (ch != '\u0307') sb.Append(ch); // remove COMBINING DOT ABOVE
    return sb.ToString().Normalize(NormalizationForm.FormC);
}

(Only do this if it matches your domain rules, because it is no longer linguistically correct Turkish text handling.)

Citations:


🏁 Script executed:

cat -n WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs | sed -n '50,75p'

Repository: WFCD/WFinfo

Length of output: 1020


🏁 Script executed:

# Let's also check the full method and surrounding context
cat -n WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs | sed -n '1,80p'

Repository: WFCD/WFinfo

Length of output: 3301


NormalizeTurkishCharacters has two related bugs with Turkish character handling:

  1. The uppercase character replacements (lines 61, 63, 65, 67, 69) are unreachable—they execute after ToLowerInvariant() has already converted everything to lowercase.

  2. More critically, ToLowerInvariant() on "İ" (U+0130, Turkish capital dotted I) produces "i" (U+0069) plus U+0307 (combining dot above), not plain "i". This leaves the combining dot artifact in the normalized output, breaking comparisons.

The fix is to handle Turkish I/İ casing before ASCII folding, using Turkish culture:

// Turkish-safe case handling before ASCII folding
string normalized = input
    .Replace('İ', 'i')
    .Replace('I', 'ı')
    .ToLower(new CultureInfo("tr-TR"));

return normalized
    .Replace('ğ', 'g')
    .Replace('ş', 's')
    .Replace('ç', 'c')
    .Replace('ö', 'o')
    .Replace('ü', 'u')
    .Replace('ı', 'i');

This ensures Unicode's special-case mapping for Turkish is applied correctly: U+0130 (İ) → plain "i", not "i̇".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs` around lines 56 - 72,
NormalizeTurkishCharacters currently calls ToLowerInvariant which makes the
subsequent uppercase Replace(...) calls unreachable and also mishandles 'İ'
(U+0130) by producing an "i" plus a combining dot; fix by applying Turkish-aware
casing and explicit I/İ handling before ASCII-folding: in
NormalizeTurkishCharacters first map 'İ'→'i' and 'I'→'ı' (or call ToLower with
new CultureInfo("tr-TR")) so Unicode special-case mapping for Turkish I is
applied, then perform the ASCII-fold replacements for 'ğ','ş','ç','ö','ü' and
finally map 'ı'→'i'; remove the now-unnecessary uppercase Replace(...) calls.

Comment on lines +98 to +104
var wnd = MainWindow.INSTANCE;
var disp = wnd?.Dispatcher;
if (disp != null)
await disp.InvokeAsync(() =>
{
wnd.UpdateMarketStatus(msg);
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find and examine WFInfo/Main.cs
find . -name "Main.cs" -path "*/WFInfo/*"

Repository: WFCD/WFinfo

Length of output: 71


🏁 Script executed:

# Get the file size first
wc -l WFInfo/Main.cs 2>/dev/null || find . -name "Main.cs" | head -5

Repository: WFCD/WFinfo

Length of output: 73


🏁 Script executed:

# Read the specific section around lines 98-104
sed -n '90,115p' WFInfo/Main.cs

Repository: WFCD/WFinfo

Length of output: 978


🏁 Script executed:

# Find the method signature containing these lines
sed -n '70,110p' WFInfo/Main.cs

Repository: WFCD/WFinfo

Length of output: 1616


🏁 Script executed:

# Search for UpdateMarketStatusAsync and how it's called
rg "UpdateMarketStatusAsync|UpdateMarketStatus" WFInfo/Main.cs -B 2 -A 2

Repository: WFCD/WFinfo

Length of output: 510


🌐 Web query:

WPF Dispatcher HasShutdownStarted HasShutdownFinished properties

💡 Result:

In WPF, these Dispatcher properties tell you where the dispatcher is in its shutdown lifecycle:

  • Dispatcher.HasShutdownStarted: true once shutdown has begun (the ShutdownStarted event is raised at that point). (learn.microsoft.com)
  • Dispatcher.HasShutdownFinished: true once shutdown has fully completed (the ShutdownFinished event is raised at that point). (learn.microsoft.com)

Key behavior to know:

  • The dispatcher doesn’t finish shutting down until its event queue “unwinds” (i.e., it finishes processing/returning from the current message loop work). (learn.microsoft.com)
  • After shutdown begins, pending queued work items are aborted, so Invoke/BeginInvoke work may be canceled/never run once HasShutdownStarted is true. (learn.microsoft.com)

Citations:


🌐 Web query:

WPF Dispatcher InvokeAsync shutdown exception fire and forget

💡 Result:

In WPF, Dispatcher.InvokeAsync queues work onto a Dispatcher. If the app (or that dispatcher thread) is shutting down, queued work may never run, or it may be canceled/rejected, commonly surfacing as exceptions like “The dispatcher has shutdown started/finished” / TaskCanceledException / OperationCanceledException (depending on timing). WPF itself has to defensively check for shutdown before running posted callbacks. [3]

What to do (safe “fire-and-forget”)

  1. Don’t post during shutdown
var d = Application.Current?.Dispatcher;
if (d is null || d.HasShutdownStarted || d.HasShutdownFinished) return;
  1. Use a shutdown cancellation token and pass it to InvokeAsync
    InvokeAsync supports a CancellationToken, which you can cancel during Application.Exit, Window.Closing, etc., to prevent pending UI work from running late. [2]
private readonly CancellationTokenSource _uiCts = new();

public void OnAppExit() => _uiCts.Cancel();

public void PostToUi(Action action)
{
    var d = Application.Current?.Dispatcher;
    if (d is null || d.HasShutdownStarted || d.HasShutdownFinished) return;

    _ = d.InvokeAsync(() =>
    {
        _uiCts.Token.ThrowIfCancellationRequested();
        action();
    }, DispatcherPriority.Normal, _uiCts.Token).Task
     .ContinueWith(t => Log(t.Exception),
         TaskContinuationOptions.OnlyOnFaulted);
}
  1. Always observe/log exceptions from fire-and-forget
    If you truly “fire and forget” and never observe the returned Task, failures can be missed or show up later as unobserved task exceptions. Attaching a fault-only continuation is a common pattern. (The WPF threading/dispatcher model is explicitly queue-based, so you need to treat these as real async operations.) [1]

If you’re seeing shutdown-related dispatcher exceptions frequently

It can also indicate a dispatcher running on a worker thread that isn’t being shut down cleanly before process/AppDomain exit. That’s a known cause of shutdown-time dispatcher exceptions; the correct fix is to manage that dispatcher’s lifetime and shut it down properly. [4]

Sources: WPF threading/Dispatcher overview [1], InvokeAsync cancellation guidance [2], WPF source showing shutdown checks around posted callbacks [3], shutdown-time exceptions when dispatchers aren’t properly shut down [4].


Guard InvokeAsync against dispatcher shutdown in UpdateMarketStatusAsync.

disp != null alone is insufficient during teardown. InvokeAsync can throw TaskCanceledException or OperationCanceledException after shutdown starts, and since UpdateMarketStatus fires asynchronously without observing exceptions, failures are silently lost.

Add checks for dispatcher shutdown state before invoking:

Suggested fix
             var wnd = MainWindow.INSTANCE;
             var disp = wnd?.Dispatcher;
-            if (disp != null)
-                await disp.InvokeAsync(() =>
-                {
-                    wnd.UpdateMarketStatus(msg);
-                });
+            if (disp == null || disp.HasShutdownStarted || disp.HasShutdownFinished)
+                return;
+
+            await disp.InvokeAsync(() =>
+            {
+                wnd.UpdateMarketStatus(msg);
+            });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
var wnd = MainWindow.INSTANCE;
var disp = wnd?.Dispatcher;
if (disp != null)
await disp.InvokeAsync(() =>
{
wnd.UpdateMarketStatus(msg);
});
var wnd = MainWindow.INSTANCE;
var disp = wnd?.Dispatcher;
if (disp == null || disp.HasShutdownStarted || disp.HasShutdownFinished)
return;
await disp.InvokeAsync(() =>
{
wnd.UpdateMarketStatus(msg);
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Main.cs` around lines 98 - 104, The dispatcher call in
UpdateMarketStatusAsync using MainWindow.INSTANCE.Dispatcher must be guarded
against shutdown: before calling disp.InvokeAsync(() =>
wnd.UpdateMarketStatus(msg)) check disp != null && !disp.HasShutdownStarted &&
!disp.HasShutdownFinished and ensure wnd is not null, then await the returned
DispatcherOperation.Task and wrap the await in a try/catch that catches
TaskCanceledException and OperationCanceledException (and optionally
InvalidOperationException) so exceptions during dispatcher shutdown are
observed/handled instead of being silently lost; reference MainWindow.INSTANCE,
Dispatcher, InvokeAsync, and UpdateMarketStatus when applying this change.

Comment on lines +133 to +138
var testCase = JsonConvert.DeserializeObject<TestCase>(File.ReadAllText(jsonPath));
result.Language = testCase.Language ?? "unknown";
result.Theme = testCase.Theme ?? "auto";
result.Category = testCase.Category ?? "reward";
result.ExpectedParts = testCase.Parts?.Values.ToList() ?? new List<string>();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Guard null TestCase after deserialization.

Line [134] dereferences testCase without a null check; malformed scenario JSON currently turns into an unstructured runner error.

🛠️ Proposed fix
                 // Load spec
                 var testCase = JsonConvert.DeserializeObject<TestCase>(File.ReadAllText(jsonPath));
+                if (testCase == null)
+                {
+                    result.ErrorMessage = $"Invalid test case JSON: {jsonPath}";
+                    result.Success = false;
+                    stopwatch.Stop();
+                    result.ProcessingTimeMs = stopwatch.ElapsedMilliseconds;
+                    return result;
+                }
                 result.Language = testCase.Language ?? "unknown";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/Tests/OCRTestRunner.cs` around lines 133 - 138, After deserializing
into TestCase, guard against testCase being null: check if testCase == null
right after JsonConvert.DeserializeObject<TestCase>(...) and handle it
explicitly (for example, throw a descriptive exception or set the runner result
to a failed/error state and return) instead of dereferencing; ensure you
reference the same symbols used in the diff (testCase, result.Language,
result.Theme, result.Category, result.ExpectedParts, and testCase.Parts) so the
code either assigns defaults or aborts cleanly when the JSON is malformed.

Comment on lines +50 to +52
<PropertyGroup>
<UpdateAssemblyVersion>9.8.0.0</UpdateAssemblyVersion>
</PropertyGroup>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "1) update.xml version source in WFInfo/WFInfo.csproj"
rg -nP '<UpdateAssemblyVersion>([^<]+)</UpdateAssemblyVersion>|<WriteLinesToFile[^>]*version' WFInfo/WFInfo.csproj || echo "No matches found"

echo
echo "2) Version properties declared in WFInfo/WFInfo.csproj"
rg -nP '<(Version|AssemblyVersion|FileVersion|InformationalVersion)>([^<]+)</(Version|AssemblyVersion|FileVersion|InformationalVersion)>' WFInfo/WFInfo.csproj || echo "No matches found"

echo
echo "3) Assembly version attributes in AssemblyInfo files"
fd -HI 'AssemblyInfo.cs$' -x rg -nP 'Assembly(File|Informational)?Version\("([^"]+)"\)' {} || echo "No AssemblyInfo.cs found"

echo
echo "4) Full GenerateUpdateXml target"
rg -A 30 'Target Name="GenerateUpdateXml"' WFInfo/WFInfo.csproj || echo "Target not found"

echo
echo "5) Check lines 50-52 as mentioned in review"
sed -n '48,54p' WFInfo/WFInfo.csproj

Repository: WFCD/WFinfo

Length of output: 2885


Eliminate version drift risk by deriving updater version from built assembly.

UpdateAssemblyVersion at line 51 is maintained separately from AssemblyVersion in WFInfo/Properties/AssemblyInfo.cs (lines 54-55). Though currently synchronized at 9.8.0.0, this dual maintenance point creates drift risk: if the assembly version is updated but UpdateAssemblyVersion is forgotten, update.xml at line 142 will advertise an incorrect version, breaking update behavior.

Derive the version from the compiled assembly using GetAssemblyIdentity instead:

💡 Proposed fix
-  <PropertyGroup>
-    <UpdateAssemblyVersion>9.8.0.0</UpdateAssemblyVersion>
-  </PropertyGroup>
+  <!-- Remove hard-coded updater version; resolve from built assembly -->

   <Target Name="GenerateUpdateXml" AfterTargets="Build">
+    <GetAssemblyIdentity AssemblyFiles="$(TargetPath)">
+      <Output TaskParameter="Assemblies" ItemName="BuiltAssemblyIdentity" />
+    </GetAssemblyIdentity>
     <PropertyGroup>
       <UpdateXmlPath>$(OutputPath)update.xml</UpdateXmlPath>
+      <ResolvedUpdateVersion>@(BuiltAssemblyIdentity->'%(Version)')</ResolvedUpdateVersion>
     </PropertyGroup>
     
     <WriteLinesToFile File="$(UpdateXmlPath)" Lines="&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;" Overwrite="true" />
     <WriteLinesToFile File="$(UpdateXmlPath)" Lines="&lt;item&gt;" Overwrite="false" />
-    <WriteLinesToFile File="$(UpdateXmlPath)" Lines="    &lt;version&gt;$(UpdateAssemblyVersion)&lt;/version&gt;" Overwrite="false" />
+    <WriteLinesToFile File="$(UpdateXmlPath)" Lines="    &lt;version&gt;$(ResolvedUpdateVersion)&lt;/version&gt;" Overwrite="false" />
     <WriteLinesToFile File="$(UpdateXmlPath)" Lines="    &lt;url&gt;https://github.com/WFCD/WFinfo/releases/latest/download/WFInfo.zip&lt;/url&gt;" Overwrite="false" />
     <WriteLinesToFile File="$(UpdateXmlPath)" Lines="    &lt;changelog&gt;https://github.com/WFCD/WFinfo/releases/latest&lt;/changelog&gt;" Overwrite="false" />
     <WriteLinesToFile File="$(UpdateXmlPath)" Lines="    &lt;mandatory&gt;false&lt;/mandatory&gt;" Overwrite="false" />
     <WriteLinesToFile File="$(UpdateXmlPath)" Lines="&lt;/item&gt;" Overwrite="false" />
     
-    <Message Text="Generated update.xml at $(UpdateXmlPath) with version $(UpdateAssemblyVersion)" Importance="high" />
+    <Message Text="Generated update.xml at $(UpdateXmlPath) with version $(ResolvedUpdateVersion)" Importance="high" />
   </Target>

Also applies to line 142 (where version is written).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WFInfo/WFInfo.csproj` around lines 50 - 52, The project currently keeps
UpdateAssemblyVersion in WFInfo.csproj in sync with AssemblyVersion in
Properties/AssemblyInfo.cs, which risks drift; remove the hardcoded
<UpdateAssemblyVersion> entry and instead derive the updater version by running
the MSBuild GetAssemblyIdentity task against the built assembly (the output
assembly from the project) and assign its Version metadata to the property used
when generating update.xml; update the MSBuild target that writes update.xml to
use the property produced by GetAssemblyIdentity (referencing the project output
assembly and the GetAssemblyIdentity task) so the value written to update.xml
always matches the compiled AssemblyVersion declared in AssemblyInfo.cs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant