Conversation
* Add foundation for test framework for OCR
📝 Walkthrough

Adds a language-processing framework with per-locale processors and a factory, integrates language-aware normalization into the Data, OCR, and Tesseract services, introduces a headless OCR test harness with test models and a runner, expands locale UI options, bumps the assembly version to 9.8.0.0, and adds test docs and scripts.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as WFInfo.exe<br/>(CustomEntrypoint)
    participant TestProg as TestProgram.RunTests
    participant Suite as OCRTestRunner.RunTestSuite
    participant LPFactory as LanguageProcessorFactory
    participant LP as LanguageProcessor<br/>(locale)
    participant Tesseract as TesseractService
    participant Data as Data.cs
    participant Results as TestSuiteResult
    CLI->>TestProg: args (map.json)
    TestProg->>LPFactory: Initialize(settings)
    TestProg->>Suite: RunTestSuite(map)
    Suite->>Suite: Load scenarios from map.json
    loop per scenario
        Suite->>LPFactory: GetProcessor(language)
        LPFactory-->>LP: locale-specific processor
        Suite->>Tesseract: Init with LP.CharacterWhitelist
        Suite->>Tesseract: OCR image -> raw text
        Tesseract->>LP: NormalizeForPatternMatching(raw text)
        LP-->>Data: normalized text
        Data->>LP: CalculateLevenshteinDistance(candidates)
        LP-->>Data: best match
        Suite->>Results: record TestResult
    end
    Suite-->>TestProg: TestSuiteResult
    TestProg->>Results: Save JSON + print summary
```
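The per-scenario loop in the diagram above can be sketched roughly as follows. This is a hedged, language-agnostic Python sketch of the flow only; `get_processor`, `ocr_image`, `best_match`, and the scenario keys are hypothetical stand-ins, not the actual WFInfo C# APIs.

```python
def run_test_suite(scenarios, get_processor, ocr_image, best_match):
    """Sketch of the headless test loop: OCR each scenario image with a
    locale-specific whitelist, normalize, match, and record pass/fail.
    All callables and dict keys here are illustrative assumptions."""
    results = []
    for scenario in scenarios:
        lp = get_processor(scenario["language"])             # locale-specific processor
        raw = ocr_image(scenario["image"], lp["whitelist"])  # OCR using the LP whitelist
        normalized = lp["normalize"](raw)                    # pattern-matching normalization
        matched = best_match(normalized, scenario["candidates"])  # Levenshtein step
        results.append({
            "name": scenario["name"],
            "passed": matched == scenario["expected"],
        })
    return results
```

A harness like this keeps the OCR, normalization, and matching stages swappable, which mirrors how the diagram routes each step through a different service.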
```mermaid
sequenceDiagram
    participant App as WFInfo Main
    participant LPFactory as LanguageProcessorFactory
    participant LP as LanguageProcessor
    participant OCR as OCR module
    participant Tesseract as TesseractService
    participant Data as Data.cs
    App->>LPFactory: Initialize(settings)
    App->>Tesseract: Init()
    Tesseract->>LPFactory: GetCurrentProcessor()
    LPFactory-->>LP: processor instance
    Tesseract->>LP: request CharacterWhitelist
    LP-->>Tesseract: whitelist
    App->>OCR: Run OCR flow
    OCR->>Tesseract: Extract text
    Tesseract-->>OCR: raw OCR output
    OCR->>LP: IsPartNameValid(text)
    alt valid
        OCR->>Data: GetLocaleNameData(text, useLevenshtein)
        Data->>LP: CalculateLevenshteinDistance
        LP-->>Data: distances -> matched name
        Data-->>OCR: localized name
    else invalid
        OCR->>OCR: skip item
    end
```
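Both flows funnel fuzzy matching through `CalculateLevenshteinDistance`. For reference, the standard edit-distance computation it is named after looks like this; shown as a standalone Python sketch for illustration, not the processors' actual implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, kept to one rolling row."""
    prev = list(range(len(b) + 1))           # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]                            # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def best_match(ocr_text: str, candidates: list) -> str:
    # Pick the candidate name closest to the (normalized) OCR output.
    return min(candidates, key=lambda c: levenshtein(ocr_text, c))

# levenshtein("kitten", "sitting") -> 3
```

Matching OCR output this way tolerates the single-character misreads (e.g. `l` vs `1`) that the review's whitelist findings are concerned with.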
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 14
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
🟡 Minor comments (11)
tests/README.md-8-8 (1)
8-8: ⚠️ Potential issue | 🟡 Minor

**Category vocabulary is inconsistent (`fissure` vs `reward`).**

Line 8 and Lines 203-205 present fissure as a distinct category, while Lines 69-72 only define `reward`, `inventory`, and `snapit` (and say fissure is included in `reward`). Please normalize this so `map.json` authors know exactly which `category` values are valid.

Also applies to: 69-72, 203-205
🤖 Prompt for AI Agents
```
Verify each finding against the current code and only fix it if needed.
In `@tests/README.md` at line 8: the README has inconsistent category
vocabulary, sometimes using "fissure" as its own category and other times
saying fissures are included under "reward"; normalize and document the single
canonical category set for map.json authors. Pick one canonical value (e.g.,
use "reward" and treat fissures as a subtype, or use "fissure" as a distinct
category), then update the "Category Testing" section and the other occurrences
that mention reward/inventory/snapit/fissure so they all list the same
canonical category names and explain whether "fissure" is a subtype of "reward"
or a standalone category; ensure the README's category examples and any mention
of map.json "category" values (reward, inventory, snapit, fissure) are
consistent and unambiguous.
```

tests/usage_example.md-17-20 (1)
17-20: ⚠️ Potential issue | 🟡 Minor

**Align the docs with the currently shipped test entrypoint and language scope.**

Line [19] uses `WFInfo.Tests.exe`, while the provided runner script invokes `WFInfo.exe`.
Line [50] says "all 15 supported languages," which conflicts with the stated exclusion of Thai/Japanese/Turkish in this PR.

Also applies to: 50-50
🤖 Prompt for AI Agents
```
Verify each finding against the current code and only fix it if needed.
In `@tests/usage_example.md` around lines 17 - 20: update the documentation to
reference the actual shipped test runner name and correct language support:
replace any occurrence of "WFInfo.Tests.exe" with "WFInfo.exe" (the provided
runner script), and change the phrase "all 15 supported languages" to the
accurate supported-language statement reflecting the PR's exclusions
(explicitly note that Thai, Japanese, and Turkish are excluded) so the examples
and language-count text match the current release behavior.
```

tests/TEST_EXECUTION_FLOW.md-79-88 (1)
79-88: ⚠️ Potential issue | 🟡 Minor

**Add a language identifier to the fenced code block.**
The block starting on Line 79 is missing a language tag, which triggers markdownlint MD040.
Suggested fix

````diff
-```
+```text
 1. User runs: WFInfo.exe map.json data/ results.json
 2. CustomEntrypoint.Main() detects test arguments
 3. Redirects to TestProgram.RunTests(args)
 ...
 8. Exit with appropriate code (0=success, 1=partial failure, etc.)
````

🤖 Prompt for AI Agents

```
Verify each finding against the current code and only fix it if needed.
In `@tests/TEST_EXECUTION_FLOW.md` around lines 79 - 88: the fenced code block
containing the numbered steps (the test execution flow) is missing a language
identifier, which triggers markdownlint MD040; update the opening fence to
include a language tag (e.g., "text"), leaving the content unchanged; target
the fenced block that lists steps 1-8 (the example starting with "User runs:
WFInfo.exe map.json data/ results.json") to resolve the lint warning.
```

WFInfo/Tests/OCRTestRunner.cs-171-283 (1)

171-283: ⚠️ Potential issue | 🟡 Minor

**Test environment modifies global state without cleanup.**

`SetupTestEnvironment` modifies `ApplicationSettings.GlobalSettings` and `OCR.uiScaling` without saving or restoring the original values. This can cause test interference when running multiple tests sequentially. Consider saving and restoring original settings, or implementing a test-scoped settings mechanism.

💡 Suggested approach

```csharp
private (string locale, WFtheme theme, double scaling) SaveCurrentEnvironment()
{
    var settings = ApplicationSettings.GlobalSettings;
    return (settings.Locale, settings.ThemeSelection, OCR.uiScaling);
}

private void RestoreEnvironment((string locale, WFtheme theme, double scaling) saved)
{
    var settings = ApplicationSettings.GlobalSettings;
    settings.Locale = saved.locale;
    settings.ThemeSelection = saved.theme;
    OCR.uiScaling = saved.scaling;
    _tesseractService.ReloadEngines();
}
```

Then wrap test execution with save/restore calls.

🤖 Prompt for AI Agents

```
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/Tests/OCRTestRunner.cs` around lines 171 - 283:
SetupTestEnvironment mutates global state (ApplicationSettings.GlobalSettings
and OCR.uiScaling) without restoring it; save the current values before
changing them and restore them after the test finishes to avoid test
interference. Implement helper methods (e.g., SaveCurrentEnvironment and
RestoreEnvironment) that capture settings.Locale, settings.ThemeSelection and
OCR.uiScaling, call SaveCurrentEnvironment at the start of the test/setup, run
the changes in SetupTestEnvironment, and ensure RestoreEnvironment is called in
a finally/teardown path (including calling _tesseractService.ReloadEngines
after restore) so the original global state is always reinstated.
```
WFInfo/Ocr.cs-1270-1270 (1)

1270-1270: ⚠️ Potential issue | 🟡 Minor

**Character whitelist for number detection includes non-numeric characters.**

This whitelist is used for OCR number detection (`tessedit_char_whitelist`) but includes many alphabetic characters like `âàéèêëïîôùûçÀÉÈÊËÏÎÔÙÛÇäöüßÄÖÜßñáéíóúüÁÉÍÓÚÜçãõẽẽÇÃÕĘĘ`. This appears to be incorrectly copied from a language character set. For counting item quantities, only digits should be needed.

🔧 Suggested fix

```diff
 //set OCR to numbers only
-_tesseractService.FirstEngine.SetVariable("tessedit_char_whitelist", "0123456789âàéèêëïîôùûçÀÉÈÊËÏÎÔÙÛÇäöüßÄÖÜßñáéíóúüÁÉÍÓÚÜçãõẽẽÇÃÕĘĘ");
+_tesseractService.FirstEngine.SetVariable("tessedit_char_whitelist", "0123456789");
```

🤖 Prompt for AI Agents

```
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/Ocr.cs` at line 1270: the tessedit_char_whitelist used for numeric
OCR currently contains many non-digit characters; update the call to
_tesseractService.FirstEngine.SetVariable("tessedit_char_whitelist", ...) so
the whitelist contains only the numeric digits (e.g. "0123456789") required for
quantity detection; locate the existing call to SetVariable in WFInfo/Ocr.cs
and replace the long language charset string with the digits-only whitelist
(optionally include decimal/comma if your counts may contain them).
```

WFInfo/Tests/OCRTestRunner.cs-337-369 (1)

337-369: ⚠️ Potential issue | 🟡 Minor

**Unused filtered image variable.**

`filteredImage` is created on line 344 but never used. The method then performs OCR on the original `image` instead.

🔧 Either use the filtered image or remove the unused variable

```diff
 private List<PartMatchResult> ProcessSnapIt(Bitmap image)
 {
     var results = new List<PartMatchResult>();
     try
     {
-        // Use existing SnapIt logic - simulate process
-        var filteredImage = OCR.ScaleUpAndFilter(image, WFtheme.AUTO, out _, out _);
         // Since FindAllParts is private, we'll simulate basic OCR on whole image
         var ocrText = OCR.GetTextFromImage(image, _tesseractService.FirstEngine);
```

🤖 Prompt for AI Agents

```
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/Tests/OCRTestRunner.cs` around lines 337 - 369: in ProcessSnapIt,
filteredImage is created but never used; update the OCR call to use the
processed bitmap instead of the original image (replace
OCR.GetTextFromImage(image, _tesseractService.FirstEngine) with
OCR.GetTextFromImage(filteredImage, _tesseractService.FirstEngine)), ensuring
any disposal or ownership semantics for filteredImage are respected, or if you
intend not to use the filtered result, remove the filteredImage assignment
entirely and its out parameters to eliminate the unused variable.
```

WFInfo/LanguageProcessing/PolishLanguageProcessor.cs-21-21 (1)

21-21: ⚠️ Potential issue | 🟡 Minor

**Incomplete Polish character whitelist.**

The character ranges are missing several Polish characters:

- `ś` (U+015B) - only `Ś` (U+015A) is included
- `Ź` (U+0179) and `ź` (U+017A)
- `Ż` (U+017B) and `ż` (U+017C)
- `Ó` (U+00D3) and `ó` (U+00F3)

🔧 Suggested fix

```diff
-public override string CharacterWhitelist => "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" + GenerateCharacterRange(0x0104, 0x0107) + GenerateCharacterRange(0x0118, 0x0119) + GenerateCharacterRange(0x0141, 0x0144) + GenerateCharacterRange(0x015A, 0x015A); // Polish with ranges
+public override string CharacterWhitelist => "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
+    + GenerateCharacterRange(0x0104, 0x0107)  // Ą, ą, Ć, ć
+    + GenerateCharacterRange(0x0118, 0x0119)  // Ę, ę
+    + GenerateCharacterRange(0x0141, 0x0144)  // Ł, ł, Ń, ń
+    + GenerateCharacterRange(0x015A, 0x015B)  // Ś, ś
+    + GenerateCharacterRange(0x0179, 0x017C)  // Ź, ź, Ż, ż
+    + GenerateCharacterRange(0x00D3, 0x00D3)  // Ó
+    + GenerateCharacterRange(0x00F3, 0x00F3); // ó
```

🤖 Prompt for AI Agents

```
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/LanguageProcessing/PolishLanguageProcessor.cs` at line 21: the
CharacterWhitelist in PolishLanguageProcessor currently omits several Polish
characters; update the override of CharacterWhitelist (and the use of
GenerateCharacterRange) to include the missing code points for ś (U+015B), Ź/ź
(U+0179/U+017A), Ż/ż (U+017B/U+017C) and Ó/ó (U+00D3/U+00F3), either by
expanding the existing GenerateCharacterRange calls to cover those code points
or by appending those specific characters explicitly to the string so the
whitelist contains both upper- and lower-case forms; modify the
CharacterWhitelist property accordingly.
```

WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs-72-88 (1)

72-88: ⚠️ Potential issue | 🟡 Minor

**Redundant uppercase replacements after `ToLowerInvariant()`.**

Same issue as the Polish processor: uppercase replacements will never match after `ToLowerInvariant()`.

🔧 Suggested fix

```diff
 private static string NormalizeTurkishCharacters(string input)
 {
-    // Convert Turkish characters to standard equivalents for comparison
     return input.ToLowerInvariant()
-        .Replace('ğ', 'g')
-        .Replace('Ğ', 'G')
-        .Replace('ş', 's')
-        .Replace('Ş', 'S')
-        .Replace('ç', 'c')
-        .Replace('Ç', 'C')
-        .Replace('ö', 'o')
-        .Replace('Ö', 'O')
-        .Replace('ü', 'u')
-        .Replace('Ü', 'U')
-        .Replace('ı', 'i')
-        .Replace('İ', 'I');
+        .Replace('ğ', 'g')
+        .Replace('ş', 's')
+        .Replace('ç', 'c')
+        .Replace('ö', 'o')
+        .Replace('ü', 'u')
+        .Replace('ı', 'i');
 }
```

🤖 Prompt for AI Agents

```
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs` around lines 72 -
88: the NormalizeTurkishCharacters method performs .ToLowerInvariant() before
doing uppercase Replace calls (e.g., 'Ğ'->'G', 'Ş'->'S'), which makes those
replacements dead; update NormalizeTurkishCharacters to either remove all
uppercase Replace calls and perform only lowercase character replacements, or
move the .ToLowerInvariant() call to the end so replacements can handle both
cases; apply the change in the NormalizeTurkishCharacters method to ensure
correct normalization of Turkish characters.
```

WFInfo/LanguageProcessing/PolishLanguageProcessor.cs-72-92 (1)

72-92: ⚠️ Potential issue | 🟡 Minor

**Redundant uppercase replacements after `ToLowerInvariant()`.**

The method calls `ToLowerInvariant()` first, which converts all characters to lowercase. The subsequent replacements for uppercase Polish characters (e.g., `'Ą'`, `'Ę'`, `'Ć'`) will never match.

🔧 Suggested fix - remove redundant replacements

```diff
 private static string NormalizePolishCharacters(string input)
 {
-    // Convert Polish characters to standard equivalents for comparison
     return input.ToLowerInvariant()
-        .Replace('ą', 'a')
-        .Replace('Ą', 'A')
-        .Replace('ę', 'e')
-        .Replace('Ę', 'E')
-        .Replace('ć', 'c')
-        .Replace('Ć', 'C')
-        .Replace('ł', 'l')
-        .Replace('Ł', 'L')
-        .Replace('ś', 's')
-        .Replace('Ś', 'S')
-        .Replace('ź', 'z')
-        .Replace('Ź', 'Z')
-        .Replace('ż', 'z')
-        .Replace('Ż', 'Z')
-        .Replace('ó', 'o')
-        .Replace('Ó', 'O');
+        .Replace('ą', 'a')
+        .Replace('ę', 'e')
+        .Replace('ć', 'c')
+        .Replace('ł', 'l')
+        .Replace('ś', 's')
+        .Replace('ź', 'z')
+        .Replace('ż', 'z')
+        .Replace('ó', 'o');
 }
```

🤖 Prompt for AI Agents

```
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/LanguageProcessing/PolishLanguageProcessor.cs` around lines 72 -
92: the NormalizePolishCharacters method calls ToLowerInvariant() so the
subsequent .Replace calls for uppercase Polish letters (e.g.,
'Ą','Ę','Ć','Ł','Ś','Ź','Ż','Ó') are redundant; remove all uppercase
.Replace(...) invocations and keep only the lowercase replacements (or
alternatively remove ToLowerInvariant() and keep both cases if case
preservation is required); update the NormalizePolishCharacters method to use
only the lowercase replacements that match after ToLowerInvariant() so the
normalization logic is correct and non-redundant.
```

WFInfo/Tests/TestProgram.cs-145-152 (1)

145-152: ⚠️ Potential issue | 🟡 Minor

**Potential `NullReferenceException` when accessing `MissingParts` and `ExtraParts`.**

The code accesses `.Count` on `MissingParts` and `ExtraParts` without null checks. If `TestResult` doesn't initialize these lists, this will throw.

🛡️ Proposed fix with null checks

```diff
 foreach (var failed in failedTests)
 {
     Console.WriteLine($"  {failed.TestCaseName}: {failed.ErrorMessage}");
-    if (failed.MissingParts.Count > 0)
+    if (failed.MissingParts?.Count > 0)
         Console.WriteLine($"    Missing: {string.Join(", ", failed.MissingParts)}");
-    if (failed.ExtraParts.Count > 0)
+    if (failed.ExtraParts?.Count > 0)
         Console.WriteLine($"    Extra: {string.Join(", ", failed.ExtraParts)}");
 }
```

🤖 Prompt for AI Agents

```
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/Tests/TestProgram.cs` around lines 145 - 152: the loop printing
failed tests can throw NullReferenceException because it uses
failed.MissingParts.Count and failed.ExtraParts.Count without null checks;
update the loop in TestProgram.cs to guard those accesses (e.g., replace checks
with (failed.MissingParts?.Count ?? 0) > 0 and (failed.ExtraParts?.Count ?? 0)
> 0) and pass a non-null sequence to string.Join (e.g., string.Join(", ",
failed.MissingParts ?? Enumerable.Empty<string>()) and similarly for
ExtraParts), or alternatively ensure TestResult initializes MissingParts and
ExtraParts in its constructor; modify the code around the foreach (var failed
in failedTests) block accordingly.
```

WFInfo/Tests/TestProgram.cs-168-176 (1)

168-176: ⚠️ Potential issue | 🟡 Minor

**Provide a working mock implementation for `FirstEngine` in `TestTesseractService`.**

The `OCRTestRunner` calls `FirstEngine` three times (lines 313, 347, 378), so throwing `NotImplementedException` will crash the tests. `SecondEngine` and `Engines` remain unused and don't need implementation.

🤖 Prompt for AI Agents

```
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/Tests/TestProgram.cs` around lines 168 - 176: replace the throwing
property with a cached, usable Tesseract engine: add a private field (e.g.
_firstEngine) and implement FirstEngine to lazily instantiate and return a
Tesseract.TesseractEngine (for example new Tesseract.TesseractEngine(
"./tessdata", "eng", Tesseract.EngineMode.Default)) so calls from
OCRTestRunner succeed; keep the instance cached and return the same object on
subsequent accesses to avoid repeated construction.
```

🧹 Nitpick comments (19)

WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs (1)

77-77: **Remove duplicated replacement in kana normalization.**

Line [77] applies `.Replace('ヶ', 'ケ')` twice.
🧹 Small cleanup

```diff
-result = result.Replace('ヶ', 'ケ').Replace('ヵ', 'カ').Replace('ヶ', 'ケ');
+result = result.Replace('ヶ', 'ケ').Replace('ヵ', 'カ');
```

🤖 Prompt for AI Agents

```
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs` at line 77: remove
the duplicated Replace call so the kana normalization only replaces 'ヶ' with
'ケ' once; locate the chain that currently calls .Replace('ヶ',
'ケ').Replace('ヵ', 'カ').Replace('ヶ', 'ケ') and change it to a single
.Replace('ヶ', 'ケ').Replace('ヵ', 'カ') (within the method in class
JapaneseLanguageProcessor where the Replace chain is used).
```

WFInfo/LanguageProcessing/CyrillicLanguageProcessor.cs (1)

112-113: **Ukrainian blueprint normalization line is currently dead code.**

After `ToLower(_culture)` on Line 106, replacing `"(Кресленник)"` never matches. Consider removing it or changing it to the lowercase literal for clarity.

🤖 Prompt for AI Agents

```
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/LanguageProcessing/CyrillicLanguageProcessor.cs` around lines 112 -
113: the Replace call using the mixed-case literal
normalized.Replace("(Кресленник)", "(кресленник)") is dead because normalized
is already lowercased via ToLower(_culture); update the
CyrillicLanguageProcessor normalization by either removing that Replace line
entirely or changing the search literal to the lowercase form "(кресленник)" so
it actually matches; locate the usage of normalized and the Replace call (the
normalized.Replace(...) statement after ToLower(_culture)) and apply the change
in that method.
```

WFInfo/LanguageProcessing/EnglishLanguageProcessor.cs (1)

2-2: **Unused import.**

`System.Text.RegularExpressions` is imported but not used in this file.

🧹 Remove unused import

```diff
 using System;
-using System.Text.RegularExpressions;
 using WFInfo.Settings;
```

🤖 Prompt for AI Agents

```
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/LanguageProcessing/EnglishLanguageProcessor.cs` at line 2: remove
the unused using directive for System.Text.RegularExpressions from the top of
the EnglishLanguageProcessor.cs file; locate the using statement (using
System.Text.RegularExpressions;) and delete it so the file only keeps required
namespaces for the EnglishLanguageProcessor class.
```

WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs (1)

2-2: **Unused import.**

`System.Text.RegularExpressions` is imported but not used.

🤖 Prompt for AI Agents

```
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs` at line 2: remove
the unused using directive for System.Text.RegularExpressions from the
TurkishLanguageProcessor.cs file; locate the top-of-file import statement
"using System.Text.RegularExpressions;" in the TurkishLanguageProcessor class
and delete it (or alternatively, if regex is actually needed later, use it in
the relevant method such as any regex-based text processing method), ensuring
no unused using remains.
```

WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs (1)

3-3: **Unused import.**

`System.Text.RegularExpressions` is imported but not used in this file.

🧹 Remove unused import

```diff
 using System;
 using System.Text;
-using System.Text.RegularExpressions;
 using WFInfo.Settings;
```

🤖 Prompt for AI Agents

```
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs` at line 3: remove the
unused using directive System.Text.RegularExpressions from the top of
ThaiLanguageProcessor.cs; locate the using statement "using
System.Text.RegularExpressions;" and delete it (or let your IDE
auto-organize/remove unused usings) so the file no longer imports an unused
namespace.
```

tests/EXTERNAL_DATA_STRUCTURE.md (1)

9-21: **Add language identifier to fenced code block.**

The file structure diagram code block lacks a language identifier. While this is a visual representation rather than code, adding `text` or `plaintext` satisfies linting rules and improves consistency.

📝 Suggested fix

````diff
-```
+```text
 tests/
 ├── map.json          # Main test map (scenario references)
````

🤖 Prompt for AI Agents

```
Verify each finding against the current code and only fix it if needed.
In `@tests/EXTERNAL_DATA_STRUCTURE.md` around lines 9 - 21: the fenced code
block showing the tests/ directory tree in EXTERNAL_DATA_STRUCTURE.md is
missing a language identifier; update that fenced block (the triple-backtick
block that begins before "tests/") to include a language tag such as "text" or
"plaintext" so the linting rules pass and formatting is consistent.
```

WFInfo/LanguageProcessing/PolishLanguageProcessor.cs (1)

29-37: **Consider extracting shared `GenerateCharacterRange` helper.**

This method is duplicated across multiple language processors (also in `EuropeanLanguageProcessor.cs` and `ThaiLanguageProcessor.cs`). Consider moving it to the base `LanguageProcessor` class or a shared utility.

🤖 Prompt for AI Agents

```
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/LanguageProcessing/PolishLanguageProcessor.cs` around lines 29 -
37: the GenerateCharacterRange method is duplicated across
PolishLanguageProcessor, EuropeanLanguageProcessor and ThaiLanguageProcessor;
remove the duplicates and extract a single shared implementation into the base
LanguageProcessor (or a new internal utility class) as a static or protected
helper (e.g., protected static string GenerateCharacterRange(int start, int
end) in LanguageProcessor) so all subclasses call the shared method; update
each processor to call the centralized method and delete the duplicate
implementations to avoid divergence.
```

WFInfo/Tests/OCRTestRunner.cs (1)

545-548: **Consider extracting `IDataService` interface to a separate file.**

The interface is defined at the bottom of the test runner file. For better organization and discoverability, consider moving it to `WFInfo/Services/IDataService.cs` or a similar location where other service interfaces reside.

🤖 Prompt for AI Agents

```
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/Tests/OCRTestRunner.cs` around lines 545 - 548: move the
IDataService interface out of the bottom of OCRTestRunner.cs into its own file
containing the public interface IDataService so it can be discovered and
reused; create a new file with the appropriate namespace, paste the interface
signature (string GetPartName(string name, out int low, bool suppressLogging,
out bool multipleLowest)), update OCRTestRunner (and any other consumers) to
reference the new interface file via the correct namespace/import, and ensure
the project/compile includes the new file.
```

tests/BUILD_INSTRUCTIONS.md (1)

45-52: **Add language identifier to fenced code block.**

The code block ending at line 52 lacks a language identifier. This should be `bash` to match the surrounding context.

📝 Suggested fix

````diff
-```
+```bash
 # Create new test project
 dotnet new console -n WFInfo.Tests -f net48

 # Copy test files to new project
 # Copy Tests/ folder to WFInfo.Tests/
 # Add necessary references to WFInfo.Tests.csproj
````

🤖 Prompt for AI Agents

```
Verify each finding against the current code and only fix it if needed.
In `@tests/BUILD_INSTRUCTIONS.md` around lines 45 - 52: the fenced code block
that shows shell commands is missing a language identifier; update the opening
fence so the block is marked as bash (i.e., add the "bash" identifier to the
triple-backtick that precedes the lines starting with "# Create new test
project"), ensuring syntax highlighting and consistency with surrounding
blocks.
```

WFInfo/Ocr.cs (1)

1006-1023: **Zone fragmentation threshold logic has redundant fallback.**

When `zones.Count > 12`, the code creates a new single-zone list but then immediately sets `snapThreads = 1` in the next condition (`zones.Count > 8`). The comment says "Keep the zones but process them single-threaded" but the zones were just replaced with a single zone. Consider simplifying:

♻️ Suggested simplification

```diff
 zones = DivideSnapZones(filteredImage, filteredImageClean, rowHits, colHits);
-// Fallback to single-threaded for large layouts to avoid threading issues
-if (zones.Count > 12) // Too many zones means fragmentation is occurring
+if (zones.Count > 12) // Too many zones - fragmentation; fall back to a single zone
 {
-    // Fallback to single-threaded for large layouts to avoid threading issues
     zones = new List<Tuple<Bitmap, Rectangle>>();
     zones.Add(
         Tuple.Create(filteredImageClean,
             new Rectangle(0, 0, filteredImageClean.Width, filteredImageClean.Height)
         )
     );
     snapThreads = 1;
-    // Keep the zones but process them single-threaded
 }
-else if (zones.Count > 8) // Large but reasonable number of zones
+else if (zones.Count > 8) // Large layout - use a single thread
 {
-    // Large but reasonable number of zones
     snapThreads = 1;
 }
 else
```

🤖 Prompt for AI Agents

```
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/Ocr.cs` around lines 1006 - 1023: the current logic mutates zones
then re-reads zones.Count which can be confusing and leads to
redundant/ambiguous behavior; capture the original count into a local (e.g. int
originalZoneCount = zones.Count) and base the branching on that value: if
originalZoneCount > 12 then replace zones with the single full-image tuple
(using filteredImageClean and Rectangle) and set snapThreads = 1; else if
originalZoneCount > 8 set snapThreads = 1; else set snapThreads = 4. This
preserves the intended "replace zones only when >12" behavior and avoids
reading the modified zones.Count when deciding snapThreads.
```

WFInfo/Tests/TestModels.cs (2)

69-87: **Dictionary and List properties should be initialized.**

Similar to `TestResult`, the collection properties in `TestSuiteResult` should be initialized to prevent null reference issues.

♻️ Initialize collection properties

```diff
 public class TestSuiteResult
 {
     public string TestSuiteName { get; set; }
     public DateTime StartTime { get; set; }
     public DateTime EndTime { get; set; }
-    public List<TestResult> TestResults { get; set; }
+    public List<TestResult> TestResults { get; set; } = new List<TestResult>();
     public int TotalTests { get; set; }
     public int PassedTests { get; set; }
     public int FailedTests { get; set; }
     public double OverallAccuracy { get; set; }
     public double PassRate { get; set; }
-    public Dictionary<string, double> LanguageAccuracy { get; set; }
-    public Dictionary<string, double> ThemeAccuracy { get; set; }
-    public Dictionary<string, double> CategoryAccuracy { get; set; }
-    public Dictionary<string, TestCoverage> CategoryCoverage { get; set; }
-    public Dictionary<string, TestCoverage> LanguageCoverage { get; set; }
+    public Dictionary<string, double> LanguageAccuracy { get; set; } = new Dictionary<string, double>();
+    public Dictionary<string, double> ThemeAccuracy { get; set; } = new Dictionary<string, double>();
+    public Dictionary<string, double> CategoryAccuracy { get; set; } = new Dictionary<string, double>();
+    public Dictionary<string, TestCoverage> CategoryCoverage { get; set; } = new Dictionary<string, TestCoverage>();
+    public Dictionary<string, TestCoverage> LanguageCoverage { get; set; } = new Dictionary<string, TestCoverage>();
     public TestCoverage OverallCoverage { get; set; }
     public string ErrorMessage { get; set; }
 }
```

🤖 Prompt for AI Agents

```
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/Tests/TestModels.cs` around lines 69 - 87: the collection
properties on TestSuiteResult are not initialized which can cause NREs; update
the TestSuiteResult class to initialize TestResults (List<TestResult>),
LanguageAccuracy, ThemeAccuracy, CategoryAccuracy (Dictionary<string,double>),
CategoryCoverage, LanguageCoverage (Dictionary<string,TestCoverage>) either via
inline property initializers or in a parameterless constructor so they default
to empty collections; leave scalar properties like
TotalTests/OverallCoverage/ErrorMessage as-is unless you also want to set a
sensible default.
```

46-58: **Collection properties should be initialized to prevent `NullReferenceException`.**

`ExpectedParts`, `ActualParts`, `MissingParts`, and `ExtraParts` are not initialized. Callers accessing these without initialization will throw. Consider initializing them inline.

♻️ Initialize collection properties

```diff
 public class TestResult
 {
     public string TestCaseName { get; set; }
     public string ImagePath { get; set; }
     public bool Success { get; set; }
-    public List<PartMatchResult> ExpectedParts { get; set; }
-    public List<PartMatchResult> ActualParts { get; set; }
-    public List<string> MissingParts { get; set; }
-    public List<string> ExtraParts { get; set; }
+    public List<PartMatchResult> ExpectedParts { get; set; } = new List<PartMatchResult>();
+    public List<PartMatchResult> ActualParts { get; set; } = new List<PartMatchResult>();
+    public List<string> MissingParts { get; set; } = new List<string>();
+    public List<string> ExtraParts { get; set; } = new List<string>();
     public double AccuracyScore { get; set; }
     public long ProcessingTimeMs { get; set; }
     public string ErrorMessage { get; set; }
 }
```

🤖 Prompt for AI Agents

```
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/Tests/TestModels.cs` around lines 46 - 58: the collection
properties on the TestResult class (ExpectedParts, ActualParts, MissingParts,
ExtraParts) are not initialized and can be null; update the TestResult
definition to initialize these properties inline to empty lists (e.g.,
ExpectedParts = new List<PartMatchResult>(), ActualParts = new
List<PartMatchResult>(), MissingParts = new List<string>(), ExtraParts = new
List<string>()), keeping the property types and names unchanged so callers can
safely enumerate or add items without null checks.
```

WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs (2)

85-93: **`GenerateCharacterRange` is duplicated across processor classes.**

This utility method is also duplicated in `ChineseLanguageProcessor.cs`. Consider moving it to the base `LanguageProcessor` class or a shared utility.
<details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs` around lines 85 - 93, The GenerateCharacterRange method is duplicated (also present in ChineseLanguageProcessor); move this utility into the shared base so there's a single implementation: add GenerateCharacterRange as a protected static method on the LanguageProcessor base class (or a shared utility class used by both processors), delete the duplicate in EuropeanLanguageProcessor.cs and ChineseLanguageProcessor.cs, and update any references to call LanguageProcessor.GenerateCharacterRange (or the new utility) to ensure consumers still compile and behavior is unchanged. ``` </details> --- `57-77`: **Redundant uppercase diacritic handling.** Lines 69-76 handle uppercase accented characters, but line 60 already converts the input to lowercase via `ToLowerInvariant()`. The uppercase replacements will never match. 
<details> <summary>♻️ Remove redundant uppercase handling</summary> ```diff protected static string NormalizeEuropeanCharacters(string input) { // Convert common European diacritics to standard equivalents for comparison return input.ToLowerInvariant() .Replace('à', 'a').Replace('á', 'a').Replace('â', 'a').Replace('ã', 'a').Replace('ä', 'a').Replace('å', 'a') .Replace('è', 'e').Replace('é', 'e').Replace('ê', 'e').Replace('ë', 'e') .Replace('ì', 'i').Replace('í', 'i').Replace('î', 'i').Replace('ï', 'i') .Replace('ò', 'o').Replace('ó', 'o').Replace('ô', 'o').Replace('õ', 'o').Replace('ö', 'o') .Replace('ù', 'u').Replace('ú', 'u').Replace('û', 'u').Replace('ü', 'u') .Replace('ñ', 'n') .Replace('ç', 'c') - .Replace('ÿ', 'y') - .Replace('À', 'A').Replace('Á', 'A').Replace('Â', 'A').Replace('Ã', 'A').Replace('Ä', 'A').Replace('Å', 'A') - .Replace('È', 'E').Replace('É', 'E').Replace('Ê', 'E').Replace('Ë', 'E') - .Replace('Ì', 'I').Replace('Í', 'I').Replace('Î', 'I').Replace('Ï', 'I') - .Replace('Ò', 'O').Replace('Ó', 'O').Replace('Ô', 'O').Replace('Õ', 'O').Replace('Ö', 'O') - .Replace('Ù', 'U').Replace('Ú', 'U').Replace('Û', 'U').Replace('Ü', 'U') - .Replace('Ñ', 'N') - .Replace('Ç', 'C') - .Replace('Ÿ', 'Y'); + .Replace('ÿ', 'y'); } ``` </details> <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs` around lines 57 - 77, The NormalizeEuropeanCharacters method lowercases the input with ToLowerInvariant(), so the subsequent Replace calls for uppercase accented characters (the replacements using 'À','Á',...,'Ÿ') are unreachable; remove those redundant uppercase Replace chains from NormalizeEuropeanCharacters while keeping the initial ToLowerInvariant() and all lowercase diacritic replacements intact to preserve behavior; locate the uppercase replacement sequences in the NormalizeEuropeanCharacters method and delete them. 
``` </details> </blockquote></details> <details> <summary>WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs (2)</summary><blockquote> `21-21`: **Large character whitelist may impact performance.** `GenerateCharacterRange(0x4E00, 0x9FAF)` generates a string with ~20,000 characters. If this whitelist is used frequently (e.g., in Tesseract configuration), consider caching the generated string as a static readonly field. <details> <summary>♻️ Cache the character whitelist</summary> ```diff +private static readonly string ChineseCharacterRange = GenerateCharacterRange(0x4E00, 0x9FAF); +private static readonly string _characterWhitelist = ChineseCharacterRange + "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"; -public override string CharacterWhitelist => GenerateCharacterRange(0x4E00, 0x9FAF) + "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"; +public override string CharacterWhitelist => _characterWhitelist; ``` </details> Also applies to: 92-92 <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs` at line 21, The CharacterWhitelist property in ChineseLanguageProcessor currently builds a ~20k-char string each access via GenerateCharacterRange(0x4E00, 0x9FAF) which can hurt performance; change this to compute and store the whitelist once (e.g., as a static readonly string field) and have the CharacterWhitelist getter return that cached field combined with the ASCII alphanumerics, ensuring GenerateCharacterRange is only called during static initialization. 
``` </details> --- `11-147`: **Significant code duplication between Simplified and Traditional Chinese processors.** Both processors share nearly identical implementations for: - `GenerateCharacterRange` (also duplicated in `EuropeanLanguageProcessor.cs`) - `NormalizeForPatternMatching` - `IsPartNameValid` - `NormalizeChineseCharacters` Consider extracting a `ChineseLanguageProcessorBase` class to reduce duplication and maintenance burden, similar to how `EuropeanLanguageProcessorBase` consolidates European languages. <details> <summary>♻️ Proposed base class extraction</summary> ```csharp public abstract class ChineseLanguageProcessorBase : LanguageProcessor { protected ChineseLanguageProcessorBase(IReadOnlyApplicationSettings settings) : base(settings) { } protected static string GenerateCharacterRange(int start, int end) { var chars = new char[end - start + 1]; for (int i = 0; i <= end - start; i++) { chars[i] = (char)(start + i); } return new string(chars); } public override string NormalizeForPatternMatching(string input) { if (string.IsNullOrEmpty(input)) return input; string normalized = input.ToLower(_culture).Trim(); normalized = normalized.Replace("prime", " prime "); normalized = RemoveAccents(normalized); var parts = normalized.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries); return string.Join(" ", parts); } public override bool IsPartNameValid(string partName) { return !string.IsNullOrEmpty(partName) && partName.Replace(" ", "").Length >= 4; } protected static string NormalizeChineseCharacters(string input) { return NormalizeFullWidthCharacters(input).ToLowerInvariant(); } } ``` Then both processors simply inherit and provide locale-specific overrides for `Locale`, `BlueprintRemovals`, and `CharacterWhitelist`. </details> <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against the current code and only fix it if needed. 
In `@WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs` around lines 11 - 147, There’s duplicated logic in SimplifiedChineseLanguageProcessor and TraditionalChineseLanguageProcessor (GenerateCharacterRange, NormalizeForPatternMatching, IsPartNameValid, NormalizeChineseCharacters); extract a ChineseLanguageProcessorBase (similar to EuropeanLanguageProcessorBase) that implements GenerateCharacterRange, NormalizeForPatternMatching, IsPartNameValid, and NormalizeChineseCharacters, have both SimplifiedChineseLanguageProcessor and TraditionalChineseLanguageProcessor inherit that base and only override Locale, BlueprintRemovals, and CharacterWhitelist (using the base GenerateCharacterRange) to remove the duplicated implementations. ``` </details> </blockquote></details> <details> <summary>WFInfo/LanguageProcessing/LanguageProcessorFactory.cs (2)</summary><blockquote> `68-88`: **Consider caching the supported locales array.** `GetSupportedLocales()` allocates a new array on every call. Since this is static data, caching it would reduce allocations. <details> <summary>♻️ Proposed optimization</summary> ```diff +private static readonly string[] _supportedLocales = new[] +{ + "en", "ko", "ja", "zh-hans", "zh-hant", "th", "ru", "uk", "tr", "pl", "fr", "de", "es", "pt", "it" +}; public static string[] GetSupportedLocales() { - return new[] - { - "en", // English - "ko", // Korean - "ja", // Japanese - "zh-hans", // Simplified Chinese - "zh-hant", // Traditional Chinese - "th", // Thai - "ru", // Russian - "uk", // Ukrainian - "tr", // Turkish - "pl", // Polish - "fr", // French - "de", // German - "es", // Spanish - "pt", // Portuguese - "it" // Italian - }; + return _supportedLocales; } ``` </details> <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against the current code and only fix it if needed. 
In `@WFInfo/LanguageProcessing/LanguageProcessorFactory.cs` around lines 68 - 88, GetSupportedLocales currently allocates a new string[] on each call; add a private static readonly cached field in LanguageProcessorFactory (e.g. private static readonly string[] _supportedLocales = { ... }) and change GetSupportedLocales() to return that cached array (return _supportedLocales;) so callers reuse the same instance and avoid repeated allocations; ensure the field is initialized inline (static readonly) for thread-safe one-time setup. ``` </details> --- `22-28`: **Consider clearing the processor cache when re-initializing with new settings.** If `Initialize()` is called multiple times (e.g., when settings change), cached processors will retain old settings since the cache isn't cleared. This could cause stale behavior. <details> <summary>♻️ Proposed fix to clear cache on re-initialization</summary> ```diff public static void Initialize(IReadOnlyApplicationSettings settings) { if (settings == null) throw new ArgumentNullException(nameof(settings)); + lock (_lock) + { + _processors.Clear(); + } _settings = settings; } ``` </details> <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/LanguageProcessorFactory.cs` around lines 22 - 28, The Initialize method sets _settings but does not clear the processor cache, so when Initialize(IReadOnlyApplicationSettings settings) is called with new settings any cached processors will remain stale; update Initialize to reset/clear the processor cache (e.g., call Clear() on the cache collection or reassign a new dictionary/list) after setting _settings so any cached entries (the processor cache field such as _processors or _processorCache) are dropped and new processors will be created with the new settings. 
``` </details> </blockquote></details> <details> <summary>WFInfo/Tests/TestProgram.cs (1)</summary><blockquote> `137-143`: **Consider using LINQ for filtering failed tests.** The manual loop can be simplified with LINQ's `Where()` clause for better readability. <details> <summary>♻️ Simplified with LINQ</summary> ```diff Console.WriteLine("Failed Tests:"); -var failedTests = new System.Collections.Generic.List<TestResult>(); -foreach (var test in results.TestResults) -{ - if (!test.Success) - failedTests.Add(test); -} +var failedTests = results.TestResults.Where(t => !t.Success).ToList(); ``` </details> <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against the current code and only fix it if needed. In `@WFInfo/Tests/TestProgram.cs` around lines 137 - 143, Replace the manual loop that builds failedTests by using LINQ to filter results.TestResults; specifically, set failedTests to results.TestResults.Where(t => !t.Success).ToList() (or use System.Linq extension methods) so you remove the foreach and make the intent clearer while preserving the failedTests List<TestResult> variable. ``` </details> </blockquote></details> </blockquote></details> --- <details> <summary>ℹ️ Review info</summary> **Configuration used**: defaults **Review profile**: CHILL **Plan**: Pro <details> <summary>📥 Commits</summary> Reviewing files that changed from the base of the PR and between b167a2c3b03e02500c062c49eaf644622ff648f4 and 8e8a880519fbcffb9a11d3b01a5cf194fa66a6a1. 
</details> <details> <summary>📒 Files selected for processing (31)</summary> * `WFInfo/CustomEntrypoint.cs` * `WFInfo/Data.cs` * `WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs` * `WFInfo/LanguageProcessing/CyrillicLanguageProcessor.cs` * `WFInfo/LanguageProcessing/EnglishLanguageProcessor.cs` * `WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs` * `WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs` * `WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs` * `WFInfo/LanguageProcessing/LanguageProcessor.cs` * `WFInfo/LanguageProcessing/LanguageProcessorFactory.cs` * `WFInfo/LanguageProcessing/PolishLanguageProcessor.cs` * `WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs` * `WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs` * `WFInfo/Ocr.cs` * `WFInfo/Properties/AssemblyInfo.cs` * `WFInfo/Services/TesseractService.cs` * `WFInfo/Settings/SettingsWindow.xaml` * `WFInfo/Tests/OCRTestRunner.cs` * `WFInfo/Tests/TestModels.cs` * `WFInfo/Tests/TestProgram.cs` * `tests/BUILD_INSTRUCTIONS.md` * `tests/COVERAGE_FEATURES.md` * `tests/EXTERNAL_DATA_STRUCTURE.md` * `tests/README.md` * `tests/TEST_EXECUTION_FLOW.md` * `tests/data/test1.json` * `tests/data/test2.json` * `tests/data/test3.json` * `tests/map.json` * `tests/run_tests.bat` * `tests/usage_example.md` </details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
Actionable comments posted: 1
🧹 Nitpick comments (3)
WFInfo/CustomEntrypoint.cs (1)
89-91: Tighten test-mode detection to avoid accidental activation. Line 91 treats any first argument containing
`"map"` as test mode, which can misroute normal launches when paths/args include that substring. Suggested adjustment
```diff
- if (args.Length >= 2 &&
-     (args[0].EndsWith(".json", StringComparison.OrdinalIgnoreCase) ||
-      args[0].IndexOf("map", StringComparison.OrdinalIgnoreCase) >= 0))
+ if (args.Length >= 2 &&
+     args[0].EndsWith(".json", StringComparison.OrdinalIgnoreCase))
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/CustomEntrypoint.cs` around lines 89 - 91, The current test-mode detection in the if condition (inspecting args[0]) treats any argument containing "map" as test mode; update the check in CustomEntrypoint.cs so it only triggers for explicit test-mode tokens (e.g., args[0].Equals("map", StringComparison.OrdinalIgnoreCase) or args[0].Equals("-map", StringComparison.OrdinalIgnoreCase) or args[0].Equals("--map", StringComparison.OrdinalIgnoreCase)) or for well-defined prefixes like args[0].StartsWith("map:", StringComparison.OrdinalIgnoreCase) rather than using IndexOf("map") >= 0; modify the if condition that references args to use these exact-token/prefix checks so normal paths containing "map" are not misdetected.WFInfo/LanguageProcessing/CyrillicLanguageProcessor.cs (2)
68-76: Duplicate `GenerateCharacterRange` method in both classes. The identical helper method is defined in both
`RussianLanguageProcessor` (lines 68-76) and `UkrainianLanguageProcessor` (lines 138-146). Consider moving it to the `LanguageProcessor` base class as a `protected static` method to eliminate duplication. ♻️ Proposed refactor
Move to `LanguageProcessor.cs`:

```csharp
protected static string GenerateCharacterRange(int start, int end)
{
    var chars = new char[end - start + 1];
    for (int i = 0; i <= end - start; i++)
    {
        chars[i] = (char)(start + i);
    }
    return new string(chars);
}
```

Then remove the private copies from both Cyrillic processors.
Also applies to: 138-146
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/CyrillicLanguageProcessor.cs` around lines 68 - 76, Duplicate private method GenerateCharacterRange exists in RussianLanguageProcessor and UkrainianLanguageProcessor; move a single protected static GenerateCharacterRange(int start, int end) implementation into the LanguageProcessor base class and remove the private copies from both Cyrillic processors, then update any internal calls in RussianLanguageProcessor and UkrainianLanguageProcessor to use the inherited protected static method (no signature changes required).
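Side note: the proposed helper's behavior is easy to sanity-check. A Python transliteration (illustrative only, not project code) of `GenerateCharacterRange`:

```python
def generate_character_range(start: int, end: int) -> str:
    """Build a string of consecutive Unicode code points, inclusive on both ends."""
    return "".join(chr(cp) for cp in range(start, end + 1))

# The main Cyrillic block U+0400..U+04FF contains exactly 256 characters.
cyrillic = generate_character_range(0x0400, 0x04FF)
print(len(cyrillic))  # 256
```

An inclusive range of N code points yields a string of length N, which is why the whitelist sizes discussed elsewhere in this review follow directly from the hex bounds.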
93-93: Ukrainian-specific ranges are redundant. The ranges
`0x0490-0x0491`, `0x0406-0x0407`, `0x0456-0x0457`, and `0x0492-0x0493` are all within `0x0400-0x04FF` (Cyrillic block). These additional `GenerateCharacterRange` calls add duplicate characters to the whitelist. Consider simplifying to just the two main ranges plus symbols:
♻️ Proposed simplification
```diff
- public override string CharacterWhitelist => GenerateCharacterRange(0x0400, 0x04FF) + GenerateCharacterRange(0x0500, 0x052F) + GenerateCharacterRange(0x0490, 0x0491) + GenerateCharacterRange(0x0406, 0x0407) + GenerateCharacterRange(0x0456, 0x0457) + GenerateCharacterRange(0x0492, 0x0493) + "0123456789:-()"; // Cyrillic + Ukrainian specific
+ public override string CharacterWhitelist => GenerateCharacterRange(0x0400, 0x04FF) + GenerateCharacterRange(0x0500, 0x052F) + "0123456789:-()"; // Cyrillic + Cyrillic Supplement (includes Ukrainian-specific characters)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/CyrillicLanguageProcessor.cs` at line 93, The CharacterWhitelist property in CyrillicLanguageProcessor.cs currently includes redundant Ukrainian-specific ranges that are already covered by GenerateCharacterRange(0x0400, 0x04FF); remove the extra GenerateCharacterRange calls for 0x0490-0x0491, 0x0406-0x0407, 0x0456-0x0457, and 0x0492-0x0493 so CharacterWhitelist becomes GenerateCharacterRange(0x0400, 0x04FF) + GenerateCharacterRange(0x0500, 0x052F) + "0123456789:-()"; update the CharacterWhitelist getter in the CyrillicLanguageProcessor class accordingly to avoid duplicate characters.
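The redundancy claim is mechanical to verify: every flagged code point sits inside the main Cyrillic block. A throwaway Python check (illustrative only):

```python
CYRILLIC_BLOCK = range(0x0400, 0x0500)  # U+0400..U+04FF inclusive

# Ukrainian-specific letters flagged in the review
ukrainian_specific = [0x0490, 0x0491, 0x0406, 0x0407, 0x0456, 0x0457, 0x0492, 0x0493]

# All of them fall inside the main Cyrillic block, so the extra
# GenerateCharacterRange calls only append duplicate characters.
print(all(cp in CYRILLIC_BLOCK for cp in ukrainian_specific))  # True
```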
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@WFInfo/LanguageProcessing/CyrillicLanguageProcessor.cs`:
- Around line 106-112: In CyrillicLanguageProcessor, the
Replace("(Кресленник)","(кресленник)") call is dead because input is already
lowercased by input.ToLower(_culture) into the variable normalized; remove that
Replace line (or if the intent was to transform the original casing before
lowercasing, move the Replace to operate on input before calling
input.ToLower(_culture)). Target the block where normalized is assigned (the
input.ToLower(_culture).Trim() line and the subsequent Replace) and either
delete the Replace or relocate it before the ToLower(_culture) call.
---
Nitpick comments:
In `@WFInfo/CustomEntrypoint.cs`:
- Around line 89-91: The current test-mode detection in the if condition
(inspecting args[0]) treats any argument containing "map" as test mode; update
the check in CustomEntrypoint.cs so it only triggers for explicit test-mode
tokens (e.g., args[0].Equals("map", StringComparison.OrdinalIgnoreCase) or
args[0].Equals("-map", StringComparison.OrdinalIgnoreCase) or
args[0].Equals("--map", StringComparison.OrdinalIgnoreCase)) or for well-defined
prefixes like args[0].StartsWith("map:", StringComparison.OrdinalIgnoreCase)
rather than using IndexOf("map") >= 0; modify the if condition that references
args to use these exact-token/prefix checks so normal paths containing "map" are
not misdetected.
In `@WFInfo/LanguageProcessing/CyrillicLanguageProcessor.cs`:
- Around line 68-76: Duplicate private method GenerateCharacterRange exists in
RussianLanguageProcessor and UkrainianLanguageProcessor; move a single protected
static GenerateCharacterRange(int start, int end) implementation into the
LanguageProcessor base class and remove the private copies from both Cyrillic
processors, then update any internal calls in RussianLanguageProcessor and
UkrainianLanguageProcessor to use the inherited protected static method (no
signature changes required).
- Line 93: The CharacterWhitelist property in CyrillicLanguageProcessor.cs
currently includes redundant Ukrainian-specific ranges that are already covered
by GenerateCharacterRange(0x0400, 0x04FF); remove the extra
GenerateCharacterRange calls for 0x0490-0x0491, 0x0406-0x0407, 0x0456-0x0457,
and 0x0492-0x0493 so CharacterWhitelist becomes GenerateCharacterRange(0x0400,
0x04FF) + GenerateCharacterRange(0x0500, 0x052F) + "0123456789:-()"; update the
CharacterWhitelist getter in the CyrillicLanguageProcessor class accordingly to
avoid duplicate characters.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- .gitattributes
- WFInfo/CustomEntrypoint.cs
- WFInfo/LanguageProcessing/CyrillicLanguageProcessor.cs
- WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs
🚧 Files skipped from review as they are similar to previous changes (1)
- WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs
Actionable comments posted: 12
♻️ Duplicate comments (2)
tests/run_tests.bat (2)
19-20: ⚠️ Potential issue | 🟠 Major
Quote `SET` assignments to avoid metacharacter injection and path breakage. Line [19] and Line [20] still use unquoted
`set` forms. In `cmd`, this can misparse values containing `&`, `|`, `(`, `)` and break valid paths. Suggested hardening patch
```diff
-set TEST_IMAGES_DIR=%~1
-if "%TEST_IMAGES_DIR%"=="" set TEST_IMAGES_DIR=data
+set "TEST_IMAGES_DIR=%~1"
+if "%TEST_IMAGES_DIR%"=="" set "TEST_IMAGES_DIR=data"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/run_tests.bat` around lines 19 - 20, Change the unquoted SET assignments to quoted form to prevent cmd metacharacter injection and path-breaking: replace set TEST_IMAGES_DIR=%~1 with set "TEST_IMAGES_DIR=%~1" and replace set TEST_IMAGES_DIR=data (used when TEST_IMAGES_DIR is empty) with set "TEST_IMAGES_DIR=data" (or change the if-block to if "%TEST_IMAGES_DIR%"=="" set "TEST_IMAGES_DIR=data"); reference the TEST_IMAGES_DIR environment variable and the SET statements in the batch script when making this change.
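The failure mode being guarded against, a value with spaces or metacharacters splitting into several arguments, is language-agnostic. A hypothetical Python illustration using `shlex` (which approximates shell tokenization, not cmd's exact rules):

```python
import shlex

path = r"C:\Test Data & Results"

# Unquoted: whitespace splits the path into several tokens.
naive = f"WFInfo.exe map.json {path}"
print(len(shlex.split(naive, posix=False)))   # 6 tokens

# Quoted: the path survives as a single argument.
quoted = f'WFInfo.exe map.json "{path}"'
print(len(shlex.split(quoted, posix=False)))  # 3 tokens
```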
32-36: ⚠️ Potential issue | 🟠 Major
Make results filename generation locale-safe and always quote the output path.
Line [32]/Line [36] depend on
`%date%`/`%time%` substring positions (locale-dependent), and the output filename is unquoted at execution. Single-digit hours can introduce a leading space and split arguments. Suggested robust patch
```diff
-echo Output: test_results_%date:~-4,4%%date:~-10,2%%date:~-7,2%_%time:~0,2%%time:~3,2%%time:~6,2%.json
+set "HH=%time:~0,2%"
+if "%HH:~0,1%"==" " set "HH=0%HH:~1,1%"
+set "RESULT_FILE=test_results_%date:~-4,4%%date:~-10,2%%date:~-7,2%_%HH%%time:~3,2%%time:~6,2%.json"
+echo Output: %RESULT_FILE%
@@
-..\bin\Release\net48\WFInfo.exe map.json "%TEST_IMAGES_DIR%" test_results_%date:~-4,4%%date:~-10,2%%date:~-7,2%_%time:~0,2%%time:~3,2%%time:~6,2%.json
+"..\bin\Release\net48\WFInfo.exe" "map.json" "%TEST_IMAGES_DIR%" "%RESULT_FILE%"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/run_tests.bat` around lines 32 - 36, The current echo and WFInfo.exe invocation rely on locale-specific %DATE%/%TIME% substrings and leave the output filename unquoted; replace those lines to generate a locale-safe timestamp (e.g., call PowerShell: for /f "usebackq delims=" %%T in (`powershell -NoProfile -Command "Get-Date -Format 'yyyyMMdd_HHmmss'"`) do set TIMESTAMP=%%T) and then use "%TIMESTAMP%" when building filenames, and always wrap the output path/filename and "%TEST_IMAGES_DIR%" in quotes in both the echo and the WFInfo.exe call so single-digit hours or spaces don't split arguments (update the echo line and the ..\bin\Release\net48\WFInfo.exe map.json "%TEST_IMAGES_DIR%" "test_results_%TIMESTAMP%.json" accordingly).
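For contrast, the same fused-timestamp idea is locale-independent by construction in languages with explicit format codes. A Python sketch (the suggested batch patch reaches the same result via PowerShell's `Get-Date`):

```python
from datetime import datetime

# strftime with explicit numeric format codes is locale-independent,
# unlike slicing the display strings behind cmd's %date%/%time%.
stamp = datetime(2024, 3, 7, 9, 5, 2).strftime("%Y%m%d_%H%M%S")
print(stamp)  # 20240307_090502  (single-digit fields are zero-padded)

result_file = f"test_results_{stamp}.json"
print(result_file)
```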
🧹 Nitpick comments (4)
WFInfo/errorDialogue.xaml.cs (1)
50-74: Consider using standard `foreach` loops and clearer path construction. The
`.Where(...).ToList().ForEach(...)` pattern creates unnecessary intermediate lists. Standard `foreach` loops are more idiomatic in C# and avoid the allocation. Additionally, using `Path.GetDirectoryName(startPath)` would be clearer than the `\..\` navigation. ♻️ Suggested refactor
```diff
- // Priority files: debug.log and settings JSON files
- var priorityFiles = new List<string>
- {
-     startPath + @"\..\debug.log",
-     startPath + @"\..\settings.json"
- };
-
- // Other data files
- var otherDataFiles = new List<string>
- {
-     startPath + @"\..\eqmt_data.json",
-     startPath + @"\..\market_data.json",
-     startPath + @"\..\market_items.json",
-     startPath + @"\..\name_data.json",
-     startPath + @"\..\relic_data.json"
- };
-
- // Add priority files first
- priorityFiles.Where(path => File.Exists(path)).ToList().ForEach(filename => zip.AddFile(filename, ""));
-
- // Add other data files
- otherDataFiles.Where(path => File.Exists(path)).ToList().ForEach(filename => zip.AddFile(filename, ""));
-
- // Add debug folder files last
- files.Select(file => file.FullName).ToList().ForEach(filename => zip.AddFile(filename, ""));
+ var appPath = Path.GetDirectoryName(startPath);
+
+ // Priority files: debug.log and settings JSON files
+ var priorityFiles = new[]
+ {
+     Path.Combine(appPath, "debug.log"),
+     Path.Combine(appPath, "settings.json")
+ };
+
+ // Other data files
+ var otherDataFiles = new[]
+ {
+     Path.Combine(appPath, "eqmt_data.json"),
+     Path.Combine(appPath, "market_data.json"),
+     Path.Combine(appPath, "market_items.json"),
+     Path.Combine(appPath, "name_data.json"),
+     Path.Combine(appPath, "relic_data.json")
+ };
+
+ // Add priority files first
+ foreach (var path in priorityFiles.Where(File.Exists))
+     zip.AddFile(path, "");
+
+ // Add other data files
+ foreach (var path in otherDataFiles.Where(File.Exists))
+     zip.AddFile(path, "");
+
+ // Add debug folder files last
+ foreach (var file in files)
+     zip.AddFile(file.FullName, "");
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/errorDialogue.xaml.cs` around lines 50 - 74, Replace the LINQ .Where(...).ToList().ForEach(...) allocations with simple foreach loops: build the parent directory using Path.GetDirectoryName(startPath) and construct each candidate path with Path.Combine (referencing priorityFiles and otherDataFiles construction) then use if (File.Exists(path)) zip.AddFile(path, ""); do the same for the files collection by iterating files (the FileInfo list) and calling zip.AddFile(file.FullName, ""); update the code locations that call zip.AddFile to use these foreach loops and Path.Combine instead of string "\..\” concatenation.WFInfo/SnapItOverlay.xaml.cs (1)
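For comparison, the filter-then-iterate shape reads like this in Python (paths hypothetical; `app/` stands in for `Path.GetDirectoryName(startPath)`):

```python
from pathlib import Path

app_dir = Path("app")  # hypothetical parent directory of the executable
candidates = [app_dir / name for name in ("debug.log", "settings.json")]

# Plain iteration over a lazily filtered sequence: no intermediate list,
# unlike the .Where(...).ToList().ForEach(...) chain.
for path in filter(Path.exists, candidates):
    print(path)  # each existing file would be added to the zip here
```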
81-92: Reuse `ResetRectangle()` inside `closeOverlay()` to prevent state drift.
`closeOverlay()` duplicates the same rectangle reset/removal logic already in `ResetRectangle()`. Consolidating this avoids future divergence. ♻️ Proposed refactor
```diff
 public void closeOverlay()
 {
-    rectangle.Width = 0;
-    rectangle.Height = 0;
-    rectangle.RenderTransform = new TranslateTransform(0, 0);
-    rectangle.Visibility = Visibility.Hidden;
-
-    // Properly clean up canvas by removing the rectangle
-    if (canvas.Children.Contains(rectangle))
-    {
-        canvas.Children.Remove(rectangle);
-    }
+    ResetRectangle();
     Topmost = false;
     isEnabled = false;
     // Force immediate hide without delay to prevent rectangle persistence
     Hide();
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/SnapItOverlay.xaml.cs` around lines 81 - 92, The closeOverlay method duplicates rectangle cleanup logic; replace its manual reset/remove steps with a single call to ResetRectangle() to avoid drift—update closeOverlay() to call ResetRectangle() (which handles setting rectangle.Width/Height, RenderTransform, Visibility and removing from canvas) and remove the duplicated block that manipulates rectangle and canvas directly.WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs (1)
64-64: Remove duplicate kana replacement call. Line 64 repeats
`.Replace('ヶ', 'ケ')` twice; keep one for clarity. 🧹 Proposed cleanup
```diff
- result = result.Replace('ヶ', 'ケ').Replace('ヵ', 'カ').Replace('ヶ', 'ケ');
+ result = result.Replace('ヶ', 'ケ').Replace('ヵ', 'カ');
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs` at line 64, The chained string replacements on the variable result contain a duplicated call to .Replace('ヶ', 'ケ'); update the code in JapaneseLanguageProcessor (where result is manipulated) to remove the redundant .Replace('ヶ', 'ケ') call so each kana mapping appears only once (keep one .Replace('ヶ', 'ケ') alongside the existing .Replace('ヵ', 'カ')).WFInfo/LanguageProcessing/LanguageProcessor.cs (1)
295-299: Add a null/empty guard in `NormalizeFullWidthCharacters`. At Line 297,
`input.Length` will throw if this helper ever receives null. This utility is shared enough to warrant defensive handling. 🛡️ Proposed fix
```diff
 protected static string NormalizeFullWidthCharacters(string input)
 {
+    if (string.IsNullOrEmpty(input)) return input;
     var result = new System.Text.StringBuilder(input.Length);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/LanguageProcessor.cs` around lines 295 - 299, NormalizeFullWidthCharacters should guard against null/empty inputs before using input.Length; add a check at the start of the method (e.g., if string.IsNullOrEmpty(input)) and return the original input (or String.Empty when input is null) so you don't call input.Length on null. This keeps the current behavior for empty strings and prevents NullReferenceException while leaving the rest of the method (the StringBuilder usage and loop over chars) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/README.md`:
- Line 78: Replace the incorrect Japanese script string in the language list
entry for "Japanese (`japanese`)" by changing the displayed text from "日本语" to
the correct "日本語" in README.md so the language label uses the proper kanji;
update the single line that contains "**Japanese** (`japanese`) - 日本语" to use
"**Japanese** (`japanese`) - 日本語".
In `@tests/run_tests.bat`:
- Around line 10-15: Replace the silent "goto :eof" exits in the setup failure
branches with explicit non-zero exit codes so CI sees failures; locate the
failing-paths that currently end with "goto :eof" (the branches that echo
"ERROR: map.json not found..." and the other setup failure branch) and change
them to use "exit /b 2" for the first failure and "exit /b 3" for the other so
the script returns distinct non-zero exit codes instead of possibly 0.
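The distinct-exit-code pattern generalizes beyond cmd; a POSIX-shell sketch of the same idea (function and file names are illustrative, not part of the actual script):

```shell
# Each setup failure returns its own non-zero code so CI can tell them apart.
check_setup() {
  if [ ! -f "$1" ]; then
    echo "ERROR: test map '$1' not found" >&2
    return 2            # missing map.json
  fi
  if [ ! -d "$2" ]; then
    echo "ERROR: image directory '$2' not found" >&2
    return 3            # missing image directory
  fi
  return 0
}
```

In the batch script the equivalent is replacing the `goto :eof` in each failure branch with `exit /b 2` and `exit /b 3` respectively.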
In `@tests/usage_example.md`:
- Line 50: The language list in the sentence "automated regression testing for
all supported languages (English, Korean, Japanese, Chinese
Simplified/Traditional, French, Ukrainian, Italian, German, Spanish, Portuguese,
Polish, Russian)" incorrectly includes Japanese; update that parenthetical (or
the surrounding sentence) so it excludes Japanese and matches the PR objective
"except Thai, Japanese, and Turkish" — e.g., remove "Japanese" from the language
list or add a clarifying clause "except Thai, Japanese, and Turkish" to the
sentence.
In `@WFInfo/CustomEntrypoint.cs`:
- Around line 89-104: The flag-style map arguments (-map, --map, map) are being
allowed but not normalized before calling TestProgram.RunTests, so when args[0]
is a flag the test runner treats the flag as the JSON path; update the code in
CustomEntrypoint to detect if args[0] is one of the map flags and, if so,
remove/shift that element (e.g., replace args with args.Skip(1).ToArray() or
construct a new array where args[1] becomes args[0] and the rest follow) so that
TestProgram.RunTests receives the actual JSON map path as args[0]; ensure this
normalization covers "map:" prefix handling as well before invoking
TestProgram.RunTests.
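For illustration, a minimal Python sketch of the argument normalization this prompt describes (the function name and flag set are stand-ins for the C# logic in CustomEntrypoint, not the actual API):

```python
def normalize_map_args(args):
    """Strip a leading -map/--map/map flag and unwrap a "map:" prefix so the
    first remaining element is always the JSON map path."""
    if args and args[0].lower() in ("-map", "--map", "map"):
        args = args[1:]                      # drop the flag itself
    if args and args[0].lower().startswith("map:"):
        args = [args[0][4:]] + args[1:]      # unwrap "map:path.json"
    return args
```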
In `@WFInfo/Data.cs`:
- Around line 212-224: In ReloadItems, stop silently falling back to English
when item["i18n"][_settings.Locale]["name"] is missing: instead of using
Main.AddLog + English fallback, validate presence of item["i18n"],
item["i18n"][_settings.Locale], and the "name" field and throw a clear exception
(or propagate one) that includes itemId and _settings.Locale so upstream can
detect/regressions; update the block around
item["i18n"][_settings.Locale]["name"] and tempMarketItems assignments in
ReloadItems to remove the English substitution and surface errors (use the
existing Main.AddLog only for context if desired, but do not append englishName
to tempMarketItems).
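A sketch of the strict lookup the prompt asks for, in Python with a dict standing in for the JObject (names are illustrative): fail loudly with the item id and locale instead of silently substituting English.

```python
def localized_name(item, item_id, locale):
    """Return the localized name or raise with enough context to debug."""
    try:
        return item["i18n"][locale]["name"]
    except (KeyError, TypeError):
        raise KeyError(
            f"missing i18n name for item '{item_id}' in locale '{locale}'"
        )
```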
- Around line 42-43: ReloadItems and GetWfmItemList are writing/reading a single
fallback file (fallback_names.json) for all locales which causes English data to
overwrite localized data; change the fallback implementation to be locale-scoped
by including the locale in the fallback file name or path (e.g.,
fallback_names.{locale}.json) and update the wfmItemsFallbackPath usage so reads
and writes in GetWfmItemList and ReloadItems use the locale-specific path;
ensure the logic that writes English then locale-specific data writes each to
its own locale file and the fallback load (currently around
GetWfmItemList/ReloadItems) only reads the matching locale file so English
fallbacks never get loaded as localized data.
In `@WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs`:
- Around line 44-52: The override chain creates infinite recursion because
DefaultLevenshteinDistance calls LevenshteinDistanceWithPreprocessing which in
turn calls the virtual DefaultLevenshteinDistance override; fix by making the
preprocessing helper call a non-virtual core implementation instead of the
virtual method: add or use a private/core method (e.g., ComputeLevenshteinCore
or call base.DefaultLevenshteinDistance) from
LevenshteinDistanceWithPreprocessing so that CalculateLevenshteinDistance ->
DefaultLevenshteinDistance -> LevenshteinDistanceWithPreprocessing ->
ComputeLevenshteinCore executes without re-entering the override; update
references to BlueprintRemovals and NormalizeEuropeanCharacters accordingly.
In `@WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs`:
- Around line 41-43: The Japanese normalization path is calling RemoveAccents
which can strip Japanese combining marks (e.g., dakuten) and corrupt tokens; in
the JapaneseLanguageProcessor (the Normalize/NormalizeText method where
`normalized` is set) remove or gate the call to RemoveAccents so it is not
applied for Japanese processing—either delete the RemoveAccents(normalized)
invocation or wrap it behind a language check (e.g., only call RemoveAccents for
non-Japanese languages); ensure any downstream code still receives the original
normalized string without accent stripping.
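The dakuten corruption is easy to reproduce. A minimal Python sketch of the accent-stripping pipeline this prompt removes: NFD splits voiced kana into base kana plus a combining voicing mark, so dropping combining marks turns e.g. ガ (ga) into カ (ka).

```python
import unicodedata

def strip_combining(s):
    """RemoveAccents-style stripping: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))
```

Fine for Latin diacritics, destructive for Japanese: the same pass that maps "café" to "cafe" also maps ガラス (garasu, "glass") to カラス (karasu, "crow").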
In `@WFInfo/LanguageProcessing/PolishLanguageProcessor.cs`:
- Line 21: The CharacterWhitelist in PolishLanguageProcessor is missing several
Polish letters (Óó, Śś, Źź, Żż); update the CharacterWhitelist property to
include those characters or add the appropriate GenerateCharacterRange calls so
the whitelist contains full Polish diacritics (e.g., include Ó/ó, Ś/ś, Ź/ź, Ż/ż)
alongside the existing ranges — modify the CharacterWhitelist getter in
PolishLanguageProcessor and use GenerateCharacterRange or literal additions to
ensure all Polish letters are present.
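A sketch of a complete Polish whitelist in Python (`char_range` is a stand-in for the C# GenerateCharacterRange helper):

```python
def char_range(lo, hi):
    """Inclusive code-point range as a string."""
    return "".join(chr(c) for c in range(lo, hi + 1))

POLISH_WHITELIST = (
    char_range(0x41, 0x5A)      # A-Z
    + char_range(0x61, 0x7A)    # a-z
    + char_range(0x30, 0x39)    # 0-9
    + "ĄąĆćĘęŁłŃńÓóŚśŹźŻż"      # full Polish diacritics, incl. Ó/ó Ś/ś Ź/ź Ż/ż
)
```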
In `@WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs`:
- Around line 39-40: The code in ThaiLanguageProcessor that calls
RemoveAccents(normalized) during normalization strips Thai combining marks and
should be removed or skipped for Thai text; update the normalization logic in
the method where normalized is set to stop calling RemoveAccents(normalized) for
Thai (either delete that call or guard it with a language check), and add a
brief comment explaining that Thai tone/vowel marks must be preserved to avoid
breaking OCR/token matching.
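If the call is gated rather than deleted, the guard can key off the Thai Unicode block. A Python sketch (the range check is one plausible way to express the language check, not the existing C# code):

```python
import unicodedata

THAI_START, THAI_END = 0x0E00, 0x0E7F  # Unicode Thai block

def remove_accents(s):
    # Thai tone/vowel marks are combining characters; leave Thai text
    # untouched so they survive, strip elsewhere.
    if any(THAI_START <= ord(c) <= THAI_END for c in s):
        return s
    return "".join(
        c for c in unicodedata.normalize("NFD", s) if not unicodedata.combining(c)
    )
```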
In `@WFInfo/Ocr.cs`:
- Around line 1292-1293: The code calls _tesseractService.SetNumbersOnlyMode()
but only resets the FirstEngine whitelist on the success path, which can leave
OCR stuck in numeric mode on exceptions; move the whitelist-reset logic into a
finally block so Reset/Restore of the FirstEngine whitelist always runs
regardless of exceptions, update the block that currently restores the whitelist
(the counterpart to _tesseractService.SetNumbersOnlyMode()) to be executed from
a finally and apply the same change to the other occurrence noted (the second
SetNumbersOnlyMode/restore pair).
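The shape of the fix, sketched in Python (`Engine` is a stand-in for the Tesseract service, not the real API): restore the whitelist in a `finally` block so an OCR exception can never leave the engine stuck in numbers-only mode.

```python
class Engine:
    def __init__(self):
        self.whitelist = "full"
    def set_numbers_only(self):
        self.whitelist = "0123456789"

def read_number(engine, do_ocr):
    engine.set_numbers_only()
    try:
        return do_ocr()
    finally:
        engine.whitelist = "full"   # always runs, even when do_ocr raises
```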
- Around line 258-264: The distance filter currently uses >
Math.Max(part.Length, 6) which enforces ~100% instead of the intended 50%
cutoff; update the condition(s) that reference Math.Max(part.Length, 6) (e.g.,
the checks using firstProximity[i], part, correctName in the WFInfo.Ocr logic)
to compare against half the string length with a sensible floor, e.g. use >
Math.Max((int)Math.Ceiling(part.Length * 0.5), 3) (or Math.Max(part.Length/2, 3)
if you prefer integer math) so any Levenshtein distance greater than 50% of part
length is rejected; apply the same change to both occurrences noted in the diff.
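The corrected cutoff arithmetic, sketched in Python for clarity: reject a match when the Levenshtein distance exceeds half the string length, with a floor of 3 for very short names.

```python
import math

def max_allowed_distance(part):
    """50% cutoff with a floor, per the suggested Math.Max(ceil(len*0.5), 3)."""
    return max(math.ceil(len(part) * 0.5), 3)
```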
---
Duplicate comments:
In `@tests/run_tests.bat`:
- Around line 19-20: Change the unquoted SET assignments to quoted form to
prevent cmd metacharacter injection and path-breaking: replace set
TEST_IMAGES_DIR=%~1 with set "TEST_IMAGES_DIR=%~1" and replace set
TEST_IMAGES_DIR=data (used when TEST_IMAGES_DIR is empty) with set
"TEST_IMAGES_DIR=data" (or change the if-block to if "%TEST_IMAGES_DIR%"=="" set
"TEST_IMAGES_DIR=data"); reference the TEST_IMAGES_DIR environment variable and
the SET statements in the batch script when making this change.
- Around line 32-36: The current echo and WFInfo.exe invocation rely on
locale-specific %DATE%/%TIME% substrings and leave the output filename unquoted;
replace those lines to generate a locale-safe timestamp (e.g., call PowerShell:
for /f "usebackq delims=" %%T in (`powershell -NoProfile -Command "Get-Date
-Format 'yyyyMMdd_HHmmss'"`) do set TIMESTAMP=%%T) and then use "%TIMESTAMP%"
when building filenames, and always wrap the output path/filename and
"%TEST_IMAGES_DIR%" in quotes in both the echo and the WFInfo.exe call so
single-digit hours or spaces don't split arguments (update the echo line and the
..\bin\Release\net48\WFInfo.exe map.json "%TEST_IMAGES_DIR%"
"test_results_%TIMESTAMP%.json" accordingly).
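The POSIX analogue of that timestamp fix, for comparison: a fixed format string instead of slicing locale-dependent `%DATE%`/`%TIME%`, so the filename never contains spaces or colons.

```shell
# Locale-independent timestamp; quoting keeps the path a single argument.
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
OUT="test_results_${TIMESTAMP}.json"
echo "$OUT"
```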
---
Nitpick comments:
In `@WFInfo/errorDialogue.xaml.cs`:
- Around line 50-74: Replace the LINQ .Where(...).ToList().ForEach(...)
allocations with simple foreach loops: build the parent directory using
Path.GetDirectoryName(startPath) and construct each candidate path with
Path.Combine (referencing priorityFiles and otherDataFiles construction) then
use if (File.Exists(path)) zip.AddFile(path, ""); do the same for the files
collection by iterating files (the FileInfo list) and calling
zip.AddFile(file.FullName, ""); update the code locations that call zip.AddFile
to use these foreach loops and Path.Combine instead of string "\..\”
concatenation.
In `@WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs`:
- Line 64: The chained string replacements on the variable result contain a
duplicated call to .Replace('ヶ', 'ケ'); update the code in
JapaneseLanguageProcessor (where result is manipulated) to remove the redundant
.Replace('ヶ', 'ケ') call so each kana mapping appears only once (keep one
.Replace('ヶ', 'ケ') alongside the existing .Replace('ヵ', 'カ')).
In `@WFInfo/LanguageProcessing/LanguageProcessor.cs`:
- Around line 295-299: NormalizeFullWidthCharacters should guard against
null/empty inputs before using input.Length; add a check at the start of the
method (e.g., if string.IsNullOrEmpty(input)) and return the original input (or
String.Empty when input is null) so you don't call input.Length on null. This
keeps the current behavior for empty strings and prevents NullReferenceException
while leaving the rest of the method (the StringBuilder usage and loop over
chars) unchanged.
In `@WFInfo/SnapItOverlay.xaml.cs`:
- Around line 81-92: The closeOverlay method duplicates rectangle cleanup logic;
replace its manual reset/remove steps with a single call to ResetRectangle() to
avoid drift—update closeOverlay() to call ResetRectangle() (which handles
setting rectangle.Width/Height, RenderTransform, Visibility and removing from
canvas) and remove the duplicated block that manipulates rectangle and canvas
directly.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)
- WFInfo/CustomEntrypoint.cs
- WFInfo/Data.cs
- WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs
- WFInfo/LanguageProcessing/CyrillicLanguageProcessor.cs
- WFInfo/LanguageProcessing/EnglishLanguageProcessor.cs
- WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs
- WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs
- WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs
- WFInfo/LanguageProcessing/LanguageProcessor.cs
- WFInfo/LanguageProcessing/PolishLanguageProcessor.cs
- WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs
- WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs
- WFInfo/Ocr.cs
- WFInfo/Services/TesseractService.cs
- WFInfo/SnapItOverlay.xaml.cs
- WFInfo/Tests/TestProgram.cs
- WFInfo/errorDialogue.xaml.cs
- tests/README.md
- tests/run_tests.bat
- tests/usage_example.md
🚧 Files skipped from review as they are similar to previous changes (5)
- WFInfo/Tests/TestProgram.cs
- WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs
- WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs
- WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs
- WFInfo/LanguageProcessing/CyrillicLanguageProcessor.cs
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
WFInfo/Ocr.cs (1)
266-270: ⚠️ Potential issue | 🟠 Major
Reward-screen market data lookup can still null-dereference and abort processing.
If `GetValue(correctName)` returns null, Line 269 will throw before you can skip safely.
🛡️ Proposed guard
```diff
-JObject job = (JObject)Main.dataBase.marketData.GetValue(correctName);
+JObject job = Main.dataBase.marketData.GetValue(correctName) as JObject;
+if (job == null)
+{
+    Main.AddLog($"MARKET DATA: No market data found for '{correctName}', skipping item");
+    continue;
+}
 JObject primeSet = (JObject)Main.dataBase.marketData.GetValue(primeSetName);
 string ducats = job["ducats"].ToObject<string>();
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Ocr.cs` around lines 266 - 270, The code can null-dereference when Main.dataBase.marketData.GetValue(correctName) or GetValue(primeSetName) returns null; update the WFInfo.Ocr logic around Data.GetSetName, the job and primeSet variables to guard: after obtaining job and primeSet check for null (and also check job["ducats"] exists) and skip/return when missing, and replace int.Parse(ducats, Main.culture) with a safe parse (int.TryParse with Main.culture) so missing or non-numeric ducats do not throw; ensure all downstream uses of job/primeSet only run when those checks pass.
♻️ Duplicate comments (2)
WFInfo/Data.cs (1)
102-103: ⚠️ Potential issue | 🟠 Major
Fallback storage is still not request-locale scoped.
`GetWfmItemList("en")` and `GetWfmItemList(_settings.Locale)` both use the same `wfmItemsFallbackPath`, so English payloads can still overwrite localized fallback content.
🔧 Suggested fix
```diff
-private readonly string wfmItemsFallbackPath;
+private readonly string wfmItemsFallbackDirectory;
 ...
-wfmItemsFallbackPath = applicationDirectory + $@"\fallback_names.{_settings.Locale}.json";
+wfmItemsFallbackDirectory = applicationDirectory;
 ...
 private async Task<(JObject Data, bool IsFallback)> GetWfmItemList(string locale)
 {
+    string fallbackPath = Path.Combine(wfmItemsFallbackDirectory, $"fallback_names.{locale}.json");
     try
     {
         ...
-        File.WriteAllText(wfmItemsFallbackPath, body);
+        File.WriteAllText(fallbackPath, body);
         return (data, false);
     }
     catch (Exception ex)
     {
-        Main.AddLog("Failed to fetch/parse " + wfmItemsUrl + ", using file " + wfmItemsFallbackPath + Environment.NewLine + ex.ToString());
-        if (File.Exists(wfmItemsFallbackPath))
+        Main.AddLog("Failed to fetch/parse " + wfmItemsUrl + ", using file " + fallbackPath + Environment.NewLine + ex.ToString());
+        if (File.Exists(fallbackPath))
         {
-            string response = File.ReadAllText(wfmItemsFallbackPath);
+            string response = File.ReadAllText(fallbackPath);
             JObject data = JsonConvert.DeserializeObject<JObject>(response);
             return (data, true);
         }
         ...
     }
 }
```
Based on learnings: Do not fall back to English locale data when a specific locale's fallback file is missing, because it breaks locale OCR matching.
Also applies to: 425-453
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Data.cs` around lines 102 - 103, The fallback file path currently uses the global _settings.Locale (wfmItemsFallbackPath) causing GetWfmItemList(requestedLocale) to share the same file for different requests; change the logic so the fallback path is computed per-request using the requested locale parameter (e.g., derive a locale-specific fallback path inside GetWfmItemList or a helper that accepts the locale), ensure any references to wfmItemsFallbackPath use the per-request value, and remove any implicit default-to-English behavior so a missing locale fallback does not fall back to "en".
WFInfo/Ocr.cs (1)
704-708: ⚠️ Potential issue | 🟠 Major
SnapIt distance cutoff still uses a ~100% threshold instead of the documented 50%.
Line 707 contradicts the comment in Lines 704-706, so weak matches are still accepted.
🎯 Proposed fix
```diff
-if (levenDist == 9999 || levenDist > Math.Max(part.Name.Length, 6) || string.IsNullOrEmpty(name))
+int maxDistance = Math.Max((int)Math.Ceiling(part.Name.Length * 0.5), 3);
+if (levenDist == 9999 || levenDist > maxDistance || string.IsNullOrEmpty(name))
 {
     foundParts.RemoveAt(i); // remove invalid part from list
     i--; // Adjust index since we removed an item
     resultCount--;
     continue;
 }
```
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Ocr.cs` around lines 704 - 708, The cutoff condition incorrectly uses Math.Max(part.Name.Length, 6) (allowing ~100% distance) instead of a 50% threshold; update the if condition that checks levenDist (and currently compares against Math.Max(part.Name.Length, 6)) to compare against a half-length threshold, e.g. replace Math.Max(part.Name.Length, 6) with Math.Max(part.Name.Length / 2, 6) (or Math.Max((int)Math.Ceiling(part.Name.Length / 2.0), 6) for rounding) so the check becomes: if (levenDist == 9999 || levenDist > Math.Max(part.Name.Length / 2, 6) || string.IsNullOrEmpty(name)) to enforce the documented ~50% cutoff for weak matches.
🧹 Nitpick comments (6)
WFInfo/SnapItOverlay.xaml.cs (1)
46-59: Consider simplifying by keeping the rectangle in the canvas.
The current approach of removing and re-adding the rectangle works correctly, but is heavier than necessary. Simply resetting the properties and hiding the rectangle (without removing it from the canvas) would achieve the same visual result with less overhead.
♻️ Optional simplification
```diff
 private void ResetRectangle()
 {
     // Reset rectangle properties to ensure it doesn't persist from previous session
     rectangle.Width = 0;
     rectangle.Height = 0;
     rectangle.RenderTransform = new TranslateTransform(0, 0);
     rectangle.Visibility = Visibility.Hidden;
-
-    // Remove rectangle from canvas to ensure clean state
-    if (canvas.Children.Contains(rectangle))
-    {
-        canvas.Children.Remove(rectangle);
-    }
 }
```
If you adopt this, you can also remove the re-add logic in `canvas_MouseDown` (lines 67-70).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/SnapItOverlay.xaml.cs` around lines 46 - 59, ResetRectangle currently removes the rectangle from the canvas; instead keep the rectangle as a persistent child and only reset its properties and hide it: leave rectangle.Width/Height, RenderTransform and Visibility resets as-is but remove the conditional canvas.Children.Remove(rectangle) call; then remove the re-add logic in canvas_MouseDown (the block that checks/executes canvas.Children.Add(rectangle)) since the rectangle will always remain in the canvas, ensuring you only toggle Visibility and update transforms when showing it.
tests/run_tests.bat (1)
55-55: Avoid an unconditional `pause` to keep CI/non-interactive runs from hanging.
Line 55 should be gated for interactive usage only.
♻️ Suggested CI-safe pause
```diff
-pause
+if not defined CI pause
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/run_tests.bat` at line 55, Replace the unconditional "pause" with an interactive-only guard so CI/non-interactive runs don't hang: detect an interactive environment (e.g. check a CI env var like %CI% or %GITHUB_ACTIONS%) and only call the "pause" command when running interactively (for example: if "%CI%"=="" pause). Locate the lone "pause" token in the batch script and wrap it with that conditional or an equivalent guard.
WFInfo/Data.cs (1)
975-978: Normalize the OCR input once before iterating localized market items.
`processor.NormalizeForPatternMatching(name)` is invariant inside the loop and can be computed once to reduce repeated work on this hot path.
♻️ Suggested refactor
```diff
-foreach (var marketItem in marketItems)
+string normalizedName = processor.NormalizeForPatternMatching(name);
+foreach (var marketItem in marketItems)
 {
     ...
-    string normalizedName = processor.NormalizeForPatternMatching(name);
     string normalizedStored = processor.NormalizeForPatternMatching(split[2]);
     int val = processor.SimpleLevenshteinDistance(normalizedName, normalizedStored);
     ...
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Data.cs` around lines 975 - 978, Compute normalized OCR input once before iterating localized market items instead of calling processor.NormalizeForPatternMatching(name) inside the loop; precompute a variable (e.g., normalizedName = processor.NormalizeForPatternMatching(name)) outside the loop that contains the comparison logic and then reuse that variable when calling processor.SimpleLevenshteinDistance(normalizedName, normalizedStored). Update the loop that currently computes normalizedName per-iteration (the block with processor.NormalizeForPatternMatching(split[2]) and processor.SimpleLevenshteinDistance) to only normalize the stored value per-item and reuse the precomputed normalizedName to avoid redundant work on this hot path.
WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs (1)
469-549: Avoid rebuilding and resorting the replacement map on every normalization call.
This method allocates a very large list and sorts it on each invocation; moving it to a static pre-sorted field will reduce allocation and CPU churn.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs` around lines 469 - 549, The replacement map is being rebuilt and resorted on every call (the local variable replacements and the foreach using replacements.OrderByDescending), causing allocations and CPU overhead; move that large list into a static, precomputed readonly field (e.g., a static readonly List<KeyValuePair<string,string>> PrecomputedReplacements) that is initialized once and sorted by descending Key.Length at type initialization, then change the method to iterate that static list (replace the use of replacements.OrderByDescending with PrecomputedReplacements) so normalization only does the Replace loop and no longer rebuilds or sorts the map per-call.
KoreanProcessorTest.cs (1)
9-63: Console output checks here are not automated tests.
These checks only print status and won't fail CI on regressions. Consider moving them into the existing test project with assertions so regressions are caught automatically.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@KoreanProcessorTest.cs` around lines 9 - 63, Current Main in KoreanProcessorTest.cs only prints results and won't fail CI; replace these manual console checks with automated unit tests in the test project that assert expected outcomes. Create test cases that call NormalizeKoreanCharacters (via public wrapper or make it internal and use InternalsVisibleTo), CalculateLevenshteinDistance, and DecomposeHangul and verify outputs with assertions (e.g., no exception for the duplicate-keys input, expected distances for Korean/Korean/Latin/mixed pairs, and correct decomposition for '가'); use the existing MockApplicationSettings to instantiate KoreanLanguageProcessor in test setup and add these tests to the test project's test runner so regressions fail CI.
WFInfo/LanguageProcessing/LanguageProcessor.cs (1)
254-258: Harden preprocessing against null `blueprintRemovals`/entries.
A null array or null element will throw in the shared base path.
🧰 Proposed hardening
```diff
-foreach (string removal in blueprintRemovals)
+foreach (string removal in blueprintRemovals ?? Array.Empty<string>())
 {
+    if (string.IsNullOrEmpty(removal)) continue;
     s = System.Text.RegularExpressions.Regex.Replace(s, System.Text.RegularExpressions.Regex.Escape(removal), "", System.Text.RegularExpressions.RegexOptions.IgnoreCase | System.Text.RegularExpressions.RegexOptions.CultureInvariant);
     t = System.Text.RegularExpressions.Regex.Replace(t, System.Text.RegularExpressions.Regex.Escape(removal), "", System.Text.RegularExpressions.RegexOptions.IgnoreCase | System.Text.RegularExpressions.RegexOptions.CultureInvariant);
 }
```
Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/LanguageProcessor.cs` around lines 254 - 258, The loop over blueprintRemovals can throw if the array or an entry is null; before iterating in the LanguageProcessor method, null-check blueprintRemovals (e.g., if null, treat as empty or return) and inside the foreach skip null/empty entries using string.IsNullOrEmpty(removal) to continue, then call System.Text.RegularExpressions.Regex.Escape and Regex.Replace only for non-null strings so the replacements on s and t are safe (update the foreach that references blueprintRemovals, s, t, and the Regex.Replace calls).
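For illustration, a Python sketch mirroring the hardened loop (function name is a stand-in): escape each removal so it is treated as a literal, skip null/empty entries, and tolerate a null collection.

```python
import re

def strip_removals(s, removals):
    """Null-safe, case-insensitive removal of literal substrings."""
    for removal in removals or ():
        if not removal:
            continue
        s = re.sub(re.escape(removal), "", s, flags=re.IGNORECASE)
    return s
```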
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@KoreanProcessorTest.cs`:
- Around line 199-223: In MockApplicationSettings remove the repeated duplicate
property declarations so each property name appears only once: keep a single
definition each for UseCustomRevertLayer, UseCustomUnsetLayer,
UseCustomInitialLayer, and UseCustomInheritLayer; delete the extra repeated
lines and ensure the remaining properties have correct accessibility and return
values consistent with the class/interface (e.g., implement the intended getter
logic) so the class compiles.
In `@tests/run_tests.bat`:
- Around line 34-35: The PowerShell-based timestamp assignment (the for /f
invoking powershell) can fail leaving TIMESTAMP empty and causing
test_results_.json; after that for /f line check ERRORLEVEL and whether
TIMESTAMP is defined (if not defined "%TIMESTAMP%" or use if not defined
TIMESTAMP) and assign a safe fallback (for example build a sanitized fallback
using %DATE% and %TIME% or a unique value using %RANDOM%) and use that ensured
TIMESTAMP for the echo and output filename; update the script to set TIMESTAMP
when the powershell command fails and ensure both places that reference
TIMESTAMP use the guaranteed value.
- Around line 8-10: The script checks for "map.json" and uses a relative
executable path which makes it CWD-dependent; change the script to resolve its
own directory first (e.g., via %~dp0) and use that base for all path checks and
invocations so the if not exist "map.json" check and the later relative
executable call use %~dp0\map.json and %~dp0\<exe> instead of the current
working directory; update the ERROR echo to reference the anchored path so
failures are accurate.
In `@WFInfo/errorDialogue.xaml.cs`:
- Around line 50-90: The loop that adds FileInfo items from files to the ZIP
root can collide with entries from priorityFiles and otherDataFiles because all
are added with an empty directory path; before the foreach over files, filter
out any FileInfo whose Name (or Path.GetFileName(file.FullName)) matches any
Path.GetFileName value in priorityFiles or otherDataFiles (or whose full path
equals those targets) so only unique root filenames are added; update the code
around the files foreach (referencing variables files, priorityFiles,
otherDataFiles and zip.AddFile) to perform that exclusion check prior to calling
zip.AddFile.
In `@WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs`:
- Line 21: The whitelist in the CharacterWhitelist property
(ChineseLanguageProcessor.CharacterWhitelist) stops at 0x9FAF and omits many
valid CJK ideographs; update the whitelist to include the full primary CJK block
(0x4E00–0x9FFF) and add other relevant ranges (CJK Unified Ideographs Extension
A 0x3400–0x4DBF, CJK Compatibility Ideographs 0xF900–0xFAFF and the
supplementary extensions 0x20000–0x2CEAF) or otherwise switch to a Unicode-aware
classification that supports supplementary planes; apply the same fix to the
other occurrence of the whitelist in this file (mentioned at the second
location) and ensure GenerateCharacterRange (or its caller) can represent
supplementary-plane characters (surrogate pairs) if needed.
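A Python sketch of a range check covering the blocks listed above, including the supplementary plane (the range table is taken from this comment; in Python, `ord()` sees full code points, so UTF-16 surrogate pairs are not an issue here, unlike in .NET `char`-based loops):

```python
CJK_RANGES = [
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),    # primary CJK Unified Ideographs block
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x20000, 0x2CEAF),  # supplementary-plane extensions
]

def is_cjk(ch):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)
```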
In `@WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs`:
- Line 94: The German CharacterWhitelist property currently only includes
uppercase umlauts (via GenerateCharacterRange(0x00C4,0x00C4), 0x00D6, 0x00DC)
and thus strips valid lowercase umlauts; update the CharacterWhitelist returned
by the CharacterWhitelist property to also include the lowercase umlauts by
adding GenerateCharacterRange(0x00E4,0x00E4),
GenerateCharacterRange(0x00F6,0x00F6) and GenerateCharacterRange(0x00FC,0x00FC)
alongside the existing ranges (keep GenerateCharacterRange(0x00DF,0x00DF) for ß)
so German OCR output retains ä, ö, ü.
In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs`:
- Around line 445-453: The similarity lookup in KoreanLanguageProcessor.cs
(inside the loop over Korean[groupType] where variables groupType,
similarityGroup, indexA, indexB are used) only checks similarityGroup.Value but
ignores similarityGroup.Key, so declared key pairs are missed; update the check
to also inspect the dictionary key (similarityGroup.Key) for the pair
(indexA,indexB) in either order—e.g., parse or compare the key representation
used for declared pairs and return the same similarity score when the key
matches, in addition to the existing check against similarityGroup.Value.
In `@WFInfo/Ocr.cs`:
- Around line 1015-1021: Before replacing the existing zones list in the
high-fragmentation branch, dispose the Bitmaps held in the current zones to
avoid leaking GDI resources: iterate the existing zones (the Tuple<Bitmap,
Rectangle> entries returned by DivideSnapZones) and call Dispose() on each
Tuple.Item1 (the Bitmap) before assigning a new List and setting snapThreads =
1; ensure this cleanup also handles null checks and swallows/disposes safely if
an exception occurs.
---
Outside diff comments:
In `@WFInfo/Ocr.cs`:
- Around line 266-270: The code can null-dereference when
Main.dataBase.marketData.GetValue(correctName) or GetValue(primeSetName) returns
null; update the WFInfo.Ocr logic around Data.GetSetName, the job and primeSet
variables to guard: after obtaining job and primeSet check for null (and also
check job["ducats"] exists) and skip/return when missing, and replace
int.Parse(ducats, Main.culture) with a safe parse (int.TryParse with
Main.culture) so missing or non-numeric ducats do not throw; ensure all
downstream uses of job/primeSet only run when those checks pass.
---
Duplicate comments:
In `@WFInfo/Data.cs`:
- Around line 102-103: The fallback file path currently uses the global
_settings.Locale (wfmItemsFallbackPath) causing GetWfmItemList(requestedLocale)
to share the same file for different requests; change the logic so the fallback
path is computed per-request using the requested locale parameter (e.g., derive
a locale-specific fallback path inside GetWfmItemList or a helper that accepts
the locale), ensure any references to wfmItemsFallbackPath use the per-request
value, and remove any implicit default-to-English behavior so a missing locale
fallback does not fall back to "en".
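The per-request scoping reduces to a one-line path builder; a Python sketch of the shape (names are illustrative):

```python
import os

def fallback_path(base_dir, locale):
    """Every locale gets its own fallback file, so an English refresh can
    never clobber localized names."""
    return os.path.join(base_dir, f"fallback_names.{locale}.json")
```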
In `@WFInfo/Ocr.cs`:
- Around line 704-708: The cutoff condition incorrectly uses
Math.Max(part.Name.Length, 6) (allowing ~100% distance) instead of a 50%
threshold; update the if condition that checks levenDist (and currently compares
against Math.Max(part.Name.Length, 6)) to compare against a half-length
threshold, e.g. replace Math.Max(part.Name.Length, 6) with
Math.Max(part.Name.Length / 2, 6) (or
Math.Max((int)Math.Ceiling(part.Name.Length / 2.0), 6) for rounding) so the
check becomes: if (levenDist == 9999 || levenDist > Math.Max(part.Name.Length /
2, 6) || string.IsNullOrEmpty(name)) to enforce the documented ~50% cutoff for
weak matches.
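The ~50% cutoff being requested can be sketched outside the C# codebase; here is a small Python illustration of the threshold logic (the helper itself is hypothetical, though the floor of 6 and the ceiling-rounded half-length mirror the comment):

```python
import math

def within_half_length_cutoff(leven_dist, name, floor=6, no_match=9999):
    """Accept a match only when the Levenshtein distance stays within
    ~50% of the name's length, with a small floor so short names are
    not rejected too aggressively."""
    if leven_dist == no_match or not name:
        return False
    threshold = max(math.ceil(len(name) / 2), floor)
    return leven_dist <= threshold
```

With `Math.Max(part.Name.Length, 6)` the old check permitted a distance equal to the full name length, i.e. an almost arbitrary match; the half-length threshold above is what enforces the documented cutoff.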
---
Nitpick comments:
In `@KoreanProcessorTest.cs`:
- Around line 9-63: Current Main in KoreanProcessorTest.cs only prints results
and won't fail CI; replace these manual console checks with automated unit tests
in the test project that assert expected outcomes. Create test cases that call
NormalizeKoreanCharacters (via public wrapper or make it internal and use
InternalsVisibleTo), CalculateLevenshteinDistance, and DecomposeHangul and
verify outputs with assertions (e.g., no exception for the duplicate-keys input,
expected distances for Korean/Korean/Latin/mixed pairs, and correct
decomposition for '가'); use the existing MockApplicationSettings to instantiate
KoreanLanguageProcessor in test setup and add these tests to the test project's
test runner so regressions fail CI.
In `@tests/run_tests.bat`:
- Line 55: Replace the unconditional "pause" with an interactive-only guard so
CI/non-interactive runs don't hang: detect an interactive environment (e.g.
check a CI env var like %CI% or %GITHUB_ACTIONS% or use "if exist CON" / checks)
and only call the "pause" command when running interactively (for example: if
"%CI%"=="" pause). Locate the lone "pause" token in the batch script and wrap it
with that conditional or an equivalent guard.
In `@WFInfo/Data.cs`:
- Around line 975-978: Compute normalized OCR input once before iterating
localized market items instead of calling
processor.NormalizeForPatternMatching(name) inside the loop; precompute a
variable (e.g., normalizedName = processor.NormalizeForPatternMatching(name))
outside the loop that contains the comparison logic and then reuse that variable
when calling processor.SimpleLevenshteinDistance(normalizedName,
normalizedStored). Update the loop that currently computes normalizedName
per-iteration (the block with processor.NormalizeForPatternMatching(split[2])
and processor.SimpleLevenshteinDistance) to only normalize the stored value
per-item and reuse the precomputed normalizedName to avoid redundant work on
this hot path.
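The loop-invariant hoisting this nitpick asks for looks like the following in a minimal Python sketch (the `normalize` and `distance` callables stand in for the C# processor methods):

```python
def best_match(name, stored_items, normalize, distance):
    """Normalize the OCR input once, then compare it against each stored
    item's normalized form; `name` is no longer re-normalized per
    iteration of the hot loop."""
    normalized_name = normalize(name)  # hoisted out of the loop
    best, best_dist = None, float("inf")
    for item in stored_items:
        d = distance(normalized_name, normalize(item))
        if d < best_dist:
            best, best_dist = item, d
    return best
```

Only the per-item value still needs normalizing inside the loop; the input side is computed exactly once.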
In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs`:
- Around line 469-549: The replacement map is being rebuilt and resorted on
every call (the local variable replacements and the foreach using
replacements.OrderByDescending), causing allocations and CPU overhead; move that
large list into a static, precomputed readonly field (e.g., a static readonly
List<KeyValuePair<string,string>> PrecomputedReplacements) that is initialized
once and sorted by descending Key.Length at type initialization, then change the
method to iterate that static list (replace the use of
replacements.OrderByDescending with PrecomputedReplacements) so normalization
only does the Replace loop and no longer rebuilds or sorts the map per-call.
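The static-precomputation pattern translates directly; here is a Python sketch where the table is built and sorted once at import time (the replacement entries are illustrative stand-ins, not the real Korean map):

```python
# Built and sorted once at module load, not per call: longest keys first
# so longer OCR confusions are replaced before their substrings.
_RAW_REPLACEMENTS = {"rn": "m", "vv": "w", "l": "i"}  # illustrative only
PRECOMPUTED_REPLACEMENTS = sorted(
    _RAW_REPLACEMENTS.items(), key=lambda kv: len(kv[0]), reverse=True
)

def normalize(text):
    """Apply the precomputed replacement table; the per-call work is
    reduced to the replace loop, with no rebuilding or resorting."""
    for src, dst in PRECOMPUTED_REPLACEMENTS:
        text = text.replace(src, dst)
    return text
```

The C# equivalent is a `static readonly` list initialized in a type initializer, as the comment suggests.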
In `@WFInfo/LanguageProcessing/LanguageProcessor.cs`:
- Around line 254-258: The loop over blueprintRemovals can throw if the array or
an entry is null; before iterating in the LanguageProcessor method, null-check
blueprintRemovals (e.g., if null, treat as empty or return) and inside the
foreach skip null/empty entries using string.IsNullOrEmpty(removal) to continue,
then call System.Text.RegularExpressions.Regex.Escape and Regex.Replace only for
non-null strings so the replacements on s and t are safe (update the foreach
that references blueprintRemovals, s, t, and the Regex.Replace calls).
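The null-safe iteration with escaped terms can be sketched as follows (a Python stand-in for the C# `Regex.Escape`/`Regex.Replace` calls):

```python
import re

def strip_removals(text, removals):
    """Strip each removal term from text, treating a missing list as
    empty and skipping blank entries; terms are escaped so regex
    metacharacters in them match literally."""
    for removal in removals or ():   # None behaves like an empty list
        if not removal:              # skip None/empty entries
            continue
        text = re.sub(re.escape(removal), "", text)
    return text
```

Escaping matters because a removal term containing `+`, `(`, or similar would otherwise be interpreted as a regex pattern.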
In `@WFInfo/SnapItOverlay.xaml.cs`:
- Around line 46-59: ResetRectangle currently removes the rectangle from the
canvas; instead keep the rectangle as a persistent child and only reset its
properties and hide it: leave rectangle.Width/Height, RenderTransform and
Visibility resets as-is but remove the conditional
canvas.Children.Remove(rectangle) call; then remove the re-add logic in
canvas_MouseDown (the block that checks/executes canvas.Children.Add(rectangle))
since the rectangle will always remain in the canvas, ensuring you only toggle
Visibility and update transforms when showing it.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
- KoreanProcessorTest.cs
- WFInfo/CustomEntrypoint.cs
- WFInfo/Data.cs
- WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs
- WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs
- WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs
- WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs
- WFInfo/LanguageProcessing/LanguageProcessor.cs
- WFInfo/LanguageProcessing/PolishLanguageProcessor.cs
- WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs
- WFInfo/Ocr.cs
- WFInfo/SnapItOverlay.xaml.cs
- WFInfo/Tests/KoreanProcessorTests.cs
- WFInfo/errorDialogue.xaml.cs
- tests/README.md
- tests/run_tests.bat
- tests/usage_example.md
✅ Files skipped from review due to trivial changes (1)
- WFInfo/Tests/KoreanProcessorTests.cs
🚧 Files skipped from review as they are similar to previous changes (4)
- WFInfo/CustomEntrypoint.cs
- WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs
- tests/usage_example.md
- tests/README.md
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
WFInfo/Ocr.cs (1)
1010-1011: ⚠️ Potential issue | 🟡 Minor

`numberTooFewCharacters` is now dead and skews noise diagnostics.

After the refactor, filtered lines no longer increment this counter, so the noise heuristic at Line [1210] is no longer meaningful.

Suggested fix

```diff
 if (filteredWords.Count == 0)
 {
+    numberTooFewCharacters++;
     continue;
 }
```

Also applies to: 1105-1108, 1210-1214
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Ocr.cs` around lines 1010 - 1011, The variable numberTooFewCharacters is no longer being incremented after the refactor, which breaks the noise heuristic that uses it alongside numberTooLargeButEnoughCharacters; fix by either removing numberTooFewCharacters and updating the heuristic to only use numberTooLargeButEnoughCharacters, or (preferred) restore incrementing numberTooFewCharacters in the same places where lines are filtered for having too few characters so the noise-count logic (the heuristic that references numberTooFewCharacters and numberTooLargeButEnoughCharacters) receives correct values; update all related spots that previously incremented it (see the filtering logic that now bypasses those increments) and adjust the heuristic computation accordingly.
♻️ Duplicate comments (3)
tests/run_tests.bat (1)
37-45: ⚠️ Potential issue | 🟡 Minor

Initialize `TIMESTAMP` before the PowerShell call to avoid a stale inherited value.

If the environment already defines `TIMESTAMP`, Line [40] won't trigger the fallback on PowerShell failure, and the script can reuse an old filename suffix.

🛠️ Proposed fix

```diff
 REM Generate locale-safe timestamp with fallback
+set "TIMESTAMP="
 for /f "usebackq delims=" %%T in (`powershell -NoProfile -Command "Get-Date -Format 'yyyyMMdd_HHmmss'" 2^>nul`) do set TIMESTAMP=%%T
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/run_tests.bat` around lines 37 - 45, Initialize the TIMESTAMP variable to an empty value before running the PowerShell for /f call so an inherited TIMESTAMP won't prevent the fallback branch from executing; specifically, add a statement to clear TIMESTAMP prior to the for /f `powershell -NoProfile -Command "Get-Date ..."` invocation so the subsequent check if "%TIMESTAMP%"=="" properly detects PowerShell failure and triggers the fallback logic that constructs TIMESTAMP from %DATE% and %TIME%.

WFInfo/errorDialogue.xaml.cs (1)
70-77: ⚠️ Potential issue | 🟡 Minor

Use case-insensitive dedupe for ZIP root filenames.

The collision filter can miss case-variant duplicates because `ToHashSet()` defaults to case-sensitive comparisons. On Windows, that can still cause root-entry conflicts during extraction.

Suggested fix

```diff
-var priorityFileNames = priorityFiles.Select(Path.GetFileName).ToHashSet();
-var otherDataFileNames = otherDataFiles.Select(Path.GetFileName).ToHashSet();
+var priorityFileNames = priorityFiles
+    .Select(Path.GetFileName)
+    .ToHashSet(StringComparer.OrdinalIgnoreCase);
+var otherDataFileNames = otherDataFiles
+    .Select(Path.GetFileName)
+    .ToHashSet(StringComparer.OrdinalIgnoreCase);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/errorDialogue.xaml.cs` around lines 70 - 77, The current dedupe uses ToHashSet() which is case-sensitive and can miss case-variant ZIP root filenames; change the creation of priorityFileNames and otherDataFileNames to use a case-insensitive comparer (e.g. StringComparer.OrdinalIgnoreCase) when calling ToHashSet so the subsequent loop that checks priorityFileNames.Contains(fileName) and otherDataFileNames.Contains(fileName) correctly treats "File.txt" and "file.txt" as duplicates; update the code that builds those sets (symbols: priorityFileNames, otherDataFileNames, files, FileInfo, Path.GetFileName) to use the case-insensitive HashSet.

WFInfo/Ocr.cs (1)
1022-1053: ⚠️ Potential issue | 🟠 Major

Zone bitmap cleanup is partial; the regular multi-zone path still leaks.

Only the fragmentation fallback disposes zone bitmaps. In normal multi-zone execution, cloned zone bitmaps are never released.

Suggested fix direction

```diff
 Task.WaitAll(snapTasks);
+foreach (var zone in zones)
+{
+    // Keep shared buffer disposal owned by existing end-of-method cleanup.
+    if (!ReferenceEquals(zone.Item1, filteredImageClean))
+    {
+        zone.Item1?.Dispose();
+    }
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Ocr.cs` around lines 1022 - 1053, The code currently only disposes zone bitmaps in the "fragmentation fallback" branch, causing cloned bitmaps from DivideSnapZones to leak in the normal multi-zone path; update the handling around DivideSnapZones and the zones list so that every Bitmap stored as zone.Item1 is explicitly disposed after its use (for example, after per-zone OCR processing or right before replacing the zones list), and ensure you do not dispose shared originals like filteredImageClean unintentionally; modify the routines that iterate/process zones (the code using zones returned by DivideSnapZones and the block that sets snapThreads) to call zone.Item1?.Dispose() in a finally/cleanup section so all paths release cloned Bitmaps.
🧹 Nitpick comments (4)
WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs (1)
60-76: Remove unreachable uppercase replacement mappings.

After `ToLowerInvariant()`, the uppercase `.Replace(...)` calls never execute meaningfully. This is safe to simplify.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs` around lines 60 - 76, The code in EuropeanLanguageProcessor (the chain starting with input.ToLowerInvariant()) performs ToLowerInvariant() then applies uppercase replacement mappings which are unreachable; remove the redundant uppercase .Replace calls (those replacing 'À','Á',... 'Ÿ') from the normalization chain and keep only the lowercase diacritic replacements (and the final semicolon/return) so Normalize/that method no longer contains unnecessary uppercase mappings.

WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs (1)
360-363: `validKoreanFragments` contains unreachable entries.

The filter only consults the list for single-character words, so multi-character fragments in the array are never used.
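The suggested predicate change can be sketched in Python with placeholder fragments (the real `validKoreanFragments` contents live in the C# source):

```python
VALID_FRAGMENTS = ["설계", "프라임"]  # placeholder entries for the sketch

def is_valid_fragment_word(word):
    """Consult the fragment list for words of any length, so
    multi-character fragments are actually reachable, instead of
    checking membership only when len(word) == 1."""
    return any(fragment in word for fragment in VALID_FRAGMENTS)
```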
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs` around lines 360 - 363, validKoreanFragments currently contains multi-character strings but is only checked when word.Length == 1, making those entries unreachable; either reduce validKoreanFragments to only single-character fragments or change the predicate to consult fragments for multi-character words (e.g., replace the (word.Length == 1 && !validKoreanFragments.Contains(word)) check with a check that uses validKoreanFragments.Any(f => word.Contains(f)) or similar) so that the multi-character entries in validKoreanFragments are actually used; update the logic around the word variable and validKoreanFragments accordingly.

WFInfo/Data.cs (2)
962-1000: Consider lock duration for performance-sensitive scenarios.

The `marketItemsLock` is held while iterating through `marketItems` and computing Levenshtein distances (lines 963-1000). For large item databases, this could block other threads attempting to access `marketItems`.

If OCR processing becomes a bottleneck, consider copying the relevant data outside the lock and performing distance calculations on the copy.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Data.cs` around lines 962 - 1000, Hold the lock only to snapshot the minimal data needed and do the heavy Levenshtein work outside it: inside the marketItemsLock (around marketItems) build a local list of the relevant fields (e.g., a list of tuples of split[0], split[2] for each marketItem where key != "version" and split.Length >= 3) and capture the LanguageProcessorFactory.GetCurrentProcessor() reference if required, then release the lock and iterate that local copy calling NormalizeForPatternMatching and SimpleLevenshteinDistance to compute val and update low, lowest, lowest_unfiltered and multipleLowest; ensure you apply the same pre-filtering logic (lengthDiff and distance < 50% of string length) on the snapshot so behavior remains identical.
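The snapshot-then-compute pattern recommended above can be sketched as follows in Python (the lock, map contents, and `distance` callable are stand-ins for the C# symbols named in the prompt):

```python
import threading

market_items_lock = threading.Lock()
market_items = {"item_a": "item_a|url|이템A", "item_b": "item_b|url|이템B"}

def closest_item(name, distance):
    # Hold the lock only long enough to copy the fields we need...
    with market_items_lock:
        snapshot = [v.split("|") for k, v in market_items.items()
                    if k != "version"]
    # ...then do the expensive distance work without blocking writers.
    candidates = [s for s in snapshot if len(s) >= 3]
    return min(candidates, key=lambda s: distance(name, s[2]), default=None)
```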
958-1001: Inconsistent filtering logic between English and non-English branches.

The non-English branch uses different filtering criteria than the English branch:
- Length filter (line 978): `lengthDiff > split[2].Length / 2` uses integer division and is based only on the stored string length
- English branch (line 1008): `lengthDiff > Math.Max(prop.Key.Length, name.Length) / 2` uses the maximum of both lengths

This inconsistency could cause different matching behavior depending on locale. For short localized names (e.g., 3 characters), integer division yields a threshold of 1, which may be too restrictive.

♻️ Suggested alignment

```diff
-// Pre-filter: only check items with reasonable length difference (matching GetLocalizedNameData logic)
-int lengthDiff = Math.Abs(split[2].Length - name.Length);
-if (lengthDiff > split[2].Length / 2) continue;
+// Pre-filter: only check items with reasonable length difference
+int lengthDiff = Math.Abs(split[2].Length - name.Length);
+if (lengthDiff > Math.Max(split[2].Length, name.Length) / 2) continue;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Data.cs` around lines 958 - 1001, The non-English branch uses an inconsistent, overly strict length filter (int division against split[2].Length) — update the lengthDiff check inside the marketItems loop to mirror the English logic by comparing against Math.Max(englishNameLength, name.Length)/2; specifically compute englishNameLength from split[0].Length and use Math.Max(split[0].Length, name.Length) / 2.0 (or equivalent) as the threshold so the length filter and division semantics match the English branch; keep the rest of the loop (NormalizeForPatternMatching, SimpleLevenshteinDistance, low/lowest handling) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/run_tests.bat`:
- Around line 23-29: The TEST_IMAGES_DIR default and existence check are using
the current working directory; change the script to resolve TEST_IMAGES_DIR
relative to the script directory (%~dp0) so relative paths work when run from
elsewhere. Specifically, when TEST_IMAGES_DIR is empty, set it to the
script-dir-relative path for "data" (use the run_tests.bat script's directory
variable) and if TEST_IMAGES_DIR is provided as a relative path, prefix it with
the script directory before the existence check; keep the existence validation
using TEST_IMAGES_DIR so the subsequent echo/exit logic remains the same.
In `@WFInfo/Data.cs`:
- Around line 238-240: ReloadItems currently calls SaveAllJSONs() after updating
only marketItems, which risks overwriting other databases (equipmentData,
relicData, nameData, marketData) with null or stale state; change ReloadItems to
persist only the updated marketItems (e.g., call a new or existing
SaveMarketItems-like routine or modify SaveAllJSONs to accept a target list)
instead of SaveAllJSONs(), and ensure this call occurs after marketItems is
fully updated (and do not touch equipmentData/relicData/nameData/marketData
here); reference ReloadItems, SaveAllJSONs, marketItems and UpdateInner when
making the change.
- Line 982: The code uses processor.SimpleLevenshteinDistance(normalizedName,
normalizedStored) for non-English branches which is inconsistent with the
English branch that uses processor.CalculateLevenshteinDistance; replace the
SimpleLevenshteinDistance call with processor.CalculateLevenshteinDistance so
the same language-aware algorithm and preprocessing are used (call
CalculateLevenshteinDistance with the same inputs used in this
branch—normalizedName/normalizedStored or the raw name/stored variables to match
the English-branch usage).
In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs`:
- Around line 317-318: The global replacement entry {"버", ""} in
KoreanLanguageProcessor should not remove the syllable inside valid names;
change the logic so "버" is stripped only when it appears as a standalone OCR
artifact (e.g., an entire token) or matches an OCR pattern, not when embedded in
a larger string. Locate the replacement map/logic in KoreanLanguageProcessor
(the {"버",""} entry) and replace the unconditional replace with a conditional
rule: either apply a word-boundary/regex removal (e.g., \b버\b) or split on
tokens and drop tokens equal to "버" (and mirror this change for the similar
entries referenced at the other occurrence), so embedded syllables in real part
names are preserved.
- Around line 126-127: The Replace calls on s and t can throw if either is null;
update the code in KoreanLanguageProcessor (where s and t are processed) to
guard against null by using a null-coalescing or explicit check before calling
Replace (e.g. replace s and t with (s ?? "") and (t ?? "") or return/handle null
inputs earlier), then proceed with the existing Replace("설계도","").Replace("
","") and prefixing the leading space so the distance calculation won't crash on
null inputs.
In `@WFInfo/LanguageProcessing/LanguageProcessor.cs`:
- Around line 103-115: IsBlueprintTerm currently does case-sensitive checks and
misses mixed/upper-case OCR variants; update IsBlueprintTerm to perform
case-insensitive matching by normalizing text and each BlueprintRemovals entry
(e.g., use ToLowerInvariant or IndexOf with StringComparison.OrdinalIgnoreCase)
and then check the same patterns (contains, parentheses, starts/ends) against
the normalized text and removal; ensure you reference the BlueprintRemovals
collection and all the existing checks (Contains, StartsWith, EndsWith) so they
use the case-insensitive comparison method.
In `@WFInfo/Ocr.cs`:
- Around line 1096-1099: Null check the processor returned from
LanguageProcessorFactory.GetCurrentProcessor() before calling
processor.ShouldFilterWord(word) in the SnapIt word filtering loop; if
GetCurrentProcessor() returns null, skip filtering or use a safe fallback (e.g.,
treat all words as not filtered) so the OCR flow does not throw. Update the loop
that iterates over words to obtain the processor once, verify processor != null,
and only call ShouldFilterWord when non-null (or replace the call with a
null-safe alternative) to prevent runtime exceptions.
---
Outside diff comments:
In `@WFInfo/Ocr.cs`:
- Around line 1010-1011: The variable numberTooFewCharacters is no longer being
incremented after the refactor, which breaks the noise heuristic that uses it
alongside numberTooLargeButEnoughCharacters; fix by either removing
numberTooFewCharacters and updating the heuristic to only use
numberTooLargeButEnoughCharacters, or (preferred) restore incrementing
numberTooFewCharacters in the same places where lines are filtered for having
too few characters so the noise-count logic (the heuristic that references
numberTooFewCharacters and numberTooLargeButEnoughCharacters) receives correct
values; update all related spots that previously incremented it (see the
filtering logic that now bypasses those increments) and adjust the heuristic
computation accordingly.
---
Duplicate comments:
In `@tests/run_tests.bat`:
- Around line 37-45: Initialize the TIMESTAMP variable to an empty value before
running the PowerShell for /f call so an inherited TIMESTAMP won't prevent the
fallback branch from executing; specifically, add a statement to clear TIMESTAMP
prior to the for /f `powershell -NoProfile -Command "Get-Date ..."` invocation
so the subsequent check if "%TIMESTAMP%"=="" properly detects PowerShell failure
and triggers the fallback logic that constructs TIMESTAMP from %DATE% and
%TIME%.
In `@WFInfo/errorDialogue.xaml.cs`:
- Around line 70-77: The current dedupe uses ToHashSet() which is case-sensitive
and can miss case-variant ZIP root filenames; change the creation of
priorityFileNames and otherDataFileNames to use a case-insensitive comparer
(e.g. StringComparer.OrdinalIgnoreCase) when calling ToHashSet so the subsequent
loop that checks priorityFileNames.Contains(fileName) and
otherDataFileNames.Contains(fileName) correctly treats "File.txt" and "file.txt"
as duplicates; update the code that builds those sets (symbols:
priorityFileNames, otherDataFileNames, files, FileInfo, Path.GetFileName) to use
the case-insensitive HashSet.
In `@WFInfo/Ocr.cs`:
- Around line 1022-1053: The code currently only disposes zone bitmaps in the
"fragmentation fallback" branch, causing cloned bitmaps from DivideSnapZones to
leak in the normal multi-zone path; update the handling around DivideSnapZones
and the zones list so that every Bitmap stored as zone.Item1 is explicitly
disposed after its use (for example, after per-zone OCR processing or right
before replacing the zones list), and ensure you do not dispose shared originals
like filteredImageClean unintentionally; modify the routines that
iterate/process zones (the code using zones returned by DivideSnapZones and the
block that sets snapThreads) to call zone.Item1?.Dispose() in a finally/cleanup
section so all paths release cloned Bitmaps.
---
Nitpick comments:
In `@WFInfo/Data.cs`:
- Around line 962-1000: Hold the lock only to snapshot the minimal data needed
and do the heavy Levenshtein work outside it: inside the marketItemsLock (around
marketItems) build a local list of the relevant fields (e.g., a list of tuples
of split[0], split[2] for each marketItem where key != "version" and
split.Length >= 3) and capture the
LanguageProcessorFactory.GetCurrentProcessor() reference if required, then
release the lock and iterate that local copy calling NormalizeForPatternMatching
and SimpleLevenshteinDistance to compute val and update low, lowest,
lowest_unfiltered and multipleLowest; ensure you apply the same pre-filtering
logic (lengthDiff and distance < 50% of string length) on the snapshot so
behavior remains identical.
- Around line 958-1001: The non-English branch uses an inconsistent, overly
strict length filter (int division against split[2].Length) — update the
lengthDiff check inside the marketItems loop to mirror the English logic by
comparing against Math.Max(englishNameLength, name.Length)/2; specifically
compute englishNameLength from split[0].Length and use Math.Max(split[0].Length,
name.Length) / 2.0 (or equivalent) as the threshold so the length filter and
division semantics match the English branch; keep the rest of the loop
(NormalizeForPatternMatching, SimpleLevenshteinDistance, low/lowest handling)
unchanged.
In `@WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs`:
- Around line 60-76: The code in EuropeanLanguageProcessor (the chain starting
with input.ToLowerInvariant()) performs ToLowerInvariant() then applies
uppercase replacement mappings which are unreachable; remove the redundant
uppercase .Replace calls (those replacing 'À','Á',... 'Ÿ') from the
normalization chain and keep only the lowercase diacritic replacements (and the
final semicolon/return) so Normalize/that method no longer contains unnecessary
uppercase mappings.
In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs`:
- Around line 360-363: validKoreanFragments currently contains multi-character
strings but is only checked when word.Length == 1, making those entries
unreachable; either reduce validKoreanFragments to only single-character
fragments or change the predicate to consult fragments for multi-character words
(e.g., replace the (word.Length == 1 && !validKoreanFragments.Contains(word))
check with a check that uses validKoreanFragments.Any(f => word.Contains(f)) or
similar) so that the multi-character entries in validKoreanFragments are
actually used; update the logic around the word variable and
validKoreanFragments accordingly.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- WFInfo/Data.cs
- WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs
- WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs
- WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs
- WFInfo/LanguageProcessing/LanguageProcessor.cs
- WFInfo/Ocr.cs
- WFInfo/SnapItOverlay.xaml.cs
- WFInfo/errorDialogue.xaml.cs
- tests/run_tests.bat
🚧 Files skipped from review as they are similar to previous changes (1)
- WFInfo/SnapItOverlay.xaml.cs
Actionable comments posted: 3
♻️ Duplicate comments (1)
WFInfo/Ocr.cs (1)
1112-1122: ⚠️ Potential issue | 🟠 Major

Inverted null-check logic causes all words to be filtered when processor is null.

The condition `processor != null && !processor.ShouldFilterWord(word)` filters out all words when `processor` is null, because the condition is never true. This is the opposite of the intended safe fallback behavior. When `processor` is null, words should be kept (not filtered), since we can't determine which to filter.

🐛 Proposed fix

```diff
 // Filter individual words as intended
 foreach (var word in words)
 {
-    if (processor != null && !processor.ShouldFilterWord(word))
+    if (processor == null || !processor.ShouldFilterWord(word))
     {
         filteredWords.Add(word);
     }
-    else if (word.Length <= 3)
+    else if (processor != null && word.Length <= 3)
     {
         numberTooFewCharacters++;
     }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Ocr.cs` around lines 1112 - 1122, The null-check on processor in the foreach over words is inverted causing all words to be dropped when processor is null; change the condition in the loop that currently reads "processor != null && !processor.ShouldFilterWord(word)" so that when processor is null words are treated as not filtered (i.e., keep the word), by either checking "processor == null || !processor.ShouldFilterWord(word)" before adding to filteredWords; update the block that increments numberTooFewCharacters (and preserve the existing short-word logic) so the behavior of filteredWords, numberTooFewCharacters, and the use of ShouldFilterWord are correct for both null and non-null processor cases.
🧹 Nitpick comments (2)
WFInfo/LanguageProcessing/LanguageProcessor.cs (1)
234-264: Consider consolidating duplicate Levenshtein implementations.

`SimpleLevenshteinDistance` and `DefaultLevenshteinDistance` (lines 199-229) have identical implementations. Consider having one delegate to the other or extracting a shared private method to reduce code duplication.

♻️ Proposed refactor

```diff
 public int SimpleLevenshteinDistance(string s, string t)
 {
-    s = s.ToLower(_culture);
-    t = t.ToLower(_culture);
-    int n = s.Length;
-    int m = t.Length;
-
-    if (n == 0) return m;
-    if (m == 0) return n;
-
-    int[,] d = new int[n + 1, m + 1];
-
-    for (int i = 0; i <= n; i++)
-        d[i, 0] = i;
-
-    for (int j = 0; j <= m; j++)
-        d[0, j] = j;
-
-    for (int i = 1; i <= n; i++)
-    {
-        for (int j = 1; j <= m; j++)
-        {
-            int cost = (s[i - 1] == t[j - 1]) ? 0 : 1;
-            d[i, j] = Math.Min(
-                Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
-                d[i - 1, j - 1] + cost);
-        }
-    }
-
-    return d[n, m];
+    return DefaultLevenshteinDistance(s, t);
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/LanguageProcessor.cs` around lines 234 - 264, SimpleLevenshteinDistance duplicates DefaultLevenshteinDistance; remove the duplication by extracting the shared logic into a single private helper (e.g., ComputeLevenshtein or LevenshteinCore) or by having one method call the other (e.g., DefaultLevenshteinDistance calls SimpleLevenshteinDistance or vice versa). Ensure the new helper accepts the same parameters (string s, string t) and uses the existing _culture normalization, then update both SimpleLevenshteinDistance and DefaultLevenshteinDistance to delegate to that helper so behavior remains identical and no business logic is changed.

WFInfo/Ocr.cs (1)
711-721: Inconsistent Levenshtein distance threshold between methods.

The floor value differs between `ProcessRewardScreen` (Line 260: floor of 3) and `ProcessSnapIt` (Line 715: floor of 6). For a 6-character string:
- Reward screen: allows distance ≤ 3
- SnapIt: allows distance ≤ 6
This makes SnapIt significantly more permissive for short part names. Please verify this is intentional.
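A shared helper is one way to keep the two thresholds consistent; here is a Python sketch of that idea (the helper names are invented, and the floor of 3 follows the reward-screen value):

```python
def max_allowed_leven(name_length, floor=3):
    """Single source of truth for the Levenshtein acceptance cutoff:
    half the name length, never below `floor`."""
    return max(name_length // 2, floor)

def accepts(leven_dist, name, no_match=9999):
    """Both OCR paths would call this instead of inlining their own
    Math.Max(...) expressions with different floors."""
    return leven_dist != no_match and leven_dist <= max_allowed_leven(len(name))
```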
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Ocr.cs` around lines 711 - 721, The Levenshtein distance floor is inconsistent between ProcessRewardScreen and ProcessSnapIt causing different acceptance for short names; locate the check using levenDist (in ProcessSnapIt) and the analogous check in ProcessRewardScreen and make their floor logic consistent—either by using the same Math.Max(part.Name.Length / 2, 3) (or whichever minimum you choose) in both methods or by extracting a shared helper/constant (e.g., GetMaxAllowedLevenshtein or MaxLevenThreshold) and use it when comparing levenDist and 9999 so both methods apply the same minimum floor for short part names.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs`:
- Line 446: The mapping entry new KeyValuePair<string, string>("류", "hyu") is
incorrect; update the romanization so "류" maps to "ryu" (replace "hyu" with
"ryu" for the ("류", ...) pair) and verify the "휴" entry maps to "hyu" so "hyu"
is used for ("휴", ...); adjust the two KeyValuePair entries accordingly to
correct the romanization.
In `@WFInfo/LanguageProcessing/LanguageProcessor.cs`:
- Around line 18-43: The static field normalizedBlueprintRemovals is shared
across all concrete LanguageProcessor types causing cross-language corruption;
change the caching to be per-concrete-type (e.g., replace the static string[]
normalizedBlueprintRemovals with a static ConcurrentDictionary<Type,string[]> or
ConditionalWeakTable<Type,string[]>) and initialize the entry keyed by GetType()
inside the protected LanguageProcessor constructor using that key; ensure you
populate the per-type normalized array from the instance's BlueprintRemovals
(using ToLowerInvariant) and then read from the per-type cache wherever
normalizedBlueprintRemovals was previously used so each concrete processor
(different BlueprintRemovals) gets its own normalized copy.
In `@WFInfo/Ocr.cs`:
- Around line 1084-1095: The disposal loop currently disposes every zone bitmap
including filteredImageClean, causing a use-after-dispose in GetItemCounts() and
a double-dispose later; modify the logic in the foreach over zones so it does
not dispose filteredImageClean (either by checking reference equality to
filteredImageClean and skipping disposal, or by ensuring filteredImageClean is
removed from zones before this loop), and keep the final disposal of
filteredImageClean at its intended single disposal site (the existing disposal
at Line 1222) so GetItemCounts() can safely use filteredImageClean; references
to update: the zones collection, the foreach disposal block, the
filteredImageClean variable, and GetItemCounts().
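The per-concrete-type caching described in the `LanguageProcessor.cs` item above is easier to see in a minimal sketch; here in Python, with a plain dict keyed by type standing in for `ConcurrentDictionary<Type, string[]>` (class and field names are invented for illustration):

```python
class LanguageProcessor:
    # One normalized copy per concrete subclass, keyed by type, so processors
    # with different blueprint_removals never overwrite each other.
    _normalized_cache = {}

    blueprint_removals = ()

    def __init__(self):
        cls = type(self)
        if cls not in LanguageProcessor._normalized_cache:
            LanguageProcessor._normalized_cache[cls] = tuple(
                term.lower() for term in cls.blueprint_removals)

    @property
    def normalized_blueprint_removals(self):
        return LanguageProcessor._normalized_cache[type(self)]

class EnglishProcessor(LanguageProcessor):
    blueprint_removals = ("Blueprint",)

class KoreanProcessor(LanguageProcessor):
    blueprint_removals = ("설계도",)
```

In C# the same shape (cache keyed by `GetType()`, populated in the protected constructor) avoids the cross-language corruption the comment describes.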
---
Duplicate comments:
In `@WFInfo/Ocr.cs`:
- Around line 1112-1122: The null-check on processor in the foreach over words
is inverted causing all words to be dropped when processor is null; change the
condition in the loop that currently reads "processor != null &&
!processor.ShouldFilterWord(word)" so that when processor is null words are
treated as not filtered (i.e., keep the word), by either checking "processor ==
null || !processor.ShouldFilterWord(word)" before adding to filteredWords;
update the block that increments numberTooFewCharacters (and preserve the
existing short-word logic) so the behavior of filteredWords,
numberTooFewCharacters, and the use of ShouldFilterWord are correct for both
null and non-null processor cases.
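The corrected null-handling reads most clearly as a single predicate; a Python sketch of the intended truth table (`keep_word` and the minimum length of 3 are illustrative stand-ins):

```python
def keep_word(processor, word, min_len=3):
    # Short words are dropped regardless of processor (the existing
    # numberTooFewCharacters path); otherwise a missing processor means
    # "no filtering", so the word is kept.
    if len(word) < min_len:
        return False
    return processor is None or not processor.should_filter_word(word)

class RejectEverything:
    """Stand-in processor whose filter drops every word."""
    def should_filter_word(self, word):
        return True
```

The key change mirrors the comment: `processor is None or not filter(word)` keeps words when no processor exists, where the inverted `processor is not None and not filter(word)` would drop them all.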
---
Nitpick comments:
In `@WFInfo/LanguageProcessing/LanguageProcessor.cs`:
- Around line 234-264: SimpleLevenshteinDistance duplicates
DefaultLevenshteinDistance; remove the duplication by extracting the shared
logic into a single private helper (e.g., ComputeLevenshtein or LevenshteinCore)
or by having one method call the other (e.g., DefaultLevenshteinDistance calls
SimpleLevenshteinDistance or vice versa). Ensure the new helper accepts the same
parameters (string s, string t) and uses the existing _culture normalization,
then update both SimpleLevenshteinDistance and DefaultLevenshteinDistance to
delegate to that helper so behavior remains identical and no business logic is
changed.
In `@WFInfo/Ocr.cs`:
- Around line 711-721: The Levenshtein distance floor is inconsistent between
ProcessRewardScreen and ProcessSnapIt causing different acceptance for short
names; locate the check using levenDist (in ProcessSnapIt) and the analogous
check in ProcessRewardScreen and make their floor logic consistent—either by
using the same Math.Max(part.Name.Length / 2, 3) (or whichever minimum you
choose) in both methods or by extracting a shared helper/constant (e.g.,
GetMaxAllowedLevenshtein or MaxLevenThreshold) and use it when comparing
levenDist and 9999 so both methods apply the same minimum floor for short part
names.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- WFInfo/Data.cs
- WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs
- WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs
- WFInfo/LanguageProcessing/LanguageProcessor.cs
- WFInfo/Ocr.cs
- WFInfo/errorDialogue.xaml.cs
- tests/run_tests.bat
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/run_tests.bat
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
WFInfo/Data.cs (1)
1063-1084: ⚠️ Potential issue | 🟡 Minor

Potential null reference if `GetLocaleNameData` returns null.

For non-English locales, `GetLocaleNameData(name, false)` could return null if no match is found. This would cause a `NullReferenceException` when `LevenshteinDistance` is called at lines 1071 and 1084.

🛡️ Proposed fix to handle null resolvedName

```diff
 // Resolve OCR text to English once before loops to avoid repeated expensive database searches
 // Only resolve for non-English locales to avoid regression in English
 string resolvedName = _settings.Locale == "en" ? name : GetLocaleNameData(name, false);
+// Fall back to original name if resolution failed
+if (string.IsNullOrEmpty(resolvedName))
+{
+    resolvedName = name;
+}
 foreach (KeyValuePair<string, JToken> prop in nameData)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Data.cs` around lines 1063 - 1084, GetLocaleNameData(name, false) can return null for non-English locales, causing LevenshteinDistance to throw when passed resolvedName; update the logic around resolvedName (set by GetLocaleNameData) to guard against null by falling back to the original OCR string (name) before using it in comparisons and LevenshteinDistance calls (references: _settings.Locale, resolvedName, GetLocaleNameData, LevenshteinDistance, name). Ensure resolvedName is non-null prior to the first foreach and again before the second LevenshteinDistance loop so both usages use the fallback if needed.
🧹 Nitpick comments (3)
WFInfo/LanguageProcessing/LanguageProcessor.cs (2)
25-28: Empty static constructor can be removed.

The static constructor at lines 25-28 is empty and serves no purpose. Static field initializers (`_normalizedBlueprintRemovalsCache`) run before the static constructor regardless.

♻️ Suggested removal

```diff
-        static LanguageProcessor()
-        {
-            // Will be initialized when first concrete instance is created
-        }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/LanguageProcessor.cs` around lines 25 - 28, Remove the empty static constructor from the LanguageProcessor class: delete the static LanguageProcessor() { /* ... */ } block since it does nothing and static field initializers like _normalizedBlueprintRemovalsCache already run before type initialization; ensure no static initialization logic is left only in that constructor before removal.
136-145: Minor redundancy in blueprint term checks.

The `Contains($"({normalizedRemoval})")` check on line 139 is redundant since `Contains(normalizedRemoval)` on line 138 would already match any string containing the removal term, including when wrapped in parentheses.

♻️ Simplified condition

```diff
 for (int i = 0; i < normalizedBlueprintRemovals.Length; i++)
 {
     string normalizedRemoval = normalizedBlueprintRemovals[i];
     if (normalizedText.Contains(normalizedRemoval) ||
-        normalizedText.Contains($"({normalizedRemoval})") ||
         normalizedText.StartsWith($"({normalizedRemoval}") ||
         normalizedText.EndsWith($"{normalizedRemoval})"))
     {
         return true;
     }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/LanguageProcessor.cs` around lines 136 - 145, The conditional contains a redundant check: remove the Contains($"({normalizedRemoval})") clause inside the loop that iterates normalizedBlueprintRemovals so the logic uses normalizedText.Contains(normalizedRemoval) and the surrounding StartsWith/EndsWith parenthesis checks only; update the conditional in the method that references normalizedText, normalizedRemoval, and normalizedBlueprintRemovals to eliminate the duplicate Contains($"({normalizedRemoval})") term while keeping the other checks intact.

WFInfo/Data.cs (1)
1044-1048: Consider documenting or generalizing the Gara/Ivara workaround.

This hardcoded special case handles OCR confusion between "Gara" and "Ivara" for English. If similar OCR ambiguities exist in other languages, they won't be handled by this workaround since it's only in the English path. Consider either:
- Adding a comment explaining why this specific workaround exists
- Moving this logic into the language processor for potential extension to other locales
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Data.cs` around lines 1044 - 1048, The hardcoded Gara/Ivara workaround inside the block that sets lowest and lowest_unfiltered (checking val == low && lowest.StartsWith("Gara") && prop.Key.StartsWith("Ivara")) needs clarification and future-proofing: add a concise comment explaining the OCR ambiguity and why this specific check exists, and refactor the conditional into a dedicated, language-aware helper (e.g., a method on the language processor like ResolveOcrAmbiguity or NormalizeOperatorName) so the logic can be extended for other locales; update usages to call that helper instead of keeping the special-case inline.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/run_tests.bat`:
- Around line 26-30: The findstr regex currently used to detect absolute Windows
paths fails because echo outputs the value in quotes; update the pattern used
when testing %TEST_IMAGES_DIR% so it accounts for the leading quote (e.g., match
"^\".:\\") and thereby correctly detect quoted absolute paths like "C:\...";
modify the findstr invocation that uses the pattern ^.\:\\.* to use a pattern
that begins with a quote before the drive letter and backslash so the branch
that prefixes %SCRIPT_DIR% runs only for relative paths.
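The quote-aware check is easier to reason about outside batch syntax. Below is a POSIX shell sketch of the same idea; the batch version would instead adjust the `findstr` pattern to allow a leading quote (something like `^\"*[A-Za-z]:\\`), and the function name here is invented for illustration:

```shell
# Decide whether a possibly-quoted value like "C:\images" is an absolute
# Windows-style path (drive letter + colon + backslash).
is_absolute_win_path() {
  p=${1#\"}                    # strip one leading double quote if present
  case "$p" in
    [A-Za-z]:\\*) return 0 ;;  # C:\... -> absolute
    *) return 1 ;;             # anything else -> treat as relative
  esac
}

if is_absolute_win_path '"C:\images"'; then echo absolute; else echo relative; fi  # prints "absolute"
```

Only the relative branch should then prefix `%SCRIPT_DIR%`, which is exactly the behavior the comment asks for.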
---
Outside diff comments:
In `@WFInfo/Data.cs`:
- Around line 1063-1084: GetLocaleNameData(name, false) can return null for
non-English locales, causing LevenshteinDistance to throw when passed
resolvedName; update the logic around resolvedName (set by GetLocaleNameData) to
guard against null by falling back to the original OCR string (name) before
using it in comparisons and LevenshteinDistance calls (references:
_settings.Locale, resolvedName, GetLocaleNameData, LevenshteinDistance, name).
Ensure resolvedName is non-null prior to the first foreach and again before the
second LevenshteinDistance loop so both usages use the fallback if needed.
---
Nitpick comments:
In `@WFInfo/Data.cs`:
- Around line 1044-1048: The hardcoded Gara/Ivara workaround inside the block
that sets lowest and lowest_unfiltered (checking val == low &&
lowest.StartsWith("Gara") && prop.Key.StartsWith("Ivara")) needs clarification
and future-proofing: add a concise comment explaining the OCR ambiguity and why
this specific check exists, and refactor the conditional into a dedicated,
language-aware helper (e.g., a method on the language processor like
ResolveOcrAmbiguity or NormalizeOperatorName) so the logic can be extended for
other locales; update usages to call that helper instead of keeping the
special-case inline.
In `@WFInfo/LanguageProcessing/LanguageProcessor.cs`:
- Around line 25-28: Remove the empty static constructor from the
LanguageProcessor class: delete the static LanguageProcessor() { /* ... */ }
block since it does nothing and static field initializers like
_normalizedBlueprintRemovalsCache already run before type initialization; ensure
no static initialization logic is left only in that constructor before removal.
- Around line 136-145: The conditional contains a redundant check: remove the
Contains($"({normalizedRemoval})") clause inside the loop that iterates
normalizedBlueprintRemovals so the logic uses
normalizedText.Contains(normalizedRemoval) and the surrounding
StartsWith/EndsWith parenthesis checks only; update the conditional in the
method that references normalizedText, normalizedRemoval, and
normalizedBlueprintRemovals to eliminate the duplicate
Contains($"({normalizedRemoval})") term while keeping the other checks intact.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- WFInfo/Data.cs
- WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs
- WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs
- WFInfo/LanguageProcessing/LanguageProcessor.cs
- WFInfo/Ocr.cs
- WFInfo/errorDialogue.xaml.cs
- tests/run_tests.bat
🚧 Files skipped from review as they are similar to previous changes (1)
- WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs
Actionable comments posted: 6
🧹 Nitpick comments (5)
WFInfo/Main.cs (1)
265-266: Good defensive null-check, but inconsistent with other dispatcher usages.

The null-check prevents NullReferenceException in headless/test scenarios. However, similar dispatcher calls elsewhere in this file lack the same protection:

- Line 98: `UpdateMarketStatusAsync` - `MainWindow.INSTANCE.Dispatcher.InvokeAsync(...)`
- Line 545: `LoggedIn()` - `MainWindow.INSTANCE.Dispatcher.Invoke(...)`
- Line 615: `SignOut()` - `MainWindow.INSTANCE.Dispatcher.Invoke(...)`

Consider applying the same pattern consistently to avoid NullReferenceException in those code paths during headless execution.

♻️ Suggested fix for UpdateMarketStatusAsync (Line 98)

```diff
 // Use async UI dispatcher call
-await MainWindow.INSTANCE.Dispatcher.InvokeAsync(() =>
+if (MainWindow.INSTANCE?.Dispatcher != null)
 {
-    MainWindow.INSTANCE.UpdateMarketStatus(msg);
-});
+    await MainWindow.INSTANCE.Dispatcher.InvokeAsync(() =>
+    {
+        MainWindow.INSTANCE.UpdateMarketStatus(msg);
+    });
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Main.cs` around lines 265 - 266, Several Dispatcher.Invoke/InvokeAsync calls lack the defensive null-check used elsewhere and can throw in headless/test runs; update the calls in UpdateMarketStatusAsync, LoggedIn, and SignOut to use the same pattern as the existing safe call around MainWindow.INSTANCE.Dispatcher (e.g., check MainWindow.INSTANCE?.Dispatcher != null or use the safe navigation to the Dispatcher before calling Invoke/InvokeAsync) so each call only executes when the Dispatcher is present; locate the calls by the method names UpdateMarketStatusAsync, LoggedIn, and SignOut and apply the null-guard consistently.WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs (1)
52-80: Consider extracting shared logic to reduce duplication.

`ShouldFilterWord` implementations are nearly identical between `SimplifiedChineseLanguageProcessor` and `TraditionalChineseLanguageProcessor`. Consider extracting to a shared base class or helper method.

♻️ Optional refactor to reduce duplication

```diff
+/// <summary>
+/// Base class for Chinese language processors with shared logic
+/// </summary>
+public abstract class ChineseLanguageProcessorBase : LanguageProcessor
+{
+    protected ChineseLanguageProcessorBase(IReadOnlyApplicationSettings settings) : base(settings) { }
+
+    public override bool ShouldFilterWord(string word)
+    {
+        if (string.IsNullOrEmpty(word)) return true;
+
+        bool hasCJK = ContainsCJK(word);
+        bool hasLatin = word.Any(c => (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'));
+
+        if (hasCJK && !hasLatin) return false;
+        if (hasLatin && !hasCJK) return word.Length <= 2;
+        if (hasCJK && hasLatin && word.Length <= 2) return true;
+
+        return false;
+    }
+
+    public static bool ContainsCJK(string text) { /* existing impl */ }
+}
```

Also applies to: 150-178
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs` around lines 52 - 80, The ShouldFilterWord logic is duplicated across SimplifiedChineseLanguageProcessor and TraditionalChineseLanguageProcessor; extract the shared logic into a single protected method (e.g., a protected bool FilterWordCore(string word) or into a static helper class) and have both ShouldFilterWord implementations delegate to it; reuse the existing ContainsCJK helper from ChineseLanguageProcessor and preserve the exact rule checks for pure CJK, pure Latin, and mixed cases so behavior remains unchanged while removing duplicate code.tests/README.md (1)
17-27: Add language specifier to fenced code block.

The directory structure code block lacks a language specifier. While this doesn't affect functionality, adding one improves syntax highlighting and satisfies linting.

📝 Suggested fix

````diff
-```
+```text
 tests/
 ├── map.json        # Lists scenarios to run
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/README.md` around lines 17 - 27, The fenced code block in tests/README.md presenting the directory tree is missing a language specifier; update the opening fence (the line with ```) to include a language token such as text or console (e.g., change ``` to ```text) so the block is labeled for syntax highlighting; look for the directory-tree fenced block (the one containing "tests/ ├── map.json ...") and add the language specifier to the opening fence.WFInfo/Services/TesseractService.cs (1)
226-268: Consider defensive handling for unsupported locales in checksum lookup.

Line 255 calls `traineddata_checksums.GetValue(Locale).ToObject<string>()` which will throw if the locale is not in the checksums dictionary. While the UI dropdown gates valid locale selection, a defensive check would improve robustness for edge cases (manual config edits, future locale additions).

🛡️ Optional defensive guard

```diff
+var checksumToken = traineddata_checksums.GetValue(Locale);
+if (checksumToken == null)
+{
+    Main.AddLog($"No traineddata checksum configured for locale '{Locale}'. Skipping download.");
+    return;
+}
+
-if (!File.Exists(app_data_traineddata_path) || CustomEntrypoint.GetMD5hash(app_data_traineddata_path) != traineddata_checksums.GetValue(Locale).ToObject<string>())
+if (!File.Exists(app_data_traineddata_path) || CustomEntrypoint.GetMD5hash(app_data_traineddata_path) != checksumToken.ToObject<string>())
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Services/TesseractService.cs` around lines 226 - 268, The getLocaleTessdata method assumes traineddata_checksums contains Locale and calls traineddata_checksums.GetValue(Locale).ToObject<string>(), which will throw for unsupported locales; change it to first check for the key (e.g., use traineddata_checksums.TryGetValue / traineddata_checksums[Locale] != null or JToken token = traineddata_checksums.GetValue(Locale)) and only compute MD5/compare if the token exists, otherwise skip download (or log a warning) so unsupported/unknown Locale values are handled defensively; reference getLocaleTessdata, traineddata_checksums, Locale, and the MD5 comparison block to apply this guard.WFInfo/Ocr.cs (1)
1029-1165: Consider removing or wiring the new OCR scoring helpers.
`AnalyzeOCRQuality`, `SelectBestPSM`, and `CalculateVariance` are currently unused, which adds maintenance surface without runtime benefit.
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Ocr.cs` around lines 1029 - 1165, These three helpers (AnalyzeOCRQuality, SelectBestPSM, CalculateVariance) are unused and should either be removed or wired into the OCR flow: either delete them to avoid dead code, or call them from the PSM selection path by (1) collecting modeResults (Dictionary<PageSegMode, List<Tuple<string, Rectangle>>>) after running OCR for each PageSegMode, (2) computing modeScores by calling AnalyzeOCRQuality(textLines, mode, imageWidth, imageHeight) for each mode, and (3) replacing the current PSM pick with SelectBestPSM(modeResults, modeScores); keep CalculateVariance as-is because AnalyzeOCRQuality uses it, and add or update unit tests for the new selection behavior if you choose to wire them in.
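If the helpers are wired in rather than removed, the selection step the comment describes reduces to a max over per-mode scores; a Python sketch of that shape (the mode names and score values below are made up for the example):

```python
def select_best_psm(mode_results, mode_scores):
    # Highest quality score wins; ties break toward the mode that
    # produced more text lines.
    best = max(mode_scores,
               key=lambda mode: (mode_scores[mode],
                                 len(mode_results.get(mode, []))))
    return best, mode_results.get(best, [])

results = {"SingleBlock": ["FORMA BLUEPRINT", "x2"], "SparseText": ["FORMA"]}
scores = {"SingleBlock": 0.82, "SparseText": 0.61}
```

The C# version would do the same over `Dictionary<PageSegMode, ...>` collections after scoring each mode with `AnalyzeOCRQuality`.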
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@WFInfo/Data.cs`:
- Around line 983-1009: The current non-English branch assumes split[2] is
localized to _settings.Locale but may be stale if cached marketItems were built
under a different locale; modify the logic in the block that builds
marketItemsSnapshot (inside the lock using marketItems, marketItemsLock,
processor.NormalizeForPatternMatching, and split[]) to first detect the cache
locale/version (e.g., a metadata entry in marketItems like "locale" or a version
string) and if it does not match _settings.Locale either trigger/flag a reload
of market_items.json or skip using split[2] for localized matching and fall back
to English split[0]; ensure the code uses the detected metadata to decide
whether to use processor-normalized split[2] or to use split[0] so matching
remains correct until a fresh locale-specific cache is loaded.
- Around line 443-445: Before writing the body to the fallback cache, validate
the deserialized payload (the JObject returned by
JsonConvert.DeserializeObject<JObject>(body)) to ensure it matches the expected
shape (non-null and contains required keys/structure for an item payload); only
call File.WriteAllText(localeSpecificFallbackPath, body) when the validation
passes; if validation fails, do not overwrite the existing fallback file—return
without changing the cache or return the previous cached data—so update the code
around the variables data, body, and localeSpecificFallbackPath to perform this
check and skip the write when invalid.
In `@WFInfo/Ocr.cs`:
- Around line 1022-1025: The catch block in WFInfo/Ocr.cs that currently
swallows exceptions must be changed to capture and log the exception instead of
silently returning empty results: change the bare catch to catch(Exception ex)
and use the module's existing logging facility (e.g., _logger.Error /
Logger.Error / Trace.WriteLine) to emit a descriptive message including
ex.Message and ex.ToString() and the context (method name where OCR extraction
occurs), then either rethrow or return a clearly documented failure result;
update the OCR extraction method (the try/catch around the OCR extraction code)
to preserve telemetry for debugging.
In `@WFInfo/Services/TesseractService.cs`:
- Around line 120-186: The CreateEngine method sets an unsupported Tesseract
variable "smooth_scaling_factor" in two places (CJK branch and English branch);
remove both engine.SetVariable("smooth_scaling_factor", "...") calls (or replace
them with a supported parameter if smoothing is required) and keep all other
SetVariable calls unchanged so the engine no longer attempts to set an
unrecognized option.
In `@WFInfo/Tests/OCRTestRunner.cs`:
- Around line 230-253: CompareResults uses HashSet-based membership which drops
multiplicity and never treats ExtraParts as a failure; update it to perform
multiset-aware comparisons by counting occurrences from result.ExpectedParts and
result.ActualParts (e.g., dictionaries/counters) so duplicates are detected as
missing or extra items, populate result.MissingParts and result.ExtraParts based
on differing counts rather than set membership, compute matched by summing
min(expectedCount, actualCount) across keys to derive AccuracyScore, and make
result.Success require both MissingParts.Count == 0 and ExtraParts.Count == 0
and no ErrorMessage.
- Around line 69-72: Resolve scenarioPath against a canonical testMapDir and
reject any resulting path that escapes that base: compute string baseDir =
Path.GetFullPath(testMapDir), then build jsonFull =
Path.GetFullPath(Path.Combine(baseDir, scenarioPath + ".json")) and imageFull =
Path.GetFullPath(Path.Combine(baseDir, scenarioPath + ".png")); verify each path
is either equal to baseDir or starts with baseDir + Path.DirectorySeparatorChar
(use case-insensitive comparison on Windows) and throw an exception or fail the
test if the check fails; then assign jsonPath = jsonFull, imagePath = imageFull
and keep testName = Path.GetFileName(scenarioPath).
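The multiset-aware comparison asked for in the `CompareResults` item maps directly onto counter arithmetic; a Python sketch (the field names follow the prompt above, but this is an illustration, not the C# implementation):

```python
from collections import Counter

def compare_parts(expected, actual):
    # Count occurrences so duplicates register as missing/extra,
    # unlike HashSet membership.
    exp, act = Counter(expected), Counter(actual)
    missing = sorted((exp - act).elements())
    extra = sorted((act - exp).elements())
    matched = sum((exp & act).values())            # min(count) per key
    accuracy = matched / max(sum(exp.values()), 1)
    success = not missing and not extra
    return missing, extra, accuracy, success
```

Success now requires both empty `missing` and empty `extra`, and a scenario expecting two "Forma Blueprint" entries fails if OCR finds only one.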
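The path-traversal guard in the scenario-path item is the standard canonicalize-then-prefix-check pattern; sketched in Python, where `os.path.realpath` plays the role of `Path.GetFullPath` and the function name is invented for illustration:

```python
import os
import tempfile

def resolve_inside(base_dir, relative):
    """Reject any scenario path that escapes the canonical base directory."""
    base = os.path.realpath(base_dir)
    full = os.path.realpath(os.path.join(base, relative))
    if full != base and not full.startswith(base + os.sep):
        raise ValueError("path escapes test directory: " + relative)
    return full

# demo against a throwaway directory
demo_base = tempfile.mkdtemp()
print(resolve_inside(demo_base, "scenario1.json"))
```

On Windows the prefix comparison should additionally be case-insensitive, as the prompt notes.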
---
Nitpick comments:
In `@tests/README.md`:
- Around line 17-27: The fenced code block in tests/README.md presenting the
directory tree is missing a language specifier; update the opening fence (the
line with ```) to include a language token such as text or console (e.g., change
``` to ```text) so the block is labeled for syntax highlighting; look for the
directory-tree fenced block (the one containing "tests/ ├── map.json ...") and
add the language specifier to the opening fence.
In `@WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs`:
- Around line 52-80: The ShouldFilterWord logic is duplicated across
SimplifiedChineseLanguageProcessor and TraditionalChineseLanguageProcessor;
extract the shared logic into a single protected method (e.g., a protected bool
FilterWordCore(string word) or into a static helper class) and have both
ShouldFilterWord implementations delegate to it; reuse the existing ContainsCJK
helper from ChineseLanguageProcessor and preserve the exact rule checks for pure
CJK, pure Latin, and mixed cases so behavior remains unchanged while removing
duplicate code.
In `@WFInfo/Main.cs`:
- Around line 265-266: Several Dispatcher.Invoke/InvokeAsync calls lack the
defensive null-check used elsewhere and can throw in headless/test runs; update
the calls in UpdateMarketStatusAsync, LoggedIn, and SignOut to use the same
pattern as the existing safe call around MainWindow.INSTANCE.Dispatcher (e.g.,
check MainWindow.INSTANCE?.Dispatcher != null or use the safe navigation to the
Dispatcher before calling Invoke/InvokeAsync) so each call only executes when
the Dispatcher is present; locate the calls by the method names
UpdateMarketStatusAsync, LoggedIn, and SignOut and apply the null-guard
consistently.
In `@WFInfo/Ocr.cs`:
- Around line 1029-1165: These three helpers (AnalyzeOCRQuality, SelectBestPSM,
CalculateVariance) are unused and should either be removed or wired into the OCR
flow: either delete them to avoid dead code, or call them from the PSM selection
path by (1) collecting modeResults (Dictionary<PageSegMode, List<Tuple<string,
Rectangle>>>) after running OCR for each PageSegMode, (2) computing modeScores
by calling AnalyzeOCRQuality(textLines, mode, imageWidth, imageHeight) for each
mode, and (3) replacing the current PSM pick with SelectBestPSM(modeResults,
modeScores); keep CalculateVariance as-is because AnalyzeOCRQuality uses it, and
add or update unit tests for the new selection behavior if you choose to wire
them in.
In `@WFInfo/Services/TesseractService.cs`:
- Around line 226-268: The getLocaleTessdata method assumes
traineddata_checksums contains Locale and calls
traineddata_checksums.GetValue(Locale).ToObject<string>(), which will throw for
unsupported locales; change it to first check for the key (e.g., use
traineddata_checksums.TryGetValue / traineddata_checksums[Locale] != null or
JToken token = traineddata_checksums.GetValue(Locale)) and only compute
MD5/compare if the token exists, otherwise skip download (or log a warning) so
unsupported/unknown Locale values are handled defensively; reference
getLocaleTessdata, traineddata_checksums, Locale, and the MD5 comparison block
to apply this guard.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
- WFInfo/CustomEntrypoint.cs
- WFInfo/Data.cs
- WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs
- WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs
- WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs
- WFInfo/LanguageProcessing/LanguageProcessor.cs
- WFInfo/Main.cs
- WFInfo/Ocr.cs
- WFInfo/Services/TesseractService.cs
- WFInfo/Tests/OCRTestRunner.cs
- WFInfo/Tests/TestModels.cs
- WFInfo/Tests/TestProgram.cs
- WFInfo/errorDialogue.xaml.cs
- tests/README.md
- tests/run_tests.bat
🚧 Files skipped from review as they are similar to previous changes (3)
- WFInfo/CustomEntrypoint.cs
- WFInfo/errorDialogue.xaml.cs
- WFInfo/Tests/TestModels.cs
Actionable comments posted: 3
🧹 Nitpick comments (6)
WFInfo/Main.cs (1)
266-267: Consider centralizing dispatcher invocation logic.

The same guard+invoke pattern is repeated across multiple methods. A shared helper would reduce duplication and keep behavior consistent.

♻️ Proposed refactor

```diff
+private static bool TryInvokeMainWindow(Action<MainWindow> action)
+{
+    var window = MainWindow.INSTANCE;
+    if (window?.Dispatcher == null) return false;
+    window.Dispatcher.Invoke(() => action(window));
+    return true;
+}
 public static void RunOnUIThread(Action act)
 {
-    if (MainWindow.INSTANCE?.Dispatcher != null)
-        MainWindow.INSTANCE.Dispatcher.Invoke(act);
+    _ = TryInvokeMainWindow(_ => act());
 }
 public static void StatusUpdate(string message, int severity)
 {
-    if (MainWindow.INSTANCE?.Dispatcher != null)
-        MainWindow.INSTANCE.Dispatcher.Invoke(() => { MainWindow.INSTANCE.ChangeStatus(message, severity); });
+    _ = TryInvokeMainWindow(window => window.ChangeStatus(message, severity));
 }
 public void LoggedIn()
 {
-    if (MainWindow.INSTANCE?.Dispatcher != null)
-        MainWindow.INSTANCE.Dispatcher.Invoke(() => { MainWindow.INSTANCE.LoggedIn(); });
+    _ = TryInvokeMainWindow(window => window.LoggedIn());
     // start the AFK timer...
 }
 public static void SignOut()
 {
-    if (MainWindow.INSTANCE?.Dispatcher != null)
-        MainWindow.INSTANCE.Dispatcher.Invoke(() => { MainWindow.INSTANCE.SignOut(); });
+    _ = TryInvokeMainWindow(window => window.SignOut());
 }
```

Also applies to: 303-304, 546-547, 617-618
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Main.cs` around lines 266 - 267, Create a single helper to centralize the guard+invoke pattern (e.g., a static method InvokeOnUI(Action act) placed on MainWindow or a UI helper class) and replace all occurrences of "if (MainWindow.INSTANCE?.Dispatcher != null) MainWindow.INSTANCE.Dispatcher.Invoke(act);" with that helper; the helper should check MainWindow.INSTANCE and its Dispatcher, and call Dispatcher.Invoke (or BeginInvoke if non-blocking behavior is preferred) to preserve current semantics so all call sites like the current uses of MainWindow.INSTANCE.Dispatcher.Invoke(act) use the consistent centralized logic.WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs (1)
419-423: Move replacement table to static precomputed data.

Lines 419-483 rebuild and re-sort a large mapping on every call to `NormalizeKoreanCharacters`. This is avoidable overhead in a hot path.

♻️ Refactor direction

```diff
+private static readonly KeyValuePair<string, string>[] KoreanReplacements =
+    BuildKoreanReplacements()
+        .OrderByDescending(r => r.Key.Length)
+        .ToArray();
@@
-var replacements = new List<KeyValuePair<string, string>>
-{
-    // huge mapping...
-};
 string result = input;
-foreach (var replacement in replacements.OrderByDescending(r => r.Key.Length))
+foreach (var replacement in KoreanReplacements)
 {
     result = result.Replace(replacement.Key, replacement.Value);
 }
```

Also applies to: 480-483
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs` around lines 419 - 423, The replacements list is being rebuilt and sorted on every call to NormalizeKoreanCharacters causing unnecessary allocation and CPU work; extract the mapping into a static readonly precomputed field (e.g., a static readonly List<KeyValuePair<string,string>> or an immutable array) and initialize and sort it once (static constructor or inline static initializer), then have NormalizeKoreanCharacters reference that static replacements collection instead of creating a local one; update any other local usages (the variable named replacements around lines 419 and similar block at 480-483) to use the static field.WFInfo/Data.cs (1)
989-993: Drop the unused locale-resolution call in the non-English path.

Lines 991-993 compute `resolvedName` via `GetLocaleNameData(...)` for non-English locales, but that value is not used in the non-English branch (line 995+). This adds unnecessary lookup/lock overhead in a hot path.

♻️ Proposed cleanup

```diff
-// Resolve OCR text to English once before loops to avoid repeated expensive database searches
-// Only resolve for non-English locales to avoid regression in English
-string resolvedName = _settings.Locale == "en" ? name : GetLocaleNameData(name, false);
-resolvedName = resolvedName ?? name; // Fallback to original OCR string if resolution fails
+// Keep the original OCR text for direct matching paths.
```

If you still want `resolvedName`, compute it only inside the English branch where it is actually used.
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Data.cs` around lines 989 - 993, The code performs a locale-resolution call into GetLocaleNameData for non-English paths but never uses the result; remove that unnecessary lookup by only assigning resolvedName (with fallback to name) inside the English branch where resolvedName is actually consumed, keep the existing conditional check on _settings.Locale and preserve the fallback behavior (resolvedName = name) and null-safety around GetLocaleNameData when computed.WFInfo/Ocr.cs (1)
779-812: Consider consolidating repetitive try-catch blocks. The three consecutive try-catch blocks for `IsPartVaulted`, `IsPartMastered`, and `PartsOwned` follow an identical pattern. While functionally correct, this could be refactored to reduce duplication.

♻️ Optional: Extract helper method

```csharp
private static T SafeCall<T>(Func<T> func, T defaultValue, string operationName, string itemName)
{
    try
    {
        return func();
    }
    catch (Exception ex)
    {
        Main.AddLog($"ERROR: {operationName} failed for '{itemName}': {ex.Message}");
        return defaultValue;
    }
}

// Usage:
bool vaulted = SafeCall(() => Main.dataBase.IsPartVaulted(name), false, "IsPartVaulted", name);
bool mastered = SafeCall(() => Main.dataBase.IsPartMastered(name), false, "IsPartMastered", name);
string partsOwned = SafeCall(() => Main.dataBase.PartsOwned(name), "0", "PartsOwned", name);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Ocr.cs` around lines 779 - 812, replace the three repetitive try/catch blocks around Main.dataBase.IsPartVaulted(name), Main.dataBase.IsPartMastered(name) and Main.dataBase.PartsOwned(name) with a single reusable helper (e.g. SafeCall<T>) that accepts a Func<T>, a default value, an operation name and the item name; have SafeCall invoke the func, catch Exception, call Main.AddLog($"ERROR: {operationName} failed for '{itemName}': {ex.Message}") and return the default on error, then call SafeCall for vaulted, mastered and partsOwned (default false/false/"0") to remove duplication and keep the same logging/behavior.

WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs (1)
28-44: Consider extracting shared normalization logic to reduce duplication. `NormalizeForPatternMatching` is nearly identical in both `SimplifiedChineseLanguageProcessor` and `TraditionalChineseLanguageProcessor`. Similarly, `IsPartNameValid` and `NormalizeChineseCharacters` are duplicated.

♻️ Optional: Create shared base class

```csharp
public abstract class ChineseLanguageProcessorBase : LanguageProcessor
{
    protected ChineseLanguageProcessorBase(IReadOnlyApplicationSettings settings) : base(settings) { }

    public override string CharacterWhitelist =>
        GenerateCharacterRange(0x4E00, 0x9FFF) +
        GenerateCharacterRange(0x3400, 0x4DBF) +
        GenerateCharacterRange(0xF900, 0xFAFF) +
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ";

    public override string NormalizeForPatternMatching(string input)
    {
        if (string.IsNullOrEmpty(input)) return input;
        string normalized = input.ToLower(_culture).Trim();
        normalized = normalized.Replace("prime", " prime ");
        normalized = RemoveAccents(normalized);
        var parts = normalized.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", parts);
    }

    public override bool IsPartNameValid(string partName) =>
        !string.IsNullOrEmpty(partName) && partName.Replace(" ", "").Length >= 4;

    public override bool ShouldFilterWord(string word) =>
        SimplifiedChineseLanguageProcessor.FilterWordCore(word);

    protected static string NormalizeChineseCharacters(string input) =>
        NormalizeFullWidthCharacters(input).ToLowerInvariant();
}
```

Also applies to: 134-150
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs` around lines 28 - 44, the SimplifiedChineseLanguageProcessor and TraditionalChineseLanguageProcessor duplicate NormalizeForPatternMatching, IsPartNameValid, and NormalizeChineseCharacters; extract these shared behaviors into a new abstract base class (e.g., ChineseLanguageProcessorBase : LanguageProcessor) that provides a constructor accepting IReadOnlyApplicationSettings, implements CharacterWhitelist, NormalizeForPatternMatching, IsPartNameValid, ShouldFilterWord (delegating to any existing core filter method like FilterWordCore), and a protected NormalizeChineseCharacters helper; then have both SimplifiedChineseLanguageProcessor and TraditionalChineseLanguageProcessor inherit from ChineseLanguageProcessorBase and remove the duplicated implementations so they only contain language-specific overrides or additions.

WFInfo/LanguageProcessing/LanguageProcessor.cs (1)
336-344: Consider potential memory allocation for large Unicode ranges. `GenerateCharacterRange` allocates a `char[]` for the entire range. For very large ranges (e.g., CJK blocks with ~20,000+ characters), this creates significant memory pressure. Since whitelists are typically created once at startup, this is acceptable but worth noting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/LanguageProcessor.cs` around lines 336 - 344, GenerateCharacterRange currently allocates a char[] for the entire Unicode range which can spike memory for very large ranges; replace or augment it by either (A) adding a sanity check constant (e.g., MaxGeneratedRangeSize) and throw ArgumentOutOfRangeException when (end - start + 1) exceeds that limit to prevent accidental huge allocations in GenerateCharacterRange, or (B) add a streaming alternative method GenerateCharacterRangeIterator(int start, int end) that yields chars with yield return (no large array allocation) and update callers to use the iterator for large ranges; reference the existing GenerateCharacterRange method name when making the change so callers can be located and migrated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@WFInfo/Data.cs`:
- Around line 444-457: The code validates the JSON payload but still returns
(data, false) on invalid shape, which lets malformed data reach ReloadItems;
instead, after logging the invalid payload (use Main.AddLog and mention
wfmItemsUrl and localeSpecificFallbackPath), throw an exception (e.g.,
InvalidDataException or HttpRequestException) so the outer catch path will run
and the fallback file will be used, or alternatively return a clearly failing
sentinel (such as null with a flag that the caller interprets as error) — update
the return behavior in this method to not return the malformed data but to
trigger the fallback handling in ReloadItems.
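A minimal sketch of the fail-fast shape this prompt describes, assuming the payload check and the `wfmItemsUrl` / `localeSpecificFallbackPath` names from the comment (the surrounding method and JSON shape are assumptions):

```csharp
// Hypothetical sketch: fail fast on a malformed payload so the caller's
// catch path falls back to the local file instead of consuming bad data.
if (data?["payload"]?["items"] == null)
{
    Main.AddLog($"Invalid market items payload from {wfmItemsUrl}; " +
                $"will fall back to {localeSpecificFallbackPath}");
    // Throwing here routes execution to the outer catch, which loads the fallback file.
    throw new InvalidDataException("Market items payload has an unexpected shape.");
}
```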
In `@WFInfo/Tests/OCRTestRunner.cs`:
- Around line 186-198: The OCR engines are only reinitialized when
localeChanged, so changes to HDR (testCase.HDR) for the same locale leave the
previous HeadlessHDRDetector in place; update the condition to also trigger
reinitialization when the HDR setting changed (e.g., compute hdrChanged by
comparing the new HDR value against the currently active detector/settings) and
call OCR.InitForTest(...) with new HeadlessHDRDetector(testCase.HDR) when
localeChanged || hdrChanged, and still call
Main.dataBase.ReloadItems().GetAwaiter().GetResult() after reinitialization.
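The combined condition could be sketched as follows; `currentLocale` and `currentHdr` are hypothetical trackers for the previously active settings, not names from the codebase:

```csharp
// Hypothetical sketch of the reinitialization condition described above.
bool localeChanged = testCase.Locale != currentLocale;
bool hdrChanged = testCase.HDR != currentHdr;

if (localeChanged || hdrChanged)
{
    OCR.InitForTest(
        new TesseractService(),
        ApplicationSettings.GlobalReadonlySettings,
        _windowService,
        new HeadlessHDRDetector(testCase.HDR));
    // Re-update Data so Levenshtein uses the right locale for matching
    Main.dataBase.ReloadItems().GetAwaiter().GetResult();
    currentLocale = testCase.Locale;
    currentHdr = testCase.HDR;
}
```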
In `@WFInfo/Tests/TestProgram.cs`:
- Around line 20-24: The Main invocation path currently prints usage and returns
as success; update the code in TestProgram.cs (within the Main method around the
args.Length check that calls PrintUsage()) to set a non-zero exit code before
returning—e.g., assign Environment.ExitCode = 1 or call Environment.Exit(1)—so
invalid invocations are reported as failure instead of success.
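A sketch of the exit-code change, assuming the usage check sits at the top of `Main` as the prompt describes:

```csharp
// Sketch: report invalid invocation as failure instead of success.
if (args.Length < 1)
{
    PrintUsage();
    Environment.ExitCode = 1; // non-zero so scripts/CI treat bad invocations as errors
    return;
}
```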
---
Nitpick comments:
In `@WFInfo/Data.cs`:
- Around line 989-993: The code performs a locale-resolution call into
GetLocaleNameData for non-English paths but never uses the result; remove that
unnecessary lookup by only assigning resolvedName (with fallback to name) inside
the English branch where resolvedName is actually consumed, keep the existing
conditional check on _settings.Locale and preserve the fallback behavior
(resolvedName = name) and null-safety around GetLocaleNameData when computed.
In `@WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs`:
- Around line 28-44: The SimplifiedChineseLanguageProcessor and
TraditionalChineseLanguageProcessor duplicate NormalizeForPatternMatching,
IsPartNameValid, and NormalizeChineseCharacters; extract these shared behaviors
into a new abstract base class (e.g., ChineseLanguageProcessorBase :
LanguageProcessor) that provides a constructor accepting
IReadOnlyApplicationSettings, implements CharacterWhitelist,
NormalizeForPatternMatching, IsPartNameValid, ShouldFilterWord (delegating to
any existing core filter method like FilterWordCore), and a protected
NormalizeChineseCharacters helper; then have both
SimplifiedChineseLanguageProcessor and TraditionalChineseLanguageProcessor
inherit from ChineseLanguageProcessorBase and remove the duplicated
implementations so they only contain language-specific overrides or additions.
In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs`:
- Around line 419-423: The replacements list is being rebuilt and sorted on
every call to NormalizeKoreanCharacters causing unnecessary allocation and CPU
work; extract the mapping into a static readonly precomputed field (e.g., a
static readonly List<KeyValuePair<string,string>> or an immutable array) and
initialize and sort it once (static constructor or inline static initializer),
then have NormalizeKoreanCharacters reference that static replacements
collection instead of creating a local one; update any other local usages (the
variable named replacements around lines 419 and similar block at 480-483) to
use the static field.
In `@WFInfo/LanguageProcessing/LanguageProcessor.cs`:
- Around line 336-344: GenerateCharacterRange currently allocates a char[] for
the entire Unicode range which can spike memory for very large ranges; replace
or augment it by either (A) adding a sanity check constant (e.g.,
MaxGeneratedRangeSize) and throw ArgumentOutOfRangeException when (end - start +
1) exceeds that limit to prevent accidental huge allocations in
GenerateCharacterRange, or (B) add a streaming alternative method
GenerateCharacterRangeIterator(int start, int end) that yields chars with yield
return (no large array allocation) and update callers to use the iterator for
large ranges; reference the existing GenerateCharacterRange method name when
making the change so callers can be located and migrated.
In `@WFInfo/Main.cs`:
- Around line 266-267: Create a single helper to centralize the guard+invoke
pattern (e.g., a static method InvokeOnUI(Action act) placed on MainWindow or a
UI helper class) and replace all occurrences of "if
(MainWindow.INSTANCE?.Dispatcher != null)
MainWindow.INSTANCE.Dispatcher.Invoke(act);" with that helper; the helper should
check MainWindow.INSTANCE and its Dispatcher, and call Dispatcher.Invoke (or
BeginInvoke if non-blocking behavior is preferred) to preserve current semantics
so all call sites like the current uses of
MainWindow.INSTANCE.Dispatcher.Invoke(act) use the consistent centralized logic.
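One possible shape for the helper, assuming WPF's `Dispatcher` API and the existing `MainWindow.INSTANCE` singleton; the class name `UiThread` is illustrative:

```csharp
// Hypothetical centralized guard-and-invoke helper for UI-thread calls.
public static class UiThread
{
    public static void InvokeOnUI(Action act)
    {
        var dispatcher = MainWindow.INSTANCE?.Dispatcher;
        if (dispatcher != null)
            dispatcher.Invoke(act); // or BeginInvoke if non-blocking behavior is preferred
    }
}

// Call sites then become: UiThread.InvokeOnUI(act);
```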
In `@WFInfo/Ocr.cs`:
- Around line 779-812: Replace the three repetitive try/catch blocks around
Main.dataBase.IsPartVaulted(name), Main.dataBase.IsPartMastered(name) and
Main.dataBase.PartsOwned(name) with a single reusable helper (eg. SafeCall<T>)
that accepts a Func<T>, a default value, an operation name and the item name;
have SafeCall invoke the func, catch Exception, call Main.AddLog($"ERROR:
{operationName} failed for '{itemName}': {ex.Message}") and return the default
on error, then call SafeCall for vaulted, mastered and partsOwned (default
false/false/"0") to remove duplication and keep the same logging/behavior.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
- WFInfo/CustomEntrypoint.cs
- WFInfo/Data.cs
- WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs
- WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs
- WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs
- WFInfo/LanguageProcessing/LanguageProcessor.cs
- WFInfo/Main.cs
- WFInfo/Ocr.cs
- WFInfo/Services/TesseractService.cs
- WFInfo/Tests/OCRTestRunner.cs
- WFInfo/Tests/TestModels.cs
- WFInfo/Tests/TestProgram.cs
- WFInfo/errorDialogue.xaml.cs
- tests/README.md
- tests/run_tests.bat
🚧 Files skipped from review as they are similar to previous changes (4)
- WFInfo/errorDialogue.xaml.cs
- tests/run_tests.bat
- WFInfo/Tests/TestModels.cs
- WFInfo/CustomEntrypoint.cs
Actionable comments posted: 4
🧹 Nitpick comments (5)
WFInfo/Tests/OCRTestRunner.cs (1)
189-202: HDR change now triggers OCR reinitialization, but consider conditional data reload. The implementation now correctly reinitializes OCR engines when either locale or HDR changes, addressing the prior concern. However, `ReloadItems()` is always called when reinitialization occurs, even if only HDR changed. Per the earlier suggestion, data reload may only be necessary for locale changes.

⚡ Optimize to skip data reload when only HDR changed

```diff
 if (localeChanged || hdrChanged)
 {
     string reason = localeChanged
         ? $"Locale changed to '{newLocale}'"
         : $"HDR changed to '{testCase.HDR}'";
     Main.AddLog($"  {reason}, reinitializing OCR engines...");
     OCR.InitForTest(
         new TesseractService(),
         ApplicationSettings.GlobalReadonlySettings,
         _windowService,
         new HeadlessHDRDetector(testCase.HDR));
     // Also re-update Data so Levenshtein uses the right locale for matching
-    Main.dataBase.ReloadItems().GetAwaiter().GetResult();
+    if (localeChanged)
+        Main.dataBase.ReloadItems().GetAwaiter().GetResult();
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Tests/OCRTestRunner.cs` around lines 189 - 202, the code currently calls Main.dataBase.ReloadItems() whenever localeChanged || hdrChanged triggers OCR.InitForTest; change this so ReloadItems() is only invoked when localeChanged is true (HDR-only changes should skip data reload). Update the conditional around Main.dataBase.ReloadItems().GetAwaiter().GetResult() (within the block that handles localeChanged || hdrChanged, near OCR.InitForTest and HeadlessHDRDetector) to run the reload only if localeChanged, leaving OCR.InitForTest to run for both locale and HDR changes.

WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs (1)
23-81: Duplicate romanization entries are technically harmless but wasteful. Several Korean syllables appear twice in `koreanReplacements` (e.g., "궈" at line 27, "놰" at line 31, "돼" at line 35, "퀘" at line 67). While the `List<KeyValuePair>` approach correctly avoids the runtime crash from the previous `Dictionary` implementation, the duplicate entries are redundant since only the first match applies during iteration.

🧹 Consider deduplicating entries
Remove duplicate entries to reduce memory and avoid confusion:
- Line 27: "궈" appears twice (keep first "gwo")
- Line 31: "놰" appears twice with different values ("nwo" vs "nwae")
- Line 35: "돼" appears twice with different values ("dwae" vs "doe")
- Line 67: "퀘" appears twice with different values ("kwo" vs "kwae")
For entries with intentionally different mappings, consider which romanization is more accurate for OCR matching purposes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs` around lines 23 - 81, the koreanReplacements list contains duplicated Korean keys (e.g., "궈", "놰", "돼", "퀘") which are redundant and can confuse maintenance; open the static List<KeyValuePair<string,string>> koreanReplacements and remove the later duplicate entries so each Hangul syllable appears only once, choosing the preferred romanization for conflicts (keep the first occurrence or replace it with the more accurate mapping) and ensure the final list has unique keys.

WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs (1)
105-106: Latin-1 supplement range (0x00C0-0x00FF) includes non-letter characters. The range contains symbols such as × (0x00D7) and ÷ (0x00F7) that aren't letters. This is likely fine for Tesseract whitelisting since OCR rarely produces these, but if precision is desired, consider listing specific accented characters like the German processor does.
Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs` around lines 105 - 106, the CharacterWhitelist property in EuropeanLanguageProcessor currently appends GenerateCharacterRange(0x00C0, 0x00FF) which includes non-letter symbols (e.g., ×, ÷); change this to a precise set of accented letters instead of the full Latin-1 supplement range: either replace GenerateCharacterRange(0x00C0, 0x00FF) with an explicit string of only valid accented letters (similar to the German processor) or implement a helper that filters GenerateCharacterRange to include only Unicode letters, and update CharacterWhitelist accordingly so it contains only A-Z, a-z, space and the specific accented characters.

WFInfo/LanguageProcessing/LanguageProcessor.cs (1)
51-59: Consider logging fallback to invariant culture. The catch-all exception handler silently falls back to `InvariantCulture`. While this prevents crashes, it may hide configuration issues.

📝 Proposed enhancement

```diff
 catch
 {
     // Fallback to invariant culture for unsupported locales
+    // Consider: Main.AddLog($"Unsupported locale '{locale}', falling back to invariant culture");
     return CultureInfo.InvariantCulture;
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/LanguageProcessor.cs` around lines 51 - 59, the catch block that returns CultureInfo.InvariantCulture currently swallows all exceptions; update it to log the failure and the offending locale before returning the fallback. In the method containing the call to new CultureInfo(locale, false) (the code that currently returns CultureInfo.InvariantCulture), capture the caught exception (e) and write a log entry including the locale and exception details (using your project's logging facility or System.Diagnostics) so callers can see why the fallback was used, then return CultureInfo.InvariantCulture as before.

WFInfo/Data.cs (1)
992-1000: Remove unused locale-resolution work from the non-English path. `resolvedName` is computed for non-English at line 999 but not used in that branch, adding avoidable work on hot OCR matching calls.

♻️ Proposed refactor

```diff
- string resolvedName;
- if (_settings.Locale == "en")
- {
-     resolvedName = name; // Use original OCR text for English
- }
- else
- {
-     resolvedName = GetLocaleNameData(name, false) ?? name; // Fallback to original OCR string if resolution fails
- }
+ string resolvedName = name; // only used by English branch below
```

If you still want pre-resolution for non-English in the future, move it into the branch where it is actually consumed.

🤖 Prompt for AI Agents
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Data.cs` around lines 992 - 1000, The non-English branch is computing resolvedName via GetLocaleNameData(name, false) but never using it, wasting work on hot OCR matching; remove the unnecessary computation or move it into the location where resolvedName is actually consumed. Specifically, eliminate the resolvedName assignment inside the else branch (or defer calling GetLocaleNameData) and keep the English branch behavior using name; adjust any code paths that expect resolvedName to ensure they reference name or call GetLocaleNameData only when needed (referencing _settings.Locale, resolvedName, GetLocaleNameData, and name to locate the logic).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@WFInfo/Data.cs`:
- Around line 1024-1034: The parsing currently requires split.Length >= 3 which
skips two-field cache rows; change the logic around the split handling in the
block that builds marketItemsSnapshot so that you accept split.Length >= 2,
treat split[0] as the English name and when split.Length == 2 set the
localized/slug field to split[1] or to split[0] as a safe fallback, compute
englishNameLength from split[0], compute lengthDiff using (useLocalizedNames &&
split.Length >= 3 ? split[2].Length : split[0].Length), and set comparisonName
similarly (useLocalizedNames && split.Length >= 3 ? split[2] : split[0]) before
adding Tuple.Create(split[0], comparisonName,
processor.NormalizeForPatternMatching(comparisonName)) to marketItemsSnapshot so
older "name|slug" rows are included for non-localized matching.
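The tolerant parsing could look roughly like this; the `|` delimiter is inferred from the prompt's mention of "name|slug" rows, and the surrounding variables (`useLocalizedNames`, `marketItemsSnapshot`, `processor`) are taken from the prompt rather than verified against the code:

```csharp
// Sketch of the tolerant split handling described above (assumed names/delimiter).
string[] split = line.Split('|');
if (split.Length >= 2)
{
    // Older "name|slug" rows lack the localized third field; fall back to split[0].
    bool hasLocalized = useLocalizedNames && split.Length >= 3;
    string comparisonName = hasLocalized ? split[2] : split[0];
    marketItemsSnapshot.Add(Tuple.Create(
        split[0],                                               // English name
        comparisonName,                                         // name used for matching
        processor.NormalizeForPatternMatching(comparisonName))); // normalized form
}
```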
- Around line 605-648: The null-recovery blocks that replace failed deserialized
data with new empty JObjects (for
marketData/marketItems/equipmentData/relicData/nameData) hide corruption;
instead, update the code around ParseFileOrMakeNew and the variables marketItems
(inside marketItemsLock), equipmentData, relicData, and nameData to fail fast:
if ParseFileOrMakeNew returns null set parseHasFailed (or throw a descriptive
exception) and stop initialization rather than creating an empty JObject, and
ensure any logging (Main.AddLog) includes the parse context and error before
aborting; keep ParseFileOrMakeNew usage but remove the silent creation of new
JObject so the caller surfaces the error.
In `@WFInfo/Ocr.cs`:
- Around line 2577-2584: The catch in GetTextFromImage silently swallows
exceptions; update the catch to capture the exception (e) and log it using the
same logger/format as GetTextWithBoundsFromImage so OCR failures are recorded,
then still set modeResults[mode] = "" and modeScores[mode] = 0; reference the
GetTextFromImage method and the modeResults/modeScores assignments so you locate
and replace the empty catch with a catch(Exception e) that calls the existing
logging utility (same logger/format used in GetTextWithBoundsFromImage) before
assigning the defaults.
In `@WFInfo/Services/TesseractService.cs`:
- Around line 125-127: Remove the invalid post-init SetVariable calls on engine
(engine.SetVariable("engine_mode"...), engine.SetVariable("oem_engine"...),
engine.SetVariable("enable_smoothing"...)); instead, set the OCR engine mode via
the TesseractEngine constructor/init using the documented parameter
tessedit_ocr_engine_mode (init-only), and if smoothing adjustments are required
use documented parameters such as thresholding_smooth_kernel_size or
textord_wordstats_smooth_factor passed at init or configured via appropriate API
before Init; update any code that assumes engine.SetVariable will change
init-only options and remove the unsupported enable_smoothing usage.
---
Nitpick comments:
In `@WFInfo/Data.cs`:
- Around line 992-1000: The non-English branch is computing resolvedName via
GetLocaleNameData(name, false) but never using it, wasting work on hot OCR
matching; remove the unnecessary computation or move it into the location where
resolvedName is actually consumed. Specifically, eliminate the resolvedName
assignment inside the else branch (or defer calling GetLocaleNameData) and keep
the English branch behavior using name; adjust any code paths that expect
resolvedName to ensure they reference name or call GetLocaleNameData only when
needed (referencing _settings.Locale, resolvedName, GetLocaleNameData, and name
to locate the logic).
In `@WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs`:
- Around line 105-106: The CharacterWhitelist property in
EuropeanLanguageProcessor currently appends GenerateCharacterRange(0x00C0,
0x00FF) which includes non-letter symbols (e.g., ×, ÷); change this to a precise
set of accented letters instead of the full Latin-1 supplement range: either
replace GenerateCharacterRange(0x00C0, 0x00FF) with an explicit string of only
valid accented letters (similar to the German processor) or implement a helper
that filters GenerateCharacterRange to include only unicode letters, and update
CharacterWhitelist accordingly so it contains only A-Z, a-z, space and the
specific accented characters.
In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs`:
- Around line 23-81: The koreanReplacements list contains duplicated Korean keys
(e.g., "궈", "놰", "돼", "퀘") which are redundant and can confuse maintenance; open
the static List<KeyValuePair<string,string>> koreanReplacements and remove the
later duplicate entries so each Hangul syllable appears only once, choosing the
preferred romanization for conflicts (keep the first occurrence or replace it
with the more accurate mapping) and ensure the final list has unique keys.
In `@WFInfo/LanguageProcessing/LanguageProcessor.cs`:
- Around line 51-59: The catch block that returns CultureInfo.InvariantCulture
currently swallows all exceptions; update it to log the failure and the
offending locale before returning the fallback. In the method containing the
call to new CultureInfo(locale, false) (the code that currently returns
CultureInfo.InvariantCulture), capture the caught exception (e) and write a log
entry including the locale and exception details (using your project's logging
facility or System.Diagnostics) so callers can see why the fallback was used,
then return CultureInfo.InvariantCulture as before.
In `@WFInfo/Tests/OCRTestRunner.cs`:
- Around line 189-202: The code currently calls Main.dataBase.ReloadItems()
whenever localeChanged || hdrChanged triggers OCR.InitForTest; change this so
ReloadItems() is only invoked when localeChanged is true (HDR-only changes
should skip data reload). Update the conditional around
Main.dataBase.ReloadItems().GetAwaiter().GetResult() (within the block that
handles localeChanged || hdrChanged, near OCR.InitForTest and
HeadlessHDRDetector) to run the reload only if localeChanged, leaving
OCR.InitForTest to run for both locale and HDR changes.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
- WFInfo/CustomEntrypoint.cs
- WFInfo/Data.cs
- WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs
- WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs
- WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs
- WFInfo/LanguageProcessing/LanguageProcessor.cs
- WFInfo/Main.cs
- WFInfo/Ocr.cs
- WFInfo/Services/TesseractService.cs
- WFInfo/Tests/OCRTestRunner.cs
- WFInfo/Tests/TestModels.cs
- WFInfo/Tests/TestProgram.cs
- WFInfo/errorDialogue.xaml.cs
- tests/README.md
- tests/run_tests.bat
🚧 Files skipped from review as they are similar to previous changes (4)
- WFInfo/CustomEntrypoint.cs
- tests/run_tests.bat
- tests/README.md
- WFInfo/Tests/TestProgram.cs
Actionable comments posted: 7
♻️ Duplicate comments (1)
tests/run_tests.bat (1)
40-47: ⚠️ Potential issue | 🟡 Minor — Guard timestamp generation failure before building output filename. If `wmic` fails, `TIMESTAMP` can be empty and produce `test_results_.json`, which is easy to overwrite across runs.

🛠️ Suggested hardening

```diff
+set "TIMESTAMP="
 for /f "tokens=2 delims==" %%I in ('wmic os get localdatetime /value') do set "TIMESTAMP=%%I"
+if not defined TIMESTAMP (
+    echo ERROR: Failed to generate timestamp via WMIC
+    exit /b 4
+)
 set "TIMESTAMP=%TIMESTAMP:~0,8%_%TIMESTAMP:~8,6%"
```

```bash
#!/bin/bash
rg -n "wmic os get localdatetime|if not defined TIMESTAMP|set \"TIMESTAMP=%TIMESTAMP:~0,8%_%TIMESTAMP:~8,6%\"" tests/run_tests.bat -C2
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/run_tests.bat` around lines 40 - 47, After calling wmic to set TIMESTAMP, add a guard that verifies TIMESTAMP is defined/non-empty and, if not, assigns a safe fallback (e.g. a sanitized %DATE%_%TIME% or a unique value using %RANDOM% with time) before composing OUTPUT_FILE; update the logic around the TIMESTAMP and OUTPUT_FILE variables so the if "%OUTPUT_FILE%"=="" block always builds a filename with a non-empty TIMESTAMP (referencing the TIMESTAMP and OUTPUT_FILE variables and the wmic call that produces TIMESTAMP).
🧹 Nitpick comments (4)
WFInfo/Tests/TestModels.cs (1)
39-40: Initialize `Scenarios` to an empty list by default. This prevents null-dereference paths and simplifies runner logic for partially populated JSON maps.
🛠️ Suggested change
```diff
 public class TestMap
 {
     [JsonProperty("scenarios")]
-    public List<string> Scenarios { get; set; }
+    public List<string> Scenarios { get; set; } = new List<string>();
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Tests/TestModels.cs` around lines 39 - 40, the Scenarios property on the TestMap class should be initialized to an empty list to avoid null dereferences; update the declaration of public List<string> Scenarios { get; set; } to instantiate a new List<string> (or set it in the TestMap constructor) so Scenarios defaults to an empty collection whenever a TestMap instance is created.

WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs (1)
479-482: Precompute replacement iteration order once. `OrderByDescending` runs on every normalization call. Consider storing a pre-sorted static list to remove repeated allocation/sort work on hot OCR paths.
Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs` around lines 479 - 482, the loop currently calls koreanReplacements.OrderByDescending(r => r.Key.Length) on every normalization, causing repeated sorting/allocation; precompute a single ordered collection (e.g., a static readonly IList<KeyValuePair<string,string>> or array named something like koreanReplacementsOrdered) sorted by Key.Length descending when the processor is initialized, and then in the normalization method iterate that pre-sorted collection (use koreanReplacementsOrdered in place of koreanReplacements.OrderByDescending(...)) so result.Replace(replacement.Key, replacement.Value) uses the cached order without per-call sorting.

WFInfo/LanguageProcessing/LanguageProcessor.cs (1)
218-247: Minor observation: potential redundant lowercasing. When `callBaseDefault` is `false` (the default), `LevenshteinDistanceWithPreprocessing` calls `DefaultLevenshteinDistance`, which applies `ToLower(_culture)`. If the `normalizer` function already lowercases the input (as `NormalizeChineseCharacters` does), this results in double lowercasing. This is functionally harmless but slightly inefficient. Consider passing `callBaseDefault: true` from callers that already normalize case, or document that `normalizer` should not lowercase if `callBaseDefault` is false.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/LanguageProcessor.cs` around lines 218 - 247, LevenshteinDistanceWithPreprocessing can double-lowercase when a provided normalizer (e.g., NormalizeChineseCharacters) already lowercases and callBaseDefault is false; update callers of LevenshteinDistanceWithPreprocessing that pass such normalizers to call it with callBaseDefault: true so the method calls ComputeLevenshteinCore (which expects already-normalized input) instead of DefaultLevenshteinDistance, or alternatively document the expectation that normalizers must not lowercase when relying on DefaultLevenshteinDistance; locate usages of LevenshteinDistanceWithPreprocessing and change the invocation to include callBaseDefault: true for those that perform case normalization.

WFInfo/Ocr.cs (1)
111-115: Consider including Japanese locale in CJK check. `IsCJKLocale()` checks for Korean and Chinese but not Japanese ("ja"). Japanese text also uses CJK ideographs and may benefit from the same line-height and padding adjustments. However, per the PR description, Japanese is not supported yet, so this may be intentional.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Ocr.cs` around lines 111 - 115, The IsCJKLocale() method currently checks ApplicationSettings.GlobalReadonlySettings.Locale for "ko", "zh-hans", and "zh-hant" but omits Japanese; update IsCJKLocale() to include "ja" in the comparison (e.g., add || locale == "ja") so Japanese locales are treated as CJK for line-height/padding logic; ensure any callers that rely on IsCJKLocale() will now apply the same adjustments for Japanese text and add/update a unit test or comment if this behavior is intended.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@WFInfo/Data.cs`:
- Around line 614-619: In UpdateInner, do not silently return when JSON
deserialization yields null (e.g., the marketData null-check that currently
calls Main.AddLog, sets parseHasFailed and returns); instead throw a descriptive
exception (for example InvalidDataException or a custom ParseException) that
includes context (which field failed like "marketData") and any relevant data;
apply the same change to the other similar branches (the checks currently at the
ranges you noted) so each branch replaces Main.AddLog/parseHasFailed/return with
a thrown exception to surface fatal parse errors to callers.
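The fail-fast shape suggested above can be sketched in isolation; `ParseJson` is a hypothetical stand-in for the JsonConvert call (returning null on bad input the same way), and the field name is only an example:

```csharp
using System;
using System.IO;

// Hypothetical stand-in for the deserialization step; returns null on bad
// input the way JsonConvert.DeserializeObject can.
static object ParseJson(string json) => string.IsNullOrWhiteSpace(json) ? null : new object();

// Instead of Main.AddLog(...) + parseHasFailed = true + return, surface the
// failure to the caller with a descriptive exception naming the field.
static object RequireParsed(string json, string fieldName)
{
    var parsed = ParseJson(json);
    if (parsed == null)
        throw new InvalidDataException($"Failed to deserialize '{fieldName}' - update aborted");
    return parsed;
}

Console.WriteLine(RequireParsed("{}", "marketData") != null);
```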
In `@WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs`:
- Line 16: CharacterWhitelist in ChineseLanguageProcessor currently builds a
single large range (GenerateCharacterRange(0x4E00, 0x9FFF)) that exceeds
GenerateCharacterRange's 10,000-character limit and throws
ArgumentOutOfRangeException; split the large CJK ranges into multiple sub-ranges
under the limit and concatenate them (e.g., call GenerateCharacterRange
repeatedly for consecutive smaller spans) when building CharacterWhitelist,
keeping other ranges (0x3400-0x4DBF, 0xF900-0xFAFF) similarly split if needed;
reference the CharacterWhitelist property in ChineseLanguageProcessor and the
GenerateCharacterRange helper and follow the approach noted in
LanguageProcessor.cs comment.
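The chunking idea can be sketched independently of the real helpers; the 10,000-character cap follows the review text, and both function names here are illustrative stand-ins for GenerateCharacterRange:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

const int MaxGeneratedRangeSize = 10000; // cap cited in the review

// Illustrative helper mirroring GenerateCharacterRange's contract:
// inclusive bounds, throws if the span exceeds the cap.
string GenerateCharacterRange(int start, int end)
{
    int count = end - start + 1;
    if (count > MaxGeneratedRangeSize)
        throw new ArgumentOutOfRangeException(nameof(end), "range too large");
    var sb = new StringBuilder(count);
    for (int cp = start; cp <= end; cp++)
        sb.Append((char)cp);
    return sb.ToString();
}

// Split one oversized span (e.g. 0x4E00-0x9FFF, 20,992 chars) into
// consecutive sub-ranges that each respect the cap, then concatenate.
string GenerateChunkedRange(int start, int end)
{
    var parts = new List<string>();
    for (int lo = start; lo <= end; lo += MaxGeneratedRangeSize)
    {
        int hi = Math.Min(lo + MaxGeneratedRangeSize - 1, end);
        parts.Add(GenerateCharacterRange(lo, hi));
    }
    return string.Concat(parts);
}

string cjk = GenerateChunkedRange(0x4E00, 0x9FFF);
Console.WriteLine(cjk.Length); // 0x9FFF - 0x4E00 + 1 = 20992
```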
In `@WFInfo/LanguageProcessing/LanguageProcessor.cs`:
- Around line 344-360: Character whitelist generation is throwing
ArgumentOutOfRangeException because ChineseLanguageProcessorBase and
JapaneseLanguageProcessor request ranges (0x4E00-0x9FFF and 0x4E00-0x9FAF)
larger than MaxGeneratedRangeSize; fix by either increasing
MaxGeneratedRangeSize to at least 25000 so GenerateCharacterRange(int start, int
end) can allocate those ranges, or refactor CharacterWhitelist creation in
ChineseLanguageProcessorBase/JapaneseLanguageProcessor to use
GenerateCharacterRangeIterator(int start, int end) and materialize via
string.Concat(iterator) (or a streaming builder) instead of calling
GenerateCharacterRange for large ranges; update references to
CharacterWhitelist, GenerateCharacterRange, GenerateCharacterRangeIterator, and
MaxGeneratedRangeSize accordingly.
In `@WFInfo/Main.cs`:
- Around line 98-102: Capture MainWindow.INSTANCE and its Dispatcher in local
variables before the await to avoid reading them across the async boundary;
e.g., store var wnd = MainWindow.INSTANCE and var disp = wnd?.Dispatcher, check
disp != null, then await disp.InvokeAsync(() => wnd.UpdateMarketStatus(msg));
ensure you reference MainWindow.INSTANCE, Dispatcher and UpdateMarketStatus and
only access INSTANCE once to prevent race conditions during teardown.
- Around line 266-267: RunOnUIThread currently races by checking
MainWindow.INSTANCE then invoking its Dispatcher later; fix it by capturing the
window once and performing safer dispatcher checks: store MainWindow.INSTANCE
into a local variable (e.g., mw), then if mw?.Dispatcher != null &&
!mw.Dispatcher.HasShutdownStarted && !mw.Dispatcher.HasShutdownFinished use
mw.Dispatcher.CheckAccess to run the action directly when already on UI thread
or call mw.Dispatcher.Invoke/BeginInvoke otherwise; apply this pattern to the
calls that reference MainWindow.INSTANCE and Dispatcher.Invoke (the act lambda
sites) to eliminate the lifecycle race and guard against dispatcher shutdown.
In `@WFInfo/Services/TesseractService.cs`:
- Around line 256-266: The empty catch around the traineddata download swallows
all errors; inside the try block that calls
webClient.DownloadFile(traineddata_hotlink, app_data_traineddata_path) and
File.Copy(app_data_traineddata_path, curr_data_traineddata_path, true) (in
TesseractService), replace the empty catch with concrete error handling: catch
(Exception ex) and either log the exception (using the project's logger) with
context about traineddata_hotlink, app_data_traineddata_path and
curr_data_traineddata_path, and/or rethrow or surface the error so callers can
react; ensure you include the exception message/stack in the log to preserve
root-cause information.
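A rough sketch of the proposed catch behavior, with a hypothetical `Download` stub standing in for webClient.DownloadFile and Console.WriteLine standing in for Main.AddLog; the key point is that the exception is only swallowed when a usable local copy already exists:

```csharp
using System;
using System.IO;

// Hypothetical downloader stand-in; always throws, like a failed
// WebClient.DownloadFile call on an offline first run.
static void Download(string url, string target) => throw new IOException("network unreachable");

// Proposed shape: log the failure, but only continue when some local
// traineddata already exists; otherwise rethrow so init fails fast.
static void EnsureTrainedData(string url, string appDataPath, string currDataPath)
{
    try
    {
        Download(url, appDataPath);
        File.Copy(appDataPath, currDataPath, true);
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Failed to download traineddata: {ex.Message}. Source: {url}, Target: {appDataPath}");
        if (!File.Exists(appDataPath) && !File.Exists(currDataPath))
            throw; // no usable local copy - surface the error to the caller
    }
}

// With a pre-existing local copy the failure is tolerated:
string local = Path.GetTempFileName();
EnsureTrainedData("https://example.invalid/en.traineddata", local, "missing.traineddata");
File.Delete(local);
```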
In `@WFInfo/Tests/OCRTestRunner.cs`:
- Around line 39-45: Validate that
JsonConvert.DeserializeObject<TestMap>(testMapJson) returned a non-null TestMap
and that testMap.Scenarios is non-null/non-empty before using it; if the
deserialization failed or scenarios is null, log a clear error via Main.AddLog
(or throw a descriptive exception) including testMapPath and abort the run, then
only proceed to use testMap.Scenarios in the foreach. Target the TestMap
deserialization and subsequent use of testMap and testMap.Scenarios in
OCRTestRunner.cs.
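The guard can be sketched without Newtonsoft; the scenario list here is a plain collection standing in for testMap.Scenarios, and the error message naming the file path follows the suggestion above:

```csharp
using System;
using System.Collections.Generic;

// Guard both failure modes of JsonConvert.DeserializeObject<TestMap>: a null
// result and a null/empty Scenarios list, naming the offending file.
static List<string> RequireScenarios(List<string> scenarios, string testMapPath)
{
    if (scenarios == null || scenarios.Count == 0)
        throw new InvalidOperationException($"Test map has no scenarios: {testMapPath}");
    return scenarios;
}

var ok = RequireScenarios(new List<string> { "scenario1.json" }, "map.json");
Console.WriteLine(ok.Count);
```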
---
Duplicate comments:
In `@tests/run_tests.bat`:
- Around line 40-47: After calling wmic to set TIMESTAMP, add a guard that
verifies TIMESTAMP is defined/non-empty and, if not, assigns a safe fallback
(e.g. a sanitized %DATE%_%TIME% or a unique value using %RANDOM% with time)
before composing OUTPUT_FILE; update the logic around the TIMESTAMP and
OUTPUT_FILE variables so the if "%OUTPUT_FILE%"=="" block always builds a
filename with a non-empty TIMESTAMP (referencing the TIMESTAMP and OUTPUT_FILE
variables and the wmic call that produces TIMESTAMP).
---
Nitpick comments:
In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs`:
- Around line 479-482: The loop currently calls
koreanReplacements.OrderByDescending(r => r.Key.Length) on every normalization,
causing repeated sorting/allocation; precompute a single ordered collection
(e.g., a static readonly IList<KeyValuePair<string,string>> or array named
something like koreanReplacementsOrdered) sorted by Key.Length descending when
the processor is initialized, and then in the normalization method iterate that
pre-sorted collection (use koreanReplacementsOrdered in place of
koreanReplacements.OrderByDescending(...)) so result.Replace(replacement.Key,
replacement.Value) uses the cached order without per-call sorting.
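The precompute-once idea in isolation; the replacement pairs below are placeholders, not real Korean OCR corrections:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative replacement table (placeholder pairs only).
var koreanReplacements = new Dictionary<string, string>
{
    { "ab", "X" },
    { "abc", "Y" },
    { "a", "Z" },
};

// Sort once at initialization instead of on every normalization call.
KeyValuePair<string, string>[] koreanReplacementsOrdered =
    koreanReplacements.OrderByDescending(r => r.Key.Length).ToArray();

static string Normalize(string input, KeyValuePair<string, string>[] ordered)
{
    string result = input;
    foreach (var replacement in ordered) // longest keys win first
        result = result.Replace(replacement.Key, replacement.Value);
    return result;
}

Console.WriteLine(Normalize("abcab", koreanReplacementsOrdered)); // "YX"
```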
In `@WFInfo/LanguageProcessing/LanguageProcessor.cs`:
- Around line 218-247: LevenshteinDistanceWithPreprocessing can double-lowercase
when a provided normalizer (e.g., NormalizeChineseCharacters) already lowercases
and callBaseDefault is false; update callers of
LevenshteinDistanceWithPreprocessing that pass such normalizers to call it with
callBaseDefault: true so the method calls ComputeLevenshteinCore (which expects
already-normalized input) instead of DefaultLevenshteinDistance, or
alternatively document the expectation that normalizers must not lowercase when
relying on DefaultLevenshteinDistance; locate usages of
LevenshteinDistanceWithPreprocessing and change the invocation to include
callBaseDefault: true for those that perform case normalization.
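The invariant behind this comment — apply the normalizer exactly once, then run a core that never lowercases again — can be sketched as:

```csharp
using System;

// Core distance: assumes inputs are already normalized (no casing here).
static int Levenshtein(string a, string b)
{
    var d = new int[a.Length + 1, b.Length + 1];
    for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
    for (int j = 0; j <= b.Length; j++) d[0, j] = j;
    for (int i = 1; i <= a.Length; i++)
        for (int j = 1; j <= b.Length; j++)
        {
            int cost = a[i - 1] == b[j - 1] ? 0 : 1;
            d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), d[i - 1, j - 1] + cost);
        }
    return d[a.Length, b.Length];
}

// Preprocessing wrapper: the normalizer runs exactly once per string, and the
// core never lowercases again, so a lowercasing normalizer cannot double-apply.
static int LevenshteinWithPreprocessing(string a, string b, Func<string, string> normalizer)
    => Levenshtein(normalizer(a), normalizer(b));

Console.WriteLine(LevenshteinWithPreprocessing("Kitten", "SITTING", s => s.ToLowerInvariant())); // 3
```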
In `@WFInfo/Ocr.cs`:
- Around line 111-115: The IsCJKLocale() method currently checks
ApplicationSettings.GlobalReadonlySettings.Locale for "ko", "zh-hans", and
"zh-hant" but omits Japanese; update IsCJKLocale() to include "ja" in the
comparison (e.g., add || locale == "ja") so Japanese locales are treated as CJK
for line-height/padding logic; ensure any callers that rely on IsCJKLocale()
will now apply the same adjustments for Japanese text and add/update a unit test
or comment if this behavior is intended.
In `@WFInfo/Tests/TestModels.cs`:
- Around line 39-40: The Scenarios property on the TestModels class should be
initialized to an empty list to avoid null dereferences; update the declaration
of public List<string> Scenarios { get; set; } to instantiate a new List<string>
(or set it in the TestModels constructor) so Scenarios defaults to an empty
collection whenever a TestModels instance is created.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
- WFInfo/CustomEntrypoint.cs
- WFInfo/Data.cs
- WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs
- WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs
- WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs
- WFInfo/LanguageProcessing/LanguageProcessor.cs
- WFInfo/Main.cs
- WFInfo/Ocr.cs
- WFInfo/Services/TesseractService.cs
- WFInfo/Tests/OCRTestRunner.cs
- WFInfo/Tests/TestModels.cs
- WFInfo/Tests/TestProgram.cs
- WFInfo/errorDialogue.xaml.cs
- tests/README.md
- tests/run_tests.bat
🚧 Files skipped from review as they are similar to previous changes (3)
- WFInfo/CustomEntrypoint.cs
- tests/README.md
- WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
WFInfo/Ocr.cs (2)
1413-1757: ⚠️ Potential issue | 🟠 Major
Temporary count-scan bitmaps leak on early `continue` paths.
`cloneBitmap` and `cloneBitmapColoured` are allocated before several early `continue` branches (Line 1463, Line 1544, Line 1637, Line 1710) and are not always disposed. This can exhaust GDI handles during noisy scans.
♻️ Proposed disposal pattern
```diff
-Bitmap cloneBitmap = filteredImageClean.Clone(cloneRect, filteredImageClean.PixelFormat);
-Bitmap cloneBitmapColoured = unfilteredImage.Clone(cloneRect, filteredImageClean.PixelFormat);
+Bitmap cloneBitmap = null;
+Bitmap cloneBitmapColoured = null;
+try
+{
+    cloneBitmap = filteredImageClean.Clone(cloneRect, filteredImageClean.PixelFormat);
+    cloneBitmapColoured = unfilteredImage.Clone(cloneRect, filteredImageClean.PixelFormat);
     // ... existing processing logic (including early continue paths) ...
-    cloneBitmapColoured.Dispose();
-    cloneBitmap.Dispose();
+}
+finally
+{
+    cloneBitmapColoured?.Dispose();
+    cloneBitmap?.Dispose();
+}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Ocr.cs` around lines 1413 - 1757, The code leaks GDI bitmaps because cloneBitmap and cloneBitmapColoured are created near the top of the inner loop (variables cloneBitmap, cloneBitmapColoured) but early continue paths (where sumBlack checks and Width/Height checks occur) return without disposing them; fix by ensuring both Bitmaps are disposed on all paths—wrap their creation in using blocks or add try/finally that disposes cloneBitmap and cloneBitmapColoured before every continue or on method exit; update the code paths around the first center-of-mass check (sumBlack < Height), after the flood-fill check (sumBlack < Height), after selecting topColor (topColor default check), and after the final width/height validation (Width < 5 || Height < 5) so no early return leaves undisposed Bitmaps.
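The try/finally shape can be exercised in isolation; MemoryStream stands in for Bitmap below so disposal is observable without GDI, and the early `return` plays the role of the early `continue` paths:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

var allocated = new List<MemoryStream>();

MemoryStream Make()
{
    var ms = new MemoryStream();
    allocated.Add(ms);
    return ms;
}

// The fix above: allocate inside try, dispose in finally, so every
// early exit still releases both handles.
void ProcessOne(bool earlyExit)
{
    MemoryStream cloneBitmap = null;
    MemoryStream cloneBitmapColoured = null;
    try
    {
        cloneBitmap = Make();
        cloneBitmapColoured = Make();
        if (earlyExit)
            return; // analogous to the early `continue` paths
        // ... normal processing ...
    }
    finally
    {
        cloneBitmapColoured?.Dispose();
        cloneBitmap?.Dispose();
    }
}

ProcessOne(true);
ProcessOne(false);
Console.WriteLine(allocated.TrueForAll(ms => !ms.CanRead)); // all disposed
```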
320-341: ⚠️ Potential issue | 🟠 Major
Market-data safety check is incomplete; `plat`/`volume` still can throw.
You now guard `ducats`, but `job["plat"]` and `double.Parse(...)` are still unchecked. A malformed/missing `plat` can throw and abort reward processing mid-run.
🛠️ Proposed hardening
```diff
 string ducats = job["ducats"].ToObject<string>();
 if (!int.TryParse(ducats, System.Globalization.NumberStyles.Integer, Main.culture, out int ducatValue) || ducatValue == 0)
 {
     hideRewardInfo = true;
 }
 //else if (correctName != "Kuva" || correctName != "Exilus Weapon Adapter Blueprint" || correctName != "Riven Sliver" || correctName != "Ayatan Amber Star")
 primeRewards.Add(correctName);
-string plat = job["plat"].ToObject<string>();
+string plat = job["plat"]?.ToObject<string>();
+if (!double.TryParse(plat, styles, Main.culture, out double platinum))
+{
+    Main.AddLog($"MARKET DATA: Invalid plat value for '{correctName}', skipping");
+    continue;
+}
 string primeSetPlat = null;
 if (primeSet != null)
 {
     primeSetPlat = (string)primeSet["plat"];
 }
-double platinum = double.Parse(plat, styles, Main.culture);
-string volume = job["volume"].ToObject<string>();
+string volume = job["volume"]?.ToObject<string>() ?? "0";
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Ocr.cs` around lines 320 - 341, The code currently guards job["ducats"] but still accesses job["plat"] and job["volume"] directly and calls double.Parse, which can throw; update the processing around job, plat, and volume (symbols: job, plat, volume, double.Parse, primeRewards, hideRewardInfo) to first null-check job["plat"] and job["volume"], then use double.TryParse with the same styles and Main.culture to safely parse platinum into a double (handle failure by setting a default or skipping this reward and setting hideRewardInfo as appropriate), and only add to primeRewards when parsing succeeds and data is valid; ensure any fallback behavior (skip, log, or default values) is consistent with existing hideRewardInfo logic so malformed plat/volume won't abort processing.
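The defensive parse in isolation; CultureInfo.InvariantCulture stands in for Main.culture, and the skip-on-failure path mirrors the hardening above:

```csharp
using System;
using System.Globalization;

// Parse market fields defensively: a malformed or missing "plat" value is
// skipped instead of throwing mid-run.
static bool TryReadPlat(string plat, out double platinum)
    => double.TryParse(plat, NumberStyles.Float, CultureInfo.InvariantCulture, out platinum);

string[] samples = { "12.5", null, "abc", "0" };
foreach (var s in samples)
{
    if (!TryReadPlat(s, out double value))
    {
        Console.WriteLine($"Invalid plat value '{s}', skipping");
        continue; // mirrors the proposed skip path
    }
    Console.WriteLine(value);
}
```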
♻️ Duplicate comments (2)
WFInfo/Services/TesseractService.cs (1)
256-269: ⚠️ Potential issue | 🟠 Major
Do not continue silently when traineddata download fails and no local file exists.
Current logic logs and continues. On first-run/offline for a new locale, this can defer failure to engine creation with weaker diagnostics. Re-throw when both destination candidates are missing.
🛠️ Suggested fix
```diff
 catch (Exception ex)
 {
-    Main.AddLog($"Failed to download traineddata for locale '{Locale}': {ex.Message}. Source: {traineddata_hotlink}, Target: {app_data_traineddata_path}");
-    // Don't throw during initialization to allow service to continue with existing data
+    Main.AddLog($"Failed to download traineddata for locale '{Locale}': {ex.Message}. Source: {traineddata_hotlink}, Target: {app_data_traineddata_path}");
+    // Continue only if some local traineddata is already available.
+    if (!File.Exists(app_data_traineddata_path) && !File.Exists(curr_data_traineddata_path))
+        throw;
 }
```
```shell
#!/bin/bash
# Verify the catch path currently logs and continues without a local-file existence gate.
rg -n -C3 'DownloadFile|File\.Copy|catch \(Exception ex\)|Failed to download traineddata|CreateEngine\(' WFInfo/Services/TesseractService.cs
```
Expected result: the catch block around download/copy logs an error but does not conditionally rethrow when both traineddata paths are absent.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Services/TesseractService.cs` around lines 256 - 269, The catch currently logs download failures in the webClient.DownloadFile/File.Copy block but swallows the exception; change it to check existence of both app_data_traineddata_path and curr_data_traineddata_path after the failed download and if neither file exists re-throw the caught exception (or throw a new informative exception including Locale, traineddata_hotlink, app_data_traineddata_path) so service initialization fails fast instead of deferring to CreateEngine; keep the Main.AddLog call for diagnostics and reference webClient.DownloadFile, File.Copy, Locale, traineddata_hotlink, app_data_traineddata_path, curr_data_traineddata_path and any existing catch(Exception ex) block when implementing.
WFInfo/Main.cs (1)
266-275: ⚠️ Potential issue | 🟠 Major
`RunOnUIThread` still allows `MainWindow.INSTANCE` lifecycle races through callsites.
Even with dispatcher guards, the `Action` delegate can still dereference `MainWindow.INSTANCE` later (Line [311], Line [553], Line [623]). Pass the captured window into the callback to close the race.
🛠️ Suggested fix
```diff
-public static void RunOnUIThread(Action act)
+public static void RunOnUIThread(Action<MainWindow> act)
 {
     var mw = MainWindow.INSTANCE;
     if (mw?.Dispatcher != null && !mw.Dispatcher.HasShutdownStarted && !mw.Dispatcher.HasShutdownFinished)
     {
         if (mw.Dispatcher.CheckAccess())
-            act();
+            act(mw);
         else
-            mw.Dispatcher.Invoke(act);
+            mw.Dispatcher.Invoke(() => act(mw));
     }
 }

-RunOnUIThread(() => { MainWindow.INSTANCE.ChangeStatus(message, severity); });
+RunOnUIThread(w => w.ChangeStatus(message, severity));

-RunOnUIThread(() => { MainWindow.INSTANCE.LoggedIn(); });
+RunOnUIThread(w => w.LoggedIn());

-RunOnUIThread(() => { MainWindow.INSTANCE.SignOut(); });
+RunOnUIThread(w => w.SignOut());
```
Also applies to: 311-311, 553-553, 623-623
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Main.cs` around lines 266 - 275, The RunOnUIThread method currently captures MainWindow.INSTANCE lazily and still allows races when callers dereference MainWindow later; change RunOnUIThread to capture the window instance into a local variable (mw) and accept an Action<MainWindow> (or Func/Delegate that receives the window) so the invoked callback receives that captured mw rather than re-reading MainWindow.INSTANCE; update the method signature (RunOnUIThread) and all call sites that pass Action to instead pass Action<MainWindow> and use the provided mw parameter inside those callbacks to eliminate lifecycle races.
🧹 Nitpick comments (4)
WFInfo/Tests/OCRTestRunner.cs (1)
89-99: Redundant equality check in path traversal validation.
The check `!jsonFull.Equals(baseDir, ...)` on lines 89 and 95 is always true since a file path (ending in `.json` or `.png`) can never equal a directory path. This makes the condition effectively just the `StartsWith` check, which is correct. While the logic is safe, the redundant equality check adds confusion. Consider simplifying:
🔧 Simplify path validation
```diff
-if (!jsonFull.Equals(baseDir, StringComparison.OrdinalIgnoreCase) &&
-    !jsonFull.StartsWith(baseDir + Path.DirectorySeparatorChar, StringComparison.OrdinalIgnoreCase))
+if (!jsonFull.StartsWith(baseDir + Path.DirectorySeparatorChar, StringComparison.OrdinalIgnoreCase))
 {
     throw new Exception($"Path traversal detected for JSON file: {scenarioPath}");
 }

-if (!imageFull.Equals(baseDir, StringComparison.OrdinalIgnoreCase) &&
-    !imageFull.StartsWith(baseDir + Path.DirectorySeparatorChar, StringComparison.OrdinalIgnoreCase))
+if (!imageFull.StartsWith(baseDir + Path.DirectorySeparatorChar, StringComparison.OrdinalIgnoreCase))
 {
     throw new Exception($"Path traversal detected for image file: {scenarioPath}");
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Tests/OCRTestRunner.cs` around lines 89 - 99, The path-traversal checks in OCRTestRunner.cs use redundant equality comparisons (jsonFull.Equals(baseDir, ...) and imageFull.Equals(baseDir, ...)) which will never be true for file paths; remove those equality checks and simplify each conditional to only verify that jsonFull (and imageFull) StartsWith baseDir + Path.DirectorySeparatorChar using StringComparison.OrdinalIgnoreCase, keeping the same exception throw referencing scenarioPath; update the if conditions around jsonFull, imageFull, baseDir and scenarioPath accordingly.
WFInfo/WFInfo.csproj (1)
133-164: Consider adding existence check before zipping.
The `CreatePublishZip` target assumes `WFInfo.exe` exists at `$(ExePath)`. If the build fails or the path is incorrect, the PowerShell command will fail silently or produce an unclear error.
💡 Optional: Add validation before zipping
```diff
 <Target Name="CreatePublishZip" AfterTargets="GenerateUpdateXml" Condition="'$(Configuration)'=='Release'">
   <PropertyGroup>
     <ExePath>$(OutputPath)WFInfo.exe</ExePath>
     <ZipPath>$(OutputPath)WFInfo.zip</ZipPath>
   </PropertyGroup>
+  <Error Condition="!Exists('$(ExePath)')" Text="Cannot create zip: $(ExePath) not found" />
+
   <!-- Create zip file using PowerShell with only the executable -->
   <Exec Command="powershell -Command &quot;Compress-Archive -Path '$(ExePath)' -DestinationPath '$(ZipPath)' -Force&quot;"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/WFInfo.csproj` around lines 133 - 164, The CreatePublishZip target assumes WFInfo.exe exists at $(ExePath); add an existence check and fail early or skip zipping if it's missing. Inside the CreatePublishZip target (referencing ExePath, ZipPath and the Exec Command that calls Compress-Archive), add an MSBuild Condition/validation such as an <Error> with Condition="!Exists('$(ExePath)')" to stop the build with a clear message or wrap the <Exec> in Condition="Exists('$(ExePath)')" so the zip step only runs when the file is present; ensure the Message/Fail text mentions $(ExePath) for clarity.WFInfo/LanguageProcessing/PolishLanguageProcessor.cs (1)
59-75: Simplify unreachable uppercase replacements in `NormalizePolishCharacters`.
After `ToLowerInvariant()` at Line [59], uppercase replacements cannot match. Removing them reduces noise.
♻️ Optional cleanup
```diff
 private static string NormalizePolishCharacters(string input)
 {
     // Convert Polish characters to standard equivalents for comparison
     return input.ToLowerInvariant()
         .Replace('ą', 'a')
-        .Replace('Ą', 'A')
         .Replace('ę', 'e')
-        .Replace('Ę', 'E')
         .Replace('ć', 'c')
-        .Replace('Ć', 'C')
         .Replace('ł', 'l')
-        .Replace('Ł', 'L')
         .Replace('ś', 's')
-        .Replace('Ś', 'S')
         .Replace('ź', 'z')
-        .Replace('Ź', 'Z')
         .Replace('ż', 'z')
-        .Replace('Ż', 'Z')
-        .Replace('ó', 'o')
-        .Replace('Ó', 'O');
+        .Replace('ó', 'o');
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/PolishLanguageProcessor.cs` around lines 59 - 75, In NormalizePolishCharacters remove the redundant Replace calls that handle uppercase Polish letters (e.g., .Replace('Ą','A'), .Replace('Ę','E'), .Replace('Ć','C'), .Replace('Ł','L'), .Replace('Ś','S'), .Replace('Ź','Z'), .Replace('Ż','Z'), .Replace('Ó','O')) because the method already calls ToLowerInvariant() at the start so those branches are unreachable; keep the lowercase replacements (ą,ę,ć,ł,ś,ź,ż,ó) and the initial ToLowerInvariant() and return the cleaned chain (refer to NormalizePolishCharacters to locate and update the Replace call chain).
WFInfo/Data.cs (1)
879-883: Reduce lock scope in `GetLocaleNameData` to avoid contention on OCR-heavy paths.
`processor.GetLocalizedNameData(...)` can be expensive; holding `marketItemsLock` for the full operation unnecessarily blocks concurrent readers/writers.
♻️ Suggested refactor
```diff
 public string GetLocaleNameData(string s, bool useLevenshtein)
 {
     var processor = LanguageProcessorFactory.GetCurrentProcessor();
-    lock (marketItemsLock)
-    {
-        return processor.GetLocalizedNameData(s, marketItems, useLevenshtein);
-    }
+    JObject snapshot;
+    lock (marketItemsLock)
+    {
+        snapshot = marketItems;
+    }
+    return processor.GetLocalizedNameData(s, snapshot, useLevenshtein);
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Data.cs` around lines 879 - 883, GetLocaleNameData currently holds marketItemsLock while calling the expensive processor.GetLocalizedNameData, causing contention; instead, acquire marketItemsLock only to capture a safe snapshot (e.g., copy or shallow clone) of marketItems, then release the lock and call LanguageProcessorFactory.GetCurrentProcessor() followed by processor.GetLocalizedNameData(s, snapshot, useLevenshtein). Ensure you still read any minimal state under marketItemsLock (using marketItemsLock) but do not call processor.GetLocalizedNameData while holding the lock.
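The snapshot-under-lock pattern in isolation; a Dictionary stands in for the marketItems JObject, and the lookup stands in for processor.GetLocalizedNameData:

```csharp
using System;
using System.Collections.Generic;

var marketItemsLock = new object();
var marketItems = new Dictionary<string, string> { { "mirage_prime_systems", "Mirage Prime Systems" } };

// Expensive-lookup stand-in for processor.GetLocalizedNameData(...).
static string Lookup(Dictionary<string, string> items, string key)
    => items.TryGetValue(key, out var name) ? name : "ERROR";

string GetLocaleNameData(string s)
{
    Dictionary<string, string> snapshot;
    lock (marketItemsLock)
    {
        snapshot = marketItems; // only the reference capture is under the lock
    }
    return Lookup(snapshot, s); // expensive work runs outside the lock
}

Console.WriteLine(GetLocaleNameData("mirage_prime_systems"));
```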
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs`:
- Around line 16-21: CharacterWhitelist currently calls
GenerateCharacterRange(0x4E00, 0x7FFF) which exceeds MaxGeneratedRangeSize and
throws; split that large range into two valid chunks (for example 0x4E00-0x6FFF
and 0x7000-0x7FFF) and combine them with the other existing
GenerateCharacterRange calls (0x8000-0x9FFF, 0x3400-0x4DBF, 0xF900-0xFAFF) so
each GenerateCharacterRange invocation stays <= MaxGeneratedRangeSize; update
the CharacterWhitelist getter to concatenate the new smaller ranges (referencing
the CharacterWhitelist property and GenerateCharacterRange method) and ensure
ordering/spacing remains the same.
In `@WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs`:
- Around line 178-187: The check for mixed Japanese-Latin (hasJapanese &&
hasLatin) is unreachable because the earlier hasJapanese check returns
immediately; in JapaneseLanguageProcessor.cs move the mixed-language check
before the plain hasJapanese check (or remove the redundant plain hasJapanese
branch) so that the mixed-case logic runs; update the same method that contains
the hasJapanese/hasLatin variables (the word-filtering method) to evaluate
hasJapanese && hasLatin first and then fall back to hasJapanese-only and the
non-Japanese length check.
- Around line 217-220: The chained replacements in JapaneseLanguageProcessor use
result.Replace('ヲ', 'ヲ') which is a no-op and duplicates a later mapping to 'ヲ'
-> 'オ'; remove the redundant result.Replace('ヲ', 'ヲ') (or correct it to the
intended source character if it was a typo) so the Replace chain on the local
variable result only contains the correct mappings (e.g. keep
result.Replace('ヲ', 'オ') and the other mappings for 'ヮ','ヰ','ヱ'), then return
result.ToLowerInvariant().
In `@WFInfo/LanguageProcessing/LanguageProcessor.cs`:
- Around line 344-360: The first CJK range in ChineseLanguageProcessorBase is
calling GenerateCharacterRange(0x4E00, 0x7FFF) which produces 12,800 chars and
exceeds MaxGeneratedRangeSize; update the ChineseLanguageProcessorBase ranges to
split that call into smaller ranges (for example GenerateCharacterRange(0x4E00,
0x6FFF) and GenerateCharacterRange(0x7000, 0x7FFF)) so each generated range is
<= MaxGeneratedRangeSize, or replace the large call with repeated calls to
GenerateCharacterRangeIterator to build the same set without exceeding the
limit.
In `@WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs`:
- Around line 188-210: The NormalizeThaiCharacters method currently performs
destructive chained one-way replacements (e.g., 'ซ'→'ศ' then 'ศ'→'ษ') which
loses original info and risks false matches; replace that logic with a
single-pass, non-destructive canonicalization using a lookup map or grouping
(e.g., a Dictionary<char,char> or HashSet groups) that maps each character to a
chosen canonical representative (or leaves it unchanged) instead of chaining
Replace calls, or remove those replacements entirely and rely on the existing
GetThaiCharacterDifference similarity logic at comparison time; update
NormalizeThaiCharacters (and preserve NormalizeFullWidthCharacters usage) so it
iterates characters once, applies the canonical map, strips karan and diacritics
as needed, and returns the normalized string without multi-step cascading
replaces.
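The single-pass canonical-map mechanism can be sketched as follows; the character grouping below (mapping 'ซ' and 'ศ' to one representative) is purely illustrative of the mechanism, not a claim about Thai orthography:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Each character maps to at most one canonical representative.
var canonical = new Dictionary<char, char>
{
    { 'ซ', 'ษ' }, // illustrative grouping only
    { 'ศ', 'ษ' },
};

string Canonicalize(string input)
{
    var sb = new StringBuilder(input.Length);
    foreach (char c in input)
        sb.Append(canonical.TryGetValue(c, out char rep) ? rep : c);
    return sb.ToString();
}

// Unlike chained Replace('ซ','ศ').Replace('ศ','ษ'), each character is mapped
// exactly once, so mappings cannot cascade into each other.
Console.WriteLine(Canonicalize("ซศษก"));
```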
- Around line 136-145: The check for mixed Thai-Latin (if (hasThai && hasLatin))
is unreachable because the prior if (hasThai) returns early; in the
ThaiLanguageProcessor method containing the hasThai/hasLatin logic, either
remove the redundant mixed check or move the mixed-case branch above the plain
hasThai branch so mixed words are handled first (i.e., evaluate if (hasThai &&
hasLatin) before the if (hasThai) return false) and ensure the final return
applies only to non-Thai words.
- Around line 113-116: The toneMarks array in ThaiLanguageProcessor.cs contains
duplicate entries; update the definition of toneMarks so it only lists each Thai
tone mark once (remove the repeated '่', '้', '๊', '๋'), keeping the subsequent
logic that computes aIsTone and bIsTone and the early return if both are tone
marks unchanged.
In `@WFInfo/Ocr.cs`:
- Around line 191-199: The catch block in the _tesseractService.Init() call
should not set _tesseractService = null because callers like ProcessSnapIt,
profile scan paths, and tests still dereference it; instead remove the null
assignment and either rethrow the exception or set a dedicated init-failed flag
(e.g. add a private bool _tesseractInitFailed) so callers can explicitly check
initialization state; update places that assume non-null (e.g. ProcessSnapIt and
any test entry points) to check the new flag or handle the exception, and retain
the Main.AddLog(...) call to record the error.
- Around line 2901-2931: ProcessRewardScreenForTest currently dereferences
_tesseractService without checking for null (InitForTest can set it to null);
add an early guard at the top of ProcessRewardScreenForTest that returns an
empty results list if _tesseractService is null (or if _tesseractService.Engines
or FirstEngine are missing), and mirror the same null-guard pattern in the other
test entry point around lines 2954-2968; reference symbols:
ProcessRewardScreenForTest, _tesseractService, InitForTest, GetTextFromImage,
and ensure you check for _tesseractService == null (and safe-check
Engines/FirstEngine) before accessing them.
- Around line 111-115: IsCJKLocale currently reads
ApplicationSettings.GlobalReadonlySettings.Locale (a global static) which can
differ from the injected _settings or be null; change it to use the injected
settings instance (e.g., _settings.Locale or Settings.Locale) and stop using the
global static: make IsCJKLocale non-static if necessary so it can access the
instance field, read the locale from _settings with a null-safe check (default
to empty string) and then return locale == "ko" || locale == "zh-hans" || locale
== "zh-hant" || locale == "ja".
---
Outside diff comments:
In `@WFInfo/Ocr.cs`:
- Around line 1413-1757: The code leaks GDI bitmaps because cloneBitmap and
cloneBitmapColoured are created near the top of the inner loop (variables
cloneBitmap, cloneBitmapColoured) but early continue paths (where sumBlack
checks and Width/Height checks occur) return without disposing them; fix by
ensuring both Bitmaps are disposed on all paths—wrap their creation in using
blocks or add try/finally that disposes cloneBitmap and cloneBitmapColoured
before every continue or on method exit; update the code paths around the first
center-of-mass check (sumBlack < Height), after the flood-fill check (sumBlack <
Height), after selecting topColor (topColor default check), and after the final
width/height validation (Width < 5 || Height < 5) so no early return leaves
undisposed Bitmaps.
- Around line 320-341: The code currently guards job["ducats"] but still
accesses job["plat"] and job["volume"] directly and calls double.Parse, which
can throw; update the processing around job, plat, and volume (symbols: job,
plat, volume, double.Parse, primeRewards, hideRewardInfo) to first null-check
job["plat"] and job["volume"], then use double.TryParse with the same styles and
Main.culture to safely parse platinum into a double (handle failure by setting a
default or skipping this reward and setting hideRewardInfo as appropriate), and
only add to primeRewards when parsing succeeds and data is valid; ensure any
fallback behavior (skip, log, or default values) is consistent with existing
hideRewardInfo logic so malformed plat/volume won't abort processing.
---
Duplicate comments:
In `@WFInfo/Main.cs`:
- Around line 266-275: The RunOnUIThread method currently captures
MainWindow.INSTANCE lazily and still allows races when callers dereference
MainWindow later; change RunOnUIThread to capture the window instance into a
local variable (mw) and accept an Action<MainWindow> (or Func/Delegate that
receives the window) so the invoked callback receives that captured mw rather
than re-reading MainWindow.INSTANCE; update the method signature (RunOnUIThread)
and all call sites that pass Action to instead pass Action<MainWindow> and use
the provided mw parameter inside those callbacks to eliminate lifecycle races.
In `@WFInfo/Services/TesseractService.cs`:
- Around line 256-269: The catch currently logs download failures in the
webClient.DownloadFile/File.Copy block but swallows the exception; change it to
check existence of both app_data_traineddata_path and curr_data_traineddata_path
after the failed download and if neither file exists re-throw the caught
exception (or throw a new informative exception including Locale,
traineddata_hotlink, app_data_traineddata_path) so service initialization fails
fast instead of deferring to CreateEngine; keep the Main.AddLog call for
diagnostics and reference webClient.DownloadFile, File.Copy, Locale,
traineddata_hotlink, app_data_traineddata_path, curr_data_traineddata_path and
any existing catch(Exception ex) block when implementing.
---
Nitpick comments:
In `@WFInfo/Data.cs`:
- Around line 879-883: GetLocaleNameData currently holds marketItemsLock while
calling the expensive processor.GetLocalizedNameData, causing contention;
instead, acquire marketItemsLock only to capture a safe snapshot (e.g., copy or
shallow clone) of marketItems, then release the lock and call
LanguageProcessorFactory.GetCurrentProcessor() followed by
processor.GetLocalizedNameData(s, snapshot, useLevenshtein). Ensure you still
read any minimal state under marketItemsLock (using marketItemsLock) but do not
call processor.GetLocalizedNameData while holding the lock.
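A sketch of the snapshot-then-release pattern this suggests (the concrete collection type of `marketItems` is an assumption; substitute whatever clone/copy operation fits the real type):

```csharp
JObject snapshot;
lock (marketItemsLock)
{
    // Capture a safe copy under the lock; keep this critical section cheap.
    snapshot = (JObject)marketItems.DeepClone();
}
// The expensive localization call runs outside the lock:
var processor = LanguageProcessorFactory.GetCurrentProcessor();
return processor.GetLocalizedNameData(s, snapshot, useLevenshtein);
```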
In `@WFInfo/LanguageProcessing/PolishLanguageProcessor.cs`:
- Around line 59-75: In NormalizePolishCharacters remove the redundant Replace
calls that handle uppercase Polish letters (e.g., .Replace('Ą','A'),
.Replace('Ę','E'), .Replace('Ć','C'), .Replace('Ł','L'), .Replace('Ś','S'),
.Replace('Ź','Z'), .Replace('Ż','Z'), .Replace('Ó','O')) because the method
already calls ToLowerInvariant() at the start so those branches are unreachable;
keep the lowercase replacements (ą,ę,ć,ł,ś,ź,ż,ó) and the initial
ToLowerInvariant() and return the cleaned chain (refer to
NormalizePolishCharacters to locate and update the Replace call chain).
In `@WFInfo/Tests/OCRTestRunner.cs`:
- Around line 89-99: The path-traversal checks in OCRTestRunner.cs use redundant
equality comparisons (jsonFull.Equals(baseDir, ...) and
imageFull.Equals(baseDir, ...)) which will never be true for file paths; remove
those equality checks and simplify each conditional to only verify that jsonFull
(and imageFull) StartsWith baseDir + Path.DirectorySeparatorChar using
StringComparison.OrdinalIgnoreCase, keeping the same exception throw referencing
scenarioPath; update the if conditions around jsonFull, imageFull, baseDir and
scenarioPath accordingly.
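For illustration, the simplified containment check could look like the following (variable names follow the review; the exception type is an assumption):

```csharp
string baseDir = Path.GetFullPath(testDataDirectory);
string jsonFull = Path.GetFullPath(scenarioPath);
// A file inside baseDir always starts with baseDir plus a separator,
// so the extra Equals(baseDir, ...) comparison is redundant for file paths.
if (!jsonFull.StartsWith(baseDir + Path.DirectorySeparatorChar,
                         StringComparison.OrdinalIgnoreCase))
{
    throw new InvalidOperationException(
        $"Scenario path escapes the test data directory: {scenarioPath}");
}
```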
In `@WFInfo/WFInfo.csproj`:
- Around line 133-164: The CreatePublishZip target assumes WFInfo.exe exists at
$(ExePath); add an existence check and fail early or skip zipping if it's
missing. Inside the CreatePublishZip target (referencing ExePath, ZipPath and
the Exec Command that calls Compress-Archive), add an MSBuild
Condition/validation such as an <Error> with Condition="!Exists('$(ExePath)')"
to stop the build with a clear message or wrap the <Exec> in
Condition="Exists('$(ExePath)')" so the zip step only runs when the file is
present; ensure the Message/Fail text mentions $(ExePath) for clarity.
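In MSBuild terms, the guard could be sketched like this (the property names `ExePath` and `ZipPath` come from the review; the exact `Exec` command is an assumption):

```xml
<Target Name="CreatePublishZip" AfterTargets="Publish">
  <!-- Fail fast with a clear message when the executable is missing. -->
  <Error Condition="!Exists('$(ExePath)')"
         Text="CreatePublishZip: WFInfo.exe not found at $(ExePath); build it before zipping." />
  <Exec Condition="Exists('$(ExePath)')"
        Command="powershell -NoProfile -Command Compress-Archive -Path '$(ExePath)' -DestinationPath '$(ZipPath)' -Force" />
</Target>
```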
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)
- WFInfo/CustomEntrypoint.cs
- WFInfo/Data.cs
- WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs
- WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs
- WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs
- WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs
- WFInfo/LanguageProcessing/LanguageProcessor.cs
- WFInfo/LanguageProcessing/PolishLanguageProcessor.cs
- WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs
- WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs
- WFInfo/Main.cs
- WFInfo/Ocr.cs
- WFInfo/Services/TesseractService.cs
- WFInfo/Tests/OCRTestRunner.cs
- WFInfo/Tests/TestModels.cs
- WFInfo/Tests/TestProgram.cs
- WFInfo/WFInfo.csproj
- WFInfo/errorDialogue.xaml.cs
- tests/README.md
- tests/run_tests.bat
✅ Files skipped from review due to trivial changes (1)
- tests/README.md
🚧 Files skipped from review as they are similar to previous changes (4)
- WFInfo/Tests/TestModels.cs
- WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs
- tests/run_tests.bat
- WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs
public override string CharacterWhitelist =>
    GenerateCharacterRange(0x4E00, 0x7FFF) +
    GenerateCharacterRange(0x8000, 0x9FFF) +
    GenerateCharacterRange(0x3400, 0x4DBF) +
    GenerateCharacterRange(0xF900, 0xFAFF) +
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz "; // Full CJK ideographs
Runtime exception: CJK range 0x4E00-0x7FFF exceeds the 10,000 character limit.
The range 0x4E00-0x7FFF contains 12,800 characters (0x7FFF - 0x4E00 + 1), which exceeds MaxGeneratedRangeSize (10,000) and will throw ArgumentOutOfRangeException when CharacterWhitelist is accessed.
🐛 Split the large range into smaller chunks
public override string CharacterWhitelist =>
- GenerateCharacterRange(0x4E00, 0x7FFF) +
+ GenerateCharacterRange(0x4E00, 0x6FFF) +
+ GenerateCharacterRange(0x7000, 0x7FFF) +
GenerateCharacterRange(0x8000, 0x9FFF) +
GenerateCharacterRange(0x3400, 0x4DBF) +
GenerateCharacterRange(0xF900, 0xFAFF) +
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz "; // Full CJK ideographs

Range sizes after fix:
- 0x4E00-0x6FFF: 8,704 chars ✓
- 0x7000-0x7FFF: 4,096 chars ✓
- 0x8000-0x9FFF: 8,192 chars ✓
- 0x3400-0x4DBF: 6,592 chars ✓
- 0xF900-0xFAFF: 512 chars ✓
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs` around lines 16 - 21,
CharacterWhitelist currently calls GenerateCharacterRange(0x4E00, 0x7FFF) which
exceeds MaxGeneratedRangeSize and throws; split that large range into two valid
chunks (for example 0x4E00-0x6FFF and 0x7000-0x7FFF) and combine them with the
other existing GenerateCharacterRange calls (0x8000-0x9FFF, 0x3400-0x4DBF,
0xF900-0xFAFF) so each GenerateCharacterRange invocation stays <=
MaxGeneratedRangeSize; update the CharacterWhitelist getter to concatenate the
new smaller ranges (referencing the CharacterWhitelist property and
GenerateCharacterRange method) and ensure ordering/spacing remains the same.
// Keep all Japanese text (Hiragana/Katakana/Kanji characters) since Japanese words are meaningful
// even when split by OCR
if (hasJapanese) return false;

// For mixed Japanese-Latin words, be more lenient
if (hasJapanese && hasLatin) return false;

// For non-Japanese text, use standard filtering (filter very short words)
return word.Length < 2;
}
Dead code: mixed Japanese-Latin check is unreachable.
Same issue as ThaiLanguageProcessor: line 180 returns when hasJapanese is true, making line 183 (if (hasJapanese && hasLatin)) unreachable.
🔧 Remove unreachable code
// Keep all Japanese text (Hiragana/Katakana/Kanji characters) since Japanese words are meaningful
// even when split by OCR
if (hasJapanese) return false;
- // For mixed Japanese-Latin words, be more lenient
- if (hasJapanese && hasLatin) return false;
-
// For non-Japanese text, use standard filtering (filter very short words)
  return word.Length < 2;

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs` around lines 178 -
187, The check for mixed Japanese-Latin (hasJapanese && hasLatin) is unreachable
because the earlier hasJapanese check returns immediately; in
JapaneseLanguageProcessor.cs move the mixed-language check before the plain
hasJapanese check (or remove the redundant plain hasJapanese branch) so that the
mixed-case logic runs; update the same method that contains the
hasJapanese/hasLatin variables (the word-filtering method) to evaluate
hasJapanese && hasLatin first and then fall back to hasJapanese-only and the
non-Japanese length check.
// Common katakana OCR confusions
result = result.Replace('ヲ', 'ヲ').Replace('ヮ', 'ワ').Replace('ヰ', 'イ').Replace('ヱ', 'エ').Replace('ヲ', 'オ');

return result.ToLowerInvariant();
No-op replacement and potential error in normalization.
Line 218 contains result.Replace('ヲ', 'ヲ') which replaces a character with itself (no-op). Later on the same line, 'ヲ' is also replaced with 'オ', but the first replacement does nothing and may indicate a copy-paste error where a different character was intended.
🔧 Remove or fix no-op replacement
- result = result.Replace('ヲ', 'ヲ').Replace('ヮ', 'ワ').Replace('ヰ', 'イ').Replace('ヱ', 'エ').Replace('ヲ', 'オ');
+ result = result.Replace('ヮ', 'ワ').Replace('ヰ', 'イ').Replace('ヱ', 'エ').Replace('ヲ', 'オ');
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs` around lines 217 -
220, The chained replacements in JapaneseLanguageProcessor use
result.Replace('ヲ', 'ヲ') which is a no-op and duplicates a later mapping to 'ヲ'
-> 'オ'; remove the redundant result.Replace('ヲ', 'ヲ') (or correct it to the
intended source character if it was a typo) so the Replace chain on the local
variable result only contains the correct mappings (e.g. keep
result.Replace('ヲ', 'オ') and the other mappings for 'ヮ','ヰ','ヱ'), then return
result.ToLowerInvariant().
var toneMarks = new[] {'่', '้', '๊', '๋', '่', '้', '๊', '๋'}; // Different tone marks
bool aIsTone = toneMarks.Contains(a);
bool bIsTone = toneMarks.Contains(b);
if (aIsTone && bIsTone) return 1; // Low cost for tone mark differences
Tone marks array contains duplicates.
The toneMarks array contains each tone mark twice ('่', '้', '๊', '๋' repeated), which is unnecessary and may indicate a copy-paste error.
🔧 Remove duplicate tone marks
- var toneMarks = new[] {'่', '้', '๊', '๋', '่', '้', '๊', '๋'}; // Different tone marks
+ var toneMarks = new[] {'่', '้', '๊', '๋'}; // Thai tone marks
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs` around lines 113 - 116,
The toneMarks array in ThaiLanguageProcessor.cs contains duplicate entries;
update the definition of toneMarks so it only lists each Thai tone mark once
(remove the repeated '่', '้', '๊', '๋'), keeping the subsequent logic that
computes aIsTone and bIsTone and the early return if both are tone marks
unchanged.
// Keep all Thai text since Thai words are meaningful even when split by OCR
if (hasThai) return false;

// For mixed Thai-Latin words, be more lenient
if (hasThai && hasLatin) return false;

// For non-Thai text, use standard filtering (filter very short words)
return word.Length < 2;
}
Dead code: mixed Thai-Latin check is unreachable.
Line 138 returns false when hasThai is true, making the subsequent check on line 141 (if (hasThai && hasLatin)) unreachable.
🔧 Remove unreachable code
// Keep all Thai text since Thai words are meaningful even when split by OCR
if (hasThai) return false;
- // For mixed Thai-Latin words, be more lenient
- if (hasThai && hasLatin) return false;
-
// For non-Thai text, use standard filtering (filter very short words)
return word.Length < 2;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs` around lines 136 - 145,
The check for mixed Thai-Latin (if (hasThai && hasLatin)) is unreachable because
the prior if (hasThai) returns early; in the ThaiLanguageProcessor method
containing the hasThai/hasLatin logic, either remove the redundant mixed check
or move the mixed-case branch above the plain hasThai branch so mixed words are
handled first (i.e., evaluate if (hasThai && hasLatin) before the if (hasThai)
return false) and ensure the final return applies only to non-Thai words.
private static string NormalizeThaiCharacters(string input)
{
    string result = NormalizeFullWidthCharacters(input);

    // Basic Thai tone mark normalization
    result = result.Normalize(System.Text.NormalizationForm.FormC);

    // Common Thai OCR confusions and character variations
    result = result.Replace('ซ', 'ศ').Replace('ศ', 'ษ'); // so variations normalization
    result = result.Replace('ผ', 'ฝ'); // pho/fo confusion
    result = result.Replace('บ', 'ป'); // bo/po confusion
    result = result.Replace('ด', 'ต'); // do/to confusion
    result = result.Replace('อ', 'โ'); // o/o form variations

    // Remove or normalize common diacritic issues
    result = result.Replace("์", ""); // Remove karan (silent marker) for comparison

    // Normalize similar vowel forms
    result = result.Replace('ใ', 'ไ'); // ai vowel variations
    result = result.Replace('ำ', 'ํ'); // am vowel variations

    return result.ToLowerInvariant();
}
Thai character normalization is destructive and may cause false matches.
NormalizeThaiCharacters chains multiple one-way replacements (e.g., ซ→ศ then ศ→ษ) which:
- Loses the original character information
- May cause unrelated words to match incorrectly
This could reduce OCR accuracy rather than improve it. Consider whether these normalizations should be bidirectional similarity mappings (like in GetThaiCharacterDifference) rather than destructive replacements.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs` around lines 188 - 210,
The NormalizeThaiCharacters method currently performs destructive chained
one-way replacements (e.g., 'ซ'→'ศ' then 'ศ'→'ษ') which loses original info and
risks false matches; replace that logic with a single-pass, non-destructive
canonicalization using a lookup map or grouping (e.g., a Dictionary<char,char>
or HashSet groups) that maps each character to a chosen canonical representative
(or leaves it unchanged) instead of chaining Replace calls, or remove those
replacements entirely and rely on the existing GetThaiCharacterDifference
similarity logic at comparison time; update NormalizeThaiCharacters (and
preserve NormalizeFullWidthCharacters usage) so it iterates characters once,
applies the canonical map, strips karan and diacritics as needed, and returns
the normalized string without multi-step cascading replaces.
private static bool IsCJKLocale()
{
    var locale = ApplicationSettings.GlobalReadonlySettings.Locale;
    return locale == "ko" || locale == "zh-hans" || locale == "zh-hant" || locale == "ja";
}
Use injected settings for locale detection instead of global static state.
IsCJKLocale() reads ApplicationSettings.GlobalReadonlySettings.Locale, which can diverge from _settings (and can be null in test/headless flows). This can misapply CJK heuristics or throw unexpectedly.
💡 Proposed fix
private static bool IsCJKLocale()
{
- var locale = ApplicationSettings.GlobalReadonlySettings.Locale;
+ var locale = _settings?.Locale ?? ApplicationSettings.GlobalReadonlySettings?.Locale;
+ if (string.IsNullOrEmpty(locale))
+ return false;
return locale == "ko" || locale == "zh-hans" || locale == "zh-hant" || locale == "ja";
}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/Ocr.cs` around lines 111 - 115, IsCJKLocale currently reads
ApplicationSettings.GlobalReadonlySettings.Locale (a global static) which can
differ from the injected _settings or be null; change it to use the injected
settings instance (e.g., _settings.Locale or Settings.Locale) and stop using the
global static: make IsCJKLocale non-static if necessary so it can access the
instance field, read the locale from _settings with a null-safe check (default
to empty string) and then return locale == "ko" || locale == "zh-hans" || locale
== "zh-hant" || locale == "ja".
try
{
    _tesseractService.Init();
}
catch (Exception ex)
{
    Main.AddLog($"ERROR: Failed to initialize TesseractService: {ex.Message}");
    _tesseractService = null;
}
Avoid nulling _tesseractService on init failure; it creates new null-state crash paths.
Setting _tesseractService = null here pushes failures downstream into methods that still dereference it (ProcessSnapIt, profile scan paths, test entry points). Prefer surfacing init failure explicitly (or tracking an init-failed flag) rather than mutating the service reference to null.
Based on learnings: TesseractService properly handles TesseractExceptions from CreateEngine() calls at a higher level, displaying an error popup to the user without requiring additional try/catch blocks around individual CreateEngine() calls.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/Ocr.cs` around lines 191 - 199, The catch block in the
_tesseractService.Init() call should not set _tesseractService = null because
callers like ProcessSnapIt, profile scan paths, and tests still dereference it;
instead remove the null assignment and either rethrow the exception or set a
dedicated init-failed flag (e.g. add a private bool _tesseractInitFailed) so
callers can explicitly check initialization state; update places that assume
non-null (e.g. ProcessSnapIt and any test entry points) to check the new flag or
handle the exception, and retain the Main.AddLog(...) call to record the error.
internal static List<string> ProcessRewardScreenForTest(Bitmap screenshot, IWindowInfoService windowService)
{
    var results = new List<string>();
    windowService.UseImage(screenshot);

    List<Bitmap> parts;
    try
    {
        parts = ExtractPartBoxAutomatically(out uiScaling, out _, screenshot);
    }
    catch (Exception e)
    {
        Main.AddLog("Test ProcessReward: ExtractPartBoxAutomatically failed: " + e.Message);
        return results;
    }

    int engineCount = Math.Min(parts.Count, _tesseractService.Engines.Length);
    string[] checks = new string[parts.Count];
    Task[] tasks = new Task[engineCount];
    for (int i = 0; i < engineCount; i++)
    {
        int tempI = i;
        tasks[i] = Task.Factory.StartNew(() => { checks[tempI] = GetTextFromImage(parts[tempI], _tesseractService.Engines[tempI]); });
    }
    Task.WaitAll(tasks);

    // Process remaining parts sequentially if more parts than engines
    for (int i = engineCount; i < parts.Count; i++)
    {
        checks[i] = GetTextFromImage(parts[i], _tesseractService.FirstEngine);
    }
Test entry points should guard _tesseractService before engine access.
InitForTest() can set _tesseractService to null on init failure, but both test runners dereference _tesseractService unconditionally. Add early guard/return to avoid NullReferenceException in the test framework.
🧪 Proposed fix
internal static List<string> ProcessRewardScreenForTest(Bitmap screenshot, IWindowInfoService windowService)
{
var results = new List<string>();
+ if (_tesseractService == null)
+ {
+ Main.AddLog("Test ProcessReward: TesseractService is not initialized");
+ return results;
+ }
windowService.UseImage(screenshot);
...
}
internal static List<string> ProcessSnapItForTest(Bitmap screenshot, IWindowInfoService windowService)
{
var results = new List<string>();
+ if (_tesseractService == null)
+ {
+ Main.AddLog("Test SnapIt: TesseractService is not initialized");
+ return results;
+ }
windowService.UseImage(screenshot);
...
}

Also applies to: 2954-2968
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/Ocr.cs` around lines 2901 - 2931, ProcessRewardScreenForTest currently
dereferences _tesseractService without checking for null (InitForTest can set it
to null); add an early guard at the top of ProcessRewardScreenForTest that
returns an empty results list if _tesseractService is null (or if
_tesseractService.Engines or FirstEngine are missing), and mirror the same
null-guard pattern in the other test entry point around lines 2954-2968;
reference symbols: ProcessRewardScreenForTest, _tesseractService, InitForTest,
GetTextFromImage, and ensure you check for _tesseractService == null (and
safe-check Engines/FirstEngine) before accessing them.
Actionable comments posted: 10
♻️ Duplicate comments (8)
WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs (2)
224-224: ⚠️ Potential issue | 🟡 Minor

Remove the no-op replacement in normalization chain.

`Replace('ヲ', 'ヲ')` has no effect and only adds noise.

Small cleanup

- result = result.Replace('ヲ', 'ヲ').Replace('ヮ', 'ワ').Replace('ヰ', 'イ').Replace('ヱ', 'エ').Replace('ヲ', 'オ');
+ result = result.Replace('ヮ', 'ワ').Replace('ヰ', 'イ').Replace('ヱ', 'エ').Replace('ヲ', 'オ');

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs` at line 224, in the normalization chain inside JapaneseLanguageProcessor (the result.Replace(...) sequence), remove the no-op call Replace('ヲ','ヲ') so the chain only contains meaningful mappings (keep the intended Replace('ヲ','オ') and other replacements); update the result.Replace(...) call to eliminate the redundant 'ヲ'->'ヲ' entry.
186-190: ⚠️ Potential issue | 🟡 Minor

Mixed-script branch is unreachable.

`if (hasJapanese) return false;` makes `if (hasJapanese && hasLatin)` dead code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs` around lines 186 - 190, the first check "if (hasJapanese) return false;" makes the subsequent mixed-script branch "if (hasJapanese && hasLatin)" unreachable; update the logic in JapaneseLanguageProcessor so the first condition only handles Japanese-only text (e.g., change it to check hasJapanese && !hasLatin) and keep the mixed-script branch to handle hasJapanese && hasLatin; ensure you modify the conditions around the hasJapanese and hasLatin checks rather than removing the mixed branch.

WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs (2)
137-142: ⚠️ Potential issue | 🟡 Minor

Mixed Thai-Latin branch is unreachable.

`if (hasThai) return false;` makes `if (hasThai && hasLatin)` dead code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs` around lines 137 - 142, the first if-statement makes the mixed Thai-Latin branch unreachable; inside the ThaiLanguageProcessor (where hasThai and hasLatin are evaluated) change the conditions so they are mutually exclusive, e.g. replace the initial check with a Thai-only condition (hasThai && !hasLatin) to return false, then keep the mixed case (hasThai && hasLatin) as the subsequent branch, or alternatively merge them into a single hasThai check if both should behave the same; update the ifs around hasThai/hasLatin so the mixed branch is reachable.
196-207: ⚠️ Potential issue | 🟠 Major

Cascading replacements in NormalizeThaiCharacters collapse distinct letters.

`Replace('ซ','ศ').Replace('ศ','ษ')` maps both ซ and ศ to ษ, which can increase false positives.

Safer one-pass canonicalization pattern

- result = result.Replace('ซ', 'ศ').Replace('ศ', 'ษ'); // so variations normalization
+ var sb = new StringBuilder(result.Length);
+ foreach (char c in result)
+ {
+     sb.Append(c switch
+     {
+         'ซ' => 'ศ',
+         'ศ' => 'ศ',
+         'ษ' => 'ศ',
+         _ => c
+     });
+ }
+ result = sb.ToString();

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs` around lines 196 - 207, the chained Replace calls in NormalizeThaiCharacters (e.g., result = result.Replace('ซ','ศ').Replace('ศ','ษ')) cause cascading remaps that collapse distinct letters; replace this pattern with a single-pass canonicalization using a mapping table/dictionary applied per character (or a Regex.Replace with a callback) so each original character is mapped once to its canonical counterpart (handle 'ซ','ศ','ษ' and other pairs via explicit map entries), and remove the chained Replace sequences in NormalizeThaiCharacters to prevent unintended transitive replacements.

WFInfo/Main.cs (1)
266-275: ⚠️ Potential issue | 🟠 Major

RunOnUIThread still allows lifecycle races via callsite lambdas.

The helper captures mw, but callsites still execute actions that dereference MainWindow.INSTANCE later. Pass the captured window into the delegate to avoid re-reading global state after lifecycle changes.

Suggested refactor

- public static void RunOnUIThread(Action act)
+ public static void RunOnUIThread(Action<MainWindow> act)
  {
      var mw = MainWindow.INSTANCE;
-     if (mw?.Dispatcher != null && !mw.Dispatcher.HasShutdownStarted && !mw.Dispatcher.HasShutdownFinished)
+     var dispatcher = mw?.Dispatcher;
+     if (dispatcher != null && !dispatcher.HasShutdownStarted && !dispatcher.HasShutdownFinished)
      {
-         if (mw.Dispatcher.CheckAccess())
-             act();
+         if (dispatcher.CheckAccess())
+             act(mw);
          else
-             mw.Dispatcher.Invoke(act);
+             dispatcher.Invoke(() => act(mw));
      }
  }

- RunOnUIThread(() => { MainWindow.INSTANCE.ChangeStatus(message, severity); });
+ RunOnUIThread(w => w.ChangeStatus(message, severity));

- RunOnUIThread(() => { MainWindow.INSTANCE.LoggedIn(); });
+ RunOnUIThread(w => w.LoggedIn());

- RunOnUIThread(() => { MainWindow.INSTANCE.SignOut(); });
+ RunOnUIThread(w => w.SignOut());

#!/bin/bash
# Verify current callsites still dereference MainWindow.INSTANCE inside RunOnUIThread lambdas.
rg -nP 'RunOnUIThread\s*\(\s*\(\)\s*=>\s*\{\s*MainWindow\.INSTANCE\.' WFInfo/Main.cs -C2

Also applies to: 311-311, 553-553, 623-623

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Main.cs` around lines 266 - 275, RunOnUIThread currently captures MainWindow.INSTANCE but still lets callers dereference MainWindow.INSTANCE inside their lambdas, creating lifecycle races; change RunOnUIThread to pass the captured window into the delegate to avoid re-reading the global state: update the method signature from Action act to Action<MainWindow> act (or alternately keep Action and internally invoke a wrapper that calls actWithWindow(mw)), ensure the implementation uses the captured mw when invoking (CheckAccess/Invoke) and update all callsites (those that reference MainWindow.INSTANCE inside their lambdas) to accept the MainWindow parameter and use that captured instance instead of re-reading MainWindow.INSTANCE.

WFInfo/Ocr.cs (2)
195-199: ⚠️ Potential issue | 🟠 Major

Don't null _tesseractService on init failure.

Line [198] and Line [3008] convert initialization errors into a latent null-state that later paths still dereference.

🛠️ Proposed direction

+ private static bool _tesseractInitFailed;
@@
  catch (Exception ex)
  {
      Main.AddLog($"ERROR: Failed to initialize TesseractService: {ex.Message}");
-     _tesseractService = null;
+     _tesseractInitFailed = true;
  }
@@
  catch (Exception ex)
  {
      Main.AddLog($"ERROR: Failed to initialize TesseractService in test mode: {ex.Message}");
-     _tesseractService = null;
+     _tesseractInitFailed = true;
  }

Then gate OCR entry points with _tesseractInitFailed (or rethrow) instead of relying on null.

Based on learnings: TesseractService properly handles TesseractExceptions from CreateEngine() calls at a higher level, displaying an error popup to the user without requiring additional try/catch blocks around individual CreateEngine() calls.
Also applies to: 3005-3009
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Ocr.cs` around lines 195 - 199, The catch block that handles TesseractService initialization should not convert the failure into a latent null-state by setting _tesseractService = null; instead record the failure via a boolean flag (e.g., _tesseractInitFailed) or rethrow so callers can handle it explicitly; update the initialization catch (around TesseractService creation) to set _tesseractInitFailed = true (and log the error) rather than nulling _tesseractService, and add guards at OCR entry points that currently dereference _tesseractService to check _tesseractInitFailed (or handle the rethrown exception) before using TesseractService.
111-115: ⚠️ Potential issue | 🟠 Major

Use injected settings for CJK locale detection.

Line [113] reads locale from global static settings, which can drift from _settings in headless/test flows and apply wrong CJK heuristics.

💡 Proposed fix

  private static bool IsCJKLocale()
  {
-     var locale = ApplicationSettings.GlobalReadonlySettings.Locale;
+     var locale = _settings?.Locale ?? ApplicationSettings.GlobalReadonlySettings?.Locale;
+     if (string.IsNullOrEmpty(locale))
+         return false;
      return locale == "ko" || locale == "zh-hans" || locale == "zh-hant" || locale == "ja";
  }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Ocr.cs` around lines 111 - 115, IsCJKLocale currently reads ApplicationSettings.GlobalReadonlySettings.Locale (a static source) which can diverge from the injected _settings used in headless/tests; change IsCJKLocale to use the injected settings by either making it an instance method that reads _settings.Locale or add a parameter (locale) and have callers pass _settings.Locale. Update the IsCJKLocale implementation to check _settings.Locale (or the passed-in locale) for "ko", "zh-hans", "zh-hant", or "ja" and remove the reliance on ApplicationSettings.GlobalReadonlySettings.Locale so tests and headless flows use the injected configuration.

WFInfo/Services/TesseractService.cs (1)
174-176: ⚠️ Potential issue | 🔴 Critical
`CreateEngine()` now depends on factory init, but the constructor creates engines first. Line [174] can throw `InvalidOperationException` if `LanguageProcessorFactory.Initialize(...)` has not run yet, and this path executes during constructor engine creation.
🛡️ Safer fallback in engine creation
- var processor = LanguageProcessorFactory.GetProcessor(Locale);
+ LanguageProcessor processor = null;
+ try
+ {
+     processor = LanguageProcessorFactory.GetProcessor(Locale);
+ }
+ catch (InvalidOperationException)
+ {
+     Main.AddLog($"LanguageProcessorFactory not initialized for locale '{Locale}', using fallback whitelist");
+ }
  var whitelist = processor?.CharacterWhitelist ?? DefaultWhitelist;
  engine.SetVariable("tessedit_char_whitelist", whitelist);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Services/TesseractService.cs` around lines 174 - 176, CreateEngine() currently calls LanguageProcessorFactory.GetProcessor(Locale) which can throw InvalidOperationException if the factory hasn't been initialized yet; wrap the GetProcessor call in a try/catch (catch InvalidOperationException) or otherwise guard with a factory-initialized check, and fall back to DefaultWhitelist when the factory is not available so constructor engine creation cannot throw; update the CreateEngine method to use the safe fallback (use processor?.CharacterWhitelist ?? DefaultWhitelist) only after catching/avoiding the factory exception and then call engine.SetVariable("tessedit_char_whitelist", whitelist).
🧹 Nitpick comments (5)
WFInfo/WFInfo.csproj (1)
135-135: Generate `update.xml` only for Release builds. This target currently runs for Debug too. Limiting it to Release reduces accidental non-release artifacts and keeps local build outputs cleaner.
♻️ Proposed tweak
- <Target Name="GenerateUpdateXml" AfterTargets="Build">
+ <Target Name="GenerateUpdateXml" AfterTargets="Build" Condition="'$(Configuration)'=='Release'">
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/WFInfo.csproj` at line 135, The GenerateUpdateXml MSBuild target (Target Name="GenerateUpdateXml") currently runs for all configurations; restrict it to Release-only by adding a configuration condition to the target (e.g., add Condition="'$(Configuration)' == 'Release'" on the Target element) so the GenerateUpdateXml target executes only when building the Release configuration.
WFInfo/LanguageProcessing/LanguageProcessor.cs (1)
34-39: Harden blueprint-removal cache init against null arrays/entries.
A single processor returning `null` (array or element) can throw during base construction and break processor creation.
Suggested guardrails
_normalizedBlueprintRemovalsCache.GetOrAdd(concreteType, type =>
{
-    var blueprintRemovals = BlueprintRemovals;
+    var blueprintRemovals = BlueprintRemovals ?? Array.Empty<string>();
     var normalized = new string[blueprintRemovals.Length];
     for (int i = 0; i < blueprintRemovals.Length; i++)
     {
-        normalized[i] = blueprintRemovals[i].ToLowerInvariant();
+        normalized[i] = (blueprintRemovals[i] ?? string.Empty).ToLowerInvariant();
     }
     return normalized;
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/LanguageProcessor.cs` around lines 34 - 39, The blueprint-removal cache init fails if BlueprintRemovals or any element is null; in the LanguageProcessor constructor (the block using BlueprintRemovals and building normalized), guard against null by treating BlueprintRemovals as an empty array when null and replace null elements with String.Empty (or skip them) before calling ToLowerInvariant; update the code that creates normalized (referencing BlueprintRemovals, normalized, and the for-loop index i) to null-check the source array and each entry so construction cannot throw NullReferenceException.
WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs (1)
23-81: Deduplicate repeated transliteration entries in `koreanReplacements`.
At least one mapping appears more than once ("류" -> "ryu"), which adds unnecessary replace passes and maintenance noise.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs` around lines 23 - 81, The koreanReplacements list in KoreanLanguageProcessor.cs contains duplicate transliteration entries (e.g. "류" -> "ryu") which cause redundant replacements and maintenance noise; remove the duplicate(s) from the static List<KeyValuePair<string,string>> koreanReplacements so each Hangul key appears only once (keep the intended canonical mapping), and scan the list in the KoreanLanguageProcessor class for any other repeated keys to deduplicate them. Ensure only a single KeyValuePair for "류" -> "ryu" remains and preserve existing order/behavior otherwise.
WFInfo/Services/TesseractService.cs (1)
247-247: Dispose `WebClient` in `getLocaleTessdata()`.
Line [247] allocates `WebClient` without disposal; repeated reload paths should not leak handles/sockets.
♻️ Proposed fix
- WebClient webClient = CustomEntrypoint.CreateNewWebClient();
+ using WebClient webClient = CustomEntrypoint.CreateNewWebClient();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Services/TesseractService.cs` at line 247, getLocaleTessdata is creating a WebClient via CustomEntrypoint.CreateNewWebClient() without disposing it, which leaks handles; wrap the creation in a using statement (or ensure Dispose in a try/finally) so the WebClient is disposed after use, e.g., replace the raw allocation with a using (var webClient = CustomEntrypoint.CreateNewWebClient()) { ... } pattern inside getLocaleTessdata to guarantee proper resource cleanup.
WFInfo/Data.cs (1)
879-883: Reduce lock scope around localized-name matching.
`processor.GetLocalizedNameData(...)` can scan many items and perform distance calculations; holding `marketItemsLock` for that full duration increases contention. Snapshot under lock, then compute outside.
♻️ Proposed refactor
public string GetLocaleNameData(string s, bool useLevenshtein)
{
    var processor = LanguageProcessorFactory.GetCurrentProcessor();
+   JObject marketItemsSnapshot;
    lock (marketItemsLock)
    {
-       return processor.GetLocalizedNameData(s, marketItems, useLevenshtein);
+       marketItemsSnapshot = marketItems;
    }
+   return processor.GetLocalizedNameData(s, marketItemsSnapshot, useLevenshtein);
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WFInfo/Data.cs` around lines 879 - 883, The code currently holds marketItemsLock while calling LanguageProcessorFactory.GetCurrentProcessor() and processor.GetLocalizedNameData(s, marketItems, useLevenshtein), which can be long-running; instead take a short snapshot of the shared data under marketItemsLock (e.g., copy marketItems into a local immutable/list/array) and then release the lock and call processor.GetLocalizedNameData using that snapshot (keep calling LanguageProcessorFactory.GetCurrentProcessor() outside the lock if needed), ensuring only the minimal read/copy is protected by marketItemsLock.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@TestCharacterRanges.cs`:
- Around line 53-57: The TestApplicationSettings class currently only implements
Locale but must implement all members of IReadOnlyApplicationSettings; either
implement every required property/method on TestApplicationSettings (matching
the interface signatures) by adding the missing properties like ActivationKey,
ActivationKeyKey, ActivationMouseButton, Auto, AutoCSV, AutoCount, AutoDelay,
AutoList, CF_pBMax, CF_pBMin, etc., providing appropriate test-friendly default
values or auto-properties, or change TestApplicationSettings to be abstract so
it does not need to supply implementations; ensure each member name and type
exactly matches the interface contracts and that any methods/properties required
by IReadOnlyApplicationSettings are present (you can use simple defaults or
throw NotImplementedException for members not used in tests).
- Line 12: The method TestCharacterRanges() conflicts with the enclosing class
TestCharacterRanges because in C# a method cannot have the same name as its
class; rename the method (e.g., to RunTestCharacterRanges or
ExecuteTestCharacterRanges) and update any callers, or if you intended a
constructor remove the return type and make it parameterless constructor; modify
the method signature referenced as TestCharacterRanges() accordingly and ensure
references are updated to the new method name.
In `@WFInfo/Data.cs`:
- Around line 466-468: The fallback path currently reads and deserializes JSON
into JObject (response → data) and returns it without validating its shape;
update the fallback branch in the method that uses localeSpecificFallbackPath to
run the same payload validation routine you use for the online payload (e.g.,
reuse the existing ValidatePayload/ValidateResponse logic or parsing into the
same DTO/type) before returning (data, true), and do not catch/desuppress
deserialization or validation exceptions — allow them to bubble up so malformed
fallback JSON fails loudly.
In `@WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs`:
- Line 226: The CharacterWhitelist property is missing the space character
causing multi-word OCR results to collapse; update the overridden
CharacterWhitelist string in the EuropeanLanguageProcessor (the
CharacterWhitelist property) to include a space (' ') alongside the existing
characters so Italian text preserves word boundaries, matching the other
European processors' whitelists.
In `@WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs`:
- Around line 87-129: GetJapaneseCharacterDifference currently allocates
hiraganaKatakanaPairs and similarChars on every call, causing GC pressure; move
these into class-level static readonly fields (e.g., static readonly
Dictionary<char,char> HiraganaKatakanaPairs and static readonly
HashSet<(char,char)> or static readonly (char,char)[] SimilarCharPairs) and
initialize them once (static constructor or inline). Update
GetJapaneseCharacterDifference to reference those static fields (use TryGetValue
on HiraganaKatakanaPairs and check membership in SimilarCharPairs without
creating new arrays) so no per-call allocations occur.
In `@WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs`:
- Around line 25-30: CalculateLevenshteinDistance currently calls
ContainsThai(s) and ContainsThai(t) without guarding against null; update
CalculateLevenshteinDistance to null-check inputs (s and t) before calling
ContainsThai and handle nulls appropriately (e.g., treat null as empty string or
short-circuit to default behavior), and apply the same null-safety change to the
other affected method around lines 150-156 that also calls ContainsThai; ensure
the checks are added where ContainsThai is invoked so ContainsThai is never
iterated over a null value.
In `@WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs`:
- Around line 56-72: NormalizeTurkishCharacters currently calls ToLowerInvariant
which makes the subsequent uppercase Replace(...) calls unreachable and also
mishandles 'İ' (U+0130) by producing an "i" plus a combining dot; fix by
applying Turkish-aware casing and explicit I/İ handling before ASCII-folding: in
NormalizeTurkishCharacters first map 'İ'→'i' and 'I'→'ı' (or call ToLower with
new CultureInfo("tr-TR")) so Unicode special-case mapping for Turkish I is
applied, then perform the ASCII-fold replacements for 'ğ','ş','ç','ö','ü' and
finally map 'ı'→'i'; remove the now-unnecessary uppercase Replace(...) calls.
In `@WFInfo/Main.cs`:
- Around line 98-104: The dispatcher call in UpdateMarketStatusAsync using
MainWindow.INSTANCE.Dispatcher must be guarded against shutdown: before calling
disp.InvokeAsync(() => wnd.UpdateMarketStatus(msg)) check disp != null &&
!disp.HasShutdownStarted && !disp.HasShutdownFinished and ensure wnd is not
null, then await the returned DispatcherOperation.Task and wrap the await in a
try/catch that catches TaskCanceledException and OperationCanceledException (and
optionally InvalidOperationException) so exceptions during dispatcher shutdown
are observed/handled instead of being silently lost; reference
MainWindow.INSTANCE, Dispatcher, InvokeAsync, and UpdateMarketStatus when
applying this change.
In `@WFInfo/Tests/OCRTestRunner.cs`:
- Around line 133-138: After deserializing into TestCase, guard against testCase
being null: check if testCase == null right after
JsonConvert.DeserializeObject<TestCase>(...) and handle it explicitly (for
example, throw a descriptive exception or set the runner result to a
failed/error state and return) instead of dereferencing; ensure you reference
the same symbols used in the diff (testCase, result.Language, result.Theme,
result.Category, result.ExpectedParts, and testCase.Parts) so the code either
assigns defaults or aborts cleanly when the JSON is malformed.
In `@WFInfo/WFInfo.csproj`:
- Around line 50-52: The project currently keeps UpdateAssemblyVersion in
WFInfo.csproj in sync with AssemblyVersion in Properties/AssemblyInfo.cs, which
risks drift; remove the hardcoded <UpdateAssemblyVersion> entry and instead
derive the updater version by running the MSBuild GetAssemblyIdentity task
against the built assembly (the output assembly from the project) and assign its
Version metadata to the property used when generating update.xml; update the
MSBuild target that writes update.xml to use the property produced by
GetAssemblyIdentity (referencing the project output assembly and the
GetAssemblyIdentity task) so the value written to update.xml always matches the
compiled AssemblyVersion declared in AssemblyInfo.cs.
---
Duplicate comments:
In `@WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs`:
- Line 224: In the normalization chain inside JapaneseLanguageProcessor (the
result.Replace(...) sequence), remove the no-op call Replace('ヲ','ヲ') so the
chain only contains meaningful mappings (keep the intended Replace('ヲ','オ') and
other replacements); update the result.Replace(...) call to eliminate the
redundant 'ヲ'->'ヲ' entry.
- Around line 186-190: The first check "if (hasJapanese) return false;" makes
the subsequent mixed-script branch "if (hasJapanese && hasLatin)" unreachable;
update the logic in JapaneseLanguageProcessor so the first condition only
handles Japanese-only text (e.g., change it to check hasJapanese && !hasLatin)
and keep the mixed-script branch to handle hasJapanese && hasLatin; ensure you
modify the conditions around the hasJapanese and hasLatin checks rather than
removing the mixed branch.
In `@WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs`:
- Around line 137-142: The first if-statement makes the mixed Thai-Latin branch
unreachable; inside the ThaiLanguageProcessor (where hasThai and hasLatin are
evaluated) change the conditions so they are mutually exclusive — e.g., replace
the initial check with a Thai-only condition (hasThai && !hasLatin) to return
false, then keep the mixed case (hasThai && hasLatin) as the subsequent branch,
or alternatively merge them into a single hasThai check if both should behave
the same; update the ifs around hasThai/hasLatin so the mixed branch is
reachable.
- Around line 196-207: The chained Replace calls in NormalizeThaiCharacters
(e.g., result = result.Replace('ซ','ศ').Replace('ศ','ษ')) cause cascading remaps
that collapse distinct letters; replace this pattern with a single-pass
canonicalization using a mapping table/dictionary applied per character (or a
Regex.Replace with a callback) so each original character is mapped once to its
canonical counterpart (handle 'ซ','ศ','ษ' and other pairs via explicit map
entries), and remove the chained Replace sequences in NormalizeThaiCharacters to
prevent unintended transitive replacements.
In `@WFInfo/Main.cs`:
- Around line 266-275: RunOnUIThread currently captures MainWindow.INSTANCE but
still lets callers dereference MainWindow.INSTANCE inside their lambdas,
creating lifecycle races; change RunOnUIThread to pass the captured window into
the delegate to avoid re-reading the global state—update the method signature
from Action act to Action<MainWindow> act (or alternately keep Action and
internally invoke a wrapper that calls actWithWindow(mw)), ensure the
implementation uses the captured mw when invoking (CheckAccess/Invoke) and
update all callsites (those that reference MainWindow.INSTANCE inside their
lambdas) to accept the MainWindow parameter and use that captured instance
instead of re-reading MainWindow.INSTANCE.
In `@WFInfo/Ocr.cs`:
- Around line 195-199: The catch block that handles TesseractService
initialization should not convert the failure into a latent null-state by
setting _tesseractService = null; instead record the failure via a boolean flag
(e.g., _tesseractInitFailed) or rethrow so callers can handle it explicitly;
update the initialization catch (around TesseractService creation) to set
_tesseractInitFailed = true (and log the error) rather than nulling
_tesseractService, and add guards at OCR entry points that currently dereference
_tesseractService to check _tesseractInitFailed (or handle the rethrown
exception) before using TesseractService.
- Around line 111-115: IsCJKLocale currently reads
ApplicationSettings.GlobalReadonlySettings.Locale (a static source) which can
diverge from the injected _settings used in headless/tests; change IsCJKLocale
to use the injected settings by either making it an instance method that reads
_settings.Locale or add a parameter (locale) and have callers pass
_settings.Locale. Update the IsCJKLocale implementation to check
_settings.Locale (or the passed-in locale) for "ko", "zh-hans", "zh-hant", or
"ja" and remove the reliance on
ApplicationSettings.GlobalReadonlySettings.Locale so tests and headless flows
use the injected configuration.
In `@WFInfo/Services/TesseractService.cs`:
- Around line 174-176: CreateEngine() currently calls
LanguageProcessorFactory.GetProcessor(Locale) which can throw
InvalidOperationException if the factory hasn't been initialized yet; wrap the
GetProcessor call in a try/catch (catch InvalidOperationException) or otherwise
guard with a factory-initialized check, and fall back to DefaultWhitelist when
the factory is not available so constructor engine creation cannot throw; update
the CreateEngine method to use the safe fallback (use
processor?.CharacterWhitelist ?? DefaultWhitelist) only after catching/avoiding
the factory exception and then call
engine.SetVariable("tessedit_char_whitelist", whitelist).
---
Nitpick comments:
In `@WFInfo/Data.cs`:
- Around line 879-883: The code currently holds marketItemsLock while calling
LanguageProcessorFactory.GetCurrentProcessor() and
processor.GetLocalizedNameData(s, marketItems, useLevenshtein), which can be
long-running; instead take a short snapshot of the shared data under
marketItemsLock (e.g., copy marketItems into a local immutable/list/array) and
then release the lock and call processor.GetLocalizedNameData using that
snapshot (keep calling LanguageProcessorFactory.GetCurrentProcessor() outside
the lock if needed), ensuring only the minimal read/copy is protected by
marketItemsLock.
In `@WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs`:
- Around line 23-81: The koreanReplacements list in KoreanLanguageProcessor.cs
contains duplicate transliteration entries (e.g. "류" -> "ryu") which cause
redundant replacements and maintenance noise; remove the duplicate(s) from the
static List<KeyValuePair<string,string>> koreanReplacements so each Hangul key
appears only once (keep the intended canonical mapping), and scan the list in
the KoreanLanguageProcessor class for any other repeated keys to deduplicate
them. Ensure only a single KeyValuePair for "류" -> "ryu" remains and preserve
existing order/behavior otherwise.
In `@WFInfo/LanguageProcessing/LanguageProcessor.cs`:
- Around line 34-39: The blueprint-removal cache init fails if BlueprintRemovals
or any element is null; in the LanguageProcessor constructor (the block using
BlueprintRemovals and building normalized), guard against null by treating
BlueprintRemovals as an empty array when null and replace null elements with
String.Empty (or skip them) before calling ToLowerInvariant; update the code
that creates normalized (referencing BlueprintRemovals, normalized, and the
for-loop index i) to null-check the source array and each entry so construction
cannot throw NullReferenceException.
In `@WFInfo/Services/TesseractService.cs`:
- Line 247: getLocaleTessdata is creating a WebClient via
CustomEntrypoint.CreateNewWebClient() without disposing it, which leaks handles;
wrap the creation in a using statement (or ensure Dispose in a try/finally) so
the WebClient is disposed after use, e.g., replace the raw allocation with a
using (var webClient = CustomEntrypoint.CreateNewWebClient()) { ... } pattern
inside getLocaleTessdata to guarantee proper resource cleanup.
In `@WFInfo/WFInfo.csproj`:
- Line 135: The GenerateUpdateXml MSBuild target (Target
Name="GenerateUpdateXml") currently runs for all configurations; restrict it to
Release-only by adding a configuration condition to the target (e.g., add
Condition="'$(Configuration)' == 'Release'" on the Target element) so the
GenerateUpdateXml target executes only when building the Release configuration.
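To illustrate the Turkish casing pitfall called out for TurkishLanguageProcessor above, here is a minimal standalone sketch (the class and method names are illustrative, not the project's actual API), assuming the goal is ASCII-folding for fuzzy matching:

```csharp
using System;
using System.Globalization;

// Sketch only: why invariant lowercasing mishandles Turkish dotted/dotless I,
// and one safe ordering for folding before ASCII replacement.
static class TurkishFoldSketch
{
    public static string Normalize(string input)
    {
        // Handle the special cases explicitly BEFORE any generic lowering:
        // 'İ' (U+0130) must become 'i', and 'I' must become 'ı' in Turkish.
        // Invariant lowering of 'İ' instead yields "i" + combining dot (U+0307).
        var mapped = input.Replace('İ', 'i').Replace('I', 'ı');

        // Turkish-aware lowering for the rest of the string.
        var lowered = mapped.ToLower(new CultureInfo("tr-TR"));

        // ASCII-fold the remaining Turkish letters, ending with 'ı' -> 'i'.
        return lowered
            .Replace('ğ', 'g').Replace('ş', 's').Replace('ç', 'c')
            .Replace('ö', 'o').Replace('ü', 'u').Replace('ı', 'i');
    }
}
```

The key design point is ordering: once 'İ' and 'I' are mapped explicitly, the later uppercase `Replace(...)` calls flagged in the review become unnecessary.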
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (21)
TestCharacterRanges.cs
WFInfo/CustomEntrypoint.cs
WFInfo/Data.cs
WFInfo/LanguageProcessing/ChineseLanguageProcessor.cs
WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs
WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs
WFInfo/LanguageProcessing/KoreanLanguageProcessor.cs
WFInfo/LanguageProcessing/LanguageProcessor.cs
WFInfo/LanguageProcessing/PolishLanguageProcessor.cs
WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs
WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs
WFInfo/Main.cs
WFInfo/Ocr.cs
WFInfo/Services/TesseractService.cs
WFInfo/Tests/OCRTestRunner.cs
WFInfo/Tests/TestModels.cs
WFInfo/Tests/TestProgram.cs
WFInfo/WFInfo.csproj
WFInfo/errorDialogue.xaml.cs
tests/README.md
tests/run_tests.bat
🚧 Files skipped from review as they are similar to previous changes (4)
- WFInfo/errorDialogue.xaml.cs
- WFInfo/LanguageProcessing/PolishLanguageProcessor.cs
- tests/README.md
- WFInfo/Tests/TestModels.cs
/// </summary>
public class TestCharacterRanges
{
    public static void TestCharacterRanges()
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -e
python3 - <<'PY'
import pathlib, re
files = list(pathlib.Path('.').rglob('TestCharacterRanges.cs'))
if files:
p = files[0]
src = p.read_text(encoding='utf-8', errors='ignore')
cls = re.search(r'\bclass\s+(\w+)\b', src)
if cls:
cls_name = cls.group(1)
# Look for method with same name as class (non-constructor pattern)
m = re.search(rf'\b(public|private|internal|protected)\s+static\s+\w+\s+{cls_name}\s*\(', src)
print(f"file={p}")
print(f"class={cls_name}")
print(f"has_conflicting_member={bool(m)}")
if m:
line = src.count('\n', 0, m.start()) + 1
print(f"conflict_line={line}")
# Show context around the conflict
lines = src.split('\n')
start = max(0, line - 2)
end = min(len(lines), line + 1)
print(f"context:")
for i in range(start, end):
print(f" {i+1}: {lines[i]}")
else:
print("No class found")
else:
print("TestCharacterRanges.cs not found")
PY
Repository: WFCD/WFinfo
Length of output: 245
Rename TestCharacterRanges(); current signature is a compile error.
Line 12 declares a method with the same name as its enclosing class with a return type, which is invalid in C# (CS0542). In C#, only constructors may share the class name, and they cannot have explicit return types.
🛠️ Proposed fix
- public static void TestCharacterRanges()
+ public static void RunCharacterRangeTests()
{
Console.WriteLine("Testing character range generation...");
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- public static void TestCharacterRanges()
+ public static void RunCharacterRangeTests()
  {
      Console.WriteLine("Testing character range generation...");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@TestCharacterRanges.cs` at line 12, The method TestCharacterRanges()
conflicts with the enclosing class TestCharacterRanges because in C# a method
cannot have the same name as its class; rename the method (e.g., to
RunTestCharacterRanges or ExecuteTestCharacterRanges) and update any callers, or
if you intended a constructor remove the return type and make it parameterless
constructor; modify the method signature referenced as TestCharacterRanges()
accordingly and ensure references are updated to the new method name.
public class TestApplicationSettings : IReadOnlyApplicationSettings
{
    public string Locale => "en";
    // Add other required properties as needed
}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -e
python - <<'PY'
import pathlib, re
iface_file = None
iface_src = None
for p in pathlib.Path('.').rglob('*.cs'):
s = p.read_text(encoding='utf-8', errors='ignore')
if 'interface IReadOnlyApplicationSettings' in s:
iface_file, iface_src = p, s
break
test_file = next(pathlib.Path('.').rglob('TestCharacterRanges.cs'))
test_src = test_file.read_text(encoding='utf-8', errors='ignore')
print(f"interface_file={iface_file}")
print(f"test_file={test_file}")
iface_members = set(re.findall(r'\b(?:string|int|bool|double|float|decimal|long|short|byte|object|[A-Za-z_][A-Za-z0-9_<>,\[\]\?]*)\s+([A-Za-z_][A-Za-z0-9_]*)\s*\{\s*get;\s*\}', iface_src))
impl_match = re.search(r'class\s+TestApplicationSettings\s*:\s*IReadOnlyApplicationSettings\s*\{([\s\S]*?)\n\s*\}', test_src)
impl_body = impl_match.group(1) if impl_match else ''
impl_members = set(re.findall(r'\b([A-Za-z_][A-Za-z0-9_]*)\s*=>', impl_body))
impl_members |= set(re.findall(r'\b([A-Za-z_][A-Za-z0-9_]*)\s*\{\s*get;', impl_body))
missing = sorted(iface_members - impl_members)
print(f"interface_member_count={len(iface_members)}")
print(f"implemented_member_count={len(impl_members)}")
print(f"missing_sample={missing[:20]}")
PY
Repository: WFCD/WFinfo
Length of output: 500
TestApplicationSettings does not fully implement IReadOnlyApplicationSettings.
The interface defines 76 members, but only Locale is implemented. Missing 75 members including: ActivationKey, ActivationKeyKey, ActivationMouseButton, Auto, AutoCSV, AutoCount, AutoDelay, AutoList, CF_pBMax, CF_pBMin, and others. This is a compile-blocking error—all interface members must be implemented.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@TestCharacterRanges.cs` around lines 53 - 57, The TestApplicationSettings
class currently only implements Locale but must implement all members of
IReadOnlyApplicationSettings; either implement every required property/method on
TestApplicationSettings (matching the interface signatures) by adding the
missing properties like ActivationKey, ActivationKeyKey, ActivationMouseButton,
Auto, AutoCSV, AutoCount, AutoDelay, AutoList, CF_pBMax, CF_pBMin, etc.,
providing appropriate test-friendly default values or auto-properties, or change
TestApplicationSettings to be abstract so it does not need to supply
implementations; ensure each member name and type exactly matches the interface
contracts and that any methods/properties required by
IReadOnlyApplicationSettings are present (you can use simple defaults or throw
NotImplementedException for members not used in tests).
string response = File.ReadAllText(localeSpecificFallbackPath);
JObject data = JsonConvert.DeserializeObject<JObject>(response);
return (data, true);
Validate fallback payload shape before returning cached data.
If fallback JSON is malformed or deserializes to an unexpected shape, this currently returns bad data and fails later with less actionable errors. Validate fallback payload the same way as the online payload before returning.
🛠️ Proposed fix
if (File.Exists(localeSpecificFallbackPath))
{
string response = File.ReadAllText(localeSpecificFallbackPath);
JObject data = JsonConvert.DeserializeObject<JObject>(response);
+ if (!(data?["data"] is JArray))
+ {
+ throw new InvalidDataException(
+ $"Invalid fallback payload structure in '{localeSpecificFallbackPath}' for locale '{locale}'.");
+ }
return (data, true);
}
Based on learnings: In WFInfo's Data.cs, D1firehail prefers letting JSON deserialization exceptions bubble up rather than gracefully degrading with try-catch blocks.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/Data.cs` around lines 466 - 468, The fallback path currently reads and
deserializes JSON into JObject (response → data) and returns it without
validating its shape; update the fallback branch in the method that uses
localeSpecificFallbackPath to run the same payload validation routine you use
for the online payload (e.g., reuse the existing
ValidatePayload/ValidateResponse logic or parsing into the same DTO/type) before
returning (data, true), and do not catch/desuppress deserialization or
validation exceptions — allow them to bubble up so malformed fallback JSON fails
loudly.
public override string[] BlueprintRemovals => new[] { "Progetto", "Piano" };

public override string CharacterWhitelist => "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-()" +
Add space to Italian CharacterWhitelist.
Line [226] is missing ' ', unlike the other European processors. This can collapse multi-word OCR output and hurt matching quality.
🔤 Proposed fix
- public override string CharacterWhitelist => "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-()" +
+ public override string CharacterWhitelist => "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-() " +
GenerateCharacterRange(0x00C0, 0x00C0) + // À
GenerateCharacterRange(0x00C8, 0x00C8) + // È
GenerateCharacterRange(0x00C9, 0x00C9) + // É
📝 Committable suggestion
- public override string CharacterWhitelist => "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-()" +
+ public override string CharacterWhitelist => "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-() " +
  GenerateCharacterRange(0x00C0, 0x00C0) + // À
  GenerateCharacterRange(0x00C8, 0x00C8) + // È
  GenerateCharacterRange(0x00C9, 0x00C9) + // É
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/LanguageProcessing/EuropeanLanguageProcessor.cs` at line 226, The
CharacterWhitelist property is missing the space character causing multi-word
OCR results to collapse; update the overridden CharacterWhitelist string in the
EuropeanLanguageProcessor (the CharacterWhitelist property) to include a space
(' ') alongside the existing characters so Italian text preserves word
boundaries, matching the other European processors' whitelists.
private int GetJapaneseCharacterDifference(char a, char b)
{
    if (a == b) return 0;

    // Hiragana-Katakana equivalents (lower cost for similar characters)
    var hiraganaKatakanaPairs = new Dictionary<char, char>
    {
        {'あ', 'ア'}, {'い', 'イ'}, {'う', 'ウ'}, {'え', 'エ'}, {'お', 'オ'},
        {'か', 'カ'}, {'き', 'キ'}, {'く', 'ク'}, {'け', 'ケ'}, {'こ', 'コ'},
        {'が', 'ガ'}, {'ぎ', 'ギ'}, {'ぐ', 'グ'}, {'げ', 'ゲ'}, {'ご', 'ゴ'},
        {'さ', 'サ'}, {'し', 'シ'}, {'す', 'ス'}, {'せ', 'セ'}, {'そ', 'ソ'},
        {'ざ', 'ザ'}, {'じ', 'ジ'}, {'ず', 'ズ'}, {'ぜ', 'ゼ'}, {'ぞ', 'ゾ'},
        {'た', 'タ'}, {'ち', 'チ'}, {'つ', 'ツ'}, {'て', 'テ'}, {'と', 'ト'},
        {'だ', 'ダ'}, {'ぢ', 'ヂ'}, {'づ', 'ヅ'}, {'で', 'デ'}, {'ど', 'ド'},
        {'な', 'ナ'}, {'に', 'ニ'}, {'ぬ', 'ヌ'}, {'ね', 'ネ'}, {'の', 'ノ'},
        {'は', 'ハ'}, {'ひ', 'ヒ'}, {'ふ', 'フ'}, {'へ', 'ヘ'}, {'ほ', 'ホ'},
        {'ば', 'バ'}, {'び', 'ビ'}, {'ぶ', 'ブ'}, {'べ', 'ベ'}, {'ぼ', 'ボ'},
        {'ぱ', 'パ'}, {'ぴ', 'ピ'}, {'ぷ', 'プ'}, {'ぺ', 'ペ'}, {'ぽ', 'ポ'},
        {'ま', 'マ'}, {'み', 'ミ'}, {'む', 'ム'}, {'め', 'メ'}, {'も', 'モ'},
        {'や', 'ヤ'}, {'ゆ', 'ユ'}, {'よ', 'ヨ'},
        {'ら', 'ラ'}, {'り', 'リ'}, {'る', 'ル'}, {'れ', 'レ'}, {'ろ', 'ロ'},
        {'わ', 'ワ'}, {'ゐ', 'ヰ'}, {'ゑ', 'ヱ'}, {'を', 'ヲ'}, {'ん', 'ン'},
        {'っ', 'ッ'}, {'ゃ', 'ャ'}, {'ゅ', 'ュ'}, {'ょ', 'ョ'}
    };

    // Check if characters are hiragana-katakana equivalents
    if (hiraganaKatakanaPairs.TryGetValue(a, out var katakanaEquiv) && katakanaEquiv == b)
        return 1; // Low cost for hiragana-katakana equivalents
    if (hiraganaKatakanaPairs.TryGetValue(b, out var hiraganaEquiv) && hiraganaEquiv == a)
        return 1;

    // Similar looking characters (common OCR confusions)
    var similarChars = new[]
    {
        new[] {'シ', 'ツ'}, // shi/tsu confusion
        new[] {'ソ', 'ン'}, // so/n confusion
        new[] {'ク', 'ワ'}, // ku/wa confusion
        new[] {'ヘ', 'へ'}, // he/he (different forms)
        new[] {'ベ', 'べ'}, // be/be (different forms)
        new[] {'ヲ', 'ヲ'}, // wo/wo (different forms)
        new[] {'ヶ', 'ケ'}, // ke/ke variation
        new[] {'ヵ', 'カ'}, // ka/ka variation
    };
Avoid per-cell allocations in GetJapaneseCharacterDifference.
hiraganaKatakanaPairs and similarChars are recreated on every comparison inside the Levenshtein matrix loop. This can significantly increase GC pressure during OCR matching.
Suggested refactor direction
+ private static readonly Dictionary<char, char> HiraganaKatakanaPairs = new Dictionary<char, char>
+ {
+ // ... existing pairs ...
+ };
+
+ private static readonly (char A, char B)[] SimilarChars =
+ {
+ ('シ', 'ツ'),
+ ('ソ', 'ン'),
+ ('ク', 'ワ'),
+ ('ヘ', 'へ'),
+ ('ベ', 'べ'),
+ ('ヶ', 'ケ'),
+ ('ヵ', 'カ'),
+ };
private int GetJapaneseCharacterDifference(char a, char b)
{
if (a == b) return 0;
-
- var hiraganaKatakanaPairs = new Dictionary<char, char>
- {
- // ...
- };
+ // use static caches above
- if (hiraganaKatakanaPairs.TryGetValue(a, out var katakanaEquiv) && katakanaEquiv == b)
+ if (HiraganaKatakanaPairs.TryGetValue(a, out var katakanaEquiv) && katakanaEquiv == b)
return 1;
- if (hiraganaKatakanaPairs.TryGetValue(b, out var hiraganaEquiv) && hiraganaEquiv == a)
+ if (HiraganaKatakanaPairs.TryGetValue(b, out var hiraganaEquiv) && hiraganaEquiv == a)
return 1;
-
- var similarChars = new[]
- {
- // ...
- };
-
- foreach (var pair in similarChars)
+ foreach (var pair in SimilarChars)
{
- if ((a == pair[0] && b == pair[1]) || (a == pair[1] && b == pair[0]))
+ if ((a == pair.A && b == pair.B) || (a == pair.B && b == pair.A))
return 1;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/LanguageProcessing/JapaneseLanguageProcessor.cs` around lines 87 -
129, GetJapaneseCharacterDifference currently allocates hiraganaKatakanaPairs
and similarChars on every call, causing GC pressure; move these into class-level
static readonly fields (e.g., static readonly Dictionary<char,char>
HiraganaKatakanaPairs and static readonly HashSet<(char,char)> or static
readonly (char,char)[] SimilarCharPairs) and initialize them once (static
constructor or inline). Update GetJapaneseCharacterDifference to reference those
static fields (use TryGetValue on HiraganaKatakanaPairs and check membership in
SimilarCharPairs without creating new arrays) so no per-call allocations occur.
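As a language-neutral illustration of the cached-table approach, here is a minimal Python sketch of a weighted Levenshtein distance whose per-character cost function reads from module-level tables built once. The tables are trimmed to a few entries for brevity; the real processor covers the full kana range.

```python
# Build lookup tables once at module load; the inner DP loop then
# performs no per-comparison allocations (the Python analogue of the
# suggested static readonly fields in C#).
HIRAGANA_KATAKANA = {'あ': 'ア', 'い': 'イ', 'う': 'ウ', 'か': 'カ', 'し': 'シ'}
SIMILAR_PAIRS = frozenset(frozenset(p) for p in [('シ', 'ツ'), ('ソ', 'ン'), ('ク', 'ワ')])

def char_cost(a: str, b: str) -> int:
    if a == b:
        return 0
    if HIRAGANA_KATAKANA.get(a) == b or HIRAGANA_KATAKANA.get(b) == a:
        return 1  # hiragana/katakana equivalents: cheap substitution
    if frozenset((a, b)) in SIMILAR_PAIRS:
        return 1  # common OCR confusions: cheap substitution
    return 2      # unrelated characters: full substitution cost

def weighted_levenshtein(s: str, t: str) -> int:
    # Standard two-row dynamic program; only the substitution cost is weighted.
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                     # delete a
                           cur[j - 1] + 1,                  # insert b
                           prev[j - 1] + char_cost(a, b)))  # substitute
        prev = cur
    return prev[-1]

print(weighted_levenshtein('シ', 'ツ'))     # 1 (similar-pair substitution)
print(weighted_levenshtein('あい', 'アイ'))  # 2 (two kana-equivalent substitutions)
```

The same shape carries over to C#: hoisting the dictionary and pair array to static readonly fields leaves the cost function allocation-free inside the matrix loop.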
public override int CalculateLevenshteinDistance(string s, string t)
{
    // Check if both inputs contain Thai characters for Thai-aware comparison
    bool sHasThai = ContainsThai(s);
    bool tHasThai = ContainsThai(t);
Null input can throw in Thai detection path.
CalculateLevenshteinDistance calls ContainsThai directly on s/t; ContainsThai iterates without a null check.
Suggested fix
- bool sHasThai = ContainsThai(s);
- bool tHasThai = ContainsThai(t);
+ bool sHasThai = !string.IsNullOrEmpty(s) && ContainsThai(s);
+ bool tHasThai = !string.IsNullOrEmpty(t) && ContainsThai(t);

 private static bool ContainsThai(string input)
{
+ if (string.IsNullOrEmpty(input)) return false;
foreach (char c in input)
{
// Thai Unicode range (0x0E00-0x0E7F)
if (c >= 0x0E00 && c <= 0x0E7F) return true;
}
return false;
}
Also applies to: 150-156
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/LanguageProcessing/ThaiLanguageProcessor.cs` around lines 25 - 30,
CalculateLevenshteinDistance currently calls ContainsThai(s) and ContainsThai(t)
without guarding against null; update CalculateLevenshteinDistance to null-check
inputs (s and t) before calling ContainsThai and handle nulls appropriately
(e.g., treat null as empty string or short-circuit to default behavior), and
apply the same null-safety change to the other affected method around lines
150-156 that also calls ContainsThai; ensure the checks are added where
ContainsThai is invoked so ContainsThai is never iterated over a null value.
private static string NormalizeTurkishCharacters(string input)
{
    // Convert Turkish characters to standard equivalents for comparison
    return input.ToLowerInvariant()
        .Replace('ğ', 'g')
        .Replace('Ğ', 'G')
        .Replace('ş', 's')
        .Replace('Ş', 'S')
        .Replace('ç', 'c')
        .Replace('Ç', 'C')
        .Replace('ö', 'o')
        .Replace('Ö', 'O')
        .Replace('ü', 'u')
        .Replace('Ü', 'U')
        .Replace('ı', 'i')
        .Replace('İ', 'I');
}
🧩 Analysis chain
🏁 Script executed:
find . -name "TurkishLanguageProcessor.cs" -type f
Repository: WFCD/WFinfo
Length of output: 110
🏁 Script executed:
# Once we find the file, let's examine it
git ls-files | grep -i turkish
Repository: WFCD/WFinfo
Length of output: 108
🌐 Web query:
.NET C# ToLowerInvariant Turkish character İ combining dot U+0307 normalization
💡 Result:
In .NET, ToLowerInvariant() uses culture-insensitive (invariant) Unicode case mapping, not Turkish-specific casing rules. Microsoft explicitly distinguishes culture-sensitive vs culture-insensitive casing and recommends using CultureInfo.InvariantCulture when you need consistent, culture-independent results. (learn.microsoft.com)
Why İ becomes i + U+0307
The Turkish capital dotted I U+0130 (İ) has a Unicode special-casing lowercase mapping (language-insensitive) to two code points: U+0069 (i) + U+0307 (COMBINING DOT ABOVE). (charlottebuff.com)
So in .NET you’ll see:
"\u0130".ToLowerInvariant()→"\u0069\u0307"(renders asi̇)
This also aligns with Unicode’s canonical decomposition data for U+0130 (it decomposes to I + U+0307), which is why the dot-above is represented as a combining mark. (charactercodes.net)
Why normalization doesn’t “fix” it
There is no single precomposed Unicode character for “Latin small letter i with dot above” that would replace i + U+0307, so Normalize(FormC) will typically keep it as two code points.
What to do (depending on intent)
1) If you actually want Turkish casing, use Turkish culture (when appropriate):
var tr = new CultureInfo("tr-TR");
var s = "\u0130".ToLower(tr); // "i" (U+0069)
Unicode’s conditional special-casing for Turkish/Azeri maps U+0130 to plain i (without U+0307). (charlottebuff.com)
2) If you must use invariant casing but need to remove the extra dot (common for identifiers/keys):
Remove U+0307 after decomposition:
string FoldForKey(string input)
{
var lower = input.ToLowerInvariant().Normalize(NormalizationForm.FormD);
var sb = new StringBuilder(lower.Length);
foreach (var ch in lower)
if (ch != '\u0307') sb.Append(ch); // remove COMBINING DOT ABOVE
return sb.ToString().Normalize(NormalizationForm.FormC);
}
(Only do this if it matches your domain rules, because it is no longer linguistically correct Turkish text handling.)
Citations:
- 1: https://learn.microsoft.com/en-us/dotnet/core/extensions/performing-culture-insensitive-case-changes?utm_source=openai
- 2: https://www.charlottebuff.com/unicode/charts/special-casing/?utm_source=openai
- 3: https://www.charactercodes.net/0130?utm_source=openai
- 4: https://charlottebuff.com/unicode/charts/special-casing/?utm_source=openai
🏁 Script executed:
cat -n WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs | sed -n '50,75p'
Repository: WFCD/WFinfo
Length of output: 1020
🏁 Script executed:
# Let's also check the full method and surrounding context
cat -n WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs | sed -n '1,80p'
Repository: WFCD/WFinfo
Length of output: 3301
NormalizeTurkishCharacters has two related bugs with Turkish character handling:
- The uppercase character replacements (lines 61, 63, 65, 67, 69) are unreachable: they execute after ToLowerInvariant() has already converted everything to lowercase.
- More critically, ToLowerInvariant() on "İ" (U+0130, Turkish capital dotted I) produces "i" (U+0069) plus U+0307 (combining dot above), not plain "i". This leaves the combining dot artifact in the normalized output, breaking comparisons.
The fix is to handle Turkish I/İ casing before ASCII folding, using Turkish culture:
// Turkish-safe case handling before ASCII folding
string normalized = input
.Replace('İ', 'i')
.Replace('I', 'ı')
.ToLower(new CultureInfo("tr-TR"));
return normalized
.Replace('ğ', 'g')
.Replace('ş', 's')
.Replace('ç', 'c')
.Replace('ö', 'o')
.Replace('ü', 'u')
.Replace('ı', 'i');
This ensures Unicode's special-case mapping for Turkish is applied correctly: U+0130 (İ) → plain "i", not "i̇".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/LanguageProcessing/TurkishLanguageProcessor.cs` around lines 56 - 72,
NormalizeTurkishCharacters currently calls ToLowerInvariant which makes the
subsequent uppercase Replace(...) calls unreachable and also mishandles 'İ'
(U+0130) by producing an "i" plus a combining dot; fix by applying Turkish-aware
casing and explicit I/İ handling before ASCII-folding: in
NormalizeTurkishCharacters first map 'İ'→'i' and 'I'→'ı' (or call ToLower with
new CultureInfo("tr-TR")) so Unicode special-case mapping for Turkish I is
applied, then perform the ASCII-fold replacements for 'ğ','ş','ç','ö','ü' and
finally map 'ı'→'i'; remove the now-unnecessary uppercase Replace(...) calls.
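The İ special-casing behavior is easy to reproduce outside .NET: Python's culture-insensitive str.lower() applies the same unconditional Unicode mapping for U+0130 as ToLowerInvariant(), so the sketch below (a rough Python analogue of the suggested C# fix, not the project's code) demonstrates both the combining-dot artifact and the pre-folding workaround.

```python
# Lowercasing U+0130 (İ) with the default Unicode mapping yields
# "i" + U+0307 (combining dot above), not a plain "i".
dotted_capital_i = "\u0130"
lowered = dotted_capital_i.lower()
print([hex(ord(c)) for c in lowered])  # ['0x69', '0x307']

def normalize_turkish(text: str) -> str:
    # Map the Turkish I variants explicitly BEFORE case folding,
    # mirroring the review's suggested fix, then ASCII-fold.
    text = text.replace("\u0130", "i").replace("I", "\u0131")  # İ→i, I→ı
    text = text.lower()
    return (text.replace("ğ", "g").replace("ş", "s").replace("ç", "c")
                .replace("ö", "o").replace("ü", "u").replace("\u0131", "i"))

print(normalize_turkish("İĞÜ"))  # igu  (no stray combining dot)
print(normalize_turkish("DIŞ"))  # dis  (dotless I folded correctly)
```

Handling İ/I before the case fold is what keeps the output free of U+0307; folding first and stripping the combining mark afterwards also works, but is easier to get wrong.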
var wnd = MainWindow.INSTANCE;
var disp = wnd?.Dispatcher;
if (disp != null)
    await disp.InvokeAsync(() =>
    {
        wnd.UpdateMarketStatus(msg);
    });
🧩 Analysis chain
🏁 Script executed:
# Find and examine WFInfo/Main.cs
find . -name "Main.cs" -path "*/WFInfo/*"
Repository: WFCD/WFinfo
Length of output: 71
🏁 Script executed:
# Get the file size first
wc -l WFInfo/Main.cs 2>/dev/null || find . -name "Main.cs" | head -5
Repository: WFCD/WFinfo
Length of output: 73
🏁 Script executed:
# Read the specific section around lines 98-104
sed -n '90,115p' WFInfo/Main.cs
Repository: WFCD/WFinfo
Length of output: 978
🏁 Script executed:
# Find the method signature containing these lines
sed -n '70,110p' WFInfo/Main.cs
Repository: WFCD/WFinfo
Length of output: 1616
🏁 Script executed:
# Search for UpdateMarketStatusAsync and how it's called
rg "UpdateMarketStatusAsync|UpdateMarketStatus" WFInfo/Main.cs -B 2 -A 2Repository: WFCD/WFinfo
Length of output: 510
🌐 Web query:
WPF Dispatcher HasShutdownStarted HasShutdownFinished properties
💡 Result:
In WPF, these Dispatcher properties tell you where the dispatcher is in its shutdown lifecycle:
- Dispatcher.HasShutdownStarted: true once shutdown has begun (the ShutdownStarted event is raised at that point). (learn.microsoft.com)
- Dispatcher.HasShutdownFinished: true once shutdown has fully completed (the ShutdownFinished event is raised at that point). (learn.microsoft.com)
Key behavior to know:
- The dispatcher doesn’t finish shutting down until its event queue “unwinds” (i.e., it finishes processing/returning from the current message loop work). (learn.microsoft.com)
- After shutdown begins, pending queued work items are aborted, so Invoke/BeginInvoke work may be canceled/never run once HasShutdownStarted is true. (learn.microsoft.com)
Citations:
- 1: https://learn.microsoft.com/en-us/dotnet/api/system.windows.threading.dispatcher.hasshutdownstarted?view=windowsdesktop-9.0&utm_source=openai
- 2: https://learn.microsoft.com/en-us/dotnet/api/system.windows.threading.dispatcher.hasshutdownfinished?view=windowsdesktop-10.0&utm_source=openai
- 3: https://learn.microsoft.com/en-us/dotnet/api/system.windows.threading.dispatcher.hasshutdownstarted?view=windowsdesktop-9.0&utm_source=openai
- 4: https://learn.microsoft.com/en-us/dotnet/api/system.windows.threading.dispatcher.hasshutdownfinished?view=windowsdesktop-10.0&utm_source=openai
🌐 Web query:
WPF Dispatcher InvokeAsync shutdown exception fire and forget
💡 Result:
In WPF, Dispatcher.InvokeAsync queues work onto a Dispatcher. If the app (or that dispatcher thread) is shutting down, queued work may never run, or it may be canceled/rejected, commonly surfacing as exceptions like “The dispatcher has shutdown started/finished” / TaskCanceledException / OperationCanceledException (depending on timing). WPF itself has to defensively check for shutdown before running posted callbacks. [3]
What to do (safe “fire-and-forget”)
- Don’t post during shutdown
var d = Application.Current?.Dispatcher;
if (d is null || d.HasShutdownStarted || d.HasShutdownFinished) return;
- Use a shutdown cancellation token and pass it to InvokeAsync
InvokeAsync supports a CancellationToken, which you can cancel during Application.Exit, Window.Closing, etc., to prevent pending UI work from running late. [2]
private readonly CancellationTokenSource _uiCts = new();
public void OnAppExit() => _uiCts.Cancel();
public void PostToUi(Action action)
{
var d = Application.Current?.Dispatcher;
if (d is null || d.HasShutdownStarted || d.HasShutdownFinished) return;
_ = d.InvokeAsync(() =>
{
_uiCts.Token.ThrowIfCancellationRequested();
action();
}, DispatcherPriority.Normal, _uiCts.Token).Task
.ContinueWith(t => Log(t.Exception),
TaskContinuationOptions.OnlyOnFaulted);
}
- Always observe/log exceptions from fire-and-forget
If you truly “fire and forget” and never observe the returned Task, failures can be missed or show up later as unobserved task exceptions. Attaching a fault-only continuation is a common pattern. (The WPF threading/dispatcher model is explicitly queue-based, so you need to treat these as real async operations.) [1]
If you’re seeing shutdown-related dispatcher exceptions frequently
It can also indicate a dispatcher running on a worker thread that isn’t being shut down cleanly before process/AppDomain exit. That’s a known cause of shutdown-time dispatcher exceptions; the correct fix is to manage that dispatcher’s lifetime and shut it down properly. [4]
Sources: WPF threading/Dispatcher overview [1], InvokeAsync cancellation guidance [2], WPF source showing shutdown checks around posted callbacks [3], shutdown-time exceptions when dispatchers aren’t properly shut down [4].
Guard InvokeAsync against dispatcher shutdown in UpdateMarketStatusAsync.
disp != null alone is insufficient during teardown. InvokeAsync can throw TaskCanceledException or OperationCanceledException after shutdown starts, and since UpdateMarketStatus fires asynchronously without observing exceptions, failures are silently lost.
Add checks for dispatcher shutdown state before invoking:
Suggested fix
var wnd = MainWindow.INSTANCE;
var disp = wnd?.Dispatcher;
- if (disp != null)
- await disp.InvokeAsync(() =>
- {
- wnd.UpdateMarketStatus(msg);
- });
+ if (disp == null || disp.HasShutdownStarted || disp.HasShutdownFinished)
+ return;
+
+ await disp.InvokeAsync(() =>
+ {
+ wnd.UpdateMarketStatus(msg);
+ });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/Main.cs` around lines 98 - 104, The dispatcher call in
UpdateMarketStatusAsync using MainWindow.INSTANCE.Dispatcher must be guarded
against shutdown: before calling disp.InvokeAsync(() =>
wnd.UpdateMarketStatus(msg)) check disp != null && !disp.HasShutdownStarted &&
!disp.HasShutdownFinished and ensure wnd is not null, then await the returned
DispatcherOperation.Task and wrap the await in a try/catch that catches
TaskCanceledException and OperationCanceledException (and optionally
InvalidOperationException) so exceptions during dispatcher shutdown are
observed/handled instead of being silently lost; reference MainWindow.INSTANCE,
Dispatcher, InvokeAsync, and UpdateMarketStatus when applying this change.
var testCase = JsonConvert.DeserializeObject<TestCase>(File.ReadAllText(jsonPath));
result.Language = testCase.Language ?? "unknown";
result.Theme = testCase.Theme ?? "auto";
result.Category = testCase.Category ?? "reward";
result.ExpectedParts = testCase.Parts?.Values.ToList() ?? new List<string>();
Guard null TestCase after deserialization.
Line [134] dereferences testCase without a null check; malformed scenario JSON currently turns into an unstructured runner error.
🛠️ Proposed fix
// Load spec
var testCase = JsonConvert.DeserializeObject<TestCase>(File.ReadAllText(jsonPath));
+ if (testCase == null)
+ {
+ result.ErrorMessage = $"Invalid test case JSON: {jsonPath}";
+ result.Success = false;
+ stopwatch.Stop();
+ result.ProcessingTimeMs = stopwatch.ElapsedMilliseconds;
+ return result;
+ }
result.Language = testCase.Language ?? "unknown";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/Tests/OCRTestRunner.cs` around lines 133 - 138, After deserializing
into TestCase, guard against testCase being null: check if testCase == null
right after JsonConvert.DeserializeObject<TestCase>(...) and handle it
explicitly (for example, throw a descriptive exception or set the runner result
to a failed/error state and return) instead of dereferencing; ensure you
reference the same symbols used in the diff (testCase, result.Language,
result.Theme, result.Category, result.ExpectedParts, and testCase.Parts) so the
code either assigns defaults or aborts cleanly when the JSON is malformed.
<PropertyGroup>
  <UpdateAssemblyVersion>9.8.0.0</UpdateAssemblyVersion>
</PropertyGroup>
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "1) update.xml version source in WFInfo/WFInfo.csproj"
rg -nP '<UpdateAssemblyVersion>([^<]+)</UpdateAssemblyVersion>|<WriteLinesToFile[^>]*version' WFInfo/WFInfo.csproj || echo "No matches found"
echo
echo "2) Version properties declared in WFInfo/WFInfo.csproj"
rg -nP '<(Version|AssemblyVersion|FileVersion|InformationalVersion)>([^<]+)</(Version|AssemblyVersion|FileVersion|InformationalVersion)>' WFInfo/WFInfo.csproj || echo "No matches found"
echo
echo "3) Assembly version attributes in AssemblyInfo files"
fd -HI 'AssemblyInfo.cs$' -x rg -nP 'Assembly(File|Informational)?Version\("([^"]+)"\)' {} || echo "No AssemblyInfo.cs found"
echo
echo "4) Full GenerateUpdateXml target"
rg -A 30 'Target Name="GenerateUpdateXml"' WFInfo/WFInfo.csproj || echo "Target not found"
echo
echo "5) Check lines 50-52 as mentioned in review"
sed -n '48,54p' WFInfo/WFInfo.csproj
Repository: WFCD/WFinfo
Length of output: 2885
Eliminate version drift risk by deriving updater version from built assembly.
UpdateAssemblyVersion at line 51 is maintained separately from AssemblyVersion in WFInfo/Properties/AssemblyInfo.cs (lines 54-55). Though currently synchronized at 9.8.0.0, this dual maintenance point creates drift risk: if the assembly version is updated but UpdateAssemblyVersion is forgotten, update.xml at line 142 will advertise an incorrect version, breaking update behavior.
Derive the version from the compiled assembly using GetAssemblyIdentity instead:
💡 Proposed fix
- <PropertyGroup>
- <UpdateAssemblyVersion>9.8.0.0</UpdateAssemblyVersion>
- </PropertyGroup>
+ <!-- Remove hard-coded updater version; resolve from built assembly -->
<Target Name="GenerateUpdateXml" AfterTargets="Build">
+ <GetAssemblyIdentity AssemblyFiles="$(TargetPath)">
+ <Output TaskParameter="Assemblies" ItemName="BuiltAssemblyIdentity" />
+ </GetAssemblyIdentity>
<PropertyGroup>
<UpdateXmlPath>$(OutputPath)update.xml</UpdateXmlPath>
+ <ResolvedUpdateVersion>@(BuiltAssemblyIdentity->'%(Version)')</ResolvedUpdateVersion>
</PropertyGroup>
 <WriteLinesToFile File="$(UpdateXmlPath)" Lines="&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;" Overwrite="true" />
 <WriteLinesToFile File="$(UpdateXmlPath)" Lines="&lt;item&gt;" Overwrite="false" />
- <WriteLinesToFile File="$(UpdateXmlPath)" Lines=" &lt;version&gt;$(UpdateAssemblyVersion)&lt;/version&gt;" Overwrite="false" />
+ <WriteLinesToFile File="$(UpdateXmlPath)" Lines=" &lt;version&gt;$(ResolvedUpdateVersion)&lt;/version&gt;" Overwrite="false" />
 <WriteLinesToFile File="$(UpdateXmlPath)" Lines=" &lt;url&gt;https://github.com/WFCD/WFinfo/releases/latest/download/WFInfo.zip&lt;/url&gt;" Overwrite="false" />
 <WriteLinesToFile File="$(UpdateXmlPath)" Lines=" &lt;changelog&gt;https://github.com/WFCD/WFinfo/releases/latest&lt;/changelog&gt;" Overwrite="false" />
 <WriteLinesToFile File="$(UpdateXmlPath)" Lines=" &lt;mandatory&gt;false&lt;/mandatory&gt;" Overwrite="false" />
 <WriteLinesToFile File="$(UpdateXmlPath)" Lines="&lt;/item&gt;" Overwrite="false" />
- <Message Text="Generated update.xml at $(UpdateXmlPath) with version $(UpdateAssemblyVersion)" Importance="high" />
+ <Message Text="Generated update.xml at $(UpdateXmlPath) with version $(ResolvedUpdateVersion)" Importance="high" />
</Target>Also applies to line 142 (where version is written).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@WFInfo/WFInfo.csproj` around lines 50 - 52, The project currently keeps
UpdateAssemblyVersion in WFInfo.csproj in sync with AssemblyVersion in
Properties/AssemblyInfo.cs, which risks drift; remove the hardcoded
<UpdateAssemblyVersion> entry and instead derive the updater version by running
the MSBuild GetAssemblyIdentity task against the built assembly (the output
assembly from the project) and assign its Version metadata to the property used
when generating update.xml; update the MSBuild target that writes update.xml to
use the property produced by GetAssemblyIdentity (referencing the project output
assembly and the GetAssemblyIdentity task) so the value written to update.xml
always matches the compiled AssemblyVersion declared in AssemblyInfo.cs.
Add support for all languages except Thai, Japanese, and Turkish.
Add boilerplate for a future testing framework to monitor OCR improvements and regressions.