Add an auto-generated unicode character category file#4605

Open
TheBlueMatt wants to merge 2 commits into lightningdevkit:main from TheBlueMatt:2026-05-unicode-autogen

Conversation

@TheBlueMatt
Collaborator

1a01b5a added detection of unicode format characters in PrintableString, but used a hard-coded table which may eventually become out of date.

Here we switch to an auto-generated table, include all General_Category Other characters, and also ban unallocated code points.

Finally, CI validates that the file is kept up to date.

Written by Claude
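
A minimal sketch of the kind of range-table lookup such a generated file might contain, assuming a sorted, non-overlapping list of inclusive code point ranges searched with a binary search. The names `OTHER_RANGES` and `is_other_sketch` are hypothetical, and the table is a tiny hand-picked excerpt rather than the generated data; the real `unicode.rs` may be structured differently.

```rust
use std::cmp::Ordering;

/// Hypothetical excerpt: sorted, non-overlapping inclusive ranges of code
/// points whose General_Category is Cc, Cf or Co. Illustration only; a
/// generated table would list every such range.
const OTHER_RANGES: &[(u32, u32)] = &[
    (0x0000, 0x001F), // Cc: C0 controls
    (0x007F, 0x009F), // Cc: DEL and C1 controls
    (0x00AD, 0x00AD), // Cf: soft hyphen
    (0x200B, 0x200F), // Cf: zero-width and directional marks
    (0xE000, 0xF8FF), // Co: BMP private use area
];

/// Returns true if `c` falls inside one of the ranges in `OTHER_RANGES`.
fn is_other_sketch(c: char) -> bool {
    let cp = c as u32;
    OTHER_RANGES
        .binary_search_by(|&(lo, hi)| {
            if hi < cp {
                // The whole range lies below the target code point.
                Ordering::Less
            } else if lo > cp {
                // The whole range lies above the target code point.
                Ordering::Greater
            } else {
                // lo <= cp <= hi: the target is inside this range.
                Ordering::Equal
            }
        })
        .is_ok()
}
```

With this excerpt, `is_other_sketch('\u{200D}')` (zero-width joiner) is true and `is_other_sketch('a')` is false. Cs is left out of the excerpt because a Rust `char` can never hold a surrogate code point.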

@ldk-reviews-bot

ldk-reviews-bot commented May 7, 2026

👋 Thanks for assigning @tnull as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@TheBlueMatt TheBlueMatt requested a review from tnull May 7, 2026 18:47
@TheBlueMatt TheBlueMatt force-pushed the 2026-05-unicode-autogen branch from b6f8c03 to bd75483 Compare May 7, 2026 18:49
Comment on lines +36 to +38
let is_other = is_unicode_general_category_other(c);
let is_unassigned = is_unicode_general_category_unassigned(c);
let c = if c.is_control() || is_other || is_unassigned {
Collaborator

Nit: c.is_control() is now fully redundant — it checks Cc (Control), which is already covered by is_unicode_general_category_other (see 0x0000..=0x001F and 0x007F..=0x009F in unicode.rs). The old code needed it because is_format_char only covered Cf, but the new function covers all of Cc / Cf / Cs / Co.

Not a bug (the || short-circuits harmlessly), but it's potentially confusing because it suggests is_other doesn't handle control characters.

Suggested change
- let is_other = is_unicode_general_category_other(c);
- let is_unassigned = is_unicode_general_category_unassigned(c);
- let c = if c.is_control() || is_other || is_unassigned {
+ let c = if is_unicode_general_category_other(c) || is_unicode_general_category_unassigned(c) {
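
The redundancy claim in the comment above can be checked with a short sketch. Rust's `char::is_control` tests for General_Category Cc, and Cc consists of exactly U+0000..=U+001F and U+007F..=U+009F, the two ranges the comment cites. The helpers `is_cc` and `first_mismatch` below are hypothetical names introduced for illustration and are not part of the PR.

```rust
/// Code points with General_Category Cc: U+0000..=U+001F and U+007F..=U+009F.
fn is_cc(c: char) -> bool {
    matches!(c as u32, 0x0000..=0x001F | 0x007F..=0x009F)
}

/// Scans every Unicode scalar value and returns the first one for which
/// `char::is_control` disagrees with the two Cc ranges, if there is one.
fn first_mismatch() -> Option<char> {
    ('\0'..=char::MAX).find(|&c| c.is_control() != is_cc(c))
}
```

If `first_mismatch()` returns `None`, any table that contains both ranges already covers everything `c.is_control()` would catch, which is why the extra check can be dropped.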

@ldk-claude-review-bot
Collaborator

ldk-claude-review-bot commented May 7, 2026

Review Summary

The CI workflow file (.github/workflows/check_unicode.yml) has three bugs that will prevent it from functioning at all. The generator script and generated Rust code are correct.

Inline comments posted:

  • .github/workflows/check_unicode.yml:7: Missing jobs: key. check-unicode is nested under on: instead of under a jobs: top-level key, so GitHub Actions will never run the job.
  • .github/workflows/check_unicode.yml:11-12: Missing actions/checkout step. Without checking out the repo, contrib/gen_unicode_general_category.py and lightning-types/src/unicode.rs won't exist on the runner.
  • .github/workflows/check_unicode.yml:23: fi indentation breaks YAML and shell. fi is at 8-space indent while the literal block scalar content is at 10-space indent, so fi falls outside the script, causing both a YAML parse error and an unterminated if in the shell.

Previously flagged (still applicable):

  • lightning-types/src/string.rs:38: c.is_control() is redundant with is_unicode_general_category_other, which already covers all Cc characters.

@codecov

codecov Bot commented May 7, 2026

Codecov Report

❌ Patch coverage is 92.23301% with 40 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.16%. Comparing base (1a01b5a) to head (2de5954).
⚠️ Report is 60 commits behind head on main.

Files with missing lines | Patch % | Lines
lightning-types/src/unicode.rs | 92.18% | 40 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4605      +/-   ##
==========================================
+ Coverage   86.09%   86.16%   +0.07%     
==========================================
  Files         157      158       +1     
  Lines      108828   109323     +495     
  Branches   108828   109323     +495     
==========================================
+ Hits        93694    94199     +505     
+ Misses      12519    12509      -10     
  Partials     2615     2615              
Flag | Coverage Δ
tests | 86.16% <92.23%> (+0.07%) ⬆️

Contributor

@tnull tnull left a comment

Alright, seems maybe a bit excessive, but fine by me. No need to check this on every CI run though, let's just move it to a weekly cronjob?

Comment thread .github/workflows/build.yml Outdated
- name: Run default clippy linting
  run: |
    ./ci/check-lint.sh
- name: Check Unicode general-category table is up to date
Contributor

Is it worth doing this every time we run CI? Can't we add it as a weekly cronjob instead?

Collaborator Author

I mean not sure it matters that much, but, done.

Contributor

@tnull tnull left a comment

Looks good, please squash.

schedule:
- cron: '42 3 * * *'

check-unicode:
Collaborator

Bug: Missing jobs: key. check-unicode is nested under on: (the trigger block) instead of under its own jobs: top-level key. GitHub Actions will either reject this workflow or silently run zero jobs.

Every other workflow in this repo has the correct structure:

on:
  ...

jobs:
  check-unicode:
    ...

You need to add jobs: at the top level (zero indent) before check-unicode:, and un-indent check-unicode: out of the on: block.

Comment on lines +11 to +12
steps:
- name: Check unicode file state
Collaborator

Bug: Missing actions/checkout step. Without checking out the repository, neither contrib/gen_unicode_general_category.py nor lightning-types/src/unicode.rs will exist on the runner, so the run step will fail immediately.

Every other workflow in this repo checks out first. Add:

    steps:
      - uses: actions/checkout@v4
      - name: Check unicode file state

RUN_URL="https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}"
BODY="The unicode character listing is out of date, see $RUN_URL"
gh issue create --title "$TITLE" --body "$BODY"
fi
Collaborator

Bug: fi is indented at 8 spaces, but the YAML literal block scalar (run: |) has its content indented at 10 spaces. In YAML, once the block scalar's indentation is established by the first content line, any line with less indentation terminates the block. So fi falls outside the shell script, producing:

  1. A shell syntax error (unterminated if)
  2. A YAML parse error (fi isn't a valid key at this level)

Fix by indenting fi to 10 spaces (matching the rest of the script body):

Suggested change
-        fi
+          fi

@tnull
Contributor

tnull commented May 8, 2026

Claude's reviews seem legit though
