
Fix XGrammar bitmask initialization and add null check for gen_config in generate method #4349

Open
windreamer wants to merge 3 commits into InternLM:main from windreamer:fix_guided_decoding_reuse
Conversation


@windreamer (Collaborator) commented on Feb 11, 2026

Motivation

This PR addresses two critical issues in LMDeploy's guided generation and batch inference functionality:

  1. XGrammar Bitmask Initialization Bug: The guided decoding mechanism initializes token bitmasks to control which tokens are allowed during constrained generation. Previously, the bitmask was initialized with zeros, which incorrectly blocked all tokens instead of allowing all tokens. This caused generation failures when certain sequences in a batch didn't have grammar constraints applied.

  2. Silent Failure on None gen_config: The batch_infer method accepts a list of GenerationConfig objects, but when None was passed for individual items, the code would fail or hang silently because downstream code accessed gen_config.max_new_tokens on a None value.

Modifications

  1. Guided Decoding Bitmask Fix (src/turbomind/generation/guided_decoding.cc):

    • Changed the bitmask initialization value from 0 to -1 (all bits set to 1 in two's complement representation for int32_t).
    • This ensures that when a sequence doesn't have an active grammar matcher, all tokens are permitted by default rather than none.
  2. Null GenerationConfig Guard (lmdeploy/serve/core/async_engine.py):

    • Added an explicit check to instantiate a default GenerationConfig() when gen_config is None.
    • This prevents downstream code from attempting to access attributes on a null object, which was causing the engine to hang silently.
  3. Regression Test (tests/test_lmdeploy/test_grammar.py):

    • Added test_mix_guided_matrix to verify that batch inference works correctly when mixing guided and unguided generation in the same batch.
    • The test ensures unguided sequences produce arbitrary text (not necessarily conforming to schema) while guided sequences strictly follow the JSON schema constraints.
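The bitmask fix above can be illustrated with a small Python sketch of the packed-bitmask semantics. This is not the actual fix (which lives in the C++ file src/turbomind/generation/guided_decoding.cc); the vocabulary size, word layout, and helper names here are illustrative, but they mirror why a 0-filled int32 bitmask blocks everything while a -1-filled one allows everything:

```python
VOCAB_SIZE = 100
BITS_PER_WORD = 32  # allow/deny bits are packed into int32 words

num_words = (VOCAB_SIZE + BITS_PER_WORD - 1) // BITS_PER_WORD

# Old behaviour: initializing with 0 clears every bit, so every token is blocked.
blocked = [0] * num_words

# Fixed behaviour: -1 is all-ones in two's complement int32, so every token is allowed.
allowed = [-1] * num_words

def is_token_allowed(bitmask, token_id):
    """Check the allow-bit for token_id, mirroring the packed int32 layout."""
    word = bitmask[token_id // BITS_PER_WORD] & 0xFFFFFFFF  # view as unsigned 32-bit
    return bool((word >> (token_id % BITS_PER_WORD)) & 1)
```

With the old zero initialization, a sequence that had no active grammar matcher would see every bit cleared and sampling would have no legal token; initializing with -1 makes "no constraint" mean "everything allowed" by default.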
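The None-guard for gen_config can be sketched as follows. This is a simplified stand-in, not the actual code in lmdeploy/serve/core/async_engine.py: the dataclass, its default value, and the helper name are illustrative, but the pattern is the same, so downstream attribute access such as gen_config.max_new_tokens never hits None:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class GenerationConfig:
    """Illustrative stand-in for lmdeploy's GenerationConfig."""
    max_new_tokens: int = 512

def normalize_gen_configs(prompts: List[str],
                          gen_config: Optional[List[Optional[GenerationConfig]]]):
    """Substitute a default GenerationConfig wherever None was passed,
    so later attribute access cannot fail on a None entry."""
    if gen_config is None:
        gen_config = [None] * len(prompts)
    return [cfg if cfg is not None else GenerationConfig() for cfg in gen_config]
```

A caller can then mix explicit configs and None freely in one batch, e.g. `normalize_gen_configs(["a", "b"], [None, GenerationConfig(max_new_tokens=8)])` yields a default config for the first prompt and the explicit one for the second.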

Copilot AI left a comment
Pull request overview

This pull request fixes a bug where grammar constraints from structured output requests incorrectly persist and apply to subsequent non-structured output requests when ModelRequest instances are reused. The fix introduces a clearGrammar() method to explicitly reset the grammar state after each inference session completes.

Changes:

  • Added clearGrammar() C++ method to ModelRequest class for resetting grammar state
  • Added Python binding clear_grammar() to expose the cleanup functionality
  • Called clear_grammar() in the finally block of async_stream_infer() to ensure cleanup

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Changed files:

  • src/turbomind/engine/model_request.h: declares the new clearGrammar() method
  • src/turbomind/engine/model_request.cc: implements clearGrammar() to reset the grammar_ shared_ptr
  • src/turbomind/python/bind.cpp: adds the Python binding for clear_grammar with appropriate GIL handling
  • lmdeploy/turbomind/turbomind.py: calls clear_grammar() in the finally block after inference completes
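The cleanup pattern described above can be sketched in pure Python. This is a simplified stand-in: the real ModelRequest is a C++ class exposed via pybind11 and the real call site is the async_stream_infer coroutine, so the class body, the synchronous generator, and the "json-schema" grammar value here are illustrative:

```python
class ModelRequest:
    """Simplified stand-in for the C++ ModelRequest exposed to Python."""

    def __init__(self):
        self.grammar = None  # mirrors the grammar_ shared_ptr

    def set_grammar(self, grammar):
        self.grammar = grammar

    def clear_grammar(self):
        # Reset grammar state so a reused request does not carry
        # constraints from a structured request into the next one.
        self.grammar = None

def stream_infer(request, grammar=None):
    """Sketch of the inference loop (the real one is async)."""
    if grammar is not None:
        request.set_grammar(grammar)
    try:
        yield "token"  # would stream generated tokens here
    finally:
        # Runs on normal completion, error, or cancellation, so the
        # reused ModelRequest always starts the next session clean.
        request.clear_grammar()
```

Putting the clear_grammar() call in finally is the key design choice: even if the client disconnects mid-stream and the generator is closed early, the grammar state is still reset before the request object is reused.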


Copilot AI left a comment

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

@windreamer force-pushed the fix_guided_decoding_reuse branch 2 times, most recently from f606bb6 to f81c224, on February 12, 2026 at 13:33
@windreamer requested a review from Copilot on February 12, 2026 at 13:34
Copilot AI left a comment

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.


@windreamer force-pushed the fix_guided_decoding_reuse branch from de1e35e to f74ca9b on February 13, 2026 at 09:41
@windreamer changed the title from "fix: add clear_grammar to remove grammar from reused model_request" to "Fix XGrammar bitmask initialization and add null check for gen_config in generate method" on Feb 13, 2026
Copilot AI left a comment

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

