[Contribution] SolarOpenForCausalLM Support #65
Open
lifelongeeek wants to merge 10 commits into aws-neuron:main from
Conversation
Adds NXD inference support for Solar Open MoE (SolarOpenForCausalLM) models
from upstage. Solar Open is a 100B-parameter MoE model sharing the DeepSeek
routing architecture but with distinct weight layout and RoPE configuration.
Key components:
- NeuronSolarOpenForCausalLM: top-level CausalLM model class
- NeuronSolarOpenModel: transformer body (all layers are MoE, first_k_dense_replace=0)
- NeuronSolarOpenAttention: multi-head GQA with full RoPE (partial_rotary_factor=1.0)
and optional YaRN scaling (SolarOpenYarnRotaryEmbedding)
- NeuronSolarOpenDecoderLayer: decoder layer with MoE MLP
- SolarOpenInferenceConfig: config loader with field mapping and defaults for
fields absent from the upstage/Solar-Open-100B config.json
- NeuronSolarOpenRouter: reuses GLM-4.5 group-limited routing logic
- initialize_solar_open_moe_module: wires router + ExpertMLPsV2 + SharedExperts
Weight conversion: HF per-expert format -> NXD fused format
HF: mlp.experts.{e}.{gate,up}_proj.weight [I, H] (per-expert)
NXD: mlp.experts.gate_up_proj [E, H, 2I] (fused)
Supports YaRN RoPE scaling (factor=2.0, original_max_position_embeddings=65536)
as used in upstage/Solar-Open-100B.
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
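The HF-to-NXD weight-layout conversion described above can be sketched in numpy. This is a hedged illustration only: the gate-then-up concatenation order, the transpose to `[H, I]`, and the dimensions are assumptions about the fused layout, not the exact conversion code in the PR.

```python
import numpy as np

# Illustrative dimensions; the real model uses E=128 experts, hidden_size=4096.
E, H, I = 4, 8, 16  # experts, hidden size, intermediate size

# HF checkpoint stores one [I, H] matrix per expert and per projection.
hf_weights = {
    f"mlp.experts.{e}.{proj}_proj.weight": np.random.randn(I, H).astype(np.float32)
    for e in range(E)
    for proj in ("gate", "up")
}

def fuse_gate_up(weights, num_experts):
    """Fuse per-expert gate/up [I, H] weights into one [E, H, 2I] tensor."""
    fused = []
    for e in range(num_experts):
        gate = weights[f"mlp.experts.{e}.gate_proj.weight"]  # [I, H]
        up = weights[f"mlp.experts.{e}.up_proj.weight"]      # [I, H]
        # Transpose each to [H, I] and concatenate on the last dim -> [H, 2I].
        fused.append(np.concatenate([gate.T, up.T], axis=-1))
    return np.stack(fused)  # [E, H, 2I]

fused = fuse_gate_up(hf_weights, E)
assert fused.shape == (E, H, 2 * I)
```

The same stacking pattern would apply to `down_proj`, minus the gate/up fusion step.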
- examples/generation_solar_open_demo.py: compile + run inference demo for the tiny random Solar Open model (tp_degree=4, moe_tp_degree=4)
- examples/generation_solar_open_100b_demo.py: compile + run inference demo for upstage/Solar-Open-100B (requires trn2.48xlarge or larger)
- test_solar_open_accuracy.py: CPU vs Neuron token-matching accuracy test with a standalone PyTorch reference implementation; verified passing (10/10 tokens match with greedy decoding on the tiny random model)
- test_solar_open_100b_accuracy.py: CPU vs Neuron accuracy test for the full 100B model; includes a YaRN RoPE CPU reference implementation
- create_solar_open_tiny_random.py: creates a small random-weight Solar Open checkpoint matching the 100B architecture for local testing
- create_solar_open_100b_random.py: creates a 2-layer random-weight checkpoint with full 100B dimensions (128 experts, hidden_size=4096) for integration testing on larger instances

Hardware note: upstage/Solar-Open-100B (48 layers, 128 experts) requires ~168 GB of total weights; trn2.48xlarge (64 NeuronCores) recommended.

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
- docs/solar_open_implementation.md: architecture overview, module breakdown, weight conversion details (per-expert HF -> fused NXD), YaRN RoPE notes, and sharding configuration guide
- docs/solar_open_testing.md: step-by-step testing guide with the tiny random model, expected outputs, and troubleshooting notes
- docs/solar_open_100b.md: full experiment report for upstage/Solar-Open-100B including discovered library limitations (EP + token generation unsupported), hardware requirements analysis, and runbook for large instance deployment

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
…pter.py tensor_capture_hook was referenced in prepare_inputs_for_generation() but never initialized from kwargs, causing a NameError at runtime. input_capture_hook (the correct variable) is already present and extracted from kwargs on L265. Removed the dangling tensor_capture_hook entry from the model_inputs dict.
- Remove src/neuronx_distributed_inference/models/solar_open/ (modeling + __init__)
- Remove root-level scripts: create_solar_open_*.py, test_solar_open_*.py
- Remove examples/generation_solar_open_demo.py, generation_solar_open_100b_demo.py
- Remove docs/solar_open_*.md

All files are now in contrib/models/solar_open/ following the NxDI contrib pattern.
… README

Restructures Solar Open support to follow the NxDI contrib/ contribution pattern, identical to what was done for GLM-4.5 MoE (PR #58).

contrib/models/solar_open/
├── src/solar_open/
│   ├── __init__.py
│   └── modeling_solar_open.py (NeuronSolarOpenForCausalLM + config + loader)
├── test/
│   ├── conftest.py (session-scoped fixtures)
│   ├── integration/
│   │   ├── config_solar_open_2layers.json
│   │   ├── utils.py (SolarOpenReferenceModel, get_neuron_config)
│   │   └── test_model.py (smoke, output shape, determinism)
│   └── unit/
│       ├── test_router.py (10 tests, object.__new__ bypass pattern)
│       ├── test_attention.py (8 tests, inspect-based)
│       └── test_decoder.py (16 tests, all-MoE architecture verification)
├── examples/generation_solar_open_demo.py
└── README.md

examples/generation_solar_open.py (top-level production benchmark script)

Key Solar Open specifics:
- NOT in transformers: uses load_solar_open_config() custom JSON loader
- first_k_dense_replace=0: ALL layers are MoE (no dense branch)
- Full RoPE (partial_rotary_factor=1.0), no QK norm, no attention bias
- Router: sigmoid + group routing + e_score_correction_bias (identical to GLM-4.5)
- Production config: tp_degree=32, moe_tp_degree=4, moe_ep_degree=8

Unit tests: 40/40 PASSED (CPU, no Neuron hardware required)
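The sigmoid + group-limited routing with e_score_correction_bias shared with GLM-4.5 can be sketched as follows. This is a hedged, single-token numpy illustration of the DeepSeek-style scheme: the bias influences only expert *selection*, while the returned gate weights use the unbiased sigmoid scores. Function and parameter names are illustrative, not the PR's actual API.

```python
import numpy as np

def group_limited_topk(logits, bias, n_group, topk_group, top_k):
    """Select top_k experts, restricted to the topk_group best groups.
    `bias` (e_score_correction_bias) shifts selection only; the gate
    weights come from the raw sigmoid scores."""
    scores = 1.0 / (1.0 + np.exp(-logits))   # sigmoid gating scores
    biased = scores + bias                    # selection-only correction
    E = logits.shape[-1]
    per_group = E // n_group

    # Rank groups by the sum of each group's top-2 biased scores.
    grouped = biased.reshape(n_group, per_group)
    group_rank = np.sort(grouped, axis=-1)[:, -2:].sum(-1)
    keep = np.argsort(group_rank)[-topk_group:]

    # Mask out experts in non-selected groups, then take the global top-k.
    mask = np.full(E, -np.inf)
    for g in keep:
        mask[g * per_group:(g + 1) * per_group] = 0.0
    experts = np.argsort(biased + mask)[-top_k:]
    return experts, scores[experts]

rng = np.random.default_rng(0)
experts, weights = group_limited_topk(rng.normal(size=16), np.zeros(16),
                                      n_group=4, topk_group=2, top_k=4)
```

By construction, all selected experts fall inside at most `topk_group` groups, which is what bounds cross-device traffic under expert parallelism.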
- modeling_solar_open.py: add transformers_version to SolarOpenInferenceConfig so HuggingFaceGenerationAdapter does not propagate None into generation_config
- modeling_solar_open.py: override _construct_output to unwrap list/tuple logits returned by the NxDI Neuron runtime into a single tensor (required for hf_adapter logits slicing in _sample)
- test/conftest.py: copy generation_config.json to traced_dir alongside weights
- test/integration/utils.py: generate generation_config.json in model_dir; handle list/tuple logits in check_logit_accuracy
- test/integration/test_model.py: patch adapter.generation_config.transformers_version as a fallback safety guard; all 5 integration tests now pass
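The logits-unwrapping fix above amounts to collapsing a possibly nested list/tuple wrapper down to the first tensor. A minimal sketch, with the helper name invented here for illustration:

```python
def unwrap_logits(outputs):
    """If the runtime wraps logits in a list/tuple (possibly nested),
    take the first element until a bare tensor remains, so downstream
    logits slicing in _sample sees a single tensor."""
    logits = outputs
    while isinstance(logits, (list, tuple)):
        logits = logits[0]
    return logits

# Works on bare values, single wrappers, and nested wrappers alike.
result = unwrap_logits([("logits_tensor",)])
```

The actual override lives in `_construct_output`; this sketch only shows the unwrapping rule.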
Solar Open has been merged into transformers main (https://github.com/huggingface/transformers/blob/main/src/transformers/models/solar_open/) but is not yet available in stable releases (≤4.56.2). Update all comments and docstrings that incorrectly stated it was absent from transformers entirely. Also update PR#3 description table and Architecture Notes section accordingly.
- load_hf_model(): use SolarOpenForCausalLM.from_pretrained() instead of loading safetensors directly (transformers >= 5.0.0 includes solar_open)
- load_solar_open_config(): use SolarOpenConfig.from_pretrained() with rope_parameters -> rope_theta/rope_scaling conversion for NxDI compatibility
- Fix transformers 5.0 rename: SampleDecoderOnlyOutput -> GenerateDecoderOnlyOutput
- test/integration/utils.py: replace the 300-line SolarOpenReferenceModel with SolarOpenForCausalLM as the CPU reference; create_tiny_solar_open_model() now uses save_pretrained() (auto-writes config.json + generation_config.json); check_text_accuracy() uses logit MAE vs SolarOpenForCausalLM
- test/conftest.py: add transformers 5.0 compat shims (utils.fx stub, SampleDecoderOnlyOutput alias); remove the config_solar_open_2layers.json dependency
- test/integration/test_model.py: update accuracy test docstring/args
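The rope_parameters -> rope_theta/rope_scaling conversion above can be sketched as a small dict transform. This is a hedged illustration: the exact keys inside `rope_parameters` and the `"default"` sentinel are assumptions about the transformers-5.x config shape, not verified against the PR's loader.

```python
def convert_rope_parameters(cfg: dict) -> dict:
    """Map a transformers-5.x style `rope_parameters` dict back to the
    separate `rope_theta` / `rope_scaling` fields NxDI expects."""
    rope = cfg.pop("rope_parameters", None)
    if rope is None:
        return cfg  # already in the legacy shape
    cfg["rope_theta"] = rope.get("rope_theta", 10000.0)
    # Only emit rope_scaling when scaling (e.g. YaRN) is actually enabled.
    if rope.get("rope_type", "default") != "default":
        cfg["rope_scaling"] = {k: v for k, v in rope.items() if k != "rope_theta"}
    return cfg

cfg = convert_rope_parameters({
    "hidden_size": 4096,
    "rope_parameters": {"rope_theta": 1e6, "rope_type": "yarn", "factor": 2.0},
})
```

A plain-RoPE config (no `rope_parameters`, or `rope_type == "default"`) passes through with only `rope_theta` set.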
- Remove examples/generation_solar_open.py (A); generation_solar_open_demo.py
(B) under contrib/models/solar_open/examples/ is the canonical demo script
- Revert src/neuronx_distributed_inference/utils/hf_adapter.py to upstream state
- contrib/models/solar_open/test/conftest.py: add shim 3 to work around
hf_adapter.py upstream issue where tensor_capture_hook is undefined —
(a) inject None into module globals to resolve the NameError via LOAD_GLOBAL,
(b) wrap prepare_inputs_for_generation to strip tensor_capture_hook from
model_inputs so NeuronBaseForCausalLM.forward() does not receive it
- Update generation_solar_open_demo.py: remove outdated 'not in transformers'
comment, add generation_config.json to the post-compile copy step, add
production 100B config hints in argparse help and get_neuron_config()
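The two-part conftest shim described above (inject the missing global, then strip the stray key) can be sketched as follows. The demo runs against a stand-in module; the real `hf_adapter` module's internals are assumed to match the shape the PR text describes.

```python
import types

def install_hf_adapter_shim(hf_adapter_module):
    """Work around the upstream hf_adapter.py issue where
    tensor_capture_hook is referenced but never defined."""
    # (a) Define the missing global so LOAD_GLOBAL resolves to None
    # instead of raising NameError at runtime.
    if not hasattr(hf_adapter_module, "tensor_capture_hook"):
        hf_adapter_module.tensor_capture_hook = None

    adapter_cls = hf_adapter_module.HuggingFaceGenerationAdapter
    original = adapter_cls.prepare_inputs_for_generation

    def wrapped(self, *args, **kwargs):
        model_inputs = original(self, *args, **kwargs)
        # (b) Strip the stray key so the model's forward() never sees it.
        model_inputs.pop("tensor_capture_hook", None)
        return model_inputs

    adapter_cls.prepare_inputs_for_generation = wrapped

# Demo against a stand-in module with the same shape as the real adapter.
fake = types.ModuleType("hf_adapter")

class HuggingFaceGenerationAdapter:
    def prepare_inputs_for_generation(self, **kwargs):
        return {"input_ids": [1, 2, 3], "tensor_capture_hook": None}

fake.HuggingFaceGenerationAdapter = HuggingFaceGenerationAdapter
install_hf_adapter_shim(fake)
inputs = HuggingFaceGenerationAdapter().prepare_inputs_for_generation()
```

Patching at the class level means every adapter instance created afterward picks up the wrapper, which is why a session-scoped conftest fixture is a natural home for it.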
ahimsh-aws
reviewed
Mar 14, 2026
> See `examples/generation_solar_open_demo.py` for a full end-to-end example, or `../../examples/generation_solar_open.py` for the production benchmark script.
nit: unavailable file - ../../examples/generation_solar_open.py
ahimsh-aws
reviewed
Mar 14, 2026
> - **Parameters:** ~100B total, ~22B active per token
> - **License:** Check HuggingFace model card
>
> **Note:** Solar Open is **not** available in the `transformers` library. The model config and weights are loaded directly from the HuggingFace checkpoint using custom loaders (`load_solar_open_config`).
nit: As you mentioned in the generation code. It is available in transformers (>= 5.0.0).
ahimsh-aws
reviewed
Mar 14, 2026
ahimsh-aws
left a comment
Thanks for the contribution! Just some nit comments to address.
Description
This PR contributes NeuronX Distributed Inference (NxDI) support for Solar Open 100B MoE (upstage/Solar-Open-100B) following the contrib/ contribution pattern.

Solar Open shares the GLM-4.5 MoE routing architecture but is distinct in that all layers are MoE (first_k_dense_replace=0), it uses full RoPE (partial_rotary_factor=1.0), and it has no QK-norm or attention bias. The model has been merged into transformers main but is not yet in the current stable release; config and weights are loaded via a direct JSON loader until the stable release ships.

Model Information
Model Name: Solar Open 100B (SolarOpenForCausalLM)
Model Architecture: Decoder-only MoE transformer. All layers are MoE (first_k_dense_replace=0), full RoPE (partial_rotary_factor=1.0), no QK-norm, no attention bias.

Purpose: Text generation
Checklist
Please ensure your PR includes the following items. Refer to contrib/CONTRIBUTING.md for detailed guidelines.
Required Components
- Accuracy Test (ex. test/integration/test_model.py)
- README.md with the following sections:
- Source Code (src/)

Optional Components
- test/unit/ directory
Top-level production benchmark script:
Testing
How did you test this change?
Test Results:
Unit tests (CPU):
Integration tests (Neuron hardware, 2-layer toy model):
End-to-end generation (trn1.32xlarge, 100B full model):
Prompt: "Could you explain the concept of quantum computing in simple terms?"
Compatibility
Tested with:
Related Issues
vLLM Integration
For vLLM integration details, see: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/onboarding-models.html#nxdi-onboarding-models-vllm
By submitting this PR, I confirm that: