[Contribution] SolarOpenForCausalLM Support #65

Open
lifelongeeek wants to merge 10 commits into aws-neuron:main from lifelongeeek:feat/solar-open-support

Conversation


@lifelongeeek lifelongeeek commented Mar 10, 2026

Description

This PR contributes NeuronX Distributed Inference (NxDI) support for Solar Open 100B MoE (upstage/Solar-Open-100B) following the contrib/ contribution pattern.

Solar Open shares the GLM-4.5 MoE routing architecture but is distinct in that all layers are MoE (first_k_dense_replace=0), uses full RoPE (partial_rotary_factor=1.0), and has no QK-norm or attention bias. The model has been merged into transformers main but is not yet in the current stable release; config and weights are loaded via a direct JSON loader until the stable release ships.
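Until the stable transformers release ships, a direct JSON loader of the kind described above could look like the following minimal sketch. The function name and the exact default set are illustrative (the PR's actual loader, `load_solar_open_config`, maps more fields); only the two defaults shown are taken from this description.

```python
import json
from pathlib import Path

# Illustrative defaults only; the PR's real loader maps more fields.
_SOLAR_OPEN_DEFAULTS = {
    "first_k_dense_replace": 0,     # all layers are MoE
    "partial_rotary_factor": 1.0,   # full RoPE
}

def load_config_json(model_dir):
    """Read config.json directly, filling fields absent from the checkpoint."""
    cfg = json.loads((Path(model_dir) / "config.json").read_text())
    for key, value in _SOLAR_OPEN_DEFAULTS.items():
        cfg.setdefault(key, value)
    return cfg
```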

Model Information

Model Name: Solar Open 100B (SolarOpenForCausalLM)

Model Architecture: Decoder-only MoE transformer. All layers are MoE (first_k_dense_replace=0), full RoPE (partial_rotary_factor=1.0), no QK-norm, no attention bias.

Purpose: Text generation

Checklist

Please ensure your PR includes the following items. Refer to the contrib/CONTRIBUTING.md for detailed guidelines.

Required Components

  • Accuracy Test (ex. test/integration/test_model.py)

    • At least one integration test that validates model accuracy
    • Uses logit validation or equivalent accuracy verification
    • Test can compile and run the model on Neuron
  • README.md with the following sections:

    • Usage Example: Clear code example showing how to use the model
    • Compatibility Matrix: Table showing tested Neuron SDK versions and instance types (Trn1/Trn2/Inf2)
    • Example Checkpoints: Links to compatible model checkpoints (e.g., HuggingFace Hub)
    • Testing Instructions: Command to run the test suite for the model
  • Source Code (src/)

    • Modeling code following NxD Inference patterns
    • Properly structured in the contrib folder hierarchy

Optional Components

  • Unit Tests (CPU or Neuron-based)
    • Tests for individual modeling components
    • Located in test/unit/ directory

Folder Structure

contrib/models/solar_open/
├── src/solar_open/
│   ├── __init__.py
│   └── modeling_solar_open.py      # NeuronSolarOpenForCausalLM + config + loader
├── test/
│   ├── conftest.py
│   ├── integration/
│   │   ├── config_solar_open_2layers.json
│   │   ├── utils.py                # SolarOpenReferenceModel, get_neuron_config
│   │   └── test_model.py           # smoke, logit accuracy, performance
│   └── unit/
│       ├── test_router.py          # 10 tests
│       ├── test_attention.py       # 8 tests
│       └── test_decoder.py         # 16 tests
├── examples/generation_solar_open_demo.py
└── README.md

Top-level production benchmark script:

examples/generation_solar_open.py   # tp_degree=32, moe_tp_degree=4, moe_ep_degree=8, seq_len=65536
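As a quick sanity check on those sharding values (numbers from this PR; the constraint that the MoE degrees multiply out to the overall TP degree is an assumption, not something the PR states):

```python
# Production sharding from the benchmark script above (PR values).
tp_degree = 32            # matches the 32 NeuronCores on trn1.32xlarge
moe_tp_degree = 4
moe_ep_degree = 8
n_experts = 128           # from upstage/Solar-Open-100B

# Assumed constraint: MoE TP x EP degrees cover the full TP mesh.
assert moe_tp_degree * moe_ep_degree == tp_degree

experts_per_ep_rank = n_experts // moe_ep_degree  # 16 experts per EP group
```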

Testing

How did you test this change?

  • Unit tests for all model components (router, attention, decoder)
  • Integration test validating logit accuracy against a CPU reference model on a 2-layer toy config
  • End-to-end text generation on a trn1.32xlarge instance with the full 100B model

Test Results:

Unit tests (CPU):

$ python -m pytest contrib/models/solar_open/test/unit/ -v
...
40 passed in 6.30s

Integration tests (Neuron hardware, 2-layer toy model):

$ python -m pytest contrib/models/solar_open/test/integration/ -v
...
5 passed in 20s

End-to-end generation (trn1.32xlarge, 100B full model):

Prompt: "Could you explain the concept of quantum computing in simple terms?"

================== NxDI Hardware Generation Result (100B Full-Layer) ==================
<|think|>The user asks: "Could you explain the concept of quantum computing in simple terms?"
This is a straightforward request for an explanation. The system prompt says we are Solar Open 100B,
a large language model trained by Upstage AI, Korean startup, knowledge cutoff 2025-07, current date
2026-03-09. There's no disallowed content. So we can comply. Provide a simple explanation of quantum
computing, perhaps analogies, basic concepts: qubits, superposition, entanglement, quantum gates,
quantum algorithms, differences from classical computing, potential applications, challenges. Keep it
simple, accessible. Possibly mention that it's still early stage, but promising. Use Korean? The user
wrote in English. So respond in English. Provide a clear, simple explanation. Possibly include analogies
like coin flipping, Schrödinger's cat, etc. Also mention that quantum computers can solve certain problems
faster, like factoring large numbers (Shor's algorithm), searching unsorted databases (Grover's algorithm),
simulating quantum systems. Also mention that building quantum computers is hard due to decoherence, error
correction, need for low temperature, etc. Provide a simple summary.

We need to ensure we comply with policy. There's no disallowed content. So we can comply.

We should be mindful of the date: knowledge cutoff 2025-07, current date 2026-03-09. So we can mention that
as of 2025, quantum computing is still in early stages, but progress is being made. Possibly mention recent
developments up to 2025. But we must not hallucinate beyond knowledge cutoff. We can say "as of my last
training data up to July 2025..."
======================================================================================

Compatibility

Tested with:

  • Neuron SDK Version(s): 2.27
  • Instance Type(s): trn1.32xlarge (full 100B model), trn2.3xlarge (2-layer integration test)
  • PyTorch Version: 2.8.0
  • Python Version: 3.12.3

Related Issues

vLLM Integration

  • This model/feature is intended for use with vLLM
  • Documentation includes vLLM registration instructions

For vLLM integration details, see: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/onboarding-models.html#nxdi-onboarding-models-vllm


By submitting this PR, I confirm that:

  • I have read and followed the contributing guidelines
  • This is a community contribution and may have limited testing compared to officially-supported models
  • The code follows best practices and is well-documented
  • All required components listed above are included

circle-jin and others added 9 commits February 19, 2026 09:08
Adds NXD inference support for Solar Open MoE (SolarOpenForCausalLM) models
from upstage. Solar Open is a 100B-parameter MoE model sharing the DeepSeek
routing architecture but with distinct weight layout and RoPE configuration.

Key components:
- NeuronSolarOpenForCausalLM: top-level CausalLM model class
- NeuronSolarOpenModel: transformer body (all layers are MoE, first_k_dense_replace=0)
- NeuronSolarOpenAttention: multi-head GQA with full RoPE (partial_rotary_factor=1.0)
  and optional YaRN scaling (SolarOpenYarnRotaryEmbedding)
- NeuronSolarOpenDecoderLayer: decoder layer with MoE MLP
- SolarOpenInferenceConfig: config loader with field mapping and defaults for
  fields absent from the upstage/Solar-Open-100B config.json
- NeuronSolarOpenRouter: reuses GLM-4.5 group-limited routing logic
- initialize_solar_open_moe_module: wires router + ExpertMLPsV2 + SharedExperts

Weight conversion: HF per-expert format -> NXD fused format
  HF: mlp.experts.{e}.{gate,up}_proj.weight [I, H]  (per-expert)
  NXD: mlp.experts.gate_up_proj [E, H, 2I]           (fused)
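The per-expert to fused conversion above can be sketched in plain Python (nested lists stand in for tensors; the real conversion presumably uses torch, and the transpose/concat order is inferred from the shape annotations, not from the PR's code):

```python
def fuse_expert_weights(gate_w, up_w):
    """Fuse per-expert HF weights into the fused NXD gate_up_proj layout.

    gate_w, up_w: lists of E matrices, each [I, H] (I rows of length H)
    returns: [E, H, 2I] -- per expert, transpose [I, H] -> [H, I],
             then concatenate gate | up along the last dim.
    """
    fused = []
    for g, u in zip(gate_w, up_w):
        I, H = len(g), len(g[0])
        expert = [[g[i][h] for i in range(I)] + [u[i][h] for i in range(I)]
                  for h in range(H)]
        fused.append(expert)
    return fused
```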

Supports YaRN RoPE scaling (factor=2.0, original_max_position_embeddings=65536)
as used in upstage/Solar-Open-100B.
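The idea behind the YaRN scaling above can be sketched roughly as follows. This is a deliberately simplified illustration: low-frequency RoPE dimensions are interpolated (divided by the scaling factor), high-frequency dimensions are left untouched, and a linear ramp blends the two regimes. The ramp thresholds here are illustrative, not the exact YaRN beta_fast/beta_slow derivation.

```python
import math

def yarn_scaled_inv_freq(dim, base=10000.0, factor=2.0,
                         original_max_pos=65536,
                         low_rot=1.0, high_rot=32.0):
    """Simplified YaRN-style scaling of RoPE inverse frequencies (sketch)."""
    inv_freq = [base ** (-2.0 * i / dim) for i in range(dim // 2)]
    scaled = []
    for f in inv_freq:
        # How many full rotations this dimension makes over the original context.
        rotations = original_max_pos * f / (2 * math.pi)
        if rotations <= low_rot:        # low frequency: fully interpolate
            ramp = 1.0
        elif rotations >= high_rot:     # high frequency: keep as-is
            ramp = 0.0
        else:                           # blend between the two regimes
            ramp = (high_rot - rotations) / (high_rot - low_rot)
        scaled.append(f / factor * ramp + f * (1.0 - ramp))
    return scaled
```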

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
- examples/generation_solar_open_demo.py: compile + run inference demo
  for the tiny random Solar Open model (tp_degree=4, moe_tp_degree=4)
- examples/generation_solar_open_100b_demo.py: compile + run inference demo
  for upstage/Solar-Open-100B (requires trn2.48xlarge or larger)
- test_solar_open_accuracy.py: CPU vs Neuron token-matching accuracy test
  with standalone PyTorch reference implementation; verified passing
  (10/10 tokens match with greedy decoding on tiny random model)
- test_solar_open_100b_accuracy.py: CPU vs Neuron accuracy test for the
  full 100B model; includes YaRN RoPE CPU reference implementation
- create_solar_open_tiny_random.py: creates a small random-weight Solar Open
  checkpoint matching the 100B architecture for local testing
- create_solar_open_100b_random.py: creates a 2-layer random-weight checkpoint
  with full 100B dimensions (128 experts, hidden_size=4096) for integration
  testing on larger instances

Hardware note: upstage/Solar-Open-100B (48 layers, 128 experts) requires
~168 GB total weights; trn2.48xlarge (64 NeuronCores) recommended.

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
- docs/solar_open_implementation.md: architecture overview, module breakdown,
  weight conversion details (per-expert HF -> fused NXD), YaRN RoPE notes,
  and sharding configuration guide
- docs/solar_open_testing.md: step-by-step testing guide with tiny random
  model, expected outputs, and troubleshooting notes
- docs/solar_open_100b.md: full experiment report for upstage/Solar-Open-100B
  including discovered library limitations (EP + token generation unsupported),
  hardware requirements analysis, and runbook for large instance deployment

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
…pter.py

tensor_capture_hook was referenced in prepare_inputs_for_generation()
but never initialized from kwargs, causing a NameError at runtime.
input_capture_hook (the correct variable) is already present and
extracted from kwargs on L265. Removed the dangling tensor_capture_hook
entry from the model_inputs dict.
- Remove src/neuronx_distributed_inference/models/solar_open/ (modeling + __init__)
- Remove root-level scripts: create_solar_open_*.py, test_solar_open_*.py
- Remove examples/generation_solar_open_demo.py, generation_solar_open_100b_demo.py
- Remove docs/solar_open_*.md
All files are now in contrib/models/solar_open/ following the NxDI contrib pattern.
… README

Restructures Solar Open support to follow the NxDI contrib/ contribution pattern,
identical to what was done for GLM-4.5 MoE (PR#58).

contrib/models/solar_open/
├── src/solar_open/
│   ├── __init__.py
│   └── modeling_solar_open.py  (NeuronSolarOpenForCausalLM + config + loader)
├── test/
│   ├── conftest.py             (session-scoped fixtures)
│   ├── integration/
│   │   ├── config_solar_open_2layers.json
│   │   ├── utils.py            (SolarOpenReferenceModel, get_neuron_config)
│   │   └── test_model.py       (smoke, output shape, determinism)
│   └── unit/
│       ├── test_router.py      (10 tests, object.__new__ bypass pattern)
│       ├── test_attention.py   (8 tests, inspect-based)
│       └── test_decoder.py     (16 tests, all-MoE architecture verification)
├── examples/generation_solar_open_demo.py
└── README.md

examples/generation_solar_open.py  (top-level production benchmark script)

Key Solar Open specifics:
- NOT in transformers: uses load_solar_open_config() custom JSON loader
- first_k_dense_replace=0: ALL layers are MoE (no dense branch)
- Full RoPE (partial_rotary_factor=1.0), no QK norm, no attention bias
- Router: sigmoid + group routing + e_score_correction_bias (identical to GLM-4.5)
- Production config: tp_degree=32, moe_tp_degree=4, moe_ep_degree=8
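The sigmoid + group-limited routing with e_score_correction_bias mentioned above can be sketched for a single token as follows (pure-Python illustration of the DeepSeek-V3/GLM-4.5-style scheme; function name, group-scoring by top-2 sum, and exact tie-breaking are assumptions, not the PR's code):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def group_limited_route(logits, bias, n_group, topk_group, top_k):
    """Sigmoid scoring + group-limited top-k for one token (sketch).

    logits: per-expert router logits, length E
    bias:   e_score_correction_bias, length E (affects selection only)
    """
    E = len(logits)
    scores = [sigmoid(l) for l in logits]
    choice = [s + b for s, b in zip(scores, bias)]   # bias used only for routing
    group_size = E // n_group
    # Score each group by the sum of its top-2 biased scores.
    group_scores = []
    for g in range(n_group):
        block = sorted(choice[g * group_size:(g + 1) * group_size], reverse=True)
        group_scores.append(sum(block[:2]))
    kept = sorted(range(n_group), key=lambda g: group_scores[g], reverse=True)[:topk_group]
    # Global top-k restricted to experts inside the kept groups.
    eligible = [i for i in range(E) if i // group_size in kept]
    topk = sorted(eligible, key=lambda i: choice[i], reverse=True)[:top_k]
    # Final weights use the unbiased sigmoid scores, renormalized.
    total = sum(scores[i] for i in topk)
    weights = {i: scores[i] / total for i in topk}
    return topk, weights
```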

Unit tests: 40/40 PASSED (CPU, no Neuron hardware required)
- modeling_solar_open.py: add transformers_version to SolarOpenInferenceConfig
  so HuggingFaceGenerationAdapter does not propagate None into generation_config
- modeling_solar_open.py: override _construct_output to unwrap list/tuple logits
  returned by NxDI Neuron runtime into a single tensor (required for hf_adapter
  logits slicing in _sample)
- test/conftest.py: copy generation_config.json to traced_dir alongside weights
- test/integration/utils.py: generate generation_config.json in model_dir;
  handle list/tuple logits in check_logit_accuracy
- test/integration/test_model.py: patch adapter.generation_config.transformers_version
  as fallback safety guard; all 5 integration tests now pass
Solar Open has been merged into transformers main
(https://github.com/huggingface/transformers/blob/main/src/transformers/models/solar_open/)
but is not yet available in stable releases (≤4.56.2). Update all comments
and docstrings that incorrectly stated it was absent from transformers entirely.

Also update PR#3 description table and Architecture Notes section accordingly.
- load_hf_model(): use SolarOpenForCausalLM.from_pretrained() instead of
  loading safetensors directly (transformers >= 5.0.0 includes solar_open)
- load_solar_open_config(): use SolarOpenConfig.from_pretrained() with
  rope_parameters -> rope_theta/rope_scaling conversion for NxDI compat
- Fix transformers 5.0 rename: SampleDecoderOnlyOutput -> GenerateDecoderOnlyOutput
- test/integration/utils.py: replace 300-line SolarOpenReferenceModel with
  SolarOpenForCausalLM as CPU reference; create_tiny_solar_open_model() now
  uses save_pretrained() (auto-writes config.json + generation_config.json);
  check_text_accuracy() uses logit MAE vs SolarOpenForCausalLM
- test/conftest.py: add transformers 5.0 compat shims (utils.fx stub,
  SampleDecoderOnlyOutput alias); remove config_solar_open_2layers.json dep
- test/integration/test_model.py: update accuracy test docstring/args
@lifelongeeek lifelongeeek marked this pull request as ready for review March 10, 2026 06:37
- Remove examples/generation_solar_open.py (A) — generation_solar_open_demo.py
  (B) under contrib/models/solar_open/examples/ is the canonical demo script
- Revert src/neuronx_distributed_inference/utils/hf_adapter.py to upstream state
- contrib/models/solar_open/test/conftest.py: add shim 3 to work around
  hf_adapter.py upstream issue where tensor_capture_hook is undefined —
  (a) inject None into module globals to resolve the NameError via LOAD_GLOBAL,
  (b) wrap prepare_inputs_for_generation to strip tensor_capture_hook from
      model_inputs so NeuronBaseForCausalLM.forward() does not receive it
- Update generation_solar_open_demo.py: remove outdated 'not in transformers'
  comment, add generation_config.json to the post-compile copy step, add
  production 100B config hints in argparse help and get_neuron_config()
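The two-part shim described in that last commit could be sketched like this (names and signatures are hypothetical; the real fix lives in contrib/models/solar_open/test/conftest.py):

```python
def apply_tensor_capture_shim(hf_adapter_module, adapter_cls):
    """Work around an undefined tensor_capture_hook in hf_adapter (sketch).

    (a) inject a None global so LOAD_GLOBAL resolves instead of raising
        NameError; (b) strip the stray key from model_inputs so forward()
        never receives it.
    """
    hf_adapter_module.__dict__.setdefault("tensor_capture_hook", None)  # (a)
    original = adapter_cls.prepare_inputs_for_generation

    def wrapped(self, *args, **kwargs):
        model_inputs = original(self, *args, **kwargs)
        model_inputs.pop("tensor_capture_hook", None)                   # (b)
        return model_inputs

    adapter_cls.prepare_inputs_for_generation = wrapped
```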
@lifelongeeek lifelongeeek force-pushed the feat/solar-open-support branch from 30ff163 to 25c6d95 Compare March 11, 2026 02:45
model.load(traced_model_path)
```

See `examples/generation_solar_open_demo.py` for a full end-to-end example, or `../../examples/generation_solar_open.py` for the production benchmark script.


nit: unavailable file - ../../examples/generation_solar_open.py

- **Parameters:** ~100B total, ~22B active per token
- **License:** Check HuggingFace model card

> **Note:** Solar Open is **not** available in the `transformers` library. The model config and weights are loaded directly from the HuggingFace checkpoint using custom loaders (`load_solar_open_config`).


nit: As you mentioned in the generation code, it is available in transformers (>= 5.0.0).


@ahimsh-aws ahimsh-aws left a comment


Thanks for the contribution! Just some nit comments to address.
