[Contribution] SolarOpenForCausalLM Support #65

Open
lifelongeeek wants to merge 10 commits into aws-neuron:main from lifelongeeek:feat/solar-open-support

Conversation


@lifelongeeek lifelongeeek commented Mar 10, 2026

Description

This PR contributes NeuronX Distributed Inference (NxDI) support for Solar Open 100B MoE (upstage/Solar-Open-100B) following the contrib/ contribution pattern.

Solar Open shares the GLM-4.5 MoE routing architecture but is distinct in that all layers are MoE (first_k_dense_replace=0), uses full RoPE (partial_rotary_factor=1.0), and has no QK-norm or attention bias. The model has been merged into transformers main but is not yet in the current stable release; config and weights are loaded via a direct JSON loader until the stable release ships.
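Until the stable transformers release ships, a direct JSON loader of the kind described above could look like the following minimal sketch. The function name and the exact default set are illustrative (the PR's actual loader, `load_solar_open_config`, maps more fields); only the two defaults shown are taken from this description.

```python
import json
from pathlib import Path

# Illustrative defaults only; the PR's real loader maps more fields.
_SOLAR_OPEN_DEFAULTS = {
    "first_k_dense_replace": 0,     # all layers are MoE
    "partial_rotary_factor": 1.0,   # full RoPE
}

def load_config_json(model_dir):
    """Read config.json directly, filling fields absent from the checkpoint."""
    cfg = json.loads((Path(model_dir) / "config.json").read_text())
    for key, value in _SOLAR_OPEN_DEFAULTS.items():
        cfg.setdefault(key, value)
    return cfg
```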

Model Information

Model Name: Solar Open 100B (SolarOpenForCausalLM)

Model Architecture: Decoder-only MoE transformer. All layers are MoE (first_k_dense_replace=0), full RoPE (partial_rotary_factor=1.0), no QK-norm, no attention bias.

Purpose: Text generation

Checklist

Please ensure your PR includes the following items. Refer to the contrib/CONTRIBUTING.md for detailed guidelines.

Required Components

  • Accuracy Test (ex. test/integration/test_model.py)

    • At least one integration test that validates model accuracy
    • Uses logit validation or equivalent accuracy verification
    • Test can compile and run the model on Neuron
  • README.md with the following sections:

    • Usage Example: Clear code example showing how to use the model
    • Compatibility Matrix: Table showing tested Neuron SDK versions and instance types (Trn1/Trn2/Inf2)
    • Example Checkpoints: Links to compatible model checkpoints (e.g., HuggingFace Hub)
    • Testing Instructions: Command to run the test suite for the model
  • Source Code (src/)

    • Modeling code following NxD Inference patterns
    • Properly structured in the contrib folder hierarchy

Optional Components

  • Unit Tests (CPU or Neuron-based)
    • Tests for individual modeling components
    • Located in test/unit/ directory

Folder Structure

contrib/models/solar_open/
├── src/solar_open/
│   ├── __init__.py
│   └── modeling_solar_open.py      # NeuronSolarOpenForCausalLM + config + loader
├── test/
│   ├── conftest.py
│   ├── integration/
│   │   ├── config_solar_open_2layers.json
│   │   ├── utils.py                # SolarOpenReferenceModel, get_neuron_config
│   │   └── test_model.py           # smoke, logit accuracy, performance
│   └── unit/
│       ├── test_router.py          # 10 tests
│       ├── test_attention.py       # 8 tests
│       └── test_decoder.py         # 16 tests
├── examples/generation_solar_open_demo.py
└── README.md

Top-level production benchmark script:

examples/generation_solar_open.py   # tp_degree=32, moe_tp_degree=4, moe_ep_degree=8, seq_len=65536
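As a quick sanity check on those sharding values (numbers from this PR; the constraint that the MoE degrees multiply out to the overall TP degree is an assumption, not something the PR states):

```python
# Production sharding from the benchmark script above (PR values).
tp_degree = 32            # matches the 32 NeuronCores on trn1.32xlarge
moe_tp_degree = 4
moe_ep_degree = 8
n_experts = 128           # from upstage/Solar-Open-100B

# Assumed constraint: MoE TP x EP degrees cover the full TP mesh.
assert moe_tp_degree * moe_ep_degree == tp_degree

experts_per_ep_rank = n_experts // moe_ep_degree  # 16 experts per EP group
```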

Testing

How did you test this change?

  • Unit tests for all model components (router, attention, decoder)
  • Integration test validating logit accuracy against a CPU reference model on a 2-layer toy config
  • End-to-end text generation on a trn1.32xlarge instance with the full 100B model

Test Results:

Unit tests (CPU):

$ python -m pytest contrib/models/solar_open/test/unit/ -v
...
40 passed in 6.30s

Integration tests (Neuron hardware, 2-layer toy model):

$ python -m pytest contrib/models/solar_open/test/integration/ -v
...
5 passed in 20s

End-to-end generation (trn1.32xlarge, 100B full model):

Prompt: "Could you explain the concept of quantum computing in simple terms?"

================== NxDI Hardware Generation Result (100B Full-Layer) ==================
<|think|>The user asks: "Could you explain the concept of quantum computing in simple terms?"
This is a straightforward request for an explanation. The system prompt says we are Solar Open 100B,
a large language model trained by Upstage AI, Korean startup, knowledge cutoff 2025-07, current date
2026-03-09. There's no disallowed content. So we can comply. Provide a simple explanation of quantum
computing, perhaps analogies, basic concepts: qubits, superposition, entanglement, quantum gates,
quantum algorithms, differences from classical computing, potential applications, challenges. Keep it
simple, accessible. Possibly mention that it's still early stage, but promising. Use Korean? The user
wrote in English. So respond in English. Provide a clear, simple explanation. Possibly include analogies
like coin flipping, Schrödinger's cat, etc. Also mention that quantum computers can solve certain problems
faster, like factoring large numbers (Shor's algorithm), searching unsorted databases (Grover's algorithm),
simulating quantum systems. Also mention that building quantum computers is hard due to decoherence, error
correction, need for low temperature, etc. Provide a simple summary.

We need to ensure we comply with policy. There's no disallowed content. So we can comply.

We should be mindful of the date: knowledge cutoff 2025-07, current date 2026-03-09. So we can mention that
as of 2025, quantum computing is still in early stages, but progress is being made. Possibly mention recent
developments up to 2025. But we must not hallucinate beyond knowledge cutoff. We can say "as of my last
training data up to July 2025..."
======================================================================================

Compatibility

Tested with:

  • Neuron SDK Version(s): 2.27
  • Instance Type(s): trn1.32xlarge (full 100B model), trn2.3xlarge (2-layer integration test)
  • PyTorch Version: 2.8.0
  • Python Version: 3.12.3

Related Issues

vLLM Integration

  • This model/feature is intended for use with vLLM
  • Documentation includes vLLM registration instructions

For vLLM integration details, see: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/onboarding-models.html#nxdi-onboarding-models-vllm


By submitting this PR, I confirm that:

  • I have read and followed the contributing guidelines
  • This is a community contribution and may have limited testing compared to officially-supported models
  • The code follows best practices and is well-documented
  • All required components listed above are included

circle-jin and others added 9 commits February 19, 2026 09:08
Adds NXD inference support for Solar Open MoE (SolarOpenForCausalLM) models
from upstage. Solar Open is a 100B-parameter MoE model sharing the DeepSeek
routing architecture but with distinct weight layout and RoPE configuration.

Key components:
- NeuronSolarOpenForCausalLM: top-level CausalLM model class
- NeuronSolarOpenModel: transformer body (all layers are MoE, first_k_dense_replace=0)
- NeuronSolarOpenAttention: multi-head GQA with full RoPE (partial_rotary_factor=1.0)
  and optional YaRN scaling (SolarOpenYarnRotaryEmbedding)
- NeuronSolarOpenDecoderLayer: decoder layer with MoE MLP
- SolarOpenInferenceConfig: config loader with field mapping and defaults for
  fields absent from the upstage/Solar-Open-100B config.json
- NeuronSolarOpenRouter: reuses GLM-4.5 group-limited routing logic
- initialize_solar_open_moe_module: wires router + ExpertMLPsV2 + SharedExperts

Weight conversion: HF per-expert format -> NXD fused format
  HF: mlp.experts.{e}.{gate,up}_proj.weight [I, H]  (per-expert)
  NXD: mlp.experts.gate_up_proj [E, H, 2I]           (fused)
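The per-expert to fused conversion above can be sketched in plain Python (nested lists stand in for tensors; the real conversion presumably uses torch, and the transpose/concat order is inferred from the shape annotations, not from the PR's code):

```python
def fuse_expert_weights(gate_w, up_w):
    """Fuse per-expert HF weights into the fused NXD gate_up_proj layout.

    gate_w, up_w: lists of E matrices, each [I, H] (I rows of length H)
    returns: [E, H, 2I] -- per expert, transpose [I, H] -> [H, I],
             then concatenate gate | up along the last dim.
    """
    fused = []
    for g, u in zip(gate_w, up_w):
        I, H = len(g), len(g[0])
        expert = [[g[i][h] for i in range(I)] + [u[i][h] for i in range(I)]
                  for h in range(H)]
        fused.append(expert)
    return fused
```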

Supports YaRN RoPE scaling (factor=2.0, original_max_position_embeddings=65536)
as used in upstage/Solar-Open-100B.
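The idea behind the YaRN scaling above can be sketched roughly as follows. This is a deliberately simplified illustration: low-frequency RoPE dimensions are interpolated (divided by the scaling factor), high-frequency dimensions are left untouched, and a linear ramp blends the two regimes. The ramp thresholds here are illustrative, not the exact YaRN beta_fast/beta_slow derivation.

```python
import math

def yarn_scaled_inv_freq(dim, base=10000.0, factor=2.0,
                         original_max_pos=65536,
                         low_rot=1.0, high_rot=32.0):
    """Simplified YaRN-style scaling of RoPE inverse frequencies (sketch)."""
    inv_freq = [base ** (-2.0 * i / dim) for i in range(dim // 2)]
    scaled = []
    for f in inv_freq:
        # How many full rotations this dimension makes over the original context.
        rotations = original_max_pos * f / (2 * math.pi)
        if rotations <= low_rot:        # low frequency: fully interpolate
            ramp = 1.0
        elif rotations >= high_rot:     # high frequency: keep as-is
            ramp = 0.0
        else:                           # blend between the two regimes
            ramp = (high_rot - rotations) / (high_rot - low_rot)
        scaled.append(f / factor * ramp + f * (1.0 - ramp))
    return scaled
```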

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
- examples/generation_solar_open_demo.py: compile + run inference demo
  for the tiny random Solar Open model (tp_degree=4, moe_tp_degree=4)
- examples/generation_solar_open_100b_demo.py: compile + run inference demo
  for upstage/Solar-Open-100B (requires trn2.48xlarge or larger)
- test_solar_open_accuracy.py: CPU vs Neuron token-matching accuracy test
  with standalone PyTorch reference implementation; verified passing
  (10/10 tokens match with greedy decoding on tiny random model)
- test_solar_open_100b_accuracy.py: CPU vs Neuron accuracy test for the
  full 100B model; includes YaRN RoPE CPU reference implementation
- create_solar_open_tiny_random.py: creates a small random-weight Solar Open
  checkpoint matching the 100B architecture for local testing
- create_solar_open_100b_random.py: creates a 2-layer random-weight checkpoint
  with full 100B dimensions (128 experts, hidden_size=4096) for integration
  testing on larger instances

Hardware note: upstage/Solar-Open-100B (48 layers, 128 experts) requires
~168 GB total weights; trn2.48xlarge (64 NeuronCores) recommended.

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
- docs/solar_open_implementation.md: architecture overview, module breakdown,
  weight conversion details (per-expert HF -> fused NXD), YaRN RoPE notes,
  and sharding configuration guide
- docs/solar_open_testing.md: step-by-step testing guide with tiny random
  model, expected outputs, and troubleshooting notes
- docs/solar_open_100b.md: full experiment report for upstage/Solar-Open-100B
  including discovered library limitations (EP + token generation unsupported),
  hardware requirements analysis, and runbook for large instance deployment

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
…pter.py

tensor_capture_hook was referenced in prepare_inputs_for_generation()
but never initialized from kwargs, causing a NameError at runtime.
input_capture_hook (the correct variable) is already present and
extracted from kwargs on L265. Removed the dangling tensor_capture_hook
entry from the model_inputs dict.
- Remove src/neuronx_distributed_inference/models/solar_open/ (modeling + __init__)
- Remove root-level scripts: create_solar_open_*.py, test_solar_open_*.py
- Remove examples/generation_solar_open_demo.py, generation_solar_open_100b_demo.py
- Remove docs/solar_open_*.md
All files are now in contrib/models/solar_open/ following the NxDI contrib pattern.
… README

Restructures Solar Open support to follow the NxDI contrib/ contribution pattern,
identical to what was done for GLM-4.5 MoE (PR#58).

contrib/models/solar_open/
├── src/solar_open/
│   ├── __init__.py
│   └── modeling_solar_open.py  (NeuronSolarOpenForCausalLM + config + loader)
├── test/
│   ├── conftest.py             (session-scoped fixtures)
│   ├── integration/
│   │   ├── config_solar_open_2layers.json
│   │   ├── utils.py            (SolarOpenReferenceModel, get_neuron_config)
│   │   └── test_model.py       (smoke, output shape, determinism)
│   └── unit/
│       ├── test_router.py      (10 tests, object.__new__ bypass pattern)
│       ├── test_attention.py   (8 tests, inspect-based)
│       └── test_decoder.py     (16 tests, all-MoE architecture verification)
├── examples/generation_solar_open_demo.py
└── README.md

examples/generation_solar_open.py  (top-level production benchmark script)

Key Solar Open specifics:
- NOT in transformers: uses load_solar_open_config() custom JSON loader
- first_k_dense_replace=0: ALL layers are MoE (no dense branch)
- Full RoPE (partial_rotary_factor=1.0), no QK norm, no attention bias
- Router: sigmoid + group routing + e_score_correction_bias (identical to GLM-4.5)
- Production config: tp_degree=32, moe_tp_degree=4, moe_ep_degree=8
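The sigmoid + group-limited routing with e_score_correction_bias mentioned above can be sketched for a single token as follows (pure-Python illustration of the DeepSeek-V3/GLM-4.5-style scheme; function name, group-scoring by top-2 sum, and exact tie-breaking are assumptions, not the PR's code):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def group_limited_route(logits, bias, n_group, topk_group, top_k):
    """Sigmoid scoring + group-limited top-k for one token (sketch).

    logits: per-expert router logits, length E
    bias:   e_score_correction_bias, length E (affects selection only)
    """
    E = len(logits)
    scores = [sigmoid(l) for l in logits]
    choice = [s + b for s, b in zip(scores, bias)]   # bias used only for routing
    group_size = E // n_group
    # Score each group by the sum of its top-2 biased scores.
    group_scores = []
    for g in range(n_group):
        block = sorted(choice[g * group_size:(g + 1) * group_size], reverse=True)
        group_scores.append(sum(block[:2]))
    kept = sorted(range(n_group), key=lambda g: group_scores[g], reverse=True)[:topk_group]
    # Global top-k restricted to experts inside the kept groups.
    eligible = [i for i in range(E) if i // group_size in kept]
    topk = sorted(eligible, key=lambda i: choice[i], reverse=True)[:top_k]
    # Final weights use the unbiased sigmoid scores, renormalized.
    total = sum(scores[i] for i in topk)
    weights = {i: scores[i] / total for i in topk}
    return topk, weights
```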

Unit tests: 40/40 PASSED (CPU, no Neuron hardware required)
- modeling_solar_open.py: add transformers_version to SolarOpenInferenceConfig
  so HuggingFaceGenerationAdapter does not propagate None into generation_config
- modeling_solar_open.py: override _construct_output to unwrap list/tuple logits
  returned by NxDI Neuron runtime into a single tensor (required for hf_adapter
  logits slicing in _sample)
- test/conftest.py: copy generation_config.json to traced_dir alongside weights
- test/integration/utils.py: generate generation_config.json in model_dir;
  handle list/tuple logits in check_logit_accuracy
- test/integration/test_model.py: patch adapter.generation_config.transformers_version
  as fallback safety guard; all 5 integration tests now pass
Solar Open has been merged into transformers main
(https://github.com/huggingface/transformers/blob/main/src/transformers/models/solar_open/)
but is not yet available in stable releases (≤4.56.2). Update all comments
and docstrings that incorrectly stated it was absent from transformers entirely.

Also update PR#3 description table and Architecture Notes section accordingly.
- load_hf_model(): use SolarOpenForCausalLM.from_pretrained() instead of
  loading safetensors directly (transformers >= 5.0.0 includes solar_open)
- load_solar_open_config(): use SolarOpenConfig.from_pretrained() with
  rope_parameters -> rope_theta/rope_scaling conversion for NxDI compat
- Fix transformers 5.0 rename: SampleDecoderOnlyOutput -> GenerateDecoderOnlyOutput
- test/integration/utils.py: replace 300-line SolarOpenReferenceModel with
  SolarOpenForCausalLM as CPU reference; create_tiny_solar_open_model() now
  uses save_pretrained() (auto-writes config.json + generation_config.json);
  check_text_accuracy() uses logit MAE vs SolarOpenForCausalLM
- test/conftest.py: add transformers 5.0 compat shims (utils.fx stub,
  SampleDecoderOnlyOutput alias); remove config_solar_open_2layers.json dep
- test/integration/test_model.py: update accuracy test docstring/args
@lifelongeeek lifelongeeek marked this pull request as ready for review March 10, 2026 06:37
- Remove examples/generation_solar_open.py (A) — generation_solar_open_demo.py
  (B) under contrib/models/solar_open/examples/ is the canonical demo script
- Revert src/neuronx_distributed_inference/utils/hf_adapter.py to upstream state
- contrib/models/solar_open/test/conftest.py: add shim 3 to work around
  hf_adapter.py upstream issue where tensor_capture_hook is undefined —
  (a) inject None into module globals to resolve the NameError via LOAD_GLOBAL,
  (b) wrap prepare_inputs_for_generation to strip tensor_capture_hook from
      model_inputs so NeuronBaseForCausalLM.forward() does not receive it
- Update generation_solar_open_demo.py: remove outdated 'not in transformers'
  comment, add generation_config.json to the post-compile copy step, add
  production 100B config hints in argparse help and get_neuron_config()
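The two-part shim described in that last commit could be sketched like this (names and signatures are hypothetical; the real fix lives in contrib/models/solar_open/test/conftest.py):

```python
def apply_tensor_capture_shim(hf_adapter_module, adapter_cls):
    """Work around an undefined tensor_capture_hook in hf_adapter (sketch).

    (a) inject a None global so LOAD_GLOBAL resolves instead of raising
        NameError; (b) strip the stray key from model_inputs so forward()
        never receives it.
    """
    hf_adapter_module.__dict__.setdefault("tensor_capture_hook", None)  # (a)
    original = adapter_cls.prepare_inputs_for_generation

    def wrapped(self, *args, **kwargs):
        model_inputs = original(self, *args, **kwargs)
        model_inputs.pop("tensor_capture_hook", None)                   # (b)
        return model_inputs

    adapter_cls.prepare_inputs_for_generation = wrapped
```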
@lifelongeeek lifelongeeek force-pushed the feat/solar-open-support branch from 30ff163 to 25c6d95 Compare March 11, 2026 02:45
model.load(traced_model_path)
```

See `examples/generation_solar_open_demo.py` for a full end-to-end example, or `../../examples/generation_solar_open.py` for the production benchmark script.


nit: unavailable file - ../../examples/generation_solar_open.py

- **Parameters:** ~100B total, ~22B active per token
- **License:** Check HuggingFace model card

> **Note:** Solar Open is **not** available in the `transformers` library. The model config and weights are loaded directly from the HuggingFace checkpoint using custom loaders (`load_solar_open_config`).


nit: As you mentioned in the generation code, it is available in transformers (>= 5.0.0).


@ahimsh-aws ahimsh-aws left a comment


Thanks for the contribution! Just some nit comments to address.
