2 changes: 1 addition & 1 deletion docs/get-started/quickstart.md
@@ -33,7 +33,7 @@ torchrun --nproc_per_node=2 examples/run_simple_mcore_train_loop.py

```bash
# 8 GPUs, FP8 precision, mock data
-./examples/llama/train_llama3_8b_fp8.sh
+./examples/open_models/llama/train_llama3_8b_fp8.sh
```

## Data Preparation
10 changes: 4 additions & 6 deletions docs/models/llms.md
@@ -34,12 +34,10 @@ See the [Megatron Bridge supported models list](https://github.com/NVIDIA-NeMo/M
## Example Scripts

Training examples for these models can be found in the `examples/` directory:
-- `examples/gpt3/` - GPT-3 training scripts
-- `examples/llama/` - LLaMA training scripts
-- `examples/mixtral/` - Mixtral MoE training
-- `examples/mamba/` - Mamba training scripts
-- `examples/bert/` - BERT training scripts
-- `examples/t5/` - T5 training scripts
+- `examples/open_models/gpt3/` - GPT-3 training scripts
+- `examples/open_models/llama/` - LLaMA training scripts
+- `examples/open_models/mamba/` - Mamba training scripts
+- `examples/open_models/t5/` - T5 training scripts

## Model Implementation

4 changes: 2 additions & 2 deletions docs/models/multimodal.md
@@ -14,7 +14,7 @@ Megatron Core supports multimodal models that combine language with vision, audi
- Unified embedding space across modalities
- Support for both vision-language and audio-vision-language models

-See [examples/mimo](https://github.com/NVIDIA/Megatron-LM/tree/main/examples/mimo) for training scripts and examples.
+See [examples/open_models/mimo](https://github.com/NVIDIA/Megatron-LM/tree/main/examples/open_models/mimo) for training scripts and examples.

## Vision-Language Models

@@ -52,7 +52,7 @@ For multimodal diffusion models (image generation, text-to-image, etc.), see [Ne
Multimodal training examples can be found in the following directories:

**MIMO Framework:**
-- `examples/mimo/` - Multimodal In/Out training with support for vision-language and audio-vision-language models
+- `examples/open_models/mimo/` - Multimodal In/Out training with support for vision-language and audio-vision-language models

**Specific Multimodal Models:**
- `examples/multimodal/` - LLaVA-style training with Mistral + CLIP
2 changes: 1 addition & 1 deletion docs/user-guide/training-examples.md
@@ -24,7 +24,7 @@ This example:
Train LLaMA-3 8B model with FP8 mixed precision on 8 GPUs:

```bash
-./examples/llama/train_llama3_8b_fp8.sh
+./examples/open_models/llama/train_llama3_8b_fp8.sh
```

**Configuration:**
53 changes: 0 additions & 53 deletions examples/bert/README.md

This file was deleted.

79 changes: 0 additions & 79 deletions examples/bert/train_bert_340m_distributed.sh

This file was deleted.

132 changes: 0 additions & 132 deletions examples/mixtral/README.md

This file was deleted.
