47 commits
9683d6e
fix optimizer CP restore
tianshub Feb 11, 2026
e06c7ed
[Tunix] Add log_level config to SglangJaxSampler.
lc5211 Feb 11, 2026
84b7f82
[Tunix] Add trajectory logging to agentic GRPO learner.
lc5211 Feb 11, 2026
e06c593
remove max_steps from trajectory_collect_engine
tianshub Feb 11, 2026
279b975
use absl logging across the repo
tianshub Feb 11, 2026
47a9f25
feat: log rollout and train time at micro batch level.
jiangyangmu Feb 11, 2026
5e54d57
Code update
s-noghabi Feb 12, 2026
f43c85a
[Tunix] Remove upper bound on JAX version in tunix prod dependencies.
wang2yn84 Feb 12, 2026
88f08c6
Refactor GRPO rollout to simplify grouping and avoid deepcopy
hgao327 Feb 12, 2026
7141c9f
chore: Migrate gsutil usage to gcloud storage
gurusai-voleti Feb 12, 2026
0a98e91
Copybara import of the project:
gagika Feb 12, 2026
f739ee6
[Tunix] Pad the number of heads for projection bias.
wang2yn84 Feb 12, 2026
035373c
Remove max_open_buckets from GroupQueueManager
hgao327 Feb 12, 2026
30e37dc
Merge pull request #1082 from gurusai-voleti:ai-gsutil-migration-4bd8…
a-googler Feb 13, 2026
d19ecbb
add log_level in SglangJaxConfig and update default page_size from 64…
aolemila Feb 13, 2026
bbba936
Merge pull request #1090 from google:tiny/update-sglangjax-arguments
a-googler Feb 13, 2026
ea538fd
fix potential race condition on dictionary update
tianshub Feb 13, 2026
2fa731e
update doc string and error message
s-noghabi Feb 13, 2026
b46a716
Supports padding kwargs for samplers.
wang2yn84 Feb 13, 2026
2567514
Copybara import of the project:
NicoGrande Feb 13, 2026
0a19790
Merge pull request #1095 from google:lance-fix-kwargs
a-googler Feb 13, 2026
2c5f81b
[Tunix]: Skip the already trained data on job resume.
lc5211 Feb 13, 2026
9f2ac84
[Tunix] Refactor DeepScaler training script to support different roll…
wang2yn84 Feb 14, 2026
ec0a820
disable perf metrics by default in the cli.
s-noghabi Feb 14, 2026
7c73390
minor update
tianshub Feb 14, 2026
50b81a6
Added a GPU demo for PEFT with QLoRA on Llama 3_1
katjasrz Feb 17, 2026
ea6951b
Merge pull request #1105 from katjasrz:main
a-googler Feb 17, 2026
c81a5c1
Adds vllm logging capability to vllm async driver.
wang2yn84 Feb 17, 2026
14e9676
Merge pull request #1107 from google:lance-vllm-logging
a-googler Feb 17, 2026
21d9ce8
Use Exception instead BaseException
wang2yn84 Feb 17, 2026
0f80da8
Merge pull request #1108 from google:lance-vllm-logging1
a-googler Feb 17, 2026
5454984
fix loss mask for agentic learner
tianshub Feb 18, 2026
62a5d5e
Refactor.
wang2yn84 Feb 17, 2026
6808337
Set the max worker number in asyncio loop.
wang2yn84 Feb 18, 2026
b4ff700
Merge pull request #1111 from google:lance-fix-concurrency
a-googler Feb 18, 2026
ee8a07c
Allow config_id as an alternative model_id to automodel
s-noghabi Feb 18, 2026
769f42f
add comment clarifying micro batch has to be 1
s-noghabi Feb 18, 2026
7534565
[Tunix] This change adds unique metadata IDs to each cell in the Jupy…
lc5211 Feb 18, 2026
6314324
Merge pull request #1110 from google:lance-refactor-initvar
a-googler Feb 18, 2026
d3f81ac
forbidden_tokens in sampler call accepts token IDs instead of strings.
galenmandrew Feb 18, 2026
707db58
simplify trajectory result processing
tianshub Feb 18, 2026
4376cf0
fix group_id and pair_idx in traj
tianshub Feb 19, 2026
fb9c752
Add Colab and Kaggle badges to qlora_llam3_gpu example tutorial
rajasekharporeddy Feb 19, 2026
acb5eb4
Merge pull request #1122 from rajasekharporeddy:colab_badge
a-googler Feb 19, 2026
b3e3ffc
[Tunix] mprove GCS CSV writing.
lc5211 Feb 19, 2026
022a17c
Creates a test notebook to use an agent-sandbox instead of a pod
igooch Feb 6, 2026
a3703b7
Adds test that starts and runs in multiple sandboxes
igooch Feb 19, 2026
2 changes: 2 additions & 0 deletions .gitignore
@@ -10,8 +10,10 @@
.idea
.vscode
.envrc
tmp/

# virtualenv/venv directories
**/.venv/
/venv/
/bin/
/include/
1 change: 1 addition & 0 deletions .python-version
@@ -0,0 +1 @@
3.12.9
40 changes: 22 additions & 18 deletions docs/models.md
@@ -62,15 +62,15 @@ Adding a new model needs to follow the naming convention that Tunix supports
## AutoModel

`AutoModel` provides a unified interface for instantiating Tunix models from
pretrained checkpoints, similar to the Hugging Face `AutoModel` API. It allows
pretrained checkpoints, similar to the Huggingface `AutoModel` API. It allows
you to load a model simply by providing its `model_id`, handling the download
and initialization for you.

### Basic Usage

To load a model, use the `AutoModel.from_pretrained` method with the model
identifier and your JAX sharding mesh. By default this will download the model
from HuggingFace.
from Huggingface.

```python
from tunix.models.automodel import AutoModel
@@ -80,9 +80,9 @@ import jax
mesh = jax.make_mesh((1, 1), ("fsdp", "tp"), axis_types=(jax.sharding.AxisType.Auto,) * 2)

# 2. Load the model
# By default, this downloads from Hugging Face.
# By default, this downloads from Huggingface.
model, model_path = AutoModel.from_pretrained(
model_id="google/gemma-2-2b-it",
model_id="google/gemma-2-2b-it", # Using HF id as model_id
mesh=mesh
)

@@ -94,20 +94,19 @@ print(f"Model loaded from: {model_path}")
You can load models from different sources (e.g., Kaggle, GCS, etc.) using the
`model_source` argument.

#### From HuggingFace:
#### From Huggingface:

This is the default choice (`ModelSource.HUGGINGFACE`) as shown in the
example above.

#### From Kaggle:

For Kaggle, you must provide the `model_id` which is the Hugging Face identifier
(to determine the model configuration) and the `model_path` which is the Kaggle
For Kaggle, you must provide the `model_id`, which is the Huggingface identifier or `model_config_id` (see [Naming Conventions](models.md#naming-conventions)) used to determine the model configuration, and the `model_path`, which is the Kaggle
Hub model identifier (used to download the model from Kaggle).

```python
model, model_path = AutoModel.from_pretrained(
model_id="google/gemma2-2b-it",
model_id="gemma2_2b_it", # Using model_config_id as model_id
mesh=mesh,
model_source=ModelSource.KAGGLE,
model_path="google/gemma-2/flax/gemma2-2b-it",
@@ -120,13 +119,12 @@ For example the `model_path` for the `google/gemma-2/flax/gemma2-2b-it` is extra

#### From GCS:

For GCS, you must provide the `model_id` which is the Hugging Face identifier
(to determine the model configuration) and the `model_path` (the actual GCS
For GCS, you must provide the `model_id`, which is the Huggingface identifier or `model_config_id` (see [Naming Conventions](models.md#naming-conventions)) used to determine the model configuration, and the `model_path` (the actual GCS
location).

```python
model, model_path = AutoModel.from_pretrained(
model_id="google/gemma-2-2b-it",
model_id="gemma2_2b_it", # Using model_config_id as model_id
mesh=mesh,
model_source=ModelSource.GCS,
model_path="gs://my-bucket/gemma-2-2b-it"
@@ -139,7 +137,7 @@ Optionally, you can also provide the `model_download_path` argument, which
specifies where the model is to be downloaded to. Depending on the
`model_source` the effect of specifying this variable is different:

* **Hugging Face**: Files are downloaded directly to this directory.
* **Huggingface**: Files are downloaded directly to this directory.
* **Kaggle**: Sets the `KAGGLEHUB_CACHE` environment variable to this path.
* **GCS**: No-op.
* **Internal**: Files are copied to this directory. If omitted, the model is loaded directly from the `model_path`. This mode (Internal) is not supported in OSS version.
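The per-source behavior above can be summarized in a small illustrative sketch (a hypothetical helper, not the Tunix implementation; it only mirrors the list of effects described here):

```python
import os

def apply_download_path(model_source: str, model_download_path: str) -> str:
    """Sketch of how `model_download_path` is interpreted per source
    (illustrative only, not Tunix code)."""
    if model_source == "huggingface":
        # Files are downloaded directly to this directory.
        return f"download to {model_download_path}"
    if model_source == "kaggle":
        # Sets the KAGGLEHUB_CACHE environment variable to this path.
        os.environ["KAGGLEHUB_CACHE"] = model_download_path
        return "kagglehub cache set"
    if model_source == "gcs":
        # No-op: the model is read from the GCS path directly.
        return "no-op"
    raise ValueError(f"unsupported model_source: {model_source}")
```

For example, `apply_download_path("gcs", "/tmp/models")` returns `"no-op"`, while the Kaggle branch only sets the cache environment variable as a side effect.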
@@ -148,21 +146,27 @@ specifies where the model is to be downloaded to.

This section outlines the naming conventions used within Tunix for model
identification and configuration. These conventions ensure consistency when
loading models from various sources like Hugging Face or Kaggle.
loading models from various sources like Huggingface or Kaggle.

The `ModelNaming` dataclass handles the parsing and standardization of model names.

* **`model_id`**: The full model name identifier (case sensitive), as it appears
on Hugging Face, including the parent directory. For example,
* **`model_id`**: A unique identifier used to identify the model and to extract its family, version, and desired config. Tunix supports two forms of `model_id`:
1. **Huggingface (HF) IDs:** The full model name identifier (case sensitive), as it appears
on Huggingface, including the parent directory.
* **Extracting model_id from HF**: For example,
`meta-llama/Llama-3.1-8B` is extracted as shown below:
![Hugging Face extracting Model ID](images/model_id_huggingface.png){: width="75%"}
![Huggingface extracting Model ID](images/model_id_huggingface.png){: width="75%"}

2. **Native Tunix model_configs:** The `model_config_id`, representing the exact config in the model class, can be used directly as the `model_id`. In this case it is also treated as the `model_name`.
* **Extracting model_id from model_config_id**: In this case, refer to the source code (`model.py`) for each model family and select the config id from the `ModelConfig` class, for example `llama3p1_8b` from the llama [model code](https://github.com/google/tunix/blob/main/models/llama3/model.py).


* **`model_name`**: The unique full name identifier of the model. This
corresponds to the full name and should match exactly with the model name
used in Hugging Face or Kaggle. It is typically all lowercase and formatted
as `<model-family>-<model-version>`.
* *Example*: `gemma-2b`, `llama-3.1-8b`, `gemma2-2b-it`.
as `<model-family>-<model-version>` (when an HF id is used as the `model_id`) or `<model-family>_<model-version>` (when a `model_config_id` is used as the `model_id`).
* *Example for HF as model_id*: `gemma-2b`, `llama-3.1-8b`, `gemma-2-2b-it`.
* *Example for model_config_id as model_id*: `gemma_2b`, `llama3p1_8b`, `gemma2_2b_it`.

* **`model_family`**: The standardized model family. Unnecessary hyphens are
removed, and versions are standardized (e.g., replacing dot with `p`).
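The naming rules above can be made concrete with a small self-contained sketch (a hypothetical helper, not part of the Tunix API; it only mirrors the conventions described in this section):

```python
def naming_from_hf_id(model_id: str) -> dict:
    """Derive naming fields from a Hugging Face-style model_id,
    following the conventions above (illustrative only)."""
    # model_name: lowercase tail of the HF id, e.g.
    # "meta-llama/Llama-3.1-8B" -> "llama-3.1-8b"
    model_name = model_id.split("/")[-1].lower()
    # Standardize versions: dots become "p" ("3.1" -> "3p1").
    parts = [p.replace(".", "p") for p in model_name.split("-")]
    # A purely numeric version token fuses with the family name
    # ("llama" + "3p1" -> "llama3p1"); size/variant tokens like "8b" do not.
    if len(parts) > 1 and parts[1].replace("p", "").isdigit():
        family, rest = parts[0] + parts[1], parts[2:]
    else:
        family, rest = parts[0], parts[1:]
    return {
        "model_name": model_name,
        "model_family": family,
        "model_config_id": "_".join([family] + rest),
    }

print(naming_from_hf_id("meta-llama/Llama-3.1-8B"))
# -> {'model_name': 'llama-3.1-8b', 'model_family': 'llama3p1',
#     'model_config_id': 'llama3p1_8b'}
```

Under these rules, `google/gemma-2-2b-it` yields the model_name `gemma-2-2b-it` and the model_config_id `gemma2_2b_it`, matching the examples above.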
2 changes: 1 addition & 1 deletion docs/quickstart.md
Expand Up @@ -184,7 +184,7 @@ Next, we load the English-French translation dataset. Note you can use your own
datasets too (PyGrain, Hugging Face dataset, TFDS, etc.).

```sh
gsutil cp gs://gemma-data/tokenizers/tokenizer_gemma3.model .
gcloud storage cp gs://gemma-data/tokenizers/tokenizer_gemma3.model .
```

```python
11 changes: 11 additions & 0 deletions examples/README.rst
@@ -68,9 +68,20 @@ Create a v5litepod-8 TPU VM in GCE:

Reference: `TPU Runtime Versions <https://docs.cloud.google.com/tpu/docs/runtimes?hl=en&_gl=1*1tpeg3j*_ga*MTk1NzE5MjMyNy4xNzYwOTEwNjk3*_ga_WH2QY8WWF5*czE3NjIxNTU1OTEkbzE3JGcwJHQxNzYyMTU1NTkxJGo2MCRsMCRoMA..#training-v5p-v5e>`_

.. code-block:: bash

   gcloud compute tpus tpu-vm create v5-8 \
       --zone=us-west1-c \
       --accelerator-type=v5litepod-8 \
       --version=v2-alpha-tpuv5-lite

2. Configure VM
~~~~~~~~~~~~~~~~

.. code-block:: bash

   gcloud compute tpus tpu-vm ssh --zone "us-west1-c" "v5-8"

SSH into the VM using the supplied gcloud command, then run:

.. code-block:: bash