diff --git a/skills/tinker/SKILL.md b/skills/tinker/SKILL.md new file mode 100644 index 0000000..f614596 --- /dev/null +++ b/skills/tinker/SKILL.md @@ -0,0 +1,348 @@ +--- +name: tinker +description: > + Use this skill whenever the user mentions Tinker, tinker CLI, + training runs, checkpoints, model fine-tuning with Tinker, + tinker-cookbook, tinker recipes, or any Thinking Machines Lab + SDK operations. Also trigger when users ask about + listing/inspecting/downloading/deleting training checkpoints, + pushing checkpoints to HuggingFace, managing checkpoint TTL, + configuring post-training pipelines (SFT, RL, math RL, + code RL, distillation, preference learning, RLHF, tool use + training, multi-agent RL, prompt distillation, rubric + grading, VLM classification, Harbor RL), or working with + tinker:// paths. Use this skill even if the user just + mentions "tinker" in passing — it covers the full Tinker + ecosystem including CLI, Python SDK, and cookbook recipes. +--- + +# Tinker SDK Skill + +Tinker is an ML platform SDK by Thinking Machines Lab for +managing training runs, model checkpoints, and fine-tuning +workflows. This skill covers the CLI, Python SDK, and the +tinker-cookbook training recipes. + +**Source repos:** +- SDK: https://github.com/thinking-machines-lab/tinker +- Cookbook: https://github.com/thinking-machines-lab/tinker-cookbook +- This skill: https://github.com/zjrwtx/max_skills + +**IMPORTANT — Always use the latest version:** +Before running any Tinker command or cookbook recipe, +ensure the latest version is installed: +```bash +uv pip install --upgrade tinker +# For cookbook, pull latest and reinstall: +cd && git pull && uv pip install -e . +``` +When you need more detailed information about API +internals, recipe implementations, or SDK source code, +always check the latest code from these repos — do NOT +rely on cached or outdated knowledge. Clone or browse +the repos directly to get up-to-date APIs and options. 
+ +## Quick Start + +### Authentication + +```bash +# Option 1: Environment variable (preferred) +export TINKER_API_KEY="your-api-key" + +# Option 2: Config file (~/.tinker/config.json) +mkdir -p ~/.tinker +echo '{"api_key": "your-api-key"}' > ~/.tinker/config.json +``` + +### Verify Installation + +```bash +tinker version +tinker run list --limit 3 +``` + +### Tinker Path Format + +All checkpoint operations use **tinker paths**: + +``` +tinker://// +``` + +- `TYPE`: `weights` (training) or `sampler_weights` (sampler) +- Example: `tinker://run-abc123/weights/00040` + +--- + +## CLI Commands + +### Global Options + +- `--format [table|json]` or `-f` — output format + (default: table) +- `-h` / `--help` — help on any command + +### Run Commands + +```bash +# List training runs (default: 20, use --limit=0 for all) +tinker run list [--limit N] [-c COLUMNS] + +# Available columns: +# id, model, owner, lora, updated, status, +# checkpoint, checkpoint_time +# Default columns: id, model, lora, updated, status + +# Show detailed info for a specific run +tinker run info +``` + +### Checkpoint Commands + +```bash +# List checkpoints (all runs, or filter by --run-id) +tinker checkpoint list [--run-id ID] [--limit N] + +# Show checkpoint details +tinker checkpoint info + +# Download and extract checkpoint locally +tinker checkpoint download \ + [-o OUTPUT_DIR] [--force] + +# Toggle public access +tinker checkpoint publish +tinker checkpoint unpublish + +# Set or remove expiration (TTL in seconds) +tinker checkpoint set-ttl --ttl 604800 +tinker checkpoint set-ttl --remove + +# Delete checkpoints (by path or by filters) +tinker checkpoint delete [PATH2 ...] 
[-y] +tinker checkpoint delete --run-id \ + [--type weights|sampler_weights] \ + [--before DATE] [--after DATE] [-y] + +# Push checkpoint to HuggingFace Hub +tinker checkpoint push-hf \ + [-r REPO_ID] [--public] [--revision REV] \ + [--commit-message MSG] [--create-pr] \ + [--allow-pattern PAT] [--ignore-pattern PAT] \ + [--no-model-card] +``` + +> For full flag details and output format examples, +> read `references/cli-reference.md`. + +--- + +## Common Workflows + +### 1. Find and Download a Checkpoint + +```bash +# Step 1: Find your training run +tinker run list + +# Step 2: Inspect the run +tinker run info + +# Step 3: List available checkpoints +tinker checkpoint list --run-id + +# Step 4: Download +tinker checkpoint download \ + tinker:///weights/ \ + -o ./models/ --force +``` + +### 2. Push a Checkpoint to HuggingFace + +```bash +# Prerequisite: authenticate with HF +# pip install huggingface_hub && hf auth login + +# Push as public PEFT adapter +tinker checkpoint push-hf \ + tinker:///sampler_weights/ \ + -r myorg/my-lora --public + +# Or create a PR instead of direct push +tinker checkpoint push-hf \ + tinker:///sampler_weights/ \ + -r myorg/my-lora --create-pr +``` + +### 3. Clean Up Old Checkpoints + +```bash +# Delete checkpoints older than a date +tinker checkpoint delete --run-id \ + --type weights --before 2025-01-01 -y + +# Delete specific checkpoints +tinker checkpoint delete \ + tinker:///weights/0001 \ + tinker:///weights/0002 -y +``` + +### 4. Scripting with JSON Output + +```bash +# Export all runs as JSON +tinker --format json run list --limit=0 > runs.json + +# Parse with jq +jq '.runs[].training_run_id' runs.json + +# Batch list checkpoints per run +for rid in $(jq -r '.runs[].training_run_id' runs.json) +do + tinker --format json checkpoint list --run-id "$rid" +done +``` + +--- + +## Cookbook Recipes + +The tinker-cookbook provides ready-to-use training recipes. 
+Repo: https://github.com/thinking-machines-lab/tinker-cookbook + +### Recipe Architecture + +Every recipe follows the same pattern: + +```python +import asyncio +import sys + +import chz +from tinker_cookbook.rl import train # or supervised + +# 1. Build a typed config via chz.Blueprint +def build_config_blueprint() -> chz.Blueprint[train.Config]: + return chz.Blueprint(train.Config).apply({ + "model_name": "meta-llama/Llama-3.1-8B", + "learning_rate": 2e-4, + # ... remaining config fields + }) + +# 2. Run the training loop +def main(config): + asyncio.run(train.main(config)) + +# 3. CLI entry point with chz overrides +if __name__ == "__main__": + bp = build_config_blueprint() + bp.make_from_argv(sys.argv[1:]) + main(bp.make()) +``` + +Override any config field from the command line: +```bash +python -m tinker_cookbook.recipes.sl_basic \ + --model_name "Qwen/Qwen3-8B" \ + --learning_rate 1e-4 \ + --log_path /tmp/my-run +``` + +### Running SFT (Supervised Fine-Tuning) + +```bash +# Minimal SFT on NoRobots dataset +python -m tinker_cookbook.recipes.sl_basic + +# With custom dataset (JSONL of conversations) +# Edit sl_basic.py to use FromConversationFileBuilder: +# file_path="/path/to/conversations.jsonl" +# Format: same as example_data/conversations.jsonl +``` + +### Running RL Training + +```bash +# Math RL on GSM8K +python -m tinker_cookbook.recipes.rl_basic + +# Override hyperparameters +python -m tinker_cookbook.recipes.rl_basic \ + --learning_rate 4e-5 \ + --max_tokens 256 +``` + +### Available Recipes + +| Recipe | Type | Use Case | +|--------|------|----------| +| `sl_basic` | SFT | Minimal SFT template | +| `rl_basic` | RL | Minimal RL template | +| `chat_sl/` | SFT | Conversations (Tulu3) | +| `math_rl/` | RL | Math reasoning (GSM8K) | +| `code_rl/` | RL | Code (sandboxed exec) | +| `preference/` | RLHF | SFT → reward → RL | +| `search_tool/` | RL | Retrieval tool use | +| `distillation/` | SFT/RL | Teacher→student | +| `prompt_distillation/` | SFT | Internalize prompts | +| `multiplayer_rl/` | RL | Self-play / 
multi-agent | +| `rubric/` | RL | LLM grader rubrics | +| `verifiers_rl/` | RL | Community envs | +| `vlm_classifier/` | SFT | Vision-language | +| `harbor_rl/` | RL | Terminal/SWE tasks | + +### Key Utilities + +```python +from tinker_cookbook import model_info + +# Get the right renderer for a model +renderer = model_info.get_recommended_renderer_name( + "meta-llama/Llama-3.1-8B" +) + +# Checkpoint save/resume +from tinker_cookbook import checkpoint_utils +resume = checkpoint_utils.get_last_checkpoint(log_path) +``` + +### Supported Models + +Llama 3.x, Qwen 3/3.5, DeepSeek V3, Nemotron 3, +Kimi K2/K2.5, GPT-OSS, and 30+ more. Each model has +a recommended renderer in `model_info.py`. + +> For recipe deep-dives, renderer details, dataset +> builder patterns, and RL environment setup, read +> `references/cookbook-recipes.md`. + +--- + +## Quick Troubleshooting + +| Problem | Fix | +|---------|-----| +| Auth failure | Check `TINKER_API_KEY` or `~/.tinker/config.json` | +| Checkpoint not found | Verify path format `tinker://RUN/TYPE/STEP`; list available with `tinker checkpoint list --run-id ID` | +| Download fails | Use `--force` to overwrite; check disk space | +| Cookbook import error | `uv pip install -e .` in cookbook dir; needs Python 3.10+ | +| chz override syntax | `--field value` (flat) or `--outer.inner value` (nested) | +| Rate limit | Wait and retry; reduce `--limit` for batch ops | +| HF push fails | Run `hf auth login`; install `huggingface_hub` | + +> For the full error catalog, read +> `references/troubleshooting.md`. 
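The path check recommended in the troubleshooting table can be sketched in Python. This is a minimal illustration and not part of the Tinker SDK: `parse_tinker_path` and `TinkerPath` are hypothetical names, and the rule encoded here (a `tinker://` prefix, three slash-separated components, and a type of `weights` or `sampler_weights`) is only what this cheat sheet states.

```python
from typing import NamedTuple

# Checkpoint types named in this cheat sheet.
VALID_TYPES = ("weights", "sampler_weights")
PREFIX = "tinker://"


class TinkerPath(NamedTuple):
    """Hypothetical container for the three components of a tinker path."""

    run_id: str
    checkpoint_type: str
    checkpoint_id: str


def parse_tinker_path(path: str) -> TinkerPath:
    """Split tinker://RUN/TYPE/STEP into parts; raise ValueError if malformed."""
    if not path.startswith(PREFIX):
        raise ValueError(f"path must start with {PREFIX!r}: {path!r}")
    parts = path[len(PREFIX):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tinker://RUN/TYPE/STEP, got {path!r}")
    run_id, checkpoint_type, checkpoint_id = parts
    if checkpoint_type not in VALID_TYPES:
        raise ValueError(
            f"TYPE must be one of {VALID_TYPES}, got {checkpoint_type!r}"
        )
    # The step is kept as a string so zero padding such as "00040" survives.
    return TinkerPath(run_id, checkpoint_type, checkpoint_id)
```

Calling `parse_tinker_path("tinker://run-abc123/weights/00040")` yields the run ID, type, and step as separate fields, which makes it easy to reject a malformed path before passing it to a `tinker checkpoint` command.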
+ +--- + +## Detailed References + +When the SKILL.md cheat sheet is not enough: + +- **`references/cli-reference.md`** — Every flag, output + format example (table + JSON), exit codes, date format + rules, bulk delete filter logic +- **`references/cookbook-recipes.md`** — Per-recipe config + fields, renderer selection, dataset builder interface, + RL environment pattern, hyperparameter guidance +- **`references/troubleshooting.md`** — Extended error + catalog with 15+ error-to-fix mappings, network/proxy + issues, W&B integration, checkpoint corruption diff --git a/skills/tinker/references/cli-reference.md b/skills/tinker/references/cli-reference.md new file mode 100644 index 0000000..638ddf3 --- /dev/null +++ b/skills/tinker/references/cli-reference.md @@ -0,0 +1,365 @@ +# Tinker CLI — Full Command Reference + +This reference covers every CLI command, flag, output +format, and edge case. Read this when the SKILL.md cheat +sheet is not enough. + +## Table of Contents + +1. [Global Options](#global-options) +2. [tinker version](#tinker-version) +3. [tinker run list](#tinker-run-list) +4. [tinker run info](#tinker-run-info) +5. [tinker checkpoint list](#tinker-checkpoint-list) +6. [tinker checkpoint info](#tinker-checkpoint-info) +7. [tinker checkpoint download](#tinker-checkpoint-download) +8. [tinker checkpoint publish / unpublish](#publish--unpublish) +9. [tinker checkpoint set-ttl](#tinker-checkpoint-set-ttl) +10. [tinker checkpoint delete](#tinker-checkpoint-delete) +11. [tinker checkpoint push-hf](#tinker-checkpoint-push-hf) +12. [Tinker Path Anatomy](#tinker-path-anatomy) +13. [Exit Codes](#exit-codes) +14. 
[Output System](#output-system) + +--- + +## Global Options + +| Flag | Type | Default | Description | +|------|------|---------|-------------| +| `-f, --format` | `table\|json` | `table` | Output format | +| `-h, --help` | flag | — | Show help | + +Global `--format` is placed **before** the subcommand: +```bash +tinker --format json run list +``` + +--- + +## tinker version + +Show SDK version. No arguments. + +```bash +tinker version +# Output: tinker 0.8.0 +``` + +--- + +## tinker run list + +List training runs for the authenticated user. + +| Option | Type | Default | Description | +|--------|------|---------|-------------| +| `--limit` | int | 20 | Max runs (0 = all) | +| `-c, --columns` | str | `id,model,lora,updated,status` | Comma-separated columns | + +**Available columns:** +`id`, `model`, `owner`, `lora`, `updated`, `status`, +`checkpoint`, `checkpoint_time` + +**Table output:** +``` +Run ID Base Model LoRA Updated Status +────────────────────────────────────────────────────────────── +run-abc123 llama2-7b Rank 32 2 hours ago OK +run-def456 mistral-7b Rank 64 1 day ago OK +``` + +**JSON output:** +```json +{ + "runs": [ + { + "training_run_id": "run-abc123", + "base_model": "llama2-7b", + "model_owner": "user123", + "is_lora": true, + "lora_rank": 32, + "corrupted": false, + "last_request_time": "2024-03-27T15:30:00Z", + "last_checkpoint": { + "checkpoint_id": "weights/00040", + "checkpoint_type": "training", + "time": "2024-03-27T14:00:00Z", + "tinker_path": "tinker://run-abc123/weights/00040", + "size_bytes": 1073741824, + "public": false, + "expires_at": null + }, + "user_metadata": {"task": "instruction-tuning"} + } + ] +} +``` + +**Pagination:** Fetches in batches of 100. Title shows +count with hint (e.g., "20 runs (5 more not shown, use +--limit to see more)"). + +--- + +## tinker run info + +Show details for a single training run. 
+ +| Argument | Required | Description | +|----------|----------|-------------| +| `RUN_ID` | yes | Training run ID | + +```bash +tinker run info run-abc123 +``` + +**Table output** shows key-value pairs: +``` +Training Run: run-abc123 + +Property Value +────────────────────────────────────────── +Run ID run-abc123 +Base Model llama2-7b +Owner user123 +LoRA Yes (Rank 32) +Last Update 2 hours ago +Status OK +Last Training Checkpoint weights/00040 + - Time 2 hours ago + - Path tinker://run-abc123/weights/00040 +``` + +--- + +## tinker checkpoint list + +List checkpoints, optionally filtered by run. + +| Option | Type | Default | Description | +|--------|------|---------|-------------| +| `--run-id` | str | None | Filter to one run | +| `--limit` | int | 20 | Max results (0 = all) | + +**Behavior:** +- With `--run-id`: lists all checkpoints for that run + (no pagination) +- Without: lists across all runs with pagination + +```bash +tinker checkpoint list --run-id run-abc123 +``` + +**Table output:** +``` +4 checkpoints + +Checkpoint ID Type Size Public Created Path +───────────────────────────────────────────────────────────────────── +weights/00040 training 1.5 GB No 2 hours ago tinker://run-abc123/weights/00040 +sampler_weights/05 sampler 512 MB No 3 days ago tinker://run-abc123/sampler_weights/05 +``` + +--- + +## tinker checkpoint info + +Show details for a single checkpoint. + +| Argument | Required | Description | +|----------|----------|-------------| +| `CHECKPOINT_PATH` | yes | tinker:// path | + +```bash +tinker checkpoint info tinker://run-abc123/weights/00040 +``` + +Shows: checkpoint ID, type, tinker path, size, public +status, creation time, expiration, run ID, LoRA info. + +--- + +## tinker checkpoint download + +Download and extract a checkpoint archive. 
+ +| Argument/Option | Type | Default | Description | +|-----------------|------|---------|-------------| +| `CHECKPOINT_PATH` | str | — | tinker:// path (required) | +| `-o, --output` | path | cwd | Parent directory | +| `--force` | flag | false | Overwrite existing dir | + +**Directory naming:** +`tinker://run-abc123/weights/00040` +→ `run-abc123_weights_00040/` + +```bash +tinker checkpoint download \ + tinker://run-abc123/weights/00040 \ + -o ./models/ --force +``` + +**Extracted contents (typical LoRA):** +``` +run-abc123_weights_00040/ +├── adapter_config.json +├── adapter_model.safetensors +└── checkpoint_complete +``` + +**Safety:** Rejects symlinks, hardlinks, and path +traversal in tar archives. + +--- + +## Publish / Unpublish + +Toggle public accessibility of a checkpoint. + +```bash +tinker checkpoint publish +tinker checkpoint unpublish +``` + +Silent on success. Only the run owner can change this. + +--- + +## tinker checkpoint set-ttl + +Set or remove checkpoint expiration. + +| Option | Type | Description | +|--------|------|-------------| +| `--ttl` | int | TTL in seconds from now | +| `--remove` | flag | Clear expiration | + +Must specify exactly one of `--ttl` or `--remove`. + +```bash +# Expire in 7 days +tinker checkpoint set-ttl \ + tinker://run-abc123/weights/00040 --ttl 604800 + +# Remove expiration +tinker checkpoint set-ttl \ + tinker://run-abc123/weights/00040 --remove +``` + +--- + +## tinker checkpoint delete + +Delete checkpoints permanently. 
Two modes: + +**Mode 1: By explicit paths** +```bash +tinker checkpoint delete \ + tinker://run-id/weights/0001 \ + tinker://run-id/weights/0002 [-y] +``` + +**Mode 2: By run ID with filters** + +| Option | Type | Description | +|--------|------|-------------| +| `--run-id` | str | Target run | +| `--type` | str | `weights` or `sampler_weights` | +| `--before` | str | ISO 8601 date (UTC) | +| `--after` | str | ISO 8601 date (UTC) | +| `-y, --yes` | flag | Skip confirmation | + +```bash +tinker checkpoint delete --run-id run-abc123 \ + --type weights --before 2025-01-01 -y +``` + +**Constraints:** +- Cannot mix explicit paths with `--run-id` +- Filters (`--type`, `--before`, `--after`) require + `--run-id` +- Date format: `2024-01-01`, `2024-01-01T12:00:00Z` +- Without `-y`, shows confirmation prompt + +**Concurrency:** Deletes up to 32 checkpoints in +parallel using ThreadPoolExecutor. + +--- + +## tinker checkpoint push-hf + +Upload checkpoint to HuggingFace Hub as a PEFT adapter. + +| Option | Type | Default | Description | +|--------|------|---------|-------------| +| `CHECKPOINT_PATH` | str | — | tinker:// path (required) | +| `-r, --repo` | str | auto | HF repo ID | +| `--public` | flag | false | Public repo | +| `--revision` | str | auto | Branch/revision | +| `--commit-message` | str | auto | Commit message | +| `--create-pr` | flag | false | Create PR instead | +| `--allow-pattern` | str | — | File include pattern (repeatable) | +| `--ignore-pattern` | str | — | File exclude pattern (repeatable) | +| `--no-model-card` | flag | false | Skip README.md | + +**Prerequisites:** +```bash +pip install huggingface_hub +hf auth login +``` + +**Auto-derived values:** +- Repo ID: `tinker--` +- Revision: sanitized checkpoint ID + (e.g., `sampler_weights-0005`) + +```bash +tinker checkpoint push-hf \ + tinker://run-abc123/sampler_weights/0005 \ + -r myorg/my-lora --public \ + --commit-message "Checkpoint after epoch 5" +``` + +**Generates model card** with PEFT 
metadata, base model +info, usage snippet, and tinker source path. + +--- + +## Tinker Path Anatomy + +``` +tinker://// +``` + +| Component | Values | Example | +|-----------|--------|---------| +| `RUN_ID` | run identifier | `run-abc123` | +| `CHECKPOINT_TYPE` | `weights` (training), `sampler_weights` (sampler) | `weights` | +| `CHECKPOINT_ID` | step number or name | `00040`, `final` | + +Full example: `tinker://run-abc123/weights/00040` + +--- + +## Exit Codes + +| Code | Meaning | +|------|---------| +| 0 | Success | +| 1 | General error (TinkerCliError) | +| 2 | Click validation error (bad args) | +| 130 | User interrupt (Ctrl+C) | + +--- + +## Output System + +All commands support `--format table` (default) and +`--format json`. + +- **Table**: Uses Rich library, colored headers, cyan + first column, emoji disabled to prevent ID mangling +- **JSON**: 2-space indent, trailing newline, suitable + for piping to `jq` +- Progress bars only shown in table mode diff --git a/skills/tinker/references/cookbook-recipes.md b/skills/tinker/references/cookbook-recipes.md new file mode 100644 index 0000000..93b1cc5 --- /dev/null +++ b/skills/tinker/references/cookbook-recipes.md @@ -0,0 +1,432 @@ +# Tinker Cookbook — Recipe Deep-Dives + +This reference covers the cookbook's architecture, every +recipe, renderer selection, dataset builders, RL +environments, and hyperparameter guidance. + +Cookbook repo: +https://github.com/thinking-machines-lab/tinker-cookbook + +SDK repo: +https://github.com/thinking-machines-lab/tinker + +For detailed source code, browse or clone these repos. + +## Table of Contents + +1. [Blueprint / chz Pattern](#blueprint--chz-pattern) +2. [SFT Recipes](#sft-recipes) +3. [RL Recipes](#rl-recipes) +4. [Renderer Selection](#renderer-selection) +5. [Dataset Builder Interface](#dataset-builder-interface) +6. [RL Environment Pattern](#rl-environment-pattern) +7. [Pipelined Async Training](#pipelined-async-training) +8. 
[Checkpoint & Resume](#checkpoint--resume) +9. [Logging](#logging) +10. [Hyperparameter Guidance](#hyperparameter-guidance) + +--- + +## Blueprint / chz Pattern + +Every recipe uses `chz.Blueprint` for typed, CLI- +overridable configuration: + +```python +import asyncio +import sys + +import chz +from tinker_cookbook.rl import train + +def build_config_blueprint() -> chz.Blueprint[train.Config]: + """Build a Blueprint with defaults that can be + overridden from the command line.""" + return chz.Blueprint(train.Config).apply({ + "model_name": "meta-llama/Llama-3.1-8B", + "learning_rate": 2e-4, + "log_path": "/tmp/my-run", + # ... remaining config fields + }) + +def main(config): + # Run the async training loop to completion + asyncio.run(train.main(config)) + +if __name__ == "__main__": + bp = build_config_blueprint() + # Parse CLI overrides: --field value + bp.make_from_argv(sys.argv[1:]) + config = bp.make() + main(config) +``` + +**CLI override syntax:** +- Flat: `--model_name "Qwen/Qwen3-8B"` +- Nested: `--dataset_builder.batch_size 64` +- Check fields: `python -m --help` + +--- + +## SFT Recipes + +### sl_basic (Minimal SFT) + +Entry point: `tinker_cookbook/recipes/sl_basic.py` + +Default config: +- Model: `meta-llama/Llama-3.1-8B` +- Dataset: NoRobots (HuggingFace Hub) +- LR: 2e-4, schedule: linear +- Epochs: 1, eval every 8 batches +- Batch size: 128, max length: 32768 + +```bash +python -m tinker_cookbook.recipes.sl_basic \ + --model_name "meta-llama/Llama-3.1-8B" \ + --learning_rate 2e-4 +``` + +### chat_sl (Conversational SFT) + +Entry point: `tinker_cookbook/recipes/chat_sl/train.py` + +Multi-dataset support (NoRobots, Tulu3). Uses flexible +chat template rendering. Key config: +- `train_on_what`: `ALL_ASSISTANT_MESSAGES` (default) +- Supports custom conversation JSONL files + +### prompt_distillation + +Entry point: +`tinker_cookbook/recipes/prompt_distillation/train.py` + +Internalizes long system prompts into model parameters. +Teacher-student framework with task-specific data +generation. 
+ +### vlm_classifier (Vision-Language SFT) + +Entry point: +`tinker_cookbook/recipes/vlm_classifier/train.py` + +Image + text classification using vision-language models. +Example: Caltech101 dataset. + +### Custom Dataset for SFT + +Use `FromConversationFileBuilder` with a JSONL file: + +```python +from tinker_cookbook.supervised.data import ( + FromConversationFileBuilder, +) +from tinker_cookbook.supervised.types import ( + ChatDatasetBuilderCommonConfig, +) + +common = ChatDatasetBuilderCommonConfig( + model_name_for_tokenizer="meta-llama/Llama-3.1-8B", + renderer_name="llama3", + max_length=32768, + batch_size=128, + train_on_what=TrainOnWhat.ALL_ASSISTANT_MESSAGES, +) +dataset = FromConversationFileBuilder( + common_config=common, + file_path="/path/to/conversations.jsonl", +) +``` + +JSONL format: see `example_data/conversations.jsonl`. +Each line is a JSON object with a `messages` array of +`{"role": "user"|"assistant", "content": "..."}`. + +--- + +## RL Recipes + +### rl_basic (Minimal RL) + +Entry point: `tinker_cookbook/recipes/rl_basic.py` + +Default config: +- Model: `meta-llama/Llama-3.1-8B` +- Dataset: GSM8K via `Gsm8kDatasetBuilder` +- LR: 4e-5, max tokens: 256 +- Batch size: 128, group size: 16 + +```bash +python -m tinker_cookbook.recipes.rl_basic \ + --learning_rate 4e-5 --max_tokens 256 +``` + +### math_rl (Math Reasoning) + +Entry point: `tinker_cookbook/recipes/math_rl/train.py` + +Trains on GSM8K / MATH / Arithmetic with custom grading +functions. Structured answer extraction with regex. + +### code_rl (Code Reasoning) + +Entry point: `tinker_cookbook/recipes/code_rl/train.py` + +DeepCoder-like competitive programming. Sandboxed code +execution via SandboxFusion or Modal. Test-driven rewards. + +### search_tool (Tool Use RL) + +Entry point: `tinker_cookbook/recipes/search_tool/train.py` + +Multi-hop QA with tool-calling framework. Vector DB +integration (ChromaDB) for retrieval. Multi-turn +interaction. 
+ +### preference (RLHF) + +3-stage pipeline: +1. SFT on reference data +2. Train reward model on preference pairs +3. RL against reward model + +Entry: `tinker_cookbook/recipes/preference/` + +### distillation + +On-policy & off-policy teacher→student distillation. +Multi-dataset support, teacher model loading. + +Entry: `tinker_cookbook/recipes/distillation/` + +### multiplayer_rl (Multi-Agent / Self-Play) + +Environments: tic-tac-toe, 20 Questions, guess-the-number. +Self-play and multi-agent training. + +Entry: `tinker_cookbook/recipes/multiplayer_rl/` + +### rubric (LLM Grader) + +LLM-based reward via structured rubrics. Regex extraction, +Prometheus dataset support. + +Entry: `tinker_cookbook/recipes/rubric/train.py` + +### verifiers_rl (Community Envs) + +Prime Intellect Environments Hub integration. Generic +environment interface for community-contributed envs. + +Entry: `tinker_cookbook/recipes/verifiers_rl/train.py` + +### harbor_rl (Terminal/SWE Tasks) + +Harbor task format standardization. Multi-turn bash tool +use with sandboxed execution. + +Entry: `tinker_cookbook/recipes/harbor_rl/train.py` + +--- + +## Renderer Selection + +Each model requires a specific chat format renderer. +The registry in `model_info.py` maps model names to +recommended renderers. 
+ +```python +from tinker_cookbook import model_info + +name = model_info.get_recommended_renderer_name( + "meta-llama/Llama-3.1-8B" +) +# Returns: "llama3" +``` + +**Supported models and renderers:** + +| Model Family | Example Model | Renderer | +|-------------|---------------|----------| +| Llama 3.x | `meta-llama/Llama-3.1-8B` | `llama3` | +| Qwen 3 | `Qwen/Qwen3-8B` | `qwen3` | +| Qwen 3.5 | `Qwen/Qwen3.5-*` | `qwen3_5` | +| DeepSeek V3 | `deepseek-ai/DeepSeek-V3.1` | `deepseek_v3` | +| Nemotron 3 | `nvidia/Nemotron-3-*` | `nemotron3` | +| Kimi K2 | `moonshotai/Kimi-K2` | `kimi_k2` | +| Kimi K2.5 | `moonshotai/Kimi-K2.5` | `kimi_k25` | +| GPT-OSS | `openai/gpt-oss-*` | `gpt_oss` | + +Renderer files: `tinker_cookbook/renderers/` + +Features vary by renderer: +- Tool calling support +- Thinking/reasoning mode (Qwen3, DeepSeek, Kimi) +- Stop sequences +- Image token counting (VLM renderers) + +**Warning system:** If you use a non-recommended renderer, +`warn_if_renderer_not_recommended()` logs a warning. + +--- + +## Dataset Builder Interface + +### SFT Datasets + +Inherit from the builder pattern: +- `SupervisedDatasetFromHFDataset` — HuggingFace Hub +- `FromConversationFileBuilder` — local JSONL file + +Common config fields: +- `model_name_for_tokenizer`: tokenizer model name +- `renderer_name`: chat format renderer +- `max_length`: max sequence length +- `batch_size`: training batch size +- `train_on_what`: which messages to train on + (`ALL_ASSISTANT_MESSAGES`, etc.) + +### RL Datasets + +Inherit from `RLDatasetBuilder`: + +```python +class MyEnvBuilder(RLDatasetBuilder): + def build_dataset(self): + """Return batches of prompts + reward fn.""" + ... +``` + +Each builder provides: +- Prompt generation +- Reward computation +- Episode structure (single or multi-turn) + +--- + +## RL Environment Pattern + +RL recipes define environments that provide prompts and +compute rewards. 
Example from `math_rl/math_env.py`: + +```python +class MathEnvironment: + def get_prompt(self, problem): + """Format the math problem as a prompt.""" + ... + + def compute_reward(self, response, answer): + """Grade the response against ground truth.""" + ... +``` + +**Multi-turn environments** (search_tool, multiplayer_rl) +use `message_env.py`: +- Token-level trajectories from message-level episodes +- Multiple interaction turns with the environment +- Terminal rewards after conversation ends + +--- + +## Pipelined Async Training + +The training loop uses pipelined async requests for +throughput: + +```python +# Pipeline: overlap compute with data loading +fwd_bwd_future = client.forward_backward(batch, loss) +optim_future = client.optim_step(adam_params) + +# While GPU computes, prepare next batch +next_batch = dataset.next() + +# Collect results +fwd_bwd_result = fwd_bwd_future.result() +optim_result = optim_future.result() +``` + +This overlaps: +- Data loading on CPU +- Forward/backward pass on GPU (via API) +- Optimizer step on GPU (via API) + +--- + +## Checkpoint & Resume + +```python +from tinker_cookbook import checkpoint_utils + +# Save checkpoint +checkpoint_utils.save_checkpoint( + client, log_path, batch_num, metrics +) + +# Resume from last checkpoint +resume = checkpoint_utils.get_last_checkpoint(log_path) +if resume: + client = service_client\ + .create_training_client_from_state_with_optimizer( + resume.state_path + ) + start_batch = resume.batch +``` + +All recipes support `--log_path` for recovery. Re-run +with the same path and choose "resume" when prompted. 
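The resume lookup can be pictured as reading the last record of a JSON Lines file. The sketch below is an illustration under stated assumptions, not the cookbook's `checkpoint_utils` implementation: it assumes checkpoint metadata is stored one JSON object per line in `checkpoints.jsonl` under the log path, and `read_last_checkpoint_record` is a hypothetical helper name. The keys inside each record are not documented here, so the function returns the raw dictionary.

```python
import json
from pathlib import Path
from typing import Optional


def read_last_checkpoint_record(log_path: str) -> Optional[dict]:
    """Return the last JSON object in <log_path>/checkpoints.jsonl, or None.

    Assumption: the file is JSON Lines, one checkpoint record per line,
    appended in training order, so the final non-empty line is the newest.
    """
    metadata_file = Path(log_path) / "checkpoints.jsonl"
    if not metadata_file.exists():
        return None
    last_record = None
    with metadata_file.open() as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                last_record = json.loads(line)
    return last_record
```

A `None` result corresponds to a fresh run with nothing to resume; any other result is the newest record, whose fields should be checked against the actual file before relying on them.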
+ +Artifacts in log_path: +- `metrics.jsonl` — training metrics per batch +- `checkpoints.jsonl` — checkpoint metadata + +--- + +## Logging + +### Local Logging +```python +from tinker_cookbook.utils.ml_log import MLLog + +logger = MLLog(log_path="/tmp/my-run") +logger.log({"loss": 0.5, "lr": 1e-4}, step=100) +``` + +### Weights & Biases +```bash +python -m tinker_cookbook.recipes.sl_basic \ + --wandb_project "my-project" +``` + +Requires: `pip install wandb` and `wandb login`. + +--- + +## Hyperparameter Guidance + +### SFT Defaults + +| Param | Small (1-3B) | Medium (7-8B) | Large (70B) | +|-------|-------------|---------------|-------------| +| LR | 2e-4 | 2e-4 | 1e-4 | +| Batch | 64-128 | 128 | 128-256 | +| Epochs | 1-3 | 1-2 | 1 | +| Max len | 4096-8192 | 8192-32768 | 32768 | + +### RL Defaults + +| Param | Small (1-3B) | Medium (7-8B) | Large (70B) | +|-------|-------------|---------------|-------------| +| LR | 4e-5 | 4e-5 | 2e-5 | +| Group | 8-16 | 16 | 16-32 | +| Max tok | 256-512 | 256-1024 | 512-2048 | + +### Hyperparameter Utilities + +```python +from tinker_cookbook import hyperparam_utils + +# Estimate LoRA parameter count +count = hyperparam_utils.estimate_lora_params( + model_name, lora_rank=32 +) + +# Get LR suggestion scaled by model size +lr = hyperparam_utils.suggest_learning_rate(model_name) +``` + +The `hyperparam_utils` module has a registry of known +hidden sizes per model for accurate parameter estimation. diff --git a/skills/tinker/references/troubleshooting.md b/skills/tinker/references/troubleshooting.md new file mode 100644 index 0000000..86978df --- /dev/null +++ b/skills/tinker/references/troubleshooting.md @@ -0,0 +1,319 @@ +# Tinker Troubleshooting Guide + +Extended error catalog and common fixes for the Tinker +CLI and cookbook. + +## Table of Contents + +1. [Authentication Errors](#authentication-errors) +2. [Checkpoint Errors](#checkpoint-errors) +3. [Network / Connection Errors](#network--connection-errors) +4. 
[Cookbook / Import Errors](#cookbook--import-errors) +5. [HuggingFace Push Errors](#huggingface-push-errors) +6. [W&B Integration](#wb-integration) +7. [CLI Error Reference](#cli-error-reference) + +--- + +## Authentication Errors + +### "Authentication failed" + +**Message:** +``` +Error: Authentication failed +Please check your API key or authentication credentials. +``` + +**Fixes:** +1. Check env var: `echo $TINKER_API_KEY` +2. Check config: `cat ~/.tinker/config.json` +3. API keys are workspace-scoped — verify you're using + the correct workspace key +4. Regenerate key from the Tinker dashboard + +### "Permission denied" + +**Message:** +``` +Error: Permission denied +You do not have access to this resource. +``` + +**Causes:** +- Trying to modify another user's checkpoint (publish, + unpublish, delete, set-ttl) +- API key lacks required permissions +- Wrong workspace + +--- + +## Checkpoint Errors + +### "Resource not found" + +**Message:** +``` +Error: Resource not found +``` + +**Fixes:** +1. Verify path format: `tinker://RUN_ID/TYPE/STEP` + - TYPE must be `weights` or `sampler_weights` + - STEP must match exactly (e.g., `00040` not `40`) +2. List available checkpoints: + ```bash + tinker checkpoint list --run-id + ``` +3. Check if checkpoint was deleted or TTL-expired +4. Verify the run still exists: + ```bash + tinker run info + ``` + +### "Invalid checkpoint path" + +**Message:** +``` +Error: Invalid checkpoint path: +Checkpoint path must be in the format: +tinker://run-id/weights/0001 +``` + +**Fix:** Ensure path starts with `tinker://` and follows +the format `tinker://RUN_ID/TYPE/STEP`. + +### "Target directory already exists" + +**Message:** +``` +Error: Target directory already exists: +Use --force to overwrite or choose a different output +directory. +``` + +**Fix:** Add `--force` flag to overwrite, or use `-o` to +specify a different output directory. + +### "Failed to extract archive" + +**Message:** +``` +Error: Failed to extract archive:
+The downloaded file may be corrupted. +``` + +**Fixes:** +1. Re-download the checkpoint +2. Check disk space: `df -h` +3. Verify the checkpoint is not corrupted by checking + its info: `tinker checkpoint info ` + +### Checkpoint Corruption Detection + +Signs of a corrupted checkpoint: +- Missing `checkpoint_complete` marker file +- Missing `adapter_config.json` +- Missing weight files (`.safetensors` or `.bin`) +- Run shows `corrupted: true` in `tinker run info` + +**Fix:** Download a different checkpoint step, or contact +Tinker support if the entire run is corrupted. + +--- + +## Network / Connection Errors + +### "Connection failed" + +**Message:** +``` +Error: Connection failed +Please check your network connection and try again. +``` + +**Fixes:** +1. Check internet: `ping api.thinkingmachines.ai` +2. Check proxy: `echo $HTTP_PROXY $HTTPS_PROXY` +3. Firewall may block Tinker API endpoints +4. Try again — may be transient + +### "Request timeout" + +**Message:** +``` +Error: Request timeout +The request took too long. Please try again. +``` + +**Fixes:** +1. Retry the command +2. For large downloads, check network bandwidth +3. For bulk operations, reduce batch size + +### "Rate limit exceeded" + +**Message:** +``` +Error: Rate limit exceeded +Please wait and try again. +``` + +**Fixes:** +1. Wait 30-60 seconds and retry +2. Reduce `--limit` for batch listing operations +3. Avoid running multiple CLI sessions in parallel + +--- + +## Cookbook / Import Errors + +### ModuleNotFoundError: tinker_cookbook + +**Fix:** +```bash +cd /path/to/tinker-cookbook +uv pip install -e . +``` + +### ModuleNotFoundError: tinker + +**Fix:** +```bash +uv pip install tinker +# Or with CLI extras: +uv pip install "tinker[cli]" +``` + +### Python Version + +The cookbook requires Python 3.10+. 
Check: +```bash +python --version +``` + +### Missing Optional Dependencies + +Some recipes need extras: + +| Recipe | Extra | Install | +|--------|-------|---------| +| math_rl | math | `uv pip install -e ".[math-rl]"` | +| code_rl | modal | `uv pip install -e ".[modal]"` | +| search_tool | vector | `uv pip install -e ".[vector-search]"` | +| Any + W&B | wandb | `uv pip install -e ".[wandb]"` | +| verifiers_rl | verifiers | `uv pip install -e ".[verifiers]"` | +| eval | inspect | `uv pip install -e ".[inspect]"` | + +### chz Configuration Errors + +**"Unknown field"** when using CLI overrides: +- Check available fields: `python -m --help` +- Use exact field names (case-sensitive) +- Nested fields: `--dataset_builder.batch_size 64` + +**"Cannot convert"** type errors: +- Ensure values match expected types +- Strings need quotes: `--model_name "Qwen/Qwen3-8B"` +- Booleans: `--flag True` or `--flag False` + +### Log Directory Conflicts + +When re-running a recipe with the same `--log_path`: +- Choose "resume" to continue from last checkpoint +- Choose "overwrite" to start fresh +- Use `cli_utils.check_log_dir()` behavior parameter: + - `"ask"` — interactive prompt + - `"resume"` — auto-resume + - `"overwrite"` — auto-overwrite + +--- + +## HuggingFace Push Errors + +### "huggingface_hub is not installed" + +**Fix:** +```bash +pip install huggingface_hub +``` + +### "Not logged in to Hugging Face" + +**Fix:** +```bash +hf auth login +# Paste your HF token when prompted +# Verify: hf auth whoami +``` + +### "Repo contains different Tinker checkpoint" + +The target HF repo already has a checkpoint from a +different Tinker run. This prevents accidental overwrites. + +**Fixes:** +1. Use a different `--repo` name +2. Use `--revision` to push to a different branch +3. Delete the existing repo on HF and retry + +### "Invalid adapter format" + +The checkpoint doesn't contain required PEFT files +(`adapter_config.json` + weight files). + +**Fixes:** +1. 
Verify it's a LoRA checkpoint (not a full model) +2. Check the run: `tinker run info ` — look + for `is_lora: true` +3. Try a different checkpoint step + +--- + +## W&B Integration + +### "wandb not installed" + +```bash +pip install wandb +# Or: uv pip install -e ".[wandb]" +``` + +### "wandb not logged in" + +```bash +wandb login +# Paste your API key +``` + +### Metrics Not Appearing + +1. Check `--wandb_project` is set correctly +2. Verify W&B API key: `wandb verify` +3. Check network access to `api.wandb.ai` +4. Look for errors in recipe stdout/stderr + +--- + +## CLI Error Reference + +Full mapping of API errors to CLI messages: + +| API Error | CLI Message | Hint | +|-----------|------------|------| +| `NotFoundError` | Resource not found | Check path/ID | +| `AuthenticationError` | Authentication failed | Check API key | +| `PermissionDeniedError` | Permission denied | Wrong owner | +| `BadRequestError` | Invalid request | Check args | +| `UnprocessableEntityError` | Invalid data | Check format | +| `RateLimitError` | Rate limit exceeded | Wait & retry | +| `InternalServerError` | Internal server error | Retry later | +| `APITimeoutError` | Request timeout | Retry | +| `APIConnectionError` | Connection failed | Check network | +| `APIStatusError` | API error (status N) | Check status | + +**Exit codes:** +- `0` — Success +- `1` — General error +- `2` — Bad arguments (Click validation) +- `130` — Ctrl+C interrupt
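
When scripting around the CLI, the exit codes listed above can be turned into readable messages. The following is a minimal sketch, not part of Tinker itself: `describe_exit_code` and `run_cli` are hypothetical helper names, and the mapping only restates the four codes documented in this guide.

```python
import subprocess
import sys

# Exit-code meanings copied from the "Exit codes" list in this guide.
EXIT_CODE_MEANINGS = {
    0: "Success",
    1: "General error",
    2: "Bad arguments (Click validation)",
    130: "Ctrl+C interrupt",
}


def describe_exit_code(code: int) -> str:
    """Return the documented meaning of an exit code (hypothetical helper)."""
    return EXIT_CODE_MEANINGS.get(code, f"Unknown exit code {code}")


def run_cli(args: list[str]) -> tuple[int, str]:
    """Run a command and return (exit code, documented meaning).

    Hypothetical wrapper: it makes no assumptions about the command
    beyond it being an executable found on PATH.
    """
    completed = subprocess.run(args, capture_output=True, text=True)
    return completed.returncode, describe_exit_code(completed.returncode)


if __name__ == "__main__":
    # Stand-in command so the sketch runs without Tinker installed;
    # swap in e.g. ["tinker", "version"] for real use.
    code, meaning = run_cli([sys.executable, "-c", "import sys; sys.exit(2)"])
    print(code, meaning)
```

Codes not in the list fall through to an "Unknown exit code" message instead of raising, so a wrapper script can still log unexpected failures.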