diff --git a/docs/docs/docs/features/rl_training.md b/docs/docs/docs/features/rl_training.md
new file mode 100644
index 00000000..a254e901
--- /dev/null
+++ b/docs/docs/docs/features/rl_training.md
@@ -0,0 +1,212 @@
+# RL Training
+
+The RL training module enables training manipulation policies with reinforcement learning using [rsl_rl](https://github.com/leggedrobotics/rsl_rl) (PPO). It runs fully in simulation with parallel environments and no human teleoperation required.
+
+:::note
+End-to-end RL for manipulation is challenging — reward design, exploration, and sim-to-real transfer all require significant task-specific tuning. Currently only the **LiftCube** task is supported. Support for additional tasks will be added in future updates.
+:::
+
+## Training
+
+```shell
+python scripts/datagen/rl/train.py \
+    --task LeIsaac-SO101-LiftCube-RL-v0 \
+    --num_envs 512 \
+    --max_iterations 1500 \
+    --headless
+```
+
+<details>
+<summary><strong>Parameter descriptions for train.py</strong></summary>
+
+- `--task`: Gym task ID to train. Required.
+
+- `--num_envs`: Number of parallel simulation environments. More environments = faster data collection. Default: value from task config.
+
+- `--max_iterations`: Number of PPO update iterations. Default: value from agent config.
+
+- `--seed`: Random seed for reproducibility. Default: value from agent config.
+
+- `--headless`: Run without rendering window for faster training.
+
+- `--device`: Computation device, such as `cpu` or `cuda`.
+
+</details>
+
+::::tip
+Training logs (tensorboard) are written to `logs/rsl_rl/<experiment_name>/<timestamp>/`. Monitor progress with:
+
+```shell
+tensorboard --logdir logs/rsl_rl
+```
+
+Key metrics to watch: `Train/mean_reward` (total episode reward) and individual reward terms such as `Episode/rew_cube_height`.
+::::
+
+## Evaluation & Recording
+
+Evaluate a checkpoint visually (no recording):
+
+```shell
+python scripts/datagen/rl/play.py \
+    --task LeIsaac-SO101-LiftCube-RL-v0 \
+    --checkpoint logs/rsl_rl/lift_cube_rl/<run>/model_<iter>.pt \
+    --num_envs 1
+```
+
+Save all episodes to HDF5 (both success and failure) by adding `--record`:
+
+```shell
+python scripts/datagen/rl/play.py \
+    --task LeIsaac-SO101-LiftCube-RL-v0 \
+    --checkpoint logs/rsl_rl/lift_cube_rl/<run>/model_<iter>.pt \
+    --num_envs 1 \
+    --num_episodes 100 \
+    --record --dataset_file ./datasets/rl_eval.hdf5
+```
+
+<details>
+<summary><strong>Parameter descriptions for play.py</strong></summary>
+
+- `--task`: Gym task ID. Required.
+
+- `--checkpoint`: Path to a saved model checkpoint (`.pt`). Required.
+
+- `--num_envs`: Number of parallel environments. Default: value from task config.
+
+- `--num_episodes`: Total episodes to run across all envs. `0` = run indefinitely. Default: `0`.
+
+- `--seed`: Random seed. Default: value from agent config.
+
+- `--record`: Enable HDF5 recording. Both successful and failed episodes are saved.
+
+- `--resume_recording`: Append to an existing dataset file instead of creating a new one.
+
+- `--dataset_file`: Output HDF5 file path. Default: `./datasets/rl_eval.hdf5`.
+
+- `--real-time`: Slow down simulation to real-time speed.
+
+</details>
+
+## Reward Design
+
+The LiftCube RL task uses three reward terms:
+
+| Term | Weight | Description |
+|------|--------|-------------|
+| `cube_success` | 200.0 | One-time bonus when cube height ≥ 20 cm above robot base. Episode ends immediately after (early termination). |
+| `ee_to_cube` | 1.5 | `1 - tanh(5 × dist(TCP, cube))` — guides TCP to cube center. Range [0, 1]. |
+| `cube_height` | 10.0 | `tanh(3 × max(h - 4.6 cm, 0))` — zero below 4.6 cm, monotonically increasing above. Range [0, 1]. |
+
+**TCP (Tool Center Point)** is computed as the midpoint between the two fingertip contact surfaces, derived from `body_pos_w` and `body_quat_w` with calibrated local offsets from the USD collision mesh:
+
+- Jaw tip offset (jaw body local frame): `(0.0, -0.05, 0.02)`
+- Gripper tip offset (gripper body local frame): `(-0.012, 0.0, -0.08)`
+
+**Termination**: episode ends on timeout (15 s) or when cube height ≥ 20 cm (success).
+
+## Action Space
+
+RL training uses the `rl_so101leader` device mode — delta end-effector control with a binary gripper:
+
+| Component | Dims | Description |
+|-----------|------|-------------|
+| `arm_action` | 6 | Delta EE pose (dx, dy, dz, droll, dpitch, dyaw), scale=(0.02, 0.02, 0.02, 0.5, 0.5, 0.5) → ±2 cm / ±0.5 rad per step |
+| `gripper_action` | 1 | Binary: action > 0 → open (1.0 rad), action < 0 → close (0.2 rad) |
+| **Total** | **7** | |
+
+## Observation Space
+
+26D flat vector (concatenated):
+
+| Term | Dims |
+|------|------|
+| `joint_pos` | 6 |
+| `joint_vel` | 6 |
+| `ee_frame_state` (pos + quat, robot frame) | 7 |
+| `cube_pos_relative_to_ee` | 3 |
+| `cube_quat` (orientation in world frame) | 4 |
+| **Total** | **26** |
+
+## Adding a New RL Task
+
+1. Create `<task>/mdp/rewards.py` with reward functions.
+2. Create `<task>/<task>_rl_env_cfg.py` with the RL env config class:
+
+```python
+@configclass
+class MyTaskRLEnvCfg(MyTaskEnvCfg):
+    observations: MyTaskRLObsCfg = MyTaskRLObsCfg()
+    rewards: MyTaskRLRewardsCfg = MyTaskRLRewardsCfg()
+    terminations: MyTaskRLTerminationsCfg = MyTaskRLTerminationsCfg()
+
+    def __post_init__(self):
+        super().__post_init__()
+        self.use_teleop_device("rl_so101leader")  # or "bi_rl_so101leader" for bi-arm
+        self.scene.front = None  # disable camera for faster training
+        self.episode_length_s = 15.0
+```
+
+3. Create `<task>/rl_agents/rsl_rl_ppo_cfg.py` with the PPO runner config:
+
+```python
+from isaaclab.utils import configclass
+from isaaclab_rl.rsl_rl import RslRlOnPolicyRunnerCfg, RslRlPpoActorCriticCfg, RslRlPpoAlgorithmCfg
+
+@configclass
+class MyTaskRLPPORunnerCfg(RslRlOnPolicyRunnerCfg):
+    num_steps_per_env = 100
+    max_iterations = 1500
+    save_interval = 50
+    experiment_name = "my_task_rl"
+    obs_groups = {"actor": ["policy"], "critic": ["policy"]}
+
+    policy = RslRlPpoActorCriticCfg(
+        init_noise_std=0.3,
+        actor_obs_normalization=True,
+        critic_obs_normalization=True,
+        actor_hidden_dims=[256, 128, 64],
+        critic_hidden_dims=[256, 128, 64],
+        activation="elu",
+    )
+
+    algorithm = RslRlPpoAlgorithmCfg(
+        value_loss_coef=1.0,
+        use_clipped_value_loss=True,
+        clip_param=0.2,
+        entropy_coef=0.005,
+        num_learning_epochs=5,
+        num_mini_batches=4,
+        learning_rate=1.0e-3,
+        schedule="adaptive",
+        gamma=0.99,
+        lam=0.95,
+        desired_kl=0.01,
+        max_grad_norm=1.0,
+    )
+```
+
+4. Register the gym environment in `<task>/__init__.py`:
+
+```python
+from . import rl_agents
+
+gym.register(
+    id="LeIsaac-SO101-MyTask-RL-v0",
+    entry_point="isaaclab.envs:ManagerBasedRLEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": f"{__name__}.<task>_rl_env_cfg:MyTaskRLEnvCfg",
+        "rsl_rl_cfg_entry_point": f"{rl_agents.__name__}.rsl_rl_ppo_cfg:MyTaskRLPPORunnerCfg",
+    },
+)
+```
+
+5. Train:
+
+```bash
+python scripts/datagen/rl/train.py \
+    --task LeIsaac-SO101-MyTask-RL-v0 \
+    --num_envs 512 \
+    --headless
+```
diff --git a/docs/docs/docs/features/state_machine.md b/docs/docs/docs/features/state_machine.md
index 081b9def..994e9d14 100644
--- a/docs/docs/docs/features/state_machine.md
+++ b/docs/docs/docs/features/state_machine.md
@@ -60,7 +60,7 @@ After recording, you can replay the collected demonstrations in simulation:
 python scripts/datagen/state_machine/replay.py \
     --task LeIsaac-SO101-PickOrange-v0 \
     --dataset_file ./datasets/pick_orange.hdf5 \
-    --task_type so101_state_machine \
+    --task_type ik_so101leader \
     --select_episodes 0 \
     --device cuda \
     --enable_cameras \
@@ -82,7 +82,7 @@ python scripts/datagen/state_machine/replay.py \
 
 - `--replay_mode`: Replay mode — `action` replays IK pose targets, `state` replays joint positions.
 
-- `--task_type`: State machine device type used during recording, e.g., `so101_state_machine` or `bi_so101_state_machine`. Inferred from task name if not set.
+- `--task_type`: State machine device type used during recording, e.g., `ik_so101leader` or `bi_ik_so101leader`. Inferred from task name if not set.
 
 - `--select_episodes`: List of episode indices to replay. Leave empty to replay all episodes.
 
@@ -97,7 +97,7 @@ python scripts/datagen/state_machine/replay.py \
 
 ```python
 TASK_REGISTRY = {
-    "LeIsaac-SO101-PickOrange-v0": (PickOrangeStateMachine, "so101_state_machine"),
-    "LeIsaac-MY-NewTask-v0":       (MyNewStateMachine,      "so101_state_machine"),
+    "LeIsaac-SO101-PickOrange-v0": (PickOrangeStateMachine, "ik_so101leader"),
+    "LeIsaac-MY-NewTask-v0":       (MyNewStateMachine,      "ik_so101leader"),
 }
 ```
diff --git a/docs/sidebars.js b/docs/sidebars.js
index 7f6402b3..2b037604 100644
--- a/docs/sidebars.js
+++ b/docs/sidebars.js
@@ -123,6 +123,12 @@ const sidebars = {
           link: { type: 'doc', id: 'docs/features/state_machine' },
           items: [],
         },
+        {
+          type: 'category',
+          label: 'RL Training',
+          link: { type: 'doc', id: 'docs/features/rl_training' },
+          items: [],
+        },
       ],
     },
     'docs/trouble_shooting',
diff --git a/scripts/datagen/rl/play.py b/scripts/datagen/rl/play.py
new file mode 100644
index 00000000..c17637a6
--- /dev/null
+++ b/scripts/datagen/rl/play.py
@@ -0,0 +1,179 @@
+"""Play or record a trained leisaac RL policy.
+
+Use --record to save episodes to HDF5. Without --record, runs in visualization mode only.
+"""
+
+import os
+import sys
+
+# Make cli_args importable (local module in IsaacLab's rsl_rl script directory)
+_ISAACLAB_RSL_RL_DIR = os.path.join(
+    os.path.dirname(__file__),
+    "..",
+    "..",
+    "..",
+    "dependencies",
+    "IsaacLab",
+    "scripts",
+    "reinforcement_learning",
+    "rsl_rl",
+)
+sys.path.insert(0, os.path.abspath(_ISAACLAB_RSL_RL_DIR))
+
+import argparse
+import signal
+
+from isaaclab.app import AppLauncher
+
+import cli_args  # isort: skip
+
+parser = argparse.ArgumentParser(description="Play or record a trained leisaac RL policy.")
+parser.add_argument("--task", type=str, default=None, help="Name of the task.")
+parser.add_argument(
+    "--agent", type=str, default="rsl_rl_cfg_entry_point", help="Name of the RL agent configuration entry point."
+)
+parser.add_argument("--num_envs", type=int, default=None, help="Number of environments to simulate.")
+parser.add_argument("--seed", type=int, default=None, help="Seed used for the environment.")
+parser.add_argument("--num_episodes", type=int, default=0, help="Total episodes to run (0 = infinite).")
+parser.add_argument("--real-time", action="store_true", default=False, help="Run in real-time, if possible.")
+# HDF5 recording
+parser.add_argument("--record", action="store_true", help="Save episodes to HDF5 with success/failure tags.")
+parser.add_argument(
+    "--resume_recording", action="store_true", help="Append to an existing dataset file instead of creating a new one."
+)
+parser.add_argument("--dataset_file", type=str, default="./datasets/rl_eval.hdf5", help="Output HDF5 file path.")
+cli_args.add_rsl_rl_args(parser)
+AppLauncher.add_app_launcher_args(parser)
+args_cli, hydra_args = parser.parse_known_args()
+
+sys.argv = [sys.argv[0]] + hydra_args
+
+app_launcher = AppLauncher(args_cli)
+simulation_app = app_launcher.app
+
+import time
+
+import gymnasium as gym
+import isaaclab_tasks  # noqa: F401
+import leisaac.tasks  # noqa: F401
+import torch
+from isaaclab.envs import ManagerBasedRLEnvCfg
+from isaaclab.managers import DatasetExportMode
+from isaaclab.utils.assets import retrieve_file_path
+from isaaclab_rl.rsl_rl import RslRlBaseRunnerCfg, RslRlVecEnvWrapper
+from isaaclab_tasks.utils import get_checkpoint_path
+from isaaclab_tasks.utils.hydra import hydra_task_config
+from leisaac.enhance.managers import EnhanceDatasetExportMode, StreamingRecorderManager
+from rsl_rl.runners import OnPolicyRunner
+
+
+@hydra_task_config(args_cli.task, args_cli.agent)
+def main(env_cfg: ManagerBasedRLEnvCfg, agent_cfg: RslRlBaseRunnerCfg):
+    """Play or record with RSL-RL agent."""
+    agent_cfg: RslRlBaseRunnerCfg = cli_args.update_rsl_rl_cfg(agent_cfg, args_cli)
+    env_cfg.scene.num_envs = args_cli.num_envs if args_cli.num_envs is not None else env_cfg.scene.num_envs
+    env_cfg.seed = agent_cfg.seed
+    env_cfg.sim.device = args_cli.device if args_cli.device is not None else env_cfg.sim.device
+
+    # resolve checkpoint path
+    log_root_path = os.path.abspath(os.path.join("logs", "rsl_rl", agent_cfg.experiment_name))
+    print(f"[INFO] Loading experiment from directory: {log_root_path}")
+    if args_cli.checkpoint:
+        resume_path = retrieve_file_path(args_cli.checkpoint)
+    else:
+        resume_path = get_checkpoint_path(log_root_path, agent_cfg.load_run, agent_cfg.load_checkpoint)
+
+    log_dir = os.path.dirname(resume_path)
+    env_cfg.log_dir = log_dir
+
+    # configure HDF5 recording
+    if args_cli.record:
+        output_dir = os.path.dirname(os.path.abspath(args_cli.dataset_file))
+        output_filename = os.path.splitext(os.path.basename(args_cli.dataset_file))[0]
+        if args_cli.resume_recording:
+            env_cfg.recorders.dataset_export_mode = EnhanceDatasetExportMode.EXPORT_ALL_RESUME
+            assert os.path.exists(
+                args_cli.dataset_file
+            ), "Dataset file does not exist. Remove --resume_recording to create a new one."
+        else:
+            env_cfg.recorders.dataset_export_mode = DatasetExportMode.EXPORT_ALL
+            assert not os.path.exists(
+                args_cli.dataset_file
+            ), "Dataset file already exists. Use --resume_recording to append, or choose a different path."
+        env_cfg.recorders.dataset_export_dir_path = output_dir
+        env_cfg.recorders.dataset_filename = output_filename
+    else:
+        env_cfg.recorders = None
+
+    env = gym.make(args_cli.task, cfg=env_cfg)
+
+    # replace recorder manager with streaming version for HDF5 recording
+    if args_cli.record:
+        unwrapped = env.unwrapped
+        del unwrapped.recorder_manager
+        unwrapped.recorder_manager = StreamingRecorderManager(env_cfg.recorders, unwrapped)
+        unwrapped.recorder_manager.flush_steps = 100
+        unwrapped.recorder_manager.compression = "lzf"
+
+    env = RslRlVecEnvWrapper(env, clip_actions=agent_cfg.clip_actions)
+
+    print(f"[INFO]: Loading model checkpoint from: {resume_path}")
+    runner = OnPolicyRunner(env, agent_cfg.to_dict(), log_dir=None, device=agent_cfg.device)
+    runner.load(resume_path)
+
+    policy = runner.get_inference_policy(device=env.unwrapped.device)
+
+    dt = env.unwrapped.step_dt
+    episode_count = 0
+    success_count = 0
+    interrupted = False
+
+    def signal_handler(signum, frame):
+        nonlocal interrupted
+        interrupted = True
+        print("\n[INFO] Interrupted. Cleaning up...")
+
+    original_sigint_handler = signal.signal(signal.SIGINT, signal_handler)
+
+    try:
+        obs = env.get_observations()
+        while (
+            simulation_app.is_running()
+            and not interrupted
+            and (args_cli.num_episodes <= 0 or episode_count < args_cli.num_episodes)
+        ):
+            start_time = time.time()
+            with torch.inference_mode():
+                actions = policy(obs)
+                obs, _, dones, extras = env.step(actions)
+
+            time_outs = extras.get("time_outs", torch.zeros_like(dones))
+            finished = dones.bool()
+            if finished.any():
+                successes = finished & ~time_outs.bool()
+                for i in range(finished.shape[0]):
+                    if finished[i]:
+                        print("Episode success!" if successes[i] else "Episode failed!")
+                episode_count += int(finished.sum().item())
+                success_count += int(successes.sum().item())
+                print(f"Success rate: {success_count / episode_count:.1%} ({success_count}/{episode_count})")
+
+            sleep_time = dt - (time.time() - start_time)
+            if args_cli.real_time and sleep_time > 0:
+                time.sleep(sleep_time)
+
+    except Exception as e:
+        import traceback
+
+        print(f"\n[ERROR] {e}\n")
+        traceback.print_exc()
+    finally:
+        signal.signal(signal.SIGINT, original_sigint_handler)
+        if args_cli.record and hasattr(env.unwrapped.recorder_manager, "finalize"):
+            env.unwrapped.recorder_manager.finalize()
+        env.close()
+
+
+if __name__ == "__main__":
+    main()
+    simulation_app.close()
diff --git a/scripts/datagen/rl/replay.py b/scripts/datagen/rl/replay.py
new file mode 100644
index 00000000..e5a5d115
--- /dev/null
+++ b/scripts/datagen/rl/replay.py
@@ -0,0 +1,143 @@
+"""Script to replay recorded RL episodes from HDF5 dataset."""
+
+import multiprocessing
+
+if multiprocessing.get_start_method() != "spawn":
+    multiprocessing.set_start_method("spawn", force=True)
+
+import argparse
+
+from isaaclab.app import AppLauncher
+
+parser = argparse.ArgumentParser(description="Replay recorded RL episodes.")
+parser.add_argument("--task", type=str, default=None, help="Name of the task.")
+parser.add_argument("--num_envs", type=int, default=1, help="Number of environments to simulate.")
+parser.add_argument(
+    "--dataset_file", type=str, default="./datasets/rl_eval.hdf5", help="File path to load recorded episodes."
+)
+parser.add_argument(
+    "--replay_mode",
+    type=str,
+    default="action",
+    choices=["action", "state"],
+    help="Replay mode: action replays actions, state replays joint states.",
+)
+parser.add_argument(
+    "--select_episodes",
+    type=int,
+    nargs="+",
+    default=[],
+    help="List of episode indices to replay. Empty = replay all.",
+)
+AppLauncher.add_app_launcher_args(parser)
+args_cli = parser.parse_args()
+
+app_launcher = AppLauncher(vars(args_cli))
+simulation_app = app_launcher.app
+
+import contextlib
+import os
+
+import gymnasium as gym
+import leisaac.tasks  # noqa: F401
+import torch
+from isaaclab.envs import ManagerBasedRLEnv
+from isaaclab.utils.datasets import EpisodeData, HDF5DatasetFileHandler
+from isaaclab_tasks.utils import parse_env_cfg
+
+
+def get_next_action(episode_data: EpisodeData, return_state: bool = False) -> torch.Tensor | None:
+    if return_state:
+        next_state = episode_data.get_next_state()
+        if next_state is None:
+            return None
+        return next_state["articulation"]["robot"]["joint_position"]
+    else:
+        return episode_data.get_next_action()
+
+
+def main():
+    if not os.path.exists(args_cli.dataset_file):
+        raise FileNotFoundError(f"Dataset file not found: {args_cli.dataset_file}")
+
+    dataset_file_handler = HDF5DatasetFileHandler()
+    dataset_file_handler.open(args_cli.dataset_file)
+    episode_count = dataset_file_handler.get_num_episodes()
+
+    if episode_count == 0:
+        print("No episodes found in the dataset.")
+        return
+
+    episode_indices_to_replay = args_cli.select_episodes or list(range(episode_count))
+    num_envs = args_cli.num_envs
+
+    env_cfg = parse_env_cfg(args_cli.task, device=args_cli.device, num_envs=num_envs)
+    env_cfg.recorders = {}
+    env_cfg.terminations = {}
+
+    env: ManagerBasedRLEnv = gym.make(args_cli.task, cfg=env_cfg).unwrapped
+
+    idle_action = torch.zeros(env.action_space.shape)
+
+    if hasattr(env, "initialize"):
+        env.initialize()
+    env.reset()
+
+    episode_names = list(dataset_file_handler.get_episode_names())
+    replayed_episode_count = 0
+
+    with contextlib.suppress(KeyboardInterrupt) and torch.inference_mode():
+        while simulation_app.is_running() and not simulation_app.is_exiting():
+            env_episode_data_map = {i: EpisodeData() for i in range(num_envs)}
+            has_next_action = True
+            while has_next_action:
+                actions = idle_action.clone()
+                has_next_action = False
+                for env_id in range(num_envs):
+                    env_next_action = get_next_action(
+                        env_episode_data_map[env_id],
+                        return_state=args_cli.replay_mode == "state",
+                    )
+                    if env_next_action is None:
+                        next_episode_index = None
+                        while episode_indices_to_replay:
+                            next_episode_index = episode_indices_to_replay.pop(0)
+                            if next_episode_index < episode_count:
+                                break
+                            next_episode_index = None
+
+                        if next_episode_index is not None:
+                            replayed_episode_count += 1
+                            print(f"{replayed_episode_count:4}: Loading #{next_episode_index} episode to env_{env_id}")
+                            episode_data = dataset_file_handler.load_episode(
+                                episode_names[next_episode_index], env.device
+                            )
+                            env_episode_data_map[env_id] = episode_data
+                            initial_state = episode_data.get_initial_state()
+                            env.reset_to(
+                                initial_state,
+                                torch.tensor([env_id], device=env.device),
+                                seed=int(episode_data.seed) if episode_data.seed is not None else None,
+                                is_relative=True,
+                            )
+                            env_next_action = get_next_action(
+                                env_episode_data_map[env_id],
+                                return_state=args_cli.replay_mode == "state",
+                            )
+                            has_next_action = True
+                        else:
+                            continue
+                    else:
+                        has_next_action = True
+                    actions[env_id] = env_next_action
+
+                env.step(actions)
+            break
+
+    print(f"Finished replaying {replayed_episode_count} episode{'s' if replayed_episode_count != 1 else ''}.")
+    env.close()
+
+
+if __name__ == "__main__":
+    main()
+    simulation_app.close()
diff --git a/scripts/datagen/rl/train.py b/scripts/datagen/rl/train.py
new file mode 100644
index 00000000..9db31b75
--- /dev/null
+++ b/scripts/datagen/rl/train.py
@@ -0,0 +1,120 @@
+"""Train a leisaac RL task with RSL-RL (PPO) via IsaacLab's training script."""
+
+import os
+import sys
+
+# Make cli_args importable (local module in IsaacLab's rsl_rl script directory)
+_ISAACLAB_RSL_RL_DIR = os.path.join(
+    os.path.dirname(__file__),
+    "..",
+    "..",
+    "..",
+    "dependencies",
+    "IsaacLab",
+    "scripts",
+    "reinforcement_learning",
+    "rsl_rl",
+)
+sys.path.insert(0, os.path.abspath(_ISAACLAB_RSL_RL_DIR))
+
+import argparse
+
+from isaaclab.app import AppLauncher
+
+import cli_args  # isort: skip
+
+parser = argparse.ArgumentParser(description="Train a leisaac RL task with RSL-RL.")
+parser.add_argument("--num_envs", type=int, default=None, help="Number of environments to simulate.")
+parser.add_argument("--task", type=str, default=None, help="Name of the task.")
+parser.add_argument(
+    "--agent", type=str, default="rsl_rl_cfg_entry_point", help="Name of the RL agent configuration entry point."
+)
+parser.add_argument("--seed", type=int, default=None, help="Seed used for the environment.")
+parser.add_argument("--max_iterations", type=int, default=None, help="RL policy training iterations.")
+parser.add_argument(
+    "--distributed", action="store_true", default=False, help="Run training with multiple GPUs or nodes."
+)
+cli_args.add_rsl_rl_args(parser)
+AppLauncher.add_app_launcher_args(parser)
+args_cli, hydra_args = parser.parse_known_args()
+
+sys.argv = [sys.argv[0]] + hydra_args
+
+app_launcher = AppLauncher(args_cli)
+simulation_app = app_launcher.app
+
+from datetime import datetime
+
+import gymnasium as gym
+import isaaclab_tasks  # noqa: F401
+import leisaac.tasks  # noqa: F401
+import torch
+from isaaclab.envs import ManagerBasedRLEnvCfg
+from isaaclab.utils.io import dump_yaml
+from isaaclab_rl.rsl_rl import RslRlBaseRunnerCfg, RslRlVecEnvWrapper
+from isaaclab_tasks.utils import get_checkpoint_path
+from isaaclab_tasks.utils.hydra import hydra_task_config
+from rsl_rl.runners import OnPolicyRunner
+
+torch.backends.cuda.matmul.allow_tf32 = True
+torch.backends.cudnn.allow_tf32 = True
+torch.backends.cudnn.deterministic = False
+torch.backends.cudnn.benchmark = False
+
+
+@hydra_task_config(args_cli.task, args_cli.agent)
+def main(env_cfg: ManagerBasedRLEnvCfg, agent_cfg: RslRlBaseRunnerCfg):
+    """Train with RSL-RL agent."""
+    agent_cfg = cli_args.update_rsl_rl_cfg(agent_cfg, args_cli)
+    env_cfg.scene.num_envs = args_cli.num_envs if args_cli.num_envs is not None else env_cfg.scene.num_envs
+    agent_cfg.max_iterations = (
+        args_cli.max_iterations if args_cli.max_iterations is not None else agent_cfg.max_iterations
+    )
+
+    env_cfg.seed = agent_cfg.seed
+    env_cfg.sim.device = args_cli.device if args_cli.device is not None else env_cfg.sim.device
+
+    if args_cli.distributed and args_cli.device is not None and "cpu" in args_cli.device:
+        raise ValueError("Distributed training is not supported when using CPU device.")
+
+    if args_cli.distributed:
+        env_cfg.sim.device = f"cuda:{app_launcher.local_rank}"
+        agent_cfg.device = f"cuda:{app_launcher.local_rank}"
+        seed = agent_cfg.seed + app_launcher.local_rank
+        env_cfg.seed = seed
+        agent_cfg.seed = seed
+
+    log_root_path = os.path.abspath(os.path.join("logs", "rsl_rl", agent_cfg.experiment_name))
+    print(f"[INFO] Logging experiment in directory: {log_root_path}")
+    log_dir = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
+    if agent_cfg.run_name:
+        log_dir += f"_{agent_cfg.run_name}"
+    log_dir = os.path.join(log_root_path, log_dir)
+
+    env_cfg.log_dir = log_dir
+
+    env = gym.make(args_cli.task, cfg=env_cfg)
+
+    if agent_cfg.resume:
+        resume_path = get_checkpoint_path(log_root_path, agent_cfg.load_run, agent_cfg.load_checkpoint)
+
+    env = RslRlVecEnvWrapper(env, clip_actions=agent_cfg.clip_actions)
+
+    runner = OnPolicyRunner(env, agent_cfg.to_dict(), log_dir=log_dir, device=agent_cfg.device)
+    runner.add_git_repo_to_log(__file__)
+
+    if agent_cfg.resume:
+        print(f"[INFO]: Loading model checkpoint from: {resume_path}")
+        runner.load(resume_path)
+
+    dump_yaml(os.path.join(log_dir, "params", "env.yaml"), env_cfg)
+    dump_yaml(os.path.join(log_dir, "params", "agent.yaml"), agent_cfg)
+
+    runner.learn(num_learning_iterations=agent_cfg.max_iterations, init_at_random_ep_len=True)
+
+    env.close()
+
+
+if __name__ == "__main__":
+    main()
+    simulation_app.close()
diff --git a/scripts/datagen/state_machine/generate.py b/scripts/datagen/state_machine/generate.py
index 6d73407e..cb402451 100644
--- a/scripts/datagen/state_machine/generate.py
+++ b/scripts/datagen/state_machine/generate.py
@@ -58,7 +58,7 @@
 
 # Maps gym task id → (StateMachineClass, device_type)
 TASK_REGISTRY = {
-    "LeIsaac-SO101-PickOrange-v0": (PickOrangeStateMachine, "so101_state_machine"),
+    "LeIsaac-SO101-PickOrange-v0": (PickOrangeStateMachine, "ik_so101leader"),
 }
 
 
diff --git a/scripts/datagen/state_machine/replay.py b/scripts/datagen/state_machine/replay.py
index e8744afc..7a2224aa 100644
--- a/scripts/datagen/state_machine/replay.py
+++ b/scripts/datagen/state_machine/replay.py
@@ -35,8 +35,8 @@
     type=str,
     default=None,
     help=(
-        "State machine device type used during recording, e.g. 'so101_state_machine' or "
-        "'bi_so101_state_machine'. If not set, inferred from the task name."
+        "State machine device type used during recording, e.g. 'ik_so101leader' or "
+        "'bi_ik_so101leader'. If not set, inferred from the task name."
     ),
 )
 
@@ -83,7 +83,7 @@ def get_next_action(episode_data: EpisodeData, return_state: bool = False, task_
         next_state = episode_data.get_next_state()
         if next_state is None:
             return None
-        if task_type == "bi_so101_state_machine":
+        if task_type == "bi_ik_so101leader":
             left_joint_pos = next_state["articulation"]["left_arm"]["joint_position"]
             right_joint_pos = next_state["articulation"]["right_arm"]["joint_position"]
             return torch.cat([left_joint_pos, right_joint_pos], dim=0)
@@ -95,9 +95,9 @@ def get_next_action(episode_data: EpisodeData, return_state: bool = False, task_
 
 def apply_damping(env, task_type: str):
     """Apply joint damping each step to match state-machine recording behavior."""
-    if task_type == "so101_state_machine":
+    if task_type == "ik_so101leader":
         env.scene["robot"].write_joint_damping_to_sim(damping=10.0)
-    elif task_type == "bi_so101_state_machine":
+    elif task_type == "bi_ik_so101leader":
         env.scene["left_arm"].write_joint_damping_to_sim(damping=10.0)
         env.scene["right_arm"].write_joint_damping_to_sim(damping=10.0)
 
diff --git a/source/leisaac/leisaac/devices/action_process.py b/source/leisaac/leisaac/devices/action_process.py
index 6abcc5dd..665446c9 100644
--- a/source/leisaac/leisaac/devices/action_process.py
+++ b/source/leisaac/leisaac/devices/action_process.py
@@ -76,7 +76,7 @@ def init_action_cfg(action_cfg, device):
             joint_names=["gripper"],
             scale=1.0,
         )
-    elif device in ["so101_state_machine"]:  # IK-based: action = EE pose (7D) + binary gripper, not raw joint angles
+    elif device in ["ik_so101leader"]:  # IK-based: action = EE pose (7D) + binary gripper, not raw joint angles
         action_cfg.arm_action = mdp.DifferentialInverseKinematicsActionCfg(
             asset_name="robot",
             joint_names=["shoulder_pan", "shoulder_lift", "elbow_flex", "wrist_flex", "wrist_roll"],
@@ -91,7 +91,25 @@ def init_action_cfg(action_cfg, device):
             open_command_expr={"gripper": 1.0},
             close_command_expr={"gripper": 0.4},
         )
-    elif device in ["bi_so101_state_machine"]:  # IK-based: action = EE pose (7D) + binary gripper, not raw joint angles
+    elif device in ["rl_so101leader"]:  # RL: delta EE pose (6D) + binary gripper (1D) = 7D total
+        # use_relative_mode=True: action = (dx, dy, dz, droll, dpitch, dyaw)
+        # translation scale=0.02 → ±2cm/step; rotation scale=0.5 → ±0.5rad/step (~28.6°) at max action
+        action_cfg.arm_action = mdp.DifferentialInverseKinematicsActionCfg(
+            asset_name="robot",
+            joint_names=["shoulder_pan", "shoulder_lift", "elbow_flex", "wrist_flex", "wrist_roll"],
+            body_name="gripper",
+            scale=(0.02, 0.02, 0.02, 0.5, 0.5, 0.5),
+            controller=mdp.DifferentialIKControllerCfg(
+                command_type="pose", ik_method="dls", use_relative_mode=True, ik_params={"lambda_val": 0.04}
+            ),
+        )
+        action_cfg.gripper_action = mdp.BinaryJointPositionActionCfg(
+            asset_name="robot",
+            joint_names=["gripper"],
+            open_command_expr={"gripper": 1.0},
+            close_command_expr={"gripper": 0.2},
+        )
+    elif device in ["bi_ik_so101leader"]:  # IK-based: action = EE pose (7D) + binary gripper, not raw joint angles
         action_cfg.left_arm_action = mdp.DifferentialInverseKinematicsActionCfg(
             asset_name="left_arm",
             joint_names=["shoulder_pan", "shoulder_lift", "elbow_flex", "wrist_flex", "wrist_roll"],
@@ -120,6 +138,38 @@ def init_action_cfg(action_cfg, device):
             open_command_expr={"gripper": 1.0},
             close_command_expr={"gripper": 0.01},
         )
+    elif device in ["bi_rl_so101leader"]:  # RL: delta EE control for both arms, 6D×2 + binary gripper×2 = 14D total
+        # use_relative_mode=True: action = (dx, dy, dz, droll, dpitch, dyaw) per arm, scale=0.05 → ±5cm/step
+        action_cfg.left_arm_action = mdp.DifferentialInverseKinematicsActionCfg(
+            asset_name="left_arm",
+            joint_names=["shoulder_pan", "shoulder_lift", "elbow_flex", "wrist_flex", "wrist_roll"],
+            body_name="gripper",
+            scale=0.05,
+            controller=mdp.DifferentialIKControllerCfg(
+                command_type="pose", ik_method="dls", use_relative_mode=True, ik_params={"lambda_val": 0.04}
+            ),
+        )
+        action_cfg.left_gripper_action = mdp.BinaryJointPositionActionCfg(
+            asset_name="left_arm",
+            joint_names=["gripper"],
+            open_command_expr={"gripper": 1.0},
+            close_command_expr={"gripper": 0.01},
+        )
+        action_cfg.right_arm_action = mdp.DifferentialInverseKinematicsActionCfg(
+            asset_name="right_arm",
+            joint_names=["shoulder_pan", "shoulder_lift", "elbow_flex", "wrist_flex", "wrist_roll"],
+            body_name="gripper",
+            scale=0.05,
+            controller=mdp.DifferentialIKControllerCfg(
+                command_type="pose", ik_method="dls", use_relative_mode=True, ik_params={"lambda_val": 0.04}
+            ),
+        )
+        action_cfg.right_gripper_action = mdp.BinaryJointPositionActionCfg(
+            asset_name="right_arm",
+            joint_names=["gripper"],
+            open_command_expr={"gripper": 1.0},
+            close_command_expr={"gripper": 0.01},
+        )
     """LeKiwi action configuration"""
     if device in ["lekiwi-leader", "lekiwi-keyboard", "lekiwi-gamepad"]:
         action_cfg.wheel_action = mdp.JointVelocityActionCfg(
diff --git a/source/leisaac/leisaac/tasks/lift_cube/__init__.py b/source/leisaac/leisaac/tasks/lift_cube/__init__.py
index 54e2d807..71f13b21 100644
--- a/source/leisaac/leisaac/tasks/lift_cube/__init__.py
+++ b/source/leisaac/leisaac/tasks/lift_cube/__init__.py
@@ -1,5 +1,7 @@
 import gymnasium as gym
 
+from . import rl_agents
+
 gym.register(
     id="LeIsaac-SO101-LiftCube-v0",
     entry_point="isaaclab.envs:ManagerBasedRLEnv",
@@ -9,6 +11,16 @@
     },
 )
 
+gym.register(
+    id="LeIsaac-SO101-LiftCube-RL-v0",
+    entry_point="isaaclab.envs:ManagerBasedRLEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": f"{__name__}.lift_cube_rl_env_cfg:LiftCubeRLEnvCfg",
+        "rsl_rl_cfg_entry_point": f"{rl_agents.__name__}.rsl_rl_ppo_cfg:LiftCubeRLPPORunnerCfg",
+    },
+)
+
 gym.register(
     id="LeIsaac-SO101-LiftCube-DigitalTwin-v0",
     entry_point="leisaac.enhance.envs:ManagerBasedRLDigitalTwinEnv",
@@ -20,7 +32,7 @@
 
 gym.register(
     id="LeIsaac-SO101-LiftCube-Mimic-v0",
-    entry_point=f"leisaac.enhance.envs:ManagerBasedRLLeIsaacMimicEnv",
+    entry_point="leisaac.enhance.envs:ManagerBasedRLLeIsaacMimicEnv",
     disable_env_checker=True,
     kwargs={
         "env_cfg_entry_point": f"{__name__}.lift_cube_mimic_env_cfg:LiftCubeMimicEnvCfg",
diff --git a/source/leisaac/leisaac/tasks/lift_cube/lift_cube_rl_env_cfg.py b/source/leisaac/leisaac/tasks/lift_cube/lift_cube_rl_env_cfg.py
new file mode 100644
index 00000000..cad82d03
--- /dev/null
+++ b/source/leisaac/leisaac/tasks/lift_cube/lift_cube_rl_env_cfg.py
@@ -0,0 +1,122 @@
+import math
+
+from isaaclab.managers import ObservationGroupCfg as ObsGroup
+from isaaclab.managers import ObservationTermCfg as ObsTerm
+from isaaclab.managers import RewardTermCfg as RewTerm
+from isaaclab.managers import SceneEntityCfg
+from isaaclab.managers import TerminationTermCfg as DoneTerm
+from isaaclab.utils import configclass
+
+from . import mdp
+from .lift_cube_env_cfg import LiftCubeEnvCfg, LiftCubeSceneCfg
+
+_CUBE_CFG = SceneEntityCfg("cube")
+_ROBOT_CFG = SceneEntityCfg("robot")
+
+
+@configclass
+class LiftCubeRLObservationsCfg:
+    """Flat vector observations for RL (26D total).
+
+    joint_pos(6) + joint_vel(6) + ee_frame_state(7) + cube_rel_ee(3) + cube_quat(4) = 26D
+    """
+
+    @configclass
+    class PolicyCfg(ObsGroup):
+        joint_pos = ObsTerm(func=mdp.joint_pos)
+        joint_vel = ObsTerm(func=mdp.joint_vel)
+        ee_frame_state = ObsTerm(
+            func=mdp.ee_frame_state,
+            params={"ee_frame_cfg": SceneEntityCfg("ee_frame"), "robot_cfg": _ROBOT_CFG},
+        )
+        cube_pos_relative_to_ee = ObsTerm(
+            func=mdp.cube_pos_relative_to_ee,
+            params={"cube_cfg": _CUBE_CFG, "ee_frame_cfg": SceneEntityCfg("ee_frame")},
+        )
+        cube_quat = ObsTerm(
+            func=mdp.cube_quat,
+            params={"cube_cfg": _CUBE_CFG},
+        )
+
+        def __post_init__(self):
+            self.enable_corruption = False
+            self.concatenate_terms = True
+
+    policy: PolicyCfg = PolicyCfg()
+
+
+@configclass
+class LiftCubeRLRewardsCfg:
+    """Reward terms for lift-cube RL training."""
+
+    cube_success = RewTerm(
+        func=mdp.cube_success_bonus,
+        weight=200.0,
+        params={"cube_cfg": _CUBE_CFG, "robot_cfg": _ROBOT_CFG, "height_threshold": 0.20},
+    )
+    ee_to_cube = RewTerm(
+        func=mdp.ee_to_cube_reward,
+        weight=1.5,
+        params={"cube_cfg": _CUBE_CFG, "robot_cfg": _ROBOT_CFG},
+    )
+    cube_height = RewTerm(
+        func=mdp.cube_height_reward,
+        weight=10.0,
+        params={"cube_cfg": _CUBE_CFG, "robot_cfg": _ROBOT_CFG, "min_height": 0.046, "k": 3.0},
+    )
+
+
+@configclass
+class LiftCubeRLTerminationsCfg:
+    """Terminations: timeout + early success."""
+
+    time_out = DoneTerm(func=mdp.time_out, time_out=True)
+    success = DoneTerm(
+        func=mdp.cube_height_above_base,
+        params={"cube_cfg": _CUBE_CFG, "robot_cfg": _ROBOT_CFG, "height_threshold": 0.20},
+    )
+
+
+@configclass
+class LiftCubeRLEnvCfg(LiftCubeEnvCfg):
+    """RL-specific configuration for the lift-cube environment.
+
+    Observations: 26D flat vector.
+    Action space: 7D (rl_so101leader: 6D delta EE pose + 1D binary gripper).
+    Cameras disabled for faster training.
+    """
+
+    scene: LiftCubeSceneCfg = LiftCubeSceneCfg(env_spacing=8.0)
+    observations: LiftCubeRLObservationsCfg = LiftCubeRLObservationsCfg()
+    rewards: LiftCubeRLRewardsCfg = LiftCubeRLRewardsCfg()
+    terminations: LiftCubeRLTerminationsCfg = LiftCubeRLTerminationsCfg()
+
+    def __post_init__(self) -> None:
+        super().__post_init__()
+
+        self.use_teleop_device("rl_so101leader")
+
+        self.scene.front = None
+
+        for attr_name in list(vars(self.events).keys()):
+            term = getattr(self.events, attr_name, None)
+            if (
+                hasattr(term, "params")
+                and isinstance(term.params.get("asset_cfg"), SceneEntityCfg)
+                and term.params["asset_cfg"].name == "front"
+            ):
+                delattr(self.events, attr_name)
+
+        self.episode_length_s = 15.0
+
+        for actuator_cfg in self.scene.robot.actuators.values():
+            actuator_cfg.damping = 8.0
+
+        self.scene.robot.init_state.joint_pos = {
+            "shoulder_pan": 0.0,
+            "shoulder_lift": math.radians(-100.0),
+            "elbow_flex": math.radians(90.0),
+            "wrist_flex": math.radians(50.0),
+            "wrist_roll": 0.0,
+            "gripper": 1.0,
+        }
diff --git a/source/leisaac/leisaac/tasks/lift_cube/mdp/__init__.py b/source/leisaac/leisaac/tasks/lift_cube/mdp/__init__.py
index 38ebdb6c..1d602cf2 100644
--- a/source/leisaac/leisaac/tasks/lift_cube/mdp/__init__.py
+++ b/source/leisaac/leisaac/tasks/lift_cube/mdp/__init__.py
@@ -2,4 +2,5 @@
 from leisaac.enhance.envs.mdp import *
 
 from .observations import *
+from .rewards import *
 from .terminations import *
diff --git a/source/leisaac/leisaac/tasks/lift_cube/mdp/observations.py b/source/leisaac/leisaac/tasks/lift_cube/mdp/observations.py
index eb808e5b..985a4af1 100644
--- a/source/leisaac/leisaac/tasks/lift_cube/mdp/observations.py
+++ b/source/leisaac/leisaac/tasks/lift_cube/mdp/observations.py
@@ -25,3 +25,24 @@ def object_grasped(
     grasped = torch.logical_and(pos_diff < diff_threshold, robot.data.joint_pos[:, -1] < grasp_threshold)
 
     return grasped
+
+
+def cube_quat(
+    env: ManagerBasedRLEnv,
+    cube_cfg: SceneEntityCfg = SceneEntityCfg("cube"),
+) -> torch.Tensor:
+    """Returns cube orientation as quaternion (w, x, y, z) in world frame. (num_envs, 4)"""
+    cube: RigidObject = env.scene[cube_cfg.name]
+    return cube.data.root_quat_w
+
+
+def cube_pos_relative_to_ee(
+    env: ManagerBasedRLEnv,
+    cube_cfg: SceneEntityCfg = SceneEntityCfg("cube"),
+    ee_frame_cfg: SceneEntityCfg = SceneEntityCfg("ee_frame"),
+) -> torch.Tensor:
+    """Returns cube position relative to the EE jaw. (num_envs, 3)"""
+    ee_frame: FrameTransformer = env.scene[ee_frame_cfg.name]
+    ee_pos = ee_frame.data.target_pos_w[:, 1, :]  # jaw position
+    cube: RigidObject = env.scene[cube_cfg.name]
+    return cube.data.root_pos_w - ee_pos
diff --git a/source/leisaac/leisaac/tasks/lift_cube/mdp/rewards.py b/source/leisaac/leisaac/tasks/lift_cube/mdp/rewards.py
new file mode 100644
index 00000000..c164e275
--- /dev/null
+++ b/source/leisaac/leisaac/tasks/lift_cube/mdp/rewards.py
@@ -0,0 +1,113 @@
+"""Reward functions for the LiftCube RL task.
+
+Reward structure
+----------------
+The task uses three complementary reward terms to guide the policy through a
+natural lifting curriculum:
+
+1. ee_to_cube  (weight 1.5, shaped)
+   Guides the end-effector toward the cube before and during grasping.
+   Formula: 1 - tanh(k * dist(TCP, cube))
+   - Peaks at 1.0 when TCP is exactly at the cube center, decays smoothly with distance.
+   - k = 5.0 gives a reach radius of ~10 cm before the reward drops below 0.5.
+
+2. cube_height  (weight 10.0, shaped)
+   Encourages lifting the cube above the table surface.
+   Formula: tanh(k * max(h - min_height, 0))
+   - Zero below min_height (4.6 cm above robot base) — ignores ground-level noise.
+   - Monotonically increasing above min_height. Range [0, 1].
+
+3. cube_success  (weight 200.0, sparse)
+   One-time terminal bonus when h >= 20 cm. Also triggers early episode
+   termination so the policy does not receive further shaped reward after success.
+
+TCP computation
+---------------
+The Tool Center Point (TCP) is the midpoint between the two fingertip contact
+surfaces, computed from body_pos_w / body_quat_w plus calibrated local offsets
+derived from the USD collision mesh geometry:
+  - jaw body local frame  : (0.0, -0.05, 0.02)
+  - gripper body local frame : (-0.012, 0.0, -0.08)
+
+Height reference
+----------------
+All heights are measured relative to the robot base link ("base") in the world
+frame, making the reward invariant to the absolute table / mount height.
+"""
+
+import torch
+from isaaclab.assets import Articulation, RigidObject
+from isaaclab.envs import ManagerBasedRLEnv
+from isaaclab.managers import SceneEntityCfg
+from isaaclab.utils.math import quat_apply
+
+# Fingertip offsets in each body's local frame, calibrated from USD collision mesh geometry
+_JAW_TIP_OFFSET = torch.tensor([0.0, -0.05, 0.02])
+_GRIPPER_TIP_OFFSET = torch.tensor([-0.012, 0.0, -0.08])
+
+
+def _tcp_pos(env: ManagerBasedRLEnv, robot_cfg: SceneEntityCfg) -> torch.Tensor:
+    """TCP position: midpoint between jaw tip and gripper tip in world frame. (num_envs, 3)"""
+    robot: Articulation = env.scene[robot_cfg.name]
+    jaw_idx = robot.data.body_names.index("jaw")
+    gripper_idx = robot.data.body_names.index("gripper")
+
+    jaw_pos = robot.data.body_pos_w[:, jaw_idx, :]
+    jaw_quat = robot.data.body_quat_w[:, jaw_idx, :]
+    gripper_pos = robot.data.body_pos_w[:, gripper_idx, :]
+    gripper_quat = robot.data.body_quat_w[:, gripper_idx, :]
+
+    device = jaw_pos.device
+    N = jaw_pos.shape[0]
+    jaw_offset = _JAW_TIP_OFFSET.to(device).unsqueeze(0).expand(N, -1)
+    gripper_offset = _GRIPPER_TIP_OFFSET.to(device).unsqueeze(0).expand(N, -1)
+
+    jaw_tip = jaw_pos + quat_apply(jaw_quat, jaw_offset)
+    gripper_tip = gripper_pos + quat_apply(gripper_quat, gripper_offset)
+    return (jaw_tip + gripper_tip) / 2.0
+
+
+def _cube_pos(env: ManagerBasedRLEnv, cube_cfg: SceneEntityCfg) -> torch.Tensor:
+    cube: RigidObject = env.scene[cube_cfg.name]
+    return cube.data.root_pos_w  # (num_envs, 3)
+
+
+def _robot_base_height(env: ManagerBasedRLEnv, robot_cfg: SceneEntityCfg, base_name: str = "base") -> torch.Tensor:
+    robot: Articulation = env.scene[robot_cfg.name]
+    base_index = robot.data.body_names.index(base_name)
+    return robot.data.body_pos_w[:, base_index, 2]  # (num_envs,)
+
+
+def ee_to_cube_reward(
+    env: ManagerBasedRLEnv,
+    cube_cfg: SceneEntityCfg = SceneEntityCfg("cube"),
+    robot_cfg: SceneEntityCfg = SceneEntityCfg("robot"),
+    k: float = 5.0,
+) -> torch.Tensor:
+    """Reaching reward: 1 - tanh(k * dist) from TCP (grasp midpoint) to cube center. Range [0, 1]."""
+    dist = torch.linalg.vector_norm(_cube_pos(env, cube_cfg) - _tcp_pos(env, robot_cfg), dim=1)
+    return 1.0 - torch.tanh(k * dist)
+
+
+def cube_height_reward(
+    env: ManagerBasedRLEnv,
+    cube_cfg: SceneEntityCfg = SceneEntityCfg("cube"),
+    robot_cfg: SceneEntityCfg = SceneEntityCfg("robot"),
+    min_height: float = 0.046,
+    k: float = 5.0,
+) -> torch.Tensor:
+    """Height reward: tanh(k * max(h - min_height, 0)). Zero below min_height, monotonically increasing above. Range [0, 1]."""
+    height_above_base = _cube_pos(env, cube_cfg)[:, 2] - _robot_base_height(env, robot_cfg)
+    h = torch.clamp(height_above_base - min_height, min=0.0)
+    return torch.tanh(k * h)
+
+
+def cube_success_bonus(
+    env: ManagerBasedRLEnv,
+    cube_cfg: SceneEntityCfg = SceneEntityCfg("cube"),
+    robot_cfg: SceneEntityCfg = SceneEntityCfg("robot"),
+    height_threshold: float = 0.20,
+) -> torch.Tensor:
+    """One-time large bonus when cube reaches the success height threshold."""
+    height_above_base = _cube_pos(env, cube_cfg)[:, 2] - _robot_base_height(env, robot_cfg)
+    return (height_above_base >= height_threshold).float()
diff --git a/source/leisaac/leisaac/tasks/lift_cube/rl_agents/__init__.py b/source/leisaac/leisaac/tasks/lift_cube/rl_agents/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/source/leisaac/leisaac/tasks/lift_cube/rl_agents/rsl_rl_ppo_cfg.py b/source/leisaac/leisaac/tasks/lift_cube/rl_agents/rsl_rl_ppo_cfg.py
new file mode 100644
index 00000000..c8d68ad3
--- /dev/null
+++ b/source/leisaac/leisaac/tasks/lift_cube/rl_agents/rsl_rl_ppo_cfg.py
@@ -0,0 +1,39 @@
+from isaaclab.utils import configclass
+from isaaclab_rl.rsl_rl import (
+    RslRlOnPolicyRunnerCfg,
+    RslRlPpoActorCriticCfg,
+    RslRlPpoAlgorithmCfg,
+)
+
+
+@configclass
+class LiftCubeRLPPORunnerCfg(RslRlOnPolicyRunnerCfg):
+    num_steps_per_env = 100
+    max_iterations = 1500
+    save_interval = 50
+    experiment_name = "lift_cube_rl"
+    obs_groups = {"actor": ["policy"], "critic": ["policy"]}
+
+    policy = RslRlPpoActorCriticCfg(
+        init_noise_std=0.3,
+        actor_obs_normalization=True,
+        critic_obs_normalization=True,
+        actor_hidden_dims=[256, 128, 64],
+        critic_hidden_dims=[256, 128, 64],
+        activation="elu",
+    )
+
+    algorithm = RslRlPpoAlgorithmCfg(
+        value_loss_coef=1.0,
+        use_clipped_value_loss=True,
+        clip_param=0.1,
+        entropy_coef=0.005,
+        num_learning_epochs=5,
+        num_mini_batches=4,
+        learning_rate=1.0e-4,
+        schedule="adaptive",
+        gamma=0.95,
+        lam=0.95,
+        desired_kl=0.01,
+        max_grad_norm=0.5,
+    )
diff --git a/source/leisaac/leisaac/tasks/pick_orange/agents/__init__.py b/source/leisaac/leisaac/tasks/pick_orange/agents/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/source/leisaac/leisaac/tasks/template/bi_arm_env_cfg.py b/source/leisaac/leisaac/tasks/template/bi_arm_env_cfg.py
index 6a07c6ff..ea2d2af9 100644
--- a/source/leisaac/leisaac/tasks/template/bi_arm_env_cfg.py
+++ b/source/leisaac/leisaac/tasks/template/bi_arm_env_cfg.py
@@ -211,7 +211,7 @@ def __post_init__(self) -> None:
     def use_teleop_device(self, teleop_device) -> None:
         self.task_type = teleop_device
         self.actions = init_action_cfg(self.actions, device=teleop_device)
-        if teleop_device in ["bi_so101_state_machine"]:
+        if teleop_device in ["bi_ik_so101leader"]:
             self.scene.left_arm.spawn.rigid_props.disable_gravity = True
             self.scene.right_arm.spawn.rigid_props.disable_gravity = True
 
diff --git a/source/leisaac/leisaac/tasks/template/direct/bi_arm_env.py b/source/leisaac/leisaac/tasks/template/direct/bi_arm_env.py
index 614cdd5f..f0a92960 100644
--- a/source/leisaac/leisaac/tasks/template/direct/bi_arm_env.py
+++ b/source/leisaac/leisaac/tasks/template/direct/bi_arm_env.py
@@ -92,7 +92,7 @@ def __post_init__(self) -> None:
     def use_teleop_device(self, teleop_device) -> None:
         self.task_type = teleop_device
         # self.actions = init_action_cfg(self.actions, device=teleop_device)
-        if teleop_device in ["bi_so101_state_machine"]:
+        if teleop_device in ["bi_ik_so101leader"]:
             self.scene.left_arm.spawn.rigid_props.disable_gravity = True
             self.scene.right_arm.spawn.rigid_props.disable_gravity = True
 
diff --git a/source/leisaac/leisaac/tasks/template/direct/single_arm_env.py b/source/leisaac/leisaac/tasks/template/direct/single_arm_env.py
index aa7ea7a6..147cd370 100644
--- a/source/leisaac/leisaac/tasks/template/direct/single_arm_env.py
+++ b/source/leisaac/leisaac/tasks/template/direct/single_arm_env.py
@@ -83,7 +83,7 @@ def __post_init__(self) -> None:
 
     def use_teleop_device(self, teleop_device) -> None:
         self.task_type = teleop_device
-        if teleop_device in ["keyboard", "gamepad", "so101_state_machine"]:
+        if teleop_device in ["keyboard", "gamepad", "ik_so101leader"]:
             self.scene.robot.spawn.rigid_props.disable_gravity = True
 
     def preprocess_device_action(self, action: dict[str, Any], teleop_device) -> torch.Tensor:
diff --git a/source/leisaac/leisaac/tasks/template/single_arm_env_cfg.py b/source/leisaac/leisaac/tasks/template/single_arm_env_cfg.py
index b11664a3..a2acdeb1 100644
--- a/source/leisaac/leisaac/tasks/template/single_arm_env_cfg.py
+++ b/source/leisaac/leisaac/tasks/template/single_arm_env_cfg.py
@@ -194,7 +194,7 @@ def __post_init__(self) -> None:
     def use_teleop_device(self, teleop_device) -> None:
         self.task_type = teleop_device
         self.actions = init_action_cfg(self.actions, device=teleop_device)
-        if teleop_device in ["keyboard", "gamepad", "so101_state_machine"]:
+        if teleop_device in ["keyboard", "gamepad", "ik_so101leader", "rl_so101leader"]:
             self.scene.robot.spawn.rigid_props.disable_gravity = True
 
     def preprocess_device_action(self, action: dict[str, Any], teleop_device) -> torch.Tensor: