diff --git a/docs/docs/docs/features/rl_training.md b/docs/docs/docs/features/rl_training.md new file mode 100644 index 00000000..a254e901 --- /dev/null +++ b/docs/docs/docs/features/rl_training.md @@ -0,0 +1,212 @@ +# RL Training + +The RL training module enables training manipulation policies with reinforcement learning using [rsl_rl](https://github.com/leggedrobotics/rsl_rl) (PPO). It runs fully in simulation with parallel environments and no human teleoperation required. + +:::note +End-to-end RL for manipulation is challenging — reward design, exploration, and sim-to-real transfer all require significant task-specific tuning. Currently only the **LiftCube** task is supported. Support for additional tasks will be added in future updates. +::: + +## Training + +```shell +python scripts/datagen/rl/train.py \ + --task LeIsaac-SO101-LiftCube-RL-v0 \ + --num_envs 512 \ + --max_iterations 1500 \ + --headless +``` + +
+Parameter descriptions for train.py + +- `--task`: Gym task ID to train. Required. + +- `--num_envs`: Number of parallel simulation environments. More environments = faster data collection. Default: value from task config. + +- `--max_iterations`: Number of PPO update iterations. Default: value from agent config. + +- `--seed`: Random seed for reproducibility. Default: value from agent config. + +- `--headless`: Run without rendering window for faster training. + +- `--device`: Computation device, such as `cpu` or `cuda`. + +
+ +::::tip +Training logs (tensorboard) are written to `logs/rsl_rl///`. Monitor progress with: + +```shell +tensorboard --logdir logs/rsl_rl +``` + +Key metrics to watch: `Train/mean_reward` (total episode reward) and individual reward terms such as `Episode/rew_cube_height`. +:::: + +## Evaluation & Recording + +Evaluate a checkpoint visually (no recording): + +```shell +python scripts/datagen/rl/play.py \ + --task LeIsaac-SO101-LiftCube-RL-v0 \ + --checkpoint logs/rsl_rl/lift_cube_rl//model_.pt \ + --num_envs 1 +``` + +Save all episodes to HDF5 (both success and failure) by adding `--record`: + +```shell +python scripts/datagen/rl/play.py \ + --task LeIsaac-SO101-LiftCube-RL-v0 \ + --checkpoint logs/rsl_rl/lift_cube_rl//model_.pt \ + --num_envs 1 \ + --num_episodes 100 \ + --record --dataset_file ./datasets/rl_eval.hdf5 +``` + +
+Parameter descriptions for play.py + +- `--task`: Gym task ID. Required. + +- `--checkpoint`: Path to a saved model checkpoint (`.pt`). Required. + +- `--num_envs`: Number of parallel environments. Default: value from task config. + +- `--num_episodes`: Total episodes to run across all envs. `0` = run indefinitely. Default: `0`. + +- `--seed`: Random seed. Default: value from agent config. + +- `--record`: Enable HDF5 recording. Both successful and failed episodes are saved. + +- `--resume_recording`: Append to an existing dataset file instead of creating a new one. + +- `--dataset_file`: Output HDF5 file path. Default: `./datasets/rl_eval.hdf5`. + +- `--real-time`: Slow down simulation to real-time speed. + +
+ +## Reward Design + +The LiftCube RL task uses three reward terms: + +| Term | Weight | Description | +|------|--------|-------------| +| `cube_success` | 200.0 | One-time bonus when cube height ≥ 20 cm above robot base. Episode ends immediately after (early termination). | +| `ee_to_cube` | 1.5 | `1 - tanh(5 × dist(TCP, cube))` — guides TCP to cube center. Range [0, 1]. | +| `cube_height` | 10.0 | `tanh(3 × max(h - 4.6 cm, 0))` — zero below 4.6 cm, monotonically increasing above. Range [0, 1]. | + +**TCP (Tool Center Point)** is computed as the midpoint between the two fingertip contact surfaces, derived from `body_pos_w` and `body_quat_w` with calibrated local offsets from the USD collision mesh: + +- Jaw tip offset (jaw body local frame): `(0.0, -0.05, 0.02)` +- Gripper tip offset (gripper body local frame): `(-0.012, 0.0, -0.08)` + +**Termination**: episode ends on timeout (15 s) or when cube height ≥ 20 cm (success). + +## Action Space + +RL training uses the `rl_so101leader` device mode — delta end-effector control with a binary gripper: + +| Component | Dims | Description | +|-----------|------|-------------| +| `arm_action` | 6 | Delta EE pose (dx, dy, dz, droll, dpitch, dyaw), scale=(0.02, 0.02, 0.02, 0.5, 0.5, 0.5) → ±2 cm / ±0.5 rad per step | +| `gripper_action` | 1 | Binary: action > 0 → open (1.0 rad), action < 0 → close (0.2 rad) | +| **Total** | **7** | | + +## Observation Space + +26D flat vector (concatenated): + +| Term | Dims | +|------|------| +| `joint_pos` | 6 | +| `joint_vel` | 6 | +| `ee_frame_state` (pos + quat, robot frame) | 7 | +| `cube_pos_relative_to_ee` | 3 | +| `cube_quat` (orientation in world frame) | 4 | +| **Total** | **26** | + +## Adding a New RL Task + +1. Create `/mdp/rewards.py` with reward functions. +2. Create `/_rl_env_cfg.py` with the RL env config class: + +```python +@configclass +class MyTaskRLEnvCfg(MyTaskEnvCfg): + observations: MyTaskRLObsCfg = MyTaskRLObsCfg() + rewards: MyTaskRLRewardsCfg = MyTaskRLRewardsCfg() + terminations: MyTaskRLTerminationsCfg = MyTaskRLTerminationsCfg() + + def __post_init__(self): + super().__post_init__() + self.use_teleop_device("rl_so101leader") # or "bi_rl_so101leader" for bi-arm + self.scene.front = None # disable camera for faster training + self.episode_length_s = 15.0 +``` + +3. Create `/rl_agents/rsl_rl_ppo_cfg.py` with the PPO runner config: + +```python +from isaaclab.utils import configclass +from isaaclab_rl.rsl_rl import RslRlOnPolicyRunnerCfg, RslRlPpoActorCriticCfg, RslRlPpoAlgorithmCfg + +@configclass +class MyTaskRLPPORunnerCfg(RslRlOnPolicyRunnerCfg): + num_steps_per_env = 100 + max_iterations = 1500 + save_interval = 50 + experiment_name = "my_task_rl" + obs_groups = {"actor": ["policy"], "critic": ["policy"]} + + policy = RslRlPpoActorCriticCfg( + init_noise_std=0.3, + actor_obs_normalization=True, + critic_obs_normalization=True, + actor_hidden_dims=[256, 128, 64], + critic_hidden_dims=[256, 128, 64], + activation="elu", + ) + + algorithm = RslRlPpoAlgorithmCfg( + value_loss_coef=1.0, + use_clipped_value_loss=True, + clip_param=0.2, + entropy_coef=0.005, + num_learning_epochs=5, + num_mini_batches=4, + learning_rate=1.0e-3, + schedule="adaptive", + gamma=0.99, + lam=0.95, + desired_kl=0.01, + max_grad_norm=1.0, + ) +``` + +4. Register the gym environment in `/__init__.py`: + +```python +from . import rl_agents + +gym.register( + id="LeIsaac-SO101-MyTask-RL-v0", + entry_point="isaaclab.envs:ManagerBasedRLEnv", + disable_env_checker=True, + kwargs={ + "env_cfg_entry_point": f"{__name__}._rl_env_cfg:MyTaskRLEnvCfg", + "rsl_rl_cfg_entry_point": f"{rl_agents.__name__}.rsl_rl_ppo_cfg:MyTaskRLPPORunnerCfg", + }, +) +``` + +5. Train: + +```bash +python scripts/datagen/rl/train.py \ + --task LeIsaac-SO101-MyTask-RL-v0 \ + --num_envs 512 \ + --headless +``` diff --git a/docs/docs/docs/features/state_machine.md b/docs/docs/docs/features/state_machine.md index 081b9def..994e9d14 100644 --- a/docs/docs/docs/features/state_machine.md +++ b/docs/docs/docs/features/state_machine.md @@ -60,7 +60,7 @@ After recording, you can replay the collected demonstrations in simulation: python scripts/datagen/state_machine/replay.py \ --task LeIsaac-SO101-PickOrange-v0 \ --dataset_file ./datasets/pick_orange.hdf5 \ - --task_type so101_state_machine \ + --task_type ik_so101leader \ --select_episodes 0 \ --device cuda \ --enable_cameras \ @@ -82,7 +82,7 @@ python scripts/datagen/state_machine/replay.py \ - `--replay_mode`: Replay mode — `action` replays IK pose targets, `state` replays joint positions. -- `--task_type`: State machine device type used during recording, e.g., `so101_state_machine` or `bi_so101_state_machine`. Inferred from task name if not set. +- `--task_type`: State machine device type used during recording, e.g., `ik_so101leader` or `bi_ik_so101leader`. Inferred from task name if not set. - `--select_episodes`: List of episode indices to replay. Leave empty to replay all episodes. @@ -97,7 +97,7 @@ python scripts/datagen/state_machine/replay.py \ ```python TASK_REGISTRY = { - "LeIsaac-SO101-PickOrange-v0": (PickOrangeStateMachine, "so101_state_machine"), - "LeIsaac-MY-NewTask-v0": (MyNewStateMachine, "so101_state_machine"), + "LeIsaac-SO101-PickOrange-v0": (PickOrangeStateMachine, "ik_so101leader"), + "LeIsaac-MY-NewTask-v0": (MyNewStateMachine, "ik_so101leader"), } ``` diff --git a/docs/sidebars.js b/docs/sidebars.js index 7f6402b3..2b037604 100644 --- a/docs/sidebars.js +++ b/docs/sidebars.js @@ -123,6 +123,12 @@ const sidebars = { link: { type: 'doc', id: 'docs/features/state_machine' }, items: [], }, + { + type: 'category', + label: 'RL Training', + link: { type: 'doc', id: 'docs/features/rl_training' }, + items: [], + }, ], }, 'docs/trouble_shooting', diff --git a/scripts/datagen/rl/play.py b/scripts/datagen/rl/play.py new file mode 100644 index 00000000..c17637a6 --- /dev/null +++ b/scripts/datagen/rl/play.py @@ -0,0 +1,179 @@ +"""Play or record a trained leisaac RL policy. + +Use --record to save episodes to HDF5. Without --record, runs in visualization mode only. +""" + +import os +import sys + +# Make cli_args importable (local module in IsaacLab's rsl_rl script directory) +_ISAACLAB_RSL_RL_DIR = os.path.join( + os.path.dirname(__file__), + "..", + "..", + "..", + "dependencies", + "IsaacLab", + "scripts", + "reinforcement_learning", + "rsl_rl", +) +sys.path.insert(0, os.path.abspath(_ISAACLAB_RSL_RL_DIR)) + +import argparse +import signal + +from isaaclab.app import AppLauncher + +import cli_args # isort: skip + +parser = argparse.ArgumentParser(description="Play or record a trained leisaac RL policy.") +parser.add_argument("--task", type=str, default=None, help="Name of the task.") +parser.add_argument( + "--agent", type=str, default="rsl_rl_cfg_entry_point", help="Name of the RL agent configuration entry point." +) +parser.add_argument("--num_envs", type=int, default=None, help="Number of environments to simulate.") +parser.add_argument("--seed", type=int, default=None, help="Seed used for the environment.") +parser.add_argument("--num_episodes", type=int, default=0, help="Total episodes to run (0 = infinite).") +parser.add_argument("--real-time", action="store_true", default=False, help="Run in real-time, if possible.") +# HDF5 recording +parser.add_argument("--record", action="store_true", help="Save episodes to HDF5 with success/failure tags.") +parser.add_argument( + "--resume_recording", action="store_true", help="Append to an existing dataset file instead of creating a new one." +) +parser.add_argument("--dataset_file", type=str, default="./datasets/rl_eval.hdf5", help="Output HDF5 file path.") +cli_args.add_rsl_rl_args(parser) +AppLauncher.add_app_launcher_args(parser) +args_cli, hydra_args = parser.parse_known_args() + +sys.argv = [sys.argv[0]] + hydra_args + +app_launcher = AppLauncher(args_cli) +simulation_app = app_launcher.app + +import time + +import gymnasium as gym +import isaaclab_tasks # noqa: F401 +import leisaac.tasks # noqa: F401 +import torch +from isaaclab.envs import ManagerBasedRLEnvCfg +from isaaclab.managers import DatasetExportMode +from isaaclab.utils.assets import retrieve_file_path +from isaaclab_rl.rsl_rl import RslRlBaseRunnerCfg, RslRlVecEnvWrapper +from isaaclab_tasks.utils import get_checkpoint_path +from isaaclab_tasks.utils.hydra import hydra_task_config +from leisaac.enhance.managers import EnhanceDatasetExportMode, StreamingRecorderManager +from rsl_rl.runners import OnPolicyRunner + + +@hydra_task_config(args_cli.task, args_cli.agent) +def main(env_cfg: ManagerBasedRLEnvCfg, agent_cfg: RslRlBaseRunnerCfg): + """Play or record with RSL-RL agent.""" + agent_cfg: RslRlBaseRunnerCfg = cli_args.update_rsl_rl_cfg(agent_cfg, args_cli) + env_cfg.scene.num_envs = args_cli.num_envs if args_cli.num_envs is not None else env_cfg.scene.num_envs + env_cfg.seed = agent_cfg.seed + env_cfg.sim.device = args_cli.device if args_cli.device is not None else env_cfg.sim.device + + # resolve checkpoint path + log_root_path = os.path.abspath(os.path.join("logs", "rsl_rl", agent_cfg.experiment_name)) + print(f"[INFO] Loading experiment from directory: {log_root_path}") + if args_cli.checkpoint: + resume_path = retrieve_file_path(args_cli.checkpoint) + else: + resume_path = get_checkpoint_path(log_root_path, agent_cfg.load_run, agent_cfg.load_checkpoint) + + log_dir = os.path.dirname(resume_path) + env_cfg.log_dir = log_dir + + # configure HDF5 recording + if args_cli.record: + output_dir = os.path.dirname(os.path.abspath(args_cli.dataset_file)) + output_filename = os.path.splitext(os.path.basename(args_cli.dataset_file))[0] + if args_cli.resume_recording: + env_cfg.recorders.dataset_export_mode = EnhanceDatasetExportMode.EXPORT_ALL_RESUME + assert os.path.exists( + args_cli.dataset_file + ), "Dataset file does not exist. Remove --resume_recording to create a new one." + else: + env_cfg.recorders.dataset_export_mode = DatasetExportMode.EXPORT_ALL + assert not os.path.exists( + args_cli.dataset_file + ), "Dataset file already exists. Use --resume_recording to append, or choose a different path." + env_cfg.recorders.dataset_export_dir_path = output_dir + env_cfg.recorders.dataset_filename = output_filename + else: + env_cfg.recorders = None + + env = gym.make(args_cli.task, cfg=env_cfg) + + # replace recorder manager with streaming version for HDF5 recording + if args_cli.record: + unwrapped = env.unwrapped + del unwrapped.recorder_manager + unwrapped.recorder_manager = StreamingRecorderManager(env_cfg.recorders, unwrapped) + unwrapped.recorder_manager.flush_steps = 100 + unwrapped.recorder_manager.compression = "lzf" + + env = RslRlVecEnvWrapper(env, clip_actions=agent_cfg.clip_actions) + + print(f"[INFO]: Loading model checkpoint from: {resume_path}") + runner = OnPolicyRunner(env, agent_cfg.to_dict(), log_dir=None, device=agent_cfg.device) + runner.load(resume_path) + + policy = runner.get_inference_policy(device=env.unwrapped.device) + + dt = env.unwrapped.step_dt + episode_count = 0 + success_count = 0 + interrupted = False + + def signal_handler(signum, frame): + nonlocal interrupted + interrupted = True + print("\n[INFO] Interrupted. Cleaning up...") + + original_sigint_handler = signal.signal(signal.SIGINT, signal_handler) + + try: + obs = env.get_observations() + while ( + simulation_app.is_running() + and not interrupted + and (args_cli.num_episodes <= 0 or episode_count < args_cli.num_episodes) + ): + start_time = time.time() + with torch.inference_mode(): + actions = policy(obs) + obs, _, dones, extras = env.step(actions) + + time_outs = extras.get("time_outs", torch.zeros_like(dones)) + finished = dones.bool() + if finished.any(): + successes = finished & ~time_outs.bool() + for i in range(finished.shape[0]): + if finished[i]: + print("Episode success!" if successes[i] else "Episode failed!") + episode_count += int(finished.sum().item()) + success_count += int(successes.sum().item()) + print(f"Success rate: {success_count / episode_count:.1%} ({success_count}/{episode_count})") + + sleep_time = dt - (time.time() - start_time) + if args_cli.real_time and sleep_time > 0: + time.sleep(sleep_time) + + except Exception as e: + import traceback + + print(f"\n[ERROR] {e}\n") + traceback.print_exc() + finally: + signal.signal(signal.SIGINT, original_sigint_handler) + if args_cli.record and hasattr(env.unwrapped.recorder_manager, "finalize"): + env.unwrapped.recorder_manager.finalize() + env.close() + + +if __name__ == "__main__": + main() + simulation_app.close() diff --git a/scripts/datagen/rl/replay.py b/scripts/datagen/rl/replay.py new file mode 100644 index 00000000..e5a5d115 --- /dev/null +++ b/scripts/datagen/rl/replay.py @@ -0,0 +1,143 @@ +"""Script to replay recorded RL episodes from HDF5 dataset.""" + +import multiprocessing + +if multiprocessing.get_start_method() != "spawn": + multiprocessing.set_start_method("spawn", force=True) + +import argparse + +from isaaclab.app import AppLauncher + +parser = argparse.ArgumentParser(description="Replay recorded RL episodes.") +parser.add_argument("--task", type=str, default=None, help="Name of the task.") +parser.add_argument("--num_envs", type=int, default=1, help="Number of environments to simulate.") +parser.add_argument( + "--dataset_file", type=str, default="./datasets/rl_eval.hdf5", help="File path to load recorded episodes." +) +parser.add_argument( + "--replay_mode", + type=str, + default="action", + choices=["action", "state"], + help="Replay mode: action replays actions, state replays joint states.", +) +parser.add_argument( + "--select_episodes", + type=int, + nargs="+", + default=[], + help="List of episode indices to replay. Empty = replay all.", +) +AppLauncher.add_app_launcher_args(parser) +args_cli = parser.parse_args() + +app_launcher = AppLauncher(vars(args_cli)) +simulation_app = app_launcher.app + +import contextlib +import os + +import gymnasium as gym +import leisaac.tasks # noqa: F401 +import torch +from isaaclab.envs import ManagerBasedRLEnv +from isaaclab.utils.datasets import EpisodeData, HDF5DatasetFileHandler +from isaaclab_tasks.utils import parse_env_cfg + + +def get_next_action(episode_data: EpisodeData, return_state: bool = False) -> torch.Tensor | None: + if return_state: + next_state = episode_data.get_next_state() + if next_state is None: + return None + return next_state["articulation"]["robot"]["joint_position"] + else: + return episode_data.get_next_action() + + +def main(): + if not os.path.exists(args_cli.dataset_file): + raise FileNotFoundError(f"Dataset file not found: {args_cli.dataset_file}") + + dataset_file_handler = HDF5DatasetFileHandler() + dataset_file_handler.open(args_cli.dataset_file) + episode_count = dataset_file_handler.get_num_episodes() + + if episode_count == 0: + print("No episodes found in the dataset.") + return + + episode_indices_to_replay = args_cli.select_episodes or list(range(episode_count)) + num_envs = args_cli.num_envs + + env_cfg = parse_env_cfg(args_cli.task, device=args_cli.device, num_envs=num_envs) + env_cfg.recorders = {} + env_cfg.terminations = {} + + env: ManagerBasedRLEnv = gym.make(args_cli.task, cfg=env_cfg).unwrapped + + idle_action = torch.zeros(env.action_space.shape) + + if hasattr(env, "initialize"): + env.initialize() + env.reset() + + episode_names = list(dataset_file_handler.get_episode_names()) + replayed_episode_count = 0 + + with contextlib.suppress(KeyboardInterrupt) and torch.inference_mode(): + while simulation_app.is_running() and not simulation_app.is_exiting(): + env_episode_data_map = {i: EpisodeData() for i in range(num_envs)} + has_next_action = True + while has_next_action: + actions = idle_action.clone() + has_next_action = False + for env_id in range(num_envs): + env_next_action = get_next_action( + env_episode_data_map[env_id], + return_state=args_cli.replay_mode == "state", + ) + if env_next_action is None: + next_episode_index = None + while episode_indices_to_replay: + next_episode_index = episode_indices_to_replay.pop(0) + if next_episode_index < episode_count: + break + next_episode_index = None + + if next_episode_index is not None: + replayed_episode_count += 1 + print(f"{replayed_episode_count:4}: Loading #{next_episode_index} episode to env_{env_id}") + episode_data = dataset_file_handler.load_episode( + episode_names[next_episode_index], env.device + ) + env_episode_data_map[env_id] = episode_data + initial_state = episode_data.get_initial_state() + env.reset_to( + initial_state, + torch.tensor([env_id], device=env.device), + seed=int(episode_data.seed) if episode_data.seed is not None else None, + is_relative=True, + ) + env_next_action = get_next_action( + env_episode_data_map[env_id], + return_state=args_cli.replay_mode == "state", + ) + has_next_action = True + else: + continue + else: + has_next_action = True + actions[env_id] = env_next_action + + env.step(actions) + break + + print(f"Finished replaying {replayed_episode_count} episode{'s' if replayed_episode_count != 1 else ''}.") + env.close() + + +if __name__ == "__main__": + main() + simulation_app.close() diff --git a/scripts/datagen/rl/train.py b/scripts/datagen/rl/train.py new file mode 100644 index 00000000..9db31b75 --- /dev/null +++ b/scripts/datagen/rl/train.py @@ -0,0 +1,120 @@ +"""Train a leisaac RL task with RSL-RL (PPO) via IsaacLab's training script.""" + +import os +import sys + +# Make cli_args importable (local module in IsaacLab's rsl_rl script directory) +_ISAACLAB_RSL_RL_DIR = os.path.join( + os.path.dirname(__file__), + "..", + "..", + "..", + "dependencies", + "IsaacLab", + "scripts", + "reinforcement_learning", + "rsl_rl", +) +sys.path.insert(0, os.path.abspath(_ISAACLAB_RSL_RL_DIR)) + +import argparse + +from isaaclab.app import AppLauncher + +import cli_args # isort: skip + +parser = argparse.ArgumentParser(description="Train a leisaac RL task with RSL-RL.") +parser.add_argument("--num_envs", type=int, default=None, help="Number of environments to simulate.") +parser.add_argument("--task", type=str, default=None, help="Name of the task.") +parser.add_argument( + "--agent", type=str, default="rsl_rl_cfg_entry_point", help="Name of the RL agent configuration entry point." +) +parser.add_argument("--seed", type=int, default=None, help="Seed used for the environment.") +parser.add_argument("--max_iterations", type=int, default=None, help="RL policy training iterations.") +parser.add_argument( + "--distributed", action="store_true", default=False, help="Run training with multiple GPUs or nodes." +) +cli_args.add_rsl_rl_args(parser) +AppLauncher.add_app_launcher_args(parser) +args_cli, hydra_args = parser.parse_known_args() + +sys.argv = [sys.argv[0]] + hydra_args + +app_launcher = AppLauncher(args_cli) +simulation_app = app_launcher.app + +from datetime import datetime + +import gymnasium as gym +import isaaclab_tasks # noqa: F401 +import leisaac.tasks # noqa: F401 +import torch +from isaaclab.envs import ManagerBasedRLEnvCfg +from isaaclab.utils.io import dump_yaml +from isaaclab_rl.rsl_rl import RslRlBaseRunnerCfg, RslRlVecEnvWrapper +from isaaclab_tasks.utils import get_checkpoint_path +from isaaclab_tasks.utils.hydra import hydra_task_config +from rsl_rl.runners import OnPolicyRunner + +torch.backends.cuda.matmul.allow_tf32 = True +torch.backends.cudnn.allow_tf32 = True +torch.backends.cudnn.deterministic = False +torch.backends.cudnn.benchmark = False + + +@hydra_task_config(args_cli.task, args_cli.agent) +def main(env_cfg: ManagerBasedRLEnvCfg, agent_cfg: RslRlBaseRunnerCfg): + """Train with RSL-RL agent.""" + agent_cfg = cli_args.update_rsl_rl_cfg(agent_cfg, args_cli) + env_cfg.scene.num_envs = args_cli.num_envs if args_cli.num_envs is not None else env_cfg.scene.num_envs + agent_cfg.max_iterations = ( + args_cli.max_iterations if args_cli.max_iterations is not None else agent_cfg.max_iterations + ) + + env_cfg.seed = agent_cfg.seed + env_cfg.sim.device = args_cli.device if args_cli.device is not None else env_cfg.sim.device + + if args_cli.distributed and args_cli.device is not None and "cpu" in args_cli.device: + raise ValueError("Distributed training is not supported when using CPU device.") + + if args_cli.distributed: + env_cfg.sim.device = f"cuda:{app_launcher.local_rank}" + agent_cfg.device = f"cuda:{app_launcher.local_rank}" + seed = agent_cfg.seed + app_launcher.local_rank + env_cfg.seed = seed + agent_cfg.seed = seed + + log_root_path = os.path.abspath(os.path.join("logs", "rsl_rl", agent_cfg.experiment_name)) + print(f"[INFO] Logging experiment in directory: {log_root_path}") + log_dir = datetime.now().strftime("%Y-%m-%d_%H-%M-%S") + if agent_cfg.run_name: + log_dir += f"_{agent_cfg.run_name}" + log_dir = os.path.join(log_root_path, log_dir) + + env_cfg.log_dir = log_dir + + env = gym.make(args_cli.task, cfg=env_cfg) + + if agent_cfg.resume: + resume_path = get_checkpoint_path(log_root_path, agent_cfg.load_run, agent_cfg.load_checkpoint) + + env = RslRlVecEnvWrapper(env, clip_actions=agent_cfg.clip_actions) + + runner = OnPolicyRunner(env, agent_cfg.to_dict(), log_dir=log_dir, device=agent_cfg.device) + runner.add_git_repo_to_log(__file__) + + if agent_cfg.resume: + print(f"[INFO]: Loading model checkpoint from: {resume_path}") + runner.load(resume_path) + + dump_yaml(os.path.join(log_dir, "params", "env.yaml"), env_cfg) + dump_yaml(os.path.join(log_dir, "params", "agent.yaml"), agent_cfg) + + runner.learn(num_learning_iterations=agent_cfg.max_iterations, init_at_random_ep_len=True) + + env.close() + + +if __name__ == "__main__": + main() + simulation_app.close() diff --git a/scripts/datagen/state_machine/generate.py b/scripts/datagen/state_machine/generate.py index 6d73407e..cb402451 100644 --- a/scripts/datagen/state_machine/generate.py +++ b/scripts/datagen/state_machine/generate.py @@ -58,7 +58,7 @@ # Maps gym task id → (StateMachineClass, device_type) TASK_REGISTRY = { - "LeIsaac-SO101-PickOrange-v0": (PickOrangeStateMachine, "so101_state_machine"), + "LeIsaac-SO101-PickOrange-v0": (PickOrangeStateMachine, "ik_so101leader"), } diff --git a/scripts/datagen/state_machine/replay.py b/scripts/datagen/state_machine/replay.py index e8744afc..7a2224aa 100644 --- a/scripts/datagen/state_machine/replay.py +++ b/scripts/datagen/state_machine/replay.py @@ -35,8 +35,8 @@ type=str, default=None, help=( - "State machine device type used during recording, e.g. 'so101_state_machine' or " - "'bi_so101_state_machine'. If not set, inferred from the task name." + "State machine device type used during recording, e.g. 'ik_so101leader' or " + "'bi_ik_so101leader'. If not set, inferred from the task name." ), ) @@ -83,7 +83,7 @@ def get_next_action(episode_data: EpisodeData, return_state: bool = False, task_ next_state = episode_data.get_next_state() if next_state is None: return None - if task_type == "bi_so101_state_machine": + if task_type == "bi_ik_so101leader": left_joint_pos = next_state["articulation"]["left_arm"]["joint_position"] right_joint_pos = next_state["articulation"]["right_arm"]["joint_position"] return torch.cat([left_joint_pos, right_joint_pos], dim=0) @@ -95,9 +95,9 @@ def get_next_action(episode_data: EpisodeData, return_state: bool = False, task_ def apply_damping(env, task_type: str): """Apply joint damping each step to match state-machine recording behavior.""" - if task_type == "so101_state_machine": + if task_type == "ik_so101leader": env.scene["robot"].write_joint_damping_to_sim(damping=10.0) - elif task_type == "bi_so101_state_machine": + elif task_type == "bi_ik_so101leader": env.scene["left_arm"].write_joint_damping_to_sim(damping=10.0) env.scene["right_arm"].write_joint_damping_to_sim(damping=10.0) diff --git a/source/leisaac/leisaac/devices/action_process.py b/source/leisaac/leisaac/devices/action_process.py index 6abcc5dd..665446c9 100644 --- a/source/leisaac/leisaac/devices/action_process.py +++ b/source/leisaac/leisaac/devices/action_process.py @@ -76,7 +76,7 @@ def init_action_cfg(action_cfg, device): joint_names=["gripper"], scale=1.0, ) - elif device in ["so101_state_machine"]: # IK-based: action = EE pose (7D) + binary gripper, not raw joint angles + elif device in ["ik_so101leader"]: # IK-based: action = EE pose (7D) + binary gripper, not raw joint angles action_cfg.arm_action = mdp.DifferentialInverseKinematicsActionCfg( asset_name="robot", joint_names=["shoulder_pan", "shoulder_lift", "elbow_flex", "wrist_flex", "wrist_roll"], @@ -91,7 +91,25 @@ def init_action_cfg(action_cfg, device): open_command_expr={"gripper": 1.0}, close_command_expr={"gripper": 0.4}, ) - elif device in ["bi_so101_state_machine"]: # IK-based: action = EE pose (7D) + binary gripper, not raw joint angles + elif device in ["rl_so101leader"]: # RL: delta EE pose (6D) + binary gripper (1D) = 7D total + # use_relative_mode=True: action = (dx, dy, dz, droll, dpitch, dyaw) + # translation scale=0.02 → ±2cm/step; rotation scale=0.5 → ±0.5rad/step (~28.6°) at max action + action_cfg.arm_action = mdp.DifferentialInverseKinematicsActionCfg( + asset_name="robot", + joint_names=["shoulder_pan", "shoulder_lift", "elbow_flex", "wrist_flex", "wrist_roll"], + body_name="gripper", + scale=(0.02, 0.02, 0.02, 0.5, 0.5, 0.5), + controller=mdp.DifferentialIKControllerCfg( + command_type="pose", ik_method="dls", use_relative_mode=True, ik_params={"lambda_val": 0.04} + ), + ) + action_cfg.gripper_action = mdp.BinaryJointPositionActionCfg( + asset_name="robot", + joint_names=["gripper"], + open_command_expr={"gripper": 1.0}, + close_command_expr={"gripper": 0.2}, + ) + elif device in ["bi_ik_so101leader"]: # IK-based: action = EE pose (7D) + binary gripper, not raw joint angles action_cfg.left_arm_action = mdp.DifferentialInverseKinematicsActionCfg( asset_name="left_arm", joint_names=["shoulder_pan", "shoulder_lift", "elbow_flex", "wrist_flex", "wrist_roll"], @@ -120,6 +138,38 @@ def init_action_cfg(action_cfg, device): open_command_expr={"gripper": 1.0}, close_command_expr={"gripper": 0.01}, ) + elif device in ["bi_rl_so101leader"]: # RL: delta EE control for both arms, 6D×2 + binary gripper×2 = 14D total + # use_relative_mode=True: action = (dx, dy, dz, droll, dpitch, dyaw) per arm, scale=0.05 → ±5cm/step + action_cfg.left_arm_action = mdp.DifferentialInverseKinematicsActionCfg( + asset_name="left_arm", + joint_names=["shoulder_pan", "shoulder_lift", "elbow_flex", "wrist_flex", "wrist_roll"], + body_name="gripper", + scale=0.05, + controller=mdp.DifferentialIKControllerCfg( + command_type="pose", ik_method="dls", use_relative_mode=True, ik_params={"lambda_val": 0.04} + ), + ) + action_cfg.left_gripper_action = mdp.BinaryJointPositionActionCfg( + asset_name="left_arm", + joint_names=["gripper"], + open_command_expr={"gripper": 1.0}, + close_command_expr={"gripper": 0.01}, + ) + action_cfg.right_arm_action = mdp.DifferentialInverseKinematicsActionCfg( + asset_name="right_arm", + joint_names=["shoulder_pan", "shoulder_lift", "elbow_flex", "wrist_flex", "wrist_roll"], + body_name="gripper", + scale=0.05, + controller=mdp.DifferentialIKControllerCfg( + command_type="pose", ik_method="dls", use_relative_mode=True, ik_params={"lambda_val": 0.04} + ), + ) + action_cfg.right_gripper_action = mdp.BinaryJointPositionActionCfg( + asset_name="right_arm", + joint_names=["gripper"], + open_command_expr={"gripper": 1.0}, + close_command_expr={"gripper": 0.01}, + ) """LeKiwi action configuration""" if device in ["lekiwi-leader", "lekiwi-keyboard", "lekiwi-gamepad"]: action_cfg.wheel_action = mdp.JointVelocityActionCfg( diff --git a/source/leisaac/leisaac/tasks/lift_cube/__init__.py b/source/leisaac/leisaac/tasks/lift_cube/__init__.py index 54e2d807..71f13b21 100644 --- a/source/leisaac/leisaac/tasks/lift_cube/__init__.py +++ b/source/leisaac/leisaac/tasks/lift_cube/__init__.py @@ -1,5 +1,7 @@ import gymnasium as gym +from . import rl_agents + gym.register( id="LeIsaac-SO101-LiftCube-v0", entry_point="isaaclab.envs:ManagerBasedRLEnv", @@ -9,6 +11,16 @@ }, ) +gym.register( + id="LeIsaac-SO101-LiftCube-RL-v0", + entry_point="isaaclab.envs:ManagerBasedRLEnv", + disable_env_checker=True, + kwargs={ + "env_cfg_entry_point": f"{__name__}.lift_cube_rl_env_cfg:LiftCubeRLEnvCfg", + "rsl_rl_cfg_entry_point": f"{rl_agents.__name__}.rsl_rl_ppo_cfg:LiftCubeRLPPORunnerCfg", + }, +) + gym.register( id="LeIsaac-SO101-LiftCube-DigitalTwin-v0", entry_point="leisaac.enhance.envs:ManagerBasedRLDigitalTwinEnv", @@ -20,7 +32,7 @@ gym.register( id="LeIsaac-SO101-LiftCube-Mimic-v0", - entry_point=f"leisaac.enhance.envs:ManagerBasedRLLeIsaacMimicEnv", + entry_point="leisaac.enhance.envs:ManagerBasedRLLeIsaacMimicEnv", disable_env_checker=True, kwargs={ "env_cfg_entry_point": f"{__name__}.lift_cube_mimic_env_cfg:LiftCubeMimicEnvCfg", diff --git a/source/leisaac/leisaac/tasks/lift_cube/lift_cube_rl_env_cfg.py b/source/leisaac/leisaac/tasks/lift_cube/lift_cube_rl_env_cfg.py new file mode 100644 index 00000000..cad82d03 --- /dev/null +++ b/source/leisaac/leisaac/tasks/lift_cube/lift_cube_rl_env_cfg.py @@ -0,0 +1,122 @@ +import math + +from isaaclab.managers import ObservationGroupCfg as ObsGroup +from isaaclab.managers import ObservationTermCfg as ObsTerm +from isaaclab.managers import RewardTermCfg as RewTerm +from isaaclab.managers import SceneEntityCfg +from isaaclab.managers import TerminationTermCfg as DoneTerm +from isaaclab.utils import configclass + +from . import mdp +from .lift_cube_env_cfg import LiftCubeEnvCfg, LiftCubeSceneCfg + +_CUBE_CFG = SceneEntityCfg("cube") +_ROBOT_CFG = SceneEntityCfg("robot") + + +@configclass +class LiftCubeRLObservationsCfg: + """Flat vector observations for RL (26D total). + + joint_pos(6) + joint_vel(6) + ee_frame_state(7) + cube_rel_ee(3) + cube_quat(4) = 26D + """ + + @configclass + class PolicyCfg(ObsGroup): + joint_pos = ObsTerm(func=mdp.joint_pos) + joint_vel = ObsTerm(func=mdp.joint_vel) + ee_frame_state = ObsTerm( + func=mdp.ee_frame_state, + params={"ee_frame_cfg": SceneEntityCfg("ee_frame"), "robot_cfg": _ROBOT_CFG}, + ) + cube_pos_relative_to_ee = ObsTerm( + func=mdp.cube_pos_relative_to_ee, + params={"cube_cfg": _CUBE_CFG, "ee_frame_cfg": SceneEntityCfg("ee_frame")}, + ) + cube_quat = ObsTerm( + func=mdp.cube_quat, + params={"cube_cfg": _CUBE_CFG}, + ) + + def __post_init__(self): + self.enable_corruption = False + self.concatenate_terms = True + + policy: PolicyCfg = PolicyCfg() + + +@configclass +class LiftCubeRLRewardsCfg: + """Reward terms for lift-cube RL training.""" + + cube_success = RewTerm( + func=mdp.cube_success_bonus, + weight=200.0, + params={"cube_cfg": _CUBE_CFG, "robot_cfg": _ROBOT_CFG, "height_threshold": 0.20}, + ) + ee_to_cube = RewTerm( + func=mdp.ee_to_cube_reward, + weight=1.5, + params={"cube_cfg": _CUBE_CFG, "robot_cfg": _ROBOT_CFG}, + ) + cube_height = RewTerm( + func=mdp.cube_height_reward, + weight=10.0, + params={"cube_cfg": _CUBE_CFG, "robot_cfg": _ROBOT_CFG, "min_height": 0.046, "k": 3.0}, + ) + + +@configclass +class LiftCubeRLTerminationsCfg: + """Terminations: timeout + early success.""" + + time_out = DoneTerm(func=mdp.time_out, time_out=True) + success = DoneTerm( + func=mdp.cube_height_above_base, + params={"cube_cfg": _CUBE_CFG, "robot_cfg": _ROBOT_CFG, "height_threshold": 0.20}, + ) + + +@configclass +class LiftCubeRLEnvCfg(LiftCubeEnvCfg): + """RL-specific configuration for the lift-cube environment. + + Observations: 26D flat vector. + Action space: 7D (rl_so101leader: 6D delta EE pose + 1D binary gripper). + Cameras disabled for faster training. + """ + + scene: LiftCubeSceneCfg = LiftCubeSceneCfg(env_spacing=8.0) + observations: LiftCubeRLObservationsCfg = LiftCubeRLObservationsCfg() + rewards: LiftCubeRLRewardsCfg = LiftCubeRLRewardsCfg() + terminations: LiftCubeRLTerminationsCfg = LiftCubeRLTerminationsCfg() + + def __post_init__(self) -> None: + super().__post_init__() + + self.use_teleop_device("rl_so101leader") + + self.scene.front = None + + for attr_name in list(vars(self.events).keys()): + term = getattr(self.events, attr_name, None) + if ( + hasattr(term, "params") + and isinstance(term.params.get("asset_cfg"), SceneEntityCfg) + and term.params["asset_cfg"].name == "front" + ): + delattr(self.events, attr_name) + + self.episode_length_s = 15.0 + + for actuator_cfg in self.scene.robot.actuators.values(): + actuator_cfg.damping = 8.0 + + self.scene.robot.init_state.joint_pos = { + "shoulder_pan": 0.0, + "shoulder_lift": math.radians(-100.0), + "elbow_flex": math.radians(90.0), + "wrist_flex": math.radians(50.0), + "wrist_roll": 0.0, + "gripper": 1.0, + } diff --git a/source/leisaac/leisaac/tasks/lift_cube/mdp/__init__.py b/source/leisaac/leisaac/tasks/lift_cube/mdp/__init__.py index 38ebdb6c..1d602cf2 100644 --- a/source/leisaac/leisaac/tasks/lift_cube/mdp/__init__.py +++ b/source/leisaac/leisaac/tasks/lift_cube/mdp/__init__.py @@ -2,4 +2,5 @@ from leisaac.enhance.envs.mdp import * from .observations import * +from .rewards import * from .terminations import * diff --git a/source/leisaac/leisaac/tasks/lift_cube/mdp/observations.py b/source/leisaac/leisaac/tasks/lift_cube/mdp/observations.py index eb808e5b..985a4af1 100644 --- a/source/leisaac/leisaac/tasks/lift_cube/mdp/observations.py +++ b/source/leisaac/leisaac/tasks/lift_cube/mdp/observations.py @@ -25,3 +25,24 @@ def object_grasped( grasped = torch.logical_and(pos_diff < diff_threshold, robot.data.joint_pos[:, -1] < grasp_threshold) return grasped + + +def cube_quat( + env: ManagerBasedRLEnv, + cube_cfg: SceneEntityCfg = SceneEntityCfg("cube"), +) -> torch.Tensor: + """Returns cube orientation as quaternion (w, x, y, z) in world frame. (num_envs, 4)""" + cube: RigidObject = env.scene[cube_cfg.name] + return cube.data.root_quat_w + + +def cube_pos_relative_to_ee( + env: ManagerBasedRLEnv, + cube_cfg: SceneEntityCfg = SceneEntityCfg("cube"), + ee_frame_cfg: SceneEntityCfg = SceneEntityCfg("ee_frame"), +) -> torch.Tensor: + """Returns cube position relative to the EE jaw. (num_envs, 3)""" + ee_frame: FrameTransformer = env.scene[ee_frame_cfg.name] + ee_pos = ee_frame.data.target_pos_w[:, 1, :] # jaw position + cube: RigidObject = env.scene[cube_cfg.name] + return cube.data.root_pos_w - ee_pos diff --git a/source/leisaac/leisaac/tasks/lift_cube/mdp/rewards.py b/source/leisaac/leisaac/tasks/lift_cube/mdp/rewards.py new file mode 100644 index 00000000..c164e275 --- /dev/null +++ b/source/leisaac/leisaac/tasks/lift_cube/mdp/rewards.py @@ -0,0 +1,113 @@ +"""Reward functions for the LiftCube RL task. + +Reward structure +---------------- +The task uses three complementary reward terms to guide the policy through a +natural lifting curriculum: + +1. ee_to_cube (weight 1.5, shaped) + Guides the end-effector toward the cube before and during grasping. + Formula: 1 - tanh(k * dist(TCP, cube)) + - Peaks at 1.0 when TCP is exactly at the cube center, decays smoothly with distance. + - k = 5.0 gives a reach radius of ~10 cm before the reward drops below 0.5. + +2. cube_height (weight 10.0, shaped) + Encourages lifting the cube above the table surface. + Formula: tanh(k * max(h - min_height, 0)) + - Zero below min_height (4.6 cm above robot base) — ignores ground-level noise. + - Monotonically increasing above min_height. Range [0, 1]. + +3. cube_success (weight 200.0, sparse) + One-time terminal bonus when h >= 20 cm. Also triggers early episode + termination so the policy does not receive further shaped reward after success. + +TCP computation +--------------- +The Tool Center Point (TCP) is the midpoint between the two fingertip contact +surfaces, computed from body_pos_w / body_quat_w plus calibrated local offsets +derived from the USD collision mesh geometry: + - jaw body local frame : (0.0, -0.05, 0.02) + - gripper body local frame : (-0.012, 0.0, -0.08) + +Height reference +---------------- +All heights are measured relative to the robot base link ("base") in the world +frame, making the reward invariant to the absolute table / mount height. +""" + +import torch +from isaaclab.assets import Articulation, RigidObject +from isaaclab.envs import ManagerBasedRLEnv +from isaaclab.managers import SceneEntityCfg +from isaaclab.utils.math import quat_apply + +# Fingertip offsets in each body's local frame, calibrated from USD collision mesh geometry +_JAW_TIP_OFFSET = torch.tensor([0.0, -0.05, 0.02]) +_GRIPPER_TIP_OFFSET = torch.tensor([-0.012, 0.0, -0.08]) + + +def _tcp_pos(env: ManagerBasedRLEnv, robot_cfg: SceneEntityCfg) -> torch.Tensor: + """TCP position: midpoint between jaw tip and gripper tip in world frame. (num_envs, 3)""" + robot: Articulation = env.scene[robot_cfg.name] + jaw_idx = robot.data.body_names.index("jaw") + gripper_idx = robot.data.body_names.index("gripper") + + jaw_pos = robot.data.body_pos_w[:, jaw_idx, :] + jaw_quat = robot.data.body_quat_w[:, jaw_idx, :] + gripper_pos = robot.data.body_pos_w[:, gripper_idx, :] + gripper_quat = robot.data.body_quat_w[:, gripper_idx, :] + + device = jaw_pos.device + N = jaw_pos.shape[0] + jaw_offset = _JAW_TIP_OFFSET.to(device).unsqueeze(0).expand(N, -1) + gripper_offset = _GRIPPER_TIP_OFFSET.to(device).unsqueeze(0).expand(N, -1) + + jaw_tip = jaw_pos + quat_apply(jaw_quat, jaw_offset) + gripper_tip = gripper_pos + quat_apply(gripper_quat, gripper_offset) + return (jaw_tip + gripper_tip) / 2.0 + + +def _cube_pos(env: ManagerBasedRLEnv, cube_cfg: SceneEntityCfg) -> torch.Tensor: + cube: RigidObject = env.scene[cube_cfg.name] + return cube.data.root_pos_w # (num_envs, 3) + + +def _robot_base_height(env: ManagerBasedRLEnv, robot_cfg: SceneEntityCfg, base_name: str = "base") -> torch.Tensor: + robot: Articulation = env.scene[robot_cfg.name] + base_index = robot.data.body_names.index(base_name) + return robot.data.body_pos_w[:, base_index, 2] # (num_envs,) + + +def ee_to_cube_reward( + env: ManagerBasedRLEnv, + cube_cfg: SceneEntityCfg = SceneEntityCfg("cube"), + robot_cfg: SceneEntityCfg = SceneEntityCfg("robot"), + k: float = 5.0, +) -> torch.Tensor: + """Reaching reward: 1 - tanh(k * dist) from TCP (grasp midpoint) to cube center. Range [0, 1].""" + dist = torch.linalg.vector_norm(_cube_pos(env, cube_cfg) - _tcp_pos(env, robot_cfg), dim=1) + return 1.0 - torch.tanh(k * dist) + + +def cube_height_reward( + env: ManagerBasedRLEnv, + cube_cfg: SceneEntityCfg = SceneEntityCfg("cube"), + robot_cfg: SceneEntityCfg = SceneEntityCfg("robot"), + min_height: float = 0.046, + k: float = 5.0, +) -> torch.Tensor: + """Height reward: tanh(k * max(h - min_height, 0)). Zero below min_height, monotonically increasing above. Range [0, 1].""" + height_above_base = _cube_pos(env, cube_cfg)[:, 2] - _robot_base_height(env, robot_cfg) + h = torch.clamp(height_above_base - min_height, min=0.0) + return torch.tanh(k * h) + + +def cube_success_bonus( + env: ManagerBasedRLEnv, + cube_cfg: SceneEntityCfg = SceneEntityCfg("cube"), + robot_cfg: SceneEntityCfg = SceneEntityCfg("robot"), + height_threshold: float = 0.20, +) -> torch.Tensor: + """One-time large bonus when cube reaches the success height threshold.""" + height_above_base = _cube_pos(env, cube_cfg)[:, 2] - _robot_base_height(env, robot_cfg) + return (height_above_base >= height_threshold).float() diff --git a/source/leisaac/leisaac/tasks/lift_cube/rl_agents/__init__.py b/source/leisaac/leisaac/tasks/lift_cube/rl_agents/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/source/leisaac/leisaac/tasks/lift_cube/rl_agents/rsl_rl_ppo_cfg.py b/source/leisaac/leisaac/tasks/lift_cube/rl_agents/rsl_rl_ppo_cfg.py new file mode 100644 index 00000000..c8d68ad3 --- /dev/null +++ b/source/leisaac/leisaac/tasks/lift_cube/rl_agents/rsl_rl_ppo_cfg.py @@ -0,0 +1,39 @@ +from isaaclab.utils import configclass +from isaaclab_rl.rsl_rl import ( + RslRlOnPolicyRunnerCfg, + RslRlPpoActorCriticCfg, + RslRlPpoAlgorithmCfg, +) + + +@configclass +class LiftCubeRLPPORunnerCfg(RslRlOnPolicyRunnerCfg): + num_steps_per_env = 100 + max_iterations = 1500 + save_interval = 50 + experiment_name = "lift_cube_rl" + obs_groups = {"actor": ["policy"], "critic": ["policy"]} + + policy = RslRlPpoActorCriticCfg( + init_noise_std=0.3, + actor_obs_normalization=True, + critic_obs_normalization=True, + actor_hidden_dims=[256, 128, 64], + critic_hidden_dims=[256, 128, 64], + activation="elu", + ) + + algorithm = RslRlPpoAlgorithmCfg( + value_loss_coef=1.0, + use_clipped_value_loss=True, + clip_param=0.1, + entropy_coef=0.005, + num_learning_epochs=5, + num_mini_batches=4, + learning_rate=1.0e-4, + schedule="adaptive", + gamma=0.95, + lam=0.95, + desired_kl=0.01, + max_grad_norm=0.5, + ) diff --git a/source/leisaac/leisaac/tasks/pick_orange/agents/__init__.py b/source/leisaac/leisaac/tasks/pick_orange/agents/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/source/leisaac/leisaac/tasks/template/bi_arm_env_cfg.py b/source/leisaac/leisaac/tasks/template/bi_arm_env_cfg.py index 6a07c6ff..ea2d2af9 100644 --- a/source/leisaac/leisaac/tasks/template/bi_arm_env_cfg.py +++ b/source/leisaac/leisaac/tasks/template/bi_arm_env_cfg.py @@ -211,7 +211,7 @@ def __post_init__(self) -> None: def use_teleop_device(self, teleop_device) -> None: self.task_type = teleop_device self.actions = init_action_cfg(self.actions, device=teleop_device) - if teleop_device in ["bi_so101_state_machine"]: + if teleop_device in ["bi_ik_so101leader"]: self.scene.left_arm.spawn.rigid_props.disable_gravity = True self.scene.right_arm.spawn.rigid_props.disable_gravity = True diff --git a/source/leisaac/leisaac/tasks/template/direct/bi_arm_env.py b/source/leisaac/leisaac/tasks/template/direct/bi_arm_env.py index 614cdd5f..f0a92960 100644 --- a/source/leisaac/leisaac/tasks/template/direct/bi_arm_env.py +++ b/source/leisaac/leisaac/tasks/template/direct/bi_arm_env.py @@ -92,7 +92,7 @@ def __post_init__(self) -> None: def use_teleop_device(self, teleop_device) -> None: self.task_type = teleop_device # self.actions = init_action_cfg(self.actions, device=teleop_device) - if teleop_device in ["bi_so101_state_machine"]: + if teleop_device in ["bi_ik_so101leader"]: self.scene.left_arm.spawn.rigid_props.disable_gravity = True self.scene.right_arm.spawn.rigid_props.disable_gravity = True diff --git a/source/leisaac/leisaac/tasks/template/direct/single_arm_env.py b/source/leisaac/leisaac/tasks/template/direct/single_arm_env.py index aa7ea7a6..147cd370 100644 --- a/source/leisaac/leisaac/tasks/template/direct/single_arm_env.py +++ b/source/leisaac/leisaac/tasks/template/direct/single_arm_env.py @@ -83,7 +83,7 @@ def __post_init__(self) -> None: def use_teleop_device(self, teleop_device) -> None: self.task_type = teleop_device - if teleop_device in ["keyboard", "gamepad", "so101_state_machine"]: + if teleop_device in ["keyboard", "gamepad", "ik_so101leader"]: self.scene.robot.spawn.rigid_props.disable_gravity = True def preprocess_device_action(self, action: dict[str, Any], teleop_device) -> torch.Tensor: diff --git a/source/leisaac/leisaac/tasks/template/single_arm_env_cfg.py b/source/leisaac/leisaac/tasks/template/single_arm_env_cfg.py index b11664a3..a2acdeb1 100644 --- a/source/leisaac/leisaac/tasks/template/single_arm_env_cfg.py +++ b/source/leisaac/leisaac/tasks/template/single_arm_env_cfg.py @@ -194,7 +194,7 @@ def __post_init__(self) -> None: def use_teleop_device(self, teleop_device) -> None: self.task_type = teleop_device self.actions = init_action_cfg(self.actions, device=teleop_device) - if teleop_device in ["keyboard", "gamepad", "so101_state_machine"]: + if teleop_device in ["keyboard", "gamepad", "ik_so101leader", "rl_so101leader"]: self.scene.robot.spawn.rigid_props.disable_gravity = True def preprocess_device_action(self, action: dict[str, Any], teleop_device) -> torch.Tensor: