Hi RVT team,
First, thank you for open-sourcing this impressive work! I'm attempting to reproduce RVT-2's RLBench results and encountered an inconsistency in the "slide block to color target" task that I'd like to discuss.
Issue Description
Reported vs Reproduced Results:
The paper reports a 92% success rate on this task, but my reproduction reaches only 48%.
Failure Analysis:
Reviewing videos of the failed episodes, I observed that the model frequently mispredicts the ignore_collision flag. When I manually force this flag to True, the success rate matches the paper's result (92-100%).
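A minimal sketch of such an override, assuming the predicted action is a flat vector whose last entry (index 8) is the ignore_collision flag. The index and the helper name `force_ignore_collision` are assumptions for illustration and are not taken from the RVT code:

```python
import numpy as np

# Assumed layout: [x, y, z, qx, qy, qz, qw, gripper_open, ignore_collision]
IGNORE_COLLISION_IDX = 8  # hypothetical position of the flag


def force_ignore_collision(action, value=1.0):
    """Return a copy of `action` with the ignore_collision entry overwritten."""
    patched = np.array(action, dtype=np.float64, copy=True)
    patched[IGNORE_COLLISION_IDX] = value
    return patched


# Example: a predicted action whose flag was (mis)predicted as 0.
predicted = np.array([0.3, 0.1, 0.9, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0])
patched = force_ignore_collision(predicted)
```

Only the flag changes; the pose and gripper entries are passed through untouched, so any change in success rate can be attributed to the flag alone.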
Key Insight:
Since collision prediction should be a simpler subtask, I suspect a potential discrepancy in the data generation logic.
In dataset.py (specifically the _add_keypoints_to_replay function), I noticed the following implementation:
(
    trans_indicies,
    rot_grip_indicies,
    ignore_collisions,  # this ignore_collisions flag corresponds to the keypoint, but it is not used
    action,
    attention_coordinates,
) = _get_action(
    obs_tp1,
    obs_tm1,
    rlbench_scene_bounds,
    voxel_sizes,
    rotation_resolution,
    crop_augmentation,
)

terminal = k == len(episode_keypoints) - 1
reward = float(terminal) * 1.0 if terminal else 0

obs_dict = extract_obs(
    obs,
    CAMERAS,
    t=k - next_keypoint_idx,
    prev_action=prev_action,
    episode_length=25,
)  # the ignore_collisions in the dict corresponds to the current observation
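To make the suspected mismatch concrete, here is a toy sketch of the two labeling choices. Everything in it (`Obs`, `labels_from_current_obs`, `labels_from_keypoint_obs`, and the example flags) is hypothetical and only mirrors the pattern described above; it is not the RVT data pipeline:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Obs:
    """Hypothetical stand-in for one demo frame."""
    ignore_collisions: bool


def labels_from_current_obs(demo: List[Obs], keypoints: List[int]) -> List[bool]:
    """Label each sample with the flag of the frame the sample starts from."""
    labels = []
    start = 0  # the first sample starts at the first frame of the demo
    for kp in keypoints:
        labels.append(demo[start].ignore_collisions)
        start = kp  # the next sample starts at this keypoint
    return labels


def labels_from_keypoint_obs(demo: List[Obs], keypoints: List[int]) -> List[bool]:
    """Label each sample with the flag of the keypoint frame it should reach."""
    return [demo[kp].ignore_collisions for kp in keypoints]


# Toy demo: six frames, keypoints at frames 2 and 5 (made-up values).
flags = [False, False, True, False, False, True]
demo = [Obs(f) for f in flags]
keypoints = [2, 5]

current_labels = labels_from_current_obs(demo, keypoints)    # [False, True]
keypoint_labels = labels_from_keypoint_obs(demo, keypoints)  # [True, True]
```

In this toy case the two schemes disagree on the first sample, which is the kind of label shift that could teach the model the wrong ignore_collision value for a given target.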
Request
Could you please clarify:
Whether this is a known discrepancy between the released implementation and the paper?
Whether taking ignore_collision from the keypoint observation, instead of the current observation, would match your original design?
Thank you for your time and insights!
slide_block_to_color_target_fail_3.mp4