
Question about this line of code: new_static_reference = ref_image * (~seg_ref) + static_reference_dyn * seg_ref #13

@Asuna88


Dear Kaichen,

I tested your current code, specifically this line:
`new_static_reference = ref_image * (~seg_ref) + static_reference_dyn * seg_ref`

I computed the difference between the two images (ref_image - new_static_reference) and found that the result is almost zero.

I therefore have the same question as Qianqian3764:

“I have a question about synthetic ref images: new_static_reference = ref_image*(~seg_ref) + static_reference_dyn*seg_ref. However, I think this is still the original reference image, unchanged, because the reconstruction of the dynamic object in the reference image is obtained using optical flow.”

I feel that your previous response did not fully clarify this point. Sorry for my directness, but I still don’t understand the explanation:

“The question posted by Qianqian3764 was primarily about how to compute static_reference_dyn, not about the equation itself. We have since updated the computation process, which is why the issue has been closed.”

Could you please clarify what you meant by “updated the computation process”? Specifically:

Which part of the code was modified?

Where can I find the updated version?

For reference, here is the output from my test:



(Pdb) p ref_image
tensor([...])
(Pdb) p new_static_reference
tensor([...])
(Pdb) a1 = ref_image - new_static_reference
(Pdb) p a1.sum()
tensor(102.4379, device='cuda:0')
(Pdb) p a1.shape
torch.Size([1, 3, 48, 160])
(Pdb) 102 / (3*48*160)
0.0044


As you can see, the difference between ref_image and new_static_reference is very small.
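A note on the measurement itself: `a1.sum()` divided by the number of elements is a signed average over the whole image, so positive and negative differences cancel and the static pixels (which are exactly zero by construction) dilute the result. In the printed `a1`, most entries are exactly zero while a few entries near the right edge of the top rows reach magnitudes of about 0.44. A minimal sketch of a more informative check, assuming `seg_ref` is a boolean mask of shape `[B, 1, H, W]`; the helper name `masked_difference_stats` and the toy tensors are hypothetical:

```python
import torch

def masked_difference_stats(ref_image, new_static_reference, seg_ref):
    """Summarise how far the composite is from ref_image.

    Assumes seg_ref is a bool tensor [B, 1, H, W] with True = dynamic pixel.
    """
    diff = ref_image - new_static_reference
    abs_diff = diff.abs()
    mask = seg_ref.expand_as(abs_diff)          # broadcast mask over colour channels
    n_dyn = int(mask.sum())
    return {
        "signed_sum": float(diff.sum()),        # what a1.sum() reports
        "mean_abs_all": float(abs_diff.mean()), # diluted by the static pixels
        "mean_abs_dynamic": float(abs_diff[mask].mean()) if n_dyn > 0 else 0.0,
        "dynamic_fraction": n_dyn / mask.numel(),
    }

# Toy data with the same spatial size as in the debugging session (values are made up).
torch.manual_seed(0)
ref = torch.rand(1, 3, 48, 160)
seg = torch.zeros(1, 1, 48, 160, dtype=torch.bool)
seg[..., :10, 150:] = True                      # small dynamic patch, top-right corner
warped = torch.rand(1, 3, 48, 160)              # stand-in for static_reference_dyn
composite = ref * (~seg) + warped * seg
stats = masked_difference_stats(ref, composite, seg)
print(stats)
```

If `mean_abs_dynamic` is large while `mean_abs_all` is small, the composite does differ from `ref_image`, but only inside a small masked region.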

I would greatly appreciate it if you could point me to the updated code or provide guidance on the modification. Thank you very much for your time and help.

Here is the full output from my debugging session.

(Pdb) p ref_image
tensor([[[[0.1020, 0.2275, 0.1892, ..., 0.9990, 0.9990, 0.9941],
[0.0549, 0.1275, 0.1265, ..., 0.9882, 0.8588, 0.6363],
[0.0667, 0.1667, 0.1275, ..., 0.5392, 0.5127, 0.5431],
...,
[0.1069, 0.1176, 0.1686, ..., 0.7000, 0.7598, 0.7196],
[0.4912, 0.5304, 0.5245, ..., 0.7039, 0.7529, 0.6147],
[0.5235, 0.5382, 0.5196, ..., 0.6657, 0.7196, 0.7147]],

     [[0.1294, 0.2539, 0.1775,  ..., 0.9990, 1.0000, 0.9980],
      [0.0765, 0.1392, 0.1480,  ..., 0.9931, 0.8235, 0.6422],
      [0.0833, 0.1696, 0.1775,  ..., 0.5333, 0.3853, 0.4412],
      ...,
      [0.1314, 0.1461, 0.1931,  ..., 0.6431, 0.7549, 0.7176],
      [0.4912, 0.5275, 0.5127,  ..., 0.6529, 0.7392, 0.5902],
      [0.5216, 0.5333, 0.5108,  ..., 0.6980, 0.7431, 0.6980]],

     [[0.0608, 0.1216, 0.1127,  ..., 0.9990, 0.9951, 0.9912],
      [0.0490, 0.0618, 0.0833,  ..., 0.9951, 0.8225, 0.6235],
      [0.0412, 0.0745, 0.0873,  ..., 0.3961, 0.2990, 0.3235],
      ...,
      [0.1755, 0.1833, 0.2206,  ..., 0.6039, 0.7186, 0.7363],
      [0.4784, 0.5206, 0.5098,  ..., 0.6431, 0.7245, 0.5833],
      [0.5225, 0.5157, 0.5147,  ..., 0.7039, 0.7088, 0.7157]]]],
   device='cuda:0')

(Pdb) p new_static_reference
tensor([[[[0.1020, 0.2275, 0.1892, ..., 0.9990, 0.9980, 0.9942],
[0.0549, 0.1275, 0.1265, ..., 0.9946, 0.9813, 0.8655],
[0.0667, 0.1667, 0.1275, ..., 0.7843, 0.5894, 0.5757],
...,
[0.1069, 0.1176, 0.1686, ..., 0.7000, 0.7598, 0.7196],
[0.4912, 0.5304, 0.5245, ..., 0.7039, 0.7529, 0.6147],
[0.5235, 0.5382, 0.5196, ..., 0.6657, 0.7196, 0.7147]],

     [[0.1294, 0.2539, 0.1775,  ..., 1.0000, 0.9998, 0.9954],
      [0.0765, 0.1392, 0.1480,  ..., 0.9949, 0.9795, 0.9250],
      [0.0833, 0.1696, 0.1775,  ..., 0.7963, 0.5203, 0.4811],
      ...,
      [0.1314, 0.1461, 0.1931,  ..., 0.6431, 0.7549, 0.7176],
      [0.4912, 0.5275, 0.5127,  ..., 0.6529, 0.7392, 0.5902],
      [0.5216, 0.5333, 0.5108,  ..., 0.6980, 0.7431, 0.6980]],

     [[0.0608, 0.1216, 0.1127,  ..., 0.9996, 0.9951, 0.9936],
      [0.0490, 0.0618, 0.0833,  ..., 0.9974, 0.9859, 0.9433],
      [0.0412, 0.0745, 0.0873,  ..., 0.8339, 0.4256, 0.4154],
      ...,
      [0.1755, 0.1833, 0.2206,  ..., 0.6039, 0.7186, 0.7363],
      [0.4784, 0.5206, 0.5098,  ..., 0.6431, 0.7245, 0.5833],
      [0.5225, 0.5157, 0.5147,  ..., 0.7039, 0.7088, 0.7157]]]],
   device='cuda:0')

(Pdb) a1= ref_image - new_static_reference
(Pdb) p a1
tensor([[[[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 5.3585e-05,
1.0438e-03, -9.8348e-05],
[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., -6.3898e-03,
-1.2247e-01, -2.2927e-01],
[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., -2.4505e-01,
-7.6695e-02, -3.2570e-02],
...,
[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00]],

     [[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -9.6852e-04,
        2.3800e-04,  2.6339e-03],
      [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -1.7551e-03,
       -1.5596e-01, -2.8287e-01],
      [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -2.6293e-01,
       -1.3500e-01, -3.9906e-02],
      ...,
      [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  0.0000e+00,
        0.0000e+00,  0.0000e+00],
      [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  0.0000e+00,
        0.0000e+00,  0.0000e+00],
      [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  0.0000e+00,
        0.0000e+00,  0.0000e+00]],

     [[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -5.9104e-04,
       -4.5836e-05, -2.4466e-03],
      [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -2.3460e-03,
       -1.6335e-01, -3.1973e-01],
      [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -4.3777e-01,
       -1.2654e-01, -9.1863e-02],
      ...,
      [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  0.0000e+00,
        0.0000e+00,  0.0000e+00],
      [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  0.0000e+00,
        0.0000e+00,  0.0000e+00],
      [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  0.0000e+00,
        0.0000e+00,  0.0000e+00]]]], device='cuda:0')

(Pdb) p a1.sum()
tensor(102.4379, device='cuda:0')
(Pdb) p a1.shape
torch.Size([1, 3, 48, 160])
(Pdb) 102/(3*48*160)
0.004427083333333333

def warping(self, lookup_images, current_image, flow_bwd, seg_ref):
    """
    Build the fused image new_static_reference from the current frame current_image,
    the backward optical flow flow_bwd, and the dynamic mask seg_ref.
    lookup_images[:, 0] → previous-frame reference image (ref_image)
    current_image → current-frame image (cur_image)
    flow_bwd → backward optical flow: pixel displacement from the reference frame to the current frame
    seg_ref → mask; True marks dynamic regions, False marks static regions

    """
    # First downsample the images to match the resolution of flow_bwd.
    # ref_image is the previous frame.
    ref_image = F.interpolate(lookup_images[:, 0], scale_factor=1/4, mode='bilinear', align_corners=False)
    cur_image = F.interpolate(current_image, scale_factor=1/4, mode='bilinear', align_corners=False)
    # (Pdb) p ref_image.shape
    # torch.Size([12, 3, 48, 160])
    # (Pdb) p cur_image.shape
    # torch.Size([12, 3, 48, 160]) 

    # Build a [B, 2, H, W] grid of pixel coordinates.
    # Each pixel stores its own (x, y) coordinate in the reference frame;
    # the optical flow is added to this grid to perform the warp.
    B, _, H, W = flow_bwd.shape
    grid_y, grid_x = torch.meshgrid(torch.arange(H), torch.arange(W))
    grid = torch.stack((grid_x, grid_y), dim=0).float().to(flow_bwd.device)
    grid = grid.unsqueeze(0).repeat(B, 1, 1, 1)

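    # Add the backward flow to the regular grid to get the warped coordinates.
    # As this code is written, flow_bwd points from the reference frame to the current frame, so
    # grid + flow_bwd = position of each reference-frame pixel in the current frame.
    # new_coords[b, :, y, x] = [x', y'] is where reference pixel (x, y) lands in the current frame.
    #
    # grid       = reference-frame coordinates
    # flow_bwd   = optical flow from the reference frame to the current frame
    # new_coords = warped positions (reference → current mapping)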
    new_coords = grid + flow_bwd

    # (Pdb) p grid.shape
    # torch.Size([12, 2, 48, 160])
    # (Pdb) p flow_bwd.shape
    # torch.Size([12, 2, 48, 160])

    # (Pdb) p grid
    # tensor([[[[  0.,   1.,   2.,  ..., 157., 158., 159.],
    #         [  0.,   1.,   2.,  ..., 157., 158., 159.],
    #         [  0.,   1.,   2.,  ..., 157., 158., 159.],
    #         ...,
    #         [  0.,   1.,   2.,  ..., 157., 158., 159.],
    #         [  0.,   1.,   2.,  ..., 157., 158., 159.],
    #         [  0.,   1.,   2.,  ..., 157., 158., 159.]],

    #         [[  0.,   0.,   0.,  ...,   0.,   0.,   0.],
    #         [  1.,   1.,   1.,  ...,   1.,   1.,   1.],
    #         [  2.,   2.,   2.,  ...,   2.,   2.,   2.],
    #         ...,
    #         [ 45.,  45.,  45.,  ...,  45.,  45.,  45.],
    #         [ 46.,  46.,  46.,  ...,  46.,  46.,  46.],
    #         [ 47.,  47.,  47.,  ...,  47.,  47.,  47.]]],


    # pdb.set_trace()
    # Map pixel coordinates from [0, W-1] / [0, H-1] to [-1, 1], as required by F.grid_sample().
    new_coords[:, 0, :, :] = (new_coords[:, 0, :, :] / (W - 1)) * 2 - 1  # normalize x: [0, 1] * 2 - 1 → [-1, 1]
    new_coords[:, 1, :, :] = (new_coords[:, 1, :, :] / (H - 1)) * 2 - 1  # normalize y: final range is [-1, 1]

    new_coords = new_coords.permute(0, 2, 3, 1)  # [B, 2, H, W] → [B, H, W, 2]
    # (Pdb) p new_coords.shape
    # torch.Size([12, 48, 160, 2])
    # (Pdb) p new_coords.min()
    # tensor(-1., device='cuda:0')
    # (Pdb) p new_coords.max()
    # tensor(1.0000, device='cuda:0')

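    # Sample cur_image at new_coords (backward warping) to obtain the current frame as seen from the reference view.
    # padding_mode='zeros' fills samples whose coordinates fall outside the image with 0.
    # The result is called static_reference_dyn.
    # Because the current frame is warped into the reference view, flow_bwd must be defined at
    # reference-frame pixels and point into the current frame (reference → current).
    #
    # new_coords holds, for each pixel of the adjacent (reference) frame, its position in the current frame.
    # Reconstruct the reference frame.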
    static_reference_dyn = F.grid_sample(cur_image, new_coords, mode='bilinear', padding_mode='zeros', align_corners=True)

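    # new_static_reference is then the image in which dynamic regions are repaired with the
    # warped current frame and static regions keep the original reference frame.
    #
    # Key points:
    # - grid holds reference-frame coordinates, and flow_bwd points reference → current.
    # - grid + flow_bwd gives the position of each reference pixel in the current frame;
    #   grid_sample reads cur_image at those positions.
    # - The output static_reference_dyn is therefore the current frame warped into the reference view.
    #
    # For each pixel (x, y) of the reference frame Ir:
    #     find the position (x', y') in the current frame It that flow_bwd points to,
    #     and take the colour at that position to build the warped image.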

    # (Pdb) cur_image.shape
    # torch.Size([12, 3, 48, 160])
    # (Pdb) p new_coords.shape
    # torch.Size([12, 48, 160, 2])


    # seg_ref is True where a pixel is dynamic; ~seg_ref is True where it is static.
    # Goal: build a pseudo-static reference image.
    #   - Static regions (~seg_ref) keep the original reference frame.
    #   - Dynamic regions (seg_ref) are filled from static_reference_dyn, the current frame
    #     warped into the adjacent (reference) frame's view, to avoid contamination by moving objects.
    new_static_reference = ref_image*(~seg_ref) + static_reference_dyn*seg_ref
    # Summary:
    #   Region                 Image used
    #   static  (~seg_ref)     ref_image (static regions are trusted)
    #   dynamic (seg_ref)      static_reference_dyn (result of warping the current frame)
    # In short, this line uses the current frame and the backward flow to replace the parts of the
    # reference image contaminated by dynamic objects, yielding a more "static" reference image.


    ############ Visualization ############
    # I added this part myself; it can be commented out.
    mask = seg_ref[0, 0].cpu().numpy().astype(np.uint8) * 255  # True → 255, False → 0
    # Save as an image file.
    cv2.imwrite("seg_ref_mask.png", mask)

    self.save_tensor_image_with_opencv(ref_image[0], 'ref_image.png')
    self.save_tensor_image_with_opencv(static_reference_dyn[0], 'static_reference_dyn.png')
    self.save_tensor_image_with_opencv(new_static_reference[0], 'new_static_reference.png')
    pdb.set_trace()

    ############ Visualization ############
        
    return new_static_reference
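
To make the concern concrete, here is a minimal standalone sketch of the same warp-and-compose step on toy data. It is not the repository's code or the authors' updated computation, which this thread does not specify. The helper name `warp_to_reference` and the toy tensors are hypothetical, and the `indexing="ij"` argument assumes PyTorch 1.10 or later. It shows that when the flow explains the motion exactly, the warped current frame reproduces the reference frame inside the mask, so the composite equals `ref_image`:

```python
import torch
import torch.nn.functional as F

def warp_to_reference(cur_image, flow_bwd):
    """Backward-warp cur_image into the reference view.

    Assumes flow_bwd[b, :, y, x] = (dx, dy) is the displacement, in pixels, from
    reference pixel (x, y) to its location in the current frame.
    """
    B, _, H, W = flow_bwd.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, device=flow_bwd.device),
        torch.arange(W, device=flow_bwd.device),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0)  # [1, 2, H, W]
    coords = grid + flow_bwd                                   # [B, 2, H, W]
    x_norm = coords[:, 0] / (W - 1) * 2 - 1                    # pixel x → [-1, 1]
    y_norm = coords[:, 1] / (H - 1) * 2 - 1                    # pixel y → [-1, 1]
    sample_grid = torch.stack((x_norm, y_norm), dim=-1)        # [B, H, W, 2]
    return F.grid_sample(cur_image, sample_grid, mode="bilinear",
                         padding_mode="zeros", align_corners=True)

# Toy scene: the current frame is the reference frame shifted right by 2 pixels.
torch.manual_seed(0)
ref = torch.rand(1, 3, 8, 16)
cur = torch.roll(ref, shifts=2, dims=3)
flow = torch.zeros(1, 2, 8, 16)
flow[:, 0] = 2.0                                 # reference (x, y) appears at (x + 2, y)

warped = warp_to_reference(cur, flow)            # stand-in for static_reference_dyn
seg = torch.zeros(1, 1, 8, 16, dtype=torch.bool)
seg[..., 2:6, 4:10] = True                       # hypothetical dynamic region
composite = ref * (~seg) + warped * seg
print((composite - ref).abs().max())             # close to zero
```

Under these assumptions the composite can differ from `ref_image` only where the flow does not match the true motion, or where samples fall outside the image and are zero-padded.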
