Actor and Action Video Segmentation from a Sentence [2018-08-02] #4

@jessiSYJ

Actor and Action Video Segmentation from a Sentence

https://arxiv.org/pdf/1803.07485.pdf

Textual Encoder

Word2Vec:

uses the model pretrained on 'GoogleNews'

each word = 300-dimensional vector

each sentence is padded to the same size (e.g. 15x300)
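
A minimal sketch of the padding step above, assuming zero vectors for padding and for unknown words, and truncation of sentences longer than 15 words (none of these choices are stated in the notes). `toy_lookup` is a hypothetical stand-in for the pretrained GoogleNews vectors, and `embed_and_pad` is an illustrative helper name.

```python
import numpy as np

EMBED_DIM = 300  # word2vec dimensionality from the notes
MAX_LEN = 15     # example padded length from the notes

def embed_and_pad(tokens, lookup, max_len=MAX_LEN, dim=EMBED_DIM):
    """Return a (max_len, dim) matrix: one row per word, zero rows as padding."""
    out = np.zeros((max_len, dim), dtype=np.float32)
    for i, tok in enumerate(tokens[:max_len]):  # truncation is an assumption
        vec = lookup.get(tok)
        if vec is not None:  # unknown words stay zero (an assumption)
            out[i] = vec
    return out

# Hypothetical stand-in for the pretrained GoogleNews word2vec model.
rng = np.random.default_rng(0)
toy_lookup = {w: rng.standard_normal(EMBED_DIM).astype(np.float32)
              for w in ["a", "dog", "is", "running"]}

sentence = embed_and_pad("a dog is running".split(), toy_lookup)
print(sentence.shape)
```

Every sentence ends up with the same 15x300 shape, so a batch can be stacked into one array before it goes to the encoder.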

CNN:

  • details:

    temporal filter size = 2x2

    channels = 300 (same as the word2vec representation)

  • ablation study:

    51.8 for LSTM

    52.1 for Bi-LSTM

    53.6 for CNN
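
A rough sketch of the CNN text encoder described under "details", in plain NumPy. It reads the filter size as a window of 2 adjacent words, which is an assumption about what "2x2" means here. The ReLU and the max pooling over time are also assumptions, added only to get one vector per sentence. `conv1d_sentence` and the random weights are illustrative, not the trained model.

```python
import numpy as np

def conv1d_sentence(x, weight, bias):
    """x: (T, C_in), weight: (K, C_in, C_out), bias: (C_out,).
    Returns a (C_out,) sentence vector."""
    T = x.shape[0]
    K = weight.shape[0]
    # Slide a window of K consecutive words over the sentence (no padding).
    feats = np.stack([
        np.tensordot(x[t:t + K], weight, axes=([0, 1], [0, 1])) + bias
        for t in range(T - K + 1)
    ])                              # (T - K + 1, C_out)
    feats = np.maximum(feats, 0.0)  # ReLU (assumed)
    return feats.max(axis=0)        # max pooling over time (assumed)

rng = np.random.default_rng(0)
x = rng.standard_normal((15, 300))                # padded sentence, 15x300
weight = rng.standard_normal((2, 300, 300)) * 0.01  # window of 2 words, 300 output channels
bias = np.zeros(300)

sent_vec = conv1d_sentence(x, weight, bias)
print(sent_vec.shape)
```

The output keeps 300 channels, matching the word2vec dimension as in the notes.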

Video Encoder

I3D Two-stream

  • details:

    I3D last max-pooling layer --> average pooling over the temporal dimension --> L2 normalization at each spatial position of the feature map

  • ablation study:

    49.5 for flow_only

    53.6 for RGB_only

    55.1 for two-stream
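
The post-processing in "details" (temporal average pooling, then L2 normalization at each spatial position) can be sketched as below. The `(T, H, W, C)` layout and the toy sizes are assumptions for illustration, not the real I3D output dimensions, and `pool_and_normalize` is a hypothetical helper name.

```python
import numpy as np

def pool_and_normalize(feat, eps=1e-8):
    """feat: (T, H, W, C) -> (H, W, C) with unit L2 norm along C at each (h, w)."""
    pooled = feat.mean(axis=0)  # average over the temporal dimension
    norm = np.linalg.norm(pooled, axis=-1, keepdims=True)
    return pooled / np.maximum(norm, eps)  # eps guards against division by zero

rng = np.random.default_rng(0)
toy_feat = rng.standard_normal((8, 7, 7, 32))  # toy shapes, assumed layout
video = pool_and_normalize(toy_feat)
print(video.shape)
```

After this step each spatial position holds a unit-length channel vector, so responses at different positions are on a comparable scale.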

tanh() is better

Decoding with dynamic filters

bottom up top down?
