<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>MMNet</title>
<link rel="stylesheet" type="text/css" href="assets/scripts/bulma.min.css">
<link rel="stylesheet" type="text/css" href="assets/scripts/theme.css">
<link rel="stylesheet" type="text/css" href="https://cdn.bootcdn.net/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
</head>
<body>
<section class="hero is-light">
<div class="hero-body" style="padding-top: 50px;">
<div class="container" style="text-align: center;margin-bottom:5px;">
<h1 class="title">
Model-guided Multi-path Knowledge Aggregation
</h1>
<h1 class="title">
for Aerial Saliency Prediction
</h1>
<div class="author">Kui Fu<sup>1</sup></div>
<div class="author">Jia Li<sup>1</sup></div>
<div class="author">Yu Zhang<sup>3</sup></div>
<div class="author">Hongze Shen<sup>1</sup></div>
<div class="author">Yonghong Tian<sup>2</sup></div>
<div class="group">
<a href="http://cvteam.net/">CVTEAM</a>
</div>
<div class="aff">
<p><sup>1</sup>State Key Laboratory of Virtual Reality Technology and Systems, SCSE, Beihang University, Beijing, China</p>
<p><sup>2</sup>Peng Cheng Laboratory, Shenzhen, China</p>
<p><sup>3</sup>SenseTime Research, Beijing, China</p>
</div>
<div class="con">
<p style="font-size: 24px; margin-top:5px; margin-bottom: 15px;">
TIP 2020
</p>
</div>
<div class="columns">
<div class="column"></div>
<div class="column"></div>
<div class="column">
<a href="http://cvteam.net/papers/2020-TIP-Fu-Model-guided%20Multi-path%20Knowledge%20Aggregation%20for%20Aerial%20Saliency%20Prediction.pdf" target="_blank">
<p class="link">Paper</p>
</a>
</div>
<div class="column">
<a href="https://github.com/iCVTEAM/MMNet/" target="_blank">
<p class="link">Code</p>
</a>
</div>
<div class="column"></div>
<div class="column"></div>
</div>
</div>
</div>
</section>
<div style="text-align: center;">
<div class="container" style="max-width:850px">
<div style="text-align: center;">
<img src="assets/MMNet/head.png" class="centerImage">
</div>
</div>
<div class="head_cap">
<p style="color:gray;">
System framework of baseline model MM-Net.
</p>
</div>
</div>
<section class="hero">
<div class="hero-body">
<div class="container" style="max-width: 800px" >
<h1 style="">Abstract</h1>
<p style="text-align: justify; font-size: 17px;">
As an emerging vision platform, a drone can look
from many unusual viewpoints, which brings new
challenges into the classic vision task of video
saliency prediction. To investigate these challenges,
this paper proposes a large-scale video dataset for
aerial saliency prediction, which consists of ground-truth
salient object regions of 1,000 aerial videos, annotated by
24 subjects. To the best of our knowledge, it is
the first large-scale video dataset that focuses on visual
saliency prediction on drones. Based on this dataset,
we propose a Model-guided Multi-path Network (MM-Net)
that serves as a baseline model for aerial video saliency
prediction. Inspired by the annotation process in
eye-tracking experiments, MM-Net adopts multiple
information paths, each of which is initialized under
the guidance of a classic saliency model. After that, the visual
saliency knowledge encoded in the most representative paths is
selected and aggregated to improve the capability of MM-Net
in predicting spatial saliency in aerial scenarios. Finally, these
spatial predictions are adaptively combined with the temporal
saliency predictions via a spatiotemporal optimization algorithm.
Experimental results show that MM-Net outperforms ten state-of-the-art
models in predicting aerial video saliency.
</p>
</div>
</div>
</section>
<section class="hero is-light" style="background-color:#FFFFFF;">
<div class="hero-body">
<div class="container" style="max-width:800px;margin-bottom:20px;">
<h1>
Qualitative comparisons
</h1>
</div>
<div style="text-align: center;">
<div class="container" style="max-width:850px">
<div style="text-align: center;">
<img src="assets/MMNet/comp.png" class="centerImage">
</div>
</div>
<div class="head_cap">
<p style="color:gray;">
Representative frames of state-of-the-art models on AVS1K. (a) Video frame, (b) Ground truth, (c) HFT, (d) SP,
(e) PNSP, (f) SSD, (g) LDS, (h) eDN, (i) iSEEL, (j) SalNet, (k) DVA, (l) STS, (m) MM-Net, (n) MM-Net-, (o) MM-Net+.
</p>
</div>
</div>
</div>
</section>
<section class="hero" style="padding-top:0px;">
<div class="hero-body">
<div class="container" style="max-width:800px;">
<div class="card">
<header class="card-header">
<p class="card-header-title">
BibTex Citation
</p>
<a class="card-header-icon button-clipboard" style="border:0px; background: inherit;" data-clipboard-target="#bibtex-info" >
<i class="fa fa-copy" height="20px"></i>
</a>
</header>
<div class="card-content">
<pre style="background-color:inherit;padding: 0px;" id="bibtex-info">@article{fu2020model,
title={Model-guided Multi-path Knowledge Aggregation for Aerial Saliency Prediction},
author={Fu, Kui and Li, Jia and Zhang, Yu and Shen, Hongze and Tian, Yonghong},
journal={IEEE Transactions on Image Processing},
year={2020},
publisher={IEEE}
}</pre>
</div>
</div>
</div>
</div>
</section>
<script type="text/javascript" src="assets/scripts/clipboard.min.js"></script>
<script>
new ClipboardJS('.button-clipboard');
</script>
</body>
</html>