I see that the available model checkpoint supports 4 input frames. Is there a way to increase this number? Will checkpoints that cover a longer temporal context be available in the future?
If the 4-frame model is the one used in the paper, what frame selection strategy was used with it? For instance, given a video clip of 30 frames, how should I select the 4 frames to pass to the model, and is that the same strategy used in the paper?
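For context, the fallback I would otherwise use is plain uniform sampling: split the clip into 4 equal segments and take the middle frame of each. The sketch below is only my assumption of a reasonable baseline, not something taken from the paper or the repository, and `uniform_frame_indices` is a hypothetical helper name.

```python
from typing import List


def uniform_frame_indices(num_frames: int, num_samples: int = 4) -> List[int]:
    """Pick `num_samples` roughly evenly spaced frame indices from a clip.

    Assumption: uniform sampling at segment centers. This is a common
    baseline, not necessarily the strategy used in the paper.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    if num_samples >= num_frames:
        # Clip is too short to subsample: return every frame index.
        return list(range(num_frames))
    # Split the clip into equal segments and take the center of each one.
    segment = num_frames / num_samples
    return [int(segment * i + segment / 2) for i in range(num_samples)]


if __name__ == "__main__":
    print(uniform_frame_indices(30, 4))
```

For a 30-frame clip this selects indices 3, 11, 18 and 26. If the paper instead used consecutive frames, random sampling, or a fixed stride, I would like to match that.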