Hello,
The report mentions that the NLQ approach is based on GroundNLQ. Could you explain specifically how it is implemented?
Did you replace the InternVideo features with EgoVideo features?
If so, I would like to conduct a replication experiment using the GroundNLQ code. Could you explain the steps for replacing InternVideo features with EgoVideo features?