A question about zero-grad settings in VL-adapter's multitask.py file.

Thanks for your brilliant work.

```python
                batch['log_train_accuracy'] = self.args.log_train_accuracy

                # self.optim.zero_grad()
                if self.args.fp16 and _use_native_amp:
                    with autocast():
                        if self.args.distributed:
                            results = self.model.module.train_step(batch)
                        else:
                            results = self.model.train_step(batch)
                else:
                    if self.args.distributed:
                        results = self.model.module.train_step(batch)
                    else:
                        results = self.model.train_step(batch)

                loss = results['loss']
```

Looking at the code, it appears that training runs without zeroing the gradients before backpropagation: the `self.optim.zero_grad()` call is commented out.

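Is there a reason why this still works, or are the gradients reset somewhere else in the training loop (for example, after the optimizer step)?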
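To make the concern concrete, here is a minimal sketch of what I would expect to happen in plain PyTorch when gradients are never cleared. This is not code from the repository: the one-parameter model `w`, the toy loss, and the helper `backward_once` are made up purely for illustration.

```python
import torch

# Hypothetical one-parameter "model", used only to illustrate the question.
w = torch.nn.Parameter(torch.tensor([1.0]))
optim = torch.optim.SGD([w], lr=0.1)


def backward_once():
    """Run one backward pass on a toy loss and return a copy of w.grad."""
    loss = (2.0 * w).sum()  # d(loss)/dw = 2 for every call
    loss.backward()
    return w.grad.clone()


# First backward pass: the gradient starts from an empty state.
grad_first = backward_once()

# Second backward pass without clearing: autograd adds to the existing .grad.
grad_second = backward_once()

# Clearing the gradients (what the commented-out call would do) resets them.
optim.zero_grad()
grad_after_reset = backward_once()

print(grad_first, grad_second, grad_after_reset)
```

In this sketch the second gradient is twice the first because nothing cleared `w.grad` in between, so unless the gradients are reset elsewhere I would expect the same accumulation across training steps.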
