Hi, I was going through your work. After reading the paper, I understand that you mention using a diffusion loss, embedded in the autoregressive optimization objective, during pre-training.
However, in your code, I only see `MSELoss` or `CrossEntropyLoss` being used:
```python
def pretrain_one_epoch(self, train_loader, model_optim, model_scheduler):
    train_loss = []
    model_criterion = self._select_criterion()
    self.model.train()
    for i, (batch_x, batch_y, batch_x_mark, batch_y_mark) in enumerate(
        train_loader
    ):
        model_optim.zero_grad()
        batch_x = batch_x.float().to(self.device)
        batch_y = batch_y.float().to(self.device)
        pred_x = self.model(batch_x)
        diff_loss = model_criterion(pred_x, batch_x)
        diff_loss.backward()
        model_optim.step()
        train_loss.append(diff_loss.item())
    model_scheduler.step()
    train_loss = np.mean(train_loss)
```
where the `_select_criterion()` function is:
```python
def _select_criterion(self):
    if self.args.task_name == "finetune" and self.args.downstream_task == "classification":
        criterion = nn.CrossEntropyLoss()
        print("Using CrossEntropyLoss")
    else:
        criterion = nn.MSELoss()
        print("Using MSELoss")
    return criterion
```
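For context on what I expected instead: a "diffusion loss" in the usual DDPM sense is an MSE on *predicted noise* at a randomly sampled timestep, not a direct MSE between `pred_x` and `batch_x`. A minimal sketch of that (my own illustration, not from this repo; `diffusion_loss` and the `denoiser(x_t, t)` interface are hypothetical names I'm assuming):

```python
import torch
import torch.nn as nn

def diffusion_loss(denoiser, x0, num_steps=1000):
    """DDPM-style noise-prediction loss (illustrative sketch only).

    `denoiser(x_t, t)` is assumed to predict the noise that was added
    to x0 to produce x_t at timestep t.
    """
    b = x0.shape[0]
    # Sample one random diffusion timestep per example
    t = torch.randint(0, num_steps, (b,), device=x0.device)
    # Linear beta schedule -> cumulative product alpha_bar_t
    betas = torch.linspace(1e-4, 0.02, num_steps, device=x0.device)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)[t]
    alpha_bar = alpha_bar.view(b, *([1] * (x0.dim() - 1)))
    # Forward (noising) process: x_t = sqrt(ab)*x0 + sqrt(1-ab)*eps
    noise = torch.randn_like(x0)
    x_t = alpha_bar.sqrt() * x0 + (1.0 - alpha_bar).sqrt() * noise
    pred_noise = denoiser(x_t, t)
    # The diffusion objective is MSE between predicted and true noise
    return nn.functional.mse_loss(pred_noise, noise)
```

This contrasts with the pre-training loop above, where the criterion compares the model output directly against the clean input.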
Can you please clarify where the "diffusion loss instead of MSE" is actually being used?
Thank You