Hi, thank you for sharing this great work! ヾ(•ω•`)o
I have a question regarding the parameter count mentioned in the paper. According to the paper, the model has 400M parameters. However, when I traverse the model parameters, I only get around 30M parameters.
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
I'm wondering if the traversal method might not be capturing the parameters from embedded/nested models within the architecture. Could you clarify how the 400M parameter count was calculated? Are there additional model components or sub-modules that need to be counted separately?
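One thing I noticed while writing this up: my snippet filters on `requires_grad`, so any frozen weights (for example a frozen pretrained backbone, if the model has one) would be left out of the count. Below is a minimal sketch I could use to compare total versus trainable parameters and to break the count down per top-level sub-module. It assumes `model` is a standard `torch.nn.Module`; the helper names `count_parameters` and `count_by_child` are my own and not from the repository.

```python
def count_parameters(model):
    """Return (total, trainable) parameter counts for a torch.nn.Module."""
    total = 0
    trainable = 0
    # model.parameters() recurses into registered sub-modules and
    # yields each shared parameter only once.
    for p in model.parameters():
        n = p.numel()
        total += n
        if p.requires_grad:
            trainable += n
    return total, trainable


def count_by_child(model):
    """Map each top-level child module name to its (total, trainable) counts."""
    return {name: count_parameters(child) for name, child in model.named_children()}
```

Calling `count_parameters(model)` and `count_by_child(model)` on the loaded model should show whether the gap comes from frozen parameters. Note that this only sees modules registered on `model`; components that are built or loaded separately (and parameters registered directly on the top-level module, in the per-child breakdown) would not appear.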
Any guidance on the correct way to calculate the total parameters would be greatly appreciated.
Thank you for your time and for making this work available! ❤