Question about parameter count calculation

Hi, thank you for sharing this great work!  🌹🌹🌹ヾ(•ω•`)o

I have a question about the parameter count reported in the paper. According to the paper, the model has 400M parameters. However, when I sum over the model's parameters with the line below, I get only around 30M trainable parameters:
`trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)`

I'm wondering whether this traversal misses parameters from sub-models nested within the architecture. Could you clarify how the 400M parameter count was calculated? Are there additional model components or sub-modules that need to be counted separately?
Any guidance on the correct way to calculate the total number of parameters would be greatly appreciated.
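
For reference, here is a minimal sketch of how I am checking the count, run on a toy model rather than the real one. `ToyModel`, `count_parameters`, and `count_by_child` are names I made up for illustration and are not from this repository. As far as I understand, `model.parameters()` already recurses into registered sub-modules, so the sketch also reports a total that ignores `requires_grad`, in case part of the model is frozen, and a breakdown per direct child module:

```python
import torch.nn as nn


def count_parameters(model: nn.Module) -> dict:
    """Return total, trainable, and frozen parameter counts for a module."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return {"total": total, "trainable": trainable, "frozen": total - trainable}


def count_by_child(model: nn.Module) -> dict:
    """Return the total parameter count of each direct child module."""
    return {
        name: sum(p.numel() for p in child.parameters())
        for name, child in model.named_children()
    }


# Hypothetical toy model: a frozen "backbone" plus a trainable "head".
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(10, 20)  # 10*20 weights + 20 biases = 220
        self.head = nn.Linear(20, 5)       # 20*5 weights + 5 biases = 105
        # Freeze the backbone so it is skipped by a requires_grad filter.
        for p in self.backbone.parameters():
            p.requires_grad = False


toy = ToyModel()
counts = count_parameters(toy)
per_child = count_by_child(toy)
print(counts)     # total vs. trainable vs. frozen
print(per_child)  # parameters per direct child module
```

On the toy model the `requires_grad` filter reports only the head, while the unfiltered sum includes the frozen backbone. If the real model still shows about 30M in total, then some components are presumably not registered as sub-modules of `model`, or are loaded separately.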

Thank you for your time and for making this work available!❤🎉🎇

Question about parameter count calculation #3
