Question about parameter count calculation #3

@buyi6666

Hi, thank you for sharing this great work! 🌹🌹🌹ヾ(•ω•`)o

I have a question about the parameter count reported in the paper. According to the paper, the model has 400M parameters. However, when I sum over the model's trainable parameters, I get only about 30M:
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

I'm wondering if the traversal method might not be capturing the parameters from embedded/nested models within the architecture. Could you clarify how the 400M parameter count was calculated? Are there additional model components or sub-modules that need to be counted separately?
Any guidance on the correct way to calculate the total parameters would be greatly appreciated.
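
For reference, here is a minimal sketch of how the count could be cross-checked. It is not the authors' method. In PyTorch, `model.parameters()` already recurses into every registered sub-module, but the `requires_grad` filter drops frozen weights, and modules held in a plain Python list or loaded outside the `nn.Module` tree are not visited at all. Whether either effect explains the gap here is unknown. `ToyModel`, `count_parameters`, and `count_by_child` are hypothetical names used only for illustration.

```python
import torch.nn as nn


def count_parameters(model: nn.Module) -> dict:
    """Return total, trainable, and frozen parameter counts for a module."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return {"total": total, "trainable": trainable, "frozen": total - trainable}


def count_by_child(model: nn.Module) -> dict:
    """Return the parameter count of each direct, registered child module."""
    return {
        name: sum(p.numel() for p in child.parameters())
        for name, child in model.named_children()
    }


class ToyModel(nn.Module):
    """Hypothetical stand-in model, not the architecture from the paper."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(100, 50)  # 100*50 + 50 = 5050 parameters
        self.head = nn.Linear(50, 10)       # 50*10 + 10 = 510 parameters
        # Freeze the backbone: it is excluded by the requires_grad filter.
        for p in self.backbone.parameters():
            p.requires_grad = False
        # A module kept in a plain list is NOT registered, so
        # model.parameters() never sees its 110 parameters.
        self.unregistered = [nn.Linear(10, 10)]


toy = ToyModel()
counts = count_parameters(toy)
per_child = count_by_child(toy)
print(counts)     # {'total': 5560, 'trainable': 510, 'frozen': 5050}
print(per_child)  # {'backbone': 5050, 'head': 510}
```

Comparing `total` against `trainable`, and looking at the per-child breakdown, would show whether the missing parameters are frozen components or components that sit outside the registered module tree.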

Thank you for your time and for making this work available! ❤🎉🎇
