
On zero initialization and the placement of the expanded layers #28

@ouyanxi1125

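  1. Regarding the placement of the expanded layers: in the source code they are distributed evenly, at equal intervals. Is there any theoretical or experimental basis for this?
  2. Regarding zero initialization: the weights selected are those matching 'down_proj' in k or 'o_proj' in k, i.e. the last layer of the attention block and of the MLP. Is there any theoretical or experimental basis for this?
    Thanks for your answer [bows]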
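
To make the two points concrete, here is a minimal sketch of what I understand the behaviour to be. It is not the repository's code. Only the condition `'down_proj' in k or 'o_proj' in k` and the equal spacing come from the source. The function names, the `layers.{i}.` key prefix and the toy list-of-lists weights are assumptions for illustration. The last function shows the one property I am sure of: if the final projection of a residual branch is zero, the branch contributes nothing, so the copied block starts out as an identity mapping.

```python
def expanded_layer_positions(num_layers, num_new):
    """Indices (in the expanded stack) of the inserted blocks, assuming one
    copy is inserted after every num_layers // num_new original layers."""
    if num_new <= 0 or num_layers % num_new != 0:
        raise ValueError("num_layers must be a positive multiple of num_new")
    group = num_layers // num_new
    return [(g + 1) * group + g for g in range(num_new)]


def expand_state_dict(state, num_layers, num_new):
    """Copy the last layer of each group into a new block and zero the
    weights whose key matches the condition quoted in the question.
    Assumes layer keys look like 'layers.<i>.<name>' (hypothetical layout)."""
    if num_new <= 0 or num_layers % num_new != 0:
        raise ValueError("num_layers must be a positive multiple of num_new")
    group = num_layers // num_new
    # Non-layer weights (embeddings, final norm, ...) pass through unchanged.
    out = {k: v for k, v in state.items() if not k.startswith("layers.")}
    new_idx = 0
    for i in range(num_layers):
        prefix = f"layers.{i}."
        block = {k[len(prefix):]: v for k, v in state.items()
                 if k.startswith(prefix)}
        # Original layer, renumbered to its slot in the expanded stack.
        for name, w in block.items():
            out[f"layers.{new_idx}.{name}"] = w
        new_idx += 1
        if (i + 1) % group == 0:
            # Inserted block: a copy of the layer just before it.
            for name, w in block.items():
                if 'down_proj' in name or 'o_proj' in name:
                    w_new = [[0.0] * len(row) for row in w]  # zero init
                else:
                    w_new = [row[:] for row in w]            # plain copy
                out[f"layers.{new_idx}.{name}"] = w_new
            new_idx += 1
    return out


def matvec(w, x):
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def residual_branch(x, w_in, w_out):
    """Toy residual sub-layer: x + w_out @ relu(w_in @ x)."""
    h = [max(0.0, v) for v in matvec(w_in, x)]
    return [xi + yi for xi, yi in zip(x, matvec(w_out, h))]
```

For example, `expanded_layer_positions(32, 8)` places the new blocks after every 4 original layers, and `residual_branch` with an all-zero `w_out` returns its input unchanged. That explains why zeroing these two projections keeps the expanded model's output equal to the original at initialization, but it does not by itself explain why equal spacing was chosen, which is what I am asking about.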
