On the vocabulary-size factor in engram_vocab_size #21

`engram_vocab_size: List[int] = field(default_factory=lambda: [129280*5, 129280*5])`
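My understanding is that [129280×5, 129280×5] are the table sizes for 2-grams and 3-grams respectively, and that the ×5 is meant to keep high-frequency n-grams from colliding on every head at once? I would expect this factor to be some kind of lower bound. For example, take the top-K most frequent n-grams such that their cumulative count covers 95% of all n-gram occurrences; the probability that some pair of them collides on all heads is then roughly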
$\tbinom{K}{2}(\frac{1}{\text{vocab size}\times 5})^{\text{n head per ngram}}$
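As a rough numeric check of this estimate, here is a minimal Python sketch. It assumes each head hashes n-grams independently and uniformly into a table of size vocab size × factor, which is my simplification and not something the repo confirms. The head count of 8 is taken from the exponent used in this post, `TOP_K` is a made-up placeholder rather than a measured value, and the function name is hypothetical.

```python
from math import comb


def all_head_collision_bound(k: int, vocab_size: int, factor: int, n_heads: int) -> float:
    """Union bound on P(some pair among k n-grams collides on every head).

    Assumes independent, uniform hashing per head (an idealization).
    """
    table_size = vocab_size * factor          # slots per head
    pairs = comb(k, 2)                        # number of distinct n-gram pairs
    return pairs * (1.0 / table_size) ** n_heads


VOCAB_SIZE = 129280       # base size from the config line above
N_HEADS = 8               # exponent used in this post
TOP_K = 10_000_000        # hypothetical K, for illustration only

bounds = {
    factor: all_head_collision_bound(TOP_K, VOCAB_SIZE, factor, N_HEADS)
    for factor in (1, 5)
}

for factor, bound in bounds.items():
    print(f"factor x{factor}: bound ~ {bound:.3e}")

# Going from x1 to x5 shrinks the bound by exactly 5**n_heads.
print(f"ratio x1 / x5 ~ {bounds[1] / bounds[5]:.0f}")
```

Under these assumptions the bound is far below 1 for both ×1 and ×5, and the two differ only by the constant factor 5^8.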
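Here (1/(129280×5))^8 is extremely small, and it would still be extremely small with ×1, so I don't quite understand whether ×5 is an empirical choice or has some justification behind it. The only ×5-related design I could find in the paper is the optimizer learning rate: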
For optimization, the embedding parameters are updated using Adam [kingma2014adam] with a learning rate scaled by **5×** and no weight decay, while the convolution parameters are initialized to zero to strictly preserve the identity mapping at the start of training.
But that doesn't seem very related either?
Could you please explain the design here? Thanks!
