
Is the Fast version the final version for interactive use? #44

@Orion-Zheng


Thanks for releasing Lingbot-World-Fast! While evaluating the model, I noticed something odd about chunk inference latency:
the per-chunk latency increases in proportion to the sequence length.
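A rough back-of-the-envelope model shows why this pattern is expected if every new chunk attends to the entire cached history. It is a sketch under simplifying assumptions, not a measurement of Lingbot-World-Fast: it counts only query-key dot products, assumes a fixed number of tokens per chunk, and assumes each query attends to all earlier chunks plus the whole current chunk. The function name and the token count are hypothetical.

```python
def attention_cost_per_chunk(chunk_idx: int, tokens_per_chunk: int) -> int:
    """Query-key dot products for chunk `chunk_idx` (0-based) with an unbounded KV cache.

    Assumption: every query token of the new chunk attends to all cached
    tokens from earlier chunks plus every token of the current chunk.
    """
    cached_tokens = chunk_idx * tokens_per_chunk       # grows with every chunk
    keys_visible = cached_tokens + tokens_per_chunk    # history + current chunk
    return tokens_per_chunk * keys_visible


if __name__ == "__main__":
    TOKENS_PER_CHUNK = 1024  # hypothetical value, not taken from the model config
    for i in range(5):
        print(i, attention_cost_per_chunk(i, TOKENS_PER_CHUNK))
```

The attention term for chunk `i` is `tokens_per_chunk**2 * (i + 1)`, so it grows linearly with the number of chunks already generated, while per-token costs such as the MLP stay constant per chunk.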


I then went through the code (with Claude Code) to investigate. It appears that the current Fast version does not yet implement sliding-window attention or attention sink tokens, so the KV cache keeps growing and gradually becomes a bottleneck as the sequence gets long.
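For reference, one common way to bound the cache is to keep a few initial "sink" entries permanently plus a sliding window of the most recent entries (the approach popularized by StreamingLLM). The sketch below is a hypothetical, framework-free illustration of that eviction policy at chunk granularity; it is not how Lingbot-World-Fast works, and `SinkWindowCache`, `num_sink`, and `window` are illustrative names. In a real model each entry would hold the key/value tensors for one chunk.

```python
from collections import deque
from typing import Deque, Generic, List, TypeVar

T = TypeVar("T")


class SinkWindowCache(Generic[T]):
    """Bounded KV cache: keeps the first `num_sink` entries forever and
    only the most recent `window` entries after that."""

    def __init__(self, num_sink: int, window: int) -> None:
        if num_sink < 0 or window < 1:
            raise ValueError("num_sink must be >= 0 and window >= 1")
        self.num_sink = num_sink
        self.sink: List[T] = []
        # A deque with maxlen silently drops its oldest entry once full.
        self.recent: Deque[T] = deque(maxlen=window)

    def append(self, kv: T) -> None:
        """Add one entry (e.g. one chunk's keys/values)."""
        if len(self.sink) < self.num_sink:
            self.sink.append(kv)
        else:
            self.recent.append(kv)

    def visible(self) -> List[T]:
        """Entries the next chunk would attend to, oldest first."""
        return self.sink + list(self.recent)

    def __len__(self) -> int:
        return len(self.sink) + len(self.recent)


if __name__ == "__main__":
    cache: SinkWindowCache[int] = SinkWindowCache(num_sink=1, window=3)
    for chunk_id in range(10):
        cache.append(chunk_id)
    print(cache.visible())  # → [0, 7, 8, 9]
```

Because `len(cache)` never exceeds `num_sink + window`, the per-chunk attention cost stays constant instead of growing with the sequence, at the price of discarding older context that is neither a sink nor in the window.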


Again, I really appreciate your contribution to the community!
May I ask whether this model supports a configurable local attention window size? Alternatively, are there plans to train a version with local attention or some KV cache compression mechanism in the future? Without some bound on KV cache growth, I suspect the model may face challenges in interactive use cases :)
