about Multimodal Feature Injection in 3D Attention #4

@cdfan0627

Thank you for your great work. Have you explored any alternative way of injecting multimodal features, other than the design shown in the image below, where the features are injected directly into the 3D attention and then passed through an FFN?

Image

Specifically, have you tried an approach similar to "Diffusion as a Shader", where the multimodal features are added to a DiT block through a zero-initialized linear layer? I'd like to know which of the two approaches gives better results.

Image
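
To make the second option concrete, here is a minimal NumPy sketch of what I mean by injection through a zero linear layer. It only illustrates the general pattern as I understand it and is not code from this repository or from "Diffusion as a Shader". The names `ZeroLinear` and `inject`, and all tensor shapes, are hypothetical.

```python
import numpy as np


class ZeroLinear:
    """Linear projection whose weight and bias start at zero (hypothetical helper)."""

    def __init__(self, in_dim, out_dim):
        self.weight = np.zeros((in_dim, out_dim))
        self.bias = np.zeros(out_dim)

    def __call__(self, x):
        return x @ self.weight + self.bias


def inject(hidden, cond, proj):
    """Residually add the projected condition features to the block's hidden states."""
    return hidden + proj(cond)


rng = np.random.default_rng(0)
hidden = rng.standard_normal((2, 16, 64))  # assumed (batch, tokens, hidden_dim)
cond = rng.standard_normal((2, 16, 32))    # assumed (batch, tokens, cond_dim)
proj = ZeroLinear(32, 64)

# With zero weights the projection outputs zeros, so the block input is unchanged.
out_init = inject(hidden, cond, proj)

# Stand-in for a training update: once the weights move away from zero,
# the condition features start to influence the hidden states.
proj.weight += 0.01 * rng.standard_normal(proj.weight.shape)
out_trained = inject(hidden, cond, proj)
```

The point of the zero initialization is that the pretrained block behaves exactly as before at the start of fine-tuning, and the conditioning signal is learned gradually. In the direct-injection design the features enter the attention computation from the first step.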
