feat(defog): Add GammaGL reproduction of DeFoG (Discrete Flow Matching for Graph Generation) by ZXJC-niusile · Pull Request #257 · BUPT-GAMMA/GammaGL

ZXJC-niusile · 2026-06-01T18:34:37Z

Paper

Title: DeFoG: Discrete Flow Matching for Graph Generation (ICML 2025)
Paper: https://arxiv.org/abs/2410.04263
Original Code: https://github.com/manuelmlmadeira/DeFoG

Description

This PR introduces the official GammaGL reproduction of DeFoG (ICML 2025). DeFoG adapts flow matching, originally formulated for continuous spaces, to discrete structured spaces via continuous-time Markov chain (CTMC) rate matrices, enabling efficient and scalable graph generation. The original PyTorch implementation is ported in full to GammaGL/TensorLayerX, with the shared model modules kept backend-neutral (end-to-end training has been tested with TL_BACKEND="torch").
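The CTMC idea can be made concrete with a minimal NumPy sketch for a single categorical variable, such as one node type or one edge type. The sketch rests on two assumptions. First, it uses the linear interpolation p_t(x | x1) = t·δ(x, x1) + (1 − t)·p0(x). Second, it uses the conditional rate matrix from the discrete flow matching formulation that DeFoG builds on. The function names are hypothetical. DeFoG's additional stochasticity and target-guidance terms are omitted.

```python
import numpy as np


def conditional_rates(x_t, x1, t, p0):
    """Rates R_t(x_t -> j | x1) for every state j of one categorical variable.

    Assumes the linear path p_t(. | x1) = t * onehot(x1) + (1 - t) * p0 and
    R_t(x_t, j | x1) = ReLU(dp_t(j) - dp_t(x_t)) / (Z_t * p_t(x_t)), where Z_t
    counts the states with p_t(. | x1) > 0. Valid for t < 1.
    """
    onehot = np.eye(len(p0))[x1]
    p_t = t * onehot + (1.0 - t) * p0
    dp_t = onehot - p0                        # time derivative of the linear path
    z_t = np.count_nonzero(p_t > 0)
    rates = np.maximum(dp_t - dp_t[x_t], 0.0) / (z_t * p_t[x_t])
    rates[x_t] = 0.0                          # keep off-diagonal rates only
    return rates


def euler_step(x_t, x1_probs, t, dt, p0, rng):
    """One Euler step of the CTMC, averaging rates over the denoiser's p(x1 | x_t)."""
    rates = sum(x1_probs[k] * conditional_rates(x_t, k, t, p0) for k in range(len(p0)))
    probs = np.clip(rates * dt, 0.0, None)    # probabilities of jumping to j != x_t
    probs[x_t] = max(1.0 - probs.sum(), 0.0)  # probability of staying in x_t
    return int(rng.choice(len(p0), p=probs / probs.sum()))
```

With a uniform p0 and x_t ≠ x1, the rate toward x1 works out to 1/(1 − t). The jump probability therefore grows as t approaches 1, and a variable already at x1 has zero outgoing rate.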

Key Features

Models: DeFoGModel (a Graph Transformer denoiser) and the complete CTMC discrete flow-matching sampling pipeline.
Datasets: 9 graph generation datasets (Planar, Tree, SBM, QM9, GuacaMol, MOSES, ZINC250k, TLS, Comm20), integrated with their standard splits.
Training: a DeFoGWithLoss wrapper, EMA support, and classifier-free guidance (CFG) for conditional generation.
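EMA and CFG are standard techniques, and a framework-agnostic NumPy sketch of both follows. ParamEMA and cfg_logits are hypothetical names, and the default decay is an assumption. The guidance rule shown is the common logit-space form, which may differ from how DeFoG actually combines conditional and unconditional predictions.

```python
import numpy as np


class ParamEMA:
    """Exponential moving average of model parameters (hypothetical helper)."""

    def __init__(self, params, decay=0.999):
        self.decay = decay
        # Independent float copies so later updates do not alias the live weights
        self.shadow = {name: np.array(value, dtype=float) for name, value in params.items()}

    def update(self, params):
        # shadow <- decay * shadow + (1 - decay) * current; call after each optimizer step
        for name, value in params.items():
            self.shadow[name] = self.decay * self.shadow[name] + (1.0 - self.decay) * value


def cfg_logits(cond_logits, uncond_logits, guidance_scale):
    """Generic classifier-free guidance: extrapolate from the unconditional prediction."""
    return uncond_logits + guidance_scale * (cond_logits - uncond_logits)
```

With guidance_scale = 1, the guided logits equal the conditional ones. Larger values push sampling further toward the condition.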

Testing & Validation

Backend: tested successfully with TL_BACKEND="torch".
Clean code: all debug prints, temporary probes, and local launch scripts (.sh) have been removed to keep the repository clean.
Status: preliminary results on Planar, Tree, and QM9 are consistent with those reported in the original paper. A rigorous evaluation with 10k samples is in progress.


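Main changes

This PR addresses earlier review feedback on the DeFoG implementation. It focuses on isolating heavy optional dependencies, which can cause import errors, from the GammaGL core library, and on adding lightweight test coverage.

Specific changes
Tighter dependency boundary: datasets with domain-specific dependencies (QM9, ZINC250k, MOSES, Guacamol, SPECTRE, TLS) were moved from the core gammagl/datasets package back to examples/defog/datasets. This keeps heavy dependencies such as RDKit, Graph-Tool, and ORCA out of the gammagl core namespace.
Refactored import paths: the exports of these datasets were removed from gammagl/datasets/__init__.py. The imports in examples/defog/dataset_utils.py were updated to load them directly from the local datasets module.
New CPU smoke test: tests/models/test_defog_smoke.py adds a single-epoch (1-epoch) test on tiny synthetic graphs. It needs no domain-specific third-party libraries. On CPU alone, it checks that GammaGL's training step (TrainOneStep) and the flow-matching engine work.
Documentation: examples/defog/readme.md now states that the model requires TL_BACKEND=torch. It also separates out the dependencies (e.g., rdkit, graph-tool) that are needed only for the advanced evaluation stage.

Verification and testing
Verified that gammagl.datasets imports safely without raising missing-module errors for rdkit or graph-tool.
Verified that test_defog_smoke.py completes the training loop on CPU with the torch backend.
Verified that the structure of defog_trainer.py is intact and that it still reuses GammaGL's native DataLoader and TrainOneStep as intended.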
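The check that gammagl.datasets imports without pulling in rdkit or graph-tool can be automated. The following is a minimal sketch that runs the import in a fresh interpreter, so that modules already loaded in the current process do not skew the result. imports_without is a hypothetical helper, not code from this PR. The forbidden names are assumed to be the top-level module names rdkit and graph_tool.

```python
import subprocess
import sys


def imports_without(module, forbidden):
    """True if `module` imports in a fresh interpreter without loading any `forbidden` module.

    A failed import also yields False, because the child process then exits non-zero.
    """
    code = (
        f"import sys, {module}\n"
        f"loaded = [m for m in {list(forbidden)!r} if m in sys.modules]\n"
        "sys.exit(1 if loaded else 0)\n"
    )
    result = subprocess.run([sys.executable, "-c", code], capture_output=True)
    return result.returncode == 0
```

If the isolation holds, `imports_without("gammagl.datasets", ["rdkit", "graph_tool"])` would be expected to return True.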

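Main changes

This PR brings DeFoG (a graph generation model based on discrete flow matching) in line with GammaGL's standard reproduction guidelines. Following review feedback, it focuses on the dependency-boundary problem. Heavy domain-specific dependencies (e.g., RDKit, Graph-Tool) are isolated, and the core code stays compatible with multiple backends.

Specific changes
Full isolation of heavy dependencies: all datasets with domain-specific dependencies (QM9, ZINC250k, MOSES, Guacamol, SPECTRE, TLS) were moved out of the core gammagl/datasets package into examples/defog/datasets. Importing the gammagl core package therefore no longer triggers unexpected import errors related to RDKit or Graph-Tool.
CI cleanup: the obsolete test script tests/datasets/test_defog_datasets.py, left over from when these datasets lived in the core package, was deleted. This fixes the import errors it caused in the CI pipeline and removes every call from the core library into example-local dependencies.
Multi-backend compatibility (backend-neutral): a new cross-backend import test was added in tests/models/test_defog_backend.py. It checks that under a non-torch backend such as TL_BACKEND=tensorflow, the shared modules (e.g., gammagl/models/defog.py and defog_layer.py) can be parsed and imported. It also checks that they contain no hard-coded PyTorch-specific calls (e.g., .cpu(), .cuda(), .numpy()).
Minimal CPU smoke test: an automated test was added in tests/models/test_defog_smoke.py. README.md now includes a minimal command, with its expected output, for a single-epoch, CPU-only training run on the synthetic dataset. This is the quickest way for users to confirm that the core training pipeline (TrainOneStep and the flow-matching engine) works.

Verification and testing
Verified that gammagl.datasets and the core layers/models can be imported with TL_BACKEND=tensorflow.
Removed the obsolete dataset test, fixing potential import crashes in CI.
Confirmed that defog_trainer.py reuses GammaGL's DataLoader and TrainOneStep as intended.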
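A static check for hard-coded PyTorch calls such as .cpu(), .cuda(), and .numpy() can be written with Python's ast module. The sketch below shows one way to do it. find_torch_specific_calls is a hypothetical helper and not necessarily how tests/models/test_defog_backend.py works. It flags any method call with one of these names, whatever the type of the receiver.

```python
import ast

# Method names treated as PyTorch-specific in this sketch (taken from the list above)
TORCH_ONLY_METHODS = {"cpu", "cuda", "numpy"}


def find_torch_specific_calls(source):
    """Return sorted (line, method) pairs for calls like x.cpu(), x.cuda(), x.numpy()."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in TORCH_ONLY_METHODS
        ):
            hits.append((node.lineno, node.func.attr))
    return sorted(hits)
```

Running it on the text of a shared module, for example read via pathlib, and asserting an empty result would enforce the rule.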

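Main changes

This PR brings DeFoG (a graph generation model based on discrete flow matching) in line with GammaGL's standard conventions. To avoid polluting the GammaGL core namespace, it strictly decouples the code and fully isolates logic that relies on heavy external dependencies, such as molecular computation and graph matching.

Specific changes
Isolated datasets and feature transforms: all datasets with domain-specific dependencies (QM9, ZINC250k, MOSES, Guacamol, SPECTRE, TLS) were removed from gammagl/datasets and moved to examples/defog/datasets. [Latest fix] The feature-extraction layer gammagl/transforms/dense_features.py is tightly coupled to molecular charge/valence and to flow-matching features. It was removed from the core and moved to examples/defog/extra_features.py, so the core gammagl/transforms module no longer exposes it.
CI cleanup: the old core test cases for these isolated datasets and features (e.g., test_defog_datasets.py) were deleted. This removes the core library's erroneous calls into example-local code and fixes the CI import crash.
Strict multi-backend check (backend-neutral): shared core entry points such as gammagl/models/defog.py contain no torch-specific hard-coded calls (e.g., .cpu(), .cuda(), .numpy()). A new unit test, tests/models/test_defog_backend.py, verifies that the code can be parsed under TL_BACKEND=tensorflow.
Minimal CPU smoke test: tests/models/test_defog_smoke.py adds a very lightweight, single-epoch automated test on small synthetic graphs. Users can run the underlying TrainOneStep and the main flow-matching loop on CPU without RDKit or similar tools. The exact command and expected output are in examples/defog/readme.md.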
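The smoke test relies on tiny synthetic graphs. Suppose they use the dense one-hot layout common to DiGress-style denoisers: a node-type tensor X, an edge-type tensor E, and a node mask. A hypothetical generator for such a batch might look like the sketch below. synthetic_dense_batch, its defaults, and the convention that edge type 0 means "no edge" are all assumptions, not the PR's actual synthetic dataset.

```python
import numpy as np


def synthetic_dense_batch(batch_size=4, max_nodes=6, num_node_types=2,
                          num_edge_types=2, seed=0):
    """Random small graphs as dense one-hot tensors X (B, N, dx), E (B, N, N, de) plus a node mask."""
    rng = np.random.default_rng(seed)
    n_nodes = rng.integers(2, max_nodes + 1, size=batch_size)      # 2..max_nodes nodes each
    mask = np.arange(max_nodes)[None, :] < n_nodes[:, None]        # (B, N) real-node mask

    x_idx = rng.integers(0, num_node_types, size=(batch_size, max_nodes))
    e_idx = rng.integers(0, num_edge_types, size=(batch_size, max_nodes, max_nodes))
    e_idx = np.triu(e_idx, k=1)                                    # upper triangle, zero diagonal
    e_idx = e_idx + e_idx.transpose(0, 2, 1)                       # make edges undirected

    pair_mask = mask[:, :, None] & mask[:, None, :]                # pairs of real nodes
    e_idx = np.where(pair_mask, e_idx, 0)
    X = np.eye(num_node_types)[x_idx] * mask[..., None]            # padded nodes -> all zeros
    E = np.eye(num_edge_types)[e_idx] * pair_mask[..., None]       # padded pairs -> all zeros
    return X, E, mask
```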

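Main changes

This PR brings DeFoG (a graph generation model based on discrete flow matching) in line with GammaGL's standard reproduction guidelines. The code architecture is strictly decoupled, so the GammaGL core no longer has implicit bindings to complex domain-specific dependencies (e.g., for molecules and digital pathology). A minimal test path suitable for a strict CI sandbox is also provided.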

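Bug fix: defog_trainer.py crashed when conditional generation was enabled (e.g., training on qm9 with --conditional) and sample_bs was greater than or equal to the number of graphs to generate. The code referenced an undefined variable, batch_cond. It now uses the correct variable from the enclosing scope, cond_labels, so sampling with classifier-free guidance (CFG) runs as intended.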
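A chunked sampling loop can avoid this class of bug by routing every case through the same slicing logic, including the single-batch case where sample_bs >= n_target. The sketch below illustrates the pattern. iter_sample_batches and n_target are hypothetical names, and this is not the code in defog_trainer.py.

```python
def iter_sample_batches(cond_labels, n_target, sample_bs):
    """Yield (batch_size, labels) chunks covering n_target samples.

    The single-batch case (sample_bs >= n_target) runs through the same loop
    as the multi-batch case, so no separate branch can reference a stale or
    undefined variable.
    """
    if len(cond_labels) < n_target:
        raise ValueError("need one condition label per sample")
    start = 0
    while start < n_target:
        size = min(sample_bs, n_target - start)
        yield size, cond_labels[start:start + size]
        start += size
```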

ZXJC-niusile force-pushed the feat/defog branch 8 times, most recently from efb5ed2 to 823dcc9 on June 3, 2026 06:17

feat(defog): implement DeFoG (Discrete Flow Matching for Graph Generation)

a835f8c

ZXJC-niusile force-pushed the feat/defog branch from 8a70e16 to a835f8c on June 5, 2026 08:07

ZXJC-niusile added 12 commits June 5, 2026 16:11

fix(defog): fix shape mismatch by removing redundant mask_diag logic

58616e9

fix(defog): isolate SBM graph-tool evaluation into subprocess to avoid boost::python conflicts with PyTorch

20cbf8f

refactor: integrate DeFoG into GammaGL ecosystem (datasets, transforms, metrics)

0c1d28e

feat(defog): add specialized conditional TLS evaluation metrics

b9f64bb

Align parameters

754ab26

Update defog_trainer.py

0dd865b

Update readme.md

6dae11d
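One of the commits above moves the SBM graph-tool evaluation into a subprocess. This follows a common pattern: run the conflicting native library in a separate interpreter so it never shares a process with PyTorch. The following is a minimal sketch assuming JSON-serialisable inputs and outputs. run_isolated and EVAL_SNIPPET are hypothetical, and a real evaluation script would import graph_tool inside the snippet.

```python
import json
import subprocess
import sys

# Hypothetical evaluation script. A real one would `import graph_tool` here,
# so that library is only ever loaded inside the child interpreter.
EVAL_SNIPPET = """
import json, sys
data = json.load(sys.stdin)
print(json.dumps({"n_graphs": len(data["graphs"])}))
"""


def run_isolated(code, payload):
    """Run `code` in a fresh interpreter, passing `payload` as JSON on stdin."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child process fails
    )
    return json.loads(proc.stdout)
```

Exchanging only JSON keeps the parent free of any graph-tool objects, at the cost of serialising the generated graphs.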

ZXJC-niusile wants to merge 13 commits into BUPT-GAMMA:main from ZXJC-niusile:feat/defog

ZXJC-niusile commented Jun 1, 2026
