Wasserstein GAN for Knowledge Graph Completion

semantic-gan is a Python implementation of a Wasserstein GAN architecture for knowledge graph completion.

Installation

The package can be installed from PyPI:

pip install semantic-gan

Or install from source:

git clone https://github.com/erdemonal/SemanticGAN.git
cd SemanticGAN
pip install -e .

Usage

The following example demonstrates usage with a generic knowledge graph dataset:

from semanticgan import KnowledgeGraphDataset, Generator, Discriminator
import torch
from torch.utils.data import DataLoader

# 1. Load a generic knowledge graph dataset
# Format: head_id [tab] relation_id [tab] tail_id
dataset = KnowledgeGraphDataset(
    triples_path="my_custom_data.txt", 
    sep='\t', 
    names=['h', 'r', 't']
)

# 2. Initialize Models
G = Generator(
    embedding_dim=256, 
    hidden_dim=512, 
    num_relations=dataset.num_relations
)
D = Discriminator(
    num_entities=dataset.num_entities,
    num_relations=dataset.num_relations,
    embedding_dim=256,
    hidden_dim=512
)

# 3. Create a data loader (the training loop itself is not shown here)
dataloader = DataLoader(dataset, batch_size=1024, shuffle=True)
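To try the steps above without real data, a toy input file in the expected tab-separated layout can be generated first. The sketch below uses only the standard library. It assumes the file has no header row and that IDs are plain integers; the ID values themselves are made up, and the way entities and relations are counted here is an illustration, not necessarily how KnowledgeGraphDataset derives num_entities and num_relations.

```python
import csv
import os
import tempfile

# Toy triples as (head_id, relation_id, tail_id); the IDs are made up.
toy_triples = [(0, 0, 1), (1, 1, 0), (2, 0, 3), (3, 1, 2)]

# Write one triple per line, tab-separated, without a header row (assumed).
path = os.path.join(tempfile.mkdtemp(), "my_custom_data.txt")
with open(path, "w", newline="") as f:
    csv.writer(f, delimiter="\t", lineterminator="\n").writerows(toy_triples)

# Read the file back and count the distinct entity and relation IDs.
with open(path, newline="") as f:
    rows = [tuple(int(x) for x in row) for row in csv.reader(f, delimiter="\t")]

entities = {h for h, _, _ in rows} | {t for _, _, t in rows}
relations = {r for _, r, _ in rows}
print(len(entities), "entities,", len(relations), "relations")
```

The resulting path can then be passed as triples_path in the example above.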

Technical Report: DBLP Case Study

This repository supports the technical report "Knowledge Graph Completion and RDF Triple Generation with a Wasserstein GAN" and presents experiments on the DBLP Computer Science Bibliography.

Technical Report

A detailed description of the model architecture, training procedure, and evaluation protocol is provided in the technical report:

paper/knowledge-graph-completion-wasserstein-gan.pdf

The LaTeX source is available in paper/main.tex

Results

Training artifacts and generated RDF triples are available at: https://erdemonal.github.io/SemanticGAN

Methodology

The preprocessing pipeline parses the DBLP XML dump from https://dblp.uni-trier.de/xml to extract a knowledge graph with entity types Publication, Author, Venue, and Year. Relations include dblp:wrote, dblp:hasAuthor, dblp:publishedIn, and dblp:inYear.

The preprocessing script scripts/prepare_dblp_kg.py reads the XML file incrementally and produces RDF triples in tab-separated format. The preprocessed 1M-triple dataset is versioned and maintained on the Hugging Face Dataset Hub.
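Incremental reading of this kind can be sketched with the standard library's streaming parser. This is not the code in scripts/prepare_dblp_kg.py; it is a minimal illustration under stated assumptions: the function name iter_triples is hypothetical, only article and inproceedings records are handled, and the direction of each relation (for example Author to Publication for dblp:wrote) is assumed. The real DBLP dump also uses character entities declared in dblp.dtd, which the standard-library parser does not resolve by default, so a production script would likely need a DTD-aware parser.

```python
import io
import xml.etree.ElementTree as ET

# Record tags treated as publications in this sketch (an assumption; the real
# script may handle more or fewer DBLP record types).
PUBLICATION_TAGS = {"article", "inproceedings"}


def iter_triples(xml_source):
    """Yield (head, relation, tail) triples while streaming through the XML."""
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag not in PUBLICATION_TAGS:
            continue
        pub = elem.get("key")
        for author in elem.findall("author"):
            yield (author.text, "dblp:wrote", pub)
            yield (pub, "dblp:hasAuthor", author.text)
        venue = elem.findtext("journal") or elem.findtext("booktitle")
        if venue:
            yield (pub, "dblp:publishedIn", venue)
        year = elem.findtext("year")
        if year:
            yield (pub, "dblp:inYear", year)
        # Drop the record's children once processed to limit memory use.
        elem.clear()


# A made-up record in the general shape of a DBLP entry.
SAMPLE = b"""<dblp>
<article key="journals/x/Doe20">
<author>Jane Doe</author>
<author>John Roe</author>
<title>An Example.</title>
<journal>J. Examples</journal>
<year>2020</year>
</article>
</dblp>"""

triples = list(iter_triples(io.BytesIO(SAMPLE)))
for triple in triples:
    print("\t".join(triple))
```

The same function accepts a file path instead of an in-memory buffer, so memory use depends on the size of a single record rather than on the size of the dump.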

The WGAN model consists of a Generator that produces tail entity embeddings from noise and relation embeddings, and a Discriminator (critic) that assigns each triple a scalar score; the gap between the mean scores of real and generated triples estimates the Wasserstein distance. Training uses RMSprop with gradient clipping to enforce the Lipschitz constraint.
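For orientation, the standard WGAN objective adapted to this setting can be written as follows. The notation is assumed for this sketch and may differ from the technical report: z is a noise vector, e_r a relation embedding, (h, r, t) a real triple, and c a clipping threshold. The last line shows the weight clipping used in the original WGAN formulation to constrain the critic; whether this repository clips weights or gradients is not confirmed here.

```latex
% Notation assumed for this sketch: z noise, e_r relation embedding,
% (h, r, t) a real triple, \tilde{t} a generated tail embedding, c a clipping threshold.
\begin{align}
\tilde{t} &= G(z, e_r), \qquad z \sim p(z) \\
L_D &= \mathbb{E}_{z}\big[D(h, r, \tilde{t})\big]
      - \mathbb{E}_{(h, r, t) \sim P_{\text{data}}}\big[D(h, r, t)\big] \\
L_G &= -\,\mathbb{E}_{z}\big[D(h, r, \tilde{t})\big] \\
w &\leftarrow \operatorname{clip}(w, -c, c) \quad \text{for each critic weight } w
\end{align}
```

Minimizing L_D widens the gap between the scores of real and generated triples, and minimizing L_G pushes generated tails toward higher critic scores.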

Training and synchronization are automated via a continuous integration workflow. Training is executed on external compute infrastructure, and the resulting artifacts are synchronized after each run.

Model Storage and Data Decoupling

Model weights and processed knowledge graph artifacts are hosted on the Hugging Face Hub across two repositories:

Model Hub: erdemonal/SemanticGAN stores the persistent WGAN checkpoints.

Dataset Hub: erdemonal/SemanticGAN-Dataset contains the processed DBLP triples and ID mappings.

The automated training workflow fetches processed data from the Dataset Hub and restores model states from the Model Hub before each training run.

Data Availability

The DBLP dataset is publicly available from https://dblp.uni-trier.de/xml

Documentation is available at https://dblp.org/xml/docu/dblpxml.pdf
