MuseGraph

The official implementation of Graph-oriented Instruction Tuning of Large Language Models for Generic Graph Mining (TPAMI 2025).

Environment Setup


  1. LLaMA-Factory v0.8.0
  2. Python 3.7.15

Quick Start


Dataset Generation

After downloading the required dataset files, you can use the dataset generation scripts in the data directory (e.g., nc_imdb.ipynb) to prepare the corresponding datasets.
For CoT-based instruction generation, please refer to the Prompt Template section.
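
The prompt templates in this README refer to a compact graph description made of one-hop neighbors and random walks. As a rough illustration only, the sketch below collects both from a plain adjacency dictionary using uniform random walks. This is not the repository's generation code: the function names, the `adj` structure, and the uniform sampling are all assumptions, and the notebooks in the data directory may select neighbors and walks differently.

```python
import random


def one_hop_neighbors(adj, node):
    """Return the sorted direct neighbors of `node` (hypothetical helper)."""
    return sorted(adj.get(node, []))


def random_walks(adj, start, num_walks=2, walk_length=3, seed=0):
    """Sample uniform random walks from `start` (an assumed, simplified strategy)."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        walk = [start]
        while len(walk) < walk_length:
            neighbors = sorted(adj.get(walk[-1], []))
            if not neighbors:  # dead end: stop this walk early
                break
            walk.append(rng.choice(neighbors))
        walks.append(walk)
    return walks


# Toy heterogeneous graph: one movie linked to an actor and a director.
adj = {
    "movie_1": ["actor_1", "director_1"],
    "actor_1": ["movie_1", "movie_2"],
    "director_1": ["movie_1"],
    "movie_2": ["actor_1"],
}
neighbors = one_hop_neighbors(adj, "movie_1")
walks = random_walks(adj, "movie_1", num_walks=3, walk_length=4)
```

Fixing the seed keeps the sampled walks reproducible across runs, which helps when regenerating a dataset.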

After generating and mixing multiple datasets, you can configure and register them in dataset_info.json under the llama-factory directory, for example:

  "train_nc_IMDB": {
    "file_name": "train_nc_IMDB.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output"
    }
  }
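
Mixing and registration can also be scripted. The sketch below assumes each generated file is a JSON list of records with `instruction`, `input`, and `output` keys, matching the column mapping above. The helper names `mix_datasets` and `register_dataset`, and every path passed to them, are hypothetical and not part of this repository or of LLaMA-Factory.

```python
import json
import random
from pathlib import Path


def mix_datasets(paths, out_path, seed=0):
    """Concatenate several instruction files (JSON lists) and shuffle the result."""
    records = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            records.extend(json.load(f))
    random.Random(seed).shuffle(records)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
    return len(records)


def register_dataset(info_path, name, file_name):
    """Add or overwrite one entry in dataset_info.json with the mapping shown above."""
    info_path = Path(info_path)
    if info_path.exists():
        info = json.loads(info_path.read_text(encoding="utf-8"))
    else:
        info = {}
    info[name] = {
        "file_name": file_name,
        "columns": {
            "prompt": "instruction",
            "query": "input",
            "response": "output",
        },
    }
    info_path.write_text(json.dumps(info, indent=2), encoding="utf-8")
    return info[name]
```

For example, `mix_datasets(["train_nc_IMDB.json", "other_task.json"], "train_mix.json")` followed by `register_dataset("dataset_info.json", "train_mix", "train_mix.json")` would write a shuffled mixture and register it under the name `train_mix`; the file names here are placeholders.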

Train, Test, and Evaluate

The script lora_process.py provides an end-to-end pipeline for model training, testing, and evaluation:

python src/lora_process.py

Data Download


The datasets used in this project can be accessed from the following links:

Prompt Template


IMDB

Type: Task-specific Instruction
Prompt:
Input: Given the target MOVIE with the compact graph description in the IMDB dataset, what the following categories does this MOVIE belong to: {Category List}. This MOVIE may have one or more categories. Directly give the answer of this MOVIE's categories.

The compact graph description of this MOVIE is listed as follows: Title: {Title of MOVIE} Ego Graph Nodes: {Ego Graph Node List} One-hop Neighbors: {1-hop Neighbor List} Random Walks: {Random Walk Paths}.

Output: {Ground-truth Category List}.

Type: Querying GPT-4
Prompt:
I have a question as below: {Task-specific Instruction Input} and the answer is {Task-specific Instruction Output}. Imagine that you have made the correct choice and proceed with step-by-step reasoning. Your reasoning needs to incorporate Ego Graph Nodes, One-hop Neighbors, and Random Walks in the given compact graph description.

Type: CoT-based Instruction
Prompt:
Input: Given the target MOVIE with the compact graph description in the IMDB dataset, what the following categories does this MOVIE belong to: {Category List}. This MOVIE may have one or more categories. Please think about the categorization in a step-by-step manner and avoid making false associations. Then provide your reasoning. Using the following format: Answer: {Answer}; Reasoning: {Reason}.

The compact graph description of this MOVIE is listed as follows: Title: {Title of MOVIE} Ego Graph Nodes: {Ego Graph Node List} One-hop Neighbors: {1-hop Neighbor List} Random Walks: {Random Walk Paths}.

Output: Answer: {Ground-truth Category List}; Reasoning: {Generated by GPT-4}.
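
To make the template concrete, the sketch below fills the task-specific IMDB prompt and parses a response in the `Answer: ...; Reasoning: ...` format. The template wording is copied from this README. The split into `instruction` and `input` fields, the list separators, and the helper names `build_record` and `parse_cot_output` are assumptions rather than the repository's actual code.

```python
import re

# Template text copied verbatim from the README; placeholder names are assumptions.
INSTRUCTION = (
    "Given the target MOVIE with the compact graph description in the IMDB dataset, "
    "what the following categories does this MOVIE belong to: {categories}. "
    "This MOVIE may have one or more categories. "
    "Directly give the answer of this MOVIE's categories."
)
GRAPH = (
    "The compact graph description of this MOVIE is listed as follows: "
    "Title: {title} Ego Graph Nodes: {ego} One-hop Neighbors: {hop} "
    "Random Walks: {walks}."
)


def build_record(title, ego_nodes, neighbors, walks, categories, labels):
    """Build one instruction/input/output record (field split is an assumption)."""
    walk_text = "; ".join(" -> ".join(walk) for walk in walks)
    return {
        "instruction": INSTRUCTION.format(categories=", ".join(categories)),
        "input": GRAPH.format(
            title=title,
            ego=", ".join(ego_nodes),
            hop=", ".join(neighbors),
            walks=walk_text,
        ),
        "output": ", ".join(labels) + ".",
    }


def parse_cot_output(text):
    """Split 'Answer: a, b; Reasoning: ...' into (labels, reasoning), or return None."""
    match = re.match(r"\s*Answer:\s*(.*?);\s*Reasoning:\s*(.*)", text, flags=re.DOTALL)
    if match is None:
        return None
    answer, reasoning = match.groups()
    labels = [part.strip(" .") for part in answer.split(",") if part.strip(" .")]
    return labels, reasoning.strip()


record = build_record(
    title="Example Movie",
    ego_nodes=["Example Movie", "Actor A"],
    neighbors=["Actor A", "Director B"],
    walks=[["Example Movie", "Actor A", "Other Movie"]],
    categories=["Action", "Comedy", "Drama"],
    labels=["Action"],
)
parsed = parse_cot_output("Answer: Action, Comedy; Reasoning: Most neighbors are action films.")
```

Returning `None` for responses that do not follow the format lets an evaluation loop count malformed generations separately instead of silently scoring them as wrong labels.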

Acknowledgements

We sincerely thank the following open-source repositories for their valuable codebases and contributions, which greatly helped this project:
