The official implementation of Graph-oriented Instruction Tuning of Large Language Models for Generic Graph Mining (TPAMI 2025).
- Llama-factory v0.8.0
- Python 3.7.15
Dataset Generation
After downloading the required dataset files, you can use the dataset generation scripts in the data directory (e.g., nc_imdb.ipynb) to prepare the corresponding datasets.
For CoT-based instruction generation, please refer to the Prompt Template section.
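The mixing step mentioned in this section can be sketched as follows. This is a minimal illustration, not the repository's own script: the function name `mix_datasets`, the file paths, and the fixed shuffle seed are all assumptions, and each input file is assumed to hold a JSON list of records with `instruction`, `input`, and `output` fields.

```python
import json
import random
from pathlib import Path


def mix_datasets(input_paths, output_path, seed=0):
    """Concatenate several instruction files and shuffle the records.

    Hypothetical helper: assumes every input file is a JSON list of
    dicts with "instruction", optional "input", and "output" keys.
    """
    records = []
    for path in input_paths:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        for rec in data:
            # Keep only the three fields used for instruction tuning.
            records.append({
                "instruction": rec["instruction"],
                "input": rec.get("input", ""),
                "output": rec["output"],
            })
    # A seeded shuffle keeps the mixed file reproducible across runs.
    random.Random(seed).shuffle(records)
    Path(output_path).write_text(
        json.dumps(records, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
    return records
```

A call such as `mix_datasets(["train_nc_IMDB.json", "other_task.json"], "train_mixed.json")` would then produce a single file; `other_task.json` and `train_mixed.json` are placeholder names.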
After generating and mixing multiple datasets, you can configure and register them in dataset_info.json under the llama-factory directory, for example:

    "train_nc_IMDB": {
        "file_name": "train_nc_IMDB.json",
        "columns": {
            "prompt": "instruction",
            "query": "input",
            "response": "output"
        }
    }

Train, Test, and Evaluate
The script lora_process.py provides an end-to-end pipeline for model training, testing, and evaluation:
    python src/lora_process.py

The datasets used in this project can be accessed from the following links:

Prompt Template
| Type | Prompt |
|---|---|
| Task-specific Instruction | Input: Given the target MOVIE with the compact graph description in the IMDB dataset, what the following categories does this MOVIE belong to: {Category List}. This MOVIE may have one or more categories. Directly give the answer of this MOVIE's categories. The compact graph description of this MOVIE is listed as follows: Title: {Title of MOVIE} Ego Graph Nodes: {Ego Graph Node List} One-hop Neighbors: {1-hop Neighbor List} Random Walks: {Random Walk Paths}. Output: {Ground-truth Category List}. |
| Querying GPT-4 | I have a question as below: {Task-specific Instruction Input} and the answer is {Task-specific Instruction Output}. Imagine that you have made the correct choice and proceed with step-by-step reasoning. Your reasoning needs to incorporate Ego Graph Nodes, One-hop Neighbors, and Random Walks in the given compact graph description. |
| CoT-based Instruction | Input: Given the target MOVIE with the compact graph description in the IMDB dataset, what the following categories does this MOVIE belong to: {Category List}. This MOVIE may have one or more categories. Please think about the categorization in a step-by-step manner and avoid making false associations. Then provide your reasoning. Using the following format: Answer: {Answer}; Reasoning: {Reason}. The compact graph description of this MOVIE is listed as follows: Title: {Title of MOVIE} Ego Graph Nodes: {Ego Graph Node List} One-hop Neighbors: {1-hop Neighbor List} Random Walks: {Random Walk Paths}. Output: Answer: {Ground-truth Category List}; Reasoning: {Generated by GPT-4}. |
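
The templates in the table above can be filled in with plain string formatting. The sketch below is illustrative only: the function names, the way lists and random walks are serialized (comma-separated items, `->` between walk steps), and the choice to place the whole prompt in the `instruction` field are assumptions, not the repository's confirmed behavior.

```python
# Task-specific instruction text, copied from the table above.
TASK_TEMPLATE = (
    "Given the target MOVIE with the compact graph description in the IMDB "
    "dataset, what the following categories does this MOVIE belong to: "
    "{categories}. This MOVIE may have one or more categories. "
    "Directly give the answer of this MOVIE's categories. "
    "The compact graph description of this MOVIE is listed as follows: "
    "Title: {title} Ego Graph Nodes: {ego_nodes} "
    "One-hop Neighbors: {neighbors} Random Walks: {walks}."
)

# Query used to ask GPT-4 for step-by-step reasoning, copied from the table.
GPT4_TEMPLATE = (
    "I have a question as below: {question} and the answer is {answer}. "
    "Imagine that you have made the correct choice and proceed with "
    "step-by-step reasoning. Your reasoning needs to incorporate Ego Graph "
    "Nodes, One-hop Neighbors, and Random Walks in the given compact graph "
    "description."
)


def build_task_record(title, categories, ego_nodes, neighbors, walks, labels):
    """Fill the task-specific template (hypothetical helper)."""
    prompt = TASK_TEMPLATE.format(
        categories=", ".join(categories),
        title=title,
        ego_nodes=", ".join(ego_nodes),
        neighbors=", ".join(neighbors),
        # Assumed serialization: steps joined by "->", walks by ";".
        walks="; ".join(" -> ".join(path) for path in walks),
    )
    return {"instruction": prompt, "input": "", "output": ", ".join(labels)}


def build_gpt4_query(record):
    """Wrap a task record in the GPT-4 reasoning query (hypothetical helper)."""
    return GPT4_TEMPLATE.format(
        question=record["instruction"], answer=record["output"]
    )
```

The reasoning text returned by GPT-4 for such a query would then fill the `Reasoning:` slot of the CoT-based instruction output.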
We sincerely thank the following open-source repositories for their valuable codebases and contributions, which greatly helped this project:
