
davidsamy1/KnowledgeGraph-Chatbot


Building a Chatbot for Students Mentorship based on Extracted Knowledge Graphs


📖 Abstract

The student mentorship process is often time-consuming for students and repetitive for university staff because of the high volume of inquiries about university regulations. This project introduces a chatbot that uses a Knowledge Graph (KG) constructed from unstructured text (specifically, university regulation PDFs). The system employs KG embedding models to predict missing links and infer relationships, enabling it to answer complex student queries with high accuracy.

🎓 Academic Resources

For a deep dive into the methodology, architectural design, and evaluation metrics of this project, please refer to the following documents:

  • Full Thesis PDF – A comprehensive breakdown of the research, implementation, and results.
  • Presentation Slides – The final defense deck used for the graduation committee.

🎬 Demo

Experience the chatbot in action by viewing the recorded demonstration:

Click here to watch the Demo Video

(The demo showcases the end-to-end pipeline from processing a student's natural language query to the generation of a factual response based on the Knowledge Graph.)


🏗️ Architecture

The chatbot pipeline consists of four sequential stages:

  1. Pre-processing User Input: Normalizing text via spell-checking, grammar correction, and lemmatization.
  2. Input Comprehension: Extracting subjects and predicates using spaCy for dependency parsing and FuzzyWuzzy for entity mapping.
  3. Knowledge Graph Embedding Model: Utilizing trained embeddings (TransE, DistMult, or ComplEx) to predict the missing head or tail of a triplet.
  4. Response Generation: Converting predicted triplets back into natural language using NLTK and the Pattern library for verb conjugation.
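The stages above can be illustrated with a minimal, standard-library-only sketch of stages 1, 2 and 4. This is not the project's code: the vocabularies (`ENTITIES`, `RELATIONS`) and helper names are hypothetical, Python's `difflib` is swapped in for FuzzyWuzzy, spell-checking, grammar correction and lemmatization are reduced to simple string cleanup, and NLTK/Pattern generation is replaced by a fixed template. Stage 3 (the embedding model) is left out and represented by a placeholder tail.

```python
import difflib
import re
from typing import Optional

# Hypothetical vocabularies; the real system derives its labels from the KG
# built from the regulation PDFs.
ENTITIES = ["bachelor thesis", "summer semester", "probation", "elective course"]
RELATIONS = ["requires", "lasts", "leads to"]


def normalize(text: str) -> str:
    """Stage 1 (simplified): lowercase, drop punctuation, collapse spaces.

    Spell-checking, grammar correction and lemmatization are omitted.
    """
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def map_to_label(phrase: str, labels: list, cutoff: float = 0.6) -> Optional[str]:
    """Stage 2 (simplified): map a phrase to its most similar known label.

    difflib stands in for FuzzyWuzzy; both rank candidates by string
    similarity, but compute their scores differently. Returns None when no
    label reaches `cutoff`.
    """
    matches = difflib.get_close_matches(phrase, labels, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def verbalize(head: str, relation: str, tail: str) -> str:
    """Stage 4 (simplified): turn a (head, relation, tail) triple into a sentence."""
    sentence = f"{head} {relation} {tail}."
    return sentence[0].upper() + sentence[1:]


if __name__ == "__main__":
    subject = map_to_label(normalize("Bachelr Thesis??"), ENTITIES)
    relation = map_to_label(normalize("require"), RELATIONS)
    # "a supervisor" is a placeholder for whatever the embedding model predicts.
    print(verbalize(subject, relation, "a supervisor"))
```

Running it prints `Bachelor thesis requires a supervisor.`: the misspelled "Bachelr Thesis??" and the uninflected "require" are both mapped onto known labels before the triple is verbalized.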

🛠️ Tech Stack

The project is built with Python 3.10 and the following core libraries:

Library           Version       Purpose
AmpliGraph        2.0.0         KG embedding and link prediction
spaCy             3.5.1         NER, POS tagging, and coreference resolution
Stanford-OpenIE   1.3.1         Extraction of (subject, relation, object) triplets
Flask             2.2.2         Web framework for the chatbot interface
NLTK / Pattern    3.6.3 / 3.6   Natural language generation and text processing
PyPDF2            3.0.1         Extraction of raw text from university regulation PDFs

📁 Repository Structure

├── data/
│   ├── raw_pdfs/           # University regulation documents 
│   └── triplets.csv        # Extracted Subject-Relation-Object data 
├── src/
│   ├── preprocessing.py    # Text cleaning and normalization 
│   ├── kg_construction.py  # Triple extraction and KG building 
│   ├── embedding_model.py  # Model training (TransE, ComplEx, etc.) 
│   └── app.py              # Flask application for user interaction 
├── docs/
│   └── Thesis_Full_PDF.pdf # Full academic documentation
└── README.md
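The layout lists `data/triplets.csv` as the extracted subject-relation-object data but does not specify its columns. The sketch below assumes, hypothetically, a CSV with a `subject,relation,object` header; it loads the rows into tuples and indexes the tails already known for each (head, relation) pair. A student question whose pair is missing from such an index is the case the embedding model has to fill in. The function names are illustrative only.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Set, TextIO, Tuple

Triplet = Tuple[str, str, str]


def load_triplets(handle: TextIO) -> List[Triplet]:
    """Read (subject, relation, object) rows, skipping incomplete ones.

    Assumes a header row named subject,relation,object (hypothetical layout).
    """
    triplets = []
    for row in csv.DictReader(handle):
        subject = (row.get("subject") or "").strip()
        relation = (row.get("relation") or "").strip()
        obj = (row.get("object") or "").strip()
        if subject and relation and obj:
            triplets.append((subject, relation, obj))
    return triplets


def index_tails(triplets: List[Triplet]) -> Dict[Tuple[str, str], Set[str]]:
    """Map each (head, relation) pair to the set of tails already in the KG."""
    index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for subject, relation, obj in triplets:
        index[(subject, relation)].add(obj)
    return dict(index)


if __name__ == "__main__":
    # Placeholder rows; with the real file, use open("data/triplets.csv", newline="").
    sample = io.StringIO(
        "subject,relation,object\n"
        "course a,prerequisite of,course b\n"
        "course a,prerequisite of,course c\n"
        ",prerequisite of,course d\n"
    )
    triplets = load_triplets(sample)
    print(index_tails(triplets))
```

The last sample row has an empty subject and is skipped, so only two triplets reach the index.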

🚀 Installation & Usage

  1. Clone the repository:
    git clone https://github.com/davidsamy1/Thesis-Chatbot.git
    cd Thesis-Chatbot
  2. Install dependencies:
    pip install -r requirements.txt
  3. Run the application:
    python src/app.py

📊 Evaluation

The system was evaluated with three KG embedding algorithms for predicting missing academic facts:

  • ComplEx: Uses complex-valued embeddings, which lets it capture anti-symmetric relations and richer interactions between entities.
  • TransE: Models each relation as a translation in the embedding space, providing efficient distance-based reasoning.
  • DistMult: A bilinear semantic-matching model that scores triples with a diagonal relation matrix.

The experimental results showed that the models captured the semantic relationships and structural properties of the university KG.
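To make the three models concrete, here is a NumPy sketch of their standard scoring functions, together with raw tail ranking and two common link-prediction metrics, MRR and Hits@k. The metrics are assumed for illustration; this README does not say which ones the thesis reports. These helpers are not AmpliGraph's API (AmpliGraph trains the models itself and ships evaluation helpers such as `mrr_score` and `hits_at_n_score` in `ampligraph.evaluation`), and the ranking here is unfiltered.

```python
import numpy as np


def transe_score(h, r, t, norm=1):
    """TransE: negative L1 (or L2) distance between h + r and t; higher is better."""
    return -np.linalg.norm(h + r - t, ord=norm, axis=-1)


def distmult_score(h, r, t):
    """DistMult: sum of element-wise products h * r * t (symmetric in h and t)."""
    return np.sum(h * r * t, axis=-1)


def complex_score(h, r, t):
    """ComplEx: real part of sum(h * r * conj(t)); can be anti-symmetric."""
    return np.real(np.sum(h * r * np.conj(t), axis=-1))


def tail_rank(score_fn, h, r, candidates, true_index):
    """Raw rank (1 = best) of the true tail among candidate tail embeddings.

    Ties count in the true tail's favour, and other known true tails are
    not filtered out (a "raw" rather than "filtered" rank).
    """
    scores = score_fn(h, r, candidates)
    return 1 + int(np.sum(scores > scores[true_index]))


def mrr_and_hits(ranks, k=10):
    """Mean reciprocal rank and the fraction of ranks <= k."""
    ranks = np.asarray(ranks, dtype=float)
    return float(np.mean(1.0 / ranks)), float(np.mean(ranks <= k))


if __name__ == "__main__":
    # Toy 2-d embeddings chosen by hand, not trained.
    h, r = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    candidates = np.array([[1.0, 1.0], [0.0, 0.0], [5.0, 5.0]])
    print(tail_rank(transe_score, h, r, candidates, true_index=1))  # 2

    # ComplEx scores (a, rel, b) and (b, rel, a) differently; DistMult cannot.
    a, b, rel = np.array([1 + 0j]), np.array([1 + 1j]), np.array([1j])
    print(complex_score(a, rel, b), complex_score(b, rel, a))  # 1.0 -1.0
    print(mrr_and_hits([1, 2, 4], k=2))
```

The symmetric DistMult score explains why ComplEx is the one credited with anti-symmetric relations: swapping head and tail never changes a DistMult score, whereas the conjugate in ComplEx can flip its sign.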

🎓 Citation

If you use this work in your research, please cite it as follows:

@mastersthesis{Samy2023,
  author = {David Samy},
  title  = {Building a Chatbot for Students Mentorship based on Extracted Knowledge Graphs},
  school = {Faculty of Media Engineering and Technology, German University in Cairo (GUC)},
  type   = {Bachelor's thesis},
  year   = {2023},
  month  = {June}
}

About

This Bachelor project repository contains the implementation of a task-oriented conversational agent designed to automate the student mentorship process. By leveraging Knowledge Graphs (KGs) and KG embedding models, the system provides personalized support by extracting and reasoning over university regulations.
