RepliCheck: Question Duplication Detection

RepliCheck is a web application designed to detect whether two input questions are duplicates. This is achieved through machine learning models trained to identify the semantic similarity between text pairs. The app is built with Streamlit for an interactive user experience and uses a pre-trained model for predictions.

Overview

RepliCheck aims to automate the detection of duplicate questions on platforms such as Q&A websites, forums, and chatbots. Users enter two questions, and the app reports whether the pre-trained model classifies them as duplicates.

Key Features:

Question Duplication Detection: Uses a machine learning model to predict whether two questions are duplicates.
Streamlit Web Interface: A user-friendly web interface to interact with the model.
Interactive Predictions: Users input questions and immediately see results.

Technologies Used

Streamlit: A Python library to create interactive web applications.
Python: The primary programming language for data processing and model training.
Pickle: Used for saving and loading the trained machine learning model.
scikit-learn: For model training and evaluation.

Installation

To run the RepliCheck application locally, follow these steps:

Prerequisites

Python 3.6+ (preferably 3.7 or above)
Streamlit: pip install streamlit
Scikit-learn: pip install scikit-learn
Pickle (part of the Python standard library, no need to install)
Other dependencies listed in requirements.txt

Steps

Clone the repository:

git clone https://github.com/your-username/replcheck.git
cd replcheck

Install dependencies: Create a virtual environment and activate it (optional but recommended):

python3 -m venv venv
source venv/bin/activate  # On Windows, use venv\Scripts\activate

Then install the dependencies:

pip install -r requirements.txt

Run the app: streamlit run app.py
Open your browser and visit http://localhost:8501 to interact with the app.

Usage

Once the app is running:

Enter Two Questions: Type your two questions in the input boxes on the app interface.
View Results: After entering both questions, the app will predict whether they are duplicates or not based on the trained model.

Project Structure

Here’s the structure of the repository:

├── app.py             # Streamlit web app to interact with the model
├── helper.py          # Helper functions used in app.py (e.g., text preprocessing)
├── model.pkl          # Trained machine learning model (saved with Pickle)
├── requirements.txt   # List of Python dependencies
├── Component 7-2.png  # Logo image for the web app
└── README.md          # This README file

Model Training

The model used for duplicate question detection is based on Text Classification. Here’s an overview of the training process:

Data Preprocessing:
- Text normalization: Lowercasing, removing stopwords, punctuation, etc.
- Tokenization: Splitting questions into individual tokens for processing.
Feature Extraction:
- TF-IDF (Term Frequency-Inverse Document Frequency) vectors were generated from the preprocessed text to convert the text into numerical format.
Model Training:
- The dataset consists of pairs of questions, and the model predicts whether the pair is a duplicate.
- A machine learning classifier (e.g., Logistic Regression, Random Forests, or SVM) is trained using scikit-learn.
Model Evaluation:
- The model was evaluated on a validation set, and performance metrics like accuracy were computed.
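
The preprocessing and feature extraction steps above can be illustrated with a small, dependency-free sketch. This is not the code from helper.py or the training notebook: the stopword list, the function names (preprocess, fit_idf, tfidf_vector, cosine), and the sample questions are all assumptions made for the example. The IDF uses the smoothed form ln((1 + N) / (1 + df)) + 1. The cosine similarity at the end is only one possible pair feature a classifier could consume, shown here to make the vectors tangible.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; the project's list is not shown).
STOPWORDS = {"a", "an", "the", "is", "are", "do", "i", "to", "of", "in", "what", "how"}


def preprocess(question):
    """Lowercase, replace punctuation with spaces, tokenize, drop stopwords."""
    text = re.sub(r"[^a-z0-9\s]", " ", question.lower())
    return [token for token in text.split() if token not in STOPWORDS]


def fit_idf(token_lists):
    """Smoothed inverse document frequency per term: ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector as {term: count * idf}; unseen terms are ignored."""
    return {term: count * idf[term] for term, count in Counter(tokens).items() if term in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical sample questions, used only to exercise the functions above.
questions = [
    "How do I learn Python?",
    "What is the best way to learn Python?",
    "How is the weather in Paris?",
]
tokens = [preprocess(q) for q in questions]
idf = fit_idf(tokens)
vectors = [tfidf_vector(t, idf) for t in tokens]

print(cosine(vectors[0], vectors[1]))  # the pair shares "learn" and "python"
print(cosine(vectors[0], vectors[2]))  # the pair shares no terms
```

In a scikit-learn workflow, TfidfVectorizer would typically replace the hand-written functions, and the resulting features would be passed to the chosen classifier.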

Deployment

This app can be deployed on platforms like Streamlit Cloud, Heroku, or AWS. For local deployment:

The app is built using Streamlit, which allows you to create a web app that runs on your local machine.
The trained model is loaded using Pickle (model.pkl).
Custom CSS is added to style the app with a dark theme, providing a better user experience.
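
As a rough illustration of the Pickle step, the sketch below saves an object to model.pkl and loads it back. The helper names save_model and load_model are hypothetical, and a plain dict stands in for the trained classifier so the example runs without scikit-learn; the actual loading code in app.py may differ.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialize any picklable object (e.g. a fitted estimator) to disk."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    """Load a pickled object. Only unpickle files from sources you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Stand-in for a trained model: a plain dict, used only for this illustration.
toy_model = {"weights": [0.4, -1.2], "bias": 0.1}

# Write to a temporary directory so the example does not touch the real model.pkl.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pkl"
    save_model(toy_model, model_path)
    restored = load_model(model_path)

print(restored == toy_model)
```

When the pickled object is a scikit-learn estimator, it is generally safest to load it with the same scikit-learn version that was used to save it, which is one reason to pin versions in requirements.txt.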

For cloud deployment, follow these steps:

Streamlit Cloud: Upload the repository and deploy directly.
Heroku: Push the repository to a Heroku app for deployment.

Model Improvement Suggestions

To enhance model accuracy from 80% to 85–90%, the following steps can be implemented:

Switch to deep learning architectures (e.g., RNN, LSTM, Transformers)

Use advanced word embeddings (e.g., Word2Vec, GloVe, contextual embeddings)

Perform additional feature engineering

Combine multiple models (hybrid or ensemble approaches)

Apply advanced text preprocessing (e.g., stemming, lemmatization)

Contributing

We welcome contributions to improve the project! You can contribute in the following ways:

Fork the repository and submit a pull request with bug fixes or new features.
Report issues by creating an issue on GitHub.
Suggest improvements for better performance or usability.

Contact:

Creator: Samardeep Singh
Email: samareduforcollege@gmail.com
