A Python framework for testing Small Language Models (SLMs) locally on automated test case and test result summarization. It enables evaluating and comparing language models on their ability to generate concise, accurate summaries of software testing outcomes.
- Multi-Backend Support: Ollama (local models) and Hugging Face (cloud models)
- Comprehensive Evaluation: ROUGE, BLEU, and custom metrics for summary quality
- Batch Processing: Efficient processing of multiple test cases
- Model Comparison: Side-by-side evaluation of different SLMs
- Extensible Architecture: Easy to add new model providers and evaluation metrics
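The extension point can be sketched as follows. This is a minimal illustration only: the class and field names are hypothetical stand-ins for whatever `src/models/base.py` actually defines, and the toy provider is not part of the project.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

# Hypothetical shapes of the classes in src/models/base.py (illustrative only)
@dataclass
class SummarizationRequest:
    test_content: str
    test_type: str = "UNIT"
    max_length: int = 50

class BaseSummarizationModel(ABC):
    @abstractmethod
    def summarize(self, request: SummarizationRequest) -> str:
        """Return a summary string for the given test content."""

# A new provider only needs to implement summarize()
class FirstLineModel(BaseSummarizationModel):
    """Toy provider: summarizes by returning the first line, truncated."""
    def summarize(self, request: SummarizationRequest) -> str:
        first = request.test_content.strip().splitlines()[0]
        return first[:request.max_length]

model = FirstLineModel()
print(model.summarize(SummarizationRequest("Test passed in 0.25s")))
```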
- BART-Large-CNN: Facebook's state-of-the-art summarization model ✅ Working
- T5-Small/Base/Large: Google's text-to-text transformer models
- Any Hugging Face seq2seq model
- Llama 3.2: Meta's latest language model (1B, 3B, and larger multimodal variants)
- Phi 3.5: Microsoft's efficient small language model
- Gemma: Google's lightweight model family
- Any model available through Ollama
- Python 3.8+
- 4GB+ RAM (8GB+ recommended for larger models)
- Git
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/slm-test-summarization.git
  cd slm-test-summarization
  ```

- Create a virtual environment:

  ```bash
  python -m venv .venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Run the example:

  ```bash
  python examples/basic_summarization.py
  ```
- Install Ollama from [ollama.ai](https://ollama.ai)
- Pull a model:

  ```bash
  ollama pull llama3.2:3b
  ```

- Start the Ollama service:

  ```bash
  ollama serve
  ```
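Before running the Ollama-backed examples, you can check that the service is reachable. The snippet below assumes Ollama's default port (11434) and its `/api/tags` endpoint, which lists locally pulled models:

```python
import urllib.request

def ollama_available(host: str = "http://localhost:11434") -> bool:
    """Return True if an Ollama server answers on the given host."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=2) as resp:
            return resp.status == 200
    except OSError:
        return False

print(ollama_available())  # True once `ollama serve` is running
```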
```python
from src.models.huggingface_model import HuggingFaceModel
from src.models.base import SummarizationRequest
from src.evaluation import SummarizationEvaluator

# Initialize model
model = HuggingFaceModel('facebook/bart-large-cnn')

# Create test case
request = SummarizationRequest(
    test_content="""
    Unit Test Results - User Authentication
    Test: test_valid_login()
    Status: PASSED
    User provided valid credentials (user@example.com, correct_password)
    Expected: Login successful, session token generated
    Actual: Login successful, session token: abc123xyz
    Assertions: 3/3 passed
    Duration: 0.25s
    """,
    test_type="UNIT",
    max_length=50,
    style="concise"
)

# Generate summary
response = model.summarize(request)
print(f"Summary: {response.summary}")
print(f"Processing time: {response.processing_time:.2f}s")

# Evaluate quality
evaluator = SummarizationEvaluator()
reference = "User authentication test passed with valid credentials"
evaluation = evaluator.evaluate_single(reference, response.summary, "BART")
print(f"ROUGE-1 Score: {evaluation['rouge']['rouge1_fmeasure']:.3f}")
```

```python
from src.models.ollama_model import OllamaModel
from src.models.base import SummarizationRequest

# Initialize Ollama model
model = OllamaModel('llama3.2:3b')

# Process multiple test cases
test_cases = [
    SummarizationRequest(test_content="...", test_type="UNIT"),
    SummarizationRequest(test_content="...", test_type="E2E"),
    SummarizationRequest(test_content="...", test_type="INTEGRATION")
]

responses = model.summarize_batch(test_cases)
for i, response in enumerate(responses):
    print(f"Test {i+1}: {response.summary}")
```

```python
from src.evaluation.comparison import ModelComparator
from src.evaluation.reports import ReportGenerator
from src.models.huggingface_model import HuggingFaceModel
from src.models.ollama_model import OllamaModel

# Compare multiple models
models = {
    'BART': HuggingFaceModel('facebook/bart-large-cnn'),
    'Llama3.2': OllamaModel('llama3.2:3b')
}

comparator = ModelComparator()
results = comparator.compare_models(models, test_cases, references)

# Generate report
generator = ReportGenerator()
report = generator.generate_comparison_report(results)
print(report)
```

- ROUGE-1/2/L: Content overlap and recall
- BLEU: Translation quality adapted for summarization
- Length Ratio: Summary conciseness (target vs actual length)
- Keyword Coverage: Important terms preservation
- Readability Score: Text complexity analysis
- Completeness: Information retention assessment
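As an illustration of the custom metrics, two of them could be implemented roughly as below. This is a minimal sketch under assumed definitions (word-count ratio, simple keyword overlap), not the actual code in `src/evaluation/metrics.py`:

```python
def length_ratio(summary: str, target_words: int) -> float:
    """Actual word count relative to the target length (1.0 = on target)."""
    return len(summary.split()) / target_words

def keyword_coverage(reference: str, summary: str) -> float:
    """Fraction of reference keywords (words longer than 3 chars) kept in the summary."""
    keywords = {w.lower() for w in reference.split() if len(w) > 3}
    if not keywords:
        return 1.0
    summary_words = {w.lower() for w in summary.split()}
    return len(keywords & summary_words) / len(keywords)

print(length_ratio("Authentication test passed", 50))  # 0.06
print(keyword_coverage("User authentication test passed with valid credentials",
                       "Authentication test passed"))
```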
```
slm-test-summarization/
├── src/
│   ├── models/               # SLM integrations
│   │   ├── base.py           # Abstract base classes
│   │   ├── huggingface_model.py
│   │   └── ollama_model.py
│   ├── evaluation/           # Metrics and comparison
│   │   ├── metrics.py        # ROUGE, BLEU, custom metrics
│   │   ├── comparison.py     # Model comparison tools
│   │   └── reports.py        # Report generation
│   ├── data/                 # Test data processing
│   └── utils/                # Configuration and helpers
├── examples/                 # Usage examples
├── requirements.txt          # Python dependencies
└── README.md                 # This file
```
| Model | Avg Time/Test | ROUGE-1 Score | Memory Usage |
|---|---|---|---|
| BART-Large-CNN | 4.2s | 0.354 | 2.1GB |
| Llama 3.2:3B | 2.8s* | 0.331* | 3.2GB* |
| T5-Base | 3.1s* | 0.298* | 1.8GB* |
\* Estimated performance based on model specifications
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- CI/CD Integration: Automatic test result summarization in build pipelines
- QA Reporting: Generate executive summaries of test suite outcomes
- Model Research: Compare different SLMs for domain-specific summarization
- Test Analysis: Quickly understand large test suite results
- Documentation: Auto-generate test case descriptions
- Hugging Face for transformer models and libraries
- Ollama for local model serving
- ROUGE for evaluation metrics
- Open source community for inspiration and contributions
⭐ Star this repository if it helps your testing workflow!