| title | LayerLens - Adaptive Low-Rank Adaptation Selector |
|---|
LayerLens is a Cython-accelerated selection engine that automatically determines the optimal PEFT (Parameter-Efficient Fine-Tuning) method and rank for each layer in large language and vision models.
Fine-tuning large language and vision models is expensive. Techniques like LoRA, adapters, and prefix tuning reduce the cost, but deciding which layers to adapt and what rank to use is still mostly done by hand. Recent work such as AdaLoRA, AutoLoRA, and HyperLoRA (2023-2024) automates part of this, but each approach either commits to a single method or relies on heavy AutoML that fits poorly into production pipelines.
LayerLens profiles a pretrained model, estimates layer sensitivities, and solves a constrained optimization problem to pick the best PEFT method (LoRA, adapter, prefix, or none) and rank for each layer. The output is a configuration that feeds directly into fine-tuning jobs, making iterations faster and decisions reproducible.
The system works in three steps:
1. Layer Sensitivity Profiling
   - Compute gradient energy, Fisher information, or NTK proxies using a calibration set.
   - Run short proxy fine-tune steps on selected layers to validate theoretical scores.
   - Heavy linear algebra runs in Cython using memory views and fused types for cache-friendly access.
2. Multi-Objective Optimization
   - For each layer, choose a method (LoRA, adapter, prefix, or none) and a rank/size.
   - Respect constraints: total trainable parameters, FLOPs, VRAM, and latency budgets.
   - Goal: maximize predicted utility under constraints. A fast greedy-Lagrangian solver (with optional metaheuristic refinement) runs entirely in Cython for deterministic performance.
3. Configuration Output
   - Generate a JSON/YAML manifest describing per-layer PEFT strategy and rank.
   - An optional hypernetwork plug-in can generate parameters, but the core workflow works without it.
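The optimization step can be illustrated with a small Lagrangian-relaxation sketch in plain Python. This is not the LayerLens solver, which runs in Cython and is not shown in this README. The `Option` class, the utility scores, and the parameter costs are hypothetical, and only the trainable-parameter budget is modeled (FLOPs, VRAM, and latency are left out).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    method: str     # "lora", "adapter", "prefix", or "none"
    rank: int       # rank/size; 0 for "none"
    utility: float  # predicted utility (hypothetical score)
    cost: int       # trainable parameters this option adds


def pick(options, lam):
    """Best option for one layer at multiplier lam: maximize utility - lam * cost."""
    return max(options, key=lambda o: o.utility - lam * o.cost)


def total_cost(layers, lam):
    return sum(pick(opts, lam).cost for opts in layers.values())


def select(layers, budget, iters=50):
    """Bisect on the multiplier until the per-layer choices fit the budget."""
    if sum(min(o.cost for o in opts) for opts in layers.values()) > budget:
        raise ValueError("budget is below the cheapest possible configuration")
    if total_cost(layers, 0.0) <= budget:
        return {name: pick(opts, 0.0) for name, opts in layers.items()}
    lo, hi = 0.0, 1.0
    while total_cost(layers, hi) > budget:  # grow hi until it is feasible
        hi *= 2.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if total_cost(layers, mid) > budget:
            lo = mid   # still over budget: penalize cost more
        else:
            hi = mid   # feasible: try a smaller penalty
    return {name: pick(opts, hi) for name, opts in layers.items()}


# Hypothetical candidates for two layers; costs assume LoRA on one 768 x 768 matrix.
layers = {
    "encoder.layer.0": [
        Option("none", 0, 0.0, 0),
        Option("lora", 4, 0.30, 6144),
        Option("lora", 8, 0.40, 12288),
    ],
    "encoder.layer.11": [
        Option("none", 0, 0.0, 0),
        Option("lora", 4, 0.60, 6144),
        Option("lora", 8, 0.90, 12288),
    ],
}

chosen = select(layers, budget=20_000)
for name, opt in chosen.items():
    print(name, opt.method, opt.rank, opt.cost)
```

With a budget of 20,000 parameters, this sketch assigns rank 4 to the less sensitive layer and rank 8 to the more sensitive one, for 18,432 trainable parameters in total. A single multiplier prices the shared budget, so each layer can be decided independently, which is what keeps this family of solvers fast.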
┌────────────────────┐
│  Model Checkpoint  │
└──────┬─────────────┘
       │
┌──────▼─────────────┐      ┌─────────────────────┐
│  Cython Profiler   │─────▶│  Sensitivity Cache  │
└──────┬─────────────┘      └─────────────────────┘
       │
┌──────▼─────────────┐
│  Cython Optimizer  │─────▶ Optimization logs / metrics
└──────┬─────────────┘
       │
┌──────▼─────────────┐
│  Config Generator  │─────▶ JSON/YAML manifest
└──────┬─────────────┘
       │
┌──────▼─────────────┐
│  Fine-Tune Runner  │ (LoRA/adapter/prefix training)
└────────────────────┘
| Work | Key Idea | What LayerLens Addresses |
|---|---|---|
| LoRA (Hu et al., 2021) | Low-rank updates for attention projections | Manual rank and layer selection |
| AdaLoRA (Zhang et al., 2023) | Adaptive rank re-allocation during training | Single technique, GPU-focused heuristics |
| AutoLoRA / PEFT-NAS (2024) | Reinforcement learning search for LoRA placement | High search cost, limited MLOps integration |
| HyperLoRA (2024) | Hypernetworks generate LoRA parameters | Doesn't jointly reason about method and rank |
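For context on why rank selection matters: LoRA keeps a pretrained weight matrix frozen and learns a low-rank update `B @ A`, where `B` has shape `d_out x r` and `A` has shape `r x d_in`, so the trainable parameter count grows linearly with the rank `r`. The helper name below is illustrative, and the 768 x 768 shape is chosen to match a BERT-base attention projection.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the two LoRA factors: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


full = 768 * 768  # parameters in the frozen projection matrix
for r in (4, 8, 16):
    n = lora_trainable_params(768, 768, r)
    print(f"rank {r:>2}: {n:>6} trainable params ({100 * n / full:.2f}% of the full matrix)")
```

At rank 8 this adds 12,288 trainable parameters, about 2% of the 589,824 in the full matrix, which is why a per-layer rank choice translates directly into a parameter budget.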
Cython 3.0 brings better pure-Python typing support and deterministic builds for Python 3.12, making it practical for high-performance kernels deployed as wheels in containers.
LayerLens includes an end-to-end pipeline that orchestrates YOLO object detection and LLM textual analysis, with detailed latency measurement at each step. This is particularly useful for scenarios like X-ray analysis where object detection results are analyzed by an LLM.
Image → YOLO Detection → Bounding Boxes → LLM Analysis → Textual Report
  ↓           ↓                ↓               ↓               ↓
Load      Inference      Format Prompt     Generate      Save Results
(ms)        (ms)             (ms)            (ms)           (JSON)
# Install pipeline dependencies
pip install -e .[pipeline]
# Run pipeline demo
python demos/llm_yolo_pipeline.py

The pipeline measures and reports:
- Image loading time
- YOLO detection latency (per detection)
- LLM analysis latency (per token generation)
- Communication overhead between models
- Total end-to-end latency
Results are saved as JSON with complete timing breakdown, enabling analysis of bottlenecks and optimization opportunities.
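A per-stage timing breakdown like the one described above can be collected with a small context manager. This is a sketch using only the standard library; the stage names, the stand-in workloads, and the JSON layout are assumptions, not the format written by `demos/llm_yolo_pipeline.py`.

```python
import json
import time
from contextlib import contextmanager

timings_ms = {}


@contextmanager
def timed(stage):
    """Record the wall-clock duration of the enclosed block in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings_ms[stage] = (time.perf_counter() - start) * 1000.0


with timed("image_load"):
    image = bytes(1024)                    # stand-in for reading an image file
with timed("yolo_detection"):
    boxes = [(10, 20, 110, 220)]           # stand-in for detector output
with timed("llm_analysis"):
    report = f"{len(boxes)} object(s) detected"  # stand-in for LLM output

timings_ms["total"] = sum(timings_ms.values())
print(json.dumps({"report": report, "timings_ms": timings_ms}, indent=2))
```

Recording each stage under its own key makes it easy to see whether detection, prompt formatting, or generation dominates the end-to-end latency.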
The system integrates into MLOps pipelines as follows:
- Artifact Packaging: Cython modules are compiled into wheels in CI, signed, and published to an internal package registry. Docker images pull the wheel at runtime.
- Pipeline Hook: In orchestration platforms (Kubeflow, Airflow, Azure ML, etc.), a dedicated stage triggers the profiler/optimizer and writes the manifest to object storage.
- Fine-Tune Stage: Training scripts read the manifest, instantiate the requested PEFT modules, and log metrics back to the tracking server.
- Observation: Telemetry (Prometheus/OpenTelemetry) captures runtime statistics for auditing and future meta-learning.
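As an illustration of the fine-tune stage, a training script might turn the manifest into a per-method plan before instantiating any PEFT modules. The manifest schema is not specified in this README, so the field names (`layers`, `name`, `method`, `rank`) and the helper `plan_from_manifest` below are hypothetical.

```python
import json

# Hypothetical manifest content; the real LayerLens schema may differ.
manifest_text = """
{
  "model_name": "bert-base-uncased",
  "layers": [
    {"name": "encoder.layer.0", "method": "none", "rank": 0},
    {"name": "encoder.layer.10", "method": "lora", "rank": 4},
    {"name": "encoder.layer.11", "method": "adapter", "rank": 16}
  ]
}
"""


def plan_from_manifest(text):
    """Group layers by PEFT method, skipping layers marked 'none'."""
    manifest = json.loads(text)
    plan = {}
    for layer in manifest["layers"]:
        if layer["method"] == "none":
            continue  # layer stays frozen
        plan.setdefault(layer["method"], []).append((layer["name"], layer["rank"]))
    return plan


plan = plan_from_manifest(manifest_text)
print(plan)
```

Grouping by method lets the training script build all LoRA modules, then all adapters, and so on, while the manifest itself remains the single reproducible record of the decision.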
The benchmark script is at benchmarks/profile_batch_benchmark.py. The target was at least a 10x speedup over pure-Python loops for gradient/Fisher batch scoring.
| Layers x Hidden | Gradient Speedup | Fisher Speedup |
|---|---|---|
| 256 x 1024 | 95.8x | 9.8x |
| 512 x 2048 | 124.3x | 13.5x |
| 1024 x 4096 | 119.9x | 14.8x |
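The gradient kernel exceeded the target by roughly an order of magnitude in every configuration. The Fisher kernel met the target for the two larger configurations and fell just short at 256 x 1024 (9.8x); additional optimization is planned for smaller configurations.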
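For reference, the kind of pure-Python loop that serves as a baseline in such a benchmark might look like the following. The definitions are assumptions, since this README does not spell them out: gradient energy is taken as the squared L2 norm of a layer's gradient, and the Fisher score as the trace of the empirical diagonal Fisher, that is, the mean over calibration samples of the per-sample squared gradient norm.

```python
def gradient_energy(grad):
    """Squared L2 norm of one layer's flattened gradient."""
    return sum(g * g for g in grad)


def fisher_trace(per_sample_grads):
    """Mean over samples of the squared gradient norm (empirical diagonal Fisher trace)."""
    return sum(gradient_energy(g) for g in per_sample_grads) / len(per_sample_grads)


# Two calibration samples, each with a two-element gradient for one layer.
per_sample = [[1.0, 2.0], [3.0, 0.0]]

# Batch gradient: element-wise mean across samples.
mean_grad = [sum(col) / len(per_sample) for col in zip(*per_sample)]

energy = gradient_energy(mean_grad)
fisher = fisher_trace(per_sample)
print(energy, fisher)
```

The Fisher score needs one gradient per sample rather than one per batch, so it touches far more data than the gradient score. That may be one reason the two kernels speed up by different factors.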
| Aspect | Description |
|---|---|
| Models | BERT-base, LLaMA-2-7B, ViT variants |
| Tasks | GLUE/SuperGLUE subsets, SQuAD, ImageNet-1k fine-grained |
| Baselines | Fixed LoRA (uniform rank), AdaLoRA, AutoLoRA, adapter-only |
| Metrics | Fine-tune wall time, GPU memory usage, task accuracy/F1, inference latency |
| Ablations | Sensitivity metric variants, optimizer heuristics, with/without hypernetwork |
# Clone the repository
git clone https://github.com/ErenAta16/LayerLens.git
cd LayerLens
# Install in editable mode with all dependencies
pip install -e ".[demo,yolo,pipeline]"
# Or install minimal dependencies
pip install -e .

from pathlib import Path

from layerlens.pipeline import run_pipeline
from layerlens.config import ProfilingConfig, OptimizationConfig, LatencyProfile
from layerlens.models import ModelSpec, LayerSpec

# Define your model
model_spec = ModelSpec(
    model_name="bert-base-uncased",
    total_params=110_000_000,
    layers=[
        LayerSpec(name=f"encoder.layer.{i}", hidden_size=768, layer_type="transformer")
        for i in range(12)
    ]
)

# Configure profiling
profiling_cfg = ProfilingConfig(
    metric_weights={"gradient_energy": 0.4, "fisher": 0.4, "proxy_eval": 0.2}
)

# Configure optimization with latency profile
latency_profile = LatencyProfile(
    device_type="gpu",
    model_family="llm",
    batch_size=4,
    sequence_length=512
)
optimization_cfg = OptimizationConfig(
    max_trainable_params=50_000,
    max_flops=1e9,
    max_vram_gb=8.0,
    latency_target_ms=100.0,
    latency_profile=latency_profile
)

# Prepare activation cache (from your model profiling)
activation_cache = {
    f"encoder.layer.{i}": {
        "grad_norm": 0.5 + i * 0.1,
        "fisher_trace": 0.3 + i * 0.05
    }
    for i in range(12)
}

# Run pipeline
output_dir = Path("./output")
manifest_path = run_pipeline(
    model_spec=model_spec,
    profiling_cfg=profiling_cfg,
    optimization_cfg=optimization_cfg,
    activation_cache=activation_cache,
    output_dir=output_dir
)
print(f"Manifest saved to: {manifest_path}")

# BERT demo
python demos/demo_bert.py
# YOLO demo
python demos/demo_yolo.py
# LLM-YOLO pipeline
python demos/llm_yolo_pipeline.py

# Run all tests
pytest tests/ -v
# Run with coverage
pytest tests/ --cov=layerlens --cov-report=html

LayerLens defines a specific exception type for each pipeline stage, along with helpers for validating inputs up front:
from layerlens.exceptions import (
    ConfigurationError,
    ModelSpecError,
    ActivationCacheError,
    ProfilingError,
    OptimizationError,
    ManifestError,
)
from layerlens.utils.validation import (
    validate_model_spec,
    validate_activation_cache,
    validate_config,
)

# Validate inputs before running pipeline
# (model_spec, activation_cache, and the configs come from the example above)
try:
    validate_model_spec(model_spec)
    validate_activation_cache(activation_cache, model_spec)
    validate_config(profiling_cfg, optimization_cfg)
    manifest_path = run_pipeline(...)
except (ModelSpecError, ActivationCacheError, ConfigurationError) as e:
    print(f"Validation failed: {e}")
    # Fix inputs and retry

See docs/ERROR_HANDLING.md for a detailed error-handling guide.
To run on Google Colab with an A100 GPU, use the following setup cells:
!git clone https://github.com/ErenAta16/LayerLens.git
%cd LayerLens
!pip install -e ".[demo,yolo,pipeline]" -q

import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'CPU'}")

# Copy content from demos/demo_bert.py
# Or import and run directly
from demos.demo_bert import main
main()

- ✅ Implement profiling kernels in Cython (gradient/Fisher + proxy training).
- ✅ Develop the constrained optimization solver and manifest generator.
- ✅ Build CLI/REST interface and sample pipeline integrations.
- ✅ Execute benchmark suite and publish results with reproducible scripts.
- 🔄 Prepare academic report and public technical documentation.
- A reproducible methodology for joint method-and-rank selection across PEFT families.
- An open-source Cython engine that integrates into diverse MLOps environments.
- Empirical evidence on large language and vision models showing improved cost-performance trade-offs over existing adaptive approaches.