Ilya Shaposhnikov IlyaShaposhnikov

👋 Hi, I'm Ilya Shaposhnikov

Python developer building backend services that integrate machine learning (ML) and natural language processing (NLP). I treat development as a way to solve business problems, and I value clean, maintainable architecture and measurable impact.

I focus on building robust, production-ready end-to-end solutions — from designing architecture and business logic to containerization, CI/CD setup, and deployment. Below are projects I architected and implemented: from high-load APIs and modern AI/ML solutions to comprehensive web services.

Previous Experience

For over 12 years, I optimized complex processes at an international corporation, where I initiated and implemented IT solutions (from reporting automation to enterprise CAT system deployment). This experience shaped my systems thinking, process analysis skills, and ability to drive projects to production (tangible results: 90% reduction in manual effort, 70% increase in operational speed).

🛠 Technology Stack

Backend & Frameworks:

Data Science & Machine Learning:

Databases:

DevOps & Tools:

Testing:

Frontend & Other:

🗣️ [Natural] Languages

🇷🇺 Russian: Native
🇬🇧 English: C2 (Proficient)
🇪🇸 Spanish: C1 (Advanced)
🇫🇷 French: B2 (Upper-Intermediate)
🇩🇪 German: B1 (Intermediate)

🎯 Open to Opportunities

I'm looking for a Python Developer role with a focus on ML/NLP projects in a product-driven team where I can bridge technical execution and business strategy—applying my tech stack and unique cross-domain experience to build efficient code that solves real-world problems.

📍 Location: Saint Petersburg, Russia (open to on-site, remote, and hybrid work)

📧 Contact: ilia.a.shaposhnikov@gmail.com

📱 Telegram: @iliashaposhnikov

💼 LinkedIn: @iliashaposhnikov

🚀 Key Projects

Category	Project	Key Technologies	Core Concept & Challenges
Production Backend	💰 Wallet REST API	FastAPI, PostgreSQL, Docker, async	Asynchronous API for financial operations with guaranteed data consistency under concurrent requests (transactions, row-level locks).
AI & Intelligent Systems	🤖 Video Analytics Bot	Aiogram, Ollama (LLM), PostgreSQL, asyncpg	NLP-powered bot: transforms natural language queries into SQL analytics using a local LLM (Mistral 7B).
Machine Learning (ML) & NLP	🎬 Movie Recommendation System	scikit-learn, pandas, TF-IDF, Aiogram	End-to-end recommendation engine based on textual features (genres, cast) with dual interfaces: Telegram bot and console app with visualization.
Machine Learning (ML) & NLP	🔬 Embedding Visualizer	Gensim, scikit-learn, Matplotlib	Interactive toolkit for deep semantic analysis of word embeddings with vector-arrow analogy visualization, semantic cluster projection (PCA/t-SNE), and evaluation on Google Analogy Test Set through an intuitive CLI. Built with a robust, modular architecture.
Machine Learning (ML) & NLP	🚫 SMS Spam Detector	scikit-learn, pandas, CLI	Modular pipeline for binary SMS classification using Naive Bayes/Logistic Regression, featuring CLI interface, structured logging, artifact persistence, and interpretability via confusion matrices and word clouds.
Data Research & Analysis	🔬 CountVectorizer Comparison	scikit-learn, NLTK, Matplotlib, Seaborn	Comprehensive comparative analysis of 5 text preprocessing methods (basic, stop-word removal, lemmatization, stemming, simple tokenization) for news classification using CountVectorizer. Includes evaluation by accuracy, speed, and vocabulary size. Built with a modular architecture for maintainable and readable code.
Data Research & Analysis	🔑 Text Keyword Extractor	scikit-learn, pandas, NLTK	In-depth TF-IDF analysis: from-scratch algorithm implementation with detailed comparison (formulas, weights, ranking) against scikit-learn's version for keyword extraction.
Data Research & Analysis	🔮 Pumpkin Price & Color Forecast	scikit-learn, pandas, matplotlib, EDA	Dual predictive models for agricultural economics: regression for price forecasting and classification for color prediction with interpretable outputs and production-ready structure.
Web Services & API	🔗 URL Shortener Service	Flask, SQLAlchemy, REST API, Alembic	Web service with REST API for generating short URLs. Features validation, custom identifier support, and history tracking in a database.

📚 Full Project List

💰 Wallet REST API [FastAPI, PostgreSQL]

High-load asynchronous REST API for managing financial balances. The service ensures data consistency during concurrent deposit/withdrawal operations, implementing an e-wallet pattern.

✨ Key Features:

Concurrency Safety: Guarantees data integrity through READ COMMITTED transactions and SELECT ... FOR UPDATE row-level locks.
Production-Grade Stack: Full cycle from asynchronous backend to containerization.
Deployment Ready: Fully configured for Docker with orchestration (docker-compose).
Comprehensive Documentation: Auto-generated interactive OpenAPI (Swagger) documentation at /docs.
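
A minimal sketch of the row-locking pattern described above, not the repository's actual code. It assumes a connection object that behaves like an asyncpg connection (`transaction()`, `fetchrow()`, `execute()`); the `wallets` table, its columns, and the `withdraw` helper are hypothetical names chosen for illustration.

```python
from decimal import Decimal


class InsufficientFunds(Exception):
    """Raised when a withdrawal would push the balance below zero."""


async def withdraw(conn, wallet_id: str, amount: Decimal) -> Decimal:
    """Debit a wallet inside one transaction while holding a row lock.

    `conn` is assumed to expose an asyncpg-style API; table and column
    names are hypothetical.
    """
    async with conn.transaction():
        # FOR UPDATE makes concurrent transactions that want this row wait
        # until the current transaction commits or rolls back.
        row = await conn.fetchrow(
            "SELECT balance FROM wallets WHERE id = $1 FOR UPDATE", wallet_id
        )
        if row is None:
            raise LookupError(f"wallet {wallet_id} not found")
        new_balance = row["balance"] - amount
        if new_balance < 0:
            raise InsufficientFunds(wallet_id)
        await conn.execute(
            "UPDATE wallets SET balance = $1 WHERE id = $2", new_balance, wallet_id
        )
        return new_balance
```

Because the balance is read and written under the same lock, two concurrent withdrawals cannot both pass the balance check against a stale value, even at the READ COMMITTED isolation level.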

🛠 Tech Stack:

📂 Project Repository

🤖 Video Analytics Bot [AI, LLM, PostgreSQL]

Intelligent Telegram bot that converts natural language questions into analytical SQL queries for a video statistics database. A local LLM (Mistral 7B served by Ollama), guided by an engineered system prompt, generates the SQL.

✨ Key Features:

NLP Interface: Users ask questions in natural language ("How many videos have >100K views?"), the bot returns a precise numerical answer.
Local LLM: Mistral 7B running locally via Ollama keeps all data on the machine, works offline, and has no rate limits or usage fees.
Prompt Engineering: Detailed system prompt with database schema description, strict rules, and examples for stable SQL query generation.
Production Architecture: Asynchronous bot on Aiogram 3.7+, optimized PostgreSQL with indexes, connection pooling via asyncpg.
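
The prompt-plus-guardrail idea can be sketched as follows. This is an illustration under stated assumptions, not the bot's real prompt or validation logic: the `videos` schema, the rule wording, and both helper functions are hypothetical, and the call to the LLM itself is left out.

```python
import re

# Hypothetical schema text; the real bot's schema and prompt are not shown here.
SCHEMA = "videos(id BIGINT, title TEXT, views BIGINT, published_at TIMESTAMP)"

FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant)\b", re.IGNORECASE
)


def build_prompt(question: str, schema: str = SCHEMA) -> str:
    """Assemble a prompt with the schema, strict rules, and one worked example."""
    return (
        "You translate questions into a single PostgreSQL SELECT statement.\n"
        f"Schema: {schema}\n"
        "Rules: return only SQL, no comments, no explanations.\n"
        "Example: How many videos are there? -> SELECT COUNT(*) FROM videos;\n"
        f"Question: {question}\nSQL:"
    )


def is_safe_select(sql: str) -> bool:
    """Accept only a single read-only statement before it reaches the database."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty output or several statements chained together
    if not re.match(r"(?i)(select|with)\b", statement):
        return False
    return FORBIDDEN.search(statement) is None
```

The check is deliberately conservative: it would also reject a harmless query whose string literal contains a word such as "update". Running the bot under a read-only database role is a stronger safeguard than keyword filtering alone.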

🛠 Tech Stack:

📂 Project Repository

🎬 Movie Recommendation System [NLP, TF-IDF]

Film recommendation engine with dual UI: Telegram bot and console application. The system analyzes descriptions and cast using NLP and ML techniques.

✨ Key Features:

Two Algorithms: Recommendations based on genres/keywords and weighted cast analysis.
Two Interfaces: Convenient Telegram bot and a visual console interface with charts.
End-to-End Pipeline: From data preprocessing (TF-IDF) to interactive interfaces (Telegram bot and console application).
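
To show the content-based idea in miniature, here is a hedged sketch that ranks films by cosine similarity over their textual features. It swaps TF-IDF weights for raw token counts to stay dependency-free, and the catalogue, feature strings, and function names are invented for illustration.

```python
import math
from collections import Counter

# Toy catalogue with made-up feature strings (genres plus cast); purely illustrative.
MOVIES = {
    "Space Quest": "sci-fi adventure alice_smith bob_jones",
    "Star Drift": "sci-fi drama alice_smith",
    "Love in Paris": "romance comedy carol_white",
}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(title: str, top_n: int = 2) -> list[str]:
    """Return the titles most similar to `title`, best match first."""
    vectors = {name: Counter(text.split()) for name, text in MOVIES.items()}
    query = vectors[title]
    scored = [
        (cosine(query, vec), name) for name, vec in vectors.items() if name != title
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_n]]
```

With this toy data, `recommend("Space Quest")` ranks "Star Drift" first because the two films share a genre and a cast member. A TF-IDF weighting would additionally down-weight tokens that appear in most films.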

🛠 Tech Stack:

📂 Project Repository

🔬 Embedding Visualizer [ML, NLP, Visualization]

Interactive research toolkit for deep semantic analysis of word embeddings with vector-arrow visualizations of semantic relationships. Built with a robust, modular architecture for enhanced stability and maintainability. Enables intuitive exploration of semantic structure in Word2Vec and GloVe models through analogy visualization with directional arrows, semantic cluster projection, and quality evaluation.

✨ Key Features:

Semantic Cluster Projection: Automatic 2D mapping of seed words and their nearest neighbors with color-coded clusters using PCA (global structure) and t-SNE (local neighborhoods).
Vector-Arrow Analogy Visualization: Unique 2D plots showing semantic relationships as directional arrows (w2 → w1 and result → w3), visually demonstrating parallelism in vector arithmetic (king - man + woman ≈ queen).
Smart Model Management: Automatic download with integrity checks, mirror fallback, and binary caching for instant subsequent loads.
Lazy Model Loading: Models load on demand via Model Manager, improving startup speed and memory efficiency.
Robust Modular Architecture: Separated concerns across services, presentation, visualization, and data layers for clean, testable code. Centralized configuration and enhanced logging.
Quality Evaluation: Testing on Google Analogy Test Set (19,544 questions) with accuracy breakdown by semantic/syntactic categories and vocabulary coverage analysis.
Zero-Code Exploration: Intuitive command-line interface with contextual help, demo mode, and persistent session — no programming required for deep semantic analysis.
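
The vector arithmetic behind the analogy plots can be sketched with hand-made 2-D vectors. The numbers below are invented so that the example is easy to follow; real Word2Vec or GloVe vectors have far more dimensions, and the `analogy` helper is a hypothetical stand-in for a library call.

```python
import math

# Hand-made 2-D vectors (axis 0: "royalty", axis 1: "maleness"); purely illustrative.
VECTORS = {
    "king": (0.9, 0.9),
    "queen": (0.9, 0.1),
    "man": (0.1, 0.9),
    "woman": (0.1, 0.1),
    "apple": (0.5, 0.5),
}


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def analogy(w1, w2, w3):
    """Find the word closest to w1 - w2 + w3, skipping the three input words."""
    target = tuple(
        a - b + c for a, b, c in zip(VECTORS[w1], VECTORS[w2], VECTORS[w3])
    )
    candidates = (w for w in VECTORS if w not in {w1, w2, w3})
    return max(candidates, key=lambda w: cosine(target, VECTORS[w]))
```

Calling `analogy("king", "man", "woman")` returns `"queen"` for these toy vectors. The result is the nearest neighbour of the computed point, which is why the relationship is approximate rather than an exact equality in a trained model.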

🛠 Tech Stack:

📂 Project Repository

🚫 SMS Spam Detector [ML, NLP, CLI]

Modular pipeline for binary SMS classification using scikit-learn vectorizers and probabilistic models, designed with production-ready patterns and an intuitive command-line interface.

✨ Key Features:

Flexible Model Selection: Support for Naive Bayes (fast baseline) and Logistic Regression (higher accuracy) via --model CLI argument.
Adaptive Vectorization: Switch between CountVectorizer and TfidfVectorizer with configurable n-grams, max features, and stop-word handling.
Comprehensive Evaluation: Automated calculation of accuracy, F1, precision, recall, and ROC-AUC with JSON export for experiment tracking.
Interpretability Tools: Confusion matrix visualization via sklearn's ConfusionMatrixDisplay, word clouds for spam/ham analysis, and misclassification inspection with probability scores.
CLI-Driven Workflow: Full pipeline execution via python scripts/train.py with argparse validation, reproducibility via --random-state, and optional plot generation.
Production Patterns: Structured logging to console + file, graceful error handling with exit codes, and artifact persistence (models, metrics, plots) with timestamps.
Modular Architecture: Clean separation of concerns across data, features, models, evaluation, and visualization modules for maintainable, testable code.
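
As a small illustration of the evaluation step, the sketch below derives accuracy, precision, recall, and F1 from paired labels and serialises them to JSON. It is a dependency-free approximation of what scikit-learn's metric functions report; the function names and the label values are assumptions.

```python
import json


def binary_metrics(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("at least one labelled example is required")
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def metrics_to_json(metrics):
    """Serialise a metrics dict so separate runs can be compared later."""
    return json.dumps(metrics, indent=2, sort_keys=True)
```

Spam data is usually imbalanced, so precision and recall say more than accuracy alone: a classifier that labels everything as ham can still score high accuracy while catching no spam.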

🛠 Tech Stack:

📂 Project Repository

🔬 CountVectorizer Comparison Project [NLP]

Research project comparing the effectiveness of 5 text preprocessing methods (basic, stop-word removal, lemmatization, stemming, simple tokenization) when vectorizing with CountVectorizer on the BBC News dataset. Built with a modular architecture for enhanced readability and maintainability.

✨ Key Features:

Comparative Analysis: Direct comparison of 5 text processing approaches by accuracy, vocabulary size, execution time, and matrix density.
Deep NLP Focus: Implementation and evaluation of linguistic methods (lemmatization, stemming).
Clear Conclusions: Identifies the preprocessing method that offers the best balance of accuracy and speed.
Full Visualization & Reporting: Auto-generated comparative charts, tables, and detailed CSV reports.
Modular Design: Code structured into dedicated modules (methods, utils) following DRY principles.
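
The effect of preprocessing on vocabulary size can be shown with a toy pipeline. This is a simplified sketch, not the project's code: the stop-word list is a tiny hand-picked set, and `crude_stem` is a naive suffix stripper standing in for a real stemmer such as NLTK's Porter stemmer.

```python
import re

# Tiny illustrative stop-word list; NLTK's English list is far larger.
STOP_WORDS = {"the", "a", "is", "are", "in", "of", "and"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(tokens):
    # Naive suffix stripping for tokens longer than four letters;
    # a stand-in for a real stemming algorithm.
    return [re.sub(r"(ing|ed|s)$", "", t) if len(t) > 4 else t for t in tokens]


def vocabulary_size(docs, *steps):
    """Count distinct tokens after applying the given preprocessing steps in order."""
    vocab = set()
    for doc in docs:
        tokens = tokenize(doc)
        for step in steps:
            tokens = step(tokens)
        vocab.update(tokens)
    return len(vocab)


DOCS = [
    "The markets are rising",
    "A market rises in the morning",
    "Rising prices and the market",
]
```

On these three sentences the vocabulary shrinks from 11 tokens (tokenization only) to 6 after stop-word removal and to 5 after adding the crude stemmer. A smaller vocabulary means a narrower CountVectorizer matrix, which is the trade-off against accuracy that the comparison measures.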

🛠 Tech Stack:

📂 Project Repository

🔑 Text Keyword Extractor [TF-IDF, NLP]

Research project conducting in-depth analysis of the TF-IDF algorithm: from-scratch implementation with detailed comparison against scikit-learn's version for extracting keywords from text documents (BBC News dataset).

✨ Key Features:

TF-IDF from Scratch: Clean, documented pure-Python implementation without using third-party libraries for vector representation.
Algorithm Comparison: Step-by-step comparison of manual TF-IDF (idf = log(N / df)) vs. scikit-learn's optimized TF-IDF (idf = log((1 + N) / (1 + df)) + 1 with L2 normalization).
Analytical CLI: Interactive console interface for comprehensive exploration: document search by terms, top-N keyword extraction with weights, TF-IDF comparison, and random document analysis.
Industry Practices: NLTK for stop-word processing, result pagination, modular architecture, and real-world dataset usage.
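
The two IDF formulas compared above differ most visibly for very common terms. The sketch below implements both with natural logarithms, plus L2 normalization; the function names are illustrative and not taken from the repository.

```python
import math


def idf_textbook(n_docs, df):
    """Classic inverse document frequency: log(N / df)."""
    return math.log(n_docs / df)


def idf_sklearn_smooth(n_docs, df):
    """Smoothed variant, log((1 + N) / (1 + df)) + 1, matching scikit-learn's
    TfidfVectorizer with its default smooth_idf=True."""
    return math.log((1 + n_docs) / (1 + df)) + 1


def l2_normalize(weights):
    """Scale a term -> weight mapping to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else dict(weights)
```

For a term that occurs in all 4 of 4 documents, the textbook formula gives log(4/4) = 0, so the term vanishes from the ranking, while the smoothed formula gives log(5/5) + 1 = 1, so it keeps a non-zero weight. This is one reason the two implementations can rank keywords differently on the same corpus.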

🛠 Tech Stack:

📂 Project Repository

🔮 Pumpkin Price & Color Forecast [scikit-learn, pandas, EDA]

Dual predictive modeling project for US agricultural market data: regression for pumpkin price forecasting and classification for color prediction, with emphasis on interpretability and production-ready code.

✨ Key Features:

Interpretable Regression Models: From simple linear (y = kx + b) to multivariate polynomial models achieving R² = 0.969, with clear formulas and performance metrics (MSE, RMSE).
High-Accuracy Classification: Logistic regression classifier predicting pumpkin color with F1 = 0.94 and AUC = 0.975, including threshold optimization and confusion matrix analysis.
Comprehensive EDA Pipeline: Automated exploratory analysis with visualizations of seasonality, correlations, and feature distributions saved to structured output directories.
Production-Ready Architecture: Modular code structure (src/, scripts/, utils/), reusable components, and demo script for end-to-end pipeline execution.
Practical Agricultural Economics Focus: Real-world dataset (US pumpkin market) with actionable insights on price drivers (variety, location, packaging) and color predictors.
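
For reference, the simple linear model y = kx + b and its R² score can be computed in a few lines. This is a generic least-squares sketch on invented numbers, not the project's pipeline or its pumpkin data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = k*x + b; returns (k, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    return k, mean_y - k * mean_x


def r_squared(xs, ys, k, b):
    """Share of variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (k * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Invented points lying exactly on y = 2x + 1.
XS = [1, 2, 3, 4]
YS = [3, 5, 7, 9]
```

Fitting `XS` and `YS` recovers k = 2 and b = 1 with R² = 1, since the points sit exactly on a line. Real price data is noisy, which is where polynomial and multivariate features come in.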

🛠 Tech Stack:

📂 Project Repository

🔗 URL Shortener Service [Flask, REST API]

Web service for generating short URLs with a full-featured REST API and web interface. Supports both auto-generated short links and custom identifiers, with validation and storage in a database.

✨ Key Features:

Dual Interface: User-friendly web interface for manual link creation and REST API for automated integration with other services.
Creation Flexibility: Both auto-generated short IDs (6 characters) and custom short identifiers.
Full Data Lifecycle: Flask-Migrate (Alembic) for database schema versioning, ensuring reliable storage of link history.
Robust Validation: Built-in validation of source URLs and custom identifiers via Flask-WTF and WTForms.
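
A possible shape for the ID logic is sketched below. The 6-character length comes from the feature list; the alphabet, the rule for custom identifiers, and the function names are assumptions, and the `existing` set stands in for a database lookup.

```python
import re
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # assumed character set
SHORT_ID_LENGTH = 6
CUSTOM_ID_PATTERN = re.compile(r"[A-Za-z0-9]{1,16}")  # assumed rule for custom IDs


def generate_short_id(existing):
    """Draw random 6-character IDs until one is not already taken."""
    while True:
        candidate = "".join(
            secrets.choice(ALPHABET) for _ in range(SHORT_ID_LENGTH)
        )
        if candidate not in existing:
            return candidate


def is_valid_custom_id(custom_id, existing):
    """A custom ID must match the allowed pattern and must not be in use."""
    return bool(CUSTOM_ID_PATTERN.fullmatch(custom_id)) and custom_id not in existing
```

With 62 possible characters, six positions give about 56.8 billion combinations, so collisions are rare, but the retry loop and a unique constraint in the database are still worth having.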

🛠 Tech Stack:

📂 Project Repository

🇷🇺 Russian Version / На русском
