IlyaShaposhnikov/README.md

🇷🇺 Russian Version / На русском

👋 Hi, I'm Ilya Shaposhnikov

Python Developer building backend services and integrating machine learning (ML) and NLP to solve real-world problems. I approach development as solving business problems: I value clean, maintainable architecture and measurable impact.

I focus on building robust, production-ready end-to-end solutions, from designing architecture and business logic to containerization, CI/CD setup, and deployment. Below are projects I architected and implemented: from high-load APIs and modern AI/ML solutions to comprehensive web services.

Previous Experience

For over 12 years, I optimized complex processes at an international corporation, where I initiated and implemented IT solutions (from reporting automation to enterprise CAT system deployment). This experience shaped my systems thinking, process analysis skills, and ability to drive projects to production (tangible results: 90% reduction in manual effort, 70% increase in operational speed).

🛠 Technology Stack

Backend & Frameworks: Python, Django, Django REST Framework, Djoser, FastAPI, Flask, Aiogram, Telegram Bot API, SQLAlchemy, Alembic, Pydantic

Data Science & Machine Learning: scikit-learn, pandas, NumPy, Matplotlib, Seaborn, NLTK, Ollama

Databases: PostgreSQL, MySQL, SQLite, asyncpg

DevOps & Tools: Docker, Docker Compose, Git, GitHub Actions, Nginx, Gunicorn, Bash, Postman

Testing: Pytest

Frontend & Other: HTML, REST API, JWT

🗣️ [Natural] Languages

  • 🇷🇺 Russian: Native
  • 🇬🇧 English: C2 (Proficient)
  • 🇪🇸 Spanish: C1 (Advanced)
  • 🇫🇷 French: B2 (Upper-Intermediate)
  • 🇩🇪 German: B1 (Intermediate)

🎯 Open to Opportunities

I'm looking for a Python Developer role with a focus on ML/NLP projects in a product-driven team where I can bridge technical execution and business strategy, applying my tech stack and unique cross-domain experience to build efficient code that solves real-world problems.

📍 Location: Saint Petersburg, Russia (open to on-site, remote, and hybrid work)

📧 Contact: ilia.a.shaposhnikov@gmail.com

📱 Telegram: @iliashaposhnikov

💼 LinkedIn: @iliashaposhnikov

🚀 Key Projects

| Category | Project | Key Technologies | Core Concept & Challenges |
|---|---|---|---|
| Production Backend | 💰 Wallet REST API | FastAPI, PostgreSQL, Docker, async | Asynchronous API for financial operations with guaranteed data consistency under concurrent requests (transactions, row-level locks). |
| AI & Intelligent Systems | 🤖 Video Analytics Bot | Aiogram, Ollama (LLM), PostgreSQL, asyncpg | NLP-powered bot: transforms natural language queries into SQL analytics using a local LLM (Mistral 7B). |
| Machine Learning (ML) & NLP | 🎬 Movie Recommendation System | scikit-learn, pandas, TF-IDF, Aiogram | End-to-end recommendation engine based on textual features (genres, cast) with dual interfaces: Telegram bot and console app with visualization. |
| Machine Learning (ML) & NLP | 🔬 Embedding Visualizer | Gensim, scikit-learn, Matplotlib | Interactive toolkit for deep semantic analysis of word embeddings with vector-arrow analogy visualization, semantic cluster projection (PCA/t-SNE), and evaluation on Google Analogy Test Set through an intuitive CLI. Built with a robust, modular architecture. |
| Machine Learning (ML) & NLP | 🚫 SMS Spam Detector | scikit-learn, pandas, CLI | Modular pipeline for binary SMS classification using Naive Bayes/Logistic Regression, featuring CLI interface, structured logging, artifact persistence, and interpretability via confusion matrices and word clouds. |
| Data Research & Analysis | 🔬 CountVectorizer Comparison | scikit-learn, NLTK, Matplotlib, Seaborn | Comprehensive comparative analysis of 5 text preprocessing methods (basic, stop-word removal, lemmatization, stemming, simple tokenization) for news classification using CountVectorizer. Includes evaluation by accuracy, speed, and vocabulary size. Built with a modular architecture for maintainable and readable code. |
| Data Research & Analysis | 🔑 Text Keyword Extractor | scikit-learn, pandas, NLTK | In-depth TF-IDF analysis: from-scratch algorithm implementation with detailed comparison (formulas, weights, ranking) against scikit-learn's version for keyword extraction. |
| Data Research & Analysis | 🔮 Pumpkin Price & Color Forecast | scikit-learn, pandas, matplotlib, EDA | Dual predictive models for agricultural economics: regression for price forecasting and classification for color prediction with interpretable outputs and production-ready structure. |
| Web Services & API | 🔗 URL Shortener Service | Flask, SQLAlchemy, REST API, Alembic | Web service with REST API for generating short URLs. Features validation, custom identifier support, and history tracking in a database. |

📚 Full Project List

💰 Wallet REST API [FastAPI, PostgreSQL]

High-load asynchronous REST API for managing financial balances. The service ensures data consistency during concurrent deposit/withdrawal operations, implementing an e-wallet pattern.

✨ Key Features:

  • Concurrency Safety: Guarantees data integrity through READ COMMITTED transactions and SELECT ... FOR UPDATE row-level locks.
  • Production-Grade Stack: Full cycle from asynchronous backend to containerization.
  • Deployment Ready: Fully configured for Docker with orchestration (docker-compose).
  • Comprehensive Documentation: Auto-generated interactive OpenAPI (Swagger) documentation at /docs.
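
The locking pattern named in the first bullet can be illustrated without a database. The sketch below is an in-process analogy, not the project's code: one `asyncio.Lock` per wallet stands in for the row lock that `SELECT ... FOR UPDATE` takes in PostgreSQL, and `InMemoryWallets`, `apply`, and `demo` are hypothetical names chosen for illustration.

```python
import asyncio
from collections import defaultdict
from decimal import Decimal


class InsufficientFunds(Exception):
    """Raised when an operation would make the balance negative."""


class InMemoryWallets:
    """Toy stand-in for a wallets table (assumption: not the real schema)."""

    def __init__(self):
        self._balances = {}
        # One lock per wallet id plays the role of a row-level lock.
        self._row_locks = defaultdict(asyncio.Lock)

    def create(self, wallet_id, balance):
        self._balances[wallet_id] = Decimal(balance)

    def balance(self, wallet_id):
        return self._balances[wallet_id]

    async def apply(self, wallet_id, amount):
        # Acquiring the lock mirrors SELECT ... FOR UPDATE: other writers
        # to the same wallet wait here until this "transaction" finishes.
        async with self._row_locks[wallet_id]:
            current = self._balances[wallet_id]   # read
            await asyncio.sleep(0)                # yield, simulating I/O between read and write
            updated = current + amount
            if updated < 0:
                raise InsufficientFunds(wallet_id)
            self._balances[wallet_id] = updated   # write; leaving the block releases the lock
            return updated


async def demo():
    wallets = InMemoryWallets()
    wallets.create("w1", "500.00")
    # 50 deposits and 50 withdrawals of 10 race against each other.
    operations = [Decimal("10")] * 50 + [Decimal("-10")] * 50
    await asyncio.gather(*(wallets.apply("w1", amount) for amount in operations))
    return wallets.balance("w1")
```

Running `asyncio.run(demo())` leaves the balance unchanged because every read-modify-write is serialized per wallet. Without the lock, the yield between read and write would let concurrent operations overwrite each other's updates.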

🛠 Tech Stack: FastAPI, PostgreSQL, SQLAlchemy, Alembic, Docker, Pytest

📂 Project Repository

🤖 Video Analytics Bot [AI, LLM, PostgreSQL]

Intelligent Telegram bot that converts natural language questions into analytical SQL queries against a video statistics database. A local LLM (Mistral 7B served by Ollama), guided by a carefully engineered prompt, generates the SQL.

✨ Key Features:

  • NLP Interface: Users ask questions in natural language ("How many videos have >100K views?") and the bot returns a precise numerical answer.
  • Local LLM: Mistral 7B model via Ollama ensures complete data privacy, offline operation, and no usage limits or fees.
  • Prompt Engineering: Detailed system prompt with database schema description, strict rules, and examples for stable SQL query generation.
  • Production Architecture: Asynchronous bot on Aiogram 3.7+, optimized PostgreSQL with indexes, connection pooling via asyncpg.
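
A system prompt of the kind described above (schema, rules, examples) might be assembled roughly as follows. This is a minimal sketch under stated assumptions: the `videos` schema, the rule texts, the example pair, and both function names are hypothetical, and the `is_single_select` guard is an extra precaution added here rather than something the README describes.

```python
from textwrap import dedent

# Hypothetical schema for illustration; the bot's real tables are not shown in this README.
SCHEMA = dedent("""\
    CREATE TABLE videos (
        id BIGINT PRIMARY KEY,
        title TEXT,
        views BIGINT,
        published_at TIMESTAMPTZ
    );""")

# Assumed rule texts; the actual prompt wording is not public here.
RULES = [
    "Return exactly one PostgreSQL SELECT statement and nothing else.",
    "Use only the tables and columns listed in the schema.",
    "The query must return a single numeric value.",
]

EXAMPLES = [
    ("How many videos have more than 100000 views?",
     "SELECT COUNT(*) FROM videos WHERE views > 100000;"),
]


def build_system_prompt(schema, rules, examples):
    """Combine schema, numbered rules, and question/SQL examples into one prompt."""
    rule_lines = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    example_lines = "\n\n".join(f"Question: {q}\nSQL: {sql}" for q, sql in examples)
    return (
        "You translate analytics questions into SQL.\n\n"
        f"Schema:\n{schema}\n\n"
        f"Rules:\n{rule_lines}\n\n"
        f"Examples:\n{example_lines}\n"
    )


def is_single_select(sql):
    """Conservative guard: accept only one statement that starts with SELECT.

    Any semicolon inside the statement is rejected, including one inside a
    string literal, which errs on the side of safety.
    """
    statement = sql.strip().rstrip(";").strip()
    return statement.upper().startswith("SELECT") and ";" not in statement
```

Checking the model's output before executing it keeps a malformed or multi-statement response from reaching the database.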

🛠 Tech Stack: Python, Aiogram, PostgreSQL, Ollama, asyncpg

📂 Project Repository

🎬 Movie Recommendation System [NLP, TF-IDF]

Film recommendation engine with dual UI: Telegram bot and console application. The system analyzes descriptions and cast using NLP and ML techniques.

✨ Key Features:

  • Two Algorithms: Recommendations based on genres/keywords and weighted cast analysis.
  • Two Interfaces: Convenient Telegram bot and a visual console interface with charts.
  • End-to-End Pipeline: From data preprocessing (TF-IDF) to interactive Telegram bot and console interfaces.
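
Content-based recommendation over textual features can be sketched in plain Python. The example below is an illustration only: the project itself uses scikit-learn, the three-film catalogue is invented, and `tfidf_vectors`, `cosine`, and `recommend` are hypothetical names.

```python
import math
from collections import Counter

# Hypothetical toy catalogue: each film is described by genre and cast tokens.
FILMS = {
    "Space Saga": "sci-fi adventure alice_ray bob_stone",
    "Star Drift": "sci-fi drama alice_ray",
    "City Laughs": "comedy romance carol_mills",
}


def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (term -> weight) for every document."""
    tokenized = {title: text.split() for title, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized.values() for term in set(tokens))
    vectors = {}
    for title, tokens in tokenized.items():
        counts = Counter(tokens)
        # Smoothed IDF keeps a non-zero weight for terms present in every film.
        vectors[title] = {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, tf in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def recommend(title, docs, top_n=2):
    """Rank the other films by similarity to the given title."""
    vectors = tfidf_vectors(docs)
    target = vectors[title]
    scored = [(cosine(target, vec), other) for other, vec in vectors.items() if other != title]
    return [other for _, other in sorted(scored, reverse=True)[:top_n]]
```

In this toy catalogue, `recommend("Space Saga", FILMS)` ranks "Star Drift" first because the two films share a genre token and a cast token, while "City Laughs" shares nothing.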

🛠 Tech Stack: Python, scikit-learn, pandas, Matplotlib, Telegram Bot API

📂 Project Repository

🔬 Embedding Visualizer [ML, NLP, Visualization]

Interactive research toolkit for deep semantic analysis of word embeddings with vector-arrow visualizations of semantic relationships. Built with a robust, modular architecture for enhanced stability and maintainability. Enables intuitive exploration of semantic structure in Word2Vec and GloVe models through analogy visualization with directional arrows, semantic cluster projection, and quality evaluation.

✨ Key Features:

  • Semantic Cluster Projection: Automatic 2D mapping of seed words and their nearest neighbors with color-coded clusters using PCA (global structure) and t-SNE (local neighborhoods).
  • Vector-Arrow Analogy Visualization: Unique 2D plots showing semantic relationships as directional arrows (w2 → w1 and result → w3), visually demonstrating parallelism in vector arithmetic (king - man + woman ≈ queen).
  • Smart Model Management: Automatic download with integrity checks, mirror fallback, and binary caching for instant subsequent loads.
  • Lazy Model Loading: Models load on demand via Model Manager, improving startup speed and memory efficiency.
  • Robust Modular Architecture: Separated concerns across services, presentation, visualization, and data layers for clean, testable code. Centralized configuration and enhanced logging.
  • Quality Evaluation: Testing on Google Analogy Test Set (19,544 questions) with accuracy breakdown by semantic/syntactic categories and vocabulary coverage analysis.
  • Zero-Code Exploration: Intuitive command-line interface with contextual help, demo mode, and persistent session; no programming required for deep semantic analysis.
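
The vector arithmetic behind analogy solving can be shown with a few hand-made vectors. This is a toy sketch, not the toolkit's code: the 2-D coordinates are invented so that the arithmetic works out cleanly, real Word2Vec or GloVe vectors have hundreds of dimensions, and `solve_analogy` and `offset` are hypothetical helper names.

```python
import math

# Hypothetical 2-D toy embeddings (axis 0 ~ "royalty", axis 1 ~ "gender").
VECTORS = {
    "king": (0.9, 0.8),
    "queen": (0.9, -0.8),
    "man": (0.1, 0.8),
    "woman": (0.1, -0.8),
    "apple": (-0.6, 0.1),
}


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def solve_analogy(w1, w2, w3, vectors):
    """Return the word closest (by cosine) to w1 - w2 + w3, excluding the inputs."""
    target = tuple(a - b + c for a, b, c in zip(vectors[w1], vectors[w2], vectors[w3]))
    candidates = (word for word in vectors if word not in {w1, w2, w3})
    return max(candidates, key=lambda word: cosine(vectors[word], target))


def offset(src, dst, vectors):
    """Direction of the arrow drawn from src to dst."""
    return tuple(d - s for s, d in zip(vectors[src], vectors[dst]))
```

With these toy vectors, `solve_analogy("king", "man", "woman", VECTORS)` selects "queen", and the offsets man → king and woman → queen point in the same direction, which is the parallelism an arrow plot makes visible.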

🛠 Tech Stack: Python, Gensim, scikit-learn, Matplotlib, NumPy

📂 Project Repository

🚫 SMS Spam Detector [ML, NLP, CLI]

Modular pipeline for binary SMS classification using scikit-learn vectorizers and probabilistic models, designed with production-ready patterns and an intuitive command-line interface.

✨ Key Features:

  • Flexible Model Selection: Support for Naive Bayes (fast baseline) and Logistic Regression (higher accuracy) via --model CLI argument.
  • Adaptive Vectorization: Switch between CountVectorizer and TfidfVectorizer with configurable n-grams, max features, and stop-word handling.
  • Comprehensive Evaluation: Automated calculation of accuracy, F1, precision, recall, and ROC-AUC with JSON export for experiment tracking.
  • Interpretability Tools: Confusion matrix visualization via sklearn's ConfusionMatrixDisplay, word clouds for spam/ham analysis, and misclassification inspection with probability scores.
  • CLI-Driven Workflow: Full pipeline execution via python scripts/train.py with argparse validation, reproducibility via --random-state, and optional plot generation.
  • Production Patterns: Structured logging to console + file, graceful error handling with exit codes, and artifact persistence (models, metrics, plots) with timestamps.
  • Modular Architecture: Clean separation of concerns across data, features, models, evaluation, and visualization modules for maintainable, testable code.
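
A command-line entry point along these lines could be built with `argparse`. This is a sketch under assumptions: `--model` and `--random-state` are the flags named above, while the choice values (`nb`, `logreg`), the defaults, and the `--vectorizer` and `--plots` flags are hypothetical.

```python
import argparse


def build_parser():
    """Create the argument parser for a training script (illustrative only)."""
    parser = argparse.ArgumentParser(description="Train an SMS spam classifier (sketch).")
    # Flags named in the README; the allowed values and defaults are assumptions.
    parser.add_argument("--model", choices=["nb", "logreg"], default="nb",
                        help="nb = Naive Bayes baseline, logreg = Logistic Regression")
    parser.add_argument("--random-state", type=int, default=42,
                        help="seed for a reproducible train/test split")
    # Hypothetical flags added for illustration.
    parser.add_argument("--vectorizer", choices=["count", "tfidf"], default="tfidf",
                        help="text vectorization strategy")
    parser.add_argument("--plots", action="store_true",
                        help="also generate evaluation plots")
    return parser


def main(argv=None):
    """Parse arguments, report the configuration, and return an exit code."""
    args = build_parser().parse_args(argv)
    print(f"model={args.model} vectorizer={args.vectorizer} "
          f"random_state={args.random_state} plots={args.plots}")
    return 0  # exit code 0 signals success to the calling shell
```

Restricting `--model` with `choices` lets `argparse` reject unknown values itself: it prints a usage message and exits with status 2 before any training code runs.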

🛠 Tech Stack: Python, scikit-learn, pandas, NumPy, Matplotlib, Seaborn

📂 Project Repository

🔬 CountVectorizer Comparison Project [NLP]

Research project comparing the effectiveness of 5 text preprocessing methods (basic, stop-word removal, lemmatization, stemming, simple tokenization) when vectorizing with CountVectorizer on the BBC News dataset. Built with a modular architecture for enhanced readability and maintainability.

✨ Key Features:

  • Comparative Analysis: Direct comparison of 5 text processing approaches by accuracy, vocabulary size, execution time, and matrix density.
  • Deep NLP Focus: Implementation and evaluation of linguistic methods (lemmatization, stemming).
  • Clear Conclusions: Identifies the method that offers the best balance of accuracy and speed.
  • Full Visualization & Reporting: Auto-generated comparative charts, tables, and detailed CSV reports.
  • Modular Design: Code structured into dedicated modules (methods, utils) following DRY principles.
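
How preprocessing changes vocabulary size can be seen on a two-sentence corpus. The sketch below is a simplification rather than the project's code: the stop-word list is a tiny hand-written set instead of NLTK's, `toy_stem` is a crude suffix stripper and not a real stemming algorithm, and only three of the five methods are represented.

```python
import re

# Tiny illustrative stop-word list; the project relies on NLTK resources instead.
STOP_WORDS = {"the", "a", "is", "are", "in", "on", "and", "of"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    return [token for token in tokens if token not in STOP_WORDS]


def toy_stem(token):
    """Crude suffix stripping for illustration only; not the Porter algorithm."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def vocabulary_size(docs, preprocess):
    """Number of distinct tokens across all documents after preprocessing."""
    return len({token for doc in docs for token in preprocess(doc)})


DOCS = [
    "The markets are rising and investors are watching",
    "Investors watched the market rise in the morning",
]

METHODS = {
    "basic": tokenize,
    "stop_words": lambda text: remove_stop_words(tokenize(text)),
    "stemming": lambda text: [toy_stem(t) for t in remove_stop_words(tokenize(text))],
}

sizes = {name: vocabulary_size(DOCS, method) for name, method in METHODS.items()}
print(sizes)  # {'basic': 12, 'stop_words': 8, 'stemming': 6}
```

The vocabulary shrinks at each step because stop-word removal drops frequent function words and stemming merges inflected forms such as "watching" and "watched".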

🛠 Tech Stack: Python, scikit-learn, pandas, NumPy, NLTK, Matplotlib

📂 Project Repository

🔑 Text Keyword Extractor [TF-IDF, NLP]

Research project conducting in-depth analysis of the TF-IDF algorithm: from-scratch implementation with detailed comparison against scikit-learn's version for extracting keywords from text documents (BBC News dataset).

✨ Key Features:

  • TF-IDF from Scratch: Clean, documented pure-Python implementation that uses no third-party libraries for vectorization.
  • Algorithm Comparison: Step-by-step comparison of manual TF-IDF (idf = log(N / df)) vs. scikit-learn's optimized TF-IDF (idf = log((1 + N) / (1 + df)) + 1 with L2 normalization).
  • Analytical CLI: Interactive console interface for comprehensive exploration: document search by terms, top-N keyword extraction with weights, TF-IDF comparison, and random document analysis.
  • Industry Practices: NLTK for stop-word processing, result pagination, modular architecture, and real-world dataset usage.
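
The two IDF formulas quoted above behave differently for terms that occur in every document. The following sketch illustrates this in plain Python; it is not the project's implementation, it assumes raw term counts as TF, and the three-document corpus and function names are invented for the example.

```python
import math
from collections import Counter


def idf_classic(n_docs, df):
    """Textbook IDF: log(N / df)."""
    return math.log(n_docs / df)


def idf_smoothed(n_docs, df):
    """Smoothed IDF in the form quoted for scikit-learn: log((1 + N) / (1 + df)) + 1."""
    return math.log((1 + n_docs) / (1 + df)) + 1


def tfidf(doc_tokens, corpus, idf, l2_normalize=False):
    """Weight each term of one document; TF is the raw count (an assumption)."""
    n_docs = len(corpus)
    weights = {}
    for term, tf in Counter(doc_tokens).items():
        df = sum(1 for doc in corpus if term in doc)  # documents containing the term
        weights[term] = tf * idf(n_docs, df)
    if l2_normalize:
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm:
            weights = {term: w / norm for term, w in weights.items()}
    return weights


# Hypothetical tokenized corpus; "bank" occurs in every document.
CORPUS = [
    ["economy", "growth", "bank"],
    ["bank", "football"],
    ["bank", "economy"],
]

classic = tfidf(CORPUS[0], CORPUS, idf_classic)
smoothed = tfidf(CORPUS[0], CORPUS, idf_smoothed, l2_normalize=True)
```

Under the classic formula "bank" gets weight 0 because log(3 / 3) = 0, so it can never be a keyword. The smoothed formula gives it an IDF of 1, so it keeps a small positive weight after normalization. In both variants "growth", the rarest term, ranks first for the first document.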

🛠 Tech Stack: Python, scikit-learn, pandas, NLTK

📂 Project Repository

🔮 Pumpkin Price & Color Forecast [scikit-learn, pandas, EDA]

Dual predictive modeling project for US agricultural market data: regression for pumpkin price forecasting and classification for color prediction, with emphasis on interpretability and production-ready code.

✨ Key Features:

  • Interpretable Regression Models: From simple linear (y = kx + b) to multivariate polynomial models achieving R² = 0.969, with clear formulas and performance metrics (MSE, RMSE).
  • High-Accuracy Classification: Logistic regression classifier predicting pumpkin color with F1 = 0.94 and AUC = 0.975, including threshold optimization and confusion matrix analysis.
  • Comprehensive EDA Pipeline: Automated exploratory analysis with visualizations of seasonality, correlations, and feature distributions saved to structured output directories.
  • Production-Ready Architecture: Modular code structure (src/, scripts/, utils/), reusable components, and demo script for end-to-end pipeline execution.
  • Practical Agricultural Economics Focus: Real-world dataset (US pumpkin market) with actionable insights on price drivers (variety, location, packaging) and color predictors.
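
The simplest model in that range, y = kx + b, and the metrics listed with it can be computed by hand. This is a minimal least-squares sketch in plain Python; the project uses scikit-learn, and the function names and sample points are hypothetical rather than taken from the pumpkin dataset.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = k*x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    b = mean_y - k * mean_x
    return k, b


def regression_metrics(xs, ys, k, b):
    """Return MSE, RMSE, and R^2 for the fitted line on the given points."""
    n = len(xs)
    mean_y = sum(ys) / n
    predictions = [k * x + b for x in xs]
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                  # total sum of squares
    mse = ss_res / n
    return {"mse": mse, "rmse": math.sqrt(mse), "r2": 1 - ss_res / ss_tot}


# Made-up points lying exactly on y = 2x + 1.
k, b = fit_line([1, 2, 3], [3, 5, 7])
metrics = regression_metrics([1, 2, 3], [3, 5, 7], k, b)
```

For points that lie exactly on a line the fit recovers k = 2 and b = 1 with R² = 1. On real data R² falls below 1 by the share of variance the line leaves unexplained.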

🛠 Tech Stack: Python, scikit-learn, pandas, NumPy, Matplotlib, Seaborn

📂 Project Repository

🔗 URL Shortener Service [Flask, REST API]

Web service for generating short URLs with a full-featured REST API and web interface. Supports both auto-generated short links and custom identifiers, with validation and storage in a database.

✨ Key Features:

  • Dual Interface: User-friendly web interface for manual link creation and REST API for automated integration with other services.
  • Creation Flexibility: Both auto-generated short IDs (6 characters) and custom short identifiers.
  • Full Data Lifecycle: Flask-Migrate (Alembic) for database schema versioning, ensuring reliable storage of link history.
  • Robust Validation: Built-in validation of source URLs and custom identifiers via Flask-WTF and WTForms.
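
Short-ID generation with optional custom identifiers might look roughly like this. The sketch is framework-free and rests on assumptions: only the 6-character length comes from the description above, while the alphabet, the custom-ID rule (`CUSTOM_ID_PATTERN`), and the function names are hypothetical. The real service validates through Flask-WTF and WTForms and checks uniqueness against its database.

```python
import re
import secrets
import string
from typing import Optional

ALPHABET = string.ascii_letters + string.digits        # assumed character set
AUTO_ID_LENGTH = 6                                     # length stated in the README
CUSTOM_ID_PATTERN = re.compile(r"[A-Za-z0-9]{1,16}")   # hypothetical validation rule


def generate_short_id(existing, length=AUTO_ID_LENGTH):
    """Draw random IDs until one is not already taken."""
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if candidate not in existing:
            return candidate


def resolve_short_id(custom_id: Optional[str], existing):
    """Use a valid, unused custom ID if given; otherwise generate one."""
    if not custom_id:
        return generate_short_id(existing)
    if not CUSTOM_ID_PATTERN.fullmatch(custom_id):
        raise ValueError("custom ID contains invalid characters or is too long")
    if custom_id in existing:
        raise ValueError("custom ID is already in use")
    return custom_id
```

Here `existing` is any container of taken IDs, for example a `set`. In a database-backed service a unique constraint on the column would still be needed, since two requests can pass the membership check at the same time.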

🛠 Tech Stack: Flask, SQLAlchemy, Alembic, REST API, Jinja2

📂 Project Repository


Pinned

  1. wallet-api (Public)

    A FastAPI-based REST API for managing user wallet balances with secure concurrent transaction handling using SELECT FOR UPDATE to ensure data integrity, built with asynchronous PostgreSQL and Docker

    Python

  2. embedding-visualizer (Public)

    An interactive toolkit for exploring Word2Vec and GloVe embeddings featuring nearest neighbor search with similarity bars, word analogy solving with 2D vector visualization, semantic cluster projec…

    Python

  3. film-recommendation-tfidf (Public)

    A movie recommendation system using NLP and TF-IDF to suggest films based on genre, keywords, and cast similarity, with console and Telegram bot interfaces

    Python

  4. video_analytics_bot (Public)

    A Telegram bot that uses a local Ollama LLM to process natural language queries and retrieve video analytics from a PostgreSQL database

    Python

  5. shortlink_generator (Public)

    A Flask-based URL shortening service with web and REST API access, supporting custom aliases and database storage

    Python 6

  6. text-keyword-extractor (Public)

    A manual TF-IDF implementation for keyword extraction from BBC News, featuring comparative benchmarking against scikit-learn's TF-IDF and an interactive console interface with paginated search

    Python