Recogniz.ing

Voice, transformed.

AI-powered voice typing that adapts to your workflow—privacy-first, cross-platform, and powered by Google Gemini 3 Flash



✨ What is Recogniz.ing?

Recogniz.ing turns your voice into text—intelligently. Whether you're writing emails, meeting notes, or the next great novel, it captures your words with remarkable accuracy.

What makes it different:

  • Privacy-first: transcriptions are stored on your device; audio is sent only to the Gemini API for transcription
  • Free tier friendly: Gemini 3 Flash's generous free tier covers daily use for most people
  • Multi-language: works in 20+ languages with automatic detection
  • Cross-platform: macOS, Windows, Linux, iOS, Android, Web

Quick Start

Prerequisites

You'll need two things:

  1. Flutter SDK (3.38+). Verify your installation with:

    flutter doctor

  2. Gemini API key. Get your free key from Google AI Studio.

    • The free tier is generous: most users won't need to pay
    • If you do exceed it, the paid tier is ~$0.075 per million characters

Install & Run

# Clone the repository
git clone https://github.com/xicv/recogniz.ing.git
cd recogniz.ing

# Install dependencies
flutter pub get

# Run on your platform
flutter run -d macos      # macOS (recommended)
flutter run -d ios        # iOS Simulator
flutter run -d android    # Android device/emulator
flutter run -d windows    # Windows
flutter run -d linux      # Linux
flutter run -d web        # Web browser (limited features)

Using the Makefile

The project includes a Makefile with 40+ commands for common tasks:

make get              # Install Flutter dependencies
make run-macos        # Run on macOS
make dev              # get + analyze + format + test
make quick-run        # get + run-macos (fastest start)

First Time Setup

Getting started takes about 2 minutes:

| Step | Action |
|------|--------|
| 1 | Launch Recogniz.ing |
| 2 | Go to Settings (press Cmd/Ctrl+5) |
| 3 | Enter your Gemini API Key (add multiple keys for automatic failover!) |
| 4 | (Optional) Customize prompts and vocabulary |
| 5 | Go back to Dashboard to see your usage stats |
| 6 | Tap the microphone button or press Ctrl+Shift+R (configurable in Settings) |
| 7 | Speak, then tap again to stop and transcribe |

Features at a Glance

Smart Voice Recording

| Feature | Description |
|---------|-------------|
| Silero VAD | ML-based voice activity detection (~95% accuracy) |
| Graceful Fallback | Amplitude-based VAD (~75%) when ML unavailable |
| Global Hotkey | Ctrl+Shift+R anywhere on your system (configurable in Settings) |
| Push-to-Talk | Hold a modifier key (Right Cmd, Right Option, or Fn) to record, release to transcribe (macOS) |
| Auto-Inject | Automatically pastes transcription into the focused text field (macOS) |
| Smart Audio Format | Auto-selects format based on recording duration |
| Background Processing | Smooth UI even during intensive audio analysis |
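
The graceful fallback can be pictured with a small sketch. It is written in Python for brevity (the app itself is Dart) and is not the project's implementation: the function names, the normalized sample format, and the 0.02 threshold are all assumptions made for illustration.

```python
import math
from typing import Callable, Optional, Sequence

Frame = Sequence[float]  # assumed: PCM samples normalized to [-1.0, 1.0]


def rms(frame: Frame) -> float:
    """Root-mean-square amplitude of one audio frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def amplitude_vad(frame: Frame, threshold: float = 0.02) -> bool:
    """Fallback detector: a frame counts as speech when its RMS exceeds a threshold."""
    return rms(frame) > threshold


def detect_speech(frame: Frame,
                  ml_vad: Optional[Callable[[Frame], bool]] = None) -> bool:
    """Prefer the ML detector; fall back to amplitude if it is missing or fails."""
    if ml_vad is not None:
        try:
            return ml_vad(frame)
        except Exception:
            pass  # ML model unavailable at runtime: fall through to the amplitude check
    return amplitude_vad(frame)
```

Passing the ML detector in as an optional callable keeps the caller unaware of which detector actually ran, which is roughly what a shared service interface would provide.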

AI-Powered Transcription

Powered by Google Gemini 3 Flash—Google's fastest AI model:

  • Token-efficient prompts — 67% less overhead
  • Auto-retry — Handles transient API errors automatically
  • Model selection — Choose your Gemini model via dropdown (Gemini 3 Flash default)
  • Multi-language — 20+ languages with auto-detection
  • Code-switching — Preserves mixed-language speech naturally
  • Long-form support — Up to ~3.5 hours per request
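
Auto-retry is commonly built as a retry loop with exponential backoff. The sketch below shows that general technique in Python; it is not taken from the app's source, and `TransientError`, `with_retry`, the attempt count, and the delays are hypothetical names and values.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (for example HTTP 429 or 503)."""


def with_retry(call: Callable[[], T], attempts: int = 3, base_delay: float = 0.5,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # waits 0.5 s, then 1 s, then 2 s, ...
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter exists only so the delay can be replaced in tests; production code would leave it at the default.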

Your Words, Your Way

| Customization | Options |
|---------------|---------|
| Prompts | 6 built-in (Clean, Formal, Bullets, Email, Meeting Notes, Social Media) + custom |
| Vocabulary | 6 sets (General, Technology, Business, Medical, Legal, Finance) + custom |
| Language | Auto-detect or choose from 20+ languages |
| Audio Quality | Auto (smart), Compact (AAC), or Full (PCM) |
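
One way the Audio Quality setting could map to an encoding is sketched below in Python. The README only says that Auto picks a format from the recording duration, so the 60-second cutoff and the direction of the choice (short clips as PCM, long clips as AAC) are assumptions, as are the names `AudioQuality` and `pick_format`.

```python
from enum import Enum


class AudioQuality(Enum):
    AUTO = "auto"
    COMPACT = "compact"  # AAC
    FULL = "full"        # PCM


def pick_format(quality: AudioQuality, duration_seconds: float,
                cutoff_seconds: float = 60.0) -> str:
    """Return the encoding for a recording; the 60 s cutoff is an assumed value."""
    if quality is AudioQuality.COMPACT:
        return "aac"
    if quality is AudioQuality.FULL:
        return "pcm"
    # AUTO (assumed policy): keep short clips uncompressed, compress long ones
    return "pcm" if duration_seconds <= cutoff_seconds else "aac"
```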

Dashboard & Analytics

  • Per-key usage tracking — Today's requests, tokens, and words per API key
  • Multi-API key management — Smart failover cascades through all available keys with automatic rotation on rate limit
  • Per-key usage attribution — Each transcription is tagged to the key that produced it
  • Real-time stats — Dashboard updates instantly after each transcription
  • Editable transcriptions — Fix mistakes, save changes
  • Favorites — Star important transcriptions for quick access
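
The failover and attribution behavior described above can be sketched as a loop over the configured keys. This Python sketch is illustrative only: `RateLimited` and `transcribe_with_failover` are hypothetical names, and the real app may order or rotate keys differently.

```python
from typing import Callable, Sequence, Tuple


class RateLimited(Exception):
    """Hypothetical error raised when a key has exhausted its quota."""


def transcribe_with_failover(keys: Sequence[str],
                             transcribe: Callable[[str], str]) -> Tuple[str, str]:
    """Try each key in order; return (text, key_used) so usage can be attributed."""
    for key in keys:
        try:
            return transcribe(key), key
        except RateLimited:
            continue  # this key is rate limited: cascade to the next one
    raise RateLimited("all configured API keys are rate limited")
```

Returning the key alongside the text is what makes per-key usage attribution possible without a second lookup.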

Privacy First

| What We Promise | Details |
|-----------------|---------|
| Local storage | All data stays on your device |
| No account required | Use immediately after setup |
| No telemetry | We don't collect or send user data |
| API only | Audio sent only to Google for transcription, not stored |
| Open source | Code is transparent and auditable |

Modern UI/UX

  • Premium icon with deep blue-teal-emerald gradient and warm coral recording dot
  • Microphone menu bar icon with template image support (auto light/dark mode) and recording state indicator
  • Material Design 3 with dynamic theming
  • Collapsible left drawer navigation
  • Light/Dark theme with system detection
  • WCAG AAA compliant (7:1 contrast ratio)
  • Keyboard shortcuts for power users

How It Works

Transcription Flow

flowchart LR
    A[Press Record<br/>Ctrl+Shift+R / PTT Key] --> B[Silero VAD<br/>Voice Activity Detection]
    B --> C[Audio Validation<br/>RMS Analysis in Isolate]
    C --> D{Quality OK?}
    D -->|No| E[Discard<br/>Show Feedback]
    D -->|Yes| F[Apply User<br/>Prompts & Vocabulary]
    F --> G[Send to Gemini 3 Flash<br/>with Language Detection]
    G --> H[Receive Transcription]
    H --> I[Save to Local Storage<br/>Hive Database]
    I --> J[Auto-Inject or<br/>Copy to Clipboard]

    style A fill:#10B981,stroke:#059669,color:#fff
    style B fill:#3B82F6,stroke:#2563EB,color:#fff
    style G fill:#8B5CF6,stroke:#7C3AED,color:#fff
    style I fill:#F59E0B,stroke:#D97706,color:#fff
    style J fill:#10B981,stroke:#059669,color:#fff

Architecture Overview

graph TB
    subgraph "Presentation Layer"
        UI[Features & Widgets]
        Drawer[App Shell & Navigation]
    end

    subgraph "State Management"
        Riverpod[Riverpod Providers]
    end

    subgraph "Business Logic"
        UseCases[Use Cases]
        Services[Core Services]
    end

    subgraph "Data Layer"
        Models[Hive Models]
        Storage[Local Storage]
    end

    subgraph "External"
        Gemini[Gemini API]
        VAD[Silero VAD]
    end

    UI --> Riverpod
    Drawer --> Riverpod
    Riverpod --> UseCases
    UseCases --> Services
    Services --> Models
    Services --> Storage
    Services --> Gemini
    Services --> VAD

    style UI fill:#E0F2FE,stroke:#0284C7
    style Riverpod fill:#FCE7F3,stroke:#DB2777
    style UseCases fill:#DCFCE7,stroke:#16A34A
    style Services fill:#FEF3C7,stroke:#D97706
    style Models fill:#E0E7FF,stroke:#4F46E5
    style Gemini fill:#F3E8FF,stroke:#9333EA

State Management Flow

flowchart LR
    A[Widget] -->|watch| B[Provider]
    B -->|provides| C[Service/UseCase]
    C -->|notifies| B
    B -->|updates| A

    B -->|read| D[Hive Storage]
    C -->|calls| E[Gemini API]
    E -->|returns| C

    style A fill:#E0F2FE,stroke:#0284C7
    style B fill:#FCE7F3,stroke:#DB2777
    style C fill:#DCFCE7,stroke:#16A34A
    style E fill:#F3E8FF,stroke:#9333EA
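
The watch/notify loop in the diagram is the observer pattern. The Python sketch below is a stand-in for that idea, not Riverpod's API: the `Provider` class and its `watch` and `set` methods are hypothetical and exist only to show the direction of the arrows.

```python
from typing import Callable, List


class Provider:
    """Holds a value and notifies watchers when it changes."""

    def __init__(self, value: int) -> None:
        self._value = value
        self._watchers: List[Callable[[int], None]] = []

    def watch(self, on_change: Callable[[int], None]) -> None:
        """Register a widget callback and deliver the current value immediately."""
        on_change(self._value)
        self._watchers.append(on_change)

    def set(self, value: int) -> None:
        """Called by a service or use case; pushes the new value to every watcher."""
        self._value = value
        for notify in self._watchers:
            notify(value)
```

For example, a widget that calls `watch` with a callback sees the initial value first and then every later value passed to `set`.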

Project Structure

The project follows Clean Architecture principles:

lib/
├── core/                    # Business logic and infrastructure
│   ├── constants/          # App-wide constants and UI dimensions
│   ├── config/             # Type-safe configuration classes
│   ├── error/              # Enhanced error handling system
│   ├── interfaces/         # Service interfaces (VadServiceInterface, etc.)
│   ├── models/             # Data models with Hive serialization
│   ├── providers/          # Riverpod state providers
│   ├── services/           # External integrations (Gemini, VAD, Audio...)
│   ├── theme/              # Material Design 3 theming
│   ├── use_cases/          # Business use cases
│   └── utils/              # Helper utilities
│
├── features/               # Feature-based organization
│   ├── app_shell.dart      # Main app container with navigation
│   ├── dashboard/          # Statistics and transcription history
│   ├── dictionaries/       # Vocabulary management
│   ├── prompts/            # AI prompt templates
│   ├── recording/          # Voice recording UI with VAD
│   ├── settings/           # Settings management
│   └── transcriptions/     # Transcription cards and display
│
└── widgets/                # Reusable UI components
    ├── navigation/         # Navigation drawer
    └── shared/             # Cross-feature components

Note: The Flutter project root is the repository root. Run all Flutter commands from the recogniz.ing/ directory created by git clone, not from a subdirectory.


Keyboard Shortcuts

| Shortcut | Action |
|----------|--------|
| Ctrl+Shift+R | Start/Stop recording (configurable in Settings) |
| Hold Right Cmd | Push-to-talk: hold to record, release to transcribe (macOS, configurable) |
| Cmd/Ctrl+S | Save edited transcription |
| Cmd/Ctrl+1 | Go to Transcriptions |
| Cmd/Ctrl+2 | Go to Dashboard |
| Cmd/Ctrl+3 | Go to Dictionaries |
| Cmd/Ctrl+4 | Go to Prompts |
| Cmd/Ctrl+5 | Go to Settings |

Development

Code Quality

make analyze          # flutter analyze
make format           # flutter format .
make test             # Run all tests
make test-coverage    # Run tests with coverage
make test-single TEST=test/widget_test.dart  # Run specific test

Code Generation

make generate         # build_runner with delete-conflicting-outputs
# Run after modifying any model files with @HiveType annotations

Version Management

make version          # Show current version
make sync-version     # Sync pubspec.yaml from CHANGELOG.json (SSOT)
make changelog        # Generate CHANGELOG.md from CHANGELOG.json
make verify-changelog # Verify changelogs are in sync
make bump-patch       # Bump patch version (1.0.0 → 1.0.1)
make bump-minor       # Bump minor version (1.0.0 → 1.1.0)
make bump-major       # Bump major version (1.0.0 → 2.0.0)
make release          # Bump patch + deploy all platforms
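
The bump targets follow the usual MAJOR.MINOR.PATCH arithmetic. The Python sketch below reproduces the three examples listed above; the `bump` function is hypothetical and is not how the Makefile implements the targets.

```python
def bump(version: str, part: str) -> str:
    """Return `version` (MAJOR.MINOR.PATCH) with one part incremented."""
    major, minor, patch = (int(n) for n in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"  # lower parts reset to zero
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")
```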

Landing Page

The repository includes a Vue 3 + Vite + TailwindCSS landing page that deploys to recogniz.ing.

Deployment Architecture

xicv/recogniz.ing (Single Repository)
├── .github/workflows/
│   ├── release-all-platforms.yml  # Builds app, creates releases
│   ├── build-windows.yml          # Windows-specific builds
│   └── landing-deploy.yml         # Deploys to GitHub Pages
└── landing/                        # Vue 3 landing page
    ├── src/
    ├── public/downloads/
    │   └── manifest.json           # Version manifest
    ├── public/.nojekyll            # Required for GitHub Pages + Vite
    └── package.json

Tech Stack: Vue 3.5, Vite 6.0, TailwindCSS 3.4, TypeScript 5.5, PWA

Automated Release Flow

  1. Tag Push → Push a version tag (e.g., v1.15.1)
  2. Parallel Builds → GitHub Actions builds all platforms
  3. GitHub Release → Creates release with artifacts
  4. Update Manifest → Updates download manifest
  5. Deploy Landing → Triggers landing page deployment

Landing Page Development

cd landing
npm install
npm run dev     # Start dev server
npm run build   # Build for production

Changelog

This project uses a JSON-first changelog system where CHANGELOG.json is the source of truth and CHANGELOG.md is auto-generated.

Why JSON?

| Benefit | Explanation |
|---------|-------------|
| Programmatic Access | Easy to parse and render in UI components |
| Validation | Schema can be validated, reducing errors |
| Automation | CI/CD can easily read and process data |
| Single Source | Edit one file, generate the other |

Workflow

# 1. Bump version with entry template
make bump-patch-entry

# 2. Edit CHANGELOG.json with actual changes

# 3. Generate Markdown from JSON
make changelog

# 4. Commit both files together
git add CHANGELOG.json CHANGELOG.md pubspec.yaml
git commit -m "chore: bump version to X.Y.Z and update changelog"
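
Step 3 amounts to rendering Markdown from the JSON entries. The Python sketch below shows the idea under an assumed schema (a top-level `versions` list whose entries have `version` and `changes`); the real CHANGELOG.json layout and the generator behind `make changelog` may differ.

```python
import json


def render_changelog(raw_json: str) -> str:
    """Render Markdown from a JSON changelog; the schema used here is assumed."""
    data = json.loads(raw_json)
    lines = ["# Changelog", ""]
    for entry in data["versions"]:
        lines.append(f"## {entry['version']}")
        lines.append("")
        lines.extend(f"- {change}" for change in entry["changes"])
        lines.append("")
    return "\n".join(lines)
```

Because the Markdown is derived entirely from the JSON, a check like `make verify-changelog` can be as simple as regenerating the file and comparing it with the committed copy.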

For detailed version history, see CHANGELOG.md.


Troubleshooting

| Problem | Solution |
|---------|----------|
| "Microphone permission denied" | • macOS: System Settings → Privacy & Security → Microphone (System Preferences → Security & Privacy → Privacy → Microphone on macOS 12 and earlier)<br/>• iOS: Settings → Recogniz.ing → Microphone<br/>• Android: Settings → Apps → Recogniz.ing → Permissions |
| "API key invalid" | • Ensure you copied the full API key from Google AI Studio<br/>• Check that your API key has Gemini API access enabled<br/>• Verify network connectivity |
| "Global hotkey not working" | • macOS: Click the "Open System Settings" button in the accessibility banner; permission is auto-detected (no restart required)<br/>• Check for conflicting hotkeys in system settings |
| "Push-to-talk / auto-inject not working" | • macOS: Requires Accessibility permission (same as global hotkey)<br/>• Enable PTT in Settings → Push-to-Talk section<br/>• Auto-inject requires a focused text field in another app |
| "Transcription is empty" | • Ensure audio was captured (check recording duration)<br/>• Verify that custom vocabulary entries don't conflict with common words<br/>• Check network connection to Gemini API |

License

MIT License


Made with ❤️ by @xicv
