Complete codebase audit and improvements #28
Merged
Complete line-by-line audit of all 8 shell scripts (~20,869 lines) for 24/7/365 field deployment in wildlife audio recording.

Report includes:
- 3 Critical issues (watchdog, local buffering, API validation)
- 6 High priority issues (locking, timeouts, disk space, USB handling)
- 7 Medium priority issues (log rotation, config validation, etc.)
- Security analysis with current measures and recommendations
- Test coverage analysis with recommended additions
- Documentation gaps identified
- Anti-patterns and best practices assessment
- Deployment checklist and maintenance schedule

Overall assessment: Excellent production-hardening practices with comprehensive error handling. Issues identified are refinements rather than fundamental flaws.
Major production reliability improvements for 24/7 field deployment:

MONITORING ENHANCEMENTS:
- Add watchdog/heartbeat mechanism with optional hardware watchdog support (graceful fallback if hardware not available)
- Add disk space monitoring with configurable thresholds (80% warn, 95% critical)
- Add memory usage trending with leak detection (tracks growth over time)
- Add network connectivity monitoring with gateway ping check
- Add entropy monitoring to quick diagnostics (fixes stream setup delays)

API COMPATIBILITY:
- Add MediaMTX API version auto-detection (tries v3, v2, v1, legacy)
- Graceful fallback for different MediaMTX versions

USB DEVICE HANDLING:
- Add ALSA device availability check before stream restart
- Prevents restart loops when USB device is physically disconnected
- Grace period for device reconnection

RELIABILITY FIXES:
- Enhanced signal handling (SIGPIPE ignored, SIGHUP config reload, SIGUSR2 heartbeat)
- Configuration validation on startup with syntax checking
- Version compatibility checking between scripts
- Lock directory validation before file creation

NEW CONFIGURATION OPTIONS:
- HEARTBEAT_INTERVAL, ENABLE_HARDWARE_WATCHDOG
- DISK_SPACE_WARNING_PERCENT, DISK_SPACE_CRITICAL_PERCENT
- MEM_WARNING_PERCENT, MEM_GROWTH_THRESHOLD_MB
- NETWORK_CHECK_ENABLED, NETWORK_CHECK_TARGET
- USB_ALSA_CHECK_ENABLED, USB_DISCONNECT_GRACE_PERIOD
- MEDIAMTX_API_VERSION, MEDIAMTX_API_FALLBACK

All changes are backward compatible with existing deployments.
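For illustration, the disk space thresholds above could be applied roughly as follows. This is a minimal sketch, not the code from this PR: the function names classify_disk_usage and disk_used_percent are hypothetical, and treating a value equal to a threshold as already over it is an assumption. Only the two variable names and their 80/95 defaults come from the commit message.

```shell
#!/usr/bin/env bash
# Thresholds named in the commit message; defaults match "80% warn, 95% critical".
DISK_SPACE_WARNING_PERCENT="${DISK_SPACE_WARNING_PERCENT:-80}"
DISK_SPACE_CRITICAL_PERCENT="${DISK_SPACE_CRITICAL_PERCENT:-95}"

# Hypothetical helper: map a usage percentage to ok / warning / critical.
# Assumption: usage equal to a threshold counts as having reached it.
classify_disk_usage() {
    local used_percent="$1"
    # Reject empty or non-numeric input instead of letting the test operator fail.
    case "$used_percent" in
        ''|*[!0-9]*) echo "unknown"; return 2 ;;
    esac
    if [ "$used_percent" -ge "$DISK_SPACE_CRITICAL_PERCENT" ]; then
        echo "critical"
    elif [ "$used_percent" -ge "$DISK_SPACE_WARNING_PERCENT" ]; then
        echo "warning"
    else
        echo "ok"
    fi
}

# Hypothetical helper: read the used percentage for a mount point.
# "df -P" keeps each filesystem on one line, so field 5 is the capacity column.
disk_used_percent() {
    df -P "${1:-/}" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}
```

A monitor cycle could then call classify_disk_usage "$(disk_used_percent /)" and log or alert on anything other than "ok".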
Audio Buffering Features:
- Enable memory buffering by default via FFmpeg's -rtbufsize (32MB)
- Uses RAM-based buffering to reduce data loss during restarts
- No SD card wear - all buffering is in memory

Local Recording Feature (optional):
- Optional ring buffer recording alongside streaming
- Defaults to /dev/shm (tmpfs/RAM) for SD card wear protection
- Configurable segment duration (default: 5 min) and count (12 segments = 1 hour)
- Uses FFmpeg's tee muxer for simultaneous stream + record
- Optional disk persistence for long-term archival (AUDIO_DISK_PERSIST)

Configuration Options:
- AUDIO_BUFFER_ENABLED=true (default) - Enable memory buffering
- AUDIO_RTBUFSIZE=33554432 - Real-time buffer size (32MB default)
- AUDIO_LOCAL_RECORDING=false - Enable ring buffer recording
- AUDIO_RECORDING_PATH=/dev/shm/lyrebird-buffer - Buffer location (RAM)
- AUDIO_RECORDING_SEGMENT_TIME=300 - Segment duration (seconds)
- AUDIO_RECORDING_SEGMENTS=12 - Number of segments to keep
- AUDIO_DISK_PERSIST=false - Archive segments to disk (SD card wear!)
- AUDIO_DISK_PATH=/var/lib/mediamtx-ffmpeg/recordings

This addresses the concern about lost audio during stream restarts while protecting SD cards from excessive write wear.
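As a rough sketch of how these options might translate into an FFmpeg invocation, the function below only assembles and prints the arguments (one per line) and does not run ffmpeg. The function name build_ffmpeg_args, the libopus codec, the .ogg segment naming, and the example ALSA device and RTSP URL are all assumptions for illustration. The configuration variable names and defaults are the ones listed above. A real command may need further flags (for example, codec flags that the tee muxer cannot auto-detect).

```shell
#!/usr/bin/env bash
# Defaults as listed in the commit message.
AUDIO_BUFFER_ENABLED="${AUDIO_BUFFER_ENABLED:-true}"
AUDIO_RTBUFSIZE="${AUDIO_RTBUFSIZE:-33554432}"
AUDIO_LOCAL_RECORDING="${AUDIO_LOCAL_RECORDING:-false}"
AUDIO_RECORDING_PATH="${AUDIO_RECORDING_PATH:-/dev/shm/lyrebird-buffer}"
AUDIO_RECORDING_SEGMENT_TIME="${AUDIO_RECORDING_SEGMENT_TIME:-300}"
AUDIO_RECORDING_SEGMENTS="${AUDIO_RECORDING_SEGMENTS:-12}"

# Hypothetical helper: print ffmpeg arguments, one per line.
#   $1 = ALSA capture device, $2 = RTSP publish URL
build_ffmpeg_args() {
    local alsa_device="$1" rtsp_url="$2" segment_opts
    set -- -hide_banner
    if [ "$AUDIO_BUFFER_ENABLED" = "true" ]; then
        # Input-side real-time buffer, held in RAM.
        set -- "$@" -rtbufsize "$AUDIO_RTBUFSIZE"
    fi
    # Codec choice is an assumption; the commit message does not name one.
    set -- "$@" -f alsa -i "$alsa_device" -c:a libopus
    if [ "$AUDIO_LOCAL_RECORDING" = "true" ]; then
        # segment_wrap makes the segment muxer reuse file indexes,
        # which gives ring-buffer behaviour (12 x 300 s = 1 hour by default).
        segment_opts="f=segment:segment_time=${AUDIO_RECORDING_SEGMENT_TIME}:segment_wrap=${AUDIO_RECORDING_SEGMENTS}"
        # The tee muxer needs an explicit -map and takes outputs separated by "|".
        set -- "$@" -map 0:a -f tee \
            "[f=rtsp]${rtsp_url}|[${segment_opts}]${AUDIO_RECORDING_PATH}/segment_%03d.ogg"
    else
        set -- "$@" -f rtsp "$rtsp_url"
    fi
    printf '%s\n' "$@"
}
```

Calling build_ffmpeg_args "plughw:1,0" "rtsp://localhost:8554/mic" with the defaults prints a stream-only command line. Setting AUDIO_LOCAL_RECORDING=true swaps the final output for a tee target that streams and writes wrapped segments under AUDIO_RECORDING_PATH.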
SECURITY ENHANCEMENTS:
- Add library integrity verification (LYREBIRD_COMMON_EXPECTED_HASH)
- Checksum validation before sourcing lyrebird-common.sh
- Security Hardening section in README with:
  - MediaMTX authentication configuration
  - Firewall rules examples
  - Interface binding recommendations

MONITORING IMPROVEMENTS:
- Add audio level monitoring for dead/silent mic detection
- Uses FFmpeg volumedetect filter to check audio levels
- Configurable silence threshold and warning duration
- Tracks silence duration across monitor cycles

DOCUMENTATION ADDITIONS:
- Recovery Procedures section in README
  - Quick recovery steps for common issues
  - Complete system recovery procedure
  - Rollback instructions
  - Emergency field recovery (no internet)

LOG IMPROVEMENTS:
- Add timezone indicator to all log timestamps
- Format: YYYY-MM-DD HH:MM:SS TZ

TESTING:
- Add tests/ directory with bats-core unit tests
  - test_lyrebird_common.bats: Tests for shared library
  - test_stream_manager.bats: Tests for stream manager functions
- Tests for version comparison, PID validation, sanitization

NEW CONFIGURATION OPTIONS:
- LYREBIRD_COMMON_EXPECTED_HASH - Library integrity verification
- AUDIO_LEVEL_CHECK_ENABLED - Enable silence detection
- AUDIO_LEVEL_SAMPLE_DURATION - Seconds to sample
- AUDIO_SILENCE_THRESHOLD_DB - dB threshold for silence
- AUDIO_SILENCE_WARN_DURATION - Seconds before warning
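The checksum-before-sourcing idea could look something like the sketch below. It assumes SHA-256 via sha256sum, and that an empty expected hash means verification is skipped. The commit message specifies neither, and the function name source_verified_library is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: source a library only if its checksum matches.
#   $1 = path to the library (e.g. lyrebird-common.sh)
#   $2 = expected hash (e.g. "$LYREBIRD_COMMON_EXPECTED_HASH"); empty skips the check
# Assumption: the hash is a SHA-256 hex digest as printed by sha256sum.
source_verified_library() {
    local lib_path="$1" expected_hash="$2" actual_hash
    if [ ! -r "$lib_path" ]; then
        echo "ERROR: library not readable: $lib_path" >&2
        return 1
    fi
    if [ -n "$expected_hash" ]; then
        actual_hash="$(sha256sum "$lib_path" | awk '{ print $1 }')"
        if [ "$actual_hash" != "$expected_hash" ]; then
            echo "ERROR: checksum mismatch for $lib_path" >&2
            return 1
        fi
    fi
    # Only reached when the file is readable and the hash (if any) matched.
    # shellcheck disable=SC1090
    . "$lib_path"
}
```

A caller might use source_verified_library "/path/to/lyrebird-common.sh" "${LYREBIRD_COMMON_EXPECTED_HASH:-}" || exit 1, where the path is a placeholder for wherever the library is installed.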
- SC2034: SCRIPT_COMPAT_VERSION reserved for external version checking
- SC2120: validate_config intentionally accepts optional parameter
- SC1090: Dynamic config file sourcing is validated before use
No description provided.