Add multi-DB assistant with auto-monitor, analyse, session monitor, SQL tuning & snapshot compare #15
Open
devin-ai-integration[bot] wants to merge 19 commits into
- app.py: Main CLI loop with rich terminal output, argument parsing
- llm_client.py: Ollama API client for LLM communication
- mcp_client.py: MCP PostgreSQL server client for query execution
- sql_generator.py: Prompt engineering, SQL extraction, and safety validation
- requirements.txt: Python dependencies (requests, rich)
- README.md: Architecture docs, usage examples, installation instructions

Features:
- Natural language to SQL via Ollama (codellama model)
- Schema-aware prompt engineering
- SQL safety enforcement (SELECT-only, blocks dangerous keywords)
- Retry logic for failed SQL generation
- Rich formatted output with timing metrics
- Interactive CLI commands (help, schema, clear, exit)
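The "SELECT-only, blocks dangerous keywords" rule above could look roughly like the sketch below. This is an illustration under assumptions, not the code in sql_generator.py: the function name `is_safe_select` and the contents of `BLOCKED_KEYWORDS` are hypothetical, since the PR does not list the actual keywords.

```python
import re

# Hypothetical keyword list; the real list in sql_generator.py is not shown in this PR.
BLOCKED_KEYWORDS = (
    "INSERT", "UPDATE", "DELETE", "DROP", "ALTER",
    "TRUNCATE", "CREATE", "GRANT", "REVOKE",
)


def is_safe_select(sql: str) -> bool:
    """Return True only for a single statement that starts with SELECT
    and contains none of the blocked keywords as whole words."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input, or more than one statement
    if not re.match(r"(?i)select\b", statement):
        return False  # anything that does not start with SELECT is rejected
    upper = statement.upper()
    # \b keeps identifiers such as update_time from matching UPDATE
    return not any(re.search(rf"\b{kw}\b", upper) for kw in BLOCKED_KEYWORDS)
```

A keyword check of this kind is deliberately conservative: it also rejects a harmless query whose string literal happens to contain a blocked word, which is usually an acceptable trade-off for LLM-generated SQL.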
…ofiles
- Replace CLI (app.py) with Streamlit web UI
- Replace MCP client with direct PostgreSQL connection via psycopg2 (db_client.py)
- Add connection profile manager for save/load DB configs (profile_manager.py)
- Update requirements.txt with streamlit, psycopg2-binary, pandas
- Update README with new architecture and usage docs
- Keep llm_client.py and sql_generator.py unchanged
- Refactor db_client.py with abstract BaseDBClient, PostgreSQLClient, OracleClient
- Add oracledb driver support (thin mode, no Oracle Client needed)
- Add db_type dropdown in profile manager and connection sidebar
- Add auto_monitor.py: periodic tablespace monitoring, auto-extend datafiles (max 20GB/file)
- Add auto_analyse.py: AWR/pg_stat_statements analysis with LLM summary + action plan
- Update sql_generator.py for dual-DB SQL dialects
- Update Streamlit UI with Auto Monitor and Auto Analyse tabs
- Update requirements.txt with oracledb dependency
- Update README.md with new architecture and features
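The abstract-base-plus-implementations layout described in this commit can be sketched as follows. The class names and `create_db_client()` come from the PR description; the `dialect()` method, the `"postgresql"`/`"oracle"` strings, and the no-argument constructors are assumptions made only to keep the example self-contained (the real clients wrap psycopg2 and oracledb connections).

```python
from abc import ABC, abstractmethod


class BaseDBClient(ABC):
    """Common interface for both back ends (dialect() is a hypothetical method)."""

    @abstractmethod
    def dialect(self) -> str:
        """Return the SQL dialect name used when building prompts."""


class PostgreSQLClient(BaseDBClient):
    def dialect(self) -> str:
        return "postgresql"


class OracleClient(BaseDBClient):
    def dialect(self) -> str:
        return "oracle"


# Maps the db_type value chosen in the UI to a client class (values assumed).
_CLIENTS = {"postgresql": PostgreSQLClient, "oracle": OracleClient}


def create_db_client(db_type: str) -> BaseDBClient:
    """Factory: pick the client class from the db_type field."""
    try:
        return _CLIENTS[db_type.lower()]()
    except KeyError:
        raise ValueError(f"Unsupported db_type: {db_type!r}") from None
```

Keeping the lookup in one dictionary means that adding a third database type would touch only the mapping and the new class.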
…n UI
- Default timeout increased from 120s to 300s (first model load is slow)
- Added timeout slider (60-600s) in Ollama Settings sidebar
- Improved timeout error message with troubleshooting hint
- Update Oracle system prompt to use ROWNUM instead of FETCH FIRST/OFFSET (compatible with Oracle 11g+, fixes ORA-00933)
- Increase MAX_RETRIES from 2 to 3 for SQL generation
- Add auto-retry in Query tab: when a query fails with a DB error, the error is fed back to the LLM to regenerate corrected SQL automatically
- Explicit Oracle syntax guidance: NVL, DUAL, TO_DATE, subquery for ORDER BY + ROWNUM
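The auto-retry behaviour described above (feed the database error back to the LLM and regenerate) might be structured like this minimal sketch. The function `run_with_retry` and its two callables are hypothetical stand-ins for the real generator and client; reusing `MAX_RETRIES = 3` for the Query-tab retry is also an assumption, since the commit only states that value for SQL generation.

```python
from typing import Callable, Optional

MAX_RETRIES = 3  # value from the commit message; reuse here is an assumption


def run_with_retry(
    question: str,
    generate_sql: Callable[[str, Optional[str]], str],
    execute: Callable[[str], list],
    max_retries: int = MAX_RETRIES,
) -> list:
    """Generate SQL, run it, and on a DB error regenerate with the error text."""
    last_error: Optional[str] = None
    for _ in range(max_retries):
        # On the first pass last_error is None; later passes include the DB error.
        sql = generate_sql(question, last_error)
        try:
            return execute(sql)
        except Exception as exc:  # a real client would catch its driver's error type
            last_error = str(exc)
    raise RuntimeError(f"Query failed after {max_retries} attempts: {last_error}")
```

Passing the generator and executor in as callables keeps the loop independent of Ollama and of either database driver, which also makes it easy to exercise with fakes.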
- Oracle: AWR snap ID range selector (queries DBA_HIST_SNAPSHOT, collects DBA_HIST_SQLSTAT/SYSTEM_EVENT/SYSSTAT for selected range)
- PostgreSQL: pgProfile sample ID range selector (queries profile.samples, collects profile.stmt_list/wait_sampling_total for selected range)
- PostgreSQL: latest pg_stat_statements one-click analysis with extension check
- Both: file upload for AWR HTML/text, pg_stat_statements CSV, pgProfile reports
- Auto Analyse tab now has radio button mode selector per DB type
- Parsed report text shown in expander when no raw data available
Oracle's oracledb driver returns column names in UPPERCASE by default. Normalize to lowercase in OracleClient.execute_query() so all downstream code (AWR snap selector, auto_analyse, etc.) can use lowercase keys consistently.
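A minimal sketch of that normalization, assuming rows are turned into dictionaries keyed by column name. The helper name `rows_to_lowercase_dicts` is hypothetical; with a DB-API cursor the column names would come from the first element of each `cursor.description` entry.

```python
from typing import Any, Dict, Iterable, List, Sequence


def rows_to_lowercase_dicts(
    column_names: Sequence[str], rows: Iterable[Sequence[Any]]
) -> List[Dict[str, Any]]:
    """Build one dict per row, lowercasing the column names once up front."""
    keys = [name.lower() for name in column_names]
    return [dict(zip(keys, row)) for row in rows]
```

With a live cursor this would be called as `rows_to_lowercase_dicts([d[0] for d in cursor.description], cursor.fetchall())`, so downstream code can read `row["snap_id"]` regardless of the driver's casing.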
- Oracle: collect top CPU SQL (v$sql by cpu_time), full table scans (v$sql_plan TABLE ACCESS FULL), existing indexes (all_indexes + all_ind_columns with LISTAGG), stale stats (all_tab_statistics), and execution plans (v$sql_plan detail for top 5 sql_ids)
- PostgreSQL: collect top CPU queries (pg_stat_statements with blk_read_time/temp_blks), seq scan tables (pg_stat_user_tables with avg rows per scan), existing indexes (pg_indexes with DDL), stale stats/vacuum (dead tuples, last_analyze), lock waits (pg_stat_activity)
- Rewrote LLM system prompt to require SQL-ID-specific analysis: high-CPU SQL with exact sql_id/queryid, full table scan tables with causing sql_id, missing index CREATE statements referencing the queryid that benefits, stale stats with ANALYZE/DBMS_STATS commands, unused index DROP statements, and numbered action plan with exact SQL commands and expected improvement
Session/Lock Monitor (session_monitor.py):
- Active sessions view (v$session / pg_stat_activity)
- Blocking lock tree with recursive hierarchy (CONNECT BY for Oracle, recursive CTE for PostgreSQL)
- Lock details (v$lock / pg_locks with object names)
- Long-running queries (>5s threshold)
- Wait event chains
- Kill/cancel session UI (ALTER SYSTEM KILL SESSION for Oracle, pg_cancel_backend/pg_terminate_backend for PostgreSQL)

SQL Tuning Advisor (sql_tuning_advisor.py):
- Paste any SQL, runs EXPLAIN PLAN (Oracle) or EXPLAIN (PostgreSQL)
- Extracts tables from plan, collects per-table metadata: column stats, existing indexes, table stats, clustering factor
- PostgreSQL: optional EXPLAIN ANALYZE with actual execution stats
- LLM prompt requires step-by-step plan analysis, root cause, specific CREATE INDEX statements, SQL rewrite suggestions, stats maintenance commands, and numbered action plan

Updated app.py with two new tabs in the UI.
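Because EXPLAIN ANALYZE on PostgreSQL really executes the statement, the advisor's "paste any SQL" path is a natural place for a guard. The sketch below is one possible shape for it, not the code in sql_tuning_advisor.py: the function name `build_explain`, the `db_type` strings, and the choice of `EXPLAIN (ANALYZE, BUFFERS)` are assumptions.

```python
import re


def build_explain(sql: str, db_type: str, analyze: bool = False) -> str:
    """Wrap one pasted statement in the right EXPLAIN form for the database."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        raise ValueError("Expected exactly one SQL statement")
    if db_type == "oracle":
        # EXPLAIN PLAN stores the plan without running the statement
        return f"EXPLAIN PLAN FOR {statement}"
    if db_type != "postgresql":
        raise ValueError(f"Unsupported db_type: {db_type!r}")
    if analyze:
        # EXPLAIN ANALYZE executes the statement, so refuse anything but SELECT
        if not re.match(r"(?i)select\b", statement):
            raise ValueError("EXPLAIN ANALYZE is only allowed for SELECT statements")
        return f"EXPLAIN (ANALYZE, BUFFERS) {statement}"
    return f"EXPLAIN {statement}"
```

The prefix check is only a first line of defence, since a SELECT can still call functions with side effects; running the EXPLAIN ANALYZE inside a transaction that is rolled back afterwards is a common additional precaution.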
…ysis, exclude system queries, 500-char SQL text
…ueries (PG 17+ compat)
…olders, add data-grounding instructions
…ead of system prompt (codellama is a completion model, not instruction-following)
… Python code now identifies all issues (high elapsed SQL, full table scans, sequence caching, stale stats, unused indexes, etc.) with real sql_ids, table names, and query text
- LLM only provides a brief supplementary summary of pre-identified findings
- Same hybrid approach applied to snapshot comparison
…LLM summary
- Add top_cpu_queries/top_cpu_sql section (most important - always shows top SQL)
- Add top_queries/top_elapsed_sql section (deduped from CPU section)
- Add database_stats overview (cache hit ratio, connections, temp usage)
- Add connection_stats section (idle connection detection)
- Add Oracle system_stats with cache hit ratio, hard parse ratio, disk sorts
- Add Oracle SGA configuration, tablespace I/O, redo log switches, temp usage
- Add Oracle execution plans display with full scan/hash join detection
- Add Oracle parallel queries section
- Add pgProfile wait events section
- Add table_stats (top tables by activity) section
- Add AWR/pgProfile fallback for top SQL sections
- Remove LLM summary entirely (codellama keeps hallucinating generic advice)
- Update app.py labels: 'Performance Analysis Report' instead of 'AI Analysis'
- All analysis is now 100% programmatic from real DB data
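As an illustration of what a programmatic check in the database_stats section could look like, the sketch below computes a PostgreSQL cache hit ratio from the `blks_hit` and `blks_read` counters of `pg_stat_database` and flags it against the 95% threshold mentioned in the review checklist. The function names and the wording of the finding are hypothetical; the actual logic in auto_analyse.py is not shown in this PR.

```python
from typing import Optional

# Threshold quoted in the review checklist; it may not suit every environment.
CACHE_HIT_WARN_PCT = 95.0


def cache_hit_ratio(blks_hit: int, blks_read: int) -> Optional[float]:
    """Percentage of block requests served from shared buffers, or None if idle."""
    total = blks_hit + blks_read
    if total == 0:
        return None  # no block activity yet, so the ratio is undefined
    return 100.0 * blks_hit / total


def cache_hit_finding(blks_hit: int, blks_read: int) -> Optional[str]:
    """Return a finding string when the ratio is below the threshold."""
    ratio = cache_hit_ratio(blks_hit, blks_read)
    if ratio is None or ratio >= CACHE_HIT_WARN_PCT:
        return None
    return f"Cache hit ratio {ratio:.1f}% is below {CACHE_HIT_WARN_PCT:.0f}%"
```

Returning `None` for "nothing to report" lets the report builder skip the section entirely instead of printing an empty heading.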
…, config review, prioritised actions
…le, AWR, pg_stat_statements)
Summary
Adds a Python tool under `tools/pg-assistant/` that converts natural language questions into SQL queries using a local Ollama LLM and executes them against PostgreSQL or Oracle databases via a Streamlit web UI. Includes automated tablespace monitoring with auto-extend, fully programmatic performance analysis, live session/lock monitoring, an AI-powered SQL tuning advisor, and side-by-side snapshot comparison with Plotly visualizations.

Modules:
- `app.py`: Streamlit web UI with sidebar for connection/profile management, tabbed interface (Query, Schema, Auto Monitor, Auto Analyse, Sessions & Locks, SQL Tuning Advisor, Compare Snapshots, History)
- `db_client.py`: Abstract `BaseDBClient` with `PostgreSQLClient` (psycopg2) and `OracleClient` (oracledb thin mode) implementations, plus `create_db_client()` factory
- `llm_client.py`: Ollama REST API client (`/api/generate`)
- `sql_generator.py`: Prompt engineering with schema injection, dual-DB system prompts (PostgreSQL/Oracle SQL dialects), SQL extraction, keyword-based safety validation, retry logic
- `profile_manager.py`: Save/load/delete connection profiles as JSON (`~/.pg-assistant/profiles.json`) with `db_type` and `service_name` fields
- `auto_monitor.py`: `TablespaceMonitor` class: periodic tablespace usage checks (configurable interval, default 1hr), auto-extend Oracle datafiles up to 20 GB per file, PostgreSQL storage size reporting
- `auto_analyse.py`: `PerformanceAnalyser` class: live V$/pg_stat_* collection, AWR snap-ID range analysis (Oracle), pgProfile sample-ID range analysis (PostgreSQL), latest pg_stat_statements snapshot, uploaded report file parsing (AWR HTML/text, CSV, pgProfile). Analysis is 100% programmatic: Python code extracts real findings from DB data with specific SQL IDs, table names, query text, and exact fix commands. No LLM involved in analysis (codellama was hallucinating generic advice). Includes best-practice checks for row contention, sequence caching, high elapsed time, full table scans, high execution count, temp usage, and 30+ other sections.
- `session_monitor.py`: `SessionMonitor` class: active sessions, blocking lock tree, lock details, long-running queries, wait events, and kill/cancel session for both Oracle and PostgreSQL
- `sql_tuning_advisor.py`: `SQLTuningAdvisor` class: EXPLAIN PLAN execution, per-table metadata collection (columns, indexes, statistics), and LLM-powered tuning recommendations with specific index/rewrite/maintenance suggestions
- `snapshot_compare.py`: `SnapshotComparator` class: compare two AWR (Oracle) or pgProfile (PostgreSQL) snapshot ranges, compute delta metrics, generate Plotly bar/pie charts for visual comparison, and produce programmatic differential analysis (no LLM)
- `requirements.txt`: `requests`, `psycopg2-binary`, `oracledb`, `streamlit`, `pandas`, `plotly`
- `README.md`: Architecture, usage, installation docs

Key behaviors:
- … `information_schema` for PG, `ALL_TAB_COLUMNS` for Oracle) and injects it into every LLM prompt
- … `DROP`, `DELETE`, `UPDATE`, etc.) and enforces SELECT/WITH-only queries in the natural language path
- … `ALTER TABLESPACE`, `ALTER DATABASE DATAFILE`)
- … `db_type` field
- … `OracleClient.execute_query()` for consistent downstream access
- `_build_findings_report()` covers 30+ data sections, all extracted from real DB data with specific SQL IDs, table names, query text, and exact fix commands

Updates since last revision
Restructured analysis output to enterprise DBA / Copilot-quality format:
- `auto_analyse.py`: Complete rewrite of `_build_findings_report()` to produce structured, severity-grouped output:
  - … `pg_settings`/`v$parameter` with risk flags (e.g. `statement_timeout=0`, high `max_connections`)
  - … `pg_stat_wal` (WAL volume, FPI, sync time; PG 14+), `pg_total_relation_size` with TOAST breakdown, idle-in-transaction sessions, `pg_settings` configuration parameters, `pg_stat_replication` lag
  - … `v$parameter` configuration, `v$session` idle sessions (> 5 min)
- `snapshot_compare.py`: Removed dead LLM code (`_format_comparison_text()` and `_get_llm_comparison()` methods)
- `app.py`: Updated all spinner text and button labels to remove "LLM" references from analysis paths (analysis is fully programmatic, LLM is only used for SQL generation and SQL Tuning Advisor)

Review & Testing Checklist for Human
- … `ruff` lint/format checks only. The Streamlit UI, database connections, all eight tabs, and all Oracle/PostgreSQL query paths have not been run against actual Ollama, PostgreSQL, or Oracle services. This is the highest-risk item.
- `_build_findings_report()` is ~850 lines of untested analysis logic: it accesses specific dictionary keys from query results (e.g., `row.get("sql_id")`, `row.get("xact_duration_sec")`). If queries return different key names or column structures, bottlenecks will silently be empty. Hardcoded thresholds (cache hit < 95%, rollback rate > 10% = SEV-1, seq scans > 100, etc.) may not suit all environments. Critically verify the output contains real SQL IDs, real table names, and real metrics from your database, not empty sections.
- `pg_stat_wal` is PG 14+ only (version check exists), but `pg_stat_replication` requires replication privileges, and `v$parameter` and `v$session` require DBA-level access on Oracle. Verify all new queries work with your user's grant level.
- `snapshot_compare.py` is ~950 lines of untested code: it contains complex SQL queries for Oracle AWR and pgProfile, delta computation logic, and Plotly chart generation. Column names, join conditions, and WHERE filters have not been validated against a real database.
- … `ALTER SYSTEM KILL SESSION` (Oracle) and `pg_terminate_backend` (PostgreSQL) behind a single button click. There is a warning label but no "Are you sure?" confirmation step.
- `EXPLAIN ANALYZE` actually executes the query: a user could paste a write statement (INSERT/UPDATE/DELETE) and `EXPLAIN ANALYZE` on PostgreSQL would run it. The UI has a caution warning but no SQL validation on this path.
- `auto_monitor.py` runs `ALTER DATABASE DATAFILE ... AUTOEXTEND ON` and `ALTER TABLESPACE ... ADD DATAFILE` automatically when thresholds are exceeded.
- `profile_manager.py`: database passwords are saved as plaintext JSON in `~/.pg-assistant/profiles.json`.
- `auto_analyse.py` and `snapshot_compare.py` assume pgProfile tables (`profile.samples`, `profile.stmt_list`, `profile.sample_statements`, `profile.wait_sampling_total`) with specific column names. pgProfile's schema varies by version.

Suggested test plan: Run `streamlit run app.py` with Ollama (codellama) running and with both a PostgreSQL and an Oracle instance reachable. Verify:

Notes
- … `from db_client import ...`) require running from the `tools/pg-assistant/` directory. Will break if invoked from elsewhere or installed as a package.
- `oracledb` thin mode does not require Oracle Client installation but may not support all Oracle features (e.g. Advanced Queuing, Continuous Query Notification).
- `psycopg2` connection objects stored in Streamlit `session_state` may not survive all rerun edge cases; the `is_connected` property mitigates this with a health-check query but manual reconnection may occasionally be needed.
- `README.md` architecture diagram does not yet reflect the Session Monitor, SQL Tuning Advisor, or Compare Snapshots modules.
- … `.replace()`/`.format()` for placeholders. Values originate from DB query results (not direct user input) but are passed through Streamlit selectbox → integer → string interpolation. Similarly, `session_monitor.py` Oracle kill session uses `.format(sid=sid, serial=serial)`; the values come from `st.number_input` (integer-constrained) but the module itself doesn't validate types.
- … `pg_monitor` role or superuser. Users with limited grants will get permission errors.
- `LLMClient` is still imported and passed to `PerformanceAnalyser` and `SnapshotComparator` constructors for API compatibility, even though neither class uses it for analysis anymore. Minor tech debt; could be cleaned up in a follow-up.

Link to Devin session: https://partner-workshops.devinenterprise.com/sessions/75db244b07ca4a3db4c6563dafd2cafc
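One way to address the note about unvalidated `sid`/`serial` values is to check types inside the module before any string interpolation. The sketch below is a suggestion, not the current `session_monitor.py` code; the function names are hypothetical. The Oracle statement is built by interpolation because DDL such as `ALTER SYSTEM` does not accept bind variables, while the PostgreSQL call can use a normal psycopg2 `%s` parameter.

```python
from typing import Tuple


def _as_positive_int(value: object, name: str) -> int:
    """Accept only real positive integers (bool is rejected explicitly)."""
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(f"{name} must be a positive integer, got {value!r}")
    return value


def oracle_kill_session_sql(sid: object, serial: object) -> str:
    """Build the kill statement only from validated integers."""
    sid_int = _as_positive_int(sid, "sid")
    serial_int = _as_positive_int(serial, "serial")
    return f"ALTER SYSTEM KILL SESSION '{sid_int},{serial_int}'"


def postgres_terminate_sql(pid: object) -> Tuple[str, Tuple[int]]:
    """Return a parameterized query and its arguments for cursor.execute()."""
    return "SELECT pg_terminate_backend(%s)", (_as_positive_int(pid, "pid"),)
```

Validating in the module rather than relying on `st.number_input` keeps the guarantee in place even if the functions are later called from somewhere other than the Streamlit UI.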