Skip to content

Add K8s Agent: Streamlit-based on-prem Kubernetes cluster management UI#16

Open
devin-ai-integration[bot] wants to merge 31 commits into
mainfrom
devin/1775468395-k8s-agent-streamlit
Open

Add K8s Agent: Streamlit-based on-prem Kubernetes cluster management UI#16
devin-ai-integration[bot] wants to merge 31 commits into
mainfrom
devin/1775468395-k8s-agent-streamlit

Conversation

@devin-ai-integration

@devin-ai-integration devin-ai-integration Bot commented Apr 6, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds a new k8s-agent/ Streamlit application for managing on-premises Kubernetes clusters. The tool provides a web UI with the following capabilities:

  • Profile Manager — CRUD for cluster profiles storing node definitions (control-plane/worker), SSH credentials, K8s/CRI-O versions, network config, custom storage paths, and proxy settings. Supports importing existing clusters via kubeconfig upload. Profiles are persisted as JSON files on disk with 0600 permissions to protect embedded kubeconfig credentials.
  • Cluster Creation — Generates and executes bash scripts over SSH to provision kubeadm-based clusters with CRI-O runtime and Flannel CNI. Users input which nodes are control-plane vs. worker; the agent SSHes into each to install packages, init the control plane, join workers, and apply hardening (NetworkPolicies, RBAC, ResourceQuotas, PodSecurity). CRI-O storage, kubelet data directory, and log paths are configurable to use dedicated disks instead of default /var/lib. Proxy/alternate proxy settings are injected into the master node environment during provisioning. Includes a Reset Cluster tab to tear down and optionally re-provision.
  • Cluster Debugger — Pre-built and custom diagnostic commands executed via SSH (provisioned clusters) or kubectl (imported clusters), with optional LLM-powered root cause analysis.
  • Monitoring Setup — One-click Helm-based deployment of kube-prometheus-stack, Grafana dashboard imports, PrometheusRule alert definitions, and metrics-server / kube-state-metrics installation. Works with both provisioned and imported clusters.
  • Log Analysis — Multi-source log collection (kubelet, CRI-O, API server, etcd, Flannel, CoreDNS), error pattern extraction, cross-source correlation with 30-second temporal window, optional LLM-powered analysis, and Smart Log Analysis (LogAI-inspired ML pipeline — see below). SSH-only sources (Kubelet, CRI-O) are automatically filtered out for imported clusters. Pod Logs tab includes pod and container dropdowns fetched live from the cluster. Istio/Envoy access log analysis with response time analytics (see below).
  • Resource Viewer — Browse live cluster resources: Pods, ConfigMaps, Deployments, Services, DestinationRules, Ingresses, DaemonSets, StatefulSets, and more. Includes Deployment Scaling, Pod Shell, Resource Requests/Limits, Node Containers (with pod distribution per node), RBAC Viewer, Node Health Overview, Events Timeline, Pod Restart Tracker, and PVC/Storage Dashboard.
  • Multi-Cluster Dashboard — Home page showing all profiles at a glance with status, node counts, health checks, and quick actions.
  • Certificate Manager — View cluster certificate expiration (kubeadm certs + TLS secrets), cert-manager status, and renewal guide.
  • Cost Optimizer — Analyze resource usage vs requests/limits, right-sizing recommendations, and idle resource detection.
  • Upgrade Planner — Plan and review Kubernetes version upgrades with pre-upgrade checks and step-by-step guidance.
  • AI Assistant — Streaming chat interface for Kubernetes Q&A. Supports OpenAI-compatible endpoints and local Ollama instances (no API key required).

All remote operations go through subprocess-based SSH (provisioned clusters) or local kubectl with kubeconfig (imported clusters). LLM calls support both OpenAI-compatible and Ollama chat APIs.

Updates since last revision

  • New: Ollama LLM support — The LLM integration now supports two providers, selectable from the sidebar LLM Settings panel:

    • OpenAI-compatible (default) — Any endpoint that speaks the OpenAI chat completions API (e.g. Infosys AI Gateway). Requires API URL + API key.
    • Ollama (local) — Connect to a local Ollama instance (default: http://10.73.98.113:11434). No API key required.

    Ollama-specific features:

    • "Fetch available models" button queries GET /api/tags and populates a model dropdown
    • Streaming uses Ollama's newline-delimited JSON format (not OpenAI SSE)
    • Non-streaming uses Ollama's POST /api/chat with {"message": {"content": "..."}} response format
    • Temperature is passed via options.temperature; max tokens via options.num_predict
    • Provider selection and Ollama URL/model are applied at runtime (mutates config.LLM_PROVIDER, config.OLLAMA_BASE_URL, config.OLLAMA_MODEL module-level variables)
    • All "LLM not configured" messages across the app now mention both providers

    Configuration via env vars: LLM_PROVIDER=ollama, OLLAMA_BASE_URL=http://..., OLLAMA_MODEL=llama3. Or configure entirely via the sidebar UI.

    Files changed: config.py (new OLLAMA_BASE_URL, OLLAMA_MODEL, get_active_llm_url(), get_active_model()), modules/llm_client.py (refactored into _build_messages/_build_headers/_build_payload helpers, provider-aware query_llm/stream_llm, new list_ollama_models()), app.py (sidebar LLM Settings panel redesigned with provider toggle).

Previous updates

  • Fix: profile switching broken (Set Active button + sidebar dropdown) — The previous fix (deleting the profile_selector widget key before st.rerun()) was insufficient because Streamlit ignores the index parameter when a widget key already exists in session state — it reads the stale value from session_state[key] instead. The new fix:

    1. Pre-syncs session_state["profile_selector"] with active_profile before the selectbox widget is instantiated, so Streamlit reads the correct value.
    2. Adds an on_change callback to keep active_profile in sync when the user changes the dropdown directly.
    3. Deletes the profile_selector key in all code paths that modify active_profile (Set Active button, Create Profile, Import Cluster, Delete Profile) — the previous fix only covered the Set Active button.
    4. Handles edge cases: first render (key doesn't exist yet), deleted profiles (stale key value not in options list).

    Tested locally by creating two profiles and swapping between them via both the sidebar dropdown and the "Set Active" button in Manage Profiles — see recording below.

    Profile switching test

View original video (rec-9a37d52576914ae9aab09d68fb7260e5-edited.mp4)

  • "Collect pod logs" mode in Smart Log Analysis — Smart Log Analysis now has three modes: "Collect from cluster" (system-level logs), "Collect pod logs", and "Paste logs". The new mode provides a guided workflow to analyze Istio sidecar access logs or any pod's logs:

    • Select namespace (dropdown if namespaces are available, text input fallback)
    • Click "Load Pods" to fetch all pods in that namespace
    • Select pod from dropdown (shows pod name + status)
    • Select container from dropdown — istio-proxy is pre-selected when present, making Istio access log analysis one-click
    • Choose time range (15m / 1h / 6h / 24h) and line count (100–10,000)
    • Click "Fetch Pod Logs & Analyze" to collect logs and feed them into the Smart Analysis pipeline
    • Pipeline auto-detects Istio/Envoy access logs and shows response time percentiles, status code distribution, per-path breakdown, slow requests, and timeline charts
  • Fix: "Set Active" profile button crash — The previous fix (directly assigning st.session_state.profile_selector = profile.name) caused a StreamlitAPIException because Streamlit prohibits modifying widget state after the widget is instantiated. New fix: the button now deletes the profile_selector widget key from session state before calling st.rerun(), allowing the sidebar selectbox to reinitialize from the updated active_profile value via its index parameter.

  • Fix: kubeconfig path not quoted in shell commandscluster_debugger.py:_run_local_kubectl and config.py:fetch_namespaces now quote the kubeconfig path in shell commands (--kubeconfig="{path}"), consistent with cluster_creator.py. Prevents breakage if DATA_DIR contains spaces.

  • Fix: profile JSON files written with 0600 permissionssave_profile() now uses os.open() with 0o600 mode instead of default open(), restricting read access to the file owner. Profile JSON files contain embedded kubeconfig credentials for imported clusters.

  • Istio/Envoy access log analysis — Smart Log Analysis now auto-detects Istio/Envoy access logs and shows comprehensive response time analytics:

    • Latency percentiles (p50, p90, p95, p99, avg, min, max) and error rate
    • Status code distribution (table + pie chart) and status class summary (2xx/3xx/4xx/5xx)
    • Response flags distribution (Envoy-specific flags like UF, UH, DC, etc.)
    • Latency distribution histogram with percentile marker lines
    • Per-path response time breakdown (request count, latency percentiles, error rate per API path)
    • Per-upstream service statistics (upstream service time vs total duration)
    • Slow requests table (requests above p95 threshold)
    • Request timeline (per-minute bar chart with latency overlay)
    • Supports both text-format (standard Envoy) and JSON-format Istio logs
    • Auto-detection heuristic requires 30% of lines to parse successfully to avoid false positives
  • Removed Helm Releases and Network Policies tabs from Resource Viewer per user request. These tabs and all associated code have been removed.

  • Removed init containers from Resource Requests/Limits — The "Container Resource Requests & Limits" tab now only shows application containers, excluding init containers. The "Init" column has been removed from the table and TSV export.

  • Pod count per node in Node Containers tab — For imported clusters, the Node Containers tab now shows a Pod Distribution Across Nodes summary with per-node pod counts displayed as metrics, total/avg/min/max/spread statistics, and a distribution health check (warns if spread exceeds 50% of the average, indicating uneven scheduling). Each node expander also shows its pod count in the header.

  • Pod & container dropdowns in Pod Logs tab — The Pod Logs tab (Log Analysis page) now has a "Load Pods" button that fetches all pods in the selected namespace. Once loaded, the pod name input switches to a dropdown showing pod names with their status. Selecting a pod also populates a container dropdown.

  • Smart Log Analysis (LogAI-inspired) — ML-powered log analysis (TF-IDF + DBSCAN clustering, anomaly detection, Drain-style pattern mining, auto-summarization, log volume timeline). Uses scikit-learn natively.

  • Multi-Cluster Dashboard, Certificate Manager, Cost Estimator, Pod Restart Tracker, PVC/Storage Dashboard — Six advanced features added.

  • Fix: unquoted paths in rm -rf during cluster reset — All reset rm -rf commands now quote the path.

  • Fix: unsanitized profile name in kubeconfig path — Profile name is now sanitized via re.sub(r"[^\w.-]", "_", profile.name).

  • Fix: proxy /etc/environment format for pam_env — Generates plain KEY=VALUE lines (not export KEY=VALUE).

  • kubectl/helm auto-detection, namespace auto-fetch, kubeconfig import, K8s/CRI-O version 1.35, Pod Security Standard explanations, LLM made fully optional, offline manifest uploads, step-by-step SSH provisioning, enriched cluster details, flash messages, feedback messages, metrics components, deployment scaling, pod shell, resource name dropdowns, node containers (crictl), cluster reset.

Review & Testing Checklist for Human

  • Security: SSH and kubectl command injectionrun_ssh_command passes user-supplied strings (hostnames, IPs, custom commands, namespace names, storage paths, proxy URLs) directly into shell commands via f-string interpolation with no sanitization. run_custom_command allows arbitrary shell execution. The Pod Shell tab passes user-typed commands to kubectl exec. The Node Containers tab allows free-text CRI commands passed directly to run_ssh_command. While reset paths, kubeconfig paths, and profile names are now quoted/sanitized, the broader command injection surface remains.
  • Ollama integration not tested against a real instance — The Ollama provider code (/api/chat, /api/tags, NDJSON streaming) was written from Ollama API docs but has not been tested against the actual Ollama endpoint at http://10.73.98.113:11434. Response format parsing (especially streaming NDJSON vs OpenAI SSE) could fail silently or produce garbled output. The list_ollama_models function swallows all exceptions and returns an empty list, which could hide connectivity issues.
  • Ollama runtime config mutation — The sidebar mutates config.LLM_PROVIDER, config.OLLAMA_BASE_URL, and config.OLLAMA_MODEL at module level on every Streamlit rerun. This works for single-user sessions but could cause race conditions if Streamlit is serving multiple concurrent users (each rerun overwrites the same module globals). The OpenAI provider settings (LLM_API_URL, LLM_API_KEY, LLM_MODEL) are still read-only from env vars.
  • No real cluster testing — All features (provisioning, resetting, debugging, monitoring, Smart Log Analysis, Istio analysis, "Collect pod logs" mode, pod distribution, profile switching, Ollama LLM, all Resource Viewer tabs) have not been tested against a live cluster. Profile switching was tested locally with dummy profiles (no real kubeconfig). Key fragile areas: (a) Istio log regex may not match all Envoy configurations; (b) Pod count per node makes N+1 kubectl calls (one per node) which could be slow on large clusters; (c) Cost Optimizer right-sizing parses kubectl top pods --containers text by column position; (d) Smart Log Analysis TF-IDF + DBSCAN on real log volumes; (e) "Collect pod logs" mode parses get_pod_list output by whitespace column positions — may break if pod names/statuses contain unexpected formatting.
  • Profile switching widget state complexity — The sidebar selectbox now pre-syncs session_state["profile_selector"] before instantiation and uses an on_change callback. Four separate code paths (Set Active, Create Profile, Import Cluster, Delete Profile) all delete profile_selector before st.rerun(). If a future code path sets active_profile without deleting the widget key, the stale selectbox value will silently override the intended profile.
  • Security: kubeconfig stored in plaintext JSONkubeconfig_content (full kubeconfig YAML including cluster credentials) is stored in profile JSON files at data/profiles/. File permissions are now restricted to 0600, but the content itself is unencrypted.
  • Cluster reset is destructive and irreversible — The Reset Cluster tab runs kubeadm reset, rm -rf on kubelet/CRI-O/etcd data, and flushes all iptables rules. The only safeguard is typing "RESET" in a text input.

Suggested test plan:

  1. Ollama integration: Open LLM Settings in sidebar → select "Ollama (local)" → enter http://10.73.98.113:11434 → click "Fetch available models" → verify model list populates → select a model → go to AI Assistant → send a message → verify streaming response renders correctly. Then try Cluster Debugger AI analysis and Log Analysis AI analysis with Ollama to verify non-streaming query_llm also works.
  2. Provider switching: Switch between OpenAI and Ollama providers in the sidebar. Verify no widget state errors. Switch to Ollama, send a message in AI Assistant, switch back to OpenAI, verify the app doesn't crash.
  3. Import a real cluster via kubeconfig → click "Set Active" in Manage Profiles tab → verify sidebar dropdown switches to the new profile. Then switch to a different profile via the sidebar dropdown. Then switch back via "Set Active". Verify no StreamlitAPIException or stale profile state.
  4. Go to Log Analysis → Smart Log Analysis → select "Collect pod logs" → select a namespace → click "Load Pods" → verify pod dropdown populates → select a pod with an Istio sidecar → verify istio-proxy is pre-selected in container dropdown → click "Fetch Pod Logs & Analyze" → verify logs are fetched and Istio access log analysis is triggered automatically.
  5. In Smart Log Analysis, switch between all three modes ("Collect from cluster", "Collect pod logs", "Paste logs") and verify no widget state errors occur.
  6. Go to Resource Viewer → Resource Requests/Limits tab → verify init containers are NOT shown.
  7. Go to Resource Viewer → Node Containers tab → click "Show containers per node" → verify pod count metrics appear per node with distribution summary.
  8. Verify profile JSON files in data/profiles/ have 600 permissions (not world-readable).

Notes

  • The logai Python package is not imported at runtime. Smart Log Analysis reimplements key algorithms using scikit-learn directly.
  • The Istio log parser supports two formats: standard Envoy text format (regex-based) and JSON format. The auto-detection heuristic requires 30% of lines to parse successfully.
  • The _run_on_cluster() pattern is duplicated across three modules (cluster_debugger.py, monitoring_setup.py, log_analyzer.py). Consider extracting to a shared utility.
  • SSH_ONLY_COMMANDS and SSH_ONLY_LOG_SOURCES are hardcoded sets that must be updated if new commands/sources are added.
  • Profile storage is local filesystem — not suitable for multi-user production deployment without a shared backend.
  • The sidebar LLM Settings panel now actively mutates config.* module globals at runtime for Ollama settings. OpenAI settings remain env-var–based and read-only.
  • The Smart Log Analysis health_color variable is assigned but never used.
  • For imported clusters, the Node Containers tab shows pod-level info via kubectl get pods --field-selector spec.nodeName=... which is not equivalent to crictl ps -a.
  • Heavy dependencies: scikit-learn>=1.3.0, numpy>=1.24.0, and pandas>=2.0.0 significantly increase install size.
  • The import re as _re inside run_kubectl is a function-level import; consider moving to the top-level import block.
  • The Ollama default base URL (http://10.73.98.113:11434) is hardcoded in config.py. This is a user-specific internal IP that should likely be empty by default for other deployments.

Link to Devin session: https://partner-workshops.devinenterprise.com/sessions/f7a6f611aedb4141ba2e24adecaee0b3


Open with Devin

- Profile Manager: CRUD for cluster profiles with node definitions (control-plane/worker), SSH credentials
- Cluster Creation: SSH-based provisioning with CRI-O, Flannel CNI, kubeadm, best practices hardening
- Cluster Debugger: Diagnostic commands with AI-powered root cause analysis and recommendations
- Monitoring Setup: One-click Prometheus + Grafana deployment with dashboards and alerting rules
- Log Analysis: Multi-source log collection, error pattern extraction, cross-source correlation
- AI Assistant: Chat interface powered by LLM for Kubernetes questions
- Integrated with Infosys AI Gateway for LLM capabilities
@devin-ai-integration

Copy link
Copy Markdown
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

- Remove unused NodeInfo import from app.py
- Remove unused pyyaml and pandas from requirements.txt
- Add crio_root, crio_runroot, kubelet_root, log_root fields to ClusterProfile
- Add http_proxy, https_proxy, no_proxy, http_proxy_alt, https_proxy_alt fields
- Update generated scripts to configure CRI-O storage paths via crio.conf.d
- Update control-plane init script to use custom audit log dir and kubelet root
- Add proxy env vars to common setup and control-plane init scripts
- Add Storage Paths and Proxy Settings sections to Profile Manager UI
- Show storage/proxy details in Manage Profiles view and profile summary
…est uploads

- Add is_llm_configured() helper to detect when LLM is not set up
- Make all LLM imports lazy to avoid errors when LLM deps missing
- Guard all AI-powered UI features with is_llm_configured() checks
- Show informative fallback messages when LLM is not configured
- Add Offline Manifests tab for uploading Flannel YAML and other files
- Add flannel_manifest_path/prometheus_manifest_path to ClusterProfile
- SCP user-provided Flannel manifest to nodes during provisioning
- Core features (cluster creation, debugging, monitoring, logs) work without LLM
devin-ai-integration[bot]

This comment was marked as resolved.

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 6 new potential issues.

View 10 additional findings in Devin Review.

Open in Devin Review

Comment thread k8s-agent/modules/cluster_creator.py Outdated
Comment thread k8s-agent/modules/cluster_creator.py
Comment thread k8s-agent/modules/cluster_creator.py
Comment thread k8s-agent/modules/cluster_creator.py
Comment thread k8s-agent/modules/cluster_creator.py
Comment thread k8s-agent/config.py
Comment on lines +119 to +120
with open(kc_path, "w") as f:
f.write(kubeconfig_content)

@devin-ai-integration devin-ai-integration Bot Apr 7, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Kubeconfig credentials written to disk with world-readable permissions

All temporary kubeconfig files are written with plain open(path, "w") which creates files using the process umask (typically 0o644 — owner rw, group r, others r). This means any user on the system can read the kubeconfig, which often contains bearer tokens or client certificates granting full cluster access.

This contrasts with profile_manager.py:85 which correctly uses os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600) for profile files (that also embed kubeconfig_content). The same restricted-permission pattern should be used for all kubeconfig files.

All affected locations writing kubeconfig with insecure permissions
  • config.py:146-147 (fetch_namespaces)
  • modules/cluster_debugger.py:78-79 (_run_local_kubectl)
  • modules/log_analyzer.py:43-44 (_run_local_shell)
  • modules/monitoring_setup.py:39-40 (_run_local_shell)
  • modules/cluster_creator.py:1403-1404 (run_kubectl)
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View 18 additional findings in Devin Review.

Open in Devin Review

Comment thread k8s-agent/modules/cluster_debugger.py Outdated
Comment on lines +77 to +80
kubeconfig_path = config.get_kubeconfig_path("_debug_temp")
with open(kubeconfig_path, "w") as f:
f.write(kubeconfig_content)
full_cmd = f"{kubectl} --kubeconfig={kubeconfig_path} {kubectl_args}"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Race condition: shared temp kubeconfig files cause operations to target wrong cluster

The _run_local_kubectl and _run_local_shell functions in the debugger, log analyzer, and monitoring modules all write kubeconfig content to fixed-name temp files (_debug_temp, _log_temp, _monitor_temp), then execute kubectl commands referencing those files. In a multi-user Streamlit deployment, concurrent sessions operating on different imported clusters will overwrite each other's kubeconfig files. If session A writes cluster-A's kubeconfig to _debug_temp.kubeconfig and session B overwrites it with cluster-B's kubeconfig before session A's kubectl command reads it, session A's command runs against cluster B. This could cause diagnostics, monitoring commands, or log collection to execute against the wrong cluster.

Affected locations
  • modules/cluster_debugger.py:77 — uses _debug_temp
  • modules/log_analyzer.py:42 — uses _log_temp
  • modules/monitoring_setup.py:38 — uses _monitor_temp
  • config.py:117 — uses _ns_fetch

Note that cluster_creator.py:1398-1400 correctly uses a per-profile filename, showing the fix pattern.

Prompt for agents
The _run_local_kubectl function in cluster_debugger.py (and the identical _run_local_shell functions in log_analyzer.py and monitoring_setup.py) writes kubeconfig content to a shared temp file with a static name (_debug_temp, _log_temp, _monitor_temp respectively), then runs kubectl referencing that file. This creates a TOCTOU race condition in multi-session Streamlit deployments.

The fix pattern already exists in cluster_creator.py:1398-1400 where run_kubectl uses a per-profile filename. Apply the same approach to all four affected locations:

1. In cluster_debugger.py _run_local_kubectl (line 77): Instead of get_kubeconfig_path('_debug_temp'), use a unique path per invocation. Options: use tempfile.NamedTemporaryFile with delete=False (and clean up after), or use a session-specific or profile-specific name.

2. In log_analyzer.py _run_local_shell (line 42): Same fix.

3. In monitoring_setup.py _run_local_shell (line 38): Same fix.

4. In config.py fetch_namespaces (line 117): Same fix.

The simplest approach is to use Python's tempfile.NamedTemporaryFile(suffix='.kubeconfig', dir=kc_dir, delete=False) to create a unique file per call, then clean up in a try/finally block. Alternatively, pass the profile name through to these functions and use it in the filename, similar to cluster_creator.py.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View 19 additional findings in Devin Review.

Open in Devin Review

"""
is_helm = command.strip().startswith("helm ")

if profile.kubeconfig_content:

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 run_kubectl routes on kubeconfig_content alone, inconsistent with all other routing functions

The run_kubectl function at k8s-agent/modules/cluster_creator.py:1392 checks only if profile.kubeconfig_content: to decide whether to run commands locally vs. via SSH. Every other routing function in the codebase — _run_on_cluster in log_analyzer.py:66, _run_on_cluster in monitoring_setup.py:62, run_diagnostic in cluster_debugger.py:159, run_custom_command in cluster_debugger.py:227, and check_pod_issues in cluster_debugger.py:335 — consistently checks both profile.cluster_source == "imported" and profile.kubeconfig_content. This means a provisioned cluster that also has its kubeconfig stored (e.g., fetched post-provisioning) would use local kubectl in run_kubectl but SSH in all other functions. If the local machine can't reach the API server directly (e.g., it's in a private network only reachable via SSH), run_kubectl commands would fail while debugger/monitoring/log commands succeed via SSH. The function's own docstring also states the intent is routing for imported clusters.

Suggested change
if profile.kubeconfig_content:
if profile.cluster_source == "imported" and profile.kubeconfig_content:
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

…codes, per-path/upstream breakdowns, slow requests
… Limits, add pod count per node, fix Set Active profile button
devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

…ectbox renders, add on_change callback, delete profile_selector on all profile state changes
…ama connection, model fetching, streaming support
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants