1091 ibm cpd scheduler not deleted by cp4d delete instancesh#1096
luigimolinaro wants to merge 8 commits into main
The delete_ibm_scheduler() function was using hardcoded 'ibm-scheduling' instead of the PROJECT_SCHEDULING_SERVICE variable. This caused the scheduler namespace to not be deleted when running cp4d-delete-instance.sh.

Changes:
- Use PROJECT_SCHEDULING_SERVICE variable with default 'cpd-scheduler'
- Update confirmation message to use the variable
- Align with cpd_vars.sh standard variable naming

Fixes #1091

Signed-off-by: Luigi Molinaro <luigi.molinaro@ibm.com>
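As a minimal sketch of the fallback described above, shell parameter expansion can supply the default when the variable is unset or empty. The helper name `scheduler_namespace` is hypothetical and is not taken from the script.

```shell
# Hypothetical helper: resolve the scheduler namespace from the environment,
# falling back to 'cpd-scheduler' when PROJECT_SCHEDULING_SERVICE is unset or empty.
scheduler_namespace() {
  printf '%s\n' "${PROJECT_SCHEDULING_SERVICE:-cpd-scheduler}"
}
```

Calling `scheduler_namespace` in both the confirmation message and the delete call would keep the two in sync.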
Enhanced delete_ibm_scheduler() to handle stuck namespaces in Terminating state:
- Use PROJECT_SCHEDULING_SERVICE variable (default: cpd-scheduler)
- Export namespace to JSON and remove kubernetes finalizers
- Use OpenShift REST API to force finalize the namespace
- Clean up temporary JSON file after operation
- Align with best practices from focedeletens.sh script

This ensures the scheduler namespace is properly deleted even when stuck with finalizers.

Fixes #1091

Signed-off-by: Luigi Molinaro <luigi.molinaro@ibm.com>
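The export, edit, and finalize steps listed above could look roughly like the following. This is a sketch, not the code from the commit: the function name `force_finalize_namespace` is hypothetical, and it assumes `oc` is logged in with sufficient rights and that `jq` is installed.

```shell
# Sketch only: force-finalize a namespace stuck in Terminating.
# Assumes 'oc' is logged in to the cluster and 'jq' is available on PATH.
force_finalize_namespace() {
  local ns="$1"
  local ns_json tmp_json rc=0

  if [ -z "$ns" ]; then
    echo "namespace name required" >&2
    return 1
  fi

  # Export the namespace definition; stop if it cannot be read.
  ns_json="$(oc get namespace "$ns" -o json)" || return 1
  tmp_json="$(mktemp)" || return 1

  # Empty spec.finalizers, then submit the edited object to the
  # namespace's 'finalize' subresource through the REST API.
  printf '%s' "$ns_json" | jq '.spec.finalizers = []' > "$tmp_json" \
    && oc replace --raw "/api/v1/namespaces/${ns}/finalize" -f "$tmp_json" \
    || rc=1

  # Always remove the temporary file, whatever the outcome.
  rm -f "$tmp_json"
  return "$rc"
}
```

Forcing finalization skips whatever cleanup the finalizer was waiting for, so a caller would normally reserve it for namespaces that are already stuck.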
Enhanced cp4d-delete-instance.sh with multiple improvements for reliable namespace deletion:

NEW FEATURES:
- Added --force-finalizer option to enable forced finalizer removal via OpenShift REST API
- Added --timeout option to configure namespace deletion timeout (default: 900s)
- Implemented automatic retry logic with up to 3 attempts when timeout is reached
- Added comprehensive diagnostic output when namespace deletion is stuck

ROBUST CLEANUP FUNCTIONS:
- force_remove_resource_finalizers(): Removes finalizers from blocking resources
  * PersistentVolumeClaims (PVCs)
  * PersistentVolumes (PVs) associated with namespace
  * Pods stuck in Terminating state (force delete with grace-period=0)
  * Services with finalizers
  * ConfigMaps with finalizers
  * Secrets with finalizers
- diagnose_namespace_stuck(): Provides detailed diagnostic information
  * Lists all remaining resources in namespace
  * Shows namespace status and finalizers
  * Identifies terminating pods
  * Lists PVCs that may be blocking deletion

ENHANCED WAIT LOGIC:
- Configurable timeout with progress logging every 60 seconds
- Automatic retry with forced cleanup when timeout is reached
- Shorter timeout (300s) for retry attempts
- Better error handling and return codes

NAMESPACE DELETION IMPROVEMENTS:
- Applied force_remove_finalizers() to all namespace deletion functions:
  * delete_operator_ns (CP4D operators)
  * delete_instance_ns (CP4D instance)
  * delete_knative (knative-eventing, knative-serving)
  * delete_app_connect (ibm-app-connect)
  * delete_ibm_scheduler (cpd-scheduler) - now uses PROJECT_SCHEDULING_SERVICE variable
  * delete_ibm_license_server (ibm-licensing)
  * delete_ibm_certificate_manager (ibm-cert-manager)
  * delete_common_services_control (cs-control)

BUG FIXES:
- Fixed delete_ibm_scheduler() to use PROJECT_SCHEDULING_SERVICE variable instead of hardcoded 'ibm-scheduling'
- Removed duplicate namespace deletion attempts in licensing and cert-manager functions

USAGE:
  ./cp4d-delete-instance.sh <namespace>
  ./cp4d-delete-instance.sh -n <namespace> --force-finalizer --timeout 1200

Signed-off-by: Luigi Molinaro <luigi.molinaro@ibm.com>
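The wait-and-retry behaviour described in this commit (configurable timeout, progress every 60 seconds, up to 3 attempts, 300s for retries) might be sketched as below. The function names `wait_ns_gone` and `delete_ns_with_retry` and the `POLL_INTERVAL` variable are illustrative assumptions, not names from the script, and the forced-cleanup step is only marked with a comment.

```shell
# Sketch only; names and POLL_INTERVAL are assumptions for illustration.

# Poll until the namespace no longer exists or the timeout (seconds) elapses.
wait_ns_gone() {
  local ns="$1" timeout="${2:-900}"
  local interval="${POLL_INTERVAL:-5}" waited=0 next_log=60
  while oc get namespace "$ns" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    # Report progress roughly once per minute.
    if [ "$waited" -ge "$next_log" ]; then
      echo "Still waiting for namespace ${ns} (${waited}s of ${timeout}s)"
      next_log=$((next_log + 60))
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}

# Request deletion, then wait; retry up to 3 times with a shorter timeout.
delete_ns_with_retry() {
  local ns="$1" timeout="${2:-900}"
  local max_attempts=3 attempt=1
  oc delete namespace "$ns" --wait=false --ignore-not-found >/dev/null
  while [ "$attempt" -le "$max_attempts" ]; do
    if wait_ns_gone "$ns" "$timeout"; then
      return 0
    fi
    echo "Attempt ${attempt}/${max_attempts}: ${ns} still present after ${timeout}s" >&2
    # A forced cleanup of blocking resources would be triggered here.
    timeout=300
    attempt=$((attempt + 1))
  done
  return 1
}
```

A non-zero return from `delete_ns_with_retry` lets the caller decide whether to print diagnostics or abort.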
Added color-coded logging functions with visual indicators:
- ✓ Green (log_success): Successful operations
- ✗ Red (log_error): Errors and failures
- ⚠ Yellow (log_warning): Warnings and timeouts
- ℹ Cyan (log_info): Informational messages

Applied colored logging throughout the script:
- Success messages when namespaces are deleted
- Warnings for timeouts and stuck namespaces
- Errors for failed deletion attempts
- Info messages for cleanup operations and diagnostics

This makes it much easier to quickly identify the status of operations during namespace deletion, especially when dealing with stuck resources.

Signed-off-by: Luigi Molinaro <luigi.molinaro@ibm.com>
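A minimal sketch of such logging helpers follows. The four function names come from the commit message; the shared `_log` helper, the exact escape codes, and sending errors to stderr are assumptions of this sketch.

```shell
# ANSI escape sequences (interpreted by printf's %b).
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[0;33m'
CYAN='\033[0;36m'
NC='\033[0m'

# Hypothetical shared helper: color, symbol, then the message words.
_log() {
  local color="$1" symbol="$2"
  shift 2
  printf '%b%s [%s] %s%b\n' "$color" "$symbol" "$(date '+%Y-%m-%d %H:%M:%S')" "$*" "$NC"
}

log_success() { _log "$GREEN" "✓" "$@"; }
log_warning() { _log "$YELLOW" "⚠" "$@"; }
log_info()    { _log "$CYAN" "ℹ" "$@"; }
# Sending errors to stderr is a choice made for this sketch.
log_error()   { _log "$RED" "✗" "$@" >&2; }
```

For example, `log_info "Using parallel deletion mode"` prints the symbol, a timestamp in brackets, and the message in cyan.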
f42326c to a8e277d
Implemented parallel deletion mode to significantly speed up namespace cleanup:

NEW OPTION:
- --parallel: Enable parallel deletion of multiple namespaces

NEW FUNCTIONS:
- start_ns_deletion(): Initiates namespace deletion in background (non-blocking)
- wait_multiple_ns_deleted(): Waits for multiple namespaces to complete deletion in parallel
  * Monitors all namespaces simultaneously
  * Progress logging every 60 seconds
  * Applies forced cleanup to stuck namespaces if --force-finalizer is enabled
  * Individual status reporting for each namespace

PARALLEL DELETION STRATEGY:
1. Instance and operator namespaces deleted sequentially (dependencies)
2. Cluster-wide namespaces deleted in parallel:
   - knative-eventing
   - knative-serving
   - ibm-app-connect
   - cpd-scheduler
   - ibm-licensing (if not shared)
   - ibm-cert-manager (if not shared)
   - cs-control

PERFORMANCE IMPROVEMENT:
- Sequential mode: ~15-30 minutes (namespaces deleted one by one)
- Parallel mode: ~5-10 minutes (multiple namespaces deleted simultaneously)
- Up to 3x faster for environments with many namespaces

USAGE:
  # Sequential deletion (default, original behavior)
  ./cp4d-delete-instance.sh -n cpd-instance
  # Parallel deletion (faster)
  ./cp4d-delete-instance.sh -n cpd-instance --parallel
  # Parallel with force finalizer
  ./cp4d-delete-instance.sh -n cpd-instance --parallel --force-finalizer

BACKWARD COMPATIBILITY:
- Default behavior unchanged (sequential deletion)
- Parallel mode only activated with --parallel flag
- All existing options work with both modes

Signed-off-by: Luigi Molinaro <luigi.molinaro@ibm.com>
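The two functions named in this commit could be approximated as follows. The bodies are a sketch under assumptions, not the committed code: `POLL_INTERVAL` is a hypothetical variable, progress logging is reduced to per-namespace completion messages, and the forced-cleanup step is left out.

```shell
# Sketch only. Request deletion without blocking; the cluster continues
# the deletion after 'oc' returns.
start_ns_deletion() {
  oc delete namespace "$1" --wait=false --ignore-not-found >/dev/null
}

# Usage: wait_multiple_ns_deleted <timeout-seconds> <ns> [<ns> ...]
wait_multiple_ns_deleted() {
  local timeout="$1"
  shift
  local interval="${POLL_INTERVAL:-5}" waited=0 remaining ns
  while [ "$#" -gt 0 ]; do
    remaining=""
    # Check every pending namespace in the same polling round.
    for ns in "$@"; do
      if oc get namespace "$ns" >/dev/null 2>&1; then
        remaining="${remaining} ${ns}"
      else
        echo "Namespace ${ns} deleted"
      fi
    done
    if [ -z "$remaining" ]; then
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "Timed out waiting for:${remaining}" >&2
      return 1
    fi
    # Namespace names cannot contain whitespace, so word splitting is safe.
    # shellcheck disable=SC2086
    set -- $remaining
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}
```

A caller would invoke `start_ns_deletion` once per cluster-wide namespace and then pass the same names to a single `wait_multiple_ns_deleted` call.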
fketelaars
left a comment
- "Enables forced removal of Kubernetes finalizers via OpenShift REST API": why use the REST API when other parts of the code use the oc patch command?
- When the command is run against a non-existing CP4D instance namespace, it tries to delete the ' ' namespace and waits 900 seconds
- Parallel should be the default
fketelaars
left a comment
This is what I mean:
cp4d-delete-instance.sh cpd --parallel
About to delete the following from the cluster:
- Instance namespace:
- Operator namespace:
- IBM Custom Resource Definitions
Are you sure (y/N)? y
ℹ [2026-03-13 07:12:21] Using parallel deletion mode for faster execution
error: resource(s) were provided, but no name was specified
[2026-03-13 07:12:21] Getting Custom Resources in OpenShift project ...
You must specify the type of resource to get. Use "oc api-resources" for a complete list of supported resources.
error: Required resource not specified.
Use "oc explain <resource>" for a detailed description of that resource (e.g. oc explain pods).
See 'oc get -h' for help and examples
[2026-03-13 07:12:22] Delete all Custom Resources except the base ones
[2026-03-13 07:12:22] Delete remaining Custom Resources
[2026-03-13 07:12:22] Delete role binding if Cloud Pak for Data was connected to IAM
error: resource(s) were provided, but no name was specified
error: the server doesn't have a resource type "authentication"
[2026-03-13 07:12:22] Waiting for deletion of namespace (timeout: 900s)...
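One way to avoid the situation shown in this log, where the namespace names are blank and the script still waits 900 seconds, would be to reject empty values before any `oc` call. The helper below is a hypothetical sketch; its name and the error wording are not from the script.

```shell
# Hypothetical guard: fail when a resolved value is empty or only whitespace.
require_nonempty() {
  local label="$1" value="$2"
  case "$value" in
    *[![:space:]]*)
      return 0
      ;;
    *)
      echo "Error: ${label} is empty; refusing to continue" >&2
      return 1
      ;;
  esac
}
```

For example, `require_nonempty "Instance namespace" "$ns" || exit 1` placed before the confirmation prompt would stop the run early.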
My assumption is that the environment variables normally used to manage Cloud Pak for Data are set. For example, variables like:

If these variables are not defined, the script has no way to determine the namespaces and resources it should operate on. These variables are usually obvious when Cloud Pak is installed without the deployer, because they are explicitly configured. However, when using the Cloud Pak Deployer, users often rely only on the deployer abstraction and may not be aware of these underlying variables.

I'm currently thinking about how we could improve the script so it can discover these values automatically instead of relying on environment variables. Let me think about the best way to implement this.
Major improvements:
- Auto-discovery of CP4D namespaces by finding ZenService resources
- Flexible pattern matching for namespace variants (licensing, cert-manager, scheduler)
- --dry-run mode for safe testing without deletion
- Enhanced confirmation summary with detailed resource counts
- Improved help documentation with comprehensive usage examples
- Fixed argument parsing to properly handle flags like --dry-run
- Fixed variable scope issues (removed invalid 'local' declarations)
- Smart namespace detection excludes OpenShift system namespaces
- No default assumptions for optional namespaces (scheduler, cert-manager, licensing)
- Single confirmation prompt with comprehensive deletion summary
- Better error messages when auto-discovery fails

The script now provides a much safer and more user-friendly experience for deleting CP4D instances.

I think I found a way. Please check.
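The argument-parsing and --dry-run points above could be handled along these lines. The option names come from this pull request; the variable names (`INSTANCE_NS`, `DRY_RUN`, and so on) and the `run_cmd` wrapper are assumptions made for this sketch.

```shell
# Sketch only: accept the namespace either positionally or with -n,
# in any order relative to the flags.
parse_args() {
  INSTANCE_NS=""
  DRY_RUN=false
  PARALLEL=false
  FORCE_FINALIZER=false
  TIMEOUT=900
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -n)
        [ "$#" -ge 2 ] || { echo "-n requires a value" >&2; return 1; }
        INSTANCE_NS="$2"
        shift 2
        ;;
      --timeout)
        # Require a non-empty, all-digit value.
        case "${2:-}" in
          ''|*[!0-9]*) echo "--timeout requires a number of seconds" >&2; return 1 ;;
        esac
        TIMEOUT="$2"
        shift 2
        ;;
      --dry-run) DRY_RUN=true; shift ;;
      --parallel) PARALLEL=true; shift ;;
      --force-finalizer) FORCE_FINALIZER=true; shift ;;
      -*) echo "Unknown option: $1" >&2; return 1 ;;
      *) INSTANCE_NS="$1"; shift ;;
    esac
  done
}

# Hypothetical wrapper: print the command in dry-run mode, run it otherwise.
run_cmd() {
  if [ "$DRY_RUN" = true ]; then
    echo "[dry-run] $*"
  else
    "$@"
  fi
}
```

Routing every destructive call through a wrapper such as `run_cmd` keeps the dry-run decision in one place.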
fketelaars
left a comment
I'm not sure if the changes are heading in the right direction. Concerns:
- cert-manager and the operator namespace are Red Hat. I don't want this script to touch these, only the IBM certificate manager if it exists.
- I really don't want auto-discovery for the CP4D instance namespace; this is too risky IMO. Let users specify the namespace and assume that the operator namespace is <instance_namespace>-operators or it must be specified at command line or environment variable
- There are situations where instance namespace was deleted (or pending deletion) and the operator namespace was not, also vice-versa. Don't limit the script by assuming that these are still there; I want it to clean up residuals as well, even if orphaned.
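The precedence suggested in this review (command line, then environment variable, then the `<instance_namespace>-operators` convention) could be sketched as below. The function name is hypothetical, and using `PROJECT_CPD_INST_OPERATORS` as the environment variable is an assumption based on the cpd_vars.sh naming mentioned earlier in this pull request.

```shell
# Hypothetical helper: pick the operator namespace without any discovery.
# $1 = instance namespace, $2 = optional value given on the command line.
resolve_operator_ns() {
  local instance_ns="$1" cli_value="${2:-}"
  if [ -n "$cli_value" ]; then
    printf '%s\n' "$cli_value"
  elif [ -n "${PROJECT_CPD_INST_OPERATORS:-}" ]; then
    # Assumed environment variable name; adjust to the script's convention.
    printf '%s\n' "$PROJECT_CPD_INST_OPERATORS"
  else
    printf '%s\n' "${instance_ns}-operators"
  fi
}
```

Because the result is derived from the name rather than from the cluster, it stays usable when either namespace is already gone or pending deletion.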
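Bug Fixes
Fixed Scheduler Namespace Deletion
- Previously used the hardcoded ibm-scheduling namespace instead of the configurable variable
- Now uses the PROJECT_SCHEDULING_SERVICE variable (default: cpd-scheduler)
Removed Duplicate Deletion Attempts
- Removed duplicate namespace deletion attempts in delete_ibm_license_server() and delete_ibm_certificate_manager()
New Features
1. Force Finalizer Removal (--force-finalizer)
2. Configurable Timeout (--timeout <SECONDS>)
- Example: --timeout 1200 for a 20-minute timeout
3. Automatic Retry Logic
- Forced cleanup is applied to stuck namespaces when --force-finalizer is enabled
4. Parallel Namespace Deletion (--parallel)
5. Colored Output
- ✓ Green (log_success): Successful operations
- ✗ Red (log_error): Errors and failures
- ⚠ Yellow (log_warning): Warnings and timeouts
- ℹ Cyan (log_info): Informational messages
Robust Cleanup Functions
force_remove_resource_finalizers()
Removes finalizers from resources that commonly block namespace deletion:
- PersistentVolumeClaims (PVCs)
- PersistentVolumes (PVs) associated with the namespace
- Pods stuck in Terminating state (force delete with --grace-period=0)
- Services, ConfigMaps, and Secrets with finalizers
diagnose_namespace_stuck()
Provides comprehensive diagnostic information when namespaces are stuck:
- Lists all remaining resources in the namespace
- Shows namespace status and finalizers
- Identifies terminating pods
- Lists PVCs that may be blocking deletion
Enhanced wait_ns_deleted()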