Astropy caching #1190
Open
rcboufleur wants to merge 19 commits into
- Add benchmark.py: Section-based timing for identifying bottlenecks
- Add resource_monitor.py: CPU, memory, I/O, and load monitoring
- Instrument occ_path_coeff.py with benchmark and resource monitoring
- Enable with BENCHMARK_ENABLED=1 and RESOURCE_MONITOR=1 env vars
- All monitoring is non-intrusive and easily removable (marked with comments)
- Reverted experimental optimizations that compromised scientific accuracy
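A minimal sketch of what section-based timing gated by `BENCHMARK_ENABLED=1` could look like. The class layout, the `section()` method, and the `timings` dict are assumptions for illustration; the actual benchmark.py in this PR may be structured differently.

```python
import os
import time
from contextlib import contextmanager


class Benchmark:
    """Accumulates wall-clock time per named section (hypothetical API)."""

    def __init__(self):
        # Active only when BENCHMARK_ENABLED=1; otherwise every call is a no-op.
        self.enabled = os.environ.get("BENCHMARK_ENABLED") == "1"
        self.timings = {}

    @contextmanager
    def section(self, name):
        if not self.enabled:
            # Disabled: run the wrapped code without measuring anything.
            yield
            return
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the wrapped code raised.
            elapsed = time.perf_counter() - start
            self.timings[name] = self.timings.get(name, 0.0) + elapsed
```

Wrapping a step as `with bench.section("ephemeris"): ...` keeps the instrumentation to one line per section, which fits the "easily removable" goal stated above.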
- Benchmark and ResourceMonitor now accept 'enabled' parameter
- occ_path_coeff.py reads debug flag from obj_data or obj_data.predict_occultation
- Environment variables still work as fallback (BENCHMARK_ENABLED, RESOURCE_MONITOR)
- Set debug=True in job configuration to enable monitoring on cluster
- Added set_debug() method to Asteroid class
- Call a.set_debug(DEBUG) in submit_tasks after setting job_id
- This propagates debug=True from job to each asteroid's JSON
- Enables benchmarking and resource monitoring on cluster workers
- Removed 'enabled' parameter from Benchmark and ResourceMonitor
- BENCHMARK_ENABLED=1 enables benchmarking
- RESOURCE_MONITOR=1 enables resource monitoring
- Reverted asteroid.set_debug() and run_pred_occ changes
- Set env vars in cluster.sh for cluster runs
- MAX_NODES: controls max_blocks (default: 20)
- MAX_WORKERS_PER_NODE: controls max_workers per node (default: 28)
- BENCHMARK_ENABLED/RESOURCE_MONITOR: passed to cluster workers
- Reduces I/O contention by limiting concurrent workers
- init_blocks still calculated dynamically in run_pred_occ.py
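The limits above could be read along these lines. This is a sketch only: the helper names `int_from_env` and `cluster_limits` are hypothetical, and how the values are handed to the Parsl provider and executor is not shown here.

```python
import os


def int_from_env(name, default):
    """Return a positive integer from the environment, or the default if unset/empty."""
    raw = os.environ.get(name, "").strip()
    if not raw:
        return default
    value = int(raw)  # raises ValueError on non-numeric input
    if value < 1:
        raise ValueError(f"{name} must be >= 1, got {value}")
    return value


def cluster_limits():
    # Defaults follow the commit message: 20 nodes, 28 workers per node.
    return {
        "max_blocks": int_from_env("MAX_NODES", 20),
        "max_workers": int_from_env("MAX_WORKERS_PER_NODE", 28),
    }
```

Failing loudly on a malformed value is a design choice of this sketch; silently falling back to the default would hide a typo in the .env file.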
- Update parsl_config.py to properly detect and export BENCHMARK_ENABLED and RESOURCE_MONITOR
- Use 'in os.environ' check to reliably detect variables from docker-compose environment section
- Export variables in worker_init script to ensure they reach all worker processes
- Fixes issue where variables from .env file were not reaching Parsl workers
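The detection and export steps above might look roughly like this. The function name `build_worker_init` and the `base_init` argument are assumptions; the sketch only builds the shell snippet and does not show how parsl_config.py passes it to the provider.

```python
import os
import shlex

# Variables forwarded to workers, as named in the commit message.
PASSTHROUGH_VARS = ("BENCHMARK_ENABLED", "RESOURCE_MONITOR")


def build_worker_init(base_init="", environ=None):
    """Append `export` lines for each passthrough variable that is present."""
    environ = os.environ if environ is None else environ
    lines = [base_init] if base_init else []
    for name in PASSTHROUGH_VARS:
        # Membership test: a variable set to an empty string is still forwarded.
        if name in environ:
            lines.append(f"export {name}={shlex.quote(environ[name])}")
    return "\n".join(lines)
```

Using `name in environ` rather than a truthiness check on `environ.get(name)` is what distinguishes "set but empty" from "not set".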
- Fix import order in run_pred_occ.py: move astropy_cache_config before astropy.config.
  This ensures XDG_CACHE_HOME is set before astropy initializes its cache directory.
- Configure astropy cache in env.sh for linea environment on lead node.
  Sets XDG_CACHE_HOME based on PREDICT_INPUTS parent directory for shared filesystem cache.
- Export BENCHMARK_ENABLED and RESOURCE_MONITOR in env.sh for lead node.
  Ensures variables are available when get_config() runs and can be passed to workers.
- Export BENCHMARK_ENABLED and RESOURCE_MONITOR in cluster.sh for workers.
  Variables are passed via Parsl envs dict but need explicit export for Python processes.

Fixes:
- Astropy cache now uses shared filesystem location instead of /home/user/.astropy/cache
- Environment variables from .env properly propagated to cluster workers
- Works for both daemon.sh and rerun.sh execution paths
This fixes the hang during ingestion worker startup. When astropy.config was imported before astropy_cache_config, astropy initialized with its default cache directory (~/.astropy/cache) instead of the shared filesystem cache. IERS data lookups then failed and triggered download attempts, which hung when the network was unavailable or slow. Importing astropy_cache_config first sets XDG_CACHE_HOME before astropy initializes, so astropy uses the shared cache location where the IERS data already exists.
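A sketch of the kind of module that has to run before any astropy import. The function name and the `cache` subdirectory are hypothetical (the commit only says the location is derived from the parent of PREDICT_INPUTS). The `astropy` subdirectory is created because, as far as I understand astropy's path lookup, `$XDG_CACHE_HOME/astropy` is only used when it already exists.

```python
import os
from pathlib import Path


def configure_astropy_cache(environ=None):
    """Point XDG_CACHE_HOME at a shared directory next to PREDICT_INPUTS."""
    environ = os.environ if environ is None else environ
    inputs = environ.get("PREDICT_INPUTS")
    if not inputs:
        # Nothing to derive the shared location from; leave the defaults alone.
        return None
    # "cache" is an assumed directory name, not taken from the PR.
    cache_home = Path(inputs).resolve().parent / "cache"
    (cache_home / "astropy").mkdir(parents=True, exist_ok=True)
    environ["XDG_CACHE_HOME"] = str(cache_home)
    return cache_home


# In the entry point, this call must come before the first `import astropy...`:
# configure_astropy_cache()
```

Because the variable is read once when astropy resolves its cache path, setting it after the import has no effect on an already-initialized process.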
The pandas to_sql() method with custom upsert function processes data in chunks (default ~1000 rows). Each chunk executes a separate INSERT ... ON CONFLICT statement, which checks the hash_id unique constraint. When multiple asteroids finish simultaneously, this creates many concurrent INSERT operations all checking the same unique constraint, causing database lock contention and slowdowns. By increasing chunksize to 5000, we reduce the number of INSERT statements by 5x, significantly reducing lock contention on the hash_id constraint while still maintaining reasonable transaction sizes. This should improve ingestion performance when multiple asteroids complete processing at the same time.
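The relationship between chunk size and statement count can be illustrated with a small stand-in. This sketch uses the standard-library `sqlite3` module rather than pandas and the project's database, and the table and column names (other than `hash_id`) are hypothetical; it shows only that one INSERT ... ON CONFLICT statement is issued per chunk.

```python
import sqlite3
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def upsert_in_chunks(conn, rows, chunksize):
    """Upsert (hash_id, name) rows; return the number of statements executed."""
    statements = 0
    for batch in chunked(rows, chunksize):
        placeholders = ", ".join(["(?, ?)"] * len(batch))
        params = [value for row in batch for value in row]
        conn.execute(
            f"INSERT INTO occultations (hash_id, name) VALUES {placeholders} "
            "ON CONFLICT(hash_id) DO UPDATE SET name = excluded.name",
            params,
        )
        statements += 1
    conn.commit()
    return statements


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE occultations (hash_id TEXT PRIMARY KEY, name TEXT)")
    rows = [(f"h{i}", f"asteroid-{i}") for i in range(10)]
    print(upsert_in_chunks(conn, rows, chunksize=3))
```

Larger chunks mean fewer statements but longer-running ones, so the effect on lock contention depends on the database and workload.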
- Fix UnboundLocalError in predict_occ.py: Initialize occultation_file before try block and add guard check
- Add null value handling in consolidate_results() and ingest_predictions(): Filter out rows with null closest_approach to prevent NOT NULL constraint violations
- Revert chunksize optimization in occultation.py: Restore original ingestion method without chunksize parameter
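The first two fixes follow a common pattern, sketched here with plain Python. The function names and the row layout (dicts) are hypothetical; the real code in predict_occ.py and the ingestion functions presumably works on the project's own data structures.

```python
import math


def run_prediction(compute, log=print):
    """Call `compute()` and report the result without risking UnboundLocalError."""
    occultation_file = None  # bound before the try, so it always exists afterwards
    try:
        occultation_file = compute()
    except Exception as exc:
        log(f"prediction failed: {exc}")
    # Guard check: only use the file if the try block actually produced one.
    if occultation_file is None:
        return {"status": "failed", "file": None}
    return {"status": "ok", "file": occultation_file}


def _is_null(value):
    # Treat both None and float NaN as null.
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_null_closest_approach(rows):
    """Keep only rows whose closest_approach is present."""
    return [row for row in rows if not _is_null(row.get("closest_approach"))]
```

Filtering before the insert means a single incomplete row is dropped instead of failing the whole batch on the NOT NULL constraint.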
No description provided.