Astropy caching by rcboufleur · Pull Request #1190 · linea-it/tno

rcboufleur · 2025-12-11T21:22:58Z

No description provided.

- Add benchmark.py: Section-based timing for identifying bottlenecks
- Add resource_monitor.py: CPU, memory, I/O, and load monitoring
- Instrument occ_path_coeff.py with benchmark and resource monitoring
- Enable with BENCHMARK_ENABLED=1 and RESOURCE_MONITOR=1 env vars
- All monitoring is non-intrusive and easily removable (marked with comments)
- Reverted experimental optimizations that compromised scientific accuracy
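
For readers unfamiliar with section-based timing, here is a minimal sketch of an environment-gated timer. It is not the contents of benchmark.py from this PR: the `Benchmark` class shape, the `section` method, and the `timings` attribute are assumptions made for illustration. Only the `BENCHMARK_ENABLED=1` switch comes from the commit message.

```python
import os
import time
from contextlib import contextmanager


class Benchmark:
    """Hypothetical section timer; not the PR's actual benchmark.py."""

    def __init__(self):
        # Timing is active only when BENCHMARK_ENABLED=1 is set in the environment.
        self.enabled = os.environ.get("BENCHMARK_ENABLED") == "1"
        self.timings = {}  # section name -> accumulated seconds

    @contextmanager
    def section(self, name):
        if not self.enabled:
            # Disabled: run the wrapped code without recording anything.
            yield
            return
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.timings[name] = self.timings.get(name, 0.0) + elapsed
```

A caller would wrap a suspected bottleneck in `with bench.section("some_step"):` and inspect `bench.timings` afterwards. When the variable is unset, the context manager only runs the wrapped code, which is what keeps this kind of instrumentation cheap to leave in place.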

- Benchmark and ResourceMonitor now accept 'enabled' parameter
- occ_path_coeff.py reads debug flag from obj_data or obj_data.predict_occultation
- Environment variables still work as fallback (BENCHMARK_ENABLED, RESOURCE_MONITOR)
- Set debug=True in job configuration to enable monitoring on cluster

- Added set_debug() method to Asteroid class
- Call a.set_debug(DEBUG) in submit_tasks after setting job_id
- This propagates debug=True from job to each asteroid's JSON
- Enables benchmarking and resource monitoring on cluster workers

- Removed 'enabled' parameter from Benchmark and ResourceMonitor
- BENCHMARK_ENABLED=1 enables benchmarking
- RESOURCE_MONITOR=1 enables resource monitoring
- Reverted asteroid.set_debug() and run_pred_occ changes
- Set env vars in cluster.sh for cluster runs

- MAX_NODES: controls max_blocks (default: 20)
- MAX_WORKERS_PER_NODE: controls max_workers per node (default: 28)
- BENCHMARK_ENABLED/RESOURCE_MONITOR: passed to cluster workers
- Reduces I/O contention by limiting concurrent workers
- init_blocks still calculated dynamically in run_pred_occ.py
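
The limits above could be read along these lines. This is a sketch only: `read_cluster_limits` is a hypothetical helper, and how the values are handed to the Parsl configuration is not shown in this PR text. The variable names and the defaults (20 and 28) are taken from the commit message.

```python
import os


def read_cluster_limits(environ=None):
    """Hypothetical helper: read scaling limits with the stated defaults."""
    env = os.environ if environ is None else environ
    return {
        # MAX_NODES caps the number of blocks (default 20).
        "max_blocks": int(env.get("MAX_NODES", "20")),
        # MAX_WORKERS_PER_NODE caps workers on each node (default 28).
        "max_workers": int(env.get("MAX_WORKERS_PER_NODE", "28")),
    }
```

Accepting the environment as an argument is a choice made here so the helper can be exercised without touching the real process environment.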

- Update parsl_config.py to properly detect and export BENCHMARK_ENABLED and RESOURCE_MONITOR
- Use 'in os.environ' check to reliably detect variables from docker-compose environment section
- Export variables in worker_init script to ensure they reach all worker processes
- Fixes issue where variables from .env file were not reaching Parsl workers
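
A rough illustration of the presence check and export step described above. The function name `build_worker_init`, the `FORWARDED_VARS` tuple, and the way the script text is assembled are assumptions, not the code in parsl_config.py.

```python
import os
import shlex

# Variables the commit message says must reach the workers.
FORWARDED_VARS = ("BENCHMARK_ENABLED", "RESOURCE_MONITOR")


def build_worker_init(base_init="", environ=None):
    """Hypothetical helper: append export lines to a worker_init script."""
    env = os.environ if environ is None else environ
    lines = [base_init] if base_init else []
    for name in FORWARDED_VARS:
        # 'in' tests presence, so a variable that is set but empty
        # is still forwarded, unlike a truthiness test on env.get(name).
        if name in env:
            lines.append(f"export {name}={shlex.quote(env[name])}")
    return "\n".join(lines)
```

For example, `build_worker_init("source env.sh", {"BENCHMARK_ENABLED": "1"})` yields the base script followed by `export BENCHMARK_ENABLED=1`.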

- Fix import order in run_pred_occ.py: move astropy_cache_config before astropy.config
  This ensures XDG_CACHE_HOME is set before astropy initializes its cache directory
- Configure astropy cache in env.sh for linea environment on lead node
  Sets XDG_CACHE_HOME based on PREDICT_INPUTS parent directory for shared filesystem cache
- Export BENCHMARK_ENABLED and RESOURCE_MONITOR in env.sh for lead node
  Ensures variables are available when get_config() runs and can be passed to workers
- Export BENCHMARK_ENABLED and RESOURCE_MONITOR in cluster.sh for workers
  Variables are passed via Parsl envs dict but need explicit export for Python processes

Fixes:
- Astropy cache now uses shared filesystem location instead of /home/user/.astropy/cache
- Environment variables from .env properly propagated to cluster workers
- Works for both daemon.sh and rerun.sh execution paths

This fixes the hanging issue during ingestion worker startup. When astropy.config was imported before astropy_cache_config, astropy initialized with its default cache directory (~/.astropy/cache) instead of the shared filesystem cache. IERS data lookups then failed and triggered download attempts, which hung when the network was unavailable or slow. Importing astropy_cache_config first sets XDG_CACHE_HOME before astropy initializes, so astropy uses the shared cache location where the IERS data already exists.
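
The ordering constraint can be sketched as follows. This is not the PR's astropy_cache_config module: the function name and the `cache` subdirectory are assumptions. The sketch also creates an `astropy` directory under the cache root, because astropy only honors XDG_CACHE_HOME when that subdirectory already exists.

```python
import os
from pathlib import Path


def configure_astropy_cache(predict_inputs):
    """Hypothetical helper: point astropy at a cache next to PREDICT_INPUTS."""
    # Assumed layout: <parent of PREDICT_INPUTS>/cache
    cache_home = Path(predict_inputs).resolve().parent / "cache"
    # astropy uses $XDG_CACHE_HOME/astropy only if the directory exists.
    (cache_home / "astropy").mkdir(parents=True, exist_ok=True)
    os.environ["XDG_CACHE_HOME"] = str(cache_home)
    return cache_home
```

The call has to run before the first `import astropy` anywhere in the process, which is why the commit reorders the imports in run_pred_occ.py.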

The pandas to_sql() method with custom upsert function processes data in chunks (default ~1000 rows). Each chunk executes a separate INSERT ... ON CONFLICT statement, which checks the hash_id unique constraint. When multiple asteroids finish simultaneously, this creates many concurrent INSERT operations all checking the same unique constraint, causing database lock contention and slowdowns. By increasing chunksize to 5000, we reduce the number of INSERT statements by 5x, significantly reducing lock contention on the hash_id constraint while still maintaining reasonable transaction sizes. This should improve ingestion performance when multiple asteroids complete processing at the same time.
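
Taking the commit message's figures at face value, the reduction in statement count is simple arithmetic. The helper name and the 20,000-row batch below are illustrative assumptions, not measurements from this project.

```python
import math


def statements_needed(n_rows, chunksize):
    """One INSERT ... ON CONFLICT statement is issued per chunk."""
    return math.ceil(n_rows / chunksize)


# Illustrative batch size only.
rows = 20_000
before = statements_needed(rows, 1000)
after = statements_needed(rows, 5000)
```

With these numbers, `before` is 20 and `after` is 4, matching the 5x reduction claimed above.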

- Fix UnboundLocalError in predict_occ.py: Initialize occultation_file before try block and add guard check
- Add null value handling in consolidate_results() and ingest_predictions(): Filter out rows with null closest_approach to prevent NOT NULL constraint violations
- Revert chunksize optimization in occultation.py: Restore original ingestion method without chunksize parameter
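
The first two fixes follow a common pattern, sketched here with hypothetical function names and plain dictionaries instead of the project's actual data structures. Only the names `occultation_file` and `closest_approach` come from the commit message.

```python
def run_prediction(compute):
    """Hypothetical stand-in for the logic in predict_occ.py."""
    # Bind the name before the try block so that reading it later
    # cannot raise UnboundLocalError when compute() fails.
    occultation_file = None
    try:
        occultation_file = compute()
    except RuntimeError:
        pass  # real code would log the failure here
    # Guard check: skip downstream work when nothing was produced.
    if occultation_file is None:
        return []
    return [occultation_file]


def drop_null_closest_approach(rows):
    """Keep only rows that can satisfy a NOT NULL closest_approach column."""
    return [row for row in rows if row.get("closest_approach") is not None]
```

Filtering before ingestion means a single incomplete row is dropped rather than causing the whole insert to fail on the constraint.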

rcboufleur added 19 commits December 11, 2025 15:24

Add astropy caching

3285d5b

Add IERS astropy caching

8979823

Add height and units to EarthLocation

26fccd6

Fix datetime parsing due to PRAIA conversion bug

ea29cec

Add manual workflow

4a13a6f

Fix pre-commit

ead30e0

Fix lint

212b5b9

Update parsl config

7a31290

Fix pre-commit bugs

c7bd723

rcboufleur wants to merge 19 commits into main from astropy_caching
