Goes full cron skare update #132

Open
william-aaron-CFA wants to merge 28 commits into main from goes_full_cron_skare_update

@william-aaron-CFA

This PR refactors the rest of the GOES scripts to a cross-platform, cross-environment portable approach, allowing the alert cronjob to run in configurable environments. This is achieved by the following:

  • Refactor the shell-changing wrapper script into a uniform, environment-dependent format.
  • Refactor the shebang Python executable path and the file directories to depend on environment variables.
  • Refactor the lockfile stall-handling paradigm to use shell-independent process management libraries.
  • Document the new cron jobs for primary and secondary runs, along with the cron-table-scope environment variables to be written in the mta@boba-v and mta@r2d2-v machine cron tables.

@william-aaron-CFA william-aaron-CFA self-assigned this Mar 16, 2026
@william-aaron-CFA william-aaron-CFA added documentation Improvements or additions to documentation enhancement New feature or request labels Mar 16, 2026
@william-aaron-CFA william-aaron-CFA force-pushed the goes_full_cron_skare_update branch from 94d7918 to 22ae857 Compare March 20, 2026 15:32
…es and in-place UDP data streaming download.
… Use download functions instead of os.system calls for shell distribution dependent web downloads.
… Use File IO operations to process large archive file.
…ile read of the existing GOES differential and integral protons in the existing data directory.
@william-aaron-CFA william-aaron-CFA force-pushed the goes_full_cron_skare_update branch from 22ae857 to 3d1d277 Compare March 31, 2026 19:30
#
SPACE_WEATHER = Path(os.getenv('SPACE_WEATHER', "/data/mta4/Space_Weather"))
GOES_DATA_DIR : Path = SPACE_WEATHER / "GOES" / "Data"
HRC_PROXY_ARCHIVE : Path= GOES_DATA_DIR / "hrc_proxy.csv"

@william-aaron-CFA william-aaron-CFA Mar 31, 2026

We now define our data directories from the running environment, with defaults set to the primary-run paths. The environment variables set by the primary and secondary cronjob runs are documented in the GOES README.md.
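As a rough illustration of the pattern in the diff above, the sketch below resolves the GOES data directory from an environment mapping. The helper name `resolve_goes_data_dir` and the secondary-run path are hypothetical; only the `SPACE_WEATHER` variable and its default come from this PR.

```python
import os
from pathlib import Path

# Default taken from the diff above; it is the primary-run location.
DEFAULT_SPACE_WEATHER = "/data/mta4/Space_Weather"


def resolve_goes_data_dir(environ=None) -> Path:
    """Return the GOES data directory for the given environment mapping.

    Hypothetical helper for illustration; the scripts in this PR build the
    path at module level instead.
    """
    env = os.environ if environ is None else environ
    space_weather = Path(env.get("SPACE_WEATHER", DEFAULT_SPACE_WEATHER))
    return space_weather / "GOES" / "Data"


# Primary run: the variable is unset, so the default applies.
print(resolve_goes_data_dir({}))
# Secondary run (made-up path): the cron table exports a different root.
print(resolve_goes_data_dir({"SPACE_WEATHER": "/tmp/secondary/Space_Weather"}))
```

Passing the mapping explicitly keeps the lookup testable without touching the real process environment.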

    print(msg)
else:
    p = Popen(["/sbin/sendmail", "-t", "-oi"], stdin=PIPE)
    p.communicate(msg.as_bytes())

Adds test-case handling for the notification email sent when there is an interruption in the HRC proxy archive record.
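A minimal sketch of how such a test switch could look, assuming the message is an `email.message.EmailMessage`. The `build_message` helper, the `test` parameter, and the example address are assumptions for illustration; the `Popen` call mirrors the diff above.

```python
from email.message import EmailMessage
from subprocess import PIPE, Popen


def build_message(subject: str, body: str, to_addr: str) -> EmailMessage:
    """Assemble a plain-text message (hypothetical helper)."""
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_mail(msg: EmailMessage, test: bool = True) -> bool:
    """Print the message in test mode; otherwise hand it to sendmail.

    Returns True only when a real send was attempted.
    """
    if test:
        print(msg)
        return False
    # Same invocation as in the diff: read recipients from the headers (-t)
    # and do not treat a lone "." line as end of input (-oi).
    p = Popen(["/sbin/sendmail", "-t", "-oi"], stdin=PIPE)
    p.communicate(msg.as_bytes())
    return True


alert = build_message("HRC proxy archive gap", "Archive record interrupted.", "admin@example.com")
send_mail(alert, test=True)
```

With the test flag on, the script can be exercised on machines without a sendmail binary.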

check_cadence()

#: Remove lock file once process is completed
os.remove(lock)

This lock file portion handles race conditions and stalls. The design approach is the same, but instead of executing shell commands directly, we now use process-management libraries such as psutil and os.
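For readers unfamiliar with the pattern, here is a stdlib-only sketch of a PID lock file with stale-lock detection. It is not the code in this PR: the function names and the `max_age_s` threshold are assumptions, the liveness probe uses `os.kill(pid, 0)` (POSIX only) where the PR uses psutil, and a stalled owner is not terminated here, only its lock is replaced.

```python
import os
import tempfile
import time
from pathlib import Path


def pid_alive(pid: int) -> bool:
    """POSIX-only liveness probe; psutil.pid_exists(pid) is the portable option."""
    try:
        os.kill(pid, 0)  # signal 0 checks existence without signalling
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def acquire_lock(lock: Path, max_age_s: float = 600.0) -> bool:
    """Return True if this process now holds the lock.

    Note: the exists/write sequence is not atomic; this is a sketch.
    """
    if lock.exists():
        try:
            owner = int(lock.read_text().strip())
        except ValueError:
            owner = -1  # unreadable lock content counts as stale
        fresh = (time.time() - lock.stat().st_mtime) < max_age_s
        if owner > 0 and pid_alive(owner) and fresh:
            return False  # a live, recent run still owns the lock
        lock.unlink()  # stale: owner gone, or run stalled past max_age_s
    lock.parent.mkdir(parents=True, exist_ok=True)
    lock.write_text(str(os.getpid()))
    return True


if __name__ == "__main__":
    demo_lock = Path(tempfile.gettempdir()) / "goes_lock_demo.lock"
    if acquire_lock(demo_lock):
        try:
            pass  # main() would run here
        finally:
            os.remove(demo_lock)  # remove lock file once the process completes
```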

Checking the HRC proxy archive is now done by a single Python script called directly from the cron table, so the check_archive shell scripts are no longer necessary. See README.md for the cronjob entry.

OUT_DATA_DIR = "/data/mta4/Space_Weather/GOES/Data"
SPACE_WEATHER = Path(os.getenv('SPACE_WEATHER', "/data/mta4/Space_Weather"))
GOES_DATA_DIR : Path = SPACE_WEATHER / "GOES" / "Data"
OUT_GOES_DATA_DIR : Path = SPACE_WEATHER / "GOES" / "Data"

@william-aaron-CFA william-aaron-CFA Mar 31, 2026

We now define our data directories from the running environment, with defaults set to the primary-run paths. The environment variables set by the primary and secondary cronjob runs are documented in the GOES README.md.

#
os.system(f"rm /tmp/{user}/{name}.lock")
#: Remove lock file once process is completed
os.remove(lock)

This lock file portion handles race conditions and stalls. The design approach is the same, but instead of executing shell commands directly, we now use process-management libraries such as psutil and os.

GOES_DATA_DIR = '/data/mta4/Space_Weather/GOES/Data'
SPACE_WEATHER = Path(os.getenv('SPACE_WEATHER', "/data/mta4/Space_Weather"))
GOES_DATA_DIR : Path = SPACE_WEATHER / "GOES" / "Data"

We now define our data directories from the running environment, with defaults set to the primary-run paths. The environment variables set by the primary and secondary cronjob runs are documented in the GOES README.md.

os.system(f"rm /tmp/{user}/{name}.lock")

#: Remove lock file once process is completed
os.remove(lock)

This lock file portion handles race conditions and stalls. The design approach is the same, but instead of executing shell commands directly, we now use process-management libraries such as psutil and os.

Equivalent to the goes_main_script, which bundled GOES data processing for file fetching, plotting, and webpage generation. Now environment portable.

Archiving long-term GOES data is now handled by calling the collect_goes_long.py Python script directly in the cron table. Thus the two goes_long shell scripts are no longer needed. See the GOES README.md for details.

@william-aaron-CFA william-aaron-CFA Mar 31, 2026

Incorporated into the goes.sh script.

SPACE_WEATHER = Path(os.getenv("SPACE_WEATHER", "/data/mta4/Space_Weather"))
SPACE_WEATHER_WEB = Path(os.environ.get('SPACE_WEATHER_WEB', "/data/mta4/www/RADIATION"))
GOES_DATA_DIR : Path = SPACE_WEATHER / "GOES" / "Data"
GOES_PLOT_DIR : Path = SPACE_WEATHER_WEB / "GOES" / "Plots"

We now define our data directories from the running environment, with defaults set to the primary-run paths. The environment variables set by the primary and secondary cronjob runs are documented in the GOES README.md.

Additionally, instead of re-fetching the GOES data files in order to generate the plot, we read the already fetched data files handled by the fetch_goes_data.py script. This means that later changes in this plot_goes_data.py script will involve removing unnecessary data fetching and formatting as this has already been accomplished by the fetch_goes_data.py script.

intg_data_dict["title"] = "Proton Flux (Integral)"
intg_data_dict["labels"] = INTG_GROUP_SELECTION
intg_data_dict["colors"] = ["red", "blue", "#51FF3B"]
intg_data_dict["limits"] = {"y_min": 1e-2, "y_max": 1e4}

Rewritten main function to read the existing data files and perform all plotting operations. Algorithmic approach is the same.

sleep(5)
_last_exception.add_note(f'Decorator ran function {_freq} times. Still encountered error.')
raise _last_exception
return wrapper_func

@william-aaron-CFA william-aaron-CFA Mar 31, 2026

This rerun function decorator was included to assist fetching of the GOES data table in the event of a JSON decode or network error. This is handled by the fetch_goes_data.py script already, and the plotting script has been rewritten to use the existing data files rather than recreating them, thereby removing the need for this function.
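For context, a retry decorator of this kind could be sketched as below. This is a reconstruction from the four visible lines of the removed code, not the original: the `rerun` name, its parameters, and the broad `except Exception` are assumptions (the removed version presumably targeted JSON decode and network errors).

```python
import functools
from time import sleep


def rerun(freq: int = 3, wait_s: float = 5.0):
    """Retry the wrapped function up to `freq` times (freq >= 1) before giving up."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper_func(*args, **kwargs):
            last_exception = None
            for _ in range(freq):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:  # assumption: original caught narrower errors
                    last_exception = exc
                    sleep(wait_s)
            # add_note() exists on exceptions from Python 3.11 onward
            if hasattr(last_exception, "add_note"):
                last_exception.add_note(
                    f"Decorator ran function {freq} times. Still encountered error."
                )
            raise last_exception
        return wrapper_func
    return decorator
```

A function decorated with `@rerun(freq=3, wait_s=5)` would be attempted three times, five seconds apart, before the last error propagates.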

with urllib.request.urlopen(jlink, timeout = 10) as url:
data = json.loads(url.read().decode())
data = Table(data)
return data

@william-aaron-CFA william-aaron-CFA Mar 31, 2026

This data fetching function was included to handle fetching of the GOES data table. This is handled by the fetch_goes_data.py script already, and the plotting script has been rewritten to use the existing data files rather than recreating them, thereby removing the need for this function.

intg_data_dict["colors"] = ["red", "blue", "#51FF3B"]
intg_data_dict["limits"] = {"y_min": 1e-2, "y_max": 1e4}

return Table(rows = new_rows)

This reorientation function was included to format SWPC NOAA data into an astropy table in MeV units. This is handled by the fetch_goes_data.py script already, and the plotting script has been rewritten to use the existing data files rather than recreating them, thereby removing the need for this function.

f.write(str(pid))
main()
#: Remove lock file once process is completed
os.remove(lock)

This lock file portion handles race conditions and stalls. The design approach is the same, but instead of executing shell commands directly, we now use process-management libraries such as psutil and os.

Fetching the GOES-19 associated media is now performed by calling the swpc_media.py Python script directly in the cron table. Thus the two pull_swpc_media shell scripts are no longer needed. See the GOES README.md for details.

2-59/5 * * * * ${ENV_FLIGHT}/bin/skare ${SPACE_WEATHER}/goes.sh >> ${HOME}/Logs/goes_main_new.cron 2>&1
3-59/5 * * * * cd ${SPACE_WEATHER}/GOES/Scripts; ${ENV_FLIGHT}/bin/skare python alert_hrc.py -m flight >> ${HOME}/Logs/goes_main_new.cron 2>&1
4-59/5 * * * * cd ${SPACE_WEATHER}/GOES/Scripts; ${ENV_FLIGHT}/bin/skare python check_archive.py -m flight >> ${HOME}/Logs/goes_archive_check.cron 2>&1

These newly listed cronjobs are not truly new. Instead, the Python scripts have been refactored to be environment portable, so the shell wrapper scripts are no longer necessary.

#
GOES_MEDIA_DIR = '/data/mta4/www/RADIATION/GOES/Media'
SPACE_WEATHER_WEB = Path(os.environ.get('SPACE_WEATHER_WEB', "/data/mta4/www/RADIATION"))
GOES_MEDIA_DIR : Path = SPACE_WEATHER_WEB / "GOES" / "Media"

We now define our data directories from the running environment, with defaults set to the primary-run paths. The environment variables set by the primary and secondary cronjob runs are documented in the GOES README.md.

resp = requests.get(url, timeout=30)
resp.raise_for_status()
img = Image.open(io.BytesIO(resp.content))
return img

download_img() function included to remove the need for downloading the image file with a wget command into an intermediary file directory. The image is loaded directly into the python execution instead.

with open(file_out, 'wb') as f:
    for chunk in resp.iter_content(chunk_size = 1024*1024):
        if chunk:
            f.write(chunk)

Refactored the video download into a function for stream downloading, thereby removing the dependence on wget.
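The chunked-write idea can be shown without third-party packages. The sketch below swaps in `urllib.request` from the standard library where the PR uses `requests` with `iter_content`; the `stream_download` name and its signature are assumptions for illustration.

```python
import urllib.request
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # 1 MiB, matching the chunk size in the diff


def stream_download(url: str, file_out: Path, timeout: float = 30.0) -> int:
    """Copy the response body to file_out in chunks; return bytes written.

    Only one chunk is held in memory at a time, so large video files do not
    need to fit in RAM.
    """
    written = 0
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(file_out, "wb") as f:
        while True:
            chunk = resp.read(CHUNK_SIZE)
            if not chunk:  # empty read means end of stream
                break
            f.write(chunk)
            written += len(chunk)
    return written
```

Because `urlopen` also accepts `file://` URLs, the function can be exercised against a local file without network access.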

GOES_DATA_DIR : Path = SPACE_WEATHER / "GOES" / "Data"
GOES_WEB_DIR : Path = SPACE_WEATHER_WEB / "GOES"
TESTMAIL = False
ADMIN = 'mtadude@cfa.harvard.edu'

We now define our data directories from the running environment, with defaults set to the primary-run paths. The environment variables set by the primary and secondary cronjob runs are documented in the GOES README.md.

    print(msg)
else:
    p = Popen(["/sbin/sendmail", "-t", "-oi"], stdin=PIPE)
    p.communicate(msg.as_bytes())

Because the GOES HRC proxy alert depends on the Gp_pchan_5m.txt file, this update_goes_html_page.py script includes a notification system that warns the MTA team of any runtime issues. This change to the send_mail() function adds better testing capabilities and handling for platform-independent mail commands.

#
os.system(f"rm /tmp/{user}/{name}.lock")
#: Remove lock file once process is completed
os.remove(lock)

This lock file portion handles race conditions and stalls. The design approach is the same, but instead of executing shell commands directly, we now use process-management libraries such as psutil and os.
