Add run_fcre_aed_forecast entry function for FaaSr-deployable GLM-AED forecast by Ashish-Ramrakhiani · Pull Request #92 · FLARE-forecast/FCRE-forecast-code

Ashish-Ramrakhiani · 2026-05-22T22:25:08Z

Summary

Adds a single-action entry function (workflows/glm_aed_flare_rs/run_fcre_aed_forecast.R)
that wraps the FCRE GLM-AED forecast loop in a form deployable as a FaaSr
action. The existing standalone driver (combined_run_aed.R) is preserved
unchanged and continues to work.

Changes

New workflows/glm_aed_flare_rs/run_fcre_aed_forecast.R — callable
entry function with embedded forecast loop, soft time-budget for
long-running-action targets, and config refresh after update_run_config.
Refactor R/convert_vera4cast_inflow.R, R/generate_forecast_score_arrow.R,
and workflows/glm_aed_flare_rs/generate_inflow_forecast.R — accept an
optional config argument; when supplied, route S3 reads/writes through
the new FLAREr::flare_* dispatch layer. When config = NULL (the
existing standalone-driver call pattern), behavior is unchanged.
Config bundle updates for FLAREr v3.1-dev-netcdf-v2 compatibility:
- configure_run.yml: adds use_faasr flag
- configure_flare_glm_aed.yml: base_AED_phyto_pars_nml_file,
  non_vertical_noise_config_file, restart_save_timesteps: "all";
  obs_filename switched to insitu
- glm3.nml: init_restart_from_file, init_restart_fname,
  restart_fname placeholders
- observations_config_aed.csv: model_source / model_variable /
  model_depth_m columns; secchi row dropped
- non_vertical_noise_config.csv: new file (verbatim from upstream
  FLAREr v3.1-dev-netcdf-v2)

…ous datastores

…lare_*

…ed_run_aed.R)

…mote_folder

…consistency

FaaSr_py uses GitHub tarball API for plain owner/repo entries, which extracts to /tmp/functions/{InvocationID}/{Owner}-{Repo}-{sha}/, not /{Owner}/{Repo}/. The previous glob expected the clone-style path and fell through to here::here() = /action.

… sim_name lake_directory discovery: replace the fork-username-specific glob with a marker-based walk of /tmp/functions for any directory containing configuration/<config_set>/<configure_run_file>. Works for any fork without code changes; portable across the FaaSr clone vs tarball extraction layouts. configure_run.yml (deployment-only on flare_faasr_int): pin forecast_start_datetime to 2026-05-01 (verified both noaa stage2 and vera4cast inflow_gefsClimAED have data for that date) and rename sim_name to glm_aed_flare_rs_faasr_test so all S3 writes (restart/forecasts/scores) land in a sandbox path distinct from the operational glm_aed_flare_v3 model_id.

… clobber When sourced from run_fcre_aed_forecast.R, the unconditional lake_directory <- here::here() / config_set_name <- 'glm_aed_flare_v3' re-assignments overwrote the caller's already-resolved values, causing the subsequent set_up_simulation to look in /action/configuration/ glm_aed_flare_v3/configure_run.yml and fail. Wrap each top-level var in 'if (!exists(...))'. Console / RStudio users still get the original defaults (no parent values present); programmatic callers can pre-set lake_directory, config_set_name, configure_run_file, config to avoid the clobber. Backward-compatible.

…lenv faasr_source_r_files at action startup recursively sources every .R file in the FCRE tarball. Many legacy scripts assign top-level lake_directory <- here::here() and config_set_name <- 'ltreb' (etc.) into globalenv. Those values persist. source() defaults to local=FALSE which evaluates in globalenv. So when run_fcre_aed_forecast did source(generate_inflow_forecast.R), the sourced script's exists('lake_directory') check found the stale globalenv value and skipped re-assignment, keeping lake_directory='/action' and config_set_name='ltreb'. The subsequent set_up_simulation tried '/action/configuration/ltreb/configure_aed_run.yml' and failed. Pass local=TRUE to all four source() calls so they evaluate in the entry function's local environment, where lake_directory and config_set_name are correctly set.

rstac's STAC datetime parameter requires RFC 3339 format ('YYYY-MM-DDTHH:MM:SSZ'). YAML loads start_datetime/forecast_start_datetime as POSIXct or as space-separated strings, both of which produce 'YYYY-MM-DD HH:MM:SS' on paste() — rstac rejects with 'date time provided not follow the RFC 3339 format'. Format with strftime to ISO 8601 / RFC 3339 before calling get_lst. Wrap get_lst and get_vals in tryCatch so any failure (rstac error, network blip, no images for the date range) falls cleanly into the existing 'No new RS data' placeholder branch instead of crashing the whole forecast iteration. Replaces the brittle exists('vals') check with an explicit !is.null(vals).

Stage3 met data has duplicate (datetime, parameter, variable) entries on 2025-07-22, 2025-07-24, 2025-07-25 (different prediction values for the same key). With the prior 2026-05-01 pin and start_datetime 2024-10-01, the 19-month spinup window pulled these duplicates into create_met_files's pivot_wider, producing list-columns and crashing mutate(WindSpeed = sqrt(eastward_wind^2 + northward_wind^2)) with "non-numeric argument to binary operator". The 2024-10-01 -> 2024-12-20 window predates the duplicate zone, has both noaa stage2 and vera4cast inflow data available, and runs ~7x faster on spinup. The stage3 duplicate-data issue is an upstream data-quality problem out of scope for this work.

Cuts the forward GLM-AED simulation from 34 days x 221 ensembles to 1 day x 221 ensembles. The spinup window (start_datetime -> forecast_start_datetime) is unchanged at 80 days; this only shrinks the forward forecast step. Deployment-only on flare_faasr_int; the operational YAML keeps forecast_horizon: 34.

…ility FLAREr v3.1-dev's generate_initial_conditions.R:128 does init$inflation[] <- config$da_setup$inflation_factor; without the field on the RHS, R errors with "replacement has length zero". The field was introduced in v3.1-dev's param-fitting work and doesn't exist in main, which is why production pipelines running off main don't hit it. 1.0 = no inflation, matching every bundled extdata config in FLAREr itself (default/, aed/, default_pf/, open_meteo/). Harmless on FLAREr main, required on v3.1-dev.

v3.1-dev's run_da_forecast.R branches on config$da_setup$add_random_noise (values 0/1/2) at lines 386, 402, 450, with no default. When the field is missing from the YAML, the if() expression evaluates to a length-zero NULL and R errors with "argument is of length zero". FLAREr main unconditionally calls add_process_noise() per ensemble at run_da_forecast.R:374 -- equivalent to v3.1-dev's add_random_noise == 2 branch. Setting 2 reproduces main's behaviour. Harmless on FLAREr main (field ignored), required on v3.1-dev.

v3.1-dev's run_flare.R:145 reads config$model_settings$depth_sd into states_non_vertical$depth_sd. Then run_da_forecast.R:388 / :446 use it as the sd argument to rnorm() to perturb lake_depth per ensemble per timestep. Without the YAML field, sd=NULL and rnorm errors with 'invalid arguments'. FLAREr main has no states_non_vertical concept and no depth perturbation in DA — depth_sd: 0.0 reproduces that behavior (rnorm(n, mean, sd=0) returns mean exactly, no randomness). Harmless on main, required on v3.1-dev.

start_datetime: 2024-12-17 (was 2024-10-01) -- DA loop runs 3 daily iterations instead of 80, exercising the same code path with ~25x less spinup compute. ensemble_size: 31 (was 221) -- each DA / forecast step runs 31 GLM-AED simulations instead of 221, ~7x less per-step compute. Combined: ~175x compute reduction for the GLM-AED-heavy DA loop. Deployment-only change on flare_faasr_int; the operational YAML in the upstream branch keeps start_datetime: 2024-10-01 and ensemble_size: 221. The test only verifies pipeline wiring, not forecast quality -- a 3-day spinup is not enough for AED state convergence.

…_aed_flare_v3 Three filter sites in run_fcre_aed_forecast.R hardcoded model_id == 'glm_aed_flare_v3' (carried over from combined_run_aed.R). Two consequences when sim_name != 'glm_aed_flare_v3': 1. Reads the production forecast instead of the one we just wrote -- the pipeline doesn't actually exercise its own output. 2. Score parquet auto-partitions by the model_id column from the filtered data, so writes land at .../scores/.../model_id=glm_aed_flare_v3/... -- additive pollution of the production namespace. Replace the literal with config$run_config$sim_name at all three sites. The script now reads the forecast it just produced, scores it, and writes scores under its own sim_name; generic for any sim_name.

…the run Submit can fail for transient reasons (vera4cast API down, rate limit, schema validation) that aren't related to whether our forecast pipeline worked. Catch and log instead of crashing — restart/forecast/score parquet writes already succeeded at this point, the run should reach completion regardless. AWS env restore now runs after either branch (submit success or caught error).

…config Two fixes: 1. arrow::open_dataset partition listing was taking ~65min each over OSN because the prefix was the bucket root (flare/forecasts/parquet) and arrow had to enumerate every model_id x reference_date directory under the entire forecasts bucket to build the schema, even though the dplyr filter only wanted one partition. Push the partition keys into the prefix so arrow opens exactly one parquet file. The dplyr filter is now redundant and removed. Two sites (vera4cast aggregation block and score block, plus the past_forecasts variant). 2. update_run_config in v3.1-dev FLAREr has a config parameter not present in main. Internally it calls flare_put_file(config = config). Our caller wasn't passing config, so the inner call errored with 'argument config is missing'. Add config = config to the call.

… still resolve Side effect of pointing arrow::open_dataset directly at a leaf partition path (site_id=X/model_id=Y/reference_date=Z): partition columns are embedded in the path, not in the parquet file. arrow only synthesizes them when discovering partitions from the bucket root. At the leaf path, the resulting tibble has only the parquet's own columns - no site_id, model_id, reference_date. Downstream dplyr summarize(.by = c(..., site_id, model_id, ...)) and write_dataset(partitioning = c('site_id','model_id','reference_date')) both need those columns. Add them back via mutate() with the values we already have (the same values we used to construct the prefix). Three sites: vera4cast aggregation forecast_df, score block forecast_df, score block past_forecasts.

… testing Force get_run_config to ignore S3-leftover configure_run.yml and use local YAML on every invoke. Without this, each test run inherits the prior run's advanced forecast_start_datetime + restart_file pointer and tries to resume -- which exposes latent bugs in loop logic that don't manifest in a daily-cron production setting where each invoke runs a single iteration. Revert before merging to main: production runs rely on clean_start=FALSE (default) so the daily restart chain works.

Reverting the deployment-only override added in 045470e. Test repeatability will be handled by clearing the test sandbox's S3 state before each invoke instead of skipping the S3-resume branch in code. The set_up_simulation call returns to its default (clean_start=FALSE).

forecast_start_datetime: 2026-05-04 -- latest date with both noaa stage2 and vera4cast inflow_gefsClimAED data available. After iter 1 update_run_config advances to 2026-05-05; no data published for that date yet, so noaa_ready/inflow_ready flip FALSE and the while loop exits naturally. Mirrors the daily-cron production pattern where each cron firing runs exactly one iteration. start_datetime: 2026-05-01 -- 3-day spinup, well clear of the 2025-07-22 to 2025-07-25 stage3 met-data duplicate zone.

After update_run_config writes the advanced forecast_start_datetime to disk + S3, the outer-scope config in run_fcre_aed_forecast still holds the previous values. Sourced helpers on the next iteration (generate_inflow_forecast.R -> convert_vera4cast_inflow) read the stale value and write inflow partitions under the previous reference_date, while run_flare re-reads config from disk and looks under the new one -- producing arrow IOError 'No such file or directory' on the reference_date partition. Refreshing config via set_up_simulation right after update_run_config keeps the in-memory view aligned with on-disk state. combined_run_aed.R doesn't trip this because each cron firing runs the loop exactly once per process; this script runs multiple iterations in one process.

Iter 1 produces a restart NetCDF whose time vector spans [start_datetime, forecast_start_datetime + horizon]. After iter 1, the combined_run_aed.R-style carry-over math sets the next iteration's start_datetime to old_forecast_start_datetime - 4 days. With a 3-day spinup that target lands BEFORE the NetCDF's earliest time, so which(datetime == new_sd) returns integer(0) and generate_initial_conditions crashes with 'replacement has length zero' when indexing pars_restart with the empty restart_index. Bumping start_datetime from 2026-05-01 to 2026-04-30 gives a 4-day spinup; iter 1's NetCDF then includes 2026-04-30, so iter 2 finds its restart_index. Production daily-cron config doesn't trip this because its 80-day spinup leaves plenty of cushion.

…leanly FLAREr::check_noaa_present catches arrow IOError on a missing NOAA partition with a tryCatch whose handler calls message(error_message), passing the simpleError condition object. message() with a single condition argument signals the condition as-is, preserving its error class -- so the FaaSr executor's outer withCallingHandlers(error=) catches it and marks the action as failed instead of letting the function return FALSE. The team's daily-cron pattern never trips this because their config runs against past dates where stage2 NOAA is always present. Our chained-iteration pattern eventually advances past the published horizon and hits the catch arm. Wrap the two call sites with our own tryCatch that converts any error into noaa_ready=FALSE so the while-loop exits cleanly. Remove this workaround when upstream check_noaa_present is fixed to use message(conditionMessage(e)).

Only set GLM_PATH='GLM3r' when unset, so a container ENV (or any explicit pre-set value) wins. Pass local_path to flare_arrow_s3_bucket at the two forecasts_parquet read sites so the read location matches where write_forecast writes in local mode.

FLAREr's run_flare on v3.1-dev-netcdf-v2 reads config$model_settings$base_AED_phyto_pars_nml_file (with _file suffix) and passes it to update_phy_states_obs_mapping, which calls read_nml on it. Point the new field at aed2.nml, which contains the aed_phytoplankton block the function expects.

FLAREr's run_model on v3.1-dev-netcdf-v2 sets init_restart_from_file via glmtools::set_nml() before each GLM run; the call requires the variable to exist in the base nml. Adds the placeholder (=0) plus init_restart_fname to match the bundled aed/default templates.

Last NML field FLAREr's run_model writes via glmtools::set_nml that wasn't present in glm3.nml. Mirrors the bundled aed/default templates.

Previous defensive default could land on GLM3r in the FaaSr container, which doesn't have GLM3r installed. Check for /opt/glm/glm first; that's where the FaaSr container drops the embedded GLM binary. Log the resolved value at function entry to make the chain transparent.

fcre-targets-rs.csv is the remote-sensing (LST) target file — usually empty because cloud-free imagery is sparse. With it as obs_filename, create_obs_matrix returned all-NA observations, obs_count was 0 at every iter, and FLAREr's no-DA branch fired — which has a separate uninitialized-pars_corr bug. Using fcre-targets-insitu.csv (the rich daily-depth observations) matches Quinn's bundled aed config and keeps every iter on the DA branch.

It was the only non-vertical entry (multi_depth=0). v3.1-dev-netcdf-v2 adds a new code path for non-vertical obs that requires model_source / model_variable / model_depth_m columns plus a non_vertical_noise_config csv — neither of which our config has. Without those, extract_modeled errors with 'missing value where TRUE/FALSE needed'. Removing secchi sidesteps the path entirely and matches what the rest of the FaaSr test workflow exercises.

Three files brought in from FLARE-forecast/FLAREr@v3.1-dev-netcdf-v2's inst/extdata/configuration/aed/ (per Quinn's reference): - non_vertical_noise_config.csv (new file, verbatim from upstream) - non_vertical_noise_config_file key added to configure_flare_glm_aed.yml pointing at the new file - observations_config_aed.csv schema extended with model_source, model_variable, model_depth_m columns. Existing multi_depth=1 rows get NA in the new columns (no behavior change). secchi row restored using upstream's diagnostic/extc_coeff/1.0 pattern but keeping FCRE's Secchi_m_sample target name.

write_restart on v3.1-dev-netcdf-v2 returns .zip when GLM restart staging produces extra files (the default for this branch) and .nc otherwise. Hard-coding .nc in the next iteration's restart_file caused get_restart_file to 404 on iter 2 because iter 1 had uploaded the file as .zip. Capturing run_flare's return value and using basename() on whatever it actually wrote keeps the chain consistent regardless of which format the underlying write produced.

v3.1-dev-netcdf-v2's write_restart only saves model state for the timesteps listed in output_settings$restart_save_timesteps (default: just forecast_start_datetime). The next iter computes start_datetime = prev_fsd - 4 days, which is not in the saved set, so generate_initial_ conditions can't find the right offset and aborts. 'all' tells write_restart to save every timestep; cheap (5 small snapshots in the zip) and removes the offset-coupling between the loop's start_datetime math and the restart save list.

…lity Quinn's new write_restart on v3.1-dev-netcdf-v2 only persists GLM output timesteps in the zip — spinup_start (which is the date the team's '-4 days' pattern lands on) is never saved. generate_restart_initial_ conditions_from_zip then can't find that date and aborts with 'No GLM restart directory found for date X. Available dates: ...'. Shift to -3 days so the next iter's start_datetime lands on the earliest GLM-output date in the restart zip while still preserving a 4-day spinup window into the new forecast_start_datetime.

… heartbeat

Ashish-Ramrakhiani added 30 commits April 28, 2026 23:27

Configure glm_aed_flare_rs for FaaSr: +use_faasr, +vera4cast_* anonym…

58c9e34

…ous datastores

Refactor FCRE helpers: backward-compatible config arg, dispatch via f…

47ec529

…lare_*

Add run_fcre_aed_forecast: FaaSr-action entry point (preserves combin…

4840980

…ed_run_aed.R)

Fix FCRE callers: pass full path under bare bucket as faasr_prefix/re…

198e77f

…mote_folder

Resolve lake_directory via cloned-repo glob; chdir for relative path …

1d433bb

…consistency

Drop dead duplicate arrow::open_dataset block after tryCatch

864b1d2

Demo: glm_aed_flare_rs_demo_faasr sandbox config for FaaSr-mode run

1b136f0

Ashish-Ramrakhiani and others added 15 commits May 14, 2026 23:08

Add restart_fname placeholder under &output

cea5946

Last NML field FLAREr's run_model writes via glmtools::set_nml that wasn't present in glm3.nml. Mirrors the bundled aed/default templates.

Tighten inline comments in run_fcre_aed_forecast

de978c1

Bump sim_name for fresh demo run

04a629a

Bump sim_name to v3 for fresh demo iteration

de9747b

Bump sim_name to v4 for fresh demo run

21fa9a5

Restore keep_automation.txt to preserve upstream automation-keepalive…

da4f0dd

… heartbeat

Merge branch 'FLARE-forecast:main' into flare_faasr_int

c30b12b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add run_fcre_aed_forecast entry function for FaaSr-deployable GLM-AED forecast#92

Add run_fcre_aed_forecast entry function for FaaSr-deployable GLM-AED forecast#92
Ashish-Ramrakhiani wants to merge 45 commits into
FLARE-forecast:mainfrom
Ashish-Ramrakhiani:flare_faasr_int

Ashish-Ramrakhiani commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Ashish-Ramrakhiani commented May 22, 2026

Summary

Changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant