Refactor sequence-model engine: validation, early-stopping, global grad-norm clip, sync train, binary IO#36

Open
RemindCZ wants to merge 1 commit into master from codex/refactor-cuda/c++-mt5-dll-to-sequence-model-engine
Conversation

@RemindCZ
Owner

Motivation

  • Bring the DLL from an LSTM-only demo toward a technically honest sequence-model engine by adding validation support, early stopping, global gradient-norm clipping, proper sync/async training semantics, and robust binary state I/O.
  • Keep the real LSTM implementation intact while exposing necessary controls (mode, seed, global grad norm) to MQL5 consumers and removing/neutralizing misleading diagnostics.
  • Provide a versioned binary format for portable save/load of model topology + weights.

Description

  • Added versioned binary header ModelBinHeader and implemented SaveBinaryFile / LoadBinaryFile plus new DLL exports DN_SaveBinaryState and DN_LoadBinaryState that write/read a compact binary representation (magic + version + topology + weights).
  • Added validation dataset buffers (X_val, T_val), LoadValidationBatch export and a generalized ComputeDatasetMSE path; added tracking of best_val_loss and a patience-based early stopping flow that snapshots/restores best weights when validation is used.
  • Implemented global gradient-norm clipping: new kernel kScaleInplace, helper ApplyGlobalGradNormClipping and integration before optimizer updates; exposed DN_SetGlobalGradNormClip / DN_GetGlobalGradNormClip.
  • Fixed training API semantics by adding TrainSync_Locked and wiring DN_Train to synchronous training while preserving the existing StartTrainingAsync_Locked used by DN_TrainAsync; added train/eval mode and RNG seed control with exports DN_SetTrainMode and DN_SetSeed.
  • Exposed validation and early-stop controls via exports: DN_LoadValidationBatch, DN_SetEarlyStoppingPatience, DN_GetBestValidationLoss; also neutralized misleading telemetry by returning NaN for unsupported per-layer diagnostics.
  • Minor utilities and safety: ComputeDatasetMSE (shared full/validation evaluation), progress updates now optionally track validation metric for best-MSE reporting, and maintenance of early-stopping counters.
  • Note: LSTM implementation was kept and remains real/untouched; GRULayer and RNNLayer are still thin derivations of LSTMLayer (placeholders) and therefore do not yet provide a distinct GRU or vanilla RNN algorithm in this change.
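The versioned binary format described above (magic + version + topology + weights) can be sketched as a plain C++ round-trip. The field names and layout here are illustrative assumptions, not the DLL's actual ModelBinHeader:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical layout of the versioned header; field names, order, and the
// magic value are assumptions for illustration only.
struct ModelBinHeader {
    uint32_t magic;        // fixed tag identifying the format
    uint32_t version;      // bumped on incompatible layout changes
    uint32_t num_layers;   // topology summary
    uint32_t input_size;
    uint32_t hidden_size;
    uint64_t weight_count; // number of doubles that follow the header
};

bool SaveBinary(const char* path, const ModelBinHeader& h,
                const std::vector<double>& weights) {
    FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    bool ok = std::fwrite(&h, sizeof(h), 1, f) == 1 &&
              std::fwrite(weights.data(), sizeof(double),
                          weights.size(), f) == weights.size();
    std::fclose(f);
    return ok;
}

bool LoadBinary(const char* path, ModelBinHeader& h,
                std::vector<double>& weights) {
    FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    bool ok = std::fread(&h, sizeof(h), 1, f) == 1 &&
              h.magic == 0x4D4F444CU /* "MODL", assumed */ &&
              h.version == 1;
    if (ok) {
        weights.resize(h.weight_count);
        ok = std::fread(weights.data(), sizeof(double),
                        weights.size(), f) == weights.size();
    }
    std::fclose(f);
    return ok;
}
```

Validating the magic and version on load is what makes the format safely rejectable when an incompatible file is supplied.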
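The patience-based early-stopping flow with best-weight snapshot/restore can be illustrated with a minimal host-side tracker. The struct and method names here are assumptions, not the DLL's internals:

```cpp
#include <limits>
#include <vector>

// Minimal early-stopping tracker mirroring the patience/best_val_loss flow
// described above; names are illustrative, not the DLL's actual symbols.
struct EarlyStopper {
    int patience;            // epochs allowed without validation improvement
    int bad_epochs = 0;
    double best_val_loss = std::numeric_limits<double>::infinity();
    std::vector<float> best_weights; // snapshot to restore after stopping

    // Call once per epoch; returns true when training should stop.
    bool Update(double val_loss, const std::vector<float>& weights) {
        if (val_loss < best_val_loss) {
            best_val_loss = val_loss;
            best_weights = weights;  // snapshot the improving model
            bad_epochs = 0;
        } else if (++bad_epochs >= patience) {
            return true;             // patience exhausted
        }
        return false;
    }
};
```

On stop, the caller restores best_weights rather than keeping the last-epoch weights, which is what makes the best-MSE reporting meaningful.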
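Global gradient-norm clipping computes one L2 norm over all gradient buffers and rescales them uniformly when it exceeds the threshold. A host-side C++ sketch (the real code does the scaling on-device via the kScaleInplace kernel; this version is plain CPU code):

```cpp
#include <cmath>
#include <vector>

// Combined L2 norm over every gradient buffer in the model.
double GlobalGradNorm(const std::vector<std::vector<float>>& grads) {
    double sq = 0.0;
    for (const auto& g : grads)
        for (float v : g) sq += double(v) * double(v);
    return std::sqrt(sq);
}

// If the global norm exceeds `clip`, scale every gradient by clip / norm so
// the combined norm becomes exactly `clip`; called before optimizer updates.
void ApplyGlobalGradNormClip(std::vector<std::vector<float>>& grads,
                             double clip) {
    if (clip <= 0.0) return;             // non-positive clip disables clipping
    double norm = GlobalGradNorm(grads);
    if (norm <= clip) return;
    float scale = float(clip / norm);
    for (auto& g : grads)
        for (float& v : g) v *= scale;   // the in-place scaling step
}
```

Scaling all buffers by the same factor preserves gradient direction, unlike per-tensor clipping.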

Testing

  • Performed repository-level automated checks: verified the presence of the new symbols and exports with rg/nl (all symbol checks passed), and created the commit successfully.
  • Verified file edits and diffs with git diff / git status (changes recorded as one modified file kernel.cu).
  • Attempted a CUDA toolchain check (nvcc --version), but the environment lacks nvcc, so the CUDA kernels could not be built or tested (no binary compiled, no runtime tests executed).
  • No runtime functional tests were possible in this environment; a CI build (NVCC) and a small training/prediction round-trip (train on toy data, validate save/load, exercise early stopping) are recommended before production use.

Codex Task

