Checklist
Issue or Suggestion Description
Hi,
I'm deploying a custom digit detection model (0-9 classes) on an ESP32-S3 (N8R8) using the official yolo11_detect example, and I'm running into a persistent issue with TIE728 acceleration.
Environment:
ESP-IDF: v6.0.1
ESP-DL: tried both v3.3.2 and v2.1.0
Chip: ESP32-S3 (N8R8, Octal PSRAM 8MB enabled)
Model: espdet_pico (0.36M parameters, input 224x224), trained on SVHN dataset with esp-detection, quantized with esp-ppq to .espdl
What I did:
Cloned the official yolo11_detect example (espressif/esp-dl=3.3.2:yolo11_detect).
Replaced the model file with my own .espdl.
Enabled PSRAM (Octal, 8MB) via menuconfig.
Built and flashed.
What happens:
With ESP-DL v3.3.2, the program crashes immediately on inference with:
Guru Meditation Error: Core 0 panic'ed (LoadProhibited) at tie728_s8_conv2d_h_w_unaligned_c_n_activation_loop0_no_preload_bias.
With ESP-DL v2.1.0, the model works correctly (detects digits) but inference takes ~26 seconds per image, which is far from the expected 7–8 FPS.
What I've tried:
Manually modified dl_base_conv2d.cpp to force the C-only path (i_impl_func = nullptr). This avoids the crash, but inference stays very slow because TIE728 is bypassed entirely.
Locking esp-dl to v2.1.0 works reliably but with unacceptable speed.
I've verified that the official bus.jpg test image works in both cases; my custom digit images are also detected correctly, just slowly.
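To make the C-only workaround from the first item concrete, here is a minimal, self-contained sketch of the dispatch pattern I mean. It is not the actual ESP-DL source: only the name i_impl_func comes from dl_base_conv2d.cpp, while conv_kernel_t, accelerated_kernel, c_kernel, and run_conv are hypothetical stand-ins, and the "kernels" just sum their input so the two paths can be compared.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical kernel signature; ESP-DL's real argument types differ.
using conv_kernel_t = int (*)(const int8_t *input, int n);

// Stand-in for an accelerated (e.g. TIE728 assembly) kernel.
static int accelerated_kernel(const int8_t *input, int n) {
    int acc = 0;
    for (int i = 0; i < n; ++i) acc += input[i];
    return acc;
}

// Stand-in for the portable C reference kernel (same result, slower on real hardware).
static int c_kernel(const int8_t *input, int n) {
    int acc = 0;
    for (int i = 0; i < n; ++i) acc += input[i];
    return acc;
}

// Runs one "layer". When force_c_only is true, the implementation pointer
// is left null, so the accelerated kernel is never called and the C path runs.
static const char *run_conv(bool force_c_only, const int8_t *input, int n, int *out) {
    conv_kernel_t i_impl_func = force_c_only ? nullptr : accelerated_kernel;
    if (i_impl_func != nullptr) {
        *out = i_impl_func(input, n);
        return "accelerated";
    }
    *out = c_kernel(input, n);
    return "c_fallback";
}
```

With force_c_only set, every layer goes through the fallback, which is why the crash disappears but no layer benefits from acceleration. What I am looking for is a way to apply this kind of fallback to the one failing kernel only.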
My questions:
Is there a known compatibility issue between custom espdet_pico models and the TIE728 assembly kernels? What causes the LoadProhibited crash?
Is there a recommended quantization setting, model export parameter, or runtime configuration that would allow me to safely enable TIE728 and achieve the expected ~7 FPS?
Alternatively, is there a way to keep the latest ESP-DL but selectively disable only the problematic TIE728 kernel while retaining other optimizations?
Any guidance or workaround would be greatly appreciated. I can provide the .espdl model file or the training pipeline if needed.