Custom YOLOv11 digit detection model crashes with TIE728 acceleration on ESP32-S3, works slowly without it

### Checklist

- [x] Checked the issue tracker for similar issues to ensure this is not a duplicate.
- [x] Provided a clear description of your suggestion.
- [x] Included any relevant context or examples.

### Issue or Suggestion Description

Hi,

I'm deploying a custom digit detection model (0-9 classes) on an ESP32-S3 (N8R8) using the official yolo11_detect example, and I'm running into a persistent issue with TIE728 acceleration.

Environment:

- ESP-IDF: v6.0.1
- ESP-DL: tried both v3.3.2 and v2.1.0
- Chip: ESP32-S3 (N8R8, Octal PSRAM 8MB enabled)
- Model: espdet_pico (0.36M parameters, 224x224 input), trained on the SVHN dataset with esp-detection and quantized to .espdl with esp-ppq

What I did:

1. Cloned the official yolo11_detect example (`espressif/esp-dl=3.3.2:yolo11_detect`).
2. Replaced the model file with my own .espdl.
3. Enabled PSRAM (Octal, 8MB) via menuconfig.
4. Built and flashed.

What happens:

- With ESP-DL v3.3.2, the program crashes immediately on inference with:
  `Guru Meditation Error: Core 0 panic'ed (LoadProhibited) at tie728_s8_conv2d_h_w_unaligned_c_n_activation_loop0_no_preload_bias.`
- With ESP-DL v2.1.0, the model works correctly (it detects digits), but inference takes ~26 seconds per image, which is far from the expected 7–8 FPS.
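To put the gap in numbers, here is a minimal host-side sketch that converts the measured latency into a frame rate and a slowdown factor. The helper names `fps_from_latency` and `slowdown_factor` are made up for illustration; only the 26 s and 7 FPS figures come from the report above.

```cpp
// Frames per second implied by a per-image latency in seconds.
double fps_from_latency(double seconds_per_image) {
    return 1.0 / seconds_per_image;
}

// How many times slower a measured latency is than a target frame rate.
// A target of F frames per second means 1/F seconds per image, so the
// ratio measured / (1/F) simplifies to measured * F.
double slowdown_factor(double seconds_per_image, double target_fps) {
    return seconds_per_image * target_fps;
}
```

Calling `fps_from_latency(26.0)` gives roughly 0.038 FPS, and `slowdown_factor(26.0, 7.0)` gives 182, so the C-only path is around two orders of magnitude slower than the expected rate.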

What I've tried:

- Manually modified dl_base_conv2d.cpp to force the C-only path (`i_impl_func = nullptr`). This avoids the crash, but inference stays very slow because TIE728 is bypassed completely.
- Locked esp-dl to v2.1.0. This works reliably, but the speed is unacceptable.

In both of these cases the official bus.jpg test image is processed correctly, and my custom digit images are too, but inference is equally slow.
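For readers unfamiliar with this kind of workaround, the sketch below shows the general pattern of nulling an implementation pointer so that a dispatcher falls back to portable C. It is a simplified host-side illustration, not ESP-DL's actual code: `DotKernel`, `dot_s8_c`, `dot_s8_accel`, `select_impl`, and `dot_s8` are hypothetical names, and the "accelerated" kernel is just a stand-in that reuses the C path.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical kernel signature: int8 dot product over n elements.
using DotKernel = int32_t (*)(const int8_t *a, const int8_t *b, size_t n);

// Portable C reference path, always available.
int32_t dot_s8_c(const int8_t *a, const int8_t *b, size_t n) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
    }
    return acc;
}

// Stand-in for an accelerated kernel. On real hardware this would be a
// SIMD/assembly routine; here it simply reuses the C path.
int32_t dot_s8_accel(const int8_t *a, const int8_t *b, size_t n) {
    return dot_s8_c(a, b, n);
}

// Returns the accelerated kernel, or nullptr when acceleration is forced off.
DotKernel select_impl(bool allow_accel) {
    return allow_accel ? &dot_s8_accel : nullptr;
}

// Dispatcher: use the accelerated pointer when it is set, else fall back to C.
int32_t dot_s8(const int8_t *a, const int8_t *b, size_t n, DotKernel impl_func) {
    if (impl_func != nullptr) {
        return impl_func(a, b, n);
    }
    return dot_s8_c(a, b, n);
}
```

Passing `select_impl(false)` (that is, `nullptr`) to `dot_s8` produces the same result through the C loop only, which mirrors why the crash disappears but the speed-up is lost as well.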

My questions:

1. Is there a known compatibility issue between custom espdet_pico models and the TIE728 assembly kernels? What causes the LoadProhibited crash?
2. Is there a recommended quantization setting, model export parameter, or runtime configuration that would allow me to safely enable TIE728 and achieve the expected ~7 FPS?
3. Alternatively, is there a way to keep the latest ESP-DL but selectively disable only the problematic TIE728 kernel while retaining other optimizations?

Any guidance or workaround would be greatly appreciated. I can provide the .espdl model file or the training pipeline if needed.


