
Custom YOLOv11 digit detection model crashes with TIE728 acceleration on ESP32-S3, works slowly without it #16

@zongxiongw

Checklist

  • Checked the issue tracker for similar issues to ensure this is not a duplicate.
  • Provided a clear description of your suggestion.
  • Included any relevant context or examples.

Issue or Suggestion Description

Hi,

I'm deploying a custom digit detection model (0-9 classes) on an ESP32-S3 (N8R8) using the official yolo11_detect example, and I'm running into a persistent issue with TIE728 acceleration.

Environment:

  • ESP-IDF: v6.0.1
  • ESP-DL: tried both v3.3.2 and v2.1.0
  • Chip: ESP32-S3 (N8R8, Octal PSRAM 8MB enabled)
  • Model: espdet_pico (0.36M parameters, input 224x224), trained on the SVHN dataset with esp-detection and quantized to .espdl with esp-ppq

What I did:

  • Cloned the official yolo11_detect example (espressif/esp-dl=3.3.2:yolo11_detect).
  • Replaced the model file with my own .espdl model.
  • Enabled PSRAM (Octal, 8MB) via menuconfig.
  • Built and flashed.

What happens:

  • With ESP-DL v3.3.2, the program crashes immediately on inference with:
    Guru Meditation Error: Core 0 panic'ed (LoadProhibited) at tie728_s8_conv2d_h_w_unaligned_c_n_activation_loop0_no_preload_bias.
  • With ESP-DL v2.1.0, the model works correctly (it detects digits), but inference takes ~26 seconds per image, far from the expected 7–8 FPS.
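To put the v2.1.0 slowdown in numbers, here is a small C++ sketch that converts a per-image latency into frames per second and compares it with a frame-rate target. The helper names (fps_from_latency_ms, latency_budget_ms, slowdown_factor) are illustrative only, not ESP-DL APIs.

```cpp
#include <cassert>

// Converts a per-image inference latency in milliseconds into frames per second.
double fps_from_latency_ms(double latency_ms) {
    assert(latency_ms > 0.0);
    return 1000.0 / latency_ms;
}

// Per-frame latency budget in milliseconds needed to reach target_fps.
double latency_budget_ms(double target_fps) {
    assert(target_fps > 0.0);
    return 1000.0 / target_fps;
}

// How many times slower the observed latency is than the budget for target_fps.
double slowdown_factor(double observed_ms, double target_fps) {
    return observed_ms / latency_budget_ms(target_fps);
}
```

With ~26 s per image, fps_from_latency_ms(26000.0) is about 0.04 FPS, and slowdown_factor(26000.0, 7.0) is about 182: a 7 FPS target implies a budget of roughly 143 ms per frame.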

What I've tried:

  • Manually modified dl_base_conv2d.cpp to force the C-only path (i_impl_func = nullptr). This avoids the crash, but performance stays very slow because TIE728 is bypassed entirely.
  • Pinning esp-dl to v2.1.0 works reliably, but the speed is unacceptable.
  • I've verified that the official bus.jpg test image works in both cases; custom digit images also work, but slowly.

My questions:

Is there a known compatibility issue between custom espdet_pico models and the TIE728 assembly kernels? What causes the LoadProhibited crash?

Is there a recommended quantization setting, model export parameter, or runtime configuration that would allow me to safely enable TIE728 and achieve the expected ~7 FPS?

Alternatively, is there a way to keep the latest ESP-DL but selectively disable only the problematic TIE728 kernel while retaining other optimizations?
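One possible direction for the last question is to narrow the i_impl_func = nullptr workaround from every convolution to only the layers that would take the unaligned kernel. The C++ sketch below is hypothetical: ConvLayerInfo, ConvKernelFn, select_conv_kernel and unaligned_layers are illustrative names, not ESP-DL types or functions. It also assumes, without confirmation, that the "unaligned_c" path is taken when an int8 layer's channel count is not a multiple of 16 (one 128-bit vector of int8 values). Where such a check could hook into dl_base_conv2d.cpp would have to be verified against the actual ESP-DL source.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one conv layer; NOT an ESP-DL type.
struct ConvLayerInfo {
    std::string name;
    int in_channels;
    int out_channels;
};

// Hypothetical stand-in for an optimized kernel pointer; ESP-DL's real signature differs.
using ConvKernelFn = void (*)(void *args);

// ASSUMPTION: int8 TIE728 SIMD handles 16 channels per 128-bit vector, so channel
// counts that are not multiples of 16 are treated here as "unaligned".
constexpr int kInt8LanesPerVector = 16;

bool channels_aligned(const ConvLayerInfo &layer) {
    return layer.in_channels % kInt8LanesPerVector == 0 &&
           layer.out_channels % kInt8LanesPerVector == 0;
}

// Returns the SIMD kernel for aligned layers and nullptr (generic C path) otherwise:
// the same fallback as the i_impl_func = nullptr edit, applied per layer.
ConvKernelFn select_conv_kernel(const ConvLayerInfo &layer, ConvKernelFn simd_kernel) {
    assert(simd_kernel != nullptr);
    return channels_aligned(layer) ? simd_kernel : nullptr;
}

// Names of the layers that would fall back to the C path under this assumption.
std::vector<std::string> unaligned_layers(const std::vector<ConvLayerInfo> &layers) {
    std::vector<std::string> result;
    for (const auto &layer : layers) {
        if (!channels_aligned(layer)) {
            result.push_back(layer.name);
        }
    }
    return result;
}
```

Running unaligned_layers over the model's conv shapes would show whether the suspect path is limited to a few layers (for example, a first layer with 3 RGB input channels). If so, only those layers would lose TIE728 acceleration instead of the whole network.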

Any guidance or workaround would be greatly appreciated. I can provide the .espdl model file or the training pipeline if needed.
