Intel Arc B70 / BMG G31: IPEX-Ollama oneAPI runner crashes on qwen2.5:0.5b, CPU path works #426

@AnonyymiIsmo

Summary

I am testing IPEX-LLM / IPEX-Ollama on Intel Arc Pro B70 / BMG G31 on Ubuntu 24.04.4.

The Intel GPU is detected correctly through the oneAPI backend, and the model is offloaded to the GPU, but generation crashes. The same Qwen2.5 0.5B model works with the same IPEX-Ollama build when running in CPU mode.

Hardware / OS

  • GPU: Intel Graphics [0xe223], BMG G31 / Arc Pro B70
  • VRAM reported by IPEX-Ollama: 31.9 GiB total, 29.8 GiB available
  • Kernel driver: xe
  • Resizable BAR: BAR 2 current size 32GB
  • OS: Ubuntu 24.04.4
  • Kernel: 6.17.0-29-generic

IPEX / Ollama environment

  • ipex-llm: 2.3.0b20251110
  • IPEX-Ollama reports version: 0.9.3
  • OLLAMA_INTEL_GPU=true
  • OLLAMA_NUM_GPU=999
  • OLLAMA_NUM_PARALLEL=1
  • OLLAMA_CONTEXT_LENGTH=1024
  • ONEAPI_DEVICE_SELECTOR=level_zero:0
  • SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
  • ZES_ENABLE_SYSMAN=1

What works

  • Host OpenCL sees Intel Graphics [0xe223].
  • VAAPI works.
  • PyTorch XPU matmul works.
  • IPEX-Ollama server starts.
  • IPEX-Ollama detects the Intel GPU:
    • library=oneapi
    • name="Intel(R) Graphics [0xe223]"
    • total="31.9 GiB"
  • qwen2.5:0.5b works in the CPU control run (separate port) with:
    • OLLAMA_INTEL_GPU=false
    • OLLAMA_NUM_GPU=0
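
For reference, the difference between the failing GPU run and the working CPU control run can be sketched as below. This is only an illustration: the variable values are copied from the lists above, the helper names `cpu_env` and `diff_env` are hypothetical, and it assumes the CPU run kept every other variable unchanged, which the lists above do not state explicitly.

```python
# Values copied from the "IPEX / Ollama environment" list in this report.
GPU_ENV = {
    "OLLAMA_INTEL_GPU": "true",
    "OLLAMA_NUM_GPU": "999",
    "OLLAMA_NUM_PARALLEL": "1",
    "OLLAMA_CONTEXT_LENGTH": "1024",
    "ONEAPI_DEVICE_SELECTOR": "level_zero:0",
    "SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS": "1",
    "ZES_ENABLE_SYSMAN": "1",
}

# The two overrides listed for the CPU control run.
CPU_OVERRIDES = {
    "OLLAMA_INTEL_GPU": "false",
    "OLLAMA_NUM_GPU": "0",
}


def cpu_env(gpu_env=GPU_ENV, overrides=CPU_OVERRIDES):
    """Return a copy of the GPU environment with the CPU overrides applied.

    Assumption: the CPU run kept every other variable unchanged.
    """
    env = dict(gpu_env)
    env.update(overrides)
    return env


def diff_env(a, b):
    """Map each variable whose value differs to its (a, b) pair."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Print only the variables that separate the two runs.
for name, (gpu_value, cpu_value) in diff_env(GPU_ENV, cpu_env()).items():
    print(f"{name}: GPU run = {gpu_value}, CPU run = {cpu_value}")
```

Under that assumption, only `OLLAMA_INTEL_GPU` and `OLLAMA_NUM_GPU` differ between the two runs, which points at the oneAPI runner rather than the model file or the server settings.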

What fails

qwen2.5:0.5b on Intel GPU / oneAPI

The model loads/offloads to the Intel GPU, but generation fails with:

HTTP 500:
model runner has unexpectedly stopped

Lowering the following settings does not fix it:

  • OLLAMA_NUM_PARALLEL=1
  • OLLAMA_CONTEXT_LENGTH=1024
  • num_batch=128
  • num_batch=64
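
A minimal reproduction sketch for the `num_batch` attempts, assuming `num_batch` is passed through the `options` field of the Ollama `/api/generate` endpoint. The helper names `build_payload` and `try_generate` and the prompt text are hypothetical, and the base URL is left as a parameter because the exact ports used here are not stated in this report.

```python
import json
import urllib.error
import urllib.request


def build_payload(model, prompt, num_batch):
    """Build a non-streaming /api/generate request body with a num_batch override."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_batch": num_batch},
    }


def try_generate(base_url, payload, timeout=120):
    """POST the payload and return (status, body).

    HTTP error responses (such as the 500 above) are returned, not raised.
    A connection failure is returned as (None, reason).
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", "replace")
    except urllib.error.URLError as err:
        return None, str(err.reason)


# Show the request bodies for the two batch sizes that were tried.
for num_batch in (128, 64):
    print(json.dumps(build_payload("qwen2.5:0.5b", "Hello", num_batch)))
```

Calling `try_generate(base_url, build_payload("qwen2.5:0.5b", "Hello", 64))` against the GPU-mode server and then against the CPU control server would give a side-by-side status code and response body for the same request.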

gemma4:e4b

gemma4:e4b cannot be tested with this build because the loader reports:

unknown model architecture: 'gemma4'

This appears to be due to IPEX-Ollama 0.9.3 not supporting the newer gemma4 GGUF architecture.

Expected behavior

qwen2.5:0.5b should generate normally on Intel Arc B70 through oneAPI/SYCL.

Actual behavior

qwen2.5:0.5b works on CPU, but the oneAPI GPU runner crashes during generation.

Attachment

I attached a small bugreport tar.gz with:

  • start scripts
  • GPU-mode log
  • CPU-control log
  • conda list
  • pip freeze
  • system/GPU/package information

bugreport-b70-ipex-ollama-2026-05-21-193230.tar.gz
