Intel Arc B70 / BMG G31: IPEX-Ollama oneAPI runner crashes on qwen2.5:0.5b, CPU path works

## Summary

I am testing IPEX-LLM / IPEX-Ollama on an Intel Arc Pro B70 (BMG G31) under Ubuntu 24.04.4.

The Intel GPU is detected correctly through the oneAPI backend, and the model is offloaded to the GPU, but generation crashes. The same Qwen2.5 0.5B model works with the same IPEX-Ollama build when running in CPU mode.

## Hardware / OS

- GPU: Intel Graphics [0xe223], BMG G31 / Arc Pro B70
- VRAM reported by IPEX-Ollama: 31.9 GiB total, 29.8 GiB available
- Kernel driver: xe
- Resizable BAR: BAR 2 current size 32GB
- OS: Ubuntu 24.04.4
- Kernel: 6.17.0-29-generic

## IPEX / Ollama environment

- ipex-llm: 2.3.0b20251110
- IPEX-Ollama reports version: 0.9.3
- OLLAMA_INTEL_GPU=true
- OLLAMA_NUM_GPU=999
- OLLAMA_NUM_PARALLEL=1
- OLLAMA_CONTEXT_LENGTH=1024
- ONEAPI_DEVICE_SELECTOR=level_zero:0
- SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
- ZES_ENABLE_SYSMAN=1

## What works

- Host OpenCL sees Intel Graphics [0xe223].
- VAAPI works.
- PyTorch XPU matmul works.
- IPEX-Ollama server starts.
- IPEX-Ollama detects the Intel GPU:
  - library=oneapi
  - name="Intel(R) Graphics [0xe223]"
  - total="31.9 GiB"
- qwen2.5:0.5b works on the CPU control instance (separate port) with:
  - OLLAMA_INTEL_GPU=false
  - OLLAMA_NUM_GPU=0
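
To make the GPU run and the CPU control easier to compare, the two environments can be built from one place. The following is a minimal sketch, not the contents of the attached start scripts: the helper names `build_env` and `diff_env` are hypothetical, the optional `host` argument relies on Ollama's `OLLAMA_HOST` variable, and it assumes the CPU control keeps every other setting from the GPU run.

```python
import os

# GPU-mode settings, copied from the "IPEX / Ollama environment" list.
GPU_ENV = {
    "OLLAMA_INTEL_GPU": "true",
    "OLLAMA_NUM_GPU": "999",
    "OLLAMA_NUM_PARALLEL": "1",
    "OLLAMA_CONTEXT_LENGTH": "1024",
    "ONEAPI_DEVICE_SELECTOR": "level_zero:0",
    "SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS": "1",
    "ZES_ENABLE_SYSMAN": "1",
}

# The two overrides used for the CPU control run.
CPU_OVERRIDES = {
    "OLLAMA_INTEL_GPU": "false",
    "OLLAMA_NUM_GPU": "0",
}


def build_env(mode, host=None, base=None):
    """Return a process environment for 'gpu' or 'cpu' mode.

    base defaults to the current process environment. host, if given,
    is written to OLLAMA_HOST so both servers can run side by side.
    """
    if mode not in ("gpu", "cpu"):
        raise ValueError("mode must be 'gpu' or 'cpu'")
    env = dict(os.environ if base is None else base)
    env.update(GPU_ENV)
    if mode == "cpu":
        # Assumption: the CPU control differs only in these two variables.
        env.update(CPU_OVERRIDES)
    if host is not None:
        env["OLLAMA_HOST"] = host
    return env


def diff_env(a, b):
    """Return {name: (value_in_a, value_in_b)} for variables that differ."""
    names = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in names if a.get(k) != b.get(k)}
```

The returned dictionary can be passed as `env=` to `subprocess.Popen` together with whatever command the start scripts use to launch the server. `diff_env(build_env("gpu"), build_env("cpu"))` then confirms that the two runs differ only in the intended variables.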

## What fails

### qwen2.5:0.5b on Intel GPU / oneAPI

The model loads and is offloaded to the Intel GPU, but generation fails with an HTTP 500 response:

`model runner has unexpectedly stopped`

The crash persists with each of the following reduced settings:
- OLLAMA_NUM_PARALLEL=1
- OLLAMA_CONTEXT_LENGTH=1024
- num_batch=128
- num_batch=64
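
The failing request can be reproduced without any client tooling. The sketch below posts to Ollama's `/api/generate` endpoint with `num_batch` set through the request `options`; the function names, the prompt, and the base URL are assumptions for illustration, and the base URL must be adjusted to wherever the GPU instance is listening.

```python
import json
import urllib.error
import urllib.request


def build_generate_request(model, prompt, num_batch):
    """Build a non-streaming /api/generate payload with a fixed num_batch."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_batch": num_batch},
    }


def post_generate(base_url, payload, timeout=120):
    """POST the payload and return (status_code, body_text).

    HTTP errors such as the 500 from a crashed runner are returned
    instead of raised, so a sweep can continue past a failure.
    """
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", "replace")


def sweep_num_batch(base_url, model="qwen2.5:0.5b", sizes=(128, 64)):
    """Try each num_batch value and collect (num_batch, status, body)."""
    results = []
    for size in sizes:
        payload = build_generate_request(model, "Say hello.", size)
        status, body = post_generate(base_url, payload)
        results.append((size, status, body))
    return results
```

For example, `sweep_num_batch("http://127.0.0.1:11434")` would exercise both batch sizes against a server on Ollama's default port; that address is an assumption, not the port used in the attached scripts.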

### gemma4:e4b

gemma4:e4b cannot be tested with this build because the loader reports:

`unknown model architecture: 'gemma4'`

This appears to be because IPEX-Ollama 0.9.3 does not support the newer gemma4 GGUF architecture.
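
One way to confirm which architecture string a model file declares is to read the `general.architecture` key from its GGUF metadata header. The following is a minimal sketch based on the public GGUF layout (little-endian, version 2 or later); the function name `read_gguf_architecture` is hypothetical and this is not the loader's own code.

```python
import struct

# Byte widths of the fixed-size GGUF metadata value types, keyed by type id:
# 0/1 = u8/i8, 2/3 = u16/i16, 4/5 = u32/i32, 6 = f32, 7 = bool,
# 10/11 = u64/i64, 12 = f64.
_FIXED_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1,
                10: 8, 11: 8, 12: 8}
_TYPE_STRING = 8
_TYPE_ARRAY = 9


def _read_u32(f):
    return struct.unpack("<I", f.read(4))[0]


def _read_u64(f):
    return struct.unpack("<Q", f.read(8))[0]


def _read_string(f):
    # GGUF strings are a u64 byte length followed by UTF-8 bytes.
    length = _read_u64(f)
    return f.read(length).decode("utf-8", "replace")


def _skip_value(f, vtype):
    """Advance past one metadata value of the given type."""
    if vtype in _FIXED_SIZES:
        f.read(_FIXED_SIZES[vtype])
    elif vtype == _TYPE_STRING:
        f.read(_read_u64(f))
    elif vtype == _TYPE_ARRAY:
        elem_type = _read_u32(f)
        count = _read_u64(f)
        for _ in range(count):
            _skip_value(f, elem_type)
    else:
        raise ValueError("unknown GGUF value type: %d" % vtype)


def read_gguf_architecture(f):
    """Return general.architecture from a binary file object, or None."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read_u32(f)
    if version < 2:
        raise ValueError("GGUF v1 uses a different length encoding")
    _read_u64(f)  # tensor count, not needed here
    kv_count = _read_u64(f)
    for _ in range(kv_count):
        key = _read_string(f)
        vtype = _read_u32(f)
        if key == "general.architecture" and vtype == _TYPE_STRING:
            return _read_string(f)
        _skip_value(f, vtype)
    return None
```

Open the model file with `open(path, "rb")` and pass the file object in. If the returned string is `gemma4` and the build's loader does not list that architecture, the error above is expected regardless of the GPU backend.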

## Expected behavior

qwen2.5:0.5b should generate normally on the Intel Arc Pro B70 through oneAPI/SYCL.

## Actual behavior

qwen2.5:0.5b works on CPU, but the oneAPI GPU runner crashes during generation.

## Attachment

I attached a small bug report archive (tar.gz) containing:
- start scripts
- GPU-mode log
- CPU-control log
- conda list
- pip freeze
- system/GPU/package information

[bugreport-b70-ipex-ollama-2026-05-21-193230.tar.gz](https://github.com/user-attachments/files/28117248/bugreport-b70-ipex-ollama-2026-05-21-193230.tar.gz)
