Summary
I am testing IPEX-LLM / IPEX-Ollama on Intel Arc Pro B70 / BMG G31 on Ubuntu 24.04.4.
The Intel GPU is detected correctly through the oneAPI backend, and the model is offloaded to the GPU, but generation crashes. The same Qwen2.5 0.5B model works with the same IPEX-Ollama build when running in CPU mode.
Hardware / OS
- GPU: Intel Graphics [0xe223], BMG G31 / Arc Pro B70
- VRAM reported by IPEX-Ollama: 31.9 GiB total, 29.8 GiB available
- Kernel driver: xe
- Resizable BAR: BAR 2 current size 32GB
- OS: Ubuntu 24.04.4
- Kernel: 6.17.0-29-generic
IPEX / Ollama environment
- ipex-llm: 2.3.0b20251110
- IPEX-Ollama reports version: 0.9.3
- OLLAMA_INTEL_GPU=true
- OLLAMA_NUM_GPU=999
- OLLAMA_NUM_PARALLEL=1
- OLLAMA_CONTEXT_LENGTH=1024
- ONEAPI_DEVICE_SELECTOR=level_zero:0
- SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
- ZES_ENABLE_SYSMAN=1
What works
- Host OpenCL sees Intel Graphics [0xe223].
- VAAPI works.
- PyTorch XPU matmul works.
- IPEX-Ollama server starts.
- IPEX-Ollama detects the Intel GPU:
  - library=oneapi
  - name="Intel(R) Graphics [0xe223]"
  - total="31.9 GiB"
- qwen2.5:0.5b works on CPU control port with:
  - OLLAMA_INTEL_GPU=false
  - OLLAMA_NUM_GPU=0
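For anyone trying to reproduce the GPU run and the CPU control side by side, the two configurations can be sketched as below. This is an illustration only, not the attached start scripts: the helper name `build_env` is hypothetical, and it assumes the CPU control keeps every other variable from the environment list unchanged and overrides only the two values shown above.

```python
import os

# Values copied from the "IPEX / Ollama environment" list in this report.
GPU_ENV = {
    "OLLAMA_INTEL_GPU": "true",
    "OLLAMA_NUM_GPU": "999",
    "OLLAMA_NUM_PARALLEL": "1",
    "OLLAMA_CONTEXT_LENGTH": "1024",
    "ONEAPI_DEVICE_SELECTOR": "level_zero:0",
    "SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS": "1",
    "ZES_ENABLE_SYSMAN": "1",
}

# CPU control: the two overrides listed under "What works".
# Assumption: all other variables stay as in the GPU run.
CPU_OVERRIDES = {
    "OLLAMA_INTEL_GPU": "false",
    "OLLAMA_NUM_GPU": "0",
}


def build_env(mode, base=None):
    """Return an environment dict for mode 'gpu' or 'cpu' (hypothetical helper)."""
    if mode not in ("gpu", "cpu"):
        raise ValueError("mode must be 'gpu' or 'cpu'")
    # Start from the given base, or from the current process environment.
    env = dict(os.environ if base is None else base)
    env.update(GPU_ENV)
    if mode == "cpu":
        env.update(CPU_OVERRIDES)
    return env
```

The resulting dict could be passed as `env=` to `subprocess.Popen` when launching the server; the exact server command depends on the local install and is not shown here.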
What fails
qwen2.5:0.5b on Intel GPU / oneAPI
The model loads/offloads to the Intel GPU, but generation fails with:
HTTP 500:
model runner has unexpectedly stopped
The crash persists with each of the following reduced settings:
- OLLAMA_NUM_PARALLEL=1
- OLLAMA_CONTEXT_LENGTH=1024
- num_batch=128
- num_batch=64
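A minimal reproduction sketch for the failing request, assuming the build exposes the standard Ollama `/api/generate` endpoint and accepts `num_batch` and `num_ctx` under `options`. The base URL (upstream Ollama's default port 11434), the prompt, and the helper names are assumptions for illustration; the actual ports used for the GPU and CPU-control servers are in the attached start scripts.

```python
import json
import urllib.error
import urllib.request


def build_generate_request(model, prompt, num_batch, num_ctx=1024):
    """Build a non-streaming /api/generate payload with per-request options."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_batch": num_batch, "num_ctx": num_ctx},
    }


def try_generate(base_url, payload, timeout=120):
    """POST the payload and return (http_status, body_text), including on HTTP errors."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # An HTTP 500 from a crashed runner ends up here.
        return err.code, err.read().decode("utf-8", "replace")


if __name__ == "__main__":
    # Assumed base URL; adjust to the port the GPU-mode server listens on.
    base_url = "http://127.0.0.1:11434"
    for num_batch in (128, 64):
        payload = build_generate_request("qwen2.5:0.5b", "Hello", num_batch)
        status, body = try_generate(base_url, payload)
        print(num_batch, status, body[:200])
```

Running the same script against the CPU-control server gives a direct comparison between the two modes for identical requests.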
gemma4:e4b
gemma4:e4b cannot be tested with this build because the loader reports:
unknown model architecture: 'gemma4'
This appears to be due to IPEX-Ollama 0.9.3 not supporting the newer gemma4 GGUF architecture.
Expected behavior
qwen2.5:0.5b should generate normally on Intel Arc B70 through oneAPI/SYCL.
Actual behavior
qwen2.5:0.5b works on CPU, but the oneAPI GPU runner crashes during generation.
Attachment
I attached a small bugreport tar.gz with:
- start scripts
- GPU-mode log
- CPU-control log
- conda list
- pip freeze
- system/GPU/package information
bugreport-b70-ipex-ollama-2026-05-21-193230.tar.gz