# AWQ quantization models have abnormal memory usage / OOM on PTL 358H #402

## Summary

When loading offline AWQ-quantized Qwen3.5 models with the `llm-scaler-vllm` image on PTL 358H, the 9B model uses as much memory as the FP8 baseline (AWQ gives no memory savings), and the 35B-A3B model runs out of memory (OOM).

## Environment

- **Platform:** PTL 358H (16 cores); iGPU: B390 Graphics, 12 Xe Cores, 122 TOPS
- **Docker image:** `intel/llm-scaler-vllm:0.14.0-b7.1`
- **HF models:**
  - `QuantTrio/Qwen3.5-9B-AWQ`
  - `QuantTrio/Qwen3.5-35B-A3B-AWQ`

## Reproduction

Entrypoint script:

```bash
TORCH_LLM_ALLREDUCE=1 \
VLLM_USE_V1=1 \
CCL_ZE_IPC_EXCHANGE=pidfd \
VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 \
VLLM_WORKER_MULTIPROC_METHOD=spawn \
python3 -m vllm.entrypoints.openai.api_server \
    --model ${MODEL_PATH} \
    --served-model-name ${SERVED_MODEL_NAME} \
    --enforce-eager \
    --port 8000 \
    --host 0.0.0.0 \
    --trust-remote-code \
    --gpu-memory-util=0.6 \
    --max-num-batched-tokens=8192 \
    --disable-log-requests \
    --max-model-len=${MAX_MODEL_LEN} \
    --block-size 64 \
    --quantization awq \
    -tp=1 \
    --enable_prefix_caching \
    --enable-auto-tool-choice \
    --tool-call-parser qwen3_coder \
    --allow-deprecated-quantization ipex_awq
```

## Observed behavior

- `QuantTrio/Qwen3.5-9B-AWQ`: memory usage is roughly the same as for the FP8 variant, so AWQ appears to bring no memory reduction. See the attached log, `load_qwen3.5_9b_awq.log`.
- `QuantTrio/Qwen3.5-35B-A3B-AWQ`: OOM on load.
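
To put a number on "roughly the same as FP8", the memory-related lines of the attached AWQ log can be compared with those of an FP8 run. This is a minimal sketch: `mem_lines` is a hypothetical helper, and the pattern is deliberately broad because the exact wording of vLLM's memory messages differs between versions and forks.

```shell
#!/usr/bin/env bash
# Print every line of a vLLM startup log that mentions "memory" or "GiB"
# (case-insensitive), so the memory reports of two runs can be compared.
# mem_lines is a hypothetical helper, not part of vLLM.
mem_lines() {
  grep -iE 'memory|GiB' "$1"
}
```

Usage, where the FP8 log name is a placeholder since no FP8 log is attached: `diff <(mem_lines load_qwen3.5_9b_awq.log) <(mem_lines load_qwen3.5_9b_fp8.log)`.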

## Expected behavior

AWQ-quantized weights should take significantly less memory than FP8 weights, and the 35B-A3B AWQ model should load within the iGPU memory budget on this platform.
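
A back-of-envelope estimate of the expected weight footprint, as a sketch under assumptions that are not from the report: 4-bit AWQ with group size 128, one FP16 scale and one 4-bit zero point per group, every parameter quantized, and the nominal 9B / 35B parameter counts from the model names. For the MoE model, all 35B parameters must be resident even though only about 3B are active per token. Embeddings, `lm_head` and any layers left in higher precision push the real AWQ figure up, so treat it as a lower bound. `awq_bits`, `weight_gib` and `fits_budget` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Back-of-envelope weight memory for the two models in this report.
# Assumptions (NOT taken from the report): 4-bit AWQ, one FP16 scale and one
# 4-bit zero point per group of weights, every parameter quantized, and the
# nominal 9B / 35B parameter counts. Helper names are hypothetical.

# Effective bits per weight for 4-bit AWQ with group size $1.
awq_bits() {
  awk -v g="$1" 'BEGIN { printf "%.5f\n", 4 + (16 + 4) / g }'
}

# Weight memory in GiB for $1 billion parameters at $2 bits per weight.
weight_gib() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.2f\n", p * 1e9 * b / 8 / 2^30 }'
}

# Exit 0 if $1 GiB of weights fits in $2 GiB of device memory scaled by the
# --gpu-memory-utilization fraction $3. Weights are only part of that budget:
# activations and the KV cache need the rest.
fits_budget() {
  awk -v w="$1" -v t="$2" -v u="$3" 'BEGIN { exit !(w <= t * u) }'
}

bits=$(awq_bits 128)   # group size 128 is an assumption
for params in 9 35; do
  echo "${params}B  FP8: $(weight_gib "$params" 8) GiB  AWQ: $(weight_gib "$params" "$bits") GiB"
done
# 9B  FP8: 8.38 GiB  AWQ: 4.35 GiB
# 35B  FP8: 32.60 GiB  AWQ: 16.93 GiB
```

For the OOM case, `fits_budget "$(weight_gib 35 "$bits")" <total_gib> 0.6` can be run with the total memory the driver reports for the B390 iGPU, which the report does not state; `0.6` matches the `--gpu-memory-util=0.6` setting in the entrypoint script.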

## Attachments

[load_qwen3.5_9b_awq.log](https://github.com/user-attachments/files/27584286/load_qwen3.5_9b_awq.log)
