A3000 Laptop 6GB: Recommendations very poor #76

@Sophist-UK

Description

I have spent some months pondering which LLM to run (when I have time), watching as new models, new ways of running them (e.g. MTP) and new distilled versions are announced, and reading a lot of Reddit posts from people wanting to do similar things with similar hardware. So I have a reasonable idea of what might be best for my hardware.

The output is below...

On the first run it failed to download some data and reported an error. When I ran it again there was no error, but the results were exactly the same.
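
The fetch error in the output below is an HTTP 429 (Too Many Requests) from the Hugging Face datasets server. As a minimal sketch of how such a fetch could tolerate rate limiting, assuming nothing about how whichllm actually performs its requests, here is a retry loop with exponential backoff. The function name `fetch_with_backoff` and its parameters are hypothetical and only for illustration:

```python
import time
import urllib.error
import urllib.request


def fetch_with_backoff(url, max_attempts=5, base_delay=1.0,
                       opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch `url`, retrying only on HTTP 429 with exponential backoff.

    Hypothetical helper, not part of whichllm. `opener` and `sleep` are
    injectable so the retry logic can be exercised without network access.
    """
    for attempt in range(max_attempts):
        try:
            with opener(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not rate limiting, or the last failure.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            # Prefer the server's Retry-After value (in seconds) if it sent one.
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after and retry_after.isdigit():
                delay = float(retry_after)
            else:
                delay = base_delay * 2 ** attempt  # 1s, 2s, 4s, ...
            sleep(delay)
```

With the defaults this waits 1, 2, 4 and 8 seconds between attempts before giving up, which may or may not be enough for a paginated leaderboard download; the values are placeholders.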

A few things in the results that stood out:

  • The specific use case(s) matter, but there is no way for me to state that I want e.g. agentic coding
  • It recommends Qwen3.6 27B dense rather than Qwen3.6 35B A3B MoE, which would run much better with hybrid inference
  • No TPS estimates, which are absolutely essential for evaluating LLMs: 35 TPS vs. 1 TPS is a huge difference
  • Q8 rather than Q5 or Q6? Really?
  • No MTP evaluations
  • No distil evaluations
  • No data regarding which runner should be used with which params
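
To illustrate why the dense vs. MoE and Q8 vs. Q5 points matter so much on 6 GB of VRAM, here is a rough back-of-the-envelope sketch. It assumes token generation is purely memory-bandwidth bound and that the weights read per token are split between GPU and CPU in the same proportion as the whole model, which is a crude simplification of real hybrid inference. The bits-per-weight figures are approximate, and the bandwidth and usable-VRAM numbers are my own placeholder assumptions (the tool itself reports BW: N/A), not measurements and not whichllm's method:

```python
# Approximate bits per weight for some GGUF quant types (assumed values).
BITS_PER_WEIGHT = {"Q8_0": 8.5, "Q6_K": 6.56, "Q5_K_M": 5.5}

VRAM_GB = 5.0      # assumed usable VRAM out of 6 GB, leaving room for KV cache
GPU_BW_GBS = 264.0  # assumed GPU memory bandwidth
CPU_BW_GBS = 40.0   # assumed system RAM bandwidth


def weight_gb(params_b, quant):
    """Weight size in GB for `params_b` billion parameters at a given quant."""
    return params_b * BITS_PER_WEIGHT[quant] / 8


def decode_tps(total_params_b, active_params_b, quant,
               vram_gb=VRAM_GB, gpu_bw=GPU_BW_GBS, cpu_bw=CPU_BW_GBS):
    """Very rough tokens/s: time to read the active weights once per token."""
    gpu_fraction = min(1.0, vram_gb / weight_gb(total_params_b, quant))
    active_gb = weight_gb(active_params_b, quant)
    seconds_per_token = (active_gb * gpu_fraction / gpu_bw
                         + active_gb * (1.0 - gpu_fraction) / cpu_bw)
    return 1.0 / seconds_per_token


# Dense 27.8B at Q8_0: every parameter is read for every token.
dense_tps = decode_tps(27.8, 27.8, "Q8_0")
# 35B MoE with ~3B active parameters at Q5: only the active experts are read.
moe_tps = decode_tps(35.0, 3.0, "Q5_K_M")

print(f"dense 27.8B Q8_0 weights: {weight_gb(27.8, 'Q8_0'):.1f} GB, ~{dense_tps:.1f} TPS")
print(f"MoE 35B-A3B Q5_K_M weights: {weight_gb(35.0, 'Q5_K_M'):.1f} GB, ~{moe_tps:.1f} TPS")
```

Even with generous error bars on every assumption, the gap between the two estimates is more than an order of magnitude, which is the kind of information I would expect the recommendation table to surface.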

Steps to Reproduce

Run whichllm.

Hardware Info

Leaderboard fetch failed: Client error '429 Too Many Requests' for url 'https://datasets-server.huggingface.co/rows?dataset=open-llm-leaderboard%2Fcontents&config=default&split=train&offset=3800&length=100'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429
AA Index fetch failed, will use fallback: __NEXT_DATA__ payload not found

╭──────────────────────────────────────────────────────────────────────────────────────────────── Hardware Info ────────────────────────────────────────────────────────────────────────────────────────────────╮
│ GPU 0: NVIDIA RTX A3000 Laptop GPU — 6.0 GB (CUDA 13.2) — BW: N/A                                                                                                                                             │
│ GPU 1: Intel(R) UHD Graphics — shared memory — BW: N/A                                                                                                                                                        │
│ CPU: Unknown CPU — 8 cores (AVX2)                                                                                                                                                                             │
│ RAM: 31.3 GB                                                                                                                                                                                                  │
│ Disk free: 224.8 GB                                                                                                                                                                                           │
│ OS: windows                                                                                                                                                                                                   │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

                                                 Recommended Models
┏━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┓
┃   # ┃ Model                                         ┃ Params ┃ Quant  ┃ Published  ┃ Downloads ┃ Score ┃ License  ┃
┡━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━┩
│   1 │ Qwen/Qwen3.6-27B                              │  27.8B │  Q8_0  │ 2026-04-21 │      5.2M │  56.7 │ apache-… │
├─────┼───────────────────────────────────────────────┼────────┼────────┼────────────┼───────────┼───────┼──────────┤
│   2 │ google/gemma-4-31B-it                         │  32.7B │  Q6_K  │ 2026-03-11 │     11.3M │  54.3 │ apache-… │
├─────┼───────────────────────────────────────────────┼────────┼────────┼────────────┼───────────┼───────┼──────────┤
│   3 │ google/gemma-4-26B-A4B-it                     │  26.5B │  Q8_0  │ 2026-03-11 │     11.5M │  47.5 │ apache-… │
│     │                                               │ (3.8B… │        │            │           │       │          │
├─────┼───────────────────────────────────────────────┼────────┼────────┼────────────┼───────────┼───────┼──────────┤
│   4 │ Qwen/Qwen3-30B-A3B                            │  30.5B │  Q6_K  │ 2025-04-27 │      2.1M │  47.5 │ apache-… │
│     │                                               │ (3.0B… │        │            │           │       │          │
├─────┼───────────────────────────────────────────────┼────────┼────────┼────────────┼───────────┼───────┼──────────┤
│   5 │ zai-org/GLM-4.7-Flash                         │  31.2B │  Q6_K  │ 2026-01-19 │      1.1M │  45.8 │ mit      │
│     │                                               │ (12.0… │        │            │           │       │          │
├─────┼───────────────────────────────────────────────┼────────┼────────┼────────────┼───────────┼───────┼──────────┤
│   6 │ Qwen/QwQ-32B                                  │  32.8B │  Q6_K  │ 2025-03-05 │     62.5K │  45.4 │ apache-… │
├─────┼───────────────────────────────────────────────┼────────┼────────┼────────────┼───────────┼───────┼──────────┤
│   7 │ openai/gpt-oss-20b                            │  21.5B │  Q8_0  │ 2025-08-04 │      7.9M │  45.0 │ apache-… │
│     │                                               │ (3.6B… │        │            │           │       │          │
├─────┼───────────────────────────────────────────────┼────────┼────────┼────────────┼───────────┼───────┼──────────┤
│   8 │ deepseek-ai/DeepSeek-R1-Distill-Qwen-32B      │  32.8B │  Q6_K  │ 2025-01-20 │    608.3K │  44.6 │ mit      │
├─────┼───────────────────────────────────────────────┼────────┼────────┼────────────┼───────────┼───────┼──────────┤
│   9 │ mistralai/Mistral-Small-3.2-24B-Instruct-2506 │  24.0B │  Q8_0  │ 2025-06-19 │    632.7K │  43.9 │ apache-… │
├─────┼───────────────────────────────────────────────┼────────┼────────┼────────────┼───────────┼───────┼──────────┤
│  10 │ Qwen/Qwen3-14B                                │  14.8B │  Q8_0  │ 2025-04-27 │      1.7M │  43.3 │ apache-… │
└─────┴───────────────────────────────────────────────┴────────┴────────┴────────────┴───────────┴───────┴──────────┘
  Top pick confidence: Low (direct benchmark, gap +2.3, partial offload)
  Benchmark reference: 2026-05 curated snapshot; live AA / LiveBench / Aider merged when reachable.
  Speed caution: Low-confidence speed estimates in top ranks: #1, #2, #3
  Warning #1 Qwen3.6-27B: ~81% of layers will be offloaded to CPU RAM
  Warning #2 gemma-4-31B-it: ~79% of layers will be offloaded to CPU RAM
  Warning #3 gemma-4-26B-A4B-it: ~78% of layers will be offloaded to CPU RAM

Python Version

3.14

Operating System

Windows 11

whichllm Version

0.5.7

Metadata

Labels

bug: Something isn't working