
Add sub-1.5bpe frontier + entropy-bpe kernels and result sidecars#42

Merged: jagmarques merged 1 commit into main from company/land-frontier-kernels on Jun 16, 2026

@jagmarques

Owner

Summary

  • Lands four Kaggle reproducibility kernels for the sub-1.5bpe frontier and NIAH-cliff data cited in the paper
  • Adds six result JSON sidecars in experiments/kaggle/results/ so numeric claims are traceable to runs

Kernels

| Dir | Model | Protocol | Configs |
| --- | --- | --- | --- |
| nq_mistral_subbpe | Mistral-7B-Inst-v0.3 | AQUA-iso PPL + live zstd-L22/Shannon bpe | FP16, K3V2→K1V1 pb=0 |
| nq_llama31_subbpe | Llama-3.1-8B-Inst | same; rope_scaling via prepare_rope_scaling() | FP16, K4V2→K1V2 pb=0 |
| nq_qwen3_subbpe_entropy | Qwen3-8B (NF4 weights) | AQUA-iso PPL + entropy-coded bpe | K3V2/K2V2 pb=0/pb=1 |
| nq_mistral_niah_frontier | Mistral-7B-Inst-v0.3 | chat-template NIAH, 4K+8K, 5 depths | FP16, K4V2→K1V2 pb=0 |
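
For readers unfamiliar with the "Shannon bpe" column, here is a minimal sketch of how such a number could be measured. It assumes "bpe" means bits per element of the quantized code stream, which this PR does not spell out. The helper names and the sample data are hypothetical, and `zlib` is used only as a standard-library stand-in for the zstd level 22 pass the kernels report.

```python
import math
import random
import zlib
from collections import Counter


def shannon_bits_per_element(codes):
    """Empirical (order-0) Shannon entropy of the code stream, in bits per element."""
    if not codes:
        raise ValueError("codes must be non-empty")
    counts = Counter(codes)
    total = len(codes)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def compressed_bits_per_element(codes, level=9):
    """Bits per element after storing one code per byte and compressing.

    zlib is a stand-in here; the kernels in this PR report zstd level 22.
    """
    raw = bytes(codes)  # assumes every code fits in 0..255
    return 8 * len(zlib.compress(raw, level)) / len(codes)


# Hypothetical 2-bit codes with a skewed distribution, shuffled so the
# compressor cannot exploit long runs.
codes = [0] * 700 + [1] * 200 + [2] * 80 + [3] * 20
random.Random(0).shuffle(codes)

print(f"Shannon: {shannon_bits_per_element(codes):.3f} bits/element")
print(f"zlib:    {compressed_bits_per_element(codes):.3f} bits/element")
```

The entropy figure is a lower bound for any coder that treats elements independently, so comparing it with the compressed size shows how much headroom a real entropy coder leaves.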

All kernels use ungated model mirrors; no HF_TOKEN required on secondary accounts.

Result sidecars

Six JSONs in experiments/kaggle/results/ covering Mistral, Llama-3.1, Qwen3, Yi and HQMQ comparison runs.
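
To trace a numeric claim back to a run, the sidecars can be loaded and inspected. The sketch below is only illustrative: `load_sidecars` is a hypothetical helper, and because the sidecar schema is not described in this PR it reports only the top-level shape of each file.

```python
import json
from pathlib import Path


def load_sidecars(results_dir="experiments/kaggle/results"):
    """Return {file stem: parsed JSON} for every .json file in results_dir."""
    sidecars = {}
    for path in sorted(Path(results_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            sidecars[path.stem] = json.load(fh)
    return sidecars


if __name__ == "__main__":
    for name, payload in load_sidecars().items():
        # Schema is unknown here, so print only the top-level keys or type.
        shape = sorted(payload) if isinstance(payload, dict) else type(payload).__name__
        print(name, shape)
```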

Sanitization

  • Removed jagmardrop internal account name from comment lines (lines 8/33 Mistral, line 8 Llama)
  • Removed # From CLAUDE.md: block with 3 lines of internal context from Qwen3 kernel header
  • kernel-metadata.json id fields retain jagmardrop/ namespace (Kaggle push requires it)
  • HF_TOKEN: optional env-only throughout (os.environ.get("HF_TOKEN")); None is fine for ungated models
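
The env-only token pattern from the last bullet can be sketched as follows. The wrapper function `hf_token_from_env` is hypothetical, and treating an empty string as unset is an added assumption; the PR itself only states that `os.environ.get("HF_TOKEN")` is used and that `None` is acceptable for ungated models.

```python
import os


def hf_token_from_env(env=os.environ):
    """Return HF_TOKEN if it is set and non-empty, otherwise None."""
    token = env.get("HF_TOKEN")
    return token or None  # assumption: an empty string counts as unset


token = hf_token_from_env()
# None is fine for ungated mirrors. Pass the value through to whatever loader
# the kernel uses instead of hard-coding a token in the source.
print("HF_TOKEN set" if token else "HF_TOKEN not set; continuing anonymously")
```

Reading the token only from the environment keeps it out of the committed kernel files, which is what lets the secret scan below stay clean.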

Proof of work

VERIFY: git -C /tmp/land_kernels diff --stat origin/main...HEAD

 experiments/kaggle/nq_llama31_subbpe/kernel-metadata.json  |   20 +
 experiments/kaggle/nq_llama31_subbpe/nq_llama31_subbpe.py  |  602 +++++++
 experiments/kaggle/nq_mistral_niah_frontier/kernel-metadata.json  |   20 +
 experiments/kaggle/nq_mistral_niah_frontier/nq_mistral_niah_frontier.py  |  445 +++++
 experiments/kaggle/nq_mistral_subbpe/kernel-metadata.json  |   20 +
 experiments/kaggle/nq_mistral_subbpe/nq_mistral_subbpe.py  |  597 +++++++
 experiments/kaggle/nq_qwen3_subbpe_entropy/kernel-metadata.json  |   20 +
 experiments/kaggle/nq_qwen3_subbpe_entropy/nq_qwen3_subbpe_entropy.py  |  611 +++++++
 experiments/kaggle/results/nq_hqmq_qwen3_v3.json  |  621 +++++++
 experiments/kaggle/results/nq_llama31_subbpe.json  | 1003 +++++++++++
 experiments/kaggle/results/nq_mistral_niah_frontier.json  |  621 +++++++
 experiments/kaggle/results/nq_mistral_subbpe_frontier.json | 1094 ++++++++++++
 experiments/kaggle/results/nq_qwen3_subbpe_entropy.json  | 1031 ++++++++++++
 experiments/kaggle/results/nq_yi_subbpe_niah.json  | 1736 ++++++++++++++++++++
 14 files changed, 8441 insertions(+)
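
As a cross-check on the summary line, the per-file counts above can be summed directly. This is a small sketch rather than part of the PR tooling; `total_changed_lines` is a hypothetical helper that parses `git diff --stat` style lines.

```python
import re

DIFFSTAT = """\
 experiments/kaggle/nq_llama31_subbpe/kernel-metadata.json  |   20 +
 experiments/kaggle/nq_llama31_subbpe/nq_llama31_subbpe.py  |  602 +++++++
 experiments/kaggle/nq_mistral_niah_frontier/kernel-metadata.json  |   20 +
 experiments/kaggle/nq_mistral_niah_frontier/nq_mistral_niah_frontier.py  |  445 +++++
 experiments/kaggle/nq_mistral_subbpe/kernel-metadata.json  |   20 +
 experiments/kaggle/nq_mistral_subbpe/nq_mistral_subbpe.py  |  597 +++++++
 experiments/kaggle/nq_qwen3_subbpe_entropy/kernel-metadata.json  |   20 +
 experiments/kaggle/nq_qwen3_subbpe_entropy/nq_qwen3_subbpe_entropy.py  |  611 +++++++
 experiments/kaggle/results/nq_hqmq_qwen3_v3.json  |  621 +++++++
 experiments/kaggle/results/nq_llama31_subbpe.json  | 1003 +++++++++++
 experiments/kaggle/results/nq_mistral_niah_frontier.json  |  621 +++++++
 experiments/kaggle/results/nq_mistral_subbpe_frontier.json | 1094 ++++++++++++
 experiments/kaggle/results/nq_qwen3_subbpe_entropy.json  | 1031 ++++++++++++
 experiments/kaggle/results/nq_yi_subbpe_niah.json  | 1736 ++++++++++++++++++++
"""


def total_changed_lines(diffstat):
    """Return (file count, summed per-file line counts) from diffstat text."""
    counts = re.findall(r"\|\s+(\d+)\s", diffstat)
    return len(counts), sum(int(c) for c in counts)


files, lines = total_changed_lines(DIFFSTAT)
print(files, lines)  # prints: 14 8441
```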

Secret scan: SCANNER: gitleaks - clean
Token grep: CLEAN - no token patterns found
No .company/, CLAUDE.md, *.tex, *.bib, paper/*, .planning/ staged.
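
The staged-path check in the last line could be automated along these lines. This is a sketch under assumptions: `forbidden_paths` is a hypothetical helper, the example path `paper/main.tex` is made up, and the directory rule is slightly stricter than the checklist because it flags a `paper`, `.company`, or `.planning` directory at any depth.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Patterns taken from the checklist line above.
FORBIDDEN_DIRS = {".company", ".planning", "paper"}
FORBIDDEN_GLOBS = ("CLAUDE.md", "*.tex", "*.bib")


def forbidden_paths(paths):
    """Return the paths that sit in an excluded directory or match an excluded glob."""
    hits = []
    for raw in paths:
        path = PurePosixPath(raw)
        in_forbidden_dir = any(part in FORBIDDEN_DIRS for part in path.parts[:-1])
        matches_glob = any(fnmatch(path.name, pat) for pat in FORBIDDEN_GLOBS)
        if in_forbidden_dir or matches_glob:
            hits.append(raw)
    return hits


staged = [
    "experiments/kaggle/results/nq_yi_subbpe_niah.json",
    "paper/main.tex",  # hypothetical path, included to show a hit
]
print(forbidden_paths(staged))  # prints: ['paper/main.tex']
```

In practice the input list would come from `git diff --cached --name-only`, and a non-empty result would block the commit.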

…ars (Mistral/Llama/Qwen3 + Mistral NIAH)

Lands four reproducibility kernels + six result JSONs:
- nq_mistral_subbpe: Mistral-7B-Inst-v0.3 AQUA-iso PPL + live zstd-L22/Shannon bpe (K3V2 to K1V1 pb=0)
- nq_llama31_subbpe: Llama-3.1-8B-Inst same protocol; rope_scaling propagated via prepare_rope_scaling()
- nq_qwen3_subbpe_entropy: Qwen3-8B (NF4 weights) K3V2/K2V2 pb=0/pb=1 with entropy-coded bpe measurement
- nq_mistral_niah_frontier: Mistral-7B-Inst-v0.3 chat-template NIAH at 4K+8K across six quant configs

Result sidecars in experiments/kaggle/results/ confirm the sub-1.5bpe PPL frontier and NIAH
cliff data cited in the paper. All kernels use ungated model mirrors; no HF_TOKEN required.
@jagmarques jagmarques marked this pull request as ready for review June 16, 2026 17:17
@jagmarques jagmarques merged commit e9934d0 into main Jun 16, 2026
3 checks passed
@jagmarques jagmarques deleted the company/land-frontier-kernels branch June 19, 2026 21:05