Personal reference notes on large language models.
- DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding https://github.com/deepseek-ai/DeepSeek-VL2/blob/main/DeepSeek_VL2_paper.pdf https://arxiv.org/pdf/2412.10302
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf
- DeepSeek-V3 Technical Report https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model https://arxiv.org/abs/2405.04434
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models https://arxiv.org/abs/2401.06066
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism https://arxiv.org/abs/2401.02954
- DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence https://arxiv.org/abs/2406.11931
- DeepSeek-Coder: When the Large Language Model Meets Programming - The Rise of Code Intelligence https://arxiv.org/pdf/2401.14196
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models https://arxiv.org/abs/2402.03300
- DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
- DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via RL for Subgoal Decomposition
- DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search https://arxiv.org/abs/2408.08152
- DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data https://arxiv.org/abs/2405.14333
- DeepSeek-VL: Towards Real-World Vision-Language Understanding
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention https://arxiv.org/abs/2502.11089
- Gemini 2.5: https://storage.googleapis.com/deepmind-media/gemini/gemini_v2_5_report.pdf
- GLM-4.5: https://arxiv.org/abs/2508.06471
- Let's Verify Step by Step: https://arxiv.org/pdf/2305.20050
- Scaling LLM Test-Time Compute Optimally Can Be More Effective Than Scaling Model Parameters: https://arxiv.org/pdf/2408.03314
- ChatGLM2
- LLaMA: Open and Efficient Foundation Language Models
- ChatGLM
- PaLM: Scaling Language Modeling with Pathways
- InstructGPT
- GPT 3.0
- T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- GPT 2.0
- GPT 1.0
- BERT
- Transformer (Attention Is All You Need)
- transformer-explainer https://poloclub.github.io/transformer-explainer
- The Illustrated Transformer http://jalammar.github.io/illustrated-transformer/