feat: zig compiler #103
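Summary of Changes

Hello @jiacai2050, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces an article that offers a comprehensive, in-depth analysis of the Zig compiler's internal architecture and execution flow. The article explains Zig's core ideas, such as lazy compilation and data-oriented design, and walks through the complete multi-stage pipeline from source parsing to machine-code generation. It highlights how the compiler manages intermediate representations, handles compile-time execution, and implements incremental compilation and bootstrapping, giving a full picture of what makes the Zig compiler efficient and flexible.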
### **4.2 Result Location Semantics**
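When lowering the recursive AST into linear ZIR, Zig introduces the concept of **ResultLoc** (result location) to optimize data flow [6]. As AstGen walks an AST node, it passes a ResultLoc parameter down, telling the child node: "Once you have computed the result, write it directly into this location."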
🚫 [AutoCorrect Lint] <AutoCorrect> reported by reviewdog 🐶
Suggested change: remove the space after the full-width period that follows citation 6 (`数据流向 6。 当` → `数据流向 6。当`).
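The following is a minimal Python sketch of the result-location idea, not AstGen's actual implementation: the mini-AST (`Lit`, `Add`), the three result-location kinds (`Discard`, `Temp`, `Ptr`), and the textual instruction format are all hypothetical simplifications. It shows a recursive lowering into a flat instruction list that passes a result location down to each child, so the node producing the final value writes it straight into the requested destination, and an unused literal emits nothing at all.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


# Hypothetical mini-AST: an integer literal or an addition.
@dataclass
class Lit:
    value: int


@dataclass
class Add:
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Lit, Add]


# Hypothetical result locations, loosely inspired by ResultLoc.
@dataclass
class Discard:  # caller ignores the value
    pass


@dataclass
class Temp:  # caller wants a reference to a temporary
    pass


@dataclass
class Ptr:  # caller wants the value written into `dest`
    dest: str


ResultLoc = Union[Discard, Temp, Ptr]


class Lowering:
    def __init__(self) -> None:
        self.insts: List[str] = []  # flat, linear instruction list

    def emit(self, text: str) -> str:
        """Append a value-producing instruction and return its reference."""
        ref = f"%{len(self.insts)}"
        self.insts.append(f"{ref} = {text}")
        return ref

    def expr(self, node: Expr, rl: ResultLoc) -> Optional[str]:
        if isinstance(node, Lit):
            if isinstance(rl, Discard):
                return None  # unused pure literal: emit nothing
            value = self.emit(f"int({node.value})")
        else:
            # Operands are always needed as temporaries.
            lhs = self.expr(node.lhs, Temp())
            rhs = self.expr(node.rhs, Temp())
            value = self.emit(f"add({lhs}, {rhs})")
        if isinstance(rl, Ptr):
            # The child writes straight into the caller's destination.
            self.insts.append(f"store({rl.dest}, {value})")
        return value
```

With `Ptr("x")`, lowering `1 + 2` yields two `int` instructions, one `add`, and a single `store` into `x`; the parent never needs a separate copy step.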
For Debug builds, Zig prefers its own self-hosted backends (such as the x86_64, ARM64, and WASM backends).
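* **Direct generation**: these backends translate AIR straight into binary machine code, skipping LLVM IR generation entirely.
* **Speed advantage**: without that large intermediate layer, the self-hosted backends typically compile several times faster than the LLVM backend, which greatly speeds up the edit-compile-run loop during development [11].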
🚫 [AutoCorrect Lint] <AutoCorrect> reported by reviewdog 🐶
Suggested change: add spaces around the hyphens in the edit-compile-run phrase (`修改-编译-运行` → `修改 - 编译 - 运行`).
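To make "direct generation" concrete, here is a toy Python sketch that turns a hypothetical three-op IR straight into x86-64 bytes with no intermediate IR in between. The op names and tuple format are invented for illustration; only the byte encodings (`B8 id` for `mov eax, imm32`, `05 id` for `add eax, imm32`, `C3` for `ret`) are real x86-64 instructions. A real backend must also handle register allocation, relocations, debug info, and much more.

```python
import struct
from typing import List, Tuple


def lower_to_x86_64(ops: List[Tuple]) -> bytes:
    """Encode a hypothetical accumulator IR directly as x86-64 machine code."""
    out = bytearray()
    for op in ops:
        kind = op[0]
        if kind == "const":  # mov eax, imm32 -> B8 + 4-byte little-endian immediate
            out += b"\xb8" + struct.pack("<i", op[1])
        elif kind == "add":  # add eax, imm32 -> 05 + 4-byte little-endian immediate
            out += b"\x05" + struct.pack("<i", op[1])
        elif kind == "ret":  # ret -> C3
            out += b"\xc3"
        else:
            raise ValueError(f"unsupported op: {kind}")
    return bytes(out)
```

For example, `lower_to_x86_64([("const", 40), ("add", 2), ("ret",)]).hex(" ")` gives `b8 28 00 00 00 05 02 00 00 00 c3`, the bytes of a function that leaves 42 in `eax` and returns.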
Code Review
This pull request adds a new blog post, a deep dive into the Zig compiler's architecture. The article is very detailed and well-structured, providing a comprehensive overview of Zig's compilation pipeline. My review focuses on improving the language and consistency of the article. I've suggested a few changes to use more formal language and to ensure that the entire article is consistently in Chinese, including the "Works Cited" section. Overall, this is a great contribution.
## **Abstract**
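This article dissects the Zig compiler's distinctive "lazy compilation" philosophy, the concrete application of data-oriented design (DOD) in the AST, ZIR, and AIR stages, and how a single InternPool structure stores types and values in one uniform representation. It details the complete pipeline from source parsing to machine-code generation, with particular attention to how the semantic analysis stage (Sema) acts as a hybrid of interpreter and code generator, interleaving compile-time code execution (comptime) with the emission of runtime instructions. It also explores the theoretical basis of Zig's bootstrapping process and how its incremental compilation system is implemented.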
The term "这边文章" is a bit colloquial. For a formal technical article, it's better to use "这篇文章" or "本文", which are more formal ways to say "this article".
Suggested change: replace `这边文章` with `本文` at the start of the abstract, leaving the rest of the paragraph unchanged.
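This architecture gives Zig metaprogramming capabilities that rival those of scripting languages while preserving the bare-metal performance and deterministic memory control a systems language requires, offering a highly instructive model for future compiler design.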
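The interleaving of comptime evaluation and runtime emission can be illustrated with a small Python sketch. Everything here (the instruction tuples, `MiniSema`, the textual AIR format) is a hypothetical simplification that only mirrors the idea shown in the Sema part of the flowchart: when every operand of an instruction is comptime-known, the result is computed on the spot and recorded in `inst_map` as a constant; otherwise a runtime instruction is emitted and `inst_map` records a reference to it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Comptime:
    value: int  # comptime-known result, stored directly


@dataclass
class Runtime:
    air_ref: str  # reference to an emitted runtime instruction


Result = Union[Comptime, Runtime]


@dataclass
class MiniSema:
    air: List[str] = field(default_factory=list)  # emitted runtime instructions
    inst_map: Dict[int, Result] = field(default_factory=dict)  # input inst -> result

    def emit(self, text: str) -> Runtime:
        ref = f"%{len(self.air)}"
        self.air.append(f"{ref} = {text}")
        return Runtime(ref)

    @staticmethod
    def operand(r: Result) -> str:
        return str(r.value) if isinstance(r, Comptime) else r.air_ref

    def analyze(self, body: List[Tuple]) -> Result:
        # body holds ("const", n), ("param", name) or ("add", i, j),
        # where i and j index earlier instructions in body.
        for i, inst in enumerate(body):
            if inst[0] == "const":
                res: Result = Comptime(inst[1])
            elif inst[0] == "param":
                res = self.emit(f"arg({inst[1]})")  # only known at runtime
            elif inst[0] == "add":
                a, b = self.inst_map[inst[1]], self.inst_map[inst[2]]
                if isinstance(a, Comptime) and isinstance(b, Comptime):
                    res = Comptime(a.value + b.value)  # interpret now
                else:
                    res = self.emit(f"add({self.operand(a)}, {self.operand(b)})")
            else:
                raise ValueError(f"unknown instruction: {inst[0]}")
            self.inst_map[i] = res
        return self.inst_map[len(body) - 1]
```

Analyzing `(2 + 3) + x` folds `2 + 3` into the constant `5`, so only `arg(x)` and `add(5, %0)` are emitted as runtime instructions.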
#### **Works cited**
1. How Zig incremental compilation is implemented internally? - Explain - Ziggit Dev, accessed January 24, 2026, [https://ziggit.dev/t/how-zig-incremental-compilation-is-implemented-internally/3543](https://ziggit.dev/t/how-zig-incremental-compilation-is-implemented-internally/3543)
2. Lazy Dependencies, Best Dependencies? - Brainstorming - Ziggit Dev, accessed January 24, 2026, [https://ziggit.dev/t/lazy-dependencies-best-dependencies/5509](https://ziggit.dev/t/lazy-dependencies-best-dependencies/5509)
3. Zig Programming Language Compiler & Toolchain | Augment Code, accessed January 24, 2026, [https://www.augmentcode.com/open-source/ziglang/zig](https://www.augmentcode.com/open-source/ziglang/zig)
4. perform AstGen on whole files at once (AST->ZIR) · Issue #8516 · ziglang/zig - GitHub, accessed January 24, 2026, [https://github.com/ziglang/zig/issues/8516](https://github.com/ziglang/zig/issues/8516)
5. How Zig incremental compilation is implemented internally? - #2 by mlugg - Explain - Ziggit, accessed January 24, 2026, [https://ziggit.dev/t/how-zig-incremental-compilation-is-implemented-internally/3543/2](https://ziggit.dev/t/how-zig-incremental-compilation-is-implemented-internally/3543/2)
6. Zig AstGen: AST => ZIR - Mitchell Hashimoto, accessed January 24, 2026, [https://mitchellh.com/zig/astgen](https://mitchellh.com/zig/astgen)
7. Implementation of Comptime - Explain - Ziggit, accessed January 24, 2026, [https://ziggit.dev/t/implementation-of-comptime/5041](https://ziggit.dev/t/implementation-of-comptime/5041)
8. Zig Sema: ZIR => AIR - Mitchell Hashimoto, accessed January 24, 2026, [https://mitchellh.com/zig/sema](https://mitchellh.com/zig/sema)
9. zig/src/Air.zig at master · ziglang/zig - GitHub, accessed January 24, 2026, [https://github.com/ziglang/zig/blob/master/src/Air.zig](https://github.com/ziglang/zig/blob/master/src/Air.zig)
10. sometimes there is an unwanted memcpy when passing large structs by-value · Issue #17580 · ziglang/zig - GitHub, accessed January 24, 2026, [https://github.com/ziglang/zig/issues/17580](https://github.com/ziglang/zig/issues/17580)
11. Stage 2 Proposal: Standardise a binary format for ZIR, and enable compilation to and from this representation · Issue #5635 · ziglang/zig - GitHub, accessed January 24, 2026, [https://github.com/ziglang/zig/issues/5635](https://github.com/ziglang/zig/issues/5635)
12. Zig builds are getting faster - Hacker News, accessed January 24, 2026, [https://news.ycombinator.com/item?id=45468698](https://news.ycombinator.com/item?id=45468698)
13. Building self-hosted from the original C++ implementation - Help - Ziggit, accessed January 24, 2026, [https://ziggit.dev/t/building-self-hosted-from-the-original-c-implementation/6607](https://ziggit.dev/t/building-self-hosted-from-the-original-c-implementation/6607)
14. Bootstrapping (compilers) - Wikipedia, accessed January 24, 2026, [https://en.wikipedia.org/wiki/Bootstrapping_(compilers)](https://en.wikipedia.org/wiki/Bootstrapping_\(compilers\))
15. Goodbye to the C++ Implementation of Zig - Zig Programming Language, accessed January 24, 2026, [https://ziglang.org/news/goodbye-cpp/](https://ziglang.org/news/goodbye-cpp/)
The "accessed" date in the citations is in English (e.g., accessed January 24, 2026). To maintain language consistency throughout the article, consider translating this part into Chinese. For example, it could be changed to 访问于 2026年1月24日. This change should be applied to all entries in this section for better uniformity.
Pull request overview
This PR adds a comprehensive technical blog post analyzing the internal architecture and execution flow of the Zig compiler. The article is written in Chinese and covers compiler design philosophy, compilation stages (parsing, AST generation, ZIR, semantic analysis, AIR, code generation, and linking), the InternPool structure, and the bootstrapping process.
Changes:
- Added detailed technical article about Zig compiler internals with sections covering lazy compilation, data-oriented design, and the complete compilation pipeline
- Included Mermaid flowchart diagrams to visualize the compiler execution flow
- Added 15 cited references to external sources about Zig compiler implementation
Init --> Discovery

subgraph Frontend ["Frontend (Per File)"]
Src --> Tokenizer
Tokenizer --> Parser
Parser --> AST
AST --> CacheCheck{Hash cache hit?}
CacheCheck -- Yes --> LoadZIR
CacheCheck -- No --> AstGen
AstGen --> ZIR
ZIR --> SaveCache[Save to cache]
end

Discovery --> AnalyzeRoot
Copilot AI · Jan 26, 2026
Missing "Discovery" node definition. The flowchart references "Discovery" (line 70) as a step after "Init", but this node is never defined in the diagram. This will likely cause the Mermaid diagram to fail rendering or show an incomplete flowchart.
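The frontend fragment above checks a hash-keyed cache before running AstGen. A minimal Python sketch of that lookup follows. It assumes (the fragment does not say) that the key is a SHA-256 hash of the file's source bytes; `ZirCache` and the `astgen` callback are hypothetical stand-ins, not compiler APIs.

```python
import hashlib
from typing import Callable, Dict, List


class ZirCache:
    """Hypothetical per-file cache mapping a source hash to lowered ZIR."""

    def __init__(self, astgen: Callable[[bytes], List[str]]) -> None:
        self.astgen = astgen  # stand-in for the real AST -> ZIR lowering
        self.entries: Dict[str, List[str]] = {}
        self.misses = 0

    def get_zir(self, source: bytes) -> List[str]:
        key = hashlib.sha256(source).hexdigest()
        cached = self.entries.get(key)
        if cached is not None:
            return cached  # hit: reuse the stored ZIR, AstGen is skipped
        self.misses += 1  # miss: run AstGen, then save the result
        zir = self.astgen(source)
        self.entries[key] = zir
        return zir
```

Because the key depends only on file contents, an unchanged file hits the cache no matter how often it is requested, while any edit produces a new key and a fresh AstGen run.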
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
graph TD
Start([Source file]) --> Parse[Lexing and parsing]
Parse --> Tokenize[Tokenization]
Tokenize --> |UTF-8 byte stream| TokenStream[Token stream<br/>zero-copy slice references]
TokenStream --> ASTBuild[Build AST]
ASTBuild --> AST[(Abstract syntax tree<br/>Struct of Arrays, SoA)]
AST --> |compact memory layout<br/>main node array + extra data array| AstGen
subgraph AstGen_Phase [AST lowering phase, AstGen]
AstGen[AST lowering, AstGen]
AstGen --> ResultLoc{Result location semantics<br/>ResultLoc}
ResultLoc --> |None/Discard| NoStore[No store instruction emitted]
ResultLoc --> |Return| ReturnOpt[Direct return optimization]
ResultLoc --> |Instruction| TempValue[Temporary value reference]
ResultLoc --> |Block| BlockEnd[Write to end of block]
NoStore --> ZIRGen[Emit ZIR instructions]
ReturnOpt --> ZIRGen
TempValue --> ZIRGen
BlockEnd --> ZIRGen
end
ZIRGen --> ZIR[(Zig Intermediate Representation, ZIR<br/>untyped instruction sequence)]
ZIR --> |file-level cache<br/>abstraction of syntactic structure| CheckCache{Check ZIR cache}
CheckCache --> |cache hit| LoadCache[Load cached ZIR]
CheckCache --> |cache miss| GenerateZIR[Generate new ZIR]
LoadCache --> Sema
GenerateZIR --> Sema
subgraph InternPool_System [Unified storage system]
InternPool[(InternPool<br/>unified storage of types and values)]
InternPool --> |deduplication, interning| UniqueIndex[Structurally unique index]
InternPool --> |types are values| TypeAsValue[Type itself is also a Value]
InternPool --> |lightweight comparison| U32Compare[u32 index comparison]
end
Sema[Semantic analysis, Sema<br/>ZIR virtual machine + interpreter] --> |accesses| InternPool
subgraph Sema_Phase [Semantic analysis phase, Sema]
Sema --> ParseOperand[Operand resolution]
ParseOperand --> TypeInfer[Type inference and checking]
TypeInfer --> Coloring{Coloring and dispatch}
Coloring --> |all operands are comptime-known| ComptimePath[Comptime path]
Coloring --> |runtime values or side effects| RuntimePath[Runtime path]
ComptimePath --> VMExecute[Evaluate directly in the VM]
VMExecute --> StoreValue[Store result in InternPool]
StoreValue --> InstMapConst[inst_map records a constant]
RuntimePath --> EmitAIR[Emit AIR instruction]
EmitAIR --> InstMapAIR[inst_map records an AIR reference]
InstMapConst --> CheckDep{Check dependencies}
InstMapAIR --> CheckDep
CheckDep --> |unanalyzed function found| SuspendTask[Suspend current task]
SuspendTask --> AnalyzeDep[Analyze dependency function]
AnalyzeDep --> ResumeTask[Resume current task]
CheckDep --> |generic call encountered| Monomorphize{Monomorphization check}
Monomorphize --> |instance exists| ReuseInstance[Reuse generated instance]
Monomorphize --> |new type arguments| CloneZIR[Clone ZIR]
CloneZIR --> BindType[Bind type arguments]
BindType --> GenerateSpecialized[Generate specialized AIR]
end
ResumeTask --> AIR
ReuseInstance --> AIR
GenerateSpecialized --> AIR
AIR[(Analyzed Intermediate Representation, AIR<br/>fully typed, SSA form)]
AIR --> |explicit control flow<br/>defer/errdefer/try lowered| Liveness
Liveness[Liveness analysis]
Liveness --> Tombstone[Build tombstone table<br/>mark death points]
Tombstone --> OptimizeResource[Resource optimization<br/>register/stack slot reuse]
OptimizeResource --> BackendSelect{Backend selection}
subgraph CodeGen_Phase [Code generation phase]
BackendSelect --> |Debug mode| SelfHosted[Self-hosted backends<br/>x86_64/ARM64/WASM]
BackendSelect --> |Release mode| LLVM[LLVM backend]
BackendSelect --> |cross-platform| CBackend[C backend]
SelfHosted --> |direct generation| MachineCode1[Machine code]
LLVM --> ConvertLLVMIR[Convert to LLVM IR]
ConvertLLVMIR --> OptimizePipeline[LLVM optimization pipeline<br/>loop unrolling/vectorization/inlining]
OptimizePipeline --> MachineCode2[Machine code]
CBackend --> GenerateC[Generate C code]
GenerateC --> CCompile[Compile with a C compiler]
CCompile --> MachineCode3[Machine code]
MachineCode1 --> ImmediateRelease[Free AIR memory immediately]
MachineCode2 --> ImmediateRelease
MachineCode3 --> ImmediateRelease
end
ImmediateRelease --> Link[Linking]
subgraph Link_Phase [Linking phase]
Link --> IncrementalCheck{Incremental compilation check}
IncrementalCheck --> |changed| CalculateSize{Compute new code size}
IncrementalCheck --> |unchanged| SkipLink[Skip linking]
CalculateSize --> |new code <= old space| InPlace[Overwrite in place]
CalculateSize --> |new code > old space| Append[Append to end of file<br/>patch jump instructions]
InPlace --> UpdateBinary[Update binary]
Append --> UpdateBinary
UpdateBinary --> CrossCompile{Cross-compilation}
CrossCompile --> |per target triple| LoadConfig[Load platform configuration]
LoadConfig --> LinkLibc[Link libc and dependencies]
end
SkipLink --> Output
LinkLibc --> Output
Output([Executable])
style Parse fill:#e1f5ff
style AST fill:#fff4e1
style ZIR fill:#ffe1f5
style Sema fill:#f5e1ff
style InternPool fill:#e1ffe1
style AIR fill:#fff4e1
style BackendSelect fill:#e1f5ff
style Link fill:#ffe1e1
style Output fill:#90EE90
style ComptimePath fill:#FFD700
style RuntimePath fill:#FF6347
style InternPool fill:#00CED1
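The InternPool box in the diagram (deduplication, types stored as values, `u32` index comparison) boils down to hash-consing. The Python sketch below is a hypothetical illustration, not the compiler's data structure: keys are plain tuples, a type such as a 32-bit unsigned integer and a value such as `7` of that type live in the same table, and a value refers to its type by index. Interning the same key twice returns the same index, so equality reduces to comparing two integers.

```python
from typing import Dict, List, Tuple

Key = Tuple  # hypothetical structural key, e.g. ("int_type", "unsigned", 32)


class InternPool:
    """Toy hash-consing table holding both types and values."""

    def __init__(self) -> None:
        self.items: List[Key] = []  # index -> key
        self.index_of: Dict[Key, int] = {}  # key -> index (deduplication)

    def get(self, key: Key) -> int:
        """Return the index for key, appending it only if it is new."""
        idx = self.index_of.get(key)
        if idx is None:
            idx = len(self.items)
            self.items.append(key)
            self.index_of[key] = idx
        return idx
```

For instance, `u32 = pool.get(("int_type", "unsigned", 32))` interns a type and `pool.get(("int", u32, 7))` interns a value of that type; requesting the same type again returns the same index instead of a new entry.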
Currently, if the mermaid block contains |
shtml is fairly strict and does not allow self-closing tags
No description provided.