diff --git a/INSTALL.md b/INSTALL.md
new file mode 100644
index 0000000..bc1fd22
--- /dev/null
+++ b/INSTALL.md
@@ -0,0 +1,105 @@
+# INSTALL: setting up the environment
+
+A guide to getting the environment ready to build and run Crusty.
+
+## Prerequisites
+
+| Tool | Minimum version | Purpose |
+|---|---|---|
+| [Rust](https://rustup.rs/) (rustc + cargo) | 1.70+ | Building Crusty itself |
+| `gcc` | any recent version | Assembling (`as`) and linking (`ld`) the ELF executables produced by the x86-64 backend |
+| Linux x86-64 | - | The backend emits x86-64 assembly for the System V ABI. Other architectures and native Windows/macOS are not supported |
+
+Without `gcc` on the `PATH`, the compiler still works up to assembly emission (`--emit=asm`), but the end-to-end smoke tests (`tests/exe_smoke_test.rs`, `tests/codegen_smoke.rs`, `tests/double_codegen_test.rs`) are skipped automatically, and `--emit=obj`/`--emit=exe` fail.
+
+## 1. Install Rust
+
+```bash
+curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
+source "$HOME/.cargo/env"
+rustup update stable
+```
+
+Check:
+
+```bash
+rustc --version   # expected: 1.70 or newer
+cargo --version
+```
+
+## 2. Install gcc (assembler/linker toolchain)
+
+**Debian/Ubuntu**
+```bash
+sudo apt update && sudo apt install -y gcc
+```
+
+**Arch/Manjaro**
+```bash
+sudo pacman -S gcc
+```
+
+**Fedora**
+```bash
+sudo dnf install gcc
+```
+
+Check:
+
+```bash
+gcc --version
+```
+
+## 3. Get the code
+
+```bash
+git clone https://github.com/Bappoz/Crusty.git
+cd Crusty
+```
+
+(If you are already inside the repository, skip this step.)
+
+## 4. Build the project
+
+```bash
+cargo build --release
+```
+
+The binary ends up in `target/release/crusty`. For a development build (faster to compile, slower binary):
+
+```bash
+cargo build
+# binary in target/debug/crusty
+```
+
+## 5. Verify the installation
+
+Run the compiler on an example shipped with the repository and execute the generated binary:
+
+```bash
+cargo run --release -- src/examples/hello_world.c -o /tmp/hello
+/tmp/hello
+```
+
+Expected output:
+
+```
+Hello, World!
+```
+
+If that worked, the environment is ready. To confirm that the whole test suite passes in your environment:
+
+```bash
+cargo test --all
+cargo clippy -- -D warnings
+cargo fmt --check
+```
+
+These three checks are exactly the ones the CI (`.github/workflows/`) runs on every push/PR to `developer` and `master`.
+
+## Common problems
+
+- **`error: linker 'cc' not found`, or a failure while assembling/linking**: `gcc` is not installed or is not on the `PATH`. Repeat step 2.
+- **`cargo: command not found`** after installing Rust: run `source "$HOME/.cargo/env"` or open a new terminal.
+- **Smoke tests silently being skipped**: expected when `gcc` is unavailable; see [TESTER.md](TESTER.md) for details.
+- **Test program uses `float`** and fails with `error: code generation`: `float` has no codegen yet (a known backend limitation, see [README.md](README.md#limitações-conhecidas) and [issue #172](https://github.com/Bappoz/Crusty/issues/172)); `double` is already supported.
diff --git a/README.md b/README.md
index 86b5799..74b29b5 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # Crusty: a C compiler in Rust
 
-Project for the Compilers 1 course. It implements a compiler for a subset of the C language, written in Rust.
+Project for the Compilers 1 course. It implements a compiler for a subset of the C language, written in Rust, with a native x86-64 backend (System V ABI, Linux).
 
 ## Current stage
 
@@ -8,8 +8,18 @@ Projeto da disciplina de Compiladores 1. Implementa um compilador para um subcon
 |------|--------|
 | Lexical analysis | Complete |
 | Syntax analysis | Complete |
-| Semantic analysis | In development |
-| Code generation | Not started |
+| Semantic analysis | Complete |
+| IR (TAC) | Complete |
+| Optimizations (CSE, DCE, constant folding, copy propagation, LICM, inlining) | Complete |
+| x86-64 code generation | Complete for integer types, pointers, structs, arrays, globals, and `double` (XMM registers) |
+
+### Known limitations
+
+- **`float` has no codegen.** `double` is already supported by the x86-64 backend (XMM registers, issue [#172](https://github.com/Bappoz/Crusty/issues/172)), but `float` is not: the semantic analyser accepts and types `float`, but the backend does not yet emit the corresponding code. Programs that use `float` fail with `error: code generation` in the final stage; use `double` instead.
+- The interactive REPL mode (running `crusty` with no arguments) is not implemented.
+- `--dump-ir` does not print the IR yet (placeholder).
+
+Apart from that, the full pipeline (lexer → parser → semantic analysis → IR → optimizations → x86-64 assembly → ELF executable via `gcc`) works end to end for a relevant subset of C: integer types, `char` and `double`, pointers, structs, fixed-size arrays, enums, typedefs, global variables, all control structures (`if`/`while`/`do-while`/`for`/`switch`), and function calls.
 
 ## Project structure
 
@@ -18,43 +28,54 @@ src/
 ├── lexer/      Lexical analysis: turns source code into tokens
 ├── parser/     Syntax analysis: builds the AST via Pratt parsing
 ├── analyser/   Semantic analysis: symbol table, scopes, type checking
-├── codegen/    Code generation: skeleton (not implemented)
+├── ir/         Generation and lowering of the intermediate IR (TAC)
+├── codegen/    Code generation: optimizations over TAC (inter/) and x86-64 backend (last/)
 ├── common/     Shared structures: AST, errors, spans, utilities
+├── examples/   Example .c files used in tests and demos
 └── tests/      Unit tests per module
+tests/          Integration tests and smoke tests (end to end, with gcc)
+docs/           Technical documentation for each compiler phase
 ```
 
-## Prerequisites
+## Getting started
 
-- [Rust](https://rustup.rs/) 1.70+
+Full installation and environment setup instructions (Rust, `gcc`, toolchain verification) are in [INSTALL.md](INSTALL.md).
+
+Quick summary:
 
 ```bash
 rustup update stable
+cargo build --release
+cargo run --release -- src/examples/hello_world.c
+./hello_world
 ```
 
-## Build
+## Usage
 
 ```bash
-cargo build
+crusty [flags]
 ```
 
-## Usage
+Main flags (full list shown by `crusty` with no arguments):
 
-Run the compiler on an input file:
-
-```bash
-cargo run --
-```
+| Flag | Effect |
+|---|---|
+| `--dump-tokens` | Lists the tokens emitted by the lexer |
+| `--dump-ast` | Prints the AST |
+| `--only-lex` / `--only-parse` / `--only-semantic` | Stops the pipeline at the given stage |
+| `-S`, `--emit-asm` / `--emit=asm` | Stops after emitting the x86-64 assembly (`.s`) |
+| `--emit=obj` | Stops after assembling the object (`.o`), without linking |
+| `--emit=exe` | Assembles and links a runnable ELF executable (default) |
+| `-o `, `--out-dir `, `--out-name ` | Control the output path/name |
+| `-O0`\|`-O1`\|`-O2`\|`-O3`, `--opt-level ` | Optimization level applied to the IR |
 
-Example:
+Example that builds and runs a binary:
 
 ```bash
-cargo run -- input.c
+cargo run --release -- src/examples/simple.c -o /tmp/simple
+/tmp/simple; echo "exit: $?"
 ```
 
-The compiler prints the recognized tokens, the AST, and any error diagnostics.
-
-> The interactive REPL mode (no arguments) is not implemented yet.
-
 ## Implemented features
 
 **Lexer**
@@ -81,60 +102,39 @@ O compilador imprime os tokens reconhecidos, a AST e eventuais diagnósticos de
 - Implicit numeric promotion (Double > Float > Long > Int)
 - Detection of assignment to `const`
 
-## Tests
+**IR and optimizations**
+- Lowering from AST to TAC (Three-Address Code), including fixed arrays, structs, and globals
+- Optimization pipeline configurable by level (`-O0`..`-O3`): constant folding, common subexpression elimination, dead code elimination, copy propagation, loop-invariant code motion, inlining
 
-### All unit tests
+**x86-64 backend**
+- System V ABI calling convention (integers/pointers in `rdi`..`r9`/`rax`, `double` in `xmm0`..`xmm7`)
+- Address of variables, array indexing, struct member access (`.`, `->`), address-of/deref
+- Arithmetic, comparisons, and literals of `double` via XMM registers (`addsd`/`subsd`/`mulsd`/`divsd`/`ucomisd`)
+- `sizeof` at compile time
+- Global variables with RIP-relative access
+- Peephole optimizer over the emitted assembly
+- Emission of `.s`, assembly of `.o`, and linking of the executable via `gcc`
 
-```bash
-cargo test
-```
+## Tests
 
-### Filter by module
+Full coverage (unit tests and tests with real `.c` files, run end to end) is documented in [TESTER.md](TESTER.md).
 
-```bash
-cargo test lexical        # scanner/lexer tests (21 cases)
-cargo test parser_test    # parser / AST tests (76 cases)
-cargo test semantic_test  # semantic analyser tests (21 cases)
-cargo test symbol_test    # symbol table tests (11 cases)
-cargo test analyzer_test  # analyser integration tests (3 cases)
-cargo test source         # SourceFile and span tests (12 cases)
-cargo test lexer_file     # scanner tests reading files (7 cases)
-cargo test parser_file    # parser tests reading files (4 cases)
-cargo test literals       # numeric literal tests (4 cases)
-cargo test ast_errors     # AST error tests (4 cases)
-cargo test token          # individual token tests (2 cases)
-```
-
-### With detailed output
+Quick summary:
 
 ```bash
-cargo test -- --nocapture
+cargo test --all          # ~361 tests (unit + integration + e2e smoke)
+cargo clippy -- -D warnings
+cargo fmt --check
 ```
 
-### Test modules
-
-| File | Coverage | Tests |
-|---|---|---|
-| `src/tests/lexical_test.rs` | Scanner: operators, keywords, literals | 21 |
-| `src/tests/parser_test.rs` | Parser / AST construction | 76 |
-| `src/tests/semantic_test.rs` | Type checking, undefined vars, const | 21 |
-| `src/tests/symbol_test.rs` | Symbol table, scopes, redeclaration | 11 |
-| `src/tests/source_test.rs` | `SourceFile`, `ByteSpan`, positioning | 12 |
-| `src/tests/lexer_file_test.rs` | Scanner over real files | 7 |
-| `src/tests/parser_file_test.rs` | Parser over real files | 4 |
-| `src/tests/literals_test.rs` | Integer, float, and string literals | 4 |
-| `src/tests/ast_errors.rs` | AST diagnostics and errors | 4 |
-| `src/tests/analyzer_test.rs` | Lexical → syntactic → semantic integration | 3 |
-| `src/tests/token_test.rs` | `Token` and `TokenKind` | 2 |
-
-**Total: 165 tests**
-
 ## Technical documentation
 
 - [Lexer](docs/lexer.md): scanner, tokens, lexical errors
 - [Parser](docs/parser.md): Pratt parser, AST, error recovery
 - [Semantic Analyser](docs/semantic.md): symbol table, type checking
 - [C Operator Precedence](docs/c_operator_precedence.md): C11 table and mapping to binding powers
+- [INSTALL.md](INSTALL.md): how to set up the environment and build the project
+- [TESTER.md](TESTER.md): how to run and interpret all the tests
 
 ## Contributors
 
diff --git a/TESTER.md b/TESTER.md
new file mode 100644
index 0000000..09cb9fc
--- /dev/null
+++ b/TESTER.md
@@ -0,0 +1,145 @@
+# TESTER: how to test Crusty
+
+This document covers every way of testing the compiler: unit tests, integration tests with real `.c` files, and end-to-end smoke tests that assemble, link, and execute the generated binary via `gcc`.
+
+Prerequisite: an environment set up according to [INSTALL.md](INSTALL.md) (Rust + `gcc`).
+
+## Overview of the suites
+
+| Suite | Location | What it checks | Tests |
+|---|---|---|---|
+| Library unit tests | `src/tests/*.rs` | Lexer, parser, semantic analyser, codegen, in isolation | 295 |
+| Binary unit tests | `src/main.rs` (`#[cfg(test)]`) | CLI flag parsing | 10 |
+| `tests/integration_test.rs` | `tests/` | lexer→parser→semantic pipeline over real `.c` files (valid and invalid) | 16 |
+| `tests/codegen_smoke.rs` | `tests/` | Hand-built TAC → assembly → `gcc` → execution, checking the exit code | 5 |
+| `tests/exe_smoke_test.rs` | `tests/` | Real C source → full pipeline → ELF executable → execution | 26 |
+| `tests/double_codegen_test.rs` | `tests/` | Codegen of `double`/XMM (emitted assembly and real execution, issue #172) | 7 |
+| `tests/licm_test.rs` | `tests/` | Loop-invariant code motion optimization | 2 |
+
+Current total: **361 tests**, all passing on `developer`.
+
+## Running everything
+
+```bash
+cargo test --all
+```
+
+This builds and runs all the suites above, in the order shown by `cargo`.
+
+## Unit tests (per module)
+
+The unit tests live in `src/tests/` and cover each compiler phase in isolation, without needing external files or `gcc`.
+
+```bash
+cargo test --lib             # only the library unit suite (no integration/smoke)
+cargo test lexical           # scanner/lexer: operators, keywords, literals
+cargo test parser_test       # parser / AST construction
+cargo test semantic_test     # type checking, undefined vars, const
+cargo test symbol_test       # symbol table, scopes, redeclaration
+cargo test source            # SourceFile, ByteSpan, positioning
+cargo test lexer_file        # scanner over real files
+cargo test parser_file       # parser over real files
+cargo test literals          # integer, float, and string literals
+cargo test ast_errors        # AST diagnostics and errors
+cargo test analyzer_test     # lexical → syntactic → semantic integration
+cargo test token             # Token and TokenKind
+cargo test codegen_test      # code generation (unit)
+cargo test peephole_test     # assembly optimizer (peephole)
+cargo test unmap_safe_test   # unmap/memmap safety of SourceFile
+```
+
+Detailed output (does not suppress `println!`/`eprintln!` from the tests):
+
+```bash
+cargo test -- --nocapture
+```
+
+Run a single test by its exact name:
+
+```bash
+cargo test test_licm_loop_with_invariant
+```
+
+## Tests with real `.c` files
+
+### `tests/integration_test.rs`: full front end, without executing a binary
+
+Runs lexer → parser → semantic analysis over the files in `tests/integration/valid/` and `tests/integration/invalid/`, checking that valid programs produce no diagnostics and that invalid programs produce exactly the expected error (undeclared variable, redeclaration, assignment to `const`, type mismatch, call arity, etc.).
+
+```bash
+cargo test --test integration_test
+```
+
+To add a new case: create a `.c` file in `tests/integration/valid/` (it must compile without errors) or in `tests/integration/invalid/` (it must fail with the expected diagnostic) and add the corresponding case to `tests/integration_test.rs`.
+
+### `tests/exe_smoke_test.rs` and `tests/codegen_smoke.rs`: end to end, with real execution
+
+These are the most thorough tests: they compile real C code down to x86-64 assembly, assemble and link it with `gcc` into an ELF executable, run the binary, and check the **exit code** (and, where applicable, the stdout output).
+
+```bash
+cargo test --test exe_smoke_test
+cargo test --test codegen_smoke
+cargo test --test double_codegen_test
+```
+
+If `gcc` is not available on the `PATH`, these tests are skipped automatically. To confirm, look for `gcc indisponivel: pulando teste de smoke` in the output of `cargo test -- --nocapture`.
+
+### Testing manually with the files in `src/examples/`
+
+The `src/examples/` directory contains example `.c` programs used for reference and demonstration. To manually test the full pipeline on one of them:
+
+```bash
+# 1. compile and link
+cargo run --release -- src/examples/simple.c -o /tmp/simple
+
+# 2. run and check the result
+/tmp/simple; echo "exit code: $?"
+```
+
+To inspect intermediate stages:
+
+```bash
+cargo run -- src/examples/simple.c --dump-tokens      # lexer tokens
+cargo run -- src/examples/simple.c --dump-ast         # AST
+cargo run -- src/examples/simple.c --emit=asm -o /tmp/simple.s   # generated x86-64 assembly
+```
+
+Available examples and their current status:
+
+| File | Compiles and runs? | Note |
+|---|---|---|
+| `hello_world.c` | Yes | Prints `Hello, World!` |
+| `simple.c` | Yes | |
+| `demo_presentation.c` | Yes | Demo used in the course presentation |
+| `declarations.c` | Generates assembly, but has no `main` | Not meant to be linked/executed on its own |
+| `full_code1.c` | No | Uses `float`, which has no codegen yet (`double` is already supported, see issue #172) |
+| `operators.c` | No, not even with `gcc` | Has statements at global scope, which is not valid C (confirmed with `gcc -fsyntax-only`); not a compiler bug |
+
+### Testing with your own `.c` files
+
+Any valid `.c` file can be used directly:
+
+```bash
+cargo run --release -- caminho/para/arquivo.c -o /tmp/saida
+/tmp/saida
+```
+
+To debug a compilation error, repeat the command with `--dump-ast` or `--only-semantic` to isolate the phase in which the problem occurs.
+
+## Quality checks (run in CI)
+
+The CI (`.github/workflows/`) runs, in this order, on every push/PR to `developer` and `master`:
+
+```bash
+cargo build --all
+cargo test --all
+cargo clippy -- -D warnings
+cargo fmt --check
+```
+
+Run all four locally before opening a PR; it is exactly what will be checked automatically.
+
+## Known coverage and test limitations
+
+- `double` has dedicated coverage in `tests/double_codegen_test.rs` (issue #172). There are no automated tests for `float` codegen, because the feature does not exist in the backend yet.
+- The execution smoke tests (`exe_smoke_test.rs`, `codegen_smoke.rs`, `double_codegen_test.rs`) depend on Linux x86-64 + `gcc`; on other platforms they are skipped rather than failing.
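A note on the skip behaviour described at the end of TESTER.md: the diff does not show how the smoke tests detect a missing `gcc`, so the following is only a minimal sketch of one way such a guard could look. The helper names `tool_available` and `run_or_skip` are hypothetical and are not taken from the Crusty sources; only the skip message is copied from the documentation above.

```rust
use std::process::Command;

/// Hypothetical helper: returns true when `<tool> --version` can be spawned
/// and exits successfully. A missing binary makes `output()` return an error,
/// which is mapped to `false`.
fn tool_available(tool: &str) -> bool {
    Command::new(tool)
        .arg("--version")
        .output()
        .map(|out| out.status.success())
        .unwrap_or(false)
}

/// Hypothetical guard: runs `test_body` only when `tool` is available.
/// Returns `None` when the test was skipped, so a skip is never a failure.
fn run_or_skip<F: FnOnce() -> i32>(tool: &str, test_body: F) -> Option<i32> {
    if !tool_available(tool) {
        // Same message TESTER.md tells the reader to look for.
        eprintln!("{tool} indisponivel: pulando teste de smoke");
        return None;
    }
    Some(test_body())
}

fn main() {
    // The closure stands in for "assemble, link, run, return the exit code".
    match run_or_skip("gcc", || 0) {
        Some(code) => println!("smoke test ran, exit code {code}"),
        None => println!("smoke test skipped"),
    }
}
```

Returning `Option` keeps "skipped" distinguishable from "ran and returned 0", which is the property the documentation relies on when it says the tests are skipped rather than failing.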
diff --git a/docs/index.md b/docs/index.md
index 4b4506a..d06c6c8 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -13,7 +13,9 @@ Source code (.c)
     ↓
 [Analyser] → semantic diagnostics
     ↓
-[Codegen] → (not implemented)
+[IR / TAC] → Three-Address Code + optimizations (-O0..-O3)
+    ↓
+[Codegen] → x86-64 assembly → .o / ELF executable via gcc
 ```
 
 ## Modules
@@ -22,8 +24,9 @@ Source code (.c)
 |--------|--------|--------------|
 | Lexer | Complete | [lexer.md](lexer.md) |
 | Parser | Complete | [parser.md](parser.md) |
-| Semantic Analyser | In development | [semantic.md](semantic.md) |
-| Code generation | Not started | - |
+| Semantic Analyser | Complete | [semantic.md](semantic.md) |
+| IR (TAC) and optimizations | Complete | - |
+| x86-64 code generation | Complete for integer types, pointers, structs, arrays, globals, and `double`; `float` still has no codegen ([issue #172](https://github.com/Bappoz/Crusty/issues/172)) | - |
 
 ## References
 
@@ -31,4 +34,4 @@ Source code (.c)
 
 ## Repository
 
-[github.com/Bappoz/crusty](https://github.com/Bappoz/crusty)
+[github.com/Bappoz/Crusty](https://github.com/Bappoz/Crusty)
diff --git a/docs/parser.md b/docs/parser.md
index 1d4e408..4836051 100644
--- a/docs/parser.md
+++ b/docs/parser.md
@@ -305,7 +305,7 @@ Parsed by `parse_prefix_expr()` before the Pratt loop:
 
 ### Postfix Expressions
 
-`try_parse_postfix(lhs)` returns `true` if it consumed something:
+`try_parse_postfix(lhs)` returns `(expr, true)` if it consumed something (taking `lhs` by value and returning the new node) or `(lhs, false)` otherwise:
 
 | Token | Node produced |
 |---|---|
diff --git a/docs/semantic.md b/docs/semantic.md
index 1304bc2..c0eee9a 100644
--- a/docs/semantic.md
+++ b/docs/semantic.md
@@ -15,6 +15,9 @@ struct SemanticAnalyser {
     sym: SymbolTable,
     current_fn_ret: Option, // return type of the current function
     diagnostics: Vec,
+    warnings: Vec,
+    loop_depth: usize,   // used to validate break/continue
+    switch_depth: usize, // used to validate break inside switch
 }
 ```
 
@@ -83,6 +86,7 @@ analyse_stmt::For → enter_scope / exit_scope (init pode declarar variáve
 3. Declares each parameter in the new scope
 4. Analyses all statements in the body
 5. Restores `current_fn_ret`; `exit_scope`
+6. Runs the `body_always_returns` heuristic if the function is not `void`; emits a `MissingReturn` warning if a return is missing.
 
 ### Struct
 
@@ -170,34 +174,51 @@ Numeric promotion: `Double > Float > Long > Int > Short/Char`.
 
 | Node | Type returned |
 |---|---|
-| `Unary`, `Prefix`, `Postfix` | same type as the operand |
+| `Unary(AddrOf)` | `Pointer(T)` where `T` is the operand type |
+| `Unary(Deref)` | base type `T` of `Pointer(T)` or `Array(T)` |
+| `Unary(-, ~)`, `Prefix`, `Postfix` | same type as the operand |
 | `CompoundAssign` | type of the LHS |
 | `Cast(qty, _)` | resolved `qty` |
 | `Sizeof(_)`, `SizeofType(_)` | `unsigned int` |
-| `Call(callee, args)` | `void` (sentinel; return-type lookup not implemented) |
-| `Index(arr, idx)` | element type (dereferences `Array(T)` or `Pointer(T)`) |
-| `Ternary(cond, then, else)` | type of the `then` branch |
+| `Call(callee, args)` | Return type recorded in the signature. Checks arity and argument types. |
+| `Index(arr, idx)` | element type (dereferences `Array(T)` or `Pointer(T)`). `idx` must be an integer numeric type. |
+| `Ternary(cond, then, else)` | The promoted/common type of the `then` and `else` branches, after a compatibility check. |
 
 ---
 
-## Semantic Errors
+## Semantic Diagnostics
 
-All of them have type `CompilerError::Semantic(SemanticError { span, kind })`:
+Errors produce `CompilerError::Semantic`, warnings produce `CompilerWarning::Semantic`. Analysis is not interrupted by diagnostics.
+
+### Errors (CompilerError)
 
 | Kind | Cause |
 |---|---|
 | `Redeclaration(name)` | Name already declared in the current scope |
 | `UndefinedVariable(name)` | Identifier not found in any scope |
+| `UndefinedFunction(name)` | Call to a function with no registered definition or prototype |
 | `AssignToConst(name)` | Assignment to a variable declared with `const` |
-| `TypeMismatch { expected, found }` | Incompatible types in an assignment or binary operation |
+| `TypeMismatch { expected, found }` | Incompatible types (assignment, binary operation, return, ternary) |
 | `UndefinedStruct(name)` | Member access on an unregistered struct |
-| `FieldNotFound { struct_name, field_name }` | Field does not exist in the struct |
+| `FieldNotFound { struct, field }` | Field does not exist in the struct |
+| `InvalidSwitchType` | Switch expression is not of an integer type |
+| `BreakOutsideLoop` | Use of `break` outside an iterative block (`for`/`while`) or a `switch` |
+| `ContinueOutsideLoop` | Use of `continue` outside an iterative block (`for`/`while`) |
+| `ArityMismatch { expected, found }` | Wrong number of arguments in a function call |
+| `NotIndexable` | Attempt to use the index operator `[]` on a type that is neither an array nor a pointer |
+| `InvalidIndexType` | Expression used as the index is not of an integer type |
+
+### Warnings (CompilerWarning)
+
+| Kind | Cause |
+|---|---|
+| `MissingReturn(name)` | Function declared with a non-void return type without a guaranteed `return` on every path |
+| `UnusedVariable(name)` | Local variable declared but never referenced |
+| `MayBeUninitialized(name)`| Variable read before its initialization is guaranteed (by declaration or assignment) |
 
 ---
 
 ## Current limitations (TODO)
 
-- Checking the function return type (`return expr` vs. declared type)
-- Return-type lookup in function calls (`Call` returns the `void` sentinel)
-- Compatibility check between the `then`/`else` branches of the ternary
 - Pointer arithmetic for `Sub` (pointer − pointer → `ptrdiff_t`)
+- Robust constant folding (today the compiler evaluates its heuristics statically on the code tree, without resolving runtime values during analysis)
diff --git a/src/analyser/mod.rs b/src/analyser/mod.rs
index 65cec25..bf43948 100644
--- a/src/analyser/mod.rs
+++ b/src/analyser/mod.rs
@@ -2,4 +2,5 @@ pub mod semantic;
 pub mod symbol_table;
 
 pub use semantic::analyse;
+pub use semantic::analyse_with_builtins;
 pub use semantic::SemanticAnalyser;
diff --git a/src/analyser/semantic.rs b/src/analyser/semantic.rs
index c7db0b9..7cb8158 100644
--- a/src/analyser/semantic.rs
+++ b/src/analyser/semantic.rs
@@ -3,6 +3,7 @@ use crate::common::ast::ast::{Program, QualifierType, Type};
 use crate::common::ast::decl::Decl;
 use crate::common::ast::expr::{Expr, Literal, MemberAccess};
 use crate::common::ast::stmt::Stmt;
+use crate::common::builtins::BuiltinsLibs;
 use crate::common::errors::error_data::Span;
 use crate::common::errors::types::{
     CompilerError, CompilerWarning, Diagnostic, SemanticError, SemanticErrorKind, SemanticWarning,
@@ -16,6 +17,13 @@ pub struct SemanticAnalyser {
     pub current_fn_ret: Option,
     pub diagnostics: Vec,
     pub warnings: Vec,
+    /// Depth of nested loops (for/while/do-while).
+    /// Used to validate `continue` and `break`.
+    pub loop_depth: usize,
+    /// Depth of nested `switch` statements.
+    /// Used to validate `break` (switch accepts break, but not continue).
+    pub switch_depth: usize,
+    pub stdio_enabled: bool,
 }
 
 impl SemanticAnalyser {
@@ -25,6 +33,16 @@ impl SemanticAnalyser {
             current_fn_ret: None,
             diagnostics: Vec::new(),
             warnings: Vec::new(),
+            loop_depth: 0,
+            switch_depth: 0,
+            stdio_enabled: false,
+        }
+    }
+
+    pub fn with_builtins(builtins: crate::common::builtins::BuiltinsLibs) -> Self {
+        Self {
+            stdio_enabled: builtins.stdio,
+            ..Self::new()
         }
     }
 
@@ -168,7 +186,7 @@ impl SemanticAnalyser {
         }
 
         self.sym.enter_scope();
-        let prev_ret = self.current_fn_ret.replace(resolved_ret);
+        let prev_ret = self.current_fn_ret.replace(resolved_ret.clone());
 
         for (qty, name) in params {
             let resolved_qty = self.resolve_type(qty);
@@ -193,6 +211,15 @@ impl SemanticAnalyser {
             self.analyse_stmt(stmt);
         }
 
+        // Conservative warning: non-void function whose body does not end with a return.
+        if resolved_ret.ty != Type::Void && !body_always_returns(body) {
+            self.warnings
+                .push(CompilerWarning::Semantic(SemanticWarning {
+                    span: span.clone(),
+                    kind: SemanticWarningKind::MissingReturn(name.clone()),
+                }));
+        }
+
         self.current_fn_ret = prev_ret;
         self.exit_scope_checking_unused();
     }
@@ -244,10 +271,23 @@ impl SemanticAnalyser {
     pub fn analyse_stmt(&mut self, stmt: &Stmt) {
         match stmt {
             Stmt::VarDecl(qty, name, init, span) => {
-                if let Some(expr) = init {
-                    self.analyse_expr(expr);
-                }
                 let resolved_qty = self.resolve_type(qty);
+                let initialized = if let Some(expr) = init {
+                    let init_ty = self.analyse_expr(expr);
+                    if !types_compatible_for_assign(&resolved_qty.ty, &init_ty.ty) {
+                        self.diagnostics
+                            .push(CompilerError::Semantic(SemanticError {
+                                span: expr.span(),
+                                kind: SemanticErrorKind::TypeMismatch {
+                                    expected: type_name(&resolved_qty.ty),
+                                    found: type_name(&init_ty.ty),
+                                },
+                            }));
+                    }
+                    true
+                } else {
+                    false
+                };
                 let symbol = Symbol {
                     name: name.clone(),
                     ty: resolved_qty.clone(),
@@ -257,7 +297,7 @@ impl SemanticAnalyser {
                     prototype_only: false,
                     // Locals start out "unused"; a read sets `used = true`.
                     used: false,
-                    initialized: init.is_some(),
+                    initialized,
                 };
                 if let Err(e) = self.sym.declare(symbol) {
                     self.diagnostics.push(e);
@@ -287,7 +327,7 @@ impl SemanticAnalyser {
                     }
                     (Some(expected), Some(e)) => {
                         let found_expr_qty = self.analyse_expr(e);
-                        if expected.ty != found_expr_qty.ty {
+                        if !types_compatible_for_assign(&expected.ty, &found_expr_qty.ty) {
                             self.diagnostics
                                 .push(CompilerError::Semantic(SemanticError {
                                     span: span.clone(),
@@ -320,10 +360,14 @@ impl SemanticAnalyser {
             }
             Stmt::While(cond, body, _) => {
                 self.analyse_expr(cond);
+                self.loop_depth += 1;
                 self.analyse_stmt(body);
+                self.loop_depth -= 1;
             }
             Stmt::DoWhile(cond, body, _) => {
+                self.loop_depth += 1;
                 self.analyse_stmt(body);
+                self.loop_depth -= 1;
                 self.analyse_expr(cond);
             }
             Stmt::For(init, cond, inc, body, _) => {
@@ -337,18 +381,56 @@ impl SemanticAnalyser {
                 if let Some(e) = inc {
                     self.analyse_expr(e);
                 }
+                self.loop_depth += 1;
                 self.analyse_stmt(body);
+                self.loop_depth -= 1;
                 self.exit_scope_checking_unused();
             }
-            Stmt::Switch(expr, cases, _) => {
-                self.analyse_expr(expr);
+            Stmt::Switch(expr, cases, span) => {
+                let disc_ty = self.analyse_expr(expr);
+                // In C, the switch discriminant must be of an integer or enum type.
+                let is_switch_ok = matches!(
+                    disc_ty.ty,
+                    Type::Int | Type::Long | Type::Short | Type::Char | Type::Enum(_)
+                );
+                if !is_switch_ok {
+                    self.diagnostics
+                        .push(CompilerError::Semantic(SemanticError {
+                            span: span.clone(),
+                            kind: SemanticErrorKind::InvalidSwitchType {
+                                found: type_name(&disc_ty.ty),
+                            },
+                        }));
+                }
+                self.switch_depth += 1;
                 for case in cases {
+                    if let crate::common::ast::stmt::SwitchLabel::Case(case_expr) = &case.label {
+                        self.analyse_expr(case_expr);
+                    }
                     for s in &case.stmts {
                         self.analyse_stmt(s);
                     }
                 }
+                self.switch_depth -= 1;
+            }
+            Stmt::Break(span) => {
+                if self.loop_depth == 0 && self.switch_depth == 0 {
+                    self.diagnostics
+                        .push(CompilerError::Semantic(SemanticError {
+                            span: span.clone(),
+                            kind: SemanticErrorKind::BreakOutsideLoop,
+                        }));
+                }
+            }
+            Stmt::Continue(span) => {
+                if self.loop_depth == 0 {
+                    self.diagnostics
+                        .push(CompilerError::Semantic(SemanticError {
+                            span: span.clone(),
+                            kind: SemanticErrorKind::ContinueOutsideLoop,
+                        }));
+                }
             }
-            Stmt::Break(_) | Stmt::Continue(_) => {}
         }
     }
 
@@ -422,9 +504,26 @@ impl SemanticAnalyser {
                     }
                 }
             }
-            Expr::Unary(_, e, _) | Expr::Prefix(_, e, _) | Expr::Postfix(_, e, _) => {
-                self.analyse_expr(e)
+            Expr::Unary(op, e, _) => {
+                let inner_ty = self.analyse_expr(e);
+                match op {
+                    crate::common::ast::expr::UnOp::AddrOf => QualifierType {
+                        ty: Type::Pointer(Box::new(inner_ty.ty)),
+                        is_const: false,
+                        is_unsigned: false,
+                    },
+                    crate::common::ast::expr::UnOp::Deref => match inner_ty.ty {
+                        Type::Pointer(base) | Type::Array(base, _) => QualifierType {
+                            ty: *base,
+                            is_const: inner_ty.is_const,
+                            is_unsigned: inner_ty.is_unsigned,
+                        },
+                        _ => inner_ty,
+                    },
+                    _ => inner_ty,
+                }
             }
+            Expr::Prefix(_, e, _) | Expr::Postfix(_, e, _) => self.analyse_expr(e),
             Expr::CompoundAssign(_, lhs, rhs, _) => {
                 let lhs_ty = self.analyse_expr(lhs);
                 self.analyse_expr(rhs);
@@ -441,6 +540,25 @@ impl SemanticAnalyser {
             Expr::SizeofType(_, _) => uint_type(),
             Expr::Call(callee, args, span) => {
                 if let Expr::Ident(name, id_span) = callee.as_ref() {
+                    if name == "printf" {
+                        if !self.stdio_enabled {
+                            self.diagnostics
+                                .push(CompilerError::Semantic(SemanticError {
+                                    span: id_span.clone(),
+                                    kind: SemanticErrorKind::MissingLibraryHeader {
+                                        header: "stdio.h".to_string(),
+                                        symbol: "printf".to_string(),
+                                    },
+                                }));
+                            for a in args {
+                                self.analyse_expr(a);
+                            }
+                            return unknown_type();
+                        }
+
+                        return self.analyse_printf_call(args, span, id_span);
+                    }
+
                     match self.sym.lookup(name) {
                         None => {
                             self.diagnostics
@@ -537,7 +655,7 @@ impl SemanticAnalyser {
                 }
 
                 match arr_ty.ty {
-                    Type::Array(inner) | Type::Pointer(inner) => QualifierType {
+                    Type::Array(inner, _) | Type::Pointer(inner) => QualifierType {
                         ty: *inner,
                         is_const: arr_ty.is_const,
                         is_unsigned: arr_ty.is_unsigned,
@@ -675,7 +793,9 @@ impl SemanticAnalyser {
 
     fn resolve_type_inner(&self, ty: &Type) -> Type {
         match ty {
-            Type::Array(inner) => Type::Array(Box::new(self.resolve_type_inner(inner))),
+            Type::Array(inner, size) => {
+                Type::Array(Box::new(self.resolve_type_inner(inner)), *size)
+            }
             Type::Pointer(inner) => Type::Pointer(Box::new(self.resolve_type_inner(inner))),
             Type::Alias(name) => self
                 .sym
@@ -732,12 +852,101 @@ impl SemanticAnalyser {
             }
         }
     }
+
+    fn analyse_printf_call(&mut self, args: &[Expr], span: &Span, id_span: &Span) -> QualifierType {
+        if args.is_empty() {
+            self.diagnostics
+                .push(CompilerError::Semantic(SemanticError {
+                    span: span.clone(),
+                    kind: SemanticErrorKind::ArityMismatch {
+                        expected: 1,
+                        found: 0,
+                    },
+                }));
+            return unknown_type();
+        }
+
+        let fmt_ty = self.analyse_expr(&args[0]);
+        if !is_string_like(&fmt_ty.ty) {
+            self.diagnostics
+                .push(CompilerError::Semantic(SemanticError {
+                    span: args[0].span(),
+                    kind: SemanticErrorKind::TypeMismatch {
+                        expected: "char*".to_string(),
+                        found: type_name(&fmt_ty.ty),
+                    },
+                }));
+        }
+
+        match args.len() {
+            1 => {}
+            2 => {
+                let arg_ty = self.analyse_expr(&args[1]);
+                if let Expr::Literal(Literal::String(fmt), _) = &args[0] {
+                    let needs_pointer = fmt.contains("%s");
+                    let needs_scalar = fmt.contains("%d")
+                        || fmt.contains("%i")
+                        || fmt.contains("%u")
+                        || fmt.contains("%c");
+
+                    if needs_pointer && !is_string_like(&arg_ty.ty) {
+                        self.diagnostics
+                            .push(CompilerError::Semantic(SemanticError {
+                                span: args[1].span(),
+                                kind: SemanticErrorKind::TypeMismatch {
+                                    expected: "char*".to_string(),
+                                    found: type_name(&arg_ty.ty),
+                                },
+                            }));
+                    } else if needs_scalar && !is_scalar(&arg_ty.ty) {
+                        self.diagnostics
+                            .push(CompilerError::Semantic(SemanticError {
+                                span: args[1].span(),
+                                kind: SemanticErrorKind::TypeMismatch {
+                                    expected: "scalar".to_string(),
+                                    found: type_name(&arg_ty.ty),
+                                },
+                            }));
+                    }
+                } else if !is_scalar(&arg_ty.ty) && !is_string_like(&arg_ty.ty) {
+                    self.diagnostics
+                        .push(CompilerError::Semantic(SemanticError {
+                            span: args[1].span(),
+                            kind: SemanticErrorKind::TypeMismatch {
+                                expected: "scalar".to_string(),
+                                found: type_name(&arg_ty.ty),
+                            },
+                        }));
+                }
+            }
+            n => {
+                self.diagnostics
+                    .push(CompilerError::Semantic(SemanticError {
+                        span: id_span.clone(),
+                        kind: SemanticErrorKind::ArityMismatch {
+                            expected: 2,
+                            found: n,
+                        },
+                    }));
+            }
+        }
+
+        QualifierType {
+            ty: Type::Int,
+            is_const: false,
+            is_unsigned: false,
+        }
+    }
 }
 
 /// Public API: analyses the program and returns all semantic diagnostics
 /// (errors and warnings) as a `Vec`.
 pub fn analyse(prog: &Program) -> Vec {
-    let mut analyser = SemanticAnalyser::new();
+    analyse_with_builtins(prog, BuiltinsLibs::default())
+}
+
+pub fn analyse_with_builtins(prog: &Program, builtins: BuiltinsLibs) -> Vec {
+    let mut analyser = SemanticAnalyser::with_builtins(builtins);
     analyser.analyse_program(prog);
     analyser
         .diagnostics
@@ -747,6 +956,35 @@ pub fn analyse(prog: &Program) -> Vec {
         .collect()
 }
 
+/// Conservative heuristic: returns `true` if the last statement of the body is
+/// `Stmt::Return(Some(_))` or if some top-level statement clearly always
+/// returns (e.g. an `if` whose then AND else branches both return).
+/// +/// Does not attempt full control-flow analysis; it only avoids false positives in +/// the most common cases found in coursework functions. +fn body_always_returns(stmts: &[Stmt]) -> bool { + match stmts.last() { + Some(Stmt::Return(Some(_), _)) => true, + Some(Stmt::If(_, then, Some(else_), _)) => { + stmt_always_returns(then) && stmt_always_returns(else_) + } + Some(Stmt::Block(inner, _)) => body_always_returns(inner), + _ => false, + } +} + +/// Checks whether a single statement always returns (helper for `body_always_returns`). +fn stmt_always_returns(stmt: &Stmt) -> bool { + match stmt { + Stmt::Return(Some(_), _) => true, + Stmt::Block(stmts, _) => body_always_returns(stmts), + Stmt::If(_, then, Some(else_), _) => { + stmt_always_returns(then) && stmt_always_returns(else_) + } + _ => false, + } +} + fn infer_literal_type(lit: &Literal) -> QualifierType { let ty = match lit { Literal::Int(_) => Type::Int, @@ -777,6 +1015,10 @@ fn uint_type() -> QualifierType { } } +fn is_string_like(ty: &Type) -> bool { + matches!(ty, Type::Pointer(inner) if matches!(&**inner, Type::Char)) +} + // ── Type helpers ──────────────────────────────────────────────────────────── /// Returns `true` if the type is numeric (integer or floating point). @@ -789,7 +1031,7 @@ fn is_numeric(ty: &Type) -> bool { /// Returns `true` if the type is a pointer or an array (arrays decay to pointers in C). fn is_pointer(ty: &Type) -> bool { - matches!(ty, Type::Pointer(_) | Type::Array(_)) + matches!(ty, Type::Pointer(_) | Type::Array(_, _)) } /// Returns `true` if the type is scalar (numeric, pointer or enum). 
@@ -809,7 +1051,10 @@ fn type_name(ty: &Type) -> String { Type::Double => "double".into(), Type::Void => "void".into(), Type::Pointer(inner) => format!("{}*", type_name(inner)), - Type::Array(inner) => format!("{}[]", type_name(inner)), + Type::Array(inner, size) => match size { + Some(size) => format!("{}[{size}]", type_name(inner)), + None => format!("{}[]", type_name(inner)), + }, Type::Struct(n) => format!("struct {}", n), Type::Enum(n) => format!("enum {}", n), Type::Alias(n) => n.clone(), diff --git a/src/codegen/inter/mod.rs b/src/codegen/inter/mod.rs index 8b13789..da683e5 100644 --- a/src/codegen/inter/mod.rs +++ b/src/codegen/inter/mod.rs @@ -1 +1,70 @@ +pub mod opt; +pub mod optimizations; +#[derive(Debug, Clone, PartialEq, Eq)] +pub struct Cfg { + pub blocks: Vec<BasicBlock>, +} + +impl Cfg { + pub fn new() -> Self { + Self { blocks: Vec::new() } + } + + pub fn add_block(&mut self, block: BasicBlock) { + self.blocks.push(block); + } +} + +impl Default for Cfg { + fn default() -> Self { + Self::new() + } +} + +#[derive(Debug, Clone, PartialEq, Eq)] +pub struct BasicBlock { + pub label: String, + pub instructions: Vec<Instruction>, + pub successors: Vec<usize>, +} + +impl BasicBlock { + pub fn new(label: impl Into<String>) -> Self { + Self { + label: label.into(), + instructions: Vec::new(), + successors: Vec::new(), + } + } +} + +#[derive(Debug, Clone, PartialEq, Eq)] +pub enum Instruction { + Assign { + dst: String, + value: Value, + }, + Binary { + dst: String, + op: BinaryOp, + lhs: Value, + rhs: Value, + }, + Nop, +} + +#[derive(Debug, Clone, PartialEq, Eq, Hash)] +pub enum Value { + Int(i64), + Temp(String), +} + +#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)] +pub enum BinaryOp { + Add, + Sub, + Mul, + Div, + Mod, +} diff --git a/src/codegen/inter/opt/constant_fold.rs b/src/codegen/inter/opt/constant_fold.rs new file mode 100644 index 0000000..30b8cb3 --- /dev/null +++ b/src/codegen/inter/opt/constant_fold.rs @@ -0,0 +1,57 @@ +use 
super::OptPass; +use 
crate::codegen::inter::{BinaryOp, Cfg, Instruction, Value}; + +pub struct ConstantFoldPass; + +impl OptPass for ConstantFoldPass { + fn name(&self) -> &'static str { + "constant-fold" + } + + fn run(&self, cfg: &mut Cfg) -> bool { + let mut changed = false; + + for block in &mut cfg.blocks { + for instruction in &mut block.instructions { + if let Instruction::Binary { dst, op, lhs, rhs } = instruction { + let folded = fold_binary(*op, lhs, rhs); + if let Some(value) = folded { + *instruction = Instruction::Assign { + dst: dst.clone(), + value, + }; + changed = true; + } + } + } + } + + changed + } +} + +fn fold_binary(op: BinaryOp, lhs: &Value, rhs: &Value) -> Option<Value> { + let (Value::Int(lhs), Value::Int(rhs)) = (lhs, rhs) else { + return None; + }; + + let value = match op { + BinaryOp::Add => lhs.checked_add(*rhs)?, + BinaryOp::Sub => lhs.checked_sub(*rhs)?, + BinaryOp::Mul => lhs.checked_mul(*rhs)?, + BinaryOp::Div => { + if *rhs == 0 { + return None; + } + lhs.checked_div(*rhs)? + } + BinaryOp::Mod => { + if *rhs == 0 { + return None; + } + lhs.checked_rem(*rhs)? + } + }; + + Some(Value::Int(value)) +} diff --git a/src/codegen/inter/opt/copy_prop.rs b/src/codegen/inter/opt/copy_prop.rs new file mode 100644 index 0000000..bb6b827 --- /dev/null +++ b/src/codegen/inter/opt/copy_prop.rs @@ -0,0 +1,224 @@ +use std::collections::HashMap; + +use super::OptPass; +use crate::codegen::inter::{Cfg, Instruction, Value}; + +/// Copy propagation and constant propagation, combined, over the TAC. +/// +/// For each basic block, keeps a `temp -> value` map with the last +/// simple assignment (`dst = src` or `dst = literal`) known for each +/// destination. Subsequent uses of that destination are replaced directly +/// by the mapped value, until the destination is redefined. 
+pub struct CopyPropagationPass; + +impl OptPass for CopyPropagationPass { + fn name(&self) -> &'static str { + "copy-propagation" + } + + fn run(&self, cfg: &mut Cfg) -> bool { + let mut changed = false; + + for block in &mut cfg.blocks { + changed |= propagate_in_block(&mut block.instructions); + } + + changed + } +} + +type CopyMap = HashMap<String, Value>; + +fn propagate_in_block(instructions: &mut [Instruction]) -> bool { + let mut changed = false; + let mut copies: CopyMap = HashMap::new(); + + for instruction in instructions.iter_mut() { + match instruction { + Instruction::Assign { dst, value } => { + let resolved = resolve(&copies, value); + if resolved != *value { + *value = resolved.clone(); + changed = true; + } + + invalidate(&mut copies, dst); + if !matches!(&resolved, Value::Temp(name) if name == dst) { + copies.insert(dst.clone(), resolved); + } + } + Instruction::Binary { dst, lhs, rhs, .. } => { + let resolved_lhs = resolve(&copies, lhs); + if resolved_lhs != *lhs { + *lhs = resolved_lhs; + changed = true; + } + + let resolved_rhs = resolve(&copies, rhs); + if resolved_rhs != *rhs { + *rhs = resolved_rhs; + changed = true; + } + + invalidate(&mut copies, dst); + } + Instruction::Nop => {} + } + } + + changed +} + +fn resolve(copies: &CopyMap, value: &Value) -> Value { + match value { + Value::Temp(name) => copies.get(name).cloned().unwrap_or_else(|| value.clone()), + other => other.clone(), + } +} + +/// Removes from the map any entry that depends on `name`, either as a key +/// (the destination itself is being redefined) or as a value (some other +/// destination had been recorded as a copy of `name`). 
+fn invalidate(copies: &mut CopyMap, name: &str) { + copies.remove(name); + copies.retain(|_, v| !matches!(v, Value::Temp(n) if n == name)); +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::codegen::inter::{BasicBlock, BinaryOp}; + + fn assign(dst: &str, value: Value) -> Instruction { + Instruction::Assign { + dst: dst.to_string(), + value, + } + } + + fn binary(dst: &str, op: BinaryOp, lhs: Value, rhs: Value) -> Instruction { + Instruction::Binary { + dst: dst.to_string(), + op, + lhs, + rhs, + } + } + + #[test] + fn propagates_simple_copy() { + let mut block = BasicBlock::new("entry"); + block.instructions.push(binary( + "a", + BinaryOp::Add, + Value::Temp("p".into()), + Value::Temp("q".into()), + )); + block + .instructions + .push(assign("b", Value::Temp("a".into()))); + block.instructions.push(binary( + "c", + BinaryOp::Add, + Value::Temp("b".into()), + Value::Int(1), + )); + + let changed = propagate_in_block(&mut block.instructions); + + assert!(changed); + assert_eq!( + block.instructions[2], + binary("c", BinaryOp::Add, Value::Temp("a".into()), Value::Int(1)) + ); + } + + #[test] + fn propagates_constant() { + let mut block = BasicBlock::new("entry"); + block.instructions.push(assign("x", Value::Int(10))); + block.instructions.push(binary( + "y", + BinaryOp::Mul, + Value::Temp("x".into()), + Value::Int(2), + )); + + let changed = propagate_in_block(&mut block.instructions); + + assert!(changed); + assert_eq!( + block.instructions[1], + binary("y", BinaryOp::Mul, Value::Int(10), Value::Int(2)) + ); + } + + #[test] + fn stops_propagating_after_redefinition() { + let mut block = BasicBlock::new("entry"); + block.instructions.push(assign("x", Value::Int(10))); + block + .instructions + .push(assign("x", Value::Temp("p".into()))); + block.instructions.push(binary( + "y", + BinaryOp::Mul, + Value::Temp("x".into()), + Value::Int(2), + )); + + propagate_in_block(&mut block.instructions); + + assert_eq!( + block.instructions[2], + binary("y", 
BinaryOp::Mul, Value::Temp("p".into()), Value::Int(2)) + ); + } + + #[test] + fn invalidates_dependents_when_source_is_redefined() { + let mut block = BasicBlock::new("entry"); + block + .instructions + .push(assign("b", Value::Temp("a".into()))); + block.instructions.push(assign("a", Value::Int(7))); + block.instructions.push(binary( + "c", + BinaryOp::Add, + Value::Temp("b".into()), + Value::Int(1), + )); + + propagate_in_block(&mut block.instructions); + + assert_eq!( + block.instructions[2], + binary("c", BinaryOp::Add, Value::Temp("b".into()), Value::Int(1)) + ); + } + + #[test] + fn pass_runs_across_blocks_and_reports_changes() { + let mut cfg = Cfg::new(); + + let mut block = BasicBlock::new("entry"); + block.instructions.push(assign("a", Value::Int(3))); + block.instructions.push(binary( + "b", + BinaryOp::Add, + Value::Temp("a".into()), + Value::Int(1), + )); + cfg.add_block(block); + + let pass = CopyPropagationPass; + let changed = pass.run(&mut cfg); + + assert!(changed); + assert_eq!( + cfg.blocks[0].instructions[1], + binary("b", BinaryOp::Add, Value::Int(3), Value::Int(1)) + ); + assert!(!pass.run(&mut cfg)); + } +} diff --git a/src/codegen/inter/opt/cse.rs b/src/codegen/inter/opt/cse.rs new file mode 100644 index 0000000..ead84fd --- /dev/null +++ b/src/codegen/inter/opt/cse.rs @@ -0,0 +1,227 @@ +use std::collections::HashMap; + +use super::OptPass; +use crate::codegen::inter::{BinaryOp, Cfg, Instruction, Value}; + +/// Local (intra-block) common subexpression elimination. +/// +/// For each basic block, keeps a map of the `lhs op rhs` expressions already +/// computed. When the same expression appears again without `lhs`/`rhs` +/// having been modified, the redundant instruction is removed and future +/// uses of its destination are rewritten to the destination already computed. 
+pub struct CsePass; + +impl OptPass for CsePass { + fn name(&self) -> &'static str { + "common-subexpression-elimination" + } + + fn run(&self, cfg: &mut Cfg) -> bool { + let mut changed = false; + + for block in &mut cfg.blocks { + changed |= eliminate_in_block(&mut block.instructions); + } + + changed + } +} + +type AvailableExprs = HashMap<(Value, BinaryOp, Value), String>; + +fn eliminate_in_block(instructions: &mut Vec<Instruction>) -> bool { + let mut changed = false; + let mut available: AvailableExprs = HashMap::new(); + let mut rename: HashMap<String, String> = HashMap::new(); + let mut result = Vec::with_capacity(instructions.len()); + + for instruction in instructions.drain(..) { + match instruction { + Instruction::Binary { dst, op, lhs, rhs } => { + let lhs = resolve(&rename, lhs); + let rhs = resolve(&rename, rhs); + + invalidate(&mut available, &mut rename, &dst); + + let key = (lhs.clone(), op, rhs.clone()); + if let Some(cached) = available.get(&key) { + rename.insert(dst, cached.clone()); + changed = true; + continue; + } + + available.insert(key, dst.clone()); + result.push(Instruction::Binary { dst, op, lhs, rhs }); + } + Instruction::Assign { dst, value } => { + let value = resolve(&rename, value); + invalidate(&mut available, &mut rename, &dst); + result.push(Instruction::Assign { dst, value }); + } + other => result.push(other), + } + } + + *instructions = result; + changed +} + +fn resolve(rename: &HashMap<String, String>, value: Value) -> Value { + match value { + Value::Temp(name) => { + let mut current = name; + while let Some(next) = rename.get(&current) { + if *next == current { + break; + } + current = next.clone(); + } + Value::Temp(current) + } + other => other, + } +} + +/// Removes from the cache any available expression that depends on `name`, +/// either as an operand or as an already-computed destination, because `name` is +/// being redefined. 
+fn invalidate(available: &mut AvailableExprs, rename: &mut HashMap<String, String>, name: &str) { + available.retain(|(lhs, _, rhs), cached| { + !references(lhs, name) && !references(rhs, name) && cached != name + }); + rename.retain(|_, target| target != name); +} + +fn references(value: &Value, name: &str) -> bool { + matches!(value, Value::Temp(n) if n == name) +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::codegen::inter::BasicBlock; + + fn binary(dst: &str, op: BinaryOp, lhs: Value, rhs: Value) -> Instruction { + Instruction::Binary { + dst: dst.to_string(), + op, + lhs, + rhs, + } + } + + #[test] + fn eliminates_repeated_expression_in_same_block() { + let mut block = BasicBlock::new("entry"); + block.instructions.push(binary( + "t1", + BinaryOp::Add, + Value::Temp("a".into()), + Value::Temp("b".into()), + )); + block.instructions.push(binary( + "t2", + BinaryOp::Add, + Value::Temp("a".into()), + Value::Temp("b".into()), + )); + block.instructions.push(Instruction::Assign { + dst: "y".to_string(), + value: Value::Temp("t2".into()), + }); + + let changed = eliminate_in_block(&mut block.instructions); + + assert!(changed); + assert_eq!( + block.instructions, + vec![ + binary( + "t1", + BinaryOp::Add, + Value::Temp("a".into()), + Value::Temp("b".into()) + ), + Instruction::Assign { + dst: "y".to_string(), + value: Value::Temp("t1".into()), + }, + ] + ); + } + + #[test] + fn does_not_eliminate_when_operand_is_redefined_between_uses() { + let mut block = BasicBlock::new("entry"); + block.instructions.push(binary( + "t1", + BinaryOp::Add, + Value::Temp("a".into()), + Value::Temp("b".into()), + )); + block.instructions.push(Instruction::Assign { + dst: "a".to_string(), + value: Value::Int(5), + }); + block.instructions.push(binary( + "t2", + BinaryOp::Add, + Value::Temp("a".into()), + Value::Temp("b".into()), + )); + + let changed = eliminate_in_block(&mut block.instructions); + + assert!(!changed); + assert_eq!(block.instructions.len(), 3); + } + + 
#[test] + fn 
keeps_different_expressions_in_cache_independently() { + let mut block = BasicBlock::new("entry"); + block.instructions.push(binary( + "x", + BinaryOp::Mul, + Value::Temp("a".into()), + Value::Temp("b".into()), + )); + block.instructions.push(binary( + "y", + BinaryOp::Sub, + Value::Temp("a".into()), + Value::Temp("d".into()), + )); + + let changed = eliminate_in_block(&mut block.instructions); + + assert!(!changed); + assert_eq!(block.instructions.len(), 2); + } + + #[test] + fn pass_reports_changes_across_multiple_blocks() { + let mut cfg = Cfg::new(); + + let mut block = BasicBlock::new("entry"); + block.instructions.push(binary( + "t1", + BinaryOp::Add, + Value::Temp("a".into()), + Value::Temp("b".into()), + )); + block.instructions.push(binary( + "t2", + BinaryOp::Add, + Value::Temp("a".into()), + Value::Temp("b".into()), + )); + cfg.add_block(block); + + let pass = CsePass; + let changed = pass.run(&mut cfg); + + assert!(changed); + assert_eq!(cfg.blocks[0].instructions.len(), 1); + assert!(!pass.run(&mut cfg)); + } +} diff --git a/src/codegen/inter/opt/dce.rs b/src/codegen/inter/opt/dce.rs new file mode 100644 index 0000000..6353b99 --- /dev/null +++ b/src/codegen/inter/opt/dce.rs @@ -0,0 +1,24 @@ +use super::OptPass; +use crate::codegen::inter::{Cfg, Instruction}; + +pub struct DeadCodeElimPass; + +impl OptPass for DeadCodeElimPass { + fn name(&self) -> &'static str { + "dead-code-elimination" + } + + fn run(&self, cfg: &mut Cfg) -> bool { + let mut changed = false; + + for block in &mut cfg.blocks { + let before = block.instructions.len(); + block + .instructions + .retain(|instruction| !matches!(instruction, Instruction::Nop)); + changed |= block.instructions.len() != before; + } + + changed + } +} diff --git a/src/codegen/inter/opt/inline.rs b/src/codegen/inter/opt/inline.rs new file mode 100644 index 0000000..659a770 --- /dev/null +++ b/src/codegen/inter/opt/inline.rs @@ -0,0 +1,14 @@ +use super::OptPass; +use crate::codegen::inter::Cfg; + +pub struct 
InliningPass; + +impl OptPass for InliningPass { + fn name(&self) -> &'static str { + "inlining" + } + + fn run(&self, _cfg: &mut Cfg) -> bool { + false + } +} diff --git a/src/codegen/inter/opt/licm.rs b/src/codegen/inter/opt/licm.rs new file mode 100644 index 0000000..b7cbd24 --- /dev/null +++ b/src/codegen/inter/opt/licm.rs @@ -0,0 +1,347 @@ +use super::OptPass; +use crate::codegen::inter::{BasicBlock, Cfg, Instruction, Value}; +use std::collections::{HashMap, HashSet}; + +pub struct LoopInvariantCodeMotionPass; + +impl OptPass for LoopInvariantCodeMotionPass { + fn name(&self) -> &'static str { + "loop-invariant-code-motion" + } + + fn run(&self, cfg: &mut Cfg) -> bool { + let predecessors = get_predecessors(cfg); + let dominators = self.compute_dominators(cfg, &predecessors); + + let mut back_edges = Vec::new(); + for (i, block) in cfg.blocks.iter().enumerate() { + for &succ in &block.successors { + if let Some(doms) = dominators.get(&i) { + if doms.contains(&succ) { + back_edges.push((succ, i)); // (header, tail) + } + } + } + } + + for (header, tail) in back_edges { + let loop_body = self.get_loop_body(&predecessors, header, tail); + let invariants = self.compute_invariants(cfg, &loop_body, &dominators); + + if !invariants.is_empty() { + // Collect invariant destinations + let invariants_set: HashSet<String> = invariants + .iter() + .filter_map(|inst| match inst { + Instruction::Assign { dst, .. } | Instruction::Binary { dst, .. } => { + Some(dst.clone()) + } + _ => None, + }) + .collect(); + + // 1. Remove the invariant instructions from the loop body blocks + for &block_id in &loop_body { + cfg.blocks[block_id].instructions.retain(|inst| match inst { + Instruction::Assign { dst, .. } | Instruction::Binary { dst, .. } => { + !invariants_set.contains(dst) + } + _ => true, + }); + } + + // 2. 
Create the preheader + let header_label = &cfg.blocks[header].label; + let preheader_label = format!("{}_preheader", header_label); + let mut preheader = BasicBlock::new(preheader_label); + preheader.instructions = invariants; + preheader.successors.push(header + 1); + + // 3. Find outside predecessors + let outside_preds: Vec<usize> = predecessors[header] + .iter() + .copied() + .filter(|pred| !loop_body.contains(pred)) + .collect(); + + // 4. Insert the preheader block at index `header` + cfg.blocks.insert(header, preheader); + + // 5. Update successors in other blocks + for (i, b) in cfg.blocks.iter_mut().enumerate() { + if i == header { + continue; + } + for succ in &mut b.successors { + if *succ >= header { + *succ += 1; + } + } + } + + // 6. Redirect outside predecessors + for pred in outside_preds { + let new_pred_idx = if pred >= header { pred + 1 } else { pred }; + let pred_block = &mut cfg.blocks[new_pred_idx]; + for succ in &mut pred_block.successors { + if *succ == header + 1 { + *succ = header; + } + } + } + + return true; + } + } + + false + } +} + +fn get_predecessors(cfg: &Cfg) -> Vec<Vec<usize>> { + let mut preds = vec![Vec::new(); cfg.blocks.len()]; + for (i, block) in cfg.blocks.iter().enumerate() { + for &succ in &block.successors { + if succ < preds.len() { + preds[succ].push(i); + } + } + } + preds +} + +fn uses_variable(inst: &Instruction, name: &str) -> bool { + match inst { + Instruction::Assign { value, .. } => { + matches!(value, Value::Temp(n) if n == name) + } + Instruction::Binary { lhs, rhs, .. 
} => { + matches!(lhs, Value::Temp(n) if n == name) || matches!(rhs, Value::Temp(n) if n == name) + } + Instruction::Nop => false, + } +} + +impl LoopInvariantCodeMotionPass { + fn compute_dominators( + &self, + cfg: &Cfg, + predecessors: &[Vec<usize>], + ) -> HashMap<usize, HashSet<usize>> { + let mut dominators: HashMap<usize, HashSet<usize>> = HashMap::new(); + let all_blocks: HashSet<usize> = (0..cfg.blocks.len()).collect(); + let entry = 0; + + dominators.insert(entry, vec![entry].into_iter().collect()); + + for block in 1..cfg.blocks.len() { + dominators.insert(block, all_blocks.clone()); + } + + let mut changed = true; + while changed { + changed = false; + for (block, preds) in predecessors.iter().enumerate().skip(1) { + if preds.is_empty() { + continue; + } + + let mut current_intersection = + dominators.get(&preds[0]).cloned().unwrap_or_default(); + for &pred in preds.iter().skip(1) { + if let Some(pred_doms) = dominators.get(&pred) { + current_intersection = current_intersection + .intersection(pred_doms) + .cloned() + .collect(); + } + } + current_intersection.insert(block); + + let old_doms = dominators.get(&block).unwrap(); + if &current_intersection != old_doms { + dominators.insert(block, current_intersection); + changed = true; + } + } + } + dominators + } + + fn get_loop_body( + &self, + predecessors: &[Vec<usize>], + header: usize, + tail: usize, + ) -> HashSet<usize> { + let mut loop_body = HashSet::new(); + loop_body.insert(header); + loop_body.insert(tail); + + let mut stack = vec![tail]; + + while let Some(node) = stack.pop() { + for &pred in &predecessors[node] { + if !loop_body.contains(&pred) { + loop_body.insert(pred); + stack.push(pred); + } + } + } + loop_body + } + + fn compute_invariants( + &self, + cfg: &Cfg, + loop_body: &HashSet<usize>, + dominators: &HashMap<usize, HashSet<usize>>, + ) -> Vec<Instruction> { + let mut invariants_set: HashSet<String> = HashSet::new(); + let mut invariant_instrs: Vec<Instruction> = Vec::new(); + + 
let mut def_counts = HashMap::new(); + for &block_id in loop_body { + for inst in &cfg.blocks[block_id].instructions { + match inst { + 
Instruction::Assign { dst, .. } | Instruction::Binary { dst, .. } => { + *def_counts.entry(dst.clone()).or_insert(0) += 1; + } + _ => {} + } + } + } + + let mut changed = true; + while changed { + changed = false; + for &block_id in loop_body { + for inst in &cfg.blocks[block_id].instructions { + match inst { + Instruction::Assign { dst, value } => { + if invariants_set.contains(dst) { + continue; + } + if def_counts.get(dst) != Some(&1) { + continue; + } + if self.is_value_stable(value, loop_body, &invariants_set, cfg) + && self.is_safe_to_move(dst, block_id, loop_body, dominators, cfg) + { + invariants_set.insert(dst.clone()); + invariant_instrs.push(inst.clone()); + changed = true; + } + } + Instruction::Binary { + dst, + op: _, + lhs, + rhs, + } => { + if invariants_set.contains(dst) { + continue; + } + if def_counts.get(dst) != Some(&1) { + continue; + } + if self.is_value_stable(lhs, loop_body, &invariants_set, cfg) + && self.is_value_stable(rhs, loop_body, &invariants_set, cfg) + && self.is_safe_to_move(dst, block_id, loop_body, dominators, cfg) + { + invariants_set.insert(dst.clone()); + invariant_instrs.push(inst.clone()); + changed = true; + } + } + _ => {} + } + } + } + } + invariant_instrs + } + + fn is_safe_to_move( + &self, + dst: &str, + def_block: usize, + loop_body: &HashSet<usize>, + dominators: &HashMap<usize, HashSet<usize>>, + cfg: &Cfg, + ) -> bool { + // 1. Find exit blocks of the loop + let mut exit_blocks = HashSet::new(); + for &block_id in loop_body { + let block = &cfg.blocks[block_id]; + for &succ in &block.successors { + if !loop_body.contains(&succ) { + exit_blocks.insert(block_id); + } + } + } + + // 2. Check if def_block dominates all exit blocks + let dominates_all_exits = exit_blocks.iter().all(|&exit_block| { + if let Some(doms) = dominators.get(&exit_block) { + doms.contains(&def_block) + } else { + false + } + }); + + if dominates_all_exits { + return true; + } + + // 3. 
Check if variable is not used after/outside the loop + let mut used_after_loop = false; + for (i, block) in cfg.blocks.iter().enumerate() { + if !loop_body.contains(&i) { + for inst in &block.instructions { + if uses_variable(inst, dst) { + used_after_loop = true; + break; + } + } + } + } + + !used_after_loop + } + + fn is_value_stable( + &self, + val: &Value, + loop_body: &HashSet<usize>, + invariants: &HashSet<String>, + cfg: &Cfg, + ) -> bool { + match val { + Value::Int(_) => true, + Value::Temp(name) => { + if invariants.contains(name) { + return true; + } + self.is_defined_outside_loop(name, loop_body, cfg) + } + } + } + + fn is_defined_outside_loop(&self, name: &str, loop_body: &HashSet<usize>, cfg: &Cfg) -> bool { + for &block_id in loop_body { + let block = &cfg.blocks[block_id]; + for inst in &block.instructions { + match inst { + Instruction::Assign { dst, .. } | Instruction::Binary { dst, .. } + if dst == name => + { + return false; + } + _ => {} + } + } + } + true + } +} diff --git a/src/codegen/inter/opt/mod.rs b/src/codegen/inter/opt/mod.rs new file mode 100644 index 0000000..bfefb2d --- /dev/null +++ b/src/codegen/inter/opt/mod.rs @@ -0,0 +1,193 @@ +use super::Cfg; + +pub mod constant_fold; +pub mod copy_prop; +pub mod cse; +pub mod dce; +pub mod inline; +pub mod licm; + +pub use constant_fold::ConstantFoldPass; +pub use copy_prop::CopyPropagationPass; +pub use cse::CsePass; +pub use dce::DeadCodeElimPass; +pub use inline::InliningPass; +pub use licm::LoopInvariantCodeMotionPass; + +/// Common interface for optimization passes over the intermediate CFG/TAC. +/// +/// A pass must return `true` when it changes the `Cfg`, allowing the +/// `PassManager` to iterate to a fixed point. 
+pub trait OptPass { + fn name(&self) -> &'static str; + + fn run(&self, cfg: &mut Cfg) -> bool; +} + +pub struct PassManager { + passes: Vec<Box<dyn OptPass>>, +} + +impl PassManager { + pub fn new() -> Self { + Self { passes: Vec::new() } + } + + pub fn add<P: OptPass + 'static>(&mut self, pass: P) { + self.passes.push(Box::new(pass)); + } + + pub fn is_empty(&self) -> bool { + self.passes.is_empty() + } + + pub fn len(&self) -> usize { + self.passes.len() + } + + pub fn pass_names(&self) -> Vec<&'static str> { + self.passes.iter().map(|pass| pass.name()).collect() + } + + /// Runs all passes until a fixed point or `max_iter` iterations. + /// + /// Returns the number of complete iterations executed. + pub fn run(&self, cfg: &mut Cfg, max_iter: usize) -> usize { + let mut iterations = 0; + + for _ in 0..max_iter { + let mut changed = false; + + for pass in &self.passes { + changed |= pass.run(cfg); + } + + iterations += 1; + + if !changed { + break; + } + } + + iterations + } +} + +impl Default for PassManager { + fn default() -> Self { + Self::new() + } +} + +#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)] +pub enum OptLevel { + #[default] + O0, + O1, + O2, + O3, +} + +impl OptLevel { + pub fn parse(value: &str) -> Option<Self> { + match value { + "0" | "O0" | "-O0" => Some(Self::O0), + "1" | "O1" | "-O1" => Some(Self::O1), + "2" | "O2" | "-O2" => Some(Self::O2), + "3" | "O3" | "-O3" => Some(Self::O3), + _ => None, + } + } +} + +pub fn pipeline_for_level(level: OptLevel) -> PassManager { + let mut pm = PassManager::new(); + + match level { + OptLevel::O0 => {} + OptLevel::O1 => { + pm.add(ConstantFoldPass); + pm.add(DeadCodeElimPass); + } + OptLevel::O2 => { + pm.add(ConstantFoldPass); + pm.add(DeadCodeElimPass); + pm.add(CopyPropagationPass); + pm.add(CsePass); + } + OptLevel::O3 => { + pm.add(ConstantFoldPass); + pm.add(DeadCodeElimPass); + pm.add(CopyPropagationPass); + pm.add(CsePass); + pm.add(LoopInvariantCodeMotionPass); + pm.add(InliningPass); + } + } + + 
pm +} + +pub fn default_pipeline() 
-> PassManager { + pipeline_for_level(OptLevel::O2) +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::codegen::inter::{BasicBlock, BinaryOp, Instruction, Value}; + + #[test] + fn pass_manager_runs_until_fixed_point() { + let mut cfg = Cfg::new(); + let mut block = BasicBlock::new("entry"); + block.instructions.push(Instruction::Binary { + dst: "t0".to_string(), + op: BinaryOp::Add, + lhs: Value::Int(2), + rhs: Value::Int(3), + }); + cfg.add_block(block); + + let mut pm = PassManager::new(); + pm.add(ConstantFoldPass); + + assert_eq!(pm.run(&mut cfg, 10), 2); + assert_eq!( + cfg.blocks[0].instructions[0], + Instruction::Assign { + dst: "t0".to_string(), + value: Value::Int(5), + } + ); + } + + #[test] + fn opt_level_selects_expected_pipeline() { + assert!(pipeline_for_level(OptLevel::O0).is_empty()); + assert_eq!( + pipeline_for_level(OptLevel::O1).pass_names(), + vec!["constant-fold", "dead-code-elimination"] + ); + assert_eq!( + pipeline_for_level(OptLevel::O2).pass_names(), + vec![ + "constant-fold", + "dead-code-elimination", + "copy-propagation", + "common-subexpression-elimination", + ] + ); + assert_eq!( + pipeline_for_level(OptLevel::O3).pass_names(), + vec![ + "constant-fold", + "dead-code-elimination", + "copy-propagation", + "common-subexpression-elimination", + "loop-invariant-code-motion", + "inlining", + ] + ); + } +} diff --git a/src/codegen/inter/optimizations.rs b/src/codegen/inter/optimizations.rs new file mode 100644 index 0000000..d96b12f --- /dev/null +++ b/src/codegen/inter/optimizations.rs @@ -0,0 +1,1063 @@ +use std::collections::{HashMap, HashSet}; + +use crate::{ + common::ast::expr::{BinOp, UnOp}, + ir::tac::{ConstValue, LabelId, Operand, TacInstr, TempId}, +}; + +// ─── Liveness ──────────────────────────────────────────────────────────────── + +/// Result of the intraprocedural liveness analysis. +/// +/// `live_before[i]` contains the set of `TempId`s that are live +/// *immediately before* the instruction at index `i`. 
+pub struct LivenessInfo { + pub live_before: Vec<HashSet<TempId>>, +} + +/// Computes `LivenessInfo` for a sequence of TAC instructions, respecting +/// control flow (`Jump`/`CondJump`/`Label`). +/// +/// The analysis is done at two levels: +/// 1. Partitions `instrs` into basic blocks and builds the successors of each +/// block from its last instruction (`Jump`, `CondJump`, `Return` or +/// fall-through). +/// 2. Solves the liveness equations by fixed-point iteration across blocks: +/// `live_out(B) = ⋃ live_in(S)` for each successor `S` of `B`, and +/// `live_in(B) = use(B) ∪ (live_out(B) - def(B))`. +/// +/// It then refines `live_before` per instruction inside each block with a +/// backward scan starting from `live_out(B)`. +/// +/// Without this attention to control flow, a definition made in one branch of an +/// `if/else` and used only after the merge could look dead to a purely +/// linear scan, leading the DCE to remove observable code. +pub fn compute_liveness(instrs: &[TacInstr]) -> LivenessInfo { + let n = instrs.len(); + if n == 0 { + return LivenessInfo { + live_before: Vec::new(), + }; + } + + let blocks = split_into_blocks(instrs); + let label_to_block = label_block_map(instrs, &blocks); + let successors = compute_successors(instrs, &blocks, &label_to_block); + + let (use_b, def_b) = blocks + .iter() + .map(|range| local_use_def(&instrs[range.clone()])) + .collect::<(Vec<_>, Vec<_>)>(); + + let mut live_in = vec![HashSet::new(); blocks.len()]; + let mut live_out = vec![HashSet::new(); blocks.len()]; + + loop { + let mut changed = false; + + for b in (0..blocks.len()).rev() { + let mut out = HashSet::new(); + for &succ in &successors[b] { + out.extend(live_in[succ].iter().copied()); + } + + let mut inb = use_b[b].clone(); + for t in &out { + if !def_b[b].contains(t) { + inb.insert(*t); + } + } + + if out != live_out[b] || inb != live_in[b] { + changed = true; + } + live_out[b] = out; + live_in[b] = inb; + } + + if !changed { + break; + } + } + + 
let mut 
live_before = vec![HashSet::new(); n]; + for (b, range) in blocks.iter().enumerate() { + let mut live = live_out[b].clone(); + + for i in range.clone().rev() { + live_before[i] = live.clone(); + + if let Some(def) = instr_def(&instrs[i]) { + live.remove(&def); + } + for used in instr_uses(&instrs[i]) { + live.insert(used); + } + } + } + + LivenessInfo { live_before } +} + +/// Partitions the instructions into basic blocks. A block starts at `instrs[0]`, +/// at every `Label`, and right after `Jump`/`CondJump`/`Return`. +fn split_into_blocks(instrs: &[TacInstr]) -> Vec<std::ops::Range<usize>> { + let n = instrs.len(); + let mut starts: Vec<usize> = vec![0]; + + for (i, instr) in instrs.iter().enumerate() { + match instr { + TacInstr::Label(_) => starts.push(i), + TacInstr::Jump { .. } | TacInstr::CondJump { .. } | TacInstr::Return { .. } + if i + 1 < n => + { + starts.push(i + 1); + } + _ => {} + } + } + + starts.sort_unstable(); + starts.dedup(); + + let mut blocks = Vec::with_capacity(starts.len()); + for (idx, &start) in starts.iter().enumerate() { + let end = starts.get(idx + 1).copied().unwrap_or(n); + if start < end { + blocks.push(start..end); + } + } + blocks +} + +/// Maps each `LabelId` to the index of the block it starts. +fn label_block_map( + instrs: &[TacInstr], + blocks: &[std::ops::Range<usize>], +) -> HashMap<LabelId, usize> { + let mut map = HashMap::new(); + for (b, range) in blocks.iter().enumerate() { + if let TacInstr::Label(l) = &instrs[range.start] { + map.insert(*l, b); + } + } + map +} + +/// Computes the successor blocks of each block from its last instruction. 
+fn compute_successors( + instrs: &[TacInstr], + blocks: &[std::ops::Range<usize>], + label_to_block: &HashMap<LabelId, usize>, +) -> Vec<Vec<usize>> { + blocks + .iter() + .enumerate() + .map(|(b, range)| match &instrs[range.end - 1] { + TacInstr::Jump { label } => label_to_block.get(label).copied().into_iter().collect(), + TacInstr::CondJump { + then_label, + else_label, + .. 
+            } => [then_label, else_label]
+                .into_iter()
+                .filter_map(|l| label_to_block.get(l).copied())
+                .collect(),
+            TacInstr::Return { .. } => Vec::new(),
+            _ => {
+                if b + 1 < blocks.len() {
+                    vec![b + 1]
+                } else {
+                    Vec::new()
+                }
+            }
+        })
+        .collect()
+}
+
+/// Computes `use(B)` (uses not preceded by a definition inside `B`, i.e.
+/// "upward-exposed") and `def(B)` (every `TempId` defined in `B`) with a
+/// backward scan local to the block.
+fn local_use_def(block: &[TacInstr]) -> (HashSet<TempId>, HashSet<TempId>) {
+    let mut use_b = HashSet::new();
+    let mut def_b = HashSet::new();
+
+    for instr in block.iter().rev() {
+        if let Some(def) = instr_def(instr) {
+            def_b.insert(def);
+            use_b.remove(&def);
+        }
+        for used in instr_uses(instr) {
+            use_b.insert(used);
+        }
+    }
+
+    (use_b, def_b)
+}
+
+/// Returns the `TempId` defined by the instruction, if any.
+///
+/// `Operand::Var` is **never** returned here: named variables of the C program
+/// have observable semantics and are never candidates for elimination by DCE.
+fn instr_def(instr: &TacInstr) -> Option<TempId> {
+    match instr {
+        TacInstr::BinOp { dst, .. } => Some(*dst),
+        TacInstr::UnOp { dst, .. } => Some(*dst),
+        TacInstr::Copy {
+            dst: Operand::Temp(t),
+            ..
+        } => Some(*t),
+        TacInstr::Call { dst: Some(t), .. } => Some(*t),
+        _ => None,
+    }
+}
+
+/// Returns every `TempId` *used* as an operand of the instruction.
+fn instr_uses(instr: &TacInstr) -> Vec<TempId> {
+    let mut uses = Vec::new();
+
+    let push = |uses: &mut Vec<TempId>, op: &Operand| {
+        if let Operand::Temp(t) = op {
+            uses.push(*t);
+        }
+    };
+
+    match instr {
+        TacInstr::BinOp { lhs, rhs, .. } => {
+            push(&mut uses, lhs);
+            push(&mut uses, rhs);
+        }
+        TacInstr::UnOp { src, .. } => push(&mut uses, src),
+        TacInstr::Copy { src, .. } => push(&mut uses, src),
+        TacInstr::CondJump { cond, .. } => push(&mut uses, cond),
+        TacInstr::Call { args, .. } => {
+            for arg in args {
+                push(&mut uses, arg);
+            }
+        }
+        TacInstr::Return { val: Some(v), ..
} => push(&mut uses, v),
+        _ => {}
+    }
+
+    uses
+}
+
+// ─── Constant Folding ────────────────────────────────────────────────────────
+
+/// Evaluates at compile time operations whose operands are all integer constants.
+///
+/// Returns `true` if any instruction was changed.
+///
+/// Does **not** fold `ConstValue::Double` (to avoid host vs. target precision divergence).
+/// Does **not** fold shifts with an invalid rhs, nor division by zero (UB in C).
+pub fn constant_fold(instrs: &mut [TacInstr]) -> bool {
+    let mut changed = false;
+
+    for instr in instrs.iter_mut() {
+        match instr {
+            TacInstr::BinOp {
+                dst,
+                op,
+                lhs,
+                rhs,
+                ty,
+            } => {
+                if let Some(result) = fold_binop(op, lhs, rhs) {
+                    *instr = TacInstr::Copy {
+                        dst: Operand::Temp(*dst),
+                        src: Operand::Const(result),
+                        ty: ty.clone(),
+                    };
+                    changed = true;
+                }
+            }
+            TacInstr::UnOp { dst, op, src, ty } => {
+                if let Some(result) = fold_unop(op, src) {
+                    *instr = TacInstr::Copy {
+                        dst: Operand::Temp(*dst),
+                        src: Operand::Const(result),
+                        ty: ty.clone(),
+                    };
+                    changed = true;
+                }
+            }
+            _ => {}
+        }
+    }
+
+    changed
+}
+
+fn fold_binop(op: &BinOp, lhs: &Operand, rhs: &Operand) -> Option<ConstValue> {
+    let (Operand::Const(ConstValue::Int(l)), Operand::Const(ConstValue::Int(r))) = (lhs, rhs)
+    else {
+        return None;
+    };
+    let l = *l;
+    let r = *r;
+
+    let result = match op {
+        // Arithmetic
+        BinOp::Add => l.checked_add(r)?,
+        BinOp::Sub => l.checked_sub(r)?,
+        BinOp::Mul => l.checked_mul(r)?,
+        BinOp::Div => {
+            if r == 0 {
+                return None; // division by zero is UB: leave the instruction as is
+            }
+            l.checked_div(r)?
+        }
+        BinOp::Mod => {
+            if r == 0 {
+                return None; // division by zero is UB: leave the instruction as is
+            }
+            l.checked_rem(r)?
+        }
+
+        // Relational: the result is 0 or 1
+        BinOp::Eq => (l == r) as i64,
+        BinOp::Neq => (l != r) as i64,
+        BinOp::Less => (l < r) as i64,
+        BinOp::Greater => (l > r) as i64,
+        BinOp::Leq => (l <= r) as i64,
+        BinOp::Geq => (l >= r) as i64,
+
+        // Logical: both operands are constants, so short-circuiting does not apply
+        BinOp::And => ((l != 0) && (r != 0)) as i64,
+        BinOp::Or => ((l != 0) || (r != 0)) as i64,
+
+        // Bitwise
+        BinOp::BitAnd => l & r,
+        BinOp::BitOr => l | r,
+        BinOp::BitXor => l ^ r,
+
+        // Shift: UB if rhs < 0 or rhs >= 64
+        BinOp::Shl => {
+            if !(0..64).contains(&r) {
+                return None;
+            }
+            l.checked_shl(r as u32)?
+        }
+        BinOp::Shr => {
+            if !(0..64).contains(&r) {
+                return None;
+            }
+            l.checked_shr(r as u32)?
+        }
+    };
+
+    Some(ConstValue::Int(result))
+}
+
+fn fold_unop(op: &UnOp, src: &Operand) -> Option<ConstValue> {
+    let Operand::Const(ConstValue::Int(v)) = src else {
+        return None;
+    };
+    let v = *v;
+
+    let result = match op {
+        UnOp::Neg => v.checked_neg()?,
+        UnOp::BitNot => !v,
+        UnOp::Not => (v == 0) as i64,
+        // Deref and AddrOf cannot be folded at compile time
+        UnOp::Deref | UnOp::AddrOf => return None,
+    };
+
+    Some(ConstValue::Int(result))
+}
+
+// ─── Constant Propagation ────────────────────────────────────────────────────
+
+/// Propagates known constant values of temporaries to their later uses.
+///
+/// Keeps a `TempId → ConstValue` map. On `Copy { dst: Temp(t), src: Const(v) }`,
+/// it records `t → v`. On operands that reference mapped temporaries, it
+/// substitutes the corresponding `Const`. On a non-constant redefinition of `t`,
+/// it invalidates the map entry.
+///
+/// Returns `true` if any substitution was made.
+pub fn constant_propagation(instrs: &mut [TacInstr]) -> bool {
+    let mut changed = false;
+    let mut const_map: HashMap<TempId, ConstValue> = HashMap::new();
+
+    for instr in instrs.iter_mut() {
+        // 1. Replace uses of known temporaries with their constant values.
+        propagate_uses(instr, &const_map, &mut changed);
+
+        // 2. Update the map according to the effect of the instruction.
+        match instr {
+            // Copy from a constant source: record the mapping.
+            TacInstr::Copy {
+                dst: Operand::Temp(t),
+                src: Operand::Const(v),
+                ..
+            } => {
+                const_map.insert(*t, v.clone());
+            }
+            // Copy from a non-constant source: invalidate, since the temporary
+            // now holds an unknown value.
+            TacInstr::Copy {
+                dst: Operand::Temp(t),
+                src: _,
+                ..
+            } => {
+                const_map.remove(t);
+            }
+            // Any other instruction that redefines a temporary invalidates the entry.
+            TacInstr::BinOp { dst, .. } | TacInstr::UnOp { dst, .. } => {
+                const_map.remove(dst);
+            }
+            TacInstr::Call { dst: Some(t), .. } => {
+                const_map.remove(t);
+            }
+            // A label starts a basic block that may be reached from several
+            // predecessors (if/else merge, loop header), so constants recorded
+            // along the linear path above may not hold here: forget them all.
+            TacInstr::Label(_) => {
+                const_map.clear();
+            }
+            _ => {}
+        }
+    }
+
+    changed
+}
+
+fn propagate_uses(
+    instr: &mut TacInstr,
+    const_map: &HashMap<TempId, ConstValue>,
+    changed: &mut bool,
+) {
+    let subst = |op: &mut Operand, changed: &mut bool| {
+        if let Operand::Temp(t) = op {
+            if let Some(v) = const_map.get(t) {
+                *op = Operand::Const(v.clone());
+                *changed = true;
+            }
+        }
+    };
+
+    match instr {
+        TacInstr::BinOp { lhs, rhs, .. } => {
+            subst(lhs, changed);
+            subst(rhs, changed);
+        }
+        TacInstr::UnOp { src, .. } => subst(src, changed),
+        TacInstr::Copy { src, .. } => subst(src, changed),
+        TacInstr::CondJump { cond, .. } => subst(cond, changed),
+        TacInstr::Call { args, .. } => {
+            for arg in args.iter_mut() {
+                subst(arg, changed);
+            }
+        }
+        TacInstr::Return { val: Some(v), .. } => subst(v, changed),
+        _ => {}
+    }
+}
+
+// ─── Dead Code Elimination ───────────────────────────────────────────────────
+
+/// Removes TAC instructions whose result is never read and that have no side effects.
+///
+/// An instruction can be eliminated if:
+/// 1. It defines a `TempId` (not a `Var`).
+/// 2. That `TempId` is not live after the instruction (it is absent from
+///    `live_before[i]`, which holds the live-out set of instruction `i`).
+/// 3. It is not a `Call`, `Return`, `Jump`, `CondJump`, `Label`,
+///    nor a `Copy` whose destination is `Operand::Var` or `Operand::Global`.
+///
+/// Returns `true` if any instruction was removed.
+pub fn dead_code_eliminate(instrs: &mut Vec<TacInstr>, liveness: &LivenessInfo) -> bool {
+    let n = instrs.len();
+    let mut keep = vec![true; n];
+
+    for i in 0..n {
+        if has_side_effects(&instrs[i]) {
+            continue;
+        }
+
+        if let Some(def) = instr_def(&instrs[i]) {
+            // The backward scan stores in live_before[i] the set of temporaries
+            // that are live *before* it removes the def and adds the uses of i.
+            // In other words, live_before[i] is the live-OUT set of instruction i:
+            // the temporaries live right after i executes.
+            // If the def is not in live_before[i], it is never read afterwards.
+            if !liveness.live_before[i].contains(&def) {
+                keep[i] = false;
+            }
+        }
+    }
+
+    let before = instrs.len();
+    let mut i = 0;
+    instrs.retain(|_| {
+        let k = keep[i];
+        i += 1;
+        k
+    });
+
+    instrs.len() != before
+}
+
+/// Returns `true` for instructions that cannot be removed even when their result
+/// is unused, because they have observable side effects.
+fn has_side_effects(instr: &TacInstr) -> bool {
+    matches!(
+        instr,
+        TacInstr::Call { .. }
+            | TacInstr::Return { .. }
+            | TacInstr::Jump { .. }
+            | TacInstr::CondJump { .. }
+            | TacInstr::Label(_)
+            | TacInstr::Copy {
+                dst: Operand::Var(_),
+                ..
+            }
+            | TacInstr::Copy {
+                dst: Operand::Global(_),
+                ..
+            }
+    )
+}
+
+// ─── Pipeline ────────────────────────────────────────────────────────────────
+
+const MAX_ITER: usize = 10;
+
+/// Runs the optimization passes over a function's instructions until a fixed
+/// point is reached (or `MAX_ITER` rounds have run).
+pub fn optimize_function(instrs: &mut Vec<TacInstr>) {
+    for _ in 0..MAX_ITER {
+        let mut changed = false;
+
+        changed |= constant_fold(instrs);
+        changed |= constant_propagation(instrs);
+
+        let liveness = compute_liveness(instrs);
+        changed |= dead_code_eliminate(instrs, &liveness);
+
+        if !changed {
+            break;
+        }
+    }
+}
+
+// ─── Tests ───────────────────────────────────────────────────────────────────
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::{
+        common::ast::ast::Type,
+        common::ast::expr::{BinOp, UnOp},
+        ir::tac::{ConstValue, LabelId, Operand, TacInstr, TempId},
+    };
+
+    fn int(v: i64) -> Operand {
+        Operand::Const(ConstValue::Int(v))
+    }
+
+    fn temp(n: u32) -> Operand {
+        Operand::Temp(TempId(n))
+    }
+
+    fn var(name: &str) -> Operand {
+        Operand::Var(name.to_string())
+    }
+
+    // ── Constant Fold ──
+
+    #[test]
+    fn fold_add_two_consts() {
+        let mut instrs = vec![TacInstr::BinOp {
+            dst: TempId(0),
+            op: BinOp::Add,
+            lhs: int(2),
+            rhs: int(3),
+            ty: Type::Int,
+        }];
+        assert!(constant_fold(&mut instrs));
+        assert_eq!(
+            instrs[0],
+            TacInstr::Copy {
+                dst: temp(0),
+                src: int(5),
+                ty: Type::Int,
+            }
+        );
+    }
+
+    #[test]
+    fn fold_nested_expression() {
+        // 2 + 3*4 → t0=3*4, t1=2+t0
+        let mut instrs = vec![
+            TacInstr::BinOp {
+                dst: TempId(0),
+                op: BinOp::Mul,
+                lhs: int(3),
+                rhs: int(4),
+                ty: Type::Int,
+            },
+            TacInstr::BinOp {
+                dst: TempId(1),
+                op: BinOp::Add,
+                lhs: int(2),
+                rhs: temp(0),
+                ty: Type::Int,
+            },
+        ];
+        // After fold: t0 = 12, t1 = 2 + t0 (t0 is still a temp, needs propagation)
+        constant_fold(&mut instrs);
+        assert_eq!(
+            instrs[0],
+            TacInstr::Copy {
+                dst: temp(0),
+                src: int(12),
+                ty: Type::Int,
+            }
+        );
+
+        // After propagation: t1 = 2 + 12
+        constant_propagation(&mut instrs);
+        // After the second fold: t1 = 14
+        constant_fold(&mut instrs);
+        assert_eq!(
+            instrs[1],
+            TacInstr::Copy {
+                dst: temp(1),
+                src: int(14),
+                ty: Type::Int,
+            }
+        );
+    }
+
+    #[test]
+    fn fold_relational_less() {
+        let mut instrs =
vec![TacInstr::BinOp {
+            dst: TempId(0),
+            op: BinOp::Less,
+            lhs: int(3),
+            rhs: int(5),
+            ty: Type::Int,
+        }];
+        assert!(constant_fold(&mut instrs));
+        assert_eq!(
+            instrs[0],
+            TacInstr::Copy {
+                dst: temp(0),
+                src: int(1),
+                ty: Type::Int,
+            }
+        );
+    }
+
+    #[test]
+    fn fold_relational_eq_false() {
+        let mut instrs = vec![TacInstr::BinOp {
+            dst: TempId(0),
+            op: BinOp::Eq,
+            lhs: int(3),
+            rhs: int(5),
+            ty: Type::Int,
+        }];
+        assert!(constant_fold(&mut instrs));
+        assert_eq!(
+            instrs[0],
+            TacInstr::Copy {
+                dst: temp(0),
+                src: int(0),
+                ty: Type::Int,
+            }
+        );
+    }
+
+    #[test]
+    fn fold_bitwise_and() {
+        // 0b1010 & 0b1100 = 0b1000 = 8
+        let mut instrs = vec![TacInstr::BinOp {
+            dst: TempId(0),
+            op: BinOp::BitAnd,
+            lhs: int(0b1010),
+            rhs: int(0b1100),
+            ty: Type::Int,
+        }];
+        assert!(constant_fold(&mut instrs));
+        assert_eq!(
+            instrs[0],
+            TacInstr::Copy {
+                dst: temp(0),
+                src: int(8),
+                ty: Type::Int,
+            }
+        );
+    }
+
+    #[test]
+    fn fold_unary_neg() {
+        let mut instrs = vec![TacInstr::UnOp {
+            dst: TempId(0),
+            op: UnOp::Neg,
+            src: int(7),
+            ty: Type::Int,
+        }];
+        assert!(constant_fold(&mut instrs));
+        assert_eq!(
+            instrs[0],
+            TacInstr::Copy {
+                dst: temp(0),
+                src: int(-7),
+                ty: Type::Int,
+            }
+        );
+    }
+
+    #[test]
+    fn fold_unary_not() {
+        // !0 = 1, !5 = 0
+        let mut instrs = vec![
+            TacInstr::UnOp {
+                dst: TempId(0),
+                op: UnOp::Not,
+                src: int(0),
+                ty: Type::Int,
+            },
+            TacInstr::UnOp {
+                dst: TempId(1),
+                op: UnOp::Not,
+                src: int(5),
+                ty: Type::Int,
+            },
+        ];
+        constant_fold(&mut instrs);
+        assert_eq!(
+            instrs[0],
+            TacInstr::Copy {
+                dst: temp(0),
+                src: int(1),
+                ty: Type::Int,
+            }
+        );
+        assert_eq!(
+            instrs[1],
+            TacInstr::Copy {
+                dst: temp(1),
+                src: int(0),
+                ty: Type::Int,
+            }
+        );
+    }
+
+    #[test]
+    fn fold_div_by_zero_preserved() {
+        let original = TacInstr::BinOp {
+            dst: TempId(0),
+            op: BinOp::Div,
+            lhs: int(10),
+            rhs: int(0),
+            ty: Type::Int,
+        };
+        let mut instrs = vec![original.clone()];
+        assert!(!constant_fold(&mut
instrs));
+        assert_eq!(instrs[0], original);
+    }
+
+    #[test]
+    fn fold_shl_negative_rhs_preserved() {
+        let original = TacInstr::BinOp {
+            dst: TempId(0),
+            op: BinOp::Shl,
+            lhs: int(1),
+            rhs: int(-1),
+            ty: Type::Int,
+        };
+        let mut instrs = vec![original.clone()];
+        assert!(!constant_fold(&mut instrs));
+        assert_eq!(instrs[0], original);
+    }
+
+    #[test]
+    fn fold_shl_rhs_64_preserved() {
+        let original = TacInstr::BinOp {
+            dst: TempId(0),
+            op: BinOp::Shl,
+            lhs: int(1),
+            rhs: int(64),
+            ty: Type::Int,
+        };
+        let mut instrs = vec![original.clone()];
+        assert!(!constant_fold(&mut instrs));
+        assert_eq!(instrs[0], original);
+    }
+
+    #[test]
+    fn fold_double_operand_not_folded() {
+        let original = TacInstr::BinOp {
+            dst: TempId(0),
+            op: BinOp::Add,
+            lhs: Operand::Const(ConstValue::Double(1.0)),
+            rhs: Operand::Const(ConstValue::Double(2.0)),
+            ty: Type::Int,
+        };
+        let mut instrs = vec![original.clone()];
+        assert!(!constant_fold(&mut instrs));
+        assert_eq!(instrs[0], original);
+    }
+
+    // ── Constant Propagation ──
+
+    #[test]
+    fn propagation_simple_chain() {
+        // t0 = 5; t1 = t0 + 3 → t1 = 5 + 3 → (fold) t1 = 8
+        let mut instrs = vec![
+            TacInstr::Copy {
+                dst: temp(0),
+                src: int(5),
+                ty: Type::Int,
+            },
+            TacInstr::BinOp {
+                dst: TempId(1),
+                op: BinOp::Add,
+                lhs: temp(0),
+                rhs: int(3),
+                ty: Type::Int,
+            },
+        ];
+        assert!(constant_propagation(&mut instrs));
+        assert_eq!(
+            instrs[1],
+            TacInstr::BinOp {
+                dst: TempId(1),
+                op: BinOp::Add,
+                lhs: int(5),
+                rhs: int(3),
+                ty: Type::Int,
+            }
+        );
+        // After fold: t1 = 8
+        constant_fold(&mut instrs);
+        assert_eq!(
+            instrs[1],
+            TacInstr::Copy {
+                dst: temp(1),
+                src: int(8),
+                ty: Type::Int,
+            }
+        );
+    }
+
+    #[test]
+    fn propagation_invalidated_by_redefinition() {
+        // t0 = 5; t0 = call f(); t1 = t0 + 1 → t1 must NOT be folded
+        let mut instrs = vec![
+            TacInstr::Copy {
+                dst: temp(0),
+                src: int(5),
+                ty: Type::Int,
+            },
+            TacInstr::Call {
+                dst: Some(TempId(0)),
+                fn_name: "f".to_string(),
+                args:
vec![],
+            },
+            TacInstr::BinOp {
+                dst: TempId(1),
+                op: BinOp::Add,
+                lhs: temp(0),
+                rhs: int(1),
+                ty: Type::Int,
+            },
+        ];
+        // propagation must not replace t0 in the last instruction
+        constant_propagation(&mut instrs);
+        assert!(matches!(
+            &instrs[2],
+            TacInstr::BinOp {
+                lhs: Operand::Temp(_),
+                ..
+            }
+        ));
+    }
+
+    // ── DCE ──
+
+    #[test]
+    fn dce_removes_unused_temp() {
+        // t0 = 2 + 3 (never read)
+        let mut instrs = vec![TacInstr::BinOp {
+            dst: TempId(0),
+            op: BinOp::Add,
+            lhs: int(2),
+            rhs: int(3),
+            ty: Type::Int,
+        }];
+        let liveness = compute_liveness(&instrs);
+        assert!(dead_code_eliminate(&mut instrs, &liveness));
+        assert!(instrs.is_empty());
+    }
+
+    #[test]
+    fn dce_preserves_call_even_if_unused() {
+        // t0 = call f(): has a side effect, cannot be removed
+        let mut instrs = vec![TacInstr::Call {
+            dst: Some(TempId(0)),
+            fn_name: "f".to_string(),
+            args: vec![],
+        }];
+        let liveness = compute_liveness(&instrs);
+        assert!(!dead_code_eliminate(&mut instrs, &liveness));
+        assert_eq!(instrs.len(), 1);
+    }
+
+    #[test]
+    fn dce_preserves_var_assignment() {
+        // x = 10 (Var: observable semantics, never eliminate)
+        let mut instrs = vec![TacInstr::Copy {
+            dst: var("x"),
+            src: int(10),
+            ty: Type::Int,
+        }];
+        let liveness = compute_liveness(&instrs);
+        assert!(!dead_code_eliminate(&mut instrs, &liveness));
+        assert_eq!(instrs.len(), 1);
+    }
+
+    #[test]
+    fn dce_keeps_used_temp() {
+        // t0 = 5; return t0 → t0 is live and must not be removed
+        let mut instrs = vec![
+            TacInstr::Copy {
+                dst: temp(0),
+                src: int(5),
+                ty: Type::Int,
+            },
+            TacInstr::Return {
+                val: Some(temp(0)),
+                ty: None,
+            },
+        ];
+        let liveness = compute_liveness(&instrs);
+        assert!(!dead_code_eliminate(&mut instrs, &liveness));
+        assert_eq!(instrs.len(), 2);
+    }
+
+    // ── Liveness with control flow ──
+
+    #[test]
+    fn dce_keeps_def_used_only_after_if_else_merge() {
+        // if (cond) t0 = 5; else t0 = 10;
+        // return t0;
+        //
+        // Without taking control flow into account, a
purely linear scan
+        // would mark `t0 = 5` (in the then branch) as dead, since nothing between it
+        // and the end of the then block reads it, but `t0` is read after the merge.
+        let mut instrs = vec![
+            TacInstr::CondJump {
+                cond: temp(9),
+                then_label: LabelId(1),
+                else_label: LabelId(2),
+            },
+            TacInstr::Label(LabelId(1)),
+            TacInstr::Copy {
+                dst: temp(0),
+                src: int(5),
+                ty: Type::Int,
+            },
+            TacInstr::Jump { label: LabelId(3) },
+            TacInstr::Label(LabelId(2)),
+            TacInstr::Copy {
+                dst: temp(0),
+                src: int(10),
+                ty: Type::Int,
+            },
+            TacInstr::Label(LabelId(3)),
+            TacInstr::Return {
+                val: Some(temp(0)),
+                ty: None,
+            },
+        ];
+
+        let liveness = compute_liveness(&instrs);
+        assert!(!dead_code_eliminate(&mut instrs, &liveness));
+        assert_eq!(instrs.len(), 8);
+    }
+
+    // ── Pipeline / Fixed Point ──
+
+    #[test]
+    fn optimize_function_full_pipeline() {
+        // int x = 2 + 3 * 4;  (x never read, no return)
+        //   t0 = 3 * 4
+        //   t1 = 2 + t0
+        //   x  = t1
+        let mut instrs = vec![
+            TacInstr::BinOp {
+                dst: TempId(0),
+                op: BinOp::Mul,
+                lhs: int(3),
+                rhs: int(4),
+                ty: Type::Int,
+            },
+            TacInstr::BinOp {
+                dst: TempId(1),
+                op: BinOp::Add,
+                lhs: int(2),
+                rhs: temp(0),
+                ty: Type::Int,
+            },
+            TacInstr::Copy {
+                dst: var("x"),
+                src: temp(1),
+                ty: Type::Int,
+            },
+        ];
+
+        optimize_function(&mut instrs);
+
+        // After the fixed point: t0 and t1 eliminated; x = 14
+        assert_eq!(instrs.len(), 1);
+        assert_eq!(
+            instrs[0],
+            TacInstr::Copy {
+                dst: var("x"),
+                src: int(14),
+                ty: Type::Int,
+            }
+        );
+    }
+
+    #[test]
+    fn optimize_function_preserves_call_chain() {
+        // t0 = call f(); t1 = t0 + 0; return t1
+        // t1 = t0 + 0 could be simplified, but the call that defines t0 has a
+        // side effect → do not eliminate the call
+        let mut instrs = vec![
+            TacInstr::Call {
+                dst: Some(TempId(0)),
+                fn_name: "f".to_string(),
+                args: vec![],
+            },
+            TacInstr::BinOp {
+                dst: TempId(1),
+                op: BinOp::Add,
+                lhs: temp(0),
+                rhs: int(0),
+                ty: Type::Int,
+            },
+            TacInstr::Return {
+                val: Some(temp(1)),
+                ty: None,
+            },
+        ];
+
+        optimize_function(&mut
instrs);
+
+        // call f() must be preserved
+        assert!(instrs
+            .iter()
+            .any(|i| matches!(i, TacInstr::Call { fn_name, .. } if fn_name == "f")));
+    }
+
+    #[test]
+    fn optimize_function_side_effect_label_preserved() {
+        let mut instrs = vec![
+            TacInstr::Label(LabelId(0)),
+            TacInstr::Return {
+                val: None,
+                ty: None,
+            },
+        ];
+        optimize_function(&mut instrs);
+        assert_eq!(instrs.len(), 2);
+    }
+}
diff --git a/src/codegen/last/abi.rs b/src/codegen/last/abi.rs
new file mode 100644
index 0000000..747f6a8
--- /dev/null
+++ b/src/codegen/last/abi.rs
@@ -0,0 +1,61 @@
+//! Helpers for the System V AMD64 ABI calling convention.
+//!
+//! Only the integer part of the ABI is covered, which is the scope of the
+//! current backend (64-bit variables/temps). Floating point (XMM) and vector
+//! args are out of scope for now.
+//!
+//! Summary of System V AMD64 (integers):
+//!   - Integer arguments: `rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9` (then the stack)
+//!   - Return value: `rax`
+//!   - caller-saved: `rax`, `rcx`, `rdx`, `rsi`, `rdi`, `r8`-`r11`
+//!   - callee-saved: `rbx`, `rbp`, `r12`-`r15`
+
+/// Registers (64-bit) used to pass integer arguments, in order.
+pub const ARG_REGISTERS: [&str; 6] = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"];
+
+/// Register that holds integer return values.
+pub const RETURN_REGISTER: &str = "rax";
+
+/// Maximum number of arguments passed in registers (before the stack).
+pub const MAX_REG_ARGS: usize = ARG_REGISTERS.len();
+
+/// Returns the argument register for the given index, or `None` if the
+/// argument must be passed on the stack (index >= 6).
+pub fn arg_register(index: usize) -> Option<&'static str> {
+    ARG_REGISTERS.get(index).copied()
+}
+
+/// Positive offset from `%rbp` at which the n-th argument passed on the
+/// caller's stack is available inside the called function.
+///
+/// For `index >= MAX_REG_ARGS`, the argument arrives `16 + 8 * (index - 6)`
+/// bytes above `%rbp` (above the return address and the saved `%rbp`).
+pub fn stack_arg_offset(index: usize) -> i64 {
+    debug_assert!(index >= MAX_REG_ARGS);
+    16 + 8 * (index - MAX_REG_ARGS) as i64
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn arg_registers_match_system_v_order() {
+        assert_eq!(ARG_REGISTERS, ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]);
+    }
+
+    #[test]
+    fn arg_register_lookup() {
+        assert_eq!(arg_register(0), Some("rdi"));
+        assert_eq!(arg_register(5), Some("r9"));
+        assert_eq!(arg_register(6), None);
+    }
+
+    #[test]
+    fn stack_arg_offsets_skip_saved_rbp_and_return_addr() {
+        // 16 = return address (8) + saved %rbp (8).
+        assert_eq!(stack_arg_offset(6), 16);
+        assert_eq!(stack_arg_offset(7), 24);
+        assert_eq!(stack_arg_offset(10), 48);
+    }
+}
diff --git a/src/codegen/last/frame.rs b/src/codegen/last/frame.rs
new file mode 100644
index 0000000..6bbb48a
--- /dev/null
+++ b/src/codegen/last/frame.rs
@@ -0,0 +1,213 @@
+//! Stack frame management for a function in the x86-64 backend.
+//!
+//! The current backend uses a naive but correct strategy: every TAC temp and
+//! variable gets its own 8-byte slot on the stack. There is no register
+//! allocation (universal spill), which simplifies instruction selection and
+//! keeps the output correct while the allocator (`reg_alloc`) is not yet
+//! integrated into the TAC pipeline.
+//!
+//! Frame layout (addressed relative to `%rbp`):
+//!
+//! ```text
+//!   high +-----------+
+//!        | arg stack |  caller's arguments (above rbp)
+//!        | ret addr  |
+//!        | saved rbp |  <- %rbp
+//!    -8  | slot 0    |  first local var/temp
+//!   -16  | slot 1    |
+//!        |    ...    |
+//!   low  +-----------+  <- %rsp (after `subq $frame_size, %rsp`)
+//! ```
+//!
+//! The frame size is always aligned to 16 bytes to keep the alignment
+//! required by the System V AMD64 ABI at the point of `call`.
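The frame layout described above can be illustrated with a small standalone sketch. It is not part of this patch: `round_up`, `prologue_sketch`, and `slot_operand` are hypothetical names, and the sketch assumes the conventional `pushq %rbp` / `movq %rsp, %rbp` prologue and that every local occupies exactly one 8-byte slot.

```rust
/// Rounds `value` up to the next multiple of `alignment` (both assumed positive).
fn round_up(value: i64, alignment: i64) -> i64 {
    (value + alignment - 1) / alignment * alignment
}

/// Hypothetical helper: AT&T-syntax prologue for a function with `slots`
/// 8-byte locals. Not the emitter used by the backend in this patch.
fn prologue_sketch(slots: i64) -> Vec<String> {
    // Keep the frame a multiple of 16 so %rsp stays 16-byte aligned.
    let frame_size = round_up(slots * 8, 16);
    let mut lines = vec![
        "pushq %rbp".to_string(),      // save the caller's frame pointer
        "movq %rsp, %rbp".to_string(), // %rbp now points at the saved %rbp
    ];
    if frame_size > 0 {
        // Reserve the locals below %rbp.
        lines.push(format!("subq ${}, %rsp", frame_size));
    }
    lines
}

/// %rbp-relative operand text for local slot `k` (0-based): -8, -16, -24, ...
fn slot_operand(k: i64) -> String {
    format!("{}(%rbp)", -8 * (k + 1))
}
```

Under these assumptions, three locals need 24 bytes, which rounds up to a 32-byte frame, and slot 0 is addressed as `-8(%rbp)`. A function with no locals skips the `subq` entirely.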
+
+use std::collections::HashMap;
+
+use crate::ir::tac::Operand;
+
+/// Key that uniquely identifies a value living in the frame.
+#[derive(Debug, Clone, PartialEq, Eq, Hash)]
+pub enum SlotKey {
+    Temp(u32),
+    Var(String),
+}
+
+impl SlotKey {
+    /// Maps an `Operand` to its slot key. Constants have no slot (they are
+    /// emitted as immediates) and return `None`. `Deref` has no slot of its
+    /// own: the relevant slot is that of the pointer it dereferences.
+    pub fn from_operand(op: &Operand) -> Option<Self> {
+        match op {
+            Operand::Temp(temp) => Some(Self::Temp(temp.0)),
+            Operand::Var(name) => Some(Self::Var(name.clone())),
+            Operand::Global(_) | Operand::Const(_) => None,
+            Operand::Deref(inner) => Self::from_operand(inner),
+        }
+    }
+}
+
+/// Stack frame of a single function.
+#[derive(Debug, Clone)]
+pub struct Frame {
+    /// Maps each slot key to its offset relative to `%rbp`.
+    offsets: HashMap<SlotKey, i64>,
+    /// Allocation cursor: the offset of the most recently allocated local
+    /// block (0 while nothing is allocated). New blocks go below it.
+    next_local_offset: i64,
+}
+
+impl Default for Frame {
+    fn default() -> Self {
+        Self::new()
+    }
+}
+
+impl Frame {
+    /// Creates an empty frame. The first local slot allocated starts at `-8`
+    /// (or at `-size`, rounded up to a multiple of 8, if larger than 8 bytes).
+    pub fn new() -> Self {
+        Self {
+            offsets: HashMap::new(),
+            next_local_offset: 0,
+        }
+    }
+
+    /// Allocates (if it does not exist yet) and returns the offset of `key`,
+    /// reserving exactly one 8-byte slot. Equivalent to
+    /// `allocate_local_sized(key, 8)`; used by temporaries and scalar/pointer
+    /// variables, which always fit in 8 bytes in this backend.
+    pub fn allocate_local(&mut self, key: SlotKey) -> i64 {
+        self.allocate_local_sized(key, 8)
+    }
+
+    /// Allocates (if it does not exist yet) a contiguous block of `size` bytes
+    /// (rounded up to a multiple of 8) and returns the offset of its first
+    /// byte (the lowest address of the block).
Used for variables whose
+    /// value does not fit in 8 bytes, such as structs: fields are accessed as
+    /// `offset_of(var) + field_offset`.
+    ///
+    /// Always decrements the offset cursor first and then assigns: this
+    /// preserves the historical offset sequence (-8, -16, -24, ...) for
+    /// 8-byte allocations and generalizes it continuously to larger
+    /// blocks.
+    pub fn allocate_local_sized(&mut self, key: SlotKey, size: i64) -> i64 {
+        if let Some(&offset) = self.offsets.get(&key) {
+            return offset;
+        }
+        let aligned_size = align_up(size.max(1), 8);
+        self.next_local_offset -= aligned_size;
+        let offset = self.next_local_offset;
+        self.offsets.insert(key, offset);
+        offset
+    }
+
+    /// Pins an explicit offset for `key` (used for arguments received on the
+    /// caller's stack, which live at positive offsets).
+    pub fn set_offset(&mut self, key: SlotKey, offset: i64) {
+        self.offsets.insert(key, offset);
+    }
+
+    /// Returns the offset of `key`, if it exists.
+    pub fn offset_of(&self, key: &SlotKey) -> Option<i64> {
+        self.offsets.get(key).copied()
+    }
+
+    /// Number of local slots allocated (negative offsets).
+    pub fn local_slot_count(&self) -> usize {
+        self.offsets.values().filter(|&&off| off < 0).count()
+    }
+
+    /// Total frame size in bytes, aligned to 16.
+    ///
+    /// Returns 0 when there are no local slots ("leaf" function without a frame).
+    /// Computed from the allocation cursor (`next_local_offset`), not from
+    /// `local_slot_count() * 8`: blocks larger than 8 bytes (e.g. structs
+    /// allocated via `allocate_local_sized`) count as a single bookkeeping
+    /// "slot" but must be counted by their real number of bytes.
+    pub fn frame_size(&self) -> i64 {
+        align_up(-self.next_local_offset, 16)
+    }
+}
+
+fn align_up(value: i64, alignment: i64) -> i64 {
+    debug_assert!(alignment > 0);
+    (value + alignment - 1) / alignment * alignment
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::ir::tac::TempId;
+
+    #[test]
+    fn allocate_local_assigns_distinct_negative_offsets() {
+        let mut frame = Frame::new();
+        let a = frame.allocate_local(SlotKey::Var("x".to_string()));
+        let b = frame.allocate_local(SlotKey::Temp(TempId(0).0));
+
+        assert_eq!(a, -8);
+        assert_eq!(b, -16);
+    }
+
+    #[test]
+    fn allocate_local_is_idempotent() {
+        let mut frame = Frame::new();
+        let key = SlotKey::Var("x".to_string());
+
+        let first = frame.allocate_local(key.clone());
+        let second = frame.allocate_local(key);
+
+        assert_eq!(first, second);
+        assert_eq!(frame.local_slot_count(), 1);
+    }
+
+    #[test]
+    fn set_offset_for_stack_argument_is_positive() {
+        let mut frame = Frame::new();
+        frame.set_offset(SlotKey::Var("extra".to_string()), 16);
+
+        assert_eq!(
+            frame.offset_of(&SlotKey::Var("extra".to_string())),
+            Some(16)
+        );
+    }
+
+    #[test]
+    fn frame_size_aligned_to_16() {
+        let mut frame = Frame::new();
+        // 1 slot -> 8 raw bytes -> aligned to 16.
+        frame.allocate_local(SlotKey::Var("a".to_string()));
+        assert_eq!(frame.frame_size(), 16);
+
+        // 2 slots -> 16 bytes.
+        frame.allocate_local(SlotKey::Var("b".to_string()));
+        assert_eq!(frame.frame_size(), 16);
+
+        // 3 slots -> 24 bytes -> aligned to 32.
+        frame.allocate_local(SlotKey::Var("c".to_string()));
+        assert_eq!(frame.frame_size(), 32);
+    }
+
+    #[test]
+    fn empty_frame_has_zero_size() {
+        let frame = Frame::new();
+        assert_eq!(frame.frame_size(), 0);
+        assert_eq!(frame.local_slot_count(), 0);
+    }
+
+    #[test]
+    fn slot_key_from_operand_skips_constants() {
+        assert_eq!(
+            SlotKey::from_operand(&Operand::Temp(TempId(3))),
+            Some(SlotKey::Temp(3))
+        );
+        assert_eq!(
+            SlotKey::from_operand(&Operand::Var("y".to_string())),
+            Some(SlotKey::Var("y".to_string()))
+        );
+        assert_eq!(
+            SlotKey::from_operand(&Operand::Const(crate::ir::tac::ConstValue::Int(7))),
+            None
+        );
+    }
+}
diff --git a/src/codegen/last/mod.rs b/src/codegen/last/mod.rs
index 8b13789..676170b 100644
--- a/src/codegen/last/mod.rs
+++ b/src/codegen/last/mod.rs
@@ -1 +1,15 @@
+//! Final backend (last): translation of the IR (TAC) into target machine code.
+//!
+//! Currently the only implemented target is x86-64 (AT&T syntax / System V
+//! AMD64 ABI). The following modules split the responsibilities:
+//!
+//! - [`abi`]: System V AMD64 calling convention.
+//! - [`frame`]: management of each function's stack frame.
+//! - [`x86_64`]: instruction selection and assembly emission.
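As a rough illustration of what the `abi` module listed above encodes, the sketch below combines the register lookup and the stack offset rule into one classification step. It is a standalone example rather than code from this patch: `ArgLocation`, `INT_ARG_REGS`, and `classify_int_arg` are hypothetical names, and only integer arguments are considered.

```rust
/// Where an integer argument lives as seen by the callee (illustrative type).
#[derive(Debug, PartialEq)]
enum ArgLocation {
    /// Passed in the named 64-bit register.
    Register(&'static str),
    /// Passed on the caller's stack, at this positive offset from %rbp.
    Stack(i64),
}

/// System V AMD64 integer argument registers, in order.
const INT_ARG_REGS: [&str; 6] = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"];

/// Hypothetical helper: classifies the `index`-th integer argument (0-based).
fn classify_int_arg(index: usize) -> ArgLocation {
    match INT_ARG_REGS.get(index) {
        Some(reg) => ArgLocation::Register(*reg),
        // 16 skips the saved %rbp (8 bytes) and the return address (8 bytes).
        None => ArgLocation::Stack(16 + 8 * (index - INT_ARG_REGS.len()) as i64),
    }
}
```

With this rule, a seventh integer argument (index 6) is the first one found on the stack, right above the return address.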
+pub mod abi;
+pub mod frame;
+pub mod peephole;
+pub mod x86_64;
+
+pub use x86_64::emit_program;
diff --git a/src/codegen/last/peephole.rs b/src/codegen/last/peephole.rs
new file mode 100644
index 0000000..06b5084
--- /dev/null
+++ b/src/codegen/last/peephole.rs
@@ -0,0 +1,129 @@
+pub type AsmInstr = String;
+
+pub struct PeepholePass {
+    pub window: usize,
+}
+
+impl Default for PeepholePass {
+    fn default() -> Self {
+        Self::new()
+    }
+}
+
+impl PeepholePass {
+    pub fn new() -> Self {
+        Self { window: 2 }
+    }
+
+    pub fn run(&self, instrs: &mut Vec<AsmInstr>) -> bool {
+        let mut overall_mutated = false;
+        let mut changed = true;
+
+        while changed {
+            changed = false;
+            let mut optimized = Vec::new();
+            let mut i = 0;
+
+            while i < instrs.len() {
+                let l1 = &instrs[i];
+                let t1 = l1.trim(); // Strip leading/trailing whitespace to simplify matching
+
+                // PATTERN 3: Add/Sub of zero
+                // Generated example: addq $0, %rax  or  subq $0, %rcx
+                if t1.starts_with("addq $0,") || t1.starts_with("subq $0,") {
+                    i += 1; // Skip this instruction (delete it)
+                    changed = true;
+                    continue;
+                }
+
+                // Patterns that need to look at 2 instructions (window of 2)
+                if i + 1 < instrs.len() {
+                    let l2 = &instrs[i + 1];
+                    let t2 = l2.trim();
+
+                    // PATTERN 1 and 2: Redundant mov AND load after store to the same address
+                    // In AT&T format: "movq A, B" followed by "movq B, A"
+                    if t1.starts_with("movq ") && t2.starts_with("movq ") {
+                        let p1: Vec<&str> = t1.split_whitespace().collect();
+                        let p2: Vec<&str> = t2.split_whitespace().collect();
+
+                        // movq [1] [2] -> [1] has a trailing comma
+                        if p1.len() == 3 && p2.len() == 3 {
+                            let src1 = p1[1].trim_end_matches(',');
+                            let dst1 = p1[2];
+                            let src2 = p2[1].trim_end_matches(',');
+                            let dst2 = p2[2];
+
+                            // If the source of the 1st is the destination of the 2nd and vice versa
+                            if src1 == dst2 && dst1 == src2 {
+                                optimized.push(l1.clone()); // Keep only the first
+                                i += 2; // Skip the second
+                                changed = true;
+                                continue;
+                            }
+                        }
+                    }
+
+                    // PATTERN 5: Jump to the next instruction
+                    //
Example: jmp .L_main_L1 \n .L_main_L1:
+                    if t1.starts_with("jmp ") && t2.ends_with(':') {
+                        let target = t1.strip_prefix("jmp ").unwrap();
+                        let label = t2.strip_suffix(':').unwrap();
+
+                        if target == label {
+                            optimized.push(l2.clone()); // Keep only the label (drop the jmp)
+                            i += 2;
+                            changed = true;
+                            continue;
+                        }
+                    }
+
+                    // PATTERN 4: Multiplication by a power of 2
+                    // Crusty generates: movq $8, %rcx \n imulq %rcx, %rax
+                    if t1.starts_with("movq $")
+                        && t1.ends_with(", %rcx")
+                        && t2 == "imulq %rcx, %rax"
+                    {
+                        let val_str = t1
+                            .strip_prefix("movq $")
+                            .unwrap()
+                            .strip_suffix(", %rcx")
+                            .unwrap();
+
+                        if let Ok(val) = val_str.parse::<i64>() {
+                            if val > 0 && (val as u64).is_power_of_two() {
+                                let shift = (val as u64).trailing_zeros();
+                                // Replace both instructions with a single shift
+                                optimized.push(format!("    shlq ${}, %rax", shift));
+                                i += 2;
+                                changed = true;
+                                continue;
+                            }
+                        }
+                    }
+
+                    // PATTERN 6: Compare with 0 -> Test
+                    // Crusty generates: movq $0, %rcx \n cmpq %rcx, %rax
+                    if t1 == "movq $0, %rcx" && t2 == "cmpq %rcx, %rax" {
+                        optimized.push("    testq %rax, %rax".to_string());
+                        i += 2;
+                        changed = true;
+                        continue;
+                    }
+                }
+
+                // No pattern matched: keep the instruction and move on to the next one
+                optimized.push(l1.clone());
+                i += 1;
+            }
+
+            // Update the instruction vector for the next round of the while loop
+            if changed {
+                *instrs = optimized;
+                overall_mutated = true;
+            }
+        }
+
+        overall_mutated
+    }
+}
diff --git a/src/codegen/last/x86_64.rs b/src/codegen/last/x86_64.rs
new file mode 100644
index 0000000..b090418
--- /dev/null
+++ b/src/codegen/last/x86_64.rs
@@ -0,0 +1,1164 @@
+//! Instruction selection and x86-64 assembly emission (AT&T / GAS syntax).
+//!
+//! The backend translates each TAC instruction into one or more x86-64 instructions.
+//! The adopted strategy is "naive" generation (no register allocation):
+//! every temp and variable lives in a stack slot, and we use `%rax`/`%rcx` as
+//! scratch to compute each operation.
Isso mantem a correcao enquanto o +//! alocador (`codegen::reg_alloc`) nao esta integrado ao fluxo de TAC. +//! +//! Convencao de chamada: System V AMD64 ABI. +//! - Argumentos inteiros: `rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9` +//! - Retorno: `rax` +//! - Stack alinhada em 16 bytes no ponto de `call`. +//! +//! A saida pode ser montada diretamente com `gcc -x assembler-with-cpp` ou +//! `as`. + +use crate::codegen::last::abi; +use crate::codegen::last::frame::{Frame, SlotKey}; +use crate::codegen::last::peephole::PeepholePass; +use crate::common::ast::ast::Type; +use crate::common::ast::expr::{BinOp, UnOp}; +use crate::common::errors::types::CodegenError; +use crate::ir::tac::{ConstValue, LabelId, Operand, TacFunction, TacInstr, TacProgram}; +use std::collections::HashMap; + +type EmitResult = Result; + +/// Registrador escolhido para materializar o endereco de um `Operand::Deref` +/// antes do deref propriamente dito. Caller-saved e nunca usado como `reg` +/// pelos chamadores de `load_op`/`store_op` neste backend (que usam apenas +/// `rax`/`rcx`/registradores de argumento), entao e seguro como scratch +/// dedicado mesmo em derefs aninhados. +const DEREF_SCRATCH_REG: &str = "r11"; + +/// Acumulador de linhas de assembly com indentacao controlada. +struct Emitter { + out: String, +} + +#[derive(Default)] +struct StringPool { + entries: Vec<(String, String)>, + labels: HashMap, + double_entries: Vec<(String, f64)>, + double_labels: HashMap, +} + +impl StringPool { + fn collect(prog: &TacProgram) -> Self { + let mut pool = Self::default(); + for global in &prog.globals { + if let Some(value) = &global.init { + pool.visit_operand(&Operand::Const(value.clone())); + } + } + for func in &prog.functions { + for instr in &func.instrs { + pool.visit_instr(instr); + } + } + pool + } + + fn visit_instr(&mut self, instr: &TacInstr) { + match instr { + TacInstr::BinOp { lhs, rhs, .. } => { + self.visit_operand(lhs); + self.visit_operand(rhs); + } + TacInstr::UnOp { src, .. 
} => self.visit_operand(src), + TacInstr::Copy { dst, src, .. } => { + self.visit_operand(dst); + self.visit_operand(src); + } + TacInstr::CondJump { cond, .. } => self.visit_operand(cond), + TacInstr::Call { args, .. } => { + for arg in args { + self.visit_operand(arg); + } + } + TacInstr::Return { val, .. } => { + if let Some(val) = val { + self.visit_operand(val); + } + } + TacInstr::Jump { .. } | TacInstr::Label(_) => {} + } + } + + fn visit_operand(&mut self, op: &Operand) { + match op { + Operand::Const(ConstValue::String(value)) => { + self.label_for(value); + } + Operand::Const(ConstValue::Double(value)) => { + self.label_for_double(*value); + } + Operand::Deref(inner) => self.visit_operand(inner), + _ => {} + } + } + + fn label_for(&mut self, value: &str) -> String { + if let Some(label) = self.labels.get(value) { + return label.clone(); + } + + let label = format!(".LC{}", self.entries.len()); + self.entries.push((label.clone(), value.to_string())); + self.labels.insert(value.to_string(), label.clone()); + label + } + + fn label_for_double(&mut self, value: f64) -> String { + let key = value.to_string(); + if let Some(label) = self.double_labels.get(&key) { + return label.clone(); + } + + let label = format!(".LC_DBL{}", self.double_entries.len()); + self.double_entries.push((label.clone(), value)); + self.double_labels.insert(key, label.clone()); + label + } +} + +impl Emitter { + fn new() -> Self { + Self { out: String::new() } + } + + /// Instrucao: indentada em 4 espacos. + fn insn(&mut self, text: &str) { + self.out.push_str(" "); + self.out.push_str(text); + self.out.push('\n'); + } + + /// Diretiva/rotulo: sem indentacao (ex.: `.text`, `main:`). 
+    fn raw(&mut self, text: &str) {
+        self.out.push_str(text);
+        self.out.push('\n');
+    }
+
+    fn comment(&mut self, text: &str) {
+        self.out.push_str(" # ");
+        self.out.push_str(text);
+        self.out.push('\n');
+    }
+
+    fn blank(&mut self) {
+        self.out.push('\n');
+    }
+
+    fn append_str(&mut self, text: &str) {
+        self.out.push_str(text);
+    }
+
+    fn into_string(self) -> String {
+        self.out
+    }
+}
+
+/// Emits the assembly for a complete TAC program, prefixing the `.text`
+/// section directive.
+pub fn emit_program(prog: &TacProgram) -> EmitResult<String> {
+    let strings = StringPool::collect(prog);
+    let mut em = Emitter::new();
+
+    if !strings.entries.is_empty() || !strings.double_entries.is_empty() {
+        em.raw(".section .rodata");
+
+        for (label, value) in &strings.entries {
+            em.raw(&format!("{label}:"));
+            em.raw(&format!(" .asciz {}", escape_asm_string(value)));
+        }
+
+        for (label, value) in &strings.double_entries {
+            em.raw(&format!("{label}:"));
+            em.raw(&format!(" .double {value}"));
+        }
+        em.blank();
+    }
+    emit_globals(&mut em, prog, &strings)?;
+    em.raw(".text");
+    for func in &prog.functions {
+        em.blank();
+        em.append_str(&emit_function(func, &strings)?);
+    }
+    // Mark the stack as non-executable in ELF formats. This section does not
+    // exist in the COFF used by MinGW and would make the assembly invalid there.
+    em.blank();
+    #[cfg(not(target_os = "windows"))]
+    em.raw(".section .note.GNU-stack,\"\",@progbits");
+    Ok(em.into_string())
+}
+
+fn emit_globals(em: &mut Emitter, prog: &TacProgram, strings: &StringPool) -> EmitResult<()> {
+    let initialized: Vec<_> = prog.globals.iter().filter(|g| g.init.is_some()).collect();
+    if !initialized.is_empty() {
+        em.raw(".data");
+        for global in initialized {
+            em.raw(".balign 8");
+            em.raw(&format!(".globl {}", global.name));
+            em.raw(&format!("{}:", global.name));
+            match global.init.as_ref().expect("filtrado acima") {
+                ConstValue::Int(value) => em.raw(&format!(" .quad {value}")),
+                ConstValue::Char(value) => em.raw(&format!(" .quad {}", *value as i64)),
+                ConstValue::Double(value) => em.raw(&format!(" .quad {}", value.to_bits())),
+                ConstValue::String(value) => {
+                    let label = strings
+                        .labels
+                        .get(value)
+                        .expect("string global deve ter sido coletada");
+                    em.raw(&format!(" .quad {label}"));
+                }
+            }
+            if global.size > 8 {
+                em.raw(&format!(" .zero {}", global.size - 8));
+            }
+        }
+        em.blank();
+    }
+
+    let zeroed: Vec<_> = prog.globals.iter().filter(|g| g.init.is_none()).collect();
+    if !zeroed.is_empty() {
+        em.raw(".bss");
+        for global in zeroed {
+            em.raw(".balign 8");
+            em.raw(&format!(".globl {}", global.name));
+            em.raw(&format!("{}:", global.name));
+            em.raw(&format!(" .zero {}", global.size));
+        }
+        em.blank();
+    }
+    Ok(())
+}
+
+/// Emits the assembly for a single function: `.globl` directive, label,
+/// prologue, body and epilogue.
+fn emit_function(func: &TacFunction, strings: &StringPool) -> EmitResult<String> {
+    let mut em = Emitter::new();
+    em.comment(&format!("function {}", func.name));
+    em.raw(&format!(".globl {}", func.name));
+    em.raw(&format!("{}:", func.name));
+
+    let frame = build_frame(func);
+
+    // Prologue
+    em.insn("pushq %rbp");
+    em.insn("movq %rsp, %rbp");
+    let frame_size = frame.frame_size();
+    if frame_size > 0 {
+        em.insn(&format!("subq ${frame_size}, %rsp"));
+    }
+
+    // Spill the arguments received in registers into their local slots.
+    for (index, name) in func.params.iter().enumerate() {
+        if let Some(reg) = abi::arg_register(index) {
+            let offset = frame
+                .offset_of(&SlotKey::Var(name.clone()))
+                .expect("slot do parametro deve estar alocado");
+            em.insn(&format!("movq %{reg}, {offset}(%rbp)"));
+        }
+    }
+
+    // Body
+    let epilogue_label = format!(".L_{}_epilogue", func.name);
+    for instr in &func.instrs {
+        emit_instr(&mut em, instr, &frame, &func.name, &epilogue_label, strings)?;
+    }
+
+    // Epilogue (target of every `return`). If the function has no explicit
+    // `return`, control falls through to here and returns whatever is in %rax.
+    em.raw(&format!("{epilogue_label}:"));
+    em.insn("movq %rbp, %rsp");
+    em.insn("popq %rbp");
+    em.insn("ret");
+
+    // Apply the peephole optimization to the generated assembly, line by line,
+    // until a fixed point is reached.
+    let raw_asm = em.into_string();
+    let mut instrs: Vec<String> = raw_asm.lines().map(|s| s.to_string()).collect();
+    let peephole = PeepholePass::new();
+    peephole.run(&mut instrs);
+
+    Ok(instrs.join("\n"))
+}
+
+/// Builds the stack frame by pre-scanning all instructions to allocate a
+/// slot for each temp/variable and map the parameters to their positions.
+///
+/// Variables listed in `func.var_sizes` (currently, local structs) get a
+/// contiguous block of the given size instead of the default 8-byte scalar
+/// slot; all other variables and temporaries always use 8 bytes.
+fn build_frame(func: &TacFunction) -> Frame {
+    let mut frame = Frame::new();
+
+    let size_of = |name: &str| func.var_sizes.get(name).copied().unwrap_or(8);
+
+    for (index, name) in func.params.iter().enumerate() {
+        let key = SlotKey::Var(name.clone());
+        match abi::arg_register(index) {
+            Some(_) => {
+                frame.allocate_local_sized(key, size_of(name));
+            }
+            None => {
+                // Argument passed on the caller's stack: it is already available
+                // at a positive offset from %rbp.
+                frame.set_offset(key, abi::stack_arg_offset(index));
+            }
+        }
+    }
+
+    for instr in &func.instrs {
+        for key in slot_keys_of(instr) {
+            // Do not reallocate stack args (positive offsets) that were already mapped.
+            if frame.offset_of(&key).is_some() {
+                continue;
+            }
+            let size = match &key {
+                SlotKey::Var(name) => size_of(name),
+                SlotKey::Temp(_) => 8,
+            };
+            frame.allocate_local_sized(key, size);
+        }
+    }
+
+    frame
+}
+
+/// Collects every slot key referenced by an instruction (dst,
+/// sources and read operands).
+fn slot_keys_of(instr: &TacInstr) -> Vec<SlotKey> {
+    let mut keys = Vec::new();
+
+    let consider = |keys: &mut Vec<SlotKey>, op: &Operand| {
+        if let Some(key) = SlotKey::from_operand(op) {
+            keys.push(key);
+        }
+    };
+
+    match instr {
+        TacInstr::BinOp { dst, lhs, rhs, .. } => {
+            keys.push(SlotKey::Temp(dst.0));
+            consider(&mut keys, lhs);
+            consider(&mut keys, rhs);
+        }
+        TacInstr::UnOp { dst, src, .. } => {
+            keys.push(SlotKey::Temp(dst.0));
+            consider(&mut keys, src);
+        }
+        TacInstr::Copy { dst, src, .. } => {
+            consider(&mut keys, dst);
+            consider(&mut keys, src);
+        }
+        TacInstr::CondJump { cond, .. } => consider(&mut keys, cond),
+        TacInstr::Call { dst, args, .. } => {
+            if let Some(dst) = dst {
+                keys.push(SlotKey::Temp(dst.0));
+            }
+            for arg in args {
+                consider(&mut keys, arg);
+            }
+        }
+        TacInstr::Return { val, .. } => {
+            if let Some(val) = val {
+                consider(&mut keys, val);
+            }
+        }
+        TacInstr::Jump { .. } | TacInstr::Label(_) => {}
+    }
+
+    keys
+}
+
+fn emit_instr(
+    em: &mut Emitter,
+    instr: &TacInstr,
+    frame: &Frame,
+    func_name: &str,
+    epilogue_label: &str,
+    strings: &StringPool,
+) -> EmitResult<()> {
+    match instr {
+        TacInstr::Label(label) => {
+            em.raw(&format!("{}:", local_label(func_name, label)));
+            Ok(())
+        }
+        TacInstr::Jump { label } => {
+            em.insn(&format!("jmp {}", local_label(func_name, label)));
+            Ok(())
+        }
+        TacInstr::CondJump {
+            cond,
+            then_label,
+            else_label,
+        } => {
+            load_op(em, frame, cond, "rax", strings, &Type::Int)?;
+            em.insn("testq %rax, %rax");
+            em.insn(&format!("jne {}", local_label(func_name, then_label)));
+            em.insn(&format!("jmp {}", local_label(func_name, else_label)));
+            Ok(())
+        }
+        TacInstr::Copy { dst, src, ty } => {
+            let reg = if matches!(ty, Type::Double) {
+                "xmm0"
+            } else {
+                "rax"
+            };
+            load_op(em, frame, src, reg, strings, ty)?;
+            store_op(em, frame, dst, reg, strings, ty)?;
+            Ok(())
+        }
+        TacInstr::BinOp {
+            dst,
+            op,
+            lhs,
+            rhs,
+            ty,
+        } => emit_binop(em, op, lhs, rhs, *dst, frame, strings, ty),
+        TacInstr::UnOp { dst, op, src, ty } => emit_unop(em, op, src, *dst, frame, strings, ty),
+        TacInstr::Call { dst, fn_name, args } => emit_call(em, fn_name, args, *dst, frame, strings),
+        TacInstr::Return { val, ty } => {
+            if let Some(val) = val {
+                // 1. Resolve the type (if None, assume Int)
+                let resolved_ty = ty.as_ref().unwrap_or(&Type::Int);
+                // 2. Pick the correct return register
+                let reg = if matches!(resolved_ty, Type::Double) {
+                    "xmm0"
+                } else {
+                    "rax"
+                };
+                // 3. Load the value into the register
+                load_op(em, frame, val, reg, strings, resolved_ty)?;
+            }
+            em.insn(&format!("jmp {epilogue_label}"));
+            Ok(())
+        }
+    }
+}
+
+#[allow(clippy::too_many_arguments)]
+fn emit_binop(
+    em: &mut Emitter,
+    op: &BinOp,
+    lhs: &Operand,
+    rhs: &Operand,
+    dst: crate::ir::tac::TempId,
+    frame: &Frame,
+    strings: &StringPool,
+    ty: &Type,
+) -> EmitResult<()> {
+    if matches!(op, BinOp::And | BinOp::Or) {
+        emit_logical(em, matches!(op, BinOp::Or), lhs, rhs, dst, frame, strings)?;
+        return Ok(());
+    }
+
+    if matches!(ty, Type::Double) {
+        load_op(em, frame, lhs, "xmm0", strings, ty)?;
+        load_op(em, frame, rhs, "xmm1", strings, ty)?;
+
+        match op {
+            BinOp::Add => em.insn("addsd %xmm1, %xmm0"),
+            BinOp::Sub => em.insn("subsd %xmm1, %xmm0"),
+            BinOp::Mul => em.insn("mulsd %xmm1, %xmm0"),
+            BinOp::Div => em.insn("divsd %xmm1, %xmm0"),
+            BinOp::Less => emit_comparison_double(em, "setb"),
+            BinOp::Greater => emit_comparison_double(em, "seta"),
+            BinOp::Leq => emit_comparison_double(em, "setbe"),
+            BinOp::Geq => emit_comparison_double(em, "setae"),
+            BinOp::Eq => emit_comparison_double(em, "sete"),
+            BinOp::Neq => emit_comparison_double(em, "setne"),
+            _ => {
+                return Err(codegen_error(
+                    "Operacao double nao suportada",
+                    Some("binop"),
+                ))
+            }
+        }
+
+        let is_relational = matches!(
+            op,
+            BinOp::Less | BinOp::Greater | BinOp::Leq | BinOp::Geq | BinOp::Eq | BinOp::Neq
+        );
+        if is_relational {
+            store_op(em, frame, &Operand::Temp(dst), "rax", strings, &Type::Int)?;
+        } else {
+            store_op(em, frame, &Operand::Temp(dst), "xmm0", strings, ty)?;
+        }
+
+        return Ok(());
+    }
+
+    load_op(em, frame, lhs, "rax", strings, ty)?;
+    load_op(em, frame, rhs, "rcx", strings, ty)?;
+
+    match op {
+        BinOp::Add => em.insn("addq %rcx, %rax"),
+        BinOp::Sub => em.insn("subq %rcx, %rax"),
+        BinOp::Mul => em.insn("imulq %rcx, %rax"),
+        BinOp::Div => {
+            em.insn("cqto");
+            em.insn("idivq %rcx");
+        }
+        BinOp::Mod => {
+            em.insn("cqto");
+            em.insn("idivq %rcx");
+            em.insn("movq %rdx, %rax");
+        }
+        BinOp::BitAnd => em.insn("andq %rcx, %rax"),
+        BinOp::BitOr => em.insn("orq %rcx, %rax"),
+        BinOp::BitXor => em.insn("xorq %rcx, %rax"),
+        BinOp::Shl => em.insn("shlq %cl, %rax"),
+        BinOp::Shr => em.insn("sarq %cl, %rax"),
+        BinOp::Less => emit_comparison(em, "setl"),
+        BinOp::Greater => emit_comparison(em, "setg"),
+        BinOp::Leq => emit_comparison(em, "setle"),
+        BinOp::Geq => emit_comparison(em, "setge"),
+        BinOp::Eq => emit_comparison(em, "sete"),
+        BinOp::Neq => emit_comparison(em, "setne"),
+        BinOp::And | BinOp::Or => unreachable!(),
+    }
+
+    store_op(em, frame, &Operand::Temp(dst), "rax", strings, ty)?;
+    Ok(())
+}
+
+fn emit_comparison(em: &mut Emitter, setcc: &str) {
+    em.insn("cmpq %rcx, %rax");
+    em.insn(&format!("{setcc} %al"));
+    em.insn("movzbq %al, %rax");
+}
+
+fn emit_comparison_double(em: &mut Emitter, setcc: &str) {
+    em.insn("ucomisd %xmm1, %xmm0");
+    em.insn(&format!("{setcc} %al"));
+    em.insn("movzbq %al, %rax");
+}
+
+fn emit_logical(
+    em: &mut Emitter,
+    is_or: bool,
+    lhs: &Operand,
+    rhs: &Operand,
+    dst: crate::ir::tac::TempId,
+    frame: &Frame,
+    strings: &StringPool,
+) -> EmitResult<()> {
+    let ty = &Type::Int;
+    // Normalize lhs to 0/1 in %rdx.
+    load_op(em, frame, lhs, "rax", strings, ty)?;
+    em.insn("testq %rax, %rax");
+    em.insn("setne %al");
+    em.insn("movzbq %al, %rax");
+    em.insn("movq %rax, %rdx");
+
+    // Normalize rhs to 0/1 in %rax.
+    load_op(em, frame, rhs, "rax", strings, ty)?;
+    em.insn("testq %rax, %rax");
+    em.insn("setne %al");
+    em.insn("movzbq %al, %rax");
+
+    if is_or {
+        em.insn("orq %rdx, %rax");
+    } else {
+        em.insn("andq %rdx, %rax");
+    }
+
+    store_op(em, frame, &Operand::Temp(dst), "rax", strings, ty)?;
+    Ok(())
+}
+
+fn emit_unop(
+    em: &mut Emitter,
+    op: &UnOp,
+    src: &Operand,
+    dst: crate::ir::tac::TempId,
+    frame: &Frame,
+    strings: &StringPool,
+    ty: &Type,
+) -> EmitResult<()> {
+    // `&x` needs the *address* of `src`'s slot, not its value: it does not
+    // go through `load_op` (which would emit `movq slot(%rbp), %reg`, loading
+    // the contents instead of the address).
+    if matches!(op, UnOp::AddrOf) {
+        if let Operand::Global(name) = src {
+            em.insn(&format!("leaq {name}(%rip), %rax"));
+            store_op(em, frame, &Operand::Temp(dst), "rax", strings, &Type::Int)?;
+            return Ok(());
+        }
+        let key = SlotKey::from_operand(src).ok_or_else(|| {
+            codegen_error(
+                "endereco-de (&) requer uma variavel ou temporario com slot",
+                Some("unop"),
+            )
+        })?;
+        let offset = frame
+            .offset_of(&key)
+            .expect("operando de & deve ter slot alocado no frame");
+        em.insn(&format!("leaq {offset}(%rbp), %rax"));
+        store_op(em, frame, &Operand::Temp(dst), "rax", strings, &Type::Int)?;
+        return Ok(());
+    }
+    if matches!(ty, Type::Double) {
+        return Err(codegen_error(
+            "Operacoes unarias ainda nao suportadas para double",
+            Some("unop"),
+        ));
+    }
+
+    load_op(em, frame, src, "rax", strings, ty)?;
+    match op {
+        UnOp::Neg => em.insn("negq %rax"),
+        UnOp::BitNot => em.insn("notq %rax"),
+        UnOp::Not => {
+            em.insn("testq %rax, %rax");
+            em.insn("sete %al");
+            em.insn("movzbq %al, %rax");
+        }
+        UnOp::Deref => {
+            em.insn("movq (%rax), %rax");
+        }
+        UnOp::AddrOf => unreachable!("tratado antes do load_op acima"),
+    }
+    store_op(em, frame, &Operand::Temp(dst), "rax", strings, ty)?;
+    Ok(())
+}
+
+fn emit_call(
+    em: &mut Emitter,
+    fn_name: &str,
+    args: &[Operand],
+    dst: Option<crate::ir::tac::TempId>,
+    frame: &Frame,
+    strings: &StringPool,
+) -> EmitResult<()> {
+    // Arguments beyond `MAX_REG_ARGS` go on the caller's stack, in
+    // reverse order (the first stack arg ends up on top, closest to the
+    // return address), mirroring `abi::stack_arg_offset`.
+    let stack_arg_count = args.len().saturating_sub(abi::MAX_REG_ARGS);
+    // Keep the stack 16-byte aligned at the `call`: each push uses 8 bytes, so
+    // an odd number of stack arguments needs 8 bytes of padding.
+    let padding = if stack_arg_count % 2 == 1 { 8 } else { 0 };
+    if padding > 0 {
+        em.insn(&format!("subq ${padding}, %rsp"));
+    }
+    let stack_args = &args[args.len().min(abi::MAX_REG_ARGS)..];
+    for arg in stack_args.iter().rev() {
+        load_op(em, frame, arg, "rax", strings, &Type::Int)?;
+        em.insn("pushq %rax");
+    }
+
+    for (index, arg) in args.iter().take(abi::MAX_REG_ARGS).enumerate() {
+        let reg = abi::arg_register(index).expect("index < MAX_REG_ARGS sempre tem registrador");
+        load_op(em, frame, arg, "rax", strings, &Type::Int)?;
+        em.insn(&format!("movq %rax, %{reg}"));
+    }
+
+    em.insn(&format!("call {fn_name}"));
+
+    let cleanup = (stack_arg_count as i64) * 8 + padding;
+    if cleanup > 0 {
+        em.insn(&format!("addq ${cleanup}, %rsp"));
+    }
+
+    if let Some(dst) = dst {
+        store_op(em, frame, &Operand::Temp(dst), "rax", strings, &Type::Int)?;
+    }
+    Ok(())
+}
+
+/// Loads `op` into the named register (e.g. "rax", "rcx").
+fn load_op(
+    em: &mut Emitter,
+    frame: &Frame,
+    op: &Operand,
+    reg: &str,
+    strings: &StringPool,
+    ty: &Type,
+) -> EmitResult<()> {
+    let mov_insn = if matches!(ty, Type::Double) {
+        "movsd"
+    } else {
+        "movq"
+    };
+    match op {
+        Operand::Const(ConstValue::String(value)) => {
+            let label = strings
+                .labels
+                .get(value)
+                .expect("string literal deve ter sido coletada");
+            em.insn(&format!("leaq {label}(%rip), %{reg}"));
+            Ok(())
+        }
+        Operand::Const(ConstValue::Double(_)) if !matches!(ty, Type::Double) => Err(codegen_error(
+            "literal double usado em contexto nao-double (apenas double e' suportado neste backend; float ainda nao)",
+            Some("load"),
+        )),
+        Operand::Const(ConstValue::Double(value)) => {
+            let label = strings
+                .double_labels
+                .get(&value.to_string())
+                .expect("double literal deve ter sido coletado");
+
+            em.insn(&format!("movsd {label}(%rip), %{reg}"));
+            Ok(())
+        }
+        Operand::Const(value) => {
+            em.insn(&format!("movq ${}, %{reg}", const_immediate(value)?));
+            Ok(())
+        }
+        Operand::Temp(temp) => {
+            let offset = frame
+                .offset_of(&SlotKey::Temp(temp.0))
+                .expect("temp sem slot alocado");
+            em.insn(&format!("{mov_insn} {offset}(%rbp), %{reg}"));
+            Ok(())
+        }
+        Operand::Var(name) => {
+            let offset = frame
+                .offset_of(&SlotKey::Var(name.clone()))
+                .expect("var sem slot alocado");
+            em.insn(&format!("{mov_insn} {offset}(%rbp), %{reg}"));
+            Ok(())
+        }
+        Operand::Global(name) => {
+            em.insn(&format!("{mov_insn} {name}(%rip), %{reg}"));
+            Ok(())
+        }
+        Operand::Deref(inner) => {
+            // `%r11` is scratch/caller-saved and is not used as `reg` by
+            // any caller of `load_op`/`store_op` in this backend, so it is
+            // safe to use it here to materialize the pointer before the deref.
+            // The pointer itself is always an 8-byte address, regardless
+            // of the type of the pointed-to value.
+            load_op(em, frame, inner, DEREF_SCRATCH_REG, strings, &Type::Int)?;
+            em.insn(&format!("{mov_insn} (%{DEREF_SCRATCH_REG}), %{reg}"));
+            Ok(())
+        }
+    }
+}
+
+/// Stores the named register into `op` (which must be a temp, var or deref).
+fn store_op(
+    em: &mut Emitter,
+    frame: &Frame,
+    op: &Operand,
+    reg: &str,
+    strings: &StringPool,
+    ty: &Type,
+) -> EmitResult<()> {
+    let mov_insn = if matches!(ty, Type::Double) {
+        "movsd"
+    } else {
+        "movq"
+    };
+    if let Operand::Deref(inner) = op {
+        load_op(em, frame, inner, DEREF_SCRATCH_REG, strings, &Type::Int)?;
+        em.insn(&format!("{mov_insn} %{reg}, (%{DEREF_SCRATCH_REG})"));
+        return Ok(());
+    }
+
+    if let Operand::Global(name) = op {
+        em.insn(&format!("{mov_insn} %{reg}, {name}(%rip)"));
+        return Ok(());
+    }
+
+    let offset = match op {
+        Operand::Temp(temp) => frame
+            .offset_of(&SlotKey::Temp(temp.0))
+            .expect("temp sem slot alocado"),
+        Operand::Var(name) => frame
+            .offset_of(&SlotKey::Var(name.clone()))
+            .expect("var sem slot alocado"),
+        Operand::Global(_) => unreachable!("tratado antes do match acima"),
+        Operand::Const(_) => {
+            return Err(codegen_error(
+                "nao e possivel armazenar em uma constante",
+                Some("store"),
+            ))
+        }
+        Operand::Deref(_) => unreachable!("tratado antes do match acima"),
+    };
+    em.insn(&format!("{mov_insn} %{reg}, {offset}(%rbp)"));
+    Ok(())
+}
+
+fn const_immediate(value: &ConstValue) -> EmitResult<String> {
+    match value {
+        ConstValue::Int(v) => Ok(v.to_string()),
+        ConstValue::Char(c) => Ok((*c as i64).to_string()),
+        ConstValue::Double(_) => Err(codegen_error(
+            "codegen de double nao suportado neste backend",
+            Some("const"),
+        )),
+        ConstValue::String(_) => unreachable!("string literals are emitted through rodata"),
+    }
+}
+
+fn codegen_error(message: &str, instruction: Option<&str>) -> CodegenError {
+    CodegenError {
+        message: message.to_string(),
+        instruction: instruction.map(str::to_string),
+    }
+}
+
+fn escape_asm_string(value: &str) -> String {
+    let mut out = String::from("\"");
+    for ch in value.chars() {
+        match ch {
+            '\\' => out.push_str("\\\\"),
+            '"' => out.push_str("\\\""),
+            '\n' => out.push_str("\\n"),
+            '\r' => out.push_str("\\r"),
+            '\t' => out.push_str("\\t"),
+            '\0' => out.push_str("\\0"),
+            c if c.is_ascii_graphic() || c == ' ' => out.push(c),
+            c => out.push_str(&format!("\\x{:02x}", c as u32)),
+        }
+    }
+    out.push('"');
+    out
+}
+
+fn local_label(func_name: &str, label: &LabelId) -> String {
+    format!(".L_{func_name}_L{}", label.0)
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::ir::tac::{ConstValue, Operand, TacFunction, TacInstr, TempId};
+
+    fn func(name: &str, params: Vec<&str>, instrs: Vec<TacInstr>) -> TacFunction {
+        TacFunction {
+            name: name.to_string(),
+            params: params.into_iter().map(String::from).collect(),
+            instrs,
+            var_sizes: Default::default(),
+        }
+    }
+
+    fn asm_simple_return_const() -> TacFunction {
+        func(
+            "main",
+            Vec::new(),
+            vec![TacInstr::Return {
+                val: Some(Operand::Const(ConstValue::Int(42))),
+                ty: None,
+            }],
+        )
+    }
+
+    #[test]
+    fn emit_function_prologue_pushes_rbp_and_sets_frame() {
+        let out = emit_function(&asm_simple_return_const(), &StringPool::default()).unwrap();
+
+        assert!(out.contains("pushq %rbp"));
+        assert!(out.contains("movq %rsp, %rbp"));
+        assert!(out.contains("ret"));
+    }
+
+    #[test]
+    fn emit_function_declares_global_symbol() {
+        let out = emit_function(&asm_simple_return_const(), &StringPool::default()).unwrap();
+
+        assert!(out.contains(".globl main"));
+        assert!(out.contains("main:\n"));
+    }
+
+    #[test]
+    fn return_const_loads_immediate_into_rax() {
+        let out = emit_function(&asm_simple_return_const(), &StringPool::default()).unwrap();
+
+        assert!(out.contains("movq $42, %rax"));
+    }
+
+    #[test]
+    fn binary_add_emits_load_add_store() {
+        let f = func(
+            "add",
+            vec!["a", "b"],
+            vec![
+                TacInstr::BinOp {
+                    dst: TempId(0),
+                    op: BinOp::Add,
+                    lhs: Operand::Var("a".to_string()),
+                    rhs: Operand::Var("b".to_string()),
+                    ty: Type::Int,
+                },
+                TacInstr::Return {
+                    val: Some(Operand::Temp(TempId(0))),
+                    ty: None,
+                },
+            ],
+        );
+
+        let out = emit_function(&f, &StringPool::default()).unwrap();
+
+        assert!(out.contains("movq %rdi, -8(%rbp)")); // spill arg a
+        assert!(out.contains("movq %rsi, -16(%rbp)")); // spill arg b
+        assert!(out.contains("addq %rcx, %rax"));
+    }
+
+    #[test]
+    fn division_uses_cqto_and_idivq() {
+        let f = func(
+            "divmod",
+            Vec::new(),
+            vec![
+                TacInstr::BinOp {
+                    dst: TempId(0),
+                    op: BinOp::Div,
+                    lhs: Operand::Const(ConstValue::Int(10)),
+                    rhs: Operand::Const(ConstValue::Int(3)),
+                    ty: Type::Int,
+                },
+                TacInstr::Return {
+                    val: Some(Operand::Temp(TempId(0))),
+                    ty: None,
+                },
+            ],
+        );
+
+        let out = emit_function(&f, &StringPool::default()).unwrap();
+
+        assert!(out.contains("cqto"));
+        assert!(out.contains("idivq %rcx"));
+    }
+
+    #[test]
+    fn modulo_moves_rdx_into_rax() {
+        let f = func(
+            "mod",
+            Vec::new(),
+            vec![TacInstr::BinOp {
+                dst: TempId(0),
+                op: BinOp::Mod,
+                lhs: Operand::Const(ConstValue::Int(10)),
+                rhs: Operand::Const(ConstValue::Int(3)),
+                ty: Type::Int,
+            }],
+        );
+
+        let out = emit_function(&f, &StringPool::default()).unwrap();
+
+        assert!(out.contains("movq %rdx, %rax"));
+    }
+
+    #[test]
+    fn less_than_emits_setl() {
+        let f = func(
+            "less",
+            Vec::new(),
+            vec![TacInstr::BinOp {
+                dst: TempId(0),
+                op: BinOp::Less,
+                lhs: Operand::Const(ConstValue::Int(1)),
+                rhs: Operand::Const(ConstValue::Int(2)),
+                ty: Type::Int,
+            }],
+        );
+
+        let out = emit_function(&f, &StringPool::default()).unwrap();
+
+        assert!(out.contains("cmpq %rcx, %rax"));
+        assert!(out.contains("setl %al"));
+        assert!(out.contains("movzbq %al, %rax"));
+    }
+
+    #[test]
+    fn call_sets_up_arg_registers() {
+        let f = func(
+            "caller",
+            Vec::new(),
+            vec![
+                TacInstr::Call {
+                    dst: Some(TempId(0)),
+                    fn_name: "soma".to_string(),
+                    args: vec![
+                        Operand::Const(ConstValue::Int(2)),
+                        Operand::Const(ConstValue::Int(3)),
+                    ],
+                },
+                TacInstr::Return {
+                    val: Some(Operand::Temp(TempId(0))),
+                    ty: None,
+                },
+            ],
+        );
+
+        let out = emit_function(&f, &StringPool::default()).unwrap();
+
+        assert!(out.contains("movq %rax, %rdi"));
+        assert!(out.contains("movq %rax, %rsi"));
+        assert!(out.contains("call soma"));
+    }
+
+    #[test]
+    fn call_with_seven_args_pushes_one_stack_arg() {
+        let args = (1..=7)
+            .map(|n| Operand::Const(ConstValue::Int(n)))
+            .collect();
+        let f = func(
+            "caller7",
+            Vec::new(),
+            vec![
+                TacInstr::Call {
+                    dst: Some(TempId(0)),
+                    fn_name: "sum7".to_string(),
+                    args,
+                },
+                TacInstr::Return {
+                    val: Some(Operand::Temp(TempId(0))),
+                    ty: None,
+                },
+            ],
+        );
+
+        let out = emit_function(&f, &StringPool::default()).unwrap();
+
+        assert!(out.contains("subq $8, %rsp"));
+        assert!(out.contains("pushq %rax"));
+        assert!(out.contains("call sum7"));
+        assert!(out.contains("addq $16, %rsp"));
+    }
+
+    #[test]
+    fn call_with_eight_args_needs_no_padding() {
+        let args = (1..=8)
+            .map(|n| Operand::Const(ConstValue::Int(n)))
+            .collect();
+        let f = func(
+            "caller8",
+            Vec::new(),
+            vec![TacInstr::Call {
+                dst: Some(TempId(0)),
+                fn_name: "sum8".to_string(),
+                args,
+            }],
+        );
+
+        let out = emit_function(&f, &StringPool::default()).unwrap();
+
+        assert!(!out.contains("subq $8, %rsp"));
+        assert!(out.contains("addq $16, %rsp"));
+    }
+
+    #[test]
+    fn call_with_two_args_emits_no_stack_cleanup() {
+        let out = emit_function(
+            &func(
+                "caller2",
+                Vec::new(),
+                vec![TacInstr::Call {
+                    dst: Some(TempId(0)),
+                    fn_name: "soma".to_string(),
+                    args: vec![
+                        Operand::Const(ConstValue::Int(1)),
+                        Operand::Const(ConstValue::Int(2)),
+                    ],
+                }],
+            ),
+            &StringPool::default(),
+        )
+        .unwrap();
+
+        assert!(!out.contains("pushq %rax"));
+        assert!(!out.contains("addq $16, %rsp"));
+    }
+
+    #[test]
+    fn cond_jump_uses_test_and_jne() {
+        let f = func(
+            "cond",
+            Vec::new(),
+            vec![
+                TacInstr::CondJump {
+                    cond: Operand::Const(ConstValue::Int(1)),
+                    then_label: LabelId(0),
+                    else_label: LabelId(1),
+                },
+                TacInstr::Label(LabelId(0)),
+                TacInstr::Return {
+                    val: Some(Operand::Const(ConstValue::Int(1))),
+                    ty: None,
+                },
+                TacInstr::Label(LabelId(1)),
+                TacInstr::Return {
+                    val: Some(Operand::Const(ConstValue::Int(0))),
+                    ty: None,
+                },
+            ],
+        );
+
+        let out = emit_function(&f, &StringPool::default()).unwrap();
+
+        assert!(out.contains("testq %rax, %rax"));
+        assert!(out.contains("jne .L_cond_L0"));
+        assert!(out.contains("jmp .L_cond_L1"));
+        assert!(out.contains(".L_cond_L0:"));
+        assert!(out.contains(".L_cond_L1:"));
+    }
+
+    #[test]
+    fn epilogue_label_is_emitted_once() {
+        let out = emit_function(&asm_simple_return_const(), &StringPool::default()).unwrap();
+
+        assert_eq!(out.matches(".L_main_epilogue:").count(), 1);
+    }
+
+    #[test]
+    fn emit_program_prepends_text_section() {
+        let prog = TacProgram {
+            globals: Vec::new(),
+            functions: vec![asm_simple_return_const()],
+        };
+
+        let out = emit_program(&prog).unwrap();
+
+        assert!(out.starts_with(".text"));
+        assert!(out.contains(".globl main"));
+    }
+
+    #[test]
+    fn logical_and_normalizes_operands() {
+        let f = func(
+            "land",
+            Vec::new(),
+            vec![TacInstr::BinOp {
+                dst: TempId(0),
+                op: BinOp::And,
+                lhs: Operand::Const(ConstValue::Int(1)),
+                rhs: Operand::Const(ConstValue::Int(0)),
+                ty: Type::Int,
+            }],
+        );
+
+        let out = emit_function(&f, &StringPool::default()).unwrap();
+
+        assert!(out.contains("setne %al"));
+        assert!(out.contains("andq %rdx, %rax"));
+    }
+
+    #[test]
+    fn logical_or_emits_orq() {
+        let f = func(
+            "lor",
+            Vec::new(),
+            vec![TacInstr::BinOp {
+                dst: TempId(0),
+                op: BinOp::Or,
+                lhs: Operand::Const(ConstValue::Int(1)),
+                rhs: Operand::Const(ConstValue::Int(0)),
+                ty: Type::Int,
+            }],
+        );
+
+        let out = emit_function(&f, &StringPool::default()).unwrap();
+
+        assert!(out.contains("orq %rdx, %rax"));
+    }
+}
diff --git a/src/common/ast/ast.rs b/src/common/ast/ast.rs
index d54e8d7..0baa81d 100644
--- a/src/common/ast/ast.rs
+++ b/src/common/ast/ast.rs
@@ -10,7 +10,7 @@ pub enum Type {
     Float,
     Double,
     Void,
-    Array(Box),
+    Array(Box, Option),
     Pointer(Box),
     Struct(String),
     Enum(String),
diff --git a/src/common/ast/pretty.rs b/src/common/ast/pretty.rs
index a413374..fe73cd4 100644
--- a/src/common/ast/pretty.rs
+++ b/src/common/ast/pretty.rs
@@ -357,7 +357,10 @@ fn fmt_type(ty: &Type) -> String {
         Type::Double => "double".into(),
         Type::Void => "void".into(),
         Type::Pointer(inner) => format!("{}*", fmt_type(inner)),
-        Type::Array(inner) => format!("{}[]", fmt_type(inner)),
+        Type::Array(inner, size) => match size {
+            Some(size) => format!("{}[{size}]", fmt_type(inner)),
+            None => format!("{}[]", fmt_type(inner)),
+        },
         Type::Struct(n) => format!("struct {}", n),
         Type::Enum(n) => format!("enum {}", n),
         Type::Alias(n) => n.clone(),
diff --git a/src/common/builtins.rs b/src/common/builtins.rs
new file mode 100644
index 0000000..1c540bd
--- /dev/null
+++ b/src/common/builtins.rs
@@ -0,0 +1,5 @@
+#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
+pub struct BuiltinsLibs {
+    pub stdbool: bool,
+    pub stdio: bool,
+}
diff --git a/src/common/errors/types.rs b/src/common/errors/types.rs
index 7a3f3b2..1ba1119 100644
--- a/src/common/errors/types.rs
+++ b/src/common/errors/types.rs
@@ -206,6 +206,18 @@ pub enum SemanticErrorKind {
     NotIndexable {
         found: String,
     },
+    /// The `switch` expression is not of integer type.
+    InvalidSwitchType {
+        found: String,
+    },
+    /// `break` used outside a loop or `switch`.
+    BreakOutsideLoop,
+    /// `continue` used outside a loop.
+    ContinueOutsideLoop,
+    MissingLibraryHeader {
+        header: String,
+        symbol: String,
+    },
 }
 
 #[derive(Debug)]
@@ -309,6 +321,41 @@ impl ToReport for SemanticError {
                     self.span.clone(),
                     format!("'{}' não é indexável (esperado array ou ponteiro)", found),
                 ),
+            SemanticErrorKind::InvalidSwitchType { found } => {
+                Report::new("invalid switch expression type")
+                    .with_span(self.span.clone())
+                    .with_label(
+                        self.span.clone(),
+                        format!(
+                            "expressão do switch deve ser inteira, encontrado '{}'",
+                            found
+                        ),
+                    )
+                    .with_help("use int, char, short, long ou enum como discriminante do switch")
+            }
+            SemanticErrorKind::BreakOutsideLoop => Report::new("break outside loop or switch")
+                .with_span(self.span.clone())
+                .with_label(
+                    self.span.clone(),
+                    "'break' só pode ser usado dentro de loop ou switch".to_string(),
+                )
+                .with_help("mova o 'break' para dentro de um for, while, do-while ou switch"),
+            SemanticErrorKind::ContinueOutsideLoop => Report::new("continue outside loop")
+                .with_span(self.span.clone())
+                .with_label(
+                    self.span.clone(),
+                    "'continue' só pode ser usado dentro de loop".to_string(),
+                )
+                .with_help("mova o 'continue' para dentro de um for, while ou do-while"),
+            SemanticErrorKind::MissingLibraryHeader { header, symbol } => {
+                Report::new("missing library header")
+                    .with_span(self.span.clone())
+                    .with_label(
+                        self.span.clone(),
+                        format!("'{}' requires <{}>", symbol, header),
+                    )
+                    .with_help("adiciona o #include correto antes de usar esse simbolo")
+            }
         }
     }
 }
@@ -326,6 +373,8 @@ pub enum SemanticWarningKind {
     UnusedVariable(String),
     /// Variable read before receiving any initializer or assignment.
     MayBeUninitialized(String),
+    /// Non-void function with no `return` detected on the main exit path.
+ MissingReturn(String), } #[derive(Debug)] @@ -355,6 +404,16 @@ impl ToReport for SemanticWarning { ) .with_help("inicialize a variavel antes de usa-la") } + SemanticWarningKind::MissingReturn(fn_name) => Report::new("missing return") + .with_span(self.span.clone()) + .with_label( + self.span.clone(), + format!( + "função '{}' não-void pode encerrar sem retornar valor", + fn_name + ), + ) + .with_help("adicione um 'return ;' ao final da função"), } } } diff --git a/src/common/mod.rs b/src/common/mod.rs index 5acf327..a0f558d 100644 --- a/src/common/mod.rs +++ b/src/common/mod.rs @@ -1,4 +1,5 @@ pub mod ast; +pub mod builtins; pub mod errors; pub mod input; pub mod utils; diff --git a/src/examples/demo_presentation.c b/src/examples/demo_presentation.c new file mode 100644 index 0000000..540f172 --- /dev/null +++ b/src/examples/demo_presentation.c @@ -0,0 +1,39 @@ +/* + * Programa-demo para a apresentação final (issue #163). + * + * Mostra, em poucas linhas, os recursos centrais já estáveis do + * subconjunto de C suportado pelo compilador: + * - variaveis e expressoes aritmeticas + * - funcoes com chamada recursiva (factorial) + * - controle de fluxo: if/else e while + * + * Saida esperada: exit code 80 + * factorial(4) = 24 + * sum_even_squares(6) = 2*2 + 4*4 + 6*6 = 56 + * 24 + 56 = 80 + */ + +int factorial(int n) { + if (n <= 1) { + return 1; + } + return n * factorial(n - 1); +} + +int sum_even_squares(int n) { + int total = 0; + int i = 1; + while (i <= n) { + if (i % 2 == 0) { + total = total + i * i; + } + i = i + 1; + } + return total; +} + +int main(void) { + int fact4 = factorial(4); + int squares = sum_even_squares(6); + return fact4 + squares; +} diff --git a/src/examples/simple.c b/src/examples/simple.c new file mode 100644 index 0000000..34bdc57 --- /dev/null +++ b/src/examples/simple.c @@ -0,0 +1,4 @@ + +int main() { + return 42; +} \ No newline at end of file diff --git a/src/ir/cfg.rs b/src/ir/cfg.rs new file mode 100644 index 0000000..245867f --- 
/dev/null +++ b/src/ir/cfg.rs @@ -0,0 +1,302 @@ +use std::collections::{HashMap, HashSet}; +use std::fmt; + +use crate::ir::tac::{LabelId, TacFunction, TacInstr}; + +#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)] +pub struct BlockId(pub u32); + +impl fmt::Display for BlockId { + fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { + write!(f, "B{}", self.0) + } +} + +#[derive(Debug, Clone, PartialEq)] +pub struct BasicBlock { + pub id: BlockId, + pub instrs: Vec<TacInstr>, + pub succs: Vec<BlockId>, + pub preds: Vec<BlockId>, +} + +#[derive(Debug, Clone, PartialEq)] +pub struct Cfg { + pub blocks: Vec<BasicBlock>, + pub entry: BlockId, + pub exit: BlockId, +} + +impl Cfg { + pub fn predecessors(&self, id: BlockId) -> &[BlockId] { + &self.blocks[id.0 as usize].preds + } + + pub fn successors(&self, id: BlockId) -> &[BlockId] { + &self.blocks[id.0 as usize].succs + } +} + +impl fmt::Display for Cfg { + fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { + for block in &self.blocks { + writeln!(f, "{}:", block.id)?; + for instr in &block.instrs { + writeln!(f, " {instr}")?; + } + let succs = block + .succs + .iter() + .map(BlockId::to_string) + .collect::<Vec<_>>() + .join(", "); + writeln!(f, " succs: [{succs}]")?; + } + Ok(()) + } +} + +/// Indices in `instrs` that start a new basic block: the function start, +/// label targets, and the instruction right after a Jump/CondJump. +pub fn identify_leaders(instrs: &[TacInstr]) -> HashSet<usize> { + let mut leaders = HashSet::new(); + if instrs.is_empty() { + return leaders; + } + + leaders.insert(0); + + for (index, instr) in instrs.iter().enumerate() { + match instr { + TacInstr::Jump { .. } | TacInstr::CondJump { ..
} if index + 1 < instrs.len() => { + leaders.insert(index + 1); + } + TacInstr::Label(_) => { + leaders.insert(index); + } + _ => {} + } + } + + leaders +} + +pub fn build_cfg(func: &TacFunction) -> Cfg { + let mut leaders: Vec<usize> = identify_leaders(&func.instrs).into_iter().collect(); + leaders.sort_unstable(); + + if leaders.is_empty() { + let entry = BlockId(0); + let block = BasicBlock { + id: entry, + instrs: Vec::new(), + succs: Vec::new(), + preds: Vec::new(), + }; + return Cfg { + blocks: vec![block], + entry, + exit: entry, + }; + } + + let mut blocks = Vec::with_capacity(leaders.len()); + let mut label_to_block: HashMap<LabelId, BlockId> = HashMap::new(); + + for (block_index, &start) in leaders.iter().enumerate() { + let end = leaders + .get(block_index + 1) + .copied() + .unwrap_or(func.instrs.len()); + let instrs = func.instrs[start..end].to_vec(); + let id = BlockId(block_index as u32); + + if let Some(TacInstr::Label(label)) = instrs.first() { + label_to_block.insert(*label, id); + } + + blocks.push(BasicBlock { + id, + instrs, + succs: Vec::new(), + preds: Vec::new(), + }); + } + + for block_index in 0..blocks.len() { + let succs = match blocks[block_index].instrs.last() { + Some(TacInstr::Jump { label }) => vec![label_to_block[label]], + Some(TacInstr::CondJump { + then_label, + else_label, + .. + }) => vec![label_to_block[then_label], label_to_block[else_label]], + Some(TacInstr::Return { ..
}) => Vec::new(), + _ => { + if block_index + 1 < blocks.len() { + vec![BlockId((block_index + 1) as u32)] + } else { + Vec::new() + } + } + }; + blocks[block_index].succs = succs; + } + + for block_index in 0..blocks.len() { + let id = blocks[block_index].id; + for succ in blocks[block_index].succs.clone() { + blocks[succ.0 as usize].preds.push(id); + } + } + + let entry = BlockId(0); + let exit = blocks + .iter() + .rev() + .find(|block| block.succs.is_empty()) + .map(|block| block.id) + .unwrap_or_else(|| blocks.last().unwrap().id); + + Cfg { + blocks, + entry, + exit, + } +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::common::ast::ast::Type; + use crate::ir::tac::{ConstValue, Operand, TempId}; + + fn func(instrs: Vec<TacInstr>) -> TacFunction { + TacFunction { + name: "test".to_string(), + params: Vec::new(), + instrs, + var_sizes: Default::default(), + } + } + + #[test] + fn straight_line_code_is_one_block() { + let f = func(vec![ + TacInstr::Copy { + dst: Operand::Temp(TempId(0)), + src: Operand::Const(ConstValue::Int(1)), + ty: Type::Int, + }, + TacInstr::Copy { + dst: Operand::Temp(TempId(1)), + src: Operand::Const(ConstValue::Int(2)), + ty: Type::Int, + }, + TacInstr::Return { + val: Some(Operand::Temp(TempId(1))), + ty: None, + }, + ]); + + let cfg = build_cfg(&f); + + assert_eq!(cfg.blocks.len(), 1); + } + + #[test] + fn if_else_produces_three_blocks() { + let cond = Operand::Const(ConstValue::Int(1)); + let then_label = LabelId(0); + let else_label = LabelId(1); + let merge_label = LabelId(2); + + let f = func(vec![ + TacInstr::CondJump { + cond, + then_label, + else_label, + }, + TacInstr::Label(then_label), + TacInstr::Copy { + dst: Operand::Temp(TempId(0)), + src: Operand::Const(ConstValue::Int(1)), + ty: Type::Int, + }, + TacInstr::Jump { label: merge_label }, + TacInstr::Label(else_label), + TacInstr::Copy { + dst: Operand::Temp(TempId(0)), + src: Operand::Const(ConstValue::Int(2)), + ty: Type::Int, + }, + TacInstr::Label(merge_label), +
TacInstr::Return { + val: Some(Operand::Temp(TempId(0))), + ty: None, + }, + ]); + + let cfg = build_cfg(&f); + + assert!(cfg.blocks.len() >= 3); + } + + #[test] + fn while_loop_has_back_edge() { + let cond_label = LabelId(0); + let body_label = LabelId(1); + let exit_label = LabelId(2); + let cond = Operand::Const(ConstValue::Int(1)); + + let f = func(vec![ + TacInstr::Label(cond_label), + TacInstr::CondJump { + cond, + then_label: body_label, + else_label: exit_label, + }, + TacInstr::Label(body_label), + TacInstr::Copy { + dst: Operand::Temp(TempId(0)), + src: Operand::Const(ConstValue::Int(1)), + ty: Type::Int, + }, + TacInstr::Jump { label: cond_label }, + TacInstr::Label(exit_label), + TacInstr::Return { + val: None, + ty: None, + }, + ]); + + let cfg = build_cfg(&f); + + let cond_block = cfg + .blocks + .iter() + .find(|b| matches!(b.instrs.first(), Some(TacInstr::Label(l)) if *l == cond_label)) + .unwrap() + .id; + let body_block = cfg + .blocks + .iter() + .find(|b| matches!(b.instrs.first(), Some(TacInstr::Label(l)) if *l == body_label)) + .unwrap() + .id; + + assert!(cfg.successors(body_block).contains(&cond_block)); + } + + #[test] + fn cfg_entry_has_no_predecessors() { + let f = func(vec![TacInstr::Return { + val: None, + ty: None, + }]); + + let cfg = build_cfg(&f); + + assert!(cfg.predecessors(cfg.entry).is_empty()); + } +} diff --git a/src/ir/lower.rs b/src/ir/lower.rs new file mode 100644 index 0000000..2a38564 --- /dev/null +++ b/src/ir/lower.rs @@ -0,0 +1,1520 @@ +use crate::common::ast::{ + ast::{Program, QualifierType, Type}, + decl::Decl, + expr::{BinOp, Expr, Literal, MemberAccess, PostfixOp, PrefixOp, UnOp}, + stmt::{Stmt, SwitchLabel}, +}; +use crate::common::errors::types::CodegenError; +use crate::ir::tac::{ + ConstValue, LabelGen, LabelId, Operand, TacFunction, TacGlobal, TacInstr, TacProgram, TempGen, + TempId, +}; +use std::collections::HashMap; + +type LowerResult<T> = Result<T, CodegenError>; + +/// Layout calculado de uma struct: offset (em bytes, a
partir do endereco +/// base da struct) e tipo resolvido de cada campo, mais o tamanho total +/// (arredondado para o maior alinhamento entre os campos). +#[derive(Debug, Clone)] +struct StructLayout { + fields: Vec<(String, i64, Type)>, + size: i64, +} + +#[derive(Debug, Clone)] +pub struct Lowerer { + temps: TempGen, + labels: LabelGen, + instrs: Vec<TacInstr>, + /// Tipo declarado de cada variavel/parametro visto ate agora na funcao + /// atual. Usado para resolver `sizeof(expr)`, indexacao via ponteiro e + /// acesso a membro; nao substitui a analise semantica (que ja validou o + /// programa antes do lowering). + var_types: HashMap<String, Type>, + /// Declaracoes locais atualmente visiveis, uma tabela por escopo lexico. + /// Evita que um local de bloco continue escondendo um global apos `}`. + local_scopes: Vec<HashMap<String, Type>>, + /// Tipos dos objetos no nivel de arquivo. Mantidos separados dos locais + /// para que um nome local sempre tenha precedencia durante o lowering. + global_types: HashMap<String, Type>, + /// Layout (offsets + tamanho) de cada struct declarada no programa, + /// calculado uma vez em `lower_program` e compartilhado entre as + /// funcoes. Vazio quando o `Lowerer` e usado isoladamente (ex.: testes + /// unitarios deste modulo via `Lowerer::new()`). + struct_layouts: HashMap<String, StructLayout>, + /// Tabela de `typedef`: nome do alias -> tipo subjacente (ainda podendo + /// ser outro alias, resolvido por `resolve_alias`).
+ typedefs: HashMap<String, Type>, +} + +#[derive(Debug, Clone, Copy, Default)] +struct ControlLabels { + break_label: Option<LabelId>, + continue_label: Option<LabelId>, +} + +impl Lowerer { + pub fn new() -> Self { + Self::with_context(HashMap::new(), HashMap::new(), HashMap::new()) + } + + fn with_context( + struct_layouts: HashMap<String, StructLayout>, + typedefs: HashMap<String, Type>, + global_types: HashMap<String, Type>, + ) -> Self { + Self { + temps: TempGen::new(), + labels: LabelGen::new(), + instrs: Vec::new(), + var_types: HashMap::new(), + local_scopes: vec![HashMap::new()], + global_types, + struct_layouts, + typedefs, + } + } + + /// Registra o tipo declarado de `name`, usado depois para resolver + /// `sizeof(name)`. Chamado para parametros de funcao e para cada + /// `VarDecl` conforme o lowering avanca. + fn declare_var_type(&mut self, name: &str, ty: &Type) { + self.var_types.insert(name.to_string(), ty.clone()); + self.local_scopes + .last_mut() + .expect("lowerer sempre possui um escopo local") + .insert(name.to_string(), ty.clone()); + } + + fn type_of_var(&self, name: &str) -> Option<&Type> { + self.local_scopes + .iter() + .rev() + .find_map(|scope| scope.get(name)) + .or_else(|| self.global_types.get(name)) + } + + fn operand_for_var(&self, name: &str) -> Operand { + let is_local = self + .local_scopes + .iter() + .rev() + .any(|scope| scope.contains_key(name)); + if is_local || !self.global_types.contains_key(name) { + Operand::Var(name.to_string()) + } else { + Operand::Global(name.to_string()) + } + } + + fn with_local_scope<T>( + &mut self, + lower: impl FnOnce(&mut Self) -> LowerResult<T>, + ) -> LowerResult<T> { + self.local_scopes.push(HashMap::new()); + let result = lower(self); + self.local_scopes.pop(); + result + } + + /// Tamanho (em bytes) de cada variavel cujo valor nao cabe num slot + /// escalar de 8 bytes: structs locais e arrays fixos. Chamado ao final do + /// lowering de uma funcao, para popular `TacFunction::var_sizes`.
+ fn compute_var_sizes(&self) -> HashMap<String, i64> { + let mut sizes = HashMap::new(); + for (name, ty) in &self.var_types { + match resolve_alias(ty, &self.typedefs) { + Type::Struct(struct_name) => { + if let Some(layout) = self.struct_layouts.get(&struct_name) { + sizes.insert(name.clone(), layout.size); + } + } + array_ty @ Type::Array(_, _) => { + if let Ok(size) = type_size(&array_ty) { + sizes.insert(name.clone(), size); + } + } + _ => {} + } + } + sizes + } + + /// Infere se uma expressao produz um valor `double`, o suficiente para + /// escolher entre o caminho de codegen inteiro (`rax`/`rcx`) e o de + /// ponto flutuante (`xmm0`/`xmm1`) sem precisar reexecutar a analise + /// semantica completa. Qualquer expressao fora deste subconjunto e + /// tratada como inteira; escopo deliberadamente limitado a literais, + /// variaveis locais, aritmetica basica e cast, conforme a issue #172. + fn expr_type_for_codegen(&self, expr: &Expr) -> Type { + match expr { + Expr::Literal(Literal::Double(_), _) => Type::Double, + Expr::Ident(name, _) => self + .type_of_var(name) + .map(|ty| resolve_alias(ty, &self.typedefs)) + .unwrap_or(Type::Int), + Expr::Binary(lhs, _, rhs, _) => { + if matches!(self.expr_type_for_codegen(lhs), Type::Double) + || matches!(self.expr_type_for_codegen(rhs), Type::Double) + { + Type::Double + } else { + Type::Int + } + } + Expr::Unary(_, inner, _) => self.expr_type_for_codegen(inner), + Expr::Cast(qty, _, _) => resolve_alias(&qty.ty, &self.typedefs), + _ => Type::Int, + } + } + + /// Infere o tipo estatico (ja resolvido de aliases de `typedef`) de um + /// subconjunto limitado de expressoes (identificadores, deref, indice + /// via ponteiro, membro de struct e cast), o suficiente para resolver + /// o tamanho do elemento em `arr[i]` e o offset de campo em `s.campo`. + /// Nao substitui a analise semantica completa; expressoes fora desse + /// subconjunto retornam um erro explicito em vez de um palpite.
+ fn infer_type(&self, expr: &Expr) -> LowerResult<Type> { + match expr { + Expr::Ident(name, _) => self + .type_of_var(name) + .map(|ty| resolve_alias(ty, &self.typedefs)) + .ok_or_else(|| codegen_error("tipo de variavel desconhecido no lowering", Some("type"))), + Expr::Unary(UnOp::Deref, inner, _) => match self.infer_type(inner)? { + Type::Pointer(t) | Type::Array(t, _) => Ok(resolve_alias(&t, &self.typedefs)), + _ => Err(codegen_error( + "deref de valor que nao e ponteiro/array", + Some("type"), + )), + }, + Expr::Index(arr, _, _) => match self.infer_type(arr)? { + Type::Pointer(t) => Ok(resolve_alias(&t, &self.typedefs)), + Type::Array(t, Some(_)) => Ok(resolve_alias(&t, &self.typedefs)), + Type::Array(_, None) => Err(codegen_error( + "indexacao de array com tamanho desconhecido nao suportada no lowering", + Some("index"), + )), + _ => Err(codegen_error( + "indexacao de valor que nao e ponteiro/array", + Some("index"), + )), + }, + Expr::Member(obj, access, field, _) => { + let layout = self.struct_layout_of_member_base(obj, access)?; + layout + .fields + .iter() + .find(|(name, _, _)| name == field) + .map(|(_, _, ty)| ty.clone()) + .ok_or_else(|| { + codegen_error("campo de struct desconhecido no lowering", Some("member")) + }) + } + Expr::Cast(qty, _, _) => Ok(resolve_alias(&qty.ty, &self.typedefs)), + _ => Err(codegen_error( + "tipo de expressao nao inferido no lowering (suporte limitado a identificador, deref, indice, membro e cast)", + Some("type"), + )), + } + } + + /// Resolve o layout da struct base de um acesso a membro: para `.`, + /// `obj` deve ser a propria struct; para `->`, `obj` deve ser um + /// ponteiro para struct. + fn struct_layout_of_member_base( + &self, + obj: &Expr, + access: &MemberAccess, + ) -> LowerResult<&StructLayout> { + let struct_name = match access { + MemberAccess::Direct => match self.infer_type(obj)? { + Type::Struct(name) => name, + _ => { + return Err(codegen_error( + "acesso '.'
em valor que nao e struct", + Some("member"), + )) + } + }, + MemberAccess::Pointer => match self.infer_type(obj)? { + Type::Pointer(inner) => match resolve_alias(&inner, &self.typedefs) { + Type::Struct(name) => name, + _ => { + return Err(codegen_error( + "acesso '->' em ponteiro que nao e para struct", + Some("member"), + )) + } + }, + _ => { + return Err(codegen_error( + "acesso '->' em valor que nao e ponteiro", + Some("member"), + )) + } + }, + }; + self.struct_layouts.get(&struct_name).ok_or_else(|| { + codegen_error( + "layout de struct desconhecido no lowering (campo agregado aninhado nao suportado, ou struct nunca declarada)", + Some("member"), + ) + }) + } + + /// Calcula o endereco (em bytes) de `arr[idx]`. + /// + /// Para ponteiros, a base e o valor do ponteiro. Para arrays fixos, a + /// base e o endereco do bloco contiguo reservado para a variavel. + fn lower_index_address(&mut self, arr: &Expr, idx: &Expr) -> LowerResult<Operand> { + let (elem_ty, base_ptr) = match self.infer_type(arr)?
{ + Type::Pointer(inner) => (resolve_alias(&inner, &self.typedefs), self.lower_expr(arr)?), + Type::Array(inner, Some(_)) => ( + resolve_alias(&inner, &self.typedefs), + self.lower_address_of(arr)?, + ), + Type::Array(_, None) => { + return Err(codegen_error( + "indexacao de array com tamanho desconhecido nao suportada no lowering", + Some("index"), + )) + } + _ => { + return Err(codegen_error( + "indexacao de valor que nao e ponteiro/array", + Some("index"), + )) + } + }; + let elem_size = type_size(&elem_ty)?; + let idx_op = self.lower_expr(idx)?; + + let offset = if elem_size == 1 { + idx_op + } else { + let scaled = self.fresh_temp(); + self.instrs.push(TacInstr::BinOp { + dst: scaled, + op: BinOp::Mul, + lhs: idx_op, + rhs: Operand::Const(ConstValue::Int(elem_size)), + ty: Type::Int, + }); + Operand::Temp(scaled) + }; + + let addr = self.fresh_temp(); + self.instrs.push(TacInstr::BinOp { + dst: addr, + op: BinOp::Add, + lhs: base_ptr, + rhs: offset, + ty: Type::Int, + }); + Ok(Operand::Temp(addr)) + } + + /// Calcula o endereco (em bytes) de `obj.campo` ou `obj->campo`: + /// endereco da struct base + offset do campo no layout. + fn lower_member_address( + &mut self, + obj: &Expr, + access: &MemberAccess, + field: &str, + ) -> LowerResult<Operand> { + let field_offset = { + let layout = self.struct_layout_of_member_base(obj, access)?; + layout + .fields + .iter() + .find(|(name, _, _)| name == field) + .map(|(_, offset, _)| *offset) + .ok_or_else(|| { + codegen_error("campo de struct desconhecido no lowering", Some("member")) + })?
+ }; + + let base_addr = match access { + MemberAccess::Direct => self.lower_address_of(obj)?, + MemberAccess::Pointer => self.lower_expr(obj)?, + }; + + if field_offset == 0 { + return Ok(base_addr); + } + + let addr = self.fresh_temp(); + self.instrs.push(TacInstr::BinOp { + dst: addr, + op: BinOp::Add, + lhs: base_addr, + rhs: Operand::Const(ConstValue::Int(field_offset)), + ty: Type::Int, + }); + Ok(Operand::Temp(addr)) + } + + /// Calcula o *endereco* (nao o valor) de uma expressao-lvalue: + /// identificador, `*p` (endereco = o proprio `p`), `arr[i]` ou + /// `obj.campo`/`obj->campo`. + fn lower_address_of(&mut self, expr: &Expr) -> LowerResult<Operand> { + match expr { + Expr::Ident(name, _) => { + let temp = self.fresh_temp(); + let src = self.operand_for_var(name); + self.instrs.push(TacInstr::UnOp { + dst: temp, + op: UnOp::AddrOf, + src, + ty: Type::Int, + }); + Ok(Operand::Temp(temp)) + } + Expr::Unary(UnOp::Deref, inner, _) => self.lower_expr(inner), + Expr::Index(arr, idx, _) => self.lower_index_address(arr, idx), + Expr::Member(obj, access, field, _) => self.lower_member_address(obj, access, field), + _ => Err(codegen_error( + "nao e possivel obter o endereco desta expressao no lowering", + Some("addr"), + )), + } + } + + pub fn lower_expr(&mut self, expr: &Expr) -> LowerResult<Operand> { + match expr { + Expr::Literal(value, _) => Ok(Operand::Const(lower_literal(value))), + Expr::Ident(name, _) => Ok(self.operand_for_var(name)), + Expr::Binary(lhs_expr, op, rhs_expr, _) => { + let ty = self.expr_type_for_codegen(expr); + let lhs = self.lower_expr(lhs_expr)?; + let rhs = self.lower_expr(rhs_expr)?; + let dst = self.fresh_temp(); + self.instrs.push(TacInstr::BinOp { + dst, + op: op.clone(), + lhs, + rhs, + ty, + }); + Ok(Operand::Temp(dst)) + } + Expr::Unary(op, src_expr, _) => { + let ty = self.expr_type_for_codegen(src_expr); + let src = self.lower_expr(src_expr)?; + let dst = self.fresh_temp(); + self.instrs.push(TacInstr::UnOp { + dst, + op: op.clone(), + src, +
ty, + }); + Ok(Operand::Temp(dst)) + } + Expr::Prefix(op, target, _) => self.lower_prefix(op, target), + Expr::Postfix(op, target, _) => self.lower_postfix(op, target), + Expr::Call(callee, args, _) => { + let fn_name = match callee.as_ref() { + Expr::Ident(name, _) => name.clone(), + _ => { + return Err(codegen_error( + "chamada por expressao nao suportada no lowering", + Some("call"), + )); + } + }; + let mut lowered_args = Vec::with_capacity(args.len()); + for arg in args { + lowered_args.push(self.lower_expr(arg)?); + } + let dst = self.fresh_temp(); + self.instrs.push(TacInstr::Call { + dst: Some(dst), + fn_name, + args: lowered_args, + }); + Ok(Operand::Temp(dst)) + } + Expr::Cast(qty, inner, _) => { + let _ = resolve_alias(&qty.ty, &self.typedefs); + self.lower_expr(inner) + } + Expr::Assign(lhs, rhs, _) => { + let ty = self.expr_type_for_codegen(lhs); + let src = self.lower_expr(rhs)?; + let dst = self.lower_assignment_target(lhs)?; + self.emit_copy(dst.clone(), src, ty)?; + Ok(dst) + } + Expr::CompoundAssign(op, lhs, rhs, _) => { + let ty = self.expr_type_for_codegen(lhs); + let dst = self.lower_assignment_target(lhs)?; + let rhs = self.lower_expr(rhs)?; + let temp = self.fresh_temp(); + self.instrs.push(TacInstr::BinOp { + dst: temp, + op: op.clone(), + lhs: dst.clone(), + rhs, + ty: ty.clone(), + }); + self.emit_copy(dst.clone(), Operand::Temp(temp), ty)?; + Ok(dst) + } + Expr::SizeofType(qty, _) => Ok(Operand::Const(ConstValue::Int(type_size(&qty.ty)?))), + Expr::Ternary(cond, then_expr, else_expr, _) => { + let cond = self.lower_expr(cond)?; + let then_label = self.labels.fresh(); + let else_label = self.labels.fresh(); + let end_label = self.labels.fresh(); + let dst = self.fresh_temp(); + + self.instrs.push(TacInstr::CondJump { + cond, + then_label, + else_label, + }); + + self.instrs.push(TacInstr::Label(then_label)); + let then_ty = self.expr_type_for_codegen(then_expr); + let then_val = self.lower_expr(then_expr)?; + 
self.emit_copy(Operand::Temp(dst), then_val, then_ty)?; + self.emit_jump_unless_terminated(end_label); + + self.instrs.push(TacInstr::Label(else_label)); + let else_ty = self.expr_type_for_codegen(else_expr); + let else_val = self.lower_expr(else_expr)?; + self.emit_copy(Operand::Temp(dst), else_val, else_ty)?; + + self.instrs.push(TacInstr::Label(end_label)); + Ok(Operand::Temp(dst)) + } + Expr::Index(arr, idx, _) => { + let addr = self.lower_index_address(arr, idx)?; + Ok(Operand::Deref(Box::new(addr))) + } + Expr::Member(obj, access, field, _) => { + let addr = self.lower_member_address(obj, access, field)?; + Ok(Operand::Deref(Box::new(addr))) + } + // `sizeof(expr)`: o caso pratico mais comum e `sizeof(variavel)`. + // O tipo declarado de identificadores e rastreado em + // `var_types` (preenchido a partir de parametros e `VarDecl`); + // para qualquer outra forma de expressao ainda nao ha + // informacao de tipo disponivel no lowering. + Expr::Sizeof(inner, _) => match inner.as_ref() { + Expr::Ident(name, _) => { + let ty = self.type_of_var(name).ok_or_else(|| { + codegen_error( + "sizeof(expr): tipo da variavel desconhecido no lowering", + Some("sizeof"), + ) + })?; + Ok(Operand::Const(ConstValue::Int(type_size(ty)?))) + } + _ => Err(codegen_error( + "sizeof(expr) so e suportado para identificadores simples neste backend", + Some("sizeof"), + )), + }, + } + } + + pub fn lower_stmt(&mut self, stmt: &Stmt) -> LowerResult<()> { + self.lower_stmt_with_control(stmt, ControlLabels::default()) + } + + fn lower_stmt_with_control(&mut self, stmt: &Stmt, control: ControlLabels) -> LowerResult<()> { + match stmt { + Stmt::Block(stmts, _) => self.with_local_scope(|lowerer| { + for stmt in stmts { + lowerer.lower_stmt_with_control(stmt, control)?; + } + Ok(()) + }), + Stmt::If(cond, then_branch, else_branch, _) => { + let cond = self.lower_expr(cond)?; + let then_label = self.labels.fresh(); + let else_label = self.labels.fresh(); + let end_label = self.labels.fresh(); + 
+ self.instrs.push(TacInstr::CondJump { + cond, + then_label, + else_label, + }); + + self.instrs.push(TacInstr::Label(then_label)); + self.lower_stmt_with_control(then_branch, control)?; + self.emit_jump_unless_terminated(end_label); + + self.instrs.push(TacInstr::Label(else_label)); + if let Some(else_branch) = else_branch { + self.lower_stmt_with_control(else_branch, control)?; + } + self.instrs.push(TacInstr::Label(end_label)); + Ok(()) + } + Stmt::While(cond, body, _) => { + let cond_label = self.labels.fresh(); + let body_label = self.labels.fresh(); + let end_label = self.labels.fresh(); + + self.instrs.push(TacInstr::Label(cond_label)); + let cond = self.lower_expr(cond)?; + self.instrs.push(TacInstr::CondJump { + cond, + then_label: body_label, + else_label: end_label, + }); + + self.instrs.push(TacInstr::Label(body_label)); + self.lower_stmt_with_control( + body, + ControlLabels { + break_label: Some(end_label), + continue_label: Some(cond_label), + }, + )?; + self.emit_jump_unless_terminated(cond_label); + + self.instrs.push(TacInstr::Label(end_label)); + Ok(()) + } + Stmt::For(init, cond, inc, body, _) => self.with_local_scope(|lowerer| { + if let Some(init) = init { + lowerer.lower_stmt_with_control(init, control)?; + } + + let cond_label = lowerer.labels.fresh(); + let body_label = lowerer.labels.fresh(); + let inc_label = inc.as_ref().map(|_| lowerer.labels.fresh()); + let end_label = lowerer.labels.fresh(); + let continue_label = inc_label.unwrap_or(cond_label); + + lowerer.instrs.push(TacInstr::Label(cond_label)); + if let Some(cond) = cond { + let cond = lowerer.lower_expr(cond)?; + lowerer.instrs.push(TacInstr::CondJump { + cond, + then_label: body_label, + else_label: end_label, + }); + } + + lowerer.instrs.push(TacInstr::Label(body_label)); + lowerer.lower_stmt_with_control( + body, + ControlLabels { + break_label: Some(end_label), + continue_label: Some(continue_label), + }, + )?; + + if let Some(inc_label) = inc_label { + 
lowerer.instrs.push(TacInstr::Label(inc_label)); + if let Some(inc) = inc { + lowerer.lower_expr(inc)?; + } + } + lowerer.emit_jump_unless_terminated(cond_label); + + lowerer.instrs.push(TacInstr::Label(end_label)); + Ok(()) + }), + Stmt::DoWhile(cond, body, _) => { + let body_label = self.labels.fresh(); + let cond_label = self.labels.fresh(); + let end_label = self.labels.fresh(); + + self.instrs.push(TacInstr::Label(body_label)); + self.lower_stmt_with_control( + body, + ControlLabels { + break_label: Some(end_label), + continue_label: Some(cond_label), + }, + )?; + + self.instrs.push(TacInstr::Label(cond_label)); + let cond = self.lower_expr(cond)?; + self.instrs.push(TacInstr::CondJump { + cond, + then_label: body_label, + else_label: end_label, + }); + + self.instrs.push(TacInstr::Label(end_label)); + Ok(()) + } + Stmt::Break(_) => { + let label = control.break_label.ok_or_else(|| { + codegen_error("break fora de loop/switch nao suportado", Some("break")) + })?; + self.instrs.push(TacInstr::Jump { label }); + Ok(()) + } + Stmt::Continue(_) => { + let label = control.continue_label.ok_or_else(|| { + codegen_error("continue fora de loop nao suportado", Some("continue")) + })?; + self.instrs.push(TacInstr::Jump { label }); + Ok(()) + } + Stmt::ExprStmt(expr, _) => { + self.lower_expr(expr)?; + Ok(()) + } + Stmt::Return(expr, _) => { + let ty = expr.as_ref().map(|expr| self.expr_type_for_codegen(expr)); + let val = expr + .as_ref() + .map(|expr| self.lower_expr(expr)) + .transpose()?; + self.instrs.push(TacInstr::Return { val, ty }); + Ok(()) + } + Stmt::VarDecl(qty, name, init, _) => { + self.declare_var_type(name, &qty.ty); + if let Some(init) = init { + let ty = resolve_alias(&qty.ty, &self.typedefs); + let src = self.lower_expr(init)?; + self.emit_copy(Operand::Var(name.clone()), src, ty)?; + } + Ok(()) + } + Stmt::Switch(disc, cases, _) => { + let disc_op = self.lower_expr(disc)?; + let end_label = self.labels.fresh(); + + // Um label por `case`/`default`; 
serve tanto de alvo da + // comparacao quanto de entrada do corpo daquele caso. + let case_labels: Vec<LabelId> = cases.iter().map(|_| self.labels.fresh()).collect(); + let default_index = cases + .iter() + .position(|case| matches!(case.label, SwitchLabel::Default)); + + // Cadeia de comparacoes: testa cada `case` (na ordem em que + // aparece); `default` nao entra na comparacao, e usado como + // fallback ao final da cadeia. + for (index, case) in cases.iter().enumerate() { + if let SwitchLabel::Case(case_expr) = &case.label { + let case_val = self.lower_expr(case_expr)?; + let cmp = self.fresh_temp(); + self.instrs.push(TacInstr::BinOp { + dst: cmp, + op: BinOp::Eq, + lhs: disc_op.clone(), + rhs: case_val, + ty: Type::Int, + }); + let next_test = self.labels.fresh(); + self.instrs.push(TacInstr::CondJump { + cond: Operand::Temp(cmp), + then_label: case_labels[index], + else_label: next_test, + }); + self.instrs.push(TacInstr::Label(next_test)); + } + } + let fallback_label = default_index.map_or(end_label, |i| case_labels[i]); + self.instrs.push(TacInstr::Jump { + label: fallback_label, + }); + + // Corpo dos casos, em ordem, sem break implicito entre eles + // (fallthrough real de C); `break` salta para `end_label`.
+ for (index, case) in cases.iter().enumerate() { + self.instrs.push(TacInstr::Label(case_labels[index])); + for stmt in &case.stmts { + self.lower_stmt_with_control( + stmt, + ControlLabels { + break_label: Some(end_label), + continue_label: control.continue_label, + }, + )?; + } + } + + self.instrs.push(TacInstr::Label(end_label)); + Ok(()) + } + } + } + + pub fn finish(self) -> Vec<TacInstr> { + self.instrs + } + + fn fresh_temp(&mut self) -> TempId { + self.temps.fresh() + } + + fn lower_prefix(&mut self, op: &PrefixOp, target: &Expr) -> LowerResult<Operand> { + let ty = self.expr_type_for_codegen(target); + let dst = self.lower_assignment_target(target)?; + let temp = self.fresh_temp(); + self.instrs.push(TacInstr::BinOp { + dst: temp, + op: prefix_bin_op(op), + lhs: dst.clone(), + rhs: Operand::Const(ConstValue::Int(1)), + ty: ty.clone(), + }); + self.emit_copy(dst.clone(), Operand::Temp(temp), ty)?; + Ok(dst) + } + + fn lower_postfix(&mut self, op: &PostfixOp, target: &Expr) -> LowerResult<Operand> { + let ty = self.expr_type_for_codegen(target); + let dst = self.lower_assignment_target(target)?; + let old = self.fresh_temp(); + self.emit_copy(Operand::Temp(old), dst.clone(), ty.clone())?; + + let new = self.fresh_temp(); + self.instrs.push(TacInstr::BinOp { + dst: new, + op: postfix_bin_op(op), + lhs: dst.clone(), + rhs: Operand::Const(ConstValue::Int(1)), + ty: ty.clone(), + }); + self.emit_copy(dst, Operand::Temp(new), ty)?; + Ok(Operand::Temp(old)) + } + + fn lower_assignment_target(&mut self, expr: &Expr) -> LowerResult<Operand> { + match expr { + Expr::Ident(name, _) => { + // Atribuicao direta entre structs (`q = p;`) copiaria o + // valor inteiro da struct; este backend so move 8 bytes por + // `Copy` (todo o resto do codegen trata cada valor como um + // unico quadword). Para structs que cabem em 8 bytes isso + // ja funciona corretamente de gracas; para maiores, + // copiaria so os 8 primeiros bytes silenciosamente — + // recusa explicitamente em vez disso.
+ if let Some(ty) = self.type_of_var(name) { + if let Type::Struct(struct_name) = resolve_alias(ty, &self.typedefs) { + let size = self + .struct_layouts + .get(&struct_name) + .map(|l| l.size) + .unwrap_or(8); + if size > 8 { + return Err(codegen_error( + "atribuicao direta entre structs maiores que 8 bytes nao suportada neste backend (copie campo a campo)", + Some("assign"), + )); + } + } + } + Ok(self.operand_for_var(name)) + } + // `*p` como destino (`*p = x;`, `*p += 1;`, `(*p)++` etc.): o + // ponteiro em si e um rvalue comum, mas o destino da escrita e o + // endereco para o qual ele aponta. + Expr::Unary(UnOp::Deref, inner, _) => { + let ptr = self.lower_expr(inner)?; + Ok(Operand::Deref(Box::new(ptr))) + } + // `arr[i] = x;` (com `arr` ponteiro): mesmo enderecamento usado + // na leitura, via `lower_index_address`. + Expr::Index(arr, idx, _) => { + let addr = self.lower_index_address(arr, idx)?; + Ok(Operand::Deref(Box::new(addr))) + } + // `obj.campo = x;` / `obj->campo = x;`: mesmo enderecamento + // usado na leitura, via `lower_member_address`. + Expr::Member(obj, access, field, _) => { + let addr = self.lower_member_address(obj, access, field)?; + Ok(Operand::Deref(Box::new(addr))) + } + _ => Err(codegen_error( + "destino de atribuicao nao suportado no lowering", + Some("assign"), + )), + } + } + + fn emit_copy(&mut self, dst: Operand, src: Operand, ty: Type) -> LowerResult<()> { + match dst { + Operand::Temp(_) | Operand::Var(_) | Operand::Global(_) | Operand::Deref(_) => { + self.instrs.push(TacInstr::Copy { dst, src, ty }); + Ok(()) + } + Operand::Const(_) => Err(codegen_error( + "constante nao pode ser destino de copia", + Some("copy"), + )), + } + } + + fn emit_jump_unless_terminated(&mut self, label: LabelId) { + if !matches!( + self.instrs.last(), + Some(TacInstr::Jump { .. } | TacInstr::Return { .. 
}) + ) { + self.instrs.push(TacInstr::Jump { label }); + } + } +} + +impl Default for Lowerer { + fn default() -> Self { + Self::new() + } +} + +pub fn lower_function(decl: &Decl) -> LowerResult<TacFunction> { + lower_function_with_context(decl, &HashMap::new(), &HashMap::new(), &HashMap::new()) +} + +fn lower_function_with_context( + decl: &Decl, + struct_layouts: &HashMap<String, StructLayout>, + typedefs: &HashMap<String, Type>, + global_types: &HashMap<String, Type>, +) -> LowerResult<TacFunction> { + match decl { + Decl::Function(_, name, params, body, _) => { + let mut lowerer = Lowerer::with_context( + struct_layouts.clone(), + typedefs.clone(), + global_types.clone(), + ); + for (qty, param_name) in params { + lowerer.declare_var_type(param_name, &qty.ty); + } + for stmt in body { + lowerer.lower_stmt(stmt)?; + } + + let var_sizes = lowerer.compute_var_sizes(); + Ok(TacFunction { + name: name.clone(), + params: params.iter().map(|(_, name)| name.clone()).collect(), + instrs: lowerer.finish(), + var_sizes, + }) + } + _ => Err(codegen_error( + "lower_function espera Decl::Function", + Some("lower_function"), + )), + } +} + +pub fn lower_program(prog: &Program) -> LowerResult<TacProgram> { + let typedefs = build_typedefs(prog); + let struct_layouts = build_struct_layouts(prog, &typedefs); + let global_types = build_global_types(prog); + let globals = lower_globals(prog, &typedefs, &struct_layouts)?; + + let mut functions = Vec::new(); + for decl in &prog.decls { + if matches!(decl, Decl::Function(..)) { + functions.push(lower_function_with_context( + decl, + &struct_layouts, + &typedefs, + &global_types, + )?); + } + } + + Ok(TacProgram { globals, functions }) +} + +fn build_global_types(prog: &Program) -> HashMap<String, Type> { + prog.decls + .iter() + .filter_map(|decl| match decl { + Decl::GlobalVar(qty, name, _, _) => Some((name.clone(), qty.ty.clone())), + _ => None, + }) + .collect() +} + +fn lower_globals( + prog: &Program, + typedefs: &HashMap<String, Type>, + struct_layouts: &HashMap<String, StructLayout>, +) -> LowerResult<Vec<TacGlobal>> { + let mut globals = Vec::new(); + for decl in &prog.decls {
+ let Decl::GlobalVar(qty, name, init, _) = decl else { + continue; + }; + let ty = resolve_alias(&qty.ty, typedefs); + let size = global_storage_size(&ty, struct_layouts)?; + let init = init.as_ref().map(lower_static_initializer).transpose()?; + globals.push(TacGlobal { + name: name.clone(), + size, + init, + }); + } + Ok(globals) +} + +fn global_storage_size( + ty: &Type, + struct_layouts: &HashMap<String, StructLayout>, +) -> LowerResult<i64> { + let raw = match ty { + Type::Struct(name) => { + let layout = struct_layouts.get(name).ok_or_else(|| { + codegen_error( + "layout de struct global desconhecido no lowering", + Some("global"), + ) + })?; + // The current instruction selection accesses fields with movq. Also + // reserve the bytes reached by the last field so that + // such an access does not run into the next global symbol. + layout + .fields + .iter() + .fold(layout.size, |size, (_, offset, _)| size.max(offset + 8)) + } + Type::Char + | Type::Short + | Type::Int + | Type::Long + | Type::Float + | Type::Double + | Type::Pointer(_) + | Type::Enum(_) => 8, + Type::Array(_, _) | Type::Void | Type::Alias(_) | Type::Function(_, _) => { + return Err(codegen_error( + "tipo de variavel global sem tamanho suportado no lowering", + Some("global"), + )); + } + }; + Ok(align_up(raw.max(8), 8)) +} + +fn lower_static_initializer(expr: &Expr) -> LowerResult<ConstValue> { + match expr { + Expr::Literal(value, _) => Ok(lower_literal(value)), + Expr::Cast(_, inner, _) => lower_static_initializer(inner), + _ => eval_const_int(expr).map(ConstValue::Int), + } +} + +fn eval_const_int(expr: &Expr) -> LowerResult<i64> { + match expr { + Expr::Literal(Literal::Int(value), _) => Ok(*value), + Expr::Literal(Literal::Char(value), _) => Ok(*value as i64), + Expr::Cast(_, inner, _) => eval_const_int(inner), + Expr::Unary(op, inner, _) => { + let value = eval_const_int(inner)?; + match op { + UnOp::Neg => Ok(value.wrapping_neg()), + UnOp::Not => Ok((value == 0) as i64), + UnOp::BitNot => Ok(!value), + UnOp::Deref | UnOp::AddrOf =>
Err(codegen_error( + "inicializador global nao e uma constante inteira", + Some("global-init"), + )), + } + } + Expr::Binary(lhs, op, rhs, _) => { + let lhs = eval_const_int(lhs)?; + let rhs = eval_const_int(rhs)?; + match op { + BinOp::Add => Ok(lhs.wrapping_add(rhs)), + BinOp::Sub => Ok(lhs.wrapping_sub(rhs)), + BinOp::Mul => Ok(lhs.wrapping_mul(rhs)), + BinOp::Div if rhs != 0 => Ok(lhs.wrapping_div(rhs)), + BinOp::Mod if rhs != 0 => Ok(lhs.wrapping_rem(rhs)), + BinOp::Eq => Ok((lhs == rhs) as i64), + BinOp::Neq => Ok((lhs != rhs) as i64), + BinOp::Less => Ok((lhs < rhs) as i64), + BinOp::Greater => Ok((lhs > rhs) as i64), + BinOp::Leq => Ok((lhs <= rhs) as i64), + BinOp::Geq => Ok((lhs >= rhs) as i64), + BinOp::And => Ok((lhs != 0 && rhs != 0) as i64), + BinOp::Or => Ok((lhs != 0 || rhs != 0) as i64), + BinOp::BitAnd => Ok(lhs & rhs), + BinOp::BitOr => Ok(lhs | rhs), + BinOp::BitXor => Ok(lhs ^ rhs), + BinOp::Shl => Ok(lhs.wrapping_shl(rhs as u32)), + BinOp::Shr => Ok(lhs.wrapping_shr(rhs as u32)), + BinOp::Div | BinOp::Mod => Err(codegen_error( + "divisao por zero em inicializador global", + Some("global-init"), + )), + } + } + Expr::Ternary(cond, then_expr, else_expr, _) => { + if eval_const_int(cond)? != 0 { + eval_const_int(then_expr) + } else { + eval_const_int(else_expr) + } + } + _ => Err(codegen_error( + "inicializador global deve ser uma expressao constante", + Some("global-init"), + )), + } +} + +/// Segue a cadeia de `Type::Alias` ate um tipo concreto, usando a tabela de +/// `typedef` do programa. Limita a profundidade para nao travar em alias +/// ciclico malformado; nesse caso, devolve o alias original sem resolver. 
+fn resolve_alias(ty: &Type, typedefs: &HashMap<String, Type>) -> Type { + let mut current = ty.clone(); + for _ in 0..8 { + match current { + Type::Alias(name) => match typedefs.get(&name) { + Some(next) => current = next.clone(), + None => return Type::Alias(name), + }, + Type::Pointer(inner) => { + return Type::Pointer(Box::new(resolve_alias(&inner, typedefs))) + } + Type::Array(inner, size) => { + return Type::Array(Box::new(resolve_alias(&inner, typedefs)), size) + } + Type::Function(ret, params) => { + let ret = QualifierType { + ty: resolve_alias(&ret.ty, typedefs), + is_const: ret.is_const, + is_unsigned: ret.is_unsigned, + }; + let params = params + .iter() + .map(|param| QualifierType { + ty: resolve_alias(&param.ty, typedefs), + is_const: param.is_const, + is_unsigned: param.is_unsigned, + }) + .collect(); + return Type::Function(Box::new(ret), params); + } + other => return other, + } + } + current +} + +/// Collects `name -> underlying type` for every `Decl::Typedef` in the program. +fn build_typedefs(prog: &Program) -> HashMap<String, Type> { + let mut map = HashMap::new(); + for decl in &prog.decls { + if let Decl::Typedef(qty, name, _) = decl { + map.insert(name.clone(), qty.ty.clone()); + } + } + map +} + +/// Computes the layout of each `Decl::StructDecl` in the program. Structs whose +/// layout cannot be computed (nested aggregate field, i.e. struct/array +/// inside a struct, not yet supported) are simply omitted: uses of +/// `Expr::Member` on them fail later, during lowering, with an +/// explicit error instead of an incorrect layout.
+fn build_struct_layouts( + prog: &Program, + typedefs: &HashMap<String, Type>, +) -> HashMap<String, StructLayout> { + let mut layouts = HashMap::new(); + for decl in &prog.decls { + if let Decl::StructDecl(name, fields, _) = decl { + if let Ok(layout) = compute_struct_layout(fields, typedefs) { + layouts.insert(name.clone(), layout); + } + } + } + layouts +} + +fn compute_struct_layout( + fields: &[(QualifierType, String)], + typedefs: &HashMap<String, Type>, +) -> LowerResult<StructLayout> { + let mut offset = 0i64; + let mut max_align = 1i64; + let mut laid_out = Vec::with_capacity(fields.len()); + + for (qty, field_name) in fields { + let resolved = resolve_alias(&qty.ty, typedefs); + let size = type_size(&resolved).map_err(|_| { + codegen_error( + "campo de struct com tipo agregado (struct/array) aninhado nao suportado neste backend", + Some("struct"), + ) + })?; + + offset = align_up(offset, size); + laid_out.push((field_name.clone(), offset, resolved)); + offset += size; + max_align = max_align.max(size); + } + + let total = align_up(offset, max_align.max(1)).max(1); + Ok(StructLayout { + fields: laid_out, + size: total, + }) +} + +fn align_up(value: i64, alignment: i64) -> i64 { + if alignment <= 0 { + return value; + } + (value + alignment - 1) / alignment * alignment +} + +/// Generates the TAC and applies all the basic optimizations (constant folding, +/// constant propagation and dead code elimination) until a fixed point is reached. +/// +/// This is the recommended entry point for the compilation pipeline.
+pub fn lower_and_optimize(prog: &Program) -> LowerResult<TacProgram> { + use crate::codegen::inter::optimizations::optimize_function; + + let mut tac = lower_program(prog)?; + for func in &mut tac.functions { + optimize_function(&mut func.instrs); + } + Ok(tac) +} + +fn lower_literal(value: &Literal) -> ConstValue { + match value { + Literal::Int(value) => ConstValue::Int(*value), + Literal::Double(value) => ConstValue::Double(*value), + Literal::Char(value) => ConstValue::Char(*value), + Literal::String(value) => ConstValue::String(value.clone()), + } +} + +fn prefix_bin_op(op: &PrefixOp) -> BinOp { + match op { + PrefixOp::Inc => BinOp::Add, + PrefixOp::Dec => BinOp::Sub, + } +} + +fn postfix_bin_op(op: &PostfixOp) -> BinOp { + match op { + PostfixOp::Inc => BinOp::Add, + PostfixOp::Dec => BinOp::Sub, + } +} + +fn type_size(ty: &Type) -> LowerResult<i64> { + match ty { + Type::Char => Ok(1), + Type::Short => Ok(2), + Type::Int | Type::Float | Type::Enum(_) => Ok(4), + Type::Long | Type::Double | Type::Pointer(_) => Ok(8), + Type::Array(inner, Some(len)) => { + let elem_size = type_size(inner)?; + Ok(elem_size * (*len as i64)) + } + Type::Array(_, None) => Err(codegen_error( + "lowering de sizeof(array) requer tamanho de array conhecido", + Some("sizeof"), + )), + Type::Void | Type::Struct(_) | Type::Alias(_) | Type::Function(_, _) => Err(codegen_error( + "lowering de sizeof(type) requer layout/tamanho completo", + Some("sizeof"), + )), + } +} + +fn codegen_error(message: &str, instruction: Option<&str>) -> CodegenError { + CodegenError { + message: message.to_string(), + instruction: instruction.map(str::to_string), + } +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::common::{ + ast::ast::{QualifierType, Type}, + errors::error_data::Span, + }; + + fn span() -> Span { + Span { + line: 1, + end_line: 1, + column_start: 1, + column_end: 1, + } + } + + fn int_ty() -> QualifierType { + QualifierType { + ty: Type::Int, + is_const: false, + is_unsigned: false, + } + } + + 
fn array_ty(inner: Type, size: Option) -> QualifierType { + QualifierType { + ty: Type::Array(Box::new(inner), size), + is_const: false, + is_unsigned: false, + } + } + + fn int(value: i64) -> Expr { + Expr::Literal(Literal::Int(value), span()) + } + + fn ident(name: &str) -> Expr { + Expr::Ident(name.to_string(), span()) + } + + #[test] + fn lower_binary_expr_produces_temp() { + let expr = Expr::Binary(Box::new(int(2)), BinOp::Add, Box::new(int(3)), span()); + let mut lowerer = Lowerer::new(); + + let result = lowerer.lower_expr(&expr).unwrap(); + + assert_eq!(result, Operand::Temp(TempId(0))); + assert_eq!( + lowerer.finish(), + vec![TacInstr::BinOp { + dst: TempId(0), + op: BinOp::Add, + lhs: Operand::Const(ConstValue::Int(2)), + rhs: Operand::Const(ConstValue::Int(3)), + ty: Type::Int, + }] + ); + } + + #[test] + fn lower_if_else_produces_labels() { + let stmt = Stmt::If( + int(1), + Box::new(Stmt::VarDecl( + int_ty(), + "x".to_string(), + Some(int(2)), + span(), + )), + Some(Box::new(Stmt::VarDecl( + int_ty(), + "y".to_string(), + Some(int(3)), + span(), + ))), + span(), + ); + let mut lowerer = Lowerer::new(); + + lowerer.lower_stmt(&stmt).unwrap(); + let instrs = lowerer.finish(); + + assert!(matches!( + instrs[0], + TacInstr::CondJump { + then_label: LabelId(0), + else_label: LabelId(1), + .. 
+ } + )); + assert_eq!(instrs[1], TacInstr::Label(LabelId(0))); + assert_eq!( + instrs[2], + TacInstr::Copy { + dst: Operand::Var("x".to_string()), + src: Operand::Const(ConstValue::Int(2)), + ty: Type::Int, + } + ); + assert_eq!(instrs[3], TacInstr::Jump { label: LabelId(2) }); + assert_eq!(instrs[4], TacInstr::Label(LabelId(1))); + assert_eq!( + instrs[5], + TacInstr::Copy { + dst: Operand::Var("y".to_string()), + src: Operand::Const(ConstValue::Int(3)), + ty: Type::Int, + } + ); + assert_eq!(instrs[6], TacInstr::Label(LabelId(2))); + } + + #[test] + fn lower_while_produces_backedge() { + let stmt = Stmt::While( + ident("keep_going"), + Box::new(Stmt::VarDecl( + int_ty(), + "x".to_string(), + Some(int(1)), + span(), + )), + span(), + ); + let mut lowerer = Lowerer::new(); + + lowerer.lower_stmt(&stmt).unwrap(); + let instrs = lowerer.finish(); + + assert_eq!(instrs[0], TacInstr::Label(LabelId(0))); + assert!(matches!( + instrs[1], + TacInstr::CondJump { + then_label: LabelId(1), + else_label: LabelId(2), + .. 
+ } + )); + assert_eq!(instrs[2], TacInstr::Label(LabelId(1))); + assert_eq!(instrs[4], TacInstr::Jump { label: LabelId(0) }); + assert_eq!(instrs[5], TacInstr::Label(LabelId(2))); + } + + #[test] + fn lower_function_call_passes_args() { + let arg0 = Expr::Binary(Box::new(int(1)), BinOp::Add, Box::new(int(2)), span()); + let expr = Expr::Call(Box::new(ident("sum")), vec![arg0, int(3)], span()); + let mut lowerer = Lowerer::new(); + + let result = lowerer.lower_expr(&expr).unwrap(); + let instrs = lowerer.finish(); + + assert_eq!(result, Operand::Temp(TempId(1))); + assert_eq!( + instrs[1], + TacInstr::Call { + dst: Some(TempId(1)), + fn_name: "sum".to_string(), + args: vec![Operand::Temp(TempId(0)), Operand::Const(ConstValue::Int(3))], + } + ); + } + + #[test] + fn lower_nested_expr_correct_temp_order() { + let rhs = Expr::Binary(Box::new(int(3)), BinOp::Mul, Box::new(int(4)), span()); + let expr = Expr::Binary(Box::new(int(2)), BinOp::Add, Box::new(rhs), span()); + let mut lowerer = Lowerer::new(); + + let result = lowerer.lower_expr(&expr).unwrap(); + + assert_eq!(result, Operand::Temp(TempId(1))); + assert_eq!( + lowerer.finish(), + vec![ + TacInstr::BinOp { + dst: TempId(0), + op: BinOp::Mul, + lhs: Operand::Const(ConstValue::Int(3)), + rhs: Operand::Const(ConstValue::Int(4)), + ty: Type::Int, + }, + TacInstr::BinOp { + dst: TempId(1), + op: BinOp::Add, + lhs: Operand::Const(ConstValue::Int(2)), + rhs: Operand::Temp(TempId(0)), + ty: Type::Int, + }, + ] + ); + } + + #[test] + fn lower_function_keeps_name_params_and_body() { + let decl = Decl::Function( + int_ty(), + "main".to_string(), + vec![(int_ty(), "argc".to_string())], + vec![Stmt::Return(Some(ident("argc")), span())], + span(), + ); + + let func = lower_function(&decl).unwrap(); + + assert_eq!(func.name, "main"); + assert_eq!(func.params, vec!["argc"]); + assert_eq!( + func.instrs, + vec![TacInstr::Return { + val: Some(Operand::Var("argc".to_string())), + ty: Some(Type::Int), + }] + ); + } + + #[test] + 
fn lower_fixed_array_local_populates_var_sizes() { + let decl = Decl::Function( + int_ty(), + "main".to_string(), + vec![], + vec![ + Stmt::VarDecl( + array_ty(Type::Int, Some(3)), + "arr".to_string(), + None, + span(), + ), + Stmt::Return(Some(int(0)), span()), + ], + span(), + ); + + let func = lower_function(&decl).unwrap(); + + assert_eq!(func.var_sizes.get("arr"), Some(&12)); + } + + #[test] + fn lower_fixed_array_index_uses_address_of_base() { + let mut lowerer = Lowerer::new(); + lowerer.declare_var_type("arr", &Type::Array(Box::new(Type::Int), Some(3))); + let expr = Expr::Assign( + Box::new(Expr::Index( + Box::new(ident("arr")), + Box::new(int(1)), + span(), + )), + Box::new(int(7)), + span(), + ); + + lowerer.lower_expr(&expr).unwrap(); + let instrs = lowerer.finish(); + + assert!(instrs.iter().any(|instr| matches!( + instr, + TacInstr::UnOp { + op: UnOp::AddrOf, + src: Operand::Var(name), + .. + } if name == "arr" + ))); + assert!(instrs.iter().any(|instr| matches!( + instr, + TacInstr::BinOp { + op: BinOp::Mul, + rhs: Operand::Const(ConstValue::Int(4)), + .. + } + ))); + assert!(instrs.iter().any(|instr| matches!( + instr, + TacInstr::Copy { + dst: Operand::Deref(_), + .. 
+ } + ))); + } +} diff --git a/src/ir/mod.rs b/src/ir/mod.rs new file mode 100644 index 0000000..785d1b9 --- /dev/null +++ b/src/ir/mod.rs @@ -0,0 +1,3 @@ +pub mod cfg; +pub mod lower; +pub mod tac; diff --git a/src/ir/tac.rs b/src/ir/tac.rs new file mode 100644 index 0000000..50df756 --- /dev/null +++ b/src/ir/tac.rs @@ -0,0 +1,267 @@ +use std::fmt; + +use crate::common::ast::ast::Type; +use crate::common::ast::expr::{BinOp, UnOp}; + +#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)] +pub struct TempId(pub u32); + +#[derive(Debug, Clone)] +pub struct TempGen { + next: u32, +} + +impl TempGen { + pub fn new() -> Self { + Self { next: 0 } + } + + pub fn fresh(&mut self) -> TempId { + let temp = TempId(self.next); + self.next += 1; + temp + } +} + +impl Default for TempGen { + fn default() -> Self { + Self::new() + } +} + +#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)] +pub struct LabelId(pub u32); + +#[derive(Debug, Clone)] +pub struct LabelGen { + next: u32, +} + +impl LabelGen { + pub fn new() -> Self { + Self { next: 0 } + } + + pub fn fresh(&mut self) -> LabelId { + let label = LabelId(self.next); + self.next += 1; + label + } +} + +impl Default for LabelGen { + fn default() -> Self { + Self::new() + } +} + +#[derive(Debug, Clone, PartialEq)] +pub enum ConstValue { + Int(i64), + Double(f64), + Char(char), + String(String), +} + +#[derive(Debug, Clone, PartialEq)] +pub enum Operand { + Temp(TempId), + /// Variavel automatica da funcao atual, residente no stack frame. + Var(String), + /// Objeto com duracao de armazenamento estatica, referenciado por simbolo. + Global(String), + Const(ConstValue), + /// Endereco indireto: o ponteiro guardado em `Operand` interno e lido (ou + /// escrito, quando usado como destino de `Copy`) atraves de deref, ex.: + /// `*p` como destino de `*p = x;` ou como valor em `*p + 1`. 
+ Deref(Box<Operand>), +} + +#[derive(Debug, Clone, PartialEq)] +pub enum TacInstr { + BinOp { + dst: TempId, + op: BinOp, + lhs: Operand, + rhs: Operand, + ty: Type, + }, + UnOp { + dst: TempId, + op: UnOp, + src: Operand, + ty: Type, + }, + Copy { + dst: Operand, + src: Operand, + ty: Type, + }, + Jump { + label: LabelId, + }, + CondJump { + cond: Operand, + then_label: LabelId, + else_label: LabelId, + }, + Call { + dst: Option<TempId>, + fn_name: String, + args: Vec<Operand>, + }, + Return { + val: Option<Operand>, + ty: Option<Type>, + }, + Label(LabelId), +} + +#[derive(Debug, Clone, PartialEq, Default)] +pub struct TacFunction { + pub name: String, + pub params: Vec<String>, + pub instrs: Vec<TacInstr>, + /// Size in bytes (already rounded up to a multiple of 8) of variables + /// whose value does not fit in an 8-byte scalar slot; today, only + /// local structs. Variables absent from this map use the default size (8 + /// bytes) in the codegen. + pub var_sizes: std::collections::HashMap, +} + +#[derive(Debug, Clone, PartialEq)] +pub struct TacGlobal { + pub name: String, + /// Space reserved by the backend. It is at least one quadword because the + /// current scalar backend performs 64-bit loads/stores. + pub size: i64, + /// `None` represents the implicit static zero initialization.
+ pub init: Option<ConstValue>, +} + +#[derive(Debug, Clone, PartialEq, Default)] +pub struct TacProgram { + pub globals: Vec<TacGlobal>, + pub functions: Vec<TacFunction>, +} + +impl fmt::Display for TempId { + fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { + write!(f, "t{}", self.0) + } +} + +impl fmt::Display for LabelId { + fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { + write!(f, "L{}", self.0) + } +} + +impl fmt::Display for ConstValue { + fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { + match self { + ConstValue::Int(value) => write!(f, "{value}"), + ConstValue::Double(value) => write!(f, "{value}"), + ConstValue::Char(value) => write!(f, "{value:?}"), + ConstValue::String(value) => write!(f, "{value:?}"), + } + } +} + +impl fmt::Display for Operand { + fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { + match self { + Operand::Temp(temp) => write!(f, "{temp}"), + Operand::Var(name) => write!(f, "{name}"), + Operand::Global(name) => write!(f, "@{name}"), + Operand::Const(value) => write!(f, "{value}"), + Operand::Deref(inner) => write!(f, "*{inner}"), + } + } +} + +impl fmt::Display for TacInstr { + fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { + match self { + TacInstr::BinOp { + dst, + op, + lhs, + rhs, + ty: _, + } => { + write!(f, "{dst} = {lhs} {} {rhs}", bin_op_symbol(op)) + } + TacInstr::UnOp { + dst, + op, + src, + ty: _, + } => { + write!(f, "{dst} = {}{src}", un_op_symbol(op)) + } + TacInstr::Copy { dst, src, ty: _ } => write!(f, "{dst} = {src}"), + TacInstr::Jump { label } => write!(f, "goto {label}"), + TacInstr::CondJump { + cond, + then_label, + else_label, + } => write!(f, "if {cond} goto {then_label} else goto {else_label}"), + TacInstr::Call { dst, fn_name, args } => { + if let Some(dst) = dst { + write!(f, "{dst} = ")?; + } + + write!(f, "call {fn_name}(")?; + for (index, arg) in args.iter().enumerate() { + if index > 0 { + write!(f, ", ")?; + } + write!(f, "{arg}")?; + } + write!(f, ")") + } + TacInstr::Return { val, 
ty: _ } => { + if let Some(val) = val { + write!(f, "return {val}") + } else { + write!(f, "return") + } + } + TacInstr::Label(label) => write!(f, "{label}:"), + } + } +} + +fn bin_op_symbol(op: &BinOp) -> &'static str { + match op { + BinOp::Add => "+", + BinOp::Sub => "-", + BinOp::Mul => "*", + BinOp::Div => "/", + BinOp::Mod => "%", + BinOp::Eq => "==", + BinOp::Neq => "!=", + BinOp::Less => "<", + BinOp::Greater => ">", + BinOp::Leq => "<=", + BinOp::Geq => ">=", + BinOp::And => "&&", + BinOp::Or => "||", + BinOp::BitAnd => "&", + BinOp::BitOr => "|", + BinOp::BitXor => "^", + BinOp::Shl => "<<", + BinOp::Shr => ">>", + } +} + +fn un_op_symbol(op: &UnOp) -> &'static str { + match op { + UnOp::Neg => "-", + UnOp::Not => "!", + UnOp::BitNot => "~", + UnOp::Deref => "*", + UnOp::AddrOf => "&", + } +} diff --git a/src/lexer/rules/identifiers.rs b/src/lexer/rules/identifiers.rs index ad8a567..d009338 100644 --- a/src/lexer/rules/identifiers.rs +++ b/src/lexer/rules/identifiers.rs @@ -67,6 +67,8 @@ fn lookup_keyword(ident: &str) -> Option { "volatile" => Some(TokenKind::Volatile), "inline" => Some(TokenKind::Inline), "sizeof" => Some(TokenKind::Sizeof), + "true" => Some(TokenKind::IntLiteral(1)), + "false" => Some(TokenKind::IntLiteral(0)), // Não é keyword — é identificador de usuário _ => None, diff --git a/src/lexer/scanner.rs b/src/lexer/scanner.rs index 55e4fb5..161c144 100644 --- a/src/lexer/scanner.rs +++ b/src/lexer/scanner.rs @@ -1,3 +1,4 @@ +use crate::common::builtins::BuiltinsLibs; use crate::common::errors::{ error_data::Span, types::{CompilerError, LexicalError, LexicalErrorKind}, @@ -13,6 +14,7 @@ pub struct Scanner { pub src: SourceFile, pub tokens: Vec, pub diagnostics: Vec, + pub builtins: BuiltinsLibs, /// Pilha de delimitadores abertos ainda não fechados: (char, linha, coluna) delimiter_stack: Vec<(char, usize, usize)>, /// Posição (em bytes) do início do token sendo reconhecido; capturada antes de consumir o primeiro char. 
@@ -26,6 +28,7 @@ impl Scanner { src, tokens: Vec::new(), diagnostics: Vec::new(), + builtins: BuiltinsLibs::default(), delimiter_stack: Vec::new(), token_start: 0, } @@ -33,6 +36,8 @@ impl Scanner { /// Runs the scanner until the end of the file, populating `tokens` and `diagnostics`, and returns the tokens produced. pub fn scan(&mut self) -> &[Token] { + self.preprocess_builtin_includes(); + while !self.src.is_at_end() { self.skip_whitespaces_and_comments(); if self.src.is_at_end() { @@ -61,6 +66,32 @@ impl Scanner { &self.tokens } + fn preprocess_builtin_includes(&mut self) { + let source = self.src.source.as_str(); + if !source.contains("#include <stdbool.h>") && !source.contains("#include <stdio.h>") { + return; + } + + let mut rewritten = String::new(); + for line in source.lines() { + let trimmed = line.trim(); + if trimmed == "#include <stdbool.h>" { + self.builtins.stdbool = true; + rewritten.push_str("typedef int bool;\n"); + } else if trimmed == "#include <stdio.h>" { + self.builtins.stdio = true; + rewritten.push('\n'); + } else { + rewritten.push_str(line); + rewritten.push('\n'); + } + } + + let old_path = self.src.path.clone(); + self.src = SourceFile::from_string(rewritten); + self.src.path = old_path; + } + /// Reads the next char and dispatches to the correct lexing method for that character.
fn next_token(&mut self) { let line = self.src.line(); diff --git a/src/lib.rs b/src/lib.rs index d0722b5..b42801f 100644 --- a/src/lib.rs +++ b/src/lib.rs @@ -1,6 +1,7 @@ pub mod analyser; pub mod codegen; pub mod common; +pub mod ir; pub mod lexer; pub mod parser; diff --git a/src/main.rs b/src/main.rs index 73bb8c5..fd7b7bf 100644 --- a/src/main.rs +++ b/src/main.rs @@ -1,12 +1,16 @@ -use crusty::analyser::analyse; +use crusty::analyser::analyse_with_builtins; +use crusty::codegen::inter::opt::{pipeline_for_level, OptLevel}; +use crusty::codegen::inter::Cfg; +use crusty::codegen::last; use crusty::common::ast::pretty::pretty_program; use crusty::common::errors::report::{Report, ToReport}; use crusty::common::input::source::SourceFile; +use crusty::ir::lower::lower_program; use crusty::lexer::scanner::Scanner; use crusty::parser::Parser; use std::env; -use std::path::PathBuf; -use std::process::exit; +use std::path::{Path, PathBuf}; +use std::process::{exit, Command}; fn main() -> std::io::Result<()> { let raw: Vec<_> = env::args().collect(); @@ -17,12 +21,21 @@ fn main() -> std::io::Result<()> { None => { eprintln!("Usage: crusty [flags] "); eprintln!("Flags:"); - eprintln!(" --dump-tokens List all tokens emitted by the lexer"); - eprintln!(" --dump-ast Pretty-print the AST"); - eprintln!(" --dump-ir Dump TAC IR (not yet implemented)"); - eprintln!(" --only-lex Stop after lexing"); - eprintln!(" --only-parse Stop after parsing"); - eprintln!(" --only-semantic Stop after semantic analysis"); + eprintln!(" --dump-tokens List all tokens emitted by the lexer"); + eprintln!(" --dump-ast Pretty-print the AST"); + eprintln!(" --dump-ir Dump TAC IR (not yet implemented)"); + eprintln!(" --only-lex Stop after lexing"); + eprintln!(" --only-parse Stop after parsing"); + eprintln!(" --only-semantic Stop after semantic analysis"); + eprintln!(" -o Set the output file path"); + eprintln!(" --out-dir Set the output directory"); + eprintln!(" --out-name Set the output file 
name"); + eprintln!(" --emit=asm Stop after emitting x86-64 assembly (.s)"); + eprintln!(" --emit=obj Stop after assembling an object file (.o)"); + eprintln!(" --emit=exe Link a runnable executable (default)"); + eprintln!(" -S, --emit-asm Alias for --emit=asm"); + eprintln!(" -O0|-O1|-O2|-O3 Set optimization level"); + eprintln!(" --opt-level 0|1|2|3 Set optimization level"); exit(64); } }; @@ -40,28 +53,65 @@ fn main() -> std::io::Result<()> { // ── CLI arg parsing ────────────────────────────────────────────────────────── +/// O que o pipeline deve produzir como artefato final. +#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)] +enum EmitKind { + /// Para após gerar o assembly x86-64 (`.s`). + Asm, + /// Monta o assembly em um objeto ELF (`.o`) via `gcc -c`, sem linkar. + Obj, + /// Monta e linka um executável ELF rodável (padrão). + #[default] + Exe, +} + +impl EmitKind { + fn parse(value: &str) -> Option { + match value { + "asm" => Some(Self::Asm), + "obj" => Some(Self::Obj), + "exe" => Some(Self::Exe), + _ => None, + } + } +} + struct CliArgs { input_file: Option, + output_file: Option, + output_dir: Option, + output_name: Option, dump_tokens: bool, dump_ast: bool, dump_ir: bool, only_lex: bool, only_parse: bool, only_semantic: bool, + emit: EmitKind, + opt_level: OptLevel, } impl CliArgs { fn parse(args: &[String]) -> Self { let mut cli = CliArgs { input_file: None, + output_file: None, + output_dir: None, + output_name: None, dump_tokens: false, dump_ast: false, dump_ir: false, only_lex: false, only_parse: false, only_semantic: false, + emit: EmitKind::default(), + opt_level: OptLevel::default(), }; - for arg in args.iter().skip(1) { + + let mut i = 1; + while i < args.len() { + let arg = &args[i]; + match arg.as_str() { "--dump-tokens" => cli.dump_tokens = true, "--dump-ast" => cli.dump_ast = true, @@ -69,6 +119,60 @@ impl CliArgs { "--only-lex" => cli.only_lex = true, "--only-parse" => cli.only_parse = true, "--only-semantic" => cli.only_semantic = 
true, + "-S" | "--emit-asm" => cli.emit = EmitKind::Asm, + "-o" => { + i += 1; + let Some(path) = args.get(i) else { + eprintln!("error: missing value for -o"); + exit(64); + }; + cli.output_file = Some(path.clone()); + } + "--out-dir" => { + i += 1; + let Some(path) = args.get(i) else { + eprintln!("error: missing value for --out-dir"); + exit(64); + }; + cli.output_dir = Some(path.clone()); + } + "--out-name" => { + i += 1; + let Some(name) = args.get(i) else { + eprintln!("error: missing value for --out-name"); + exit(64); + }; + cli.output_name = Some(name.clone()); + } + _ if arg.starts_with("--emit=") => { + let value = arg.strip_prefix("--emit=").unwrap(); + cli.emit = EmitKind::parse(value).unwrap_or_else(|| { + eprintln!("error: invalid --emit value: {value} (expected asm|obj|exe)"); + exit(64); + }); + } + "-O" | "--opt-level" => { + i += 1; + let Some(level) = args.get(i) else { + eprintln!("error: missing value for {arg}"); + exit(64); + }; + cli.opt_level = OptLevel::parse(level).unwrap_or_else(|| { + eprintln!("error: invalid optimization level: {level}"); + exit(64); + }); + } + _ if arg + .strip_prefix("-O") + .filter(|level| !level.is_empty()) + .is_some() => + { + let level = arg.strip_prefix("-O").unwrap(); + cli.opt_level = OptLevel::parse(level).unwrap_or_else(|| { + eprintln!("error: invalid optimization level: {arg}"); + exit(64); + }); + } _ if arg.starts_with("--") => { eprintln!("error: unknown flag '{arg}'"); exit(64); @@ -82,6 +186,8 @@ impl CliArgs { } _ => cli.input_file = Some(arg.clone()), } + + i += 1; } cli } @@ -100,6 +206,24 @@ impl ToReport for DiagnosticError { } } +#[derive(Debug)] +struct IoError(String); + +impl ToReport for IoError { + fn to_report(&self) -> Report { + Report::new(&format!("I/O error: {}", self.0)) + } +} + +#[derive(Debug)] +struct LinkError(String); + +impl ToReport for LinkError { + fn to_report(&self) -> Report { + Report::new(&format!("link/assemble error: {}", self.0)) + } +} + fn run(source: 
SourceFile, args: &CliArgs) -> Result<(), Box<dyn ToReport>> { // ── Stage 1: Lex ───────────────────────────────────────────────────────── let mut scanner = Scanner::new(source); @@ -149,11 +273,19 @@ fn run(source: SourceFile, args: &CliArgs) -> Result<(), Box<dyn ToReport>> { } // ── Stage 3: Semantic ──────────────────────────────────────────────────── - let sem_errors = analyse(&program); - let sem_count = sem_errors.len(); + let sem_diagnostics = analyse_with_builtins(&program, scanner.builtins); + let sem_warnings: Vec<_> = sem_diagnostics.iter().filter(|d| d.is_warning()).collect(); + if !sem_warnings.is_empty() { + eprintln!("\n=== Semantic Warnings ({}) ===", sem_warnings.len()); + for w in &sem_warnings { + print_warning_report(&w.to_report()); + } + } + + let sem_count = sem_diagnostics.iter().filter(|d| d.is_error()).count(); if sem_count > 0 { eprintln!("\n=== Semantic Errors ({sem_count}) ==="); - for e in &sem_errors { + for e in sem_diagnostics.iter().filter(|d| d.is_error()) { print_report(&e.to_report()); } return Err(Box::new(DiagnosticError { count: sem_count })); @@ -163,6 +295,127 @@ fn run(source: SourceFile, args: &CliArgs) -> Result<(), Box<dyn ToReport>> { return Ok(()); } + // ── Stage 4: Optimization ──────────────────────────────────────────────── + // A geração de TAC/CFG ainda não está integrada; a etapa de otimização + // opera sobre uma instância vazia de `Cfg` até essa integração existir. 
+ let mut cfg = Cfg::new(); + let opt_pipeline = pipeline_for_level(args.opt_level); + opt_pipeline.run(&mut cfg, 10); + + // ── Stage 5: Code generation (x86-64 / AT&T) ───────────────────────────── + let tac_program = lower_program(&program).map_err(|e| Box::new(e) as Box<dyn ToReport>)?; + let asm = last::emit_program(&tac_program).map_err(|e| Box::new(e) as Box<dyn ToReport>)?; + + let output_path = output_path_for( + &args.input_file, + &args.output_file, + &args.output_dir, + &args.output_name, + args.emit, + ); + emit_artifact(&asm, &output_path, args.emit)?; + eprintln!( + "emitted {}: {}", + emit_kind_label(args.emit), + output_path.display() + ); + + Ok(()) +} + +/// Resolve o caminho de saída final, respeitando `-o` quando fornecido e +/// caindo para um default específico de cada `EmitKind` caso contrário. +fn output_path_for( + input: &Option<String>, + output_override: &Option<String>, + output_dir: &Option<String>, + output_name: &Option<String>, + emit: EmitKind, +) -> PathBuf { + if let Some(path) = output_override { + return PathBuf::from(path); + } + + let default_name = match emit { + EmitKind::Exe => "a.out".to_string(), + EmitKind::Asm | EmitKind::Obj => { + let input = input.clone().unwrap_or_else(|| "crusty.out".to_string()); + let mut path = PathBuf::from(input); + path.set_extension(if emit == EmitKind::Asm { "s" } else { "o" }); + return apply_output_dir(path, output_dir); + } + }; + + let path = PathBuf::from(output_name.clone().unwrap_or(default_name)); + apply_output_dir(path, output_dir) +} + +fn apply_output_dir(mut path: PathBuf, output_dir: &Option<String>) -> PathBuf { + if let Some(dir) = output_dir { + let mut full = PathBuf::from(dir); + full.push(path); + path = full; + } + path +} + +fn emit_kind_label(emit: EmitKind) -> &'static str { + match emit { + EmitKind::Asm => "assembly", + EmitKind::Obj => "object file", + EmitKind::Exe => "executable", + } +} + +/// Escreve o assembly gerado no destino final, invocando `gcc` para montar +/// (`--emit=obj`) ou montar+linkar (`--emit=exe`) quando 
necessário. +fn emit_artifact(asm: &str, output_path: &Path, emit: EmitKind) -> Result<(), Box<dyn ToReport>> { + if emit == EmitKind::Asm { + return std::fs::write(output_path, asm) + .map_err(|e| Box::new(IoError(e.to_string())) as Box<dyn ToReport>); + } + + let asm_path = write_temp_asm(asm)?; + let result = match emit { + EmitKind::Obj => run_gcc(&["-c", "-x", "assembler-with-cpp"], &asm_path, output_path), + EmitKind::Exe => run_gcc(&[], &asm_path, output_path), + EmitKind::Asm => unreachable!(), + }; + let _ = std::fs::remove_file(&asm_path); + result +} + +/// Grava o assembly em um arquivo temporário único, usado como entrada do `gcc`. +fn write_temp_asm(asm: &str) -> Result<PathBuf, Box<dyn ToReport>> { + let mut path = env::temp_dir(); + path.push(format!("crusty_{}.s", std::process::id())); + std::fs::write(&path, asm) + .map_err(|e| Box::new(IoError(e.to_string())) as Box<dyn ToReport>)?; + Ok(path) +} + +/// Invoca `gcc` com flags extras (ex.: `-c -x assembler-with-cpp` para gerar +/// objeto) sobre `asm_path`, escrevendo o resultado em `output_path`. 
+fn run_gcc( + extra_args: &[&str], + asm_path: &Path, + output_path: &Path, +) -> Result<(), Box<dyn ToReport>> { + let status = Command::new("gcc") + .args(extra_args) + .arg(asm_path) + .arg("-o") + .arg(output_path) + .status() + .map_err(|e| { + Box::new(LinkError(format!("falha ao invocar gcc: {e}"))) as Box<dyn ToReport> + })?; + + if !status.success() { + return Err(Box::new(LinkError(format!( + "gcc terminou com status {status}" + )))); + } Ok(()) } @@ -190,7 +443,15 @@ fn dump_ast(program: &crusty::common::ast::ast::Program) { // ── Error reporting ─────────────────────────────────────────────────────────── fn print_report(report: &Report) { - eprintln!(" error: {}", report.message); + print_report_with_prefix(report, "error"); +} + +fn print_warning_report(report: &Report) { + print_report_with_prefix(report, "warning"); +} + +fn print_report_with_prefix(report: &Report, prefix: &str) { + eprintln!(" {prefix}: {}", report.message); if let Some(span) = &report.span { eprintln!(" --> {}:{}", span.line, span.column_start); } @@ -213,3 +474,136 @@ fn report_and_exit(e: Box<dyn ToReport>) -> ! 
{ } exit(74); } + +#[cfg(test)] +mod tests { + use super::*; + + fn args(values: &[&str]) -> Vec<String> { + values.iter().map(|value| value.to_string()).collect() + } + + #[test] + fn parses_default_opt_level() { + let parsed = CliArgs::parse(&args(&["crusty", "main.c"])); + assert_eq!(parsed.input_file, Some("main.c".to_string())); + assert_eq!(parsed.opt_level, OptLevel::O0); + } + + #[test] + fn parses_short_opt_level_forms() { + assert_eq!( + CliArgs::parse(&args(&["crusty", "-O2", "main.c"])).opt_level, + OptLevel::O2 + ); + assert_eq!( + CliArgs::parse(&args(&["crusty", "-O", "3", "main.c"])).opt_level, + OptLevel::O3 + ); + } + + #[test] + fn parses_long_opt_level_form() { + let parsed = CliArgs::parse(&args(&["crusty", "--opt-level", "1", "main.c"])); + assert_eq!(parsed.opt_level, OptLevel::O1); + } + + #[test] + fn defaults_to_emit_exe() { + let parsed = CliArgs::parse(&args(&["crusty", "main.c"])); + assert_eq!(parsed.emit, EmitKind::Exe); + assert_eq!(parsed.output_file, None); + assert_eq!(parsed.output_dir, None); + assert_eq!(parsed.output_name, None); + } + + #[test] + fn parses_output_flag() { + let parsed = CliArgs::parse(&args(&["crusty", "-o", "prog", "main.c"])); + assert_eq!(parsed.output_file, Some("prog".to_string())); + assert_eq!(parsed.input_file, Some("main.c".to_string())); + } + + #[test] + fn parses_output_dir_and_name_flags() { + let parsed = CliArgs::parse(&args(&[ + "crusty", + "--out-dir", + "build", + "--out-name", + "demo", + "main.c", + ])); + assert_eq!(parsed.output_dir, Some("build".to_string())); + assert_eq!(parsed.output_name, Some("demo".to_string())); + assert_eq!(parsed.input_file, Some("main.c".to_string())); + } + + #[test] + fn parses_emit_flag_variants() { + assert_eq!( + CliArgs::parse(&args(&["crusty", "--emit=asm", "main.c"])).emit, + EmitKind::Asm + ); + assert_eq!( + CliArgs::parse(&args(&["crusty", "--emit=obj", "main.c"])).emit, + EmitKind::Obj + ); + assert_eq!( + CliArgs::parse(&args(&["crusty", "--emit=exe", 
"main.c"])).emit, + EmitKind::Exe + ); + assert_eq!( + CliArgs::parse(&args(&["crusty", "-S", "main.c"])).emit, + EmitKind::Asm + ); + assert_eq!( + CliArgs::parse(&args(&["crusty", "--emit-asm", "main.c"])).emit, + EmitKind::Asm + ); + } + + #[test] + fn output_path_defaults_per_emit_kind() { + let input = Some("foo/main.c".to_string()); + assert_eq!( + output_path_for(&input, &None, &None, &None, EmitKind::Asm), + PathBuf::from("foo/main.s") + ); + assert_eq!( + output_path_for(&input, &None, &None, &None, EmitKind::Obj), + PathBuf::from("foo/main.o") + ); + assert_eq!( + output_path_for(&input, &None, &None, &None, EmitKind::Exe), + PathBuf::from("a.out") + ); + } + + #[test] + fn output_path_respects_override() { + let input = Some("main.c".to_string()); + let output = Some("custom_out".to_string()); + for emit in [EmitKind::Asm, EmitKind::Obj, EmitKind::Exe] { + assert_eq!( + output_path_for(&input, &output, &None, &None, emit), + PathBuf::from("custom_out") + ); + } + } + + #[test] + fn output_path_joins_directory_and_name() { + let input = Some("src/main.c".to_string()); + assert_eq!( + output_path_for( + &input, + &None, + &Some("build".to_string()), + &Some("demo".to_string()), + EmitKind::Exe, + ), + PathBuf::from("build/demo") + ); + } +} diff --git a/src/parser/parser.rs b/src/parser/parser.rs index 8481031..7105a4a 100644 --- a/src/parser/parser.rs +++ b/src/parser/parser.rs @@ -52,7 +52,11 @@ impl Parser { loop { // Primeiro tratamos todos os postfix, pois têm maior precedência efetiva. - if postfix::try_parse_postfix(self, &mut lhs)? { + // `try_parse_postfix` toma `lhs` por valor e devolve o novo nó (ou o mesmo + // `lhs` intocado quando não há postfix), evitando clones O(n²) em cadeias. 
+ let (next_lhs, consumed) = postfix::try_parse_postfix(self, lhs)?; + lhs = next_lhs; + if consumed { continue; } diff --git a/src/parser/rules/declarations/types.rs b/src/parser/rules/declarations/types.rs index b63f867..8820b1f 100644 --- a/src/parser/rules/declarations/types.rs +++ b/src/parser/rules/declarations/types.rs @@ -4,20 +4,35 @@ use crate::lexer::tokens::token_kind::TokenKind; use crate::parser::parser::Parser; /// Consome sufixos `[expr?]` após o nome de uma variável e envolve o tipo em `Type::Array`. -/// Suporta múltiplas dimensões: `int arr[3][4]` → `Array(Array(Int))`. -/// O tamanho é consumido mas não armazenado (AST atual não possui campo de tamanho). +/// Suporta múltiplas dimensões: `int arr[3][4]` → `Array(Array(Int, Some(4)), Some(3))`. pub fn parse_array_suffix( parser: &mut Parser, mut qty: QualifierType, ) -> Result { + let mut dimensions = Vec::new(); + while parser.check(&TokenKind::LeftBracket) { parser.advance(); - if !parser.check(&TokenKind::RightBracket) { + + let size = if parser.check(&TokenKind::RightBracket) { + None + } else if let TokenKind::IntLiteral(value) = parser.peek_kind() { + let value = *value; parser.parse_expr(0)?; - } + usize::try_from(value).ok() + } else { + parser.parse_expr(0)?; + None + }; + parser.expect(&TokenKind::RightBracket, "']' ao fim do tamanho do array")?; - qty.ty = Type::Array(Box::new(qty.ty)); + dimensions.push(size); + } + + for size in dimensions.into_iter().rev() { + qty.ty = Type::Array(Box::new(qty.ty), size); } + Ok(qty) } diff --git a/src/parser/rules/expressions/postfix.rs b/src/parser/rules/expressions/postfix.rs index bcfbfef..64ee33d 100644 --- a/src/parser/rules/expressions/postfix.rs +++ b/src/parser/rules/expressions/postfix.rs @@ -4,8 +4,13 @@ use crate::lexer::tokens::token_kind::TokenKind; use crate::parser::parser::Parser; /// Tenta parsear uma operação postfix (`()`, `[]`, `.`, `->`, `++`, `--`) sobre `lhs`. 
-/// Retorna `Ok(true)` se consumiu um postfix, `Ok(false)` se não há postfix aplicável. -pub fn try_parse_postfix(parser: &mut Parser, lhs: &mut Expr) -> Result { +/// +/// Recebe `lhs` **por valor** para encapsulá-lo em um novo nó movendo (e não clonando) +/// a subexpressão. Encadeamentos como `a.b.c.d.e.f` assim crescem em O(n) em vez de O(n²). +/// +/// Retorna `Ok((expr, true))` se consumiu um postfix (com o novo nó já construído) ou +/// `Ok((lhs, false))` quando não há postfix aplicável (devolvendo `lhs` intacto). +pub fn try_parse_postfix(parser: &mut Parser, lhs: Expr) -> Result<(Expr, bool), CompilerError> { match parser.peek_kind() { TokenKind::LeftParen => { let start = lhs.span(); @@ -25,8 +30,8 @@ pub fn try_parse_postfix(parser: &mut Parser, lhs: &mut Expr) -> Result { let start = lhs.span(); @@ -36,10 +41,11 @@ pub fn try_parse_postfix(parser: &mut Parser, lhs: &mut Expr) -> Result { + let start = lhs.span(); let op = parser.advance().clone(); let field_token = parser.advance().clone(); let TokenKind::Identifier(field_name) = field_token.kind.clone() else { @@ -50,26 +56,27 @@ pub fn try_parse_postfix(parser: &mut Parser, lhs: &mut Expr) -> Result { + let start = lhs.span(); let op = parser.advance().clone(); - let span = parser.join_span(lhs.span(), parser.span_of(&op)); + let span = parser.join_span(start, parser.span_of(&op)); let kind = if op.kind == TokenKind::PlusPlus { PostfixOp::Inc } else { PostfixOp::Dec }; - *lhs = Expr::Postfix(kind, Box::new(lhs.clone()), span); - Ok(true) + let new_expr = Expr::Postfix(kind, Box::new(lhs), span); + Ok((new_expr, true)) } - _ => Ok(false), + _ => Ok((lhs, false)), } } diff --git a/src/tests/analyzer_test.rs b/src/tests/analyzer_test.rs index 614c68b..eff9392 100644 --- a/src/tests/analyzer_test.rs +++ b/src/tests/analyzer_test.rs @@ -147,6 +147,7 @@ mod test { #[test] fn test_return_type_mismatch_error() { + // int f() { return "texto"; } → incompatível (char* não é conversível para int) let mut 
analyser = SemanticAnalyser::new(); let span = dummy_span(); @@ -156,7 +157,7 @@ mod test { is_unsigned: false, }; - let expr_errada = Expr::Literal(Literal::Double(2.5), span.clone()); + let expr_errada = Expr::Literal(Literal::String("texto".to_string()), span.clone()); let stmt_return = Stmt::Return(Some(expr_errada), span.clone()); let funcao_ast = Decl::Function( @@ -174,7 +175,7 @@ mod test { assert_eq!( analyser.diagnostics.len(), 1, - "Deveria ter detectado incopatibilidade: Int x Double." + "Deveria ter detectado incompatibilidade: int x char* (return de string em função int)." ); } diff --git a/src/tests/mod.rs b/src/tests/mod.rs index f596579..f2cd75f 100644 --- a/src/tests/mod.rs +++ b/src/tests/mod.rs @@ -6,7 +6,9 @@ mod lexical_test; mod literals_test; mod parser_file_test; mod parser_test; +mod peephole_test; mod semantic_test; mod source_test; mod symbol_test; +mod tac_test; mod token_test; diff --git a/src/tests/parser_test.rs b/src/tests/parser_test.rs index 711c9ea..b4285ef 100644 --- a/src/tests/parser_test.rs +++ b/src/tests/parser_test.rs @@ -2,7 +2,7 @@ mod tests { use crate::common::ast::ast::{QualifierType, Type}; use crate::common::ast::decl::Decl; - use crate::common::ast::expr::{BinOp, Expr, Literal, PostfixOp, PrefixOp}; + use crate::common::ast::expr::{BinOp, Expr, Literal, MemberAccess, PostfixOp, PrefixOp}; use crate::common::ast::stmt::{Stmt, SwitchLabel}; use crate::common::input::span::ByteSpan; use crate::lexer::tokens::token::Token; @@ -140,6 +140,46 @@ mod tests { assert_eq!(args.len(), 2); } + #[test] + fn parses_deep_member_access_chain_left_associative() { + // a.b.c.d deve produzir Member(Member(Member(a, b), c), d), + // encadeamento left-assoc que cresce em O(n) (regressão do issue #86). 
+ let tokens = vec![ + ident("a", 1), + tk(TokenKind::Dot, 2), + ident("b", 3), + tk(TokenKind::Dot, 4), + ident("c", 5), + tk(TokenKind::Dot, 6), + ident("d", 7), + eof(8), + ]; + + let expr = Parser::new(tokens) + .parse_expr(0) + .expect("encadeamento de membros válido"); + + let Expr::Member(inner, MemberAccess::Direct, field, _) = expr else { + panic!("esperava Member no topo do encadeamento"); + }; + assert_eq!(field, "d"); + + let Expr::Member(inner, MemberAccess::Direct, field, _) = *inner else { + panic!("esperava Member intermediário"); + }; + assert_eq!(field, "c"); + + let Expr::Member(inner, MemberAccess::Direct, field, _) = *inner else { + panic!("esperava Member intermediário"); + }; + assert_eq!(field, "b"); + + assert!(matches!(*inner, Expr::Ident(..))); + if let Expr::Ident(name, _) = *inner { + assert_eq!(name, "a"); + } + } + #[test] fn parses_cast_expression() { let tokens = vec![ @@ -1502,7 +1542,7 @@ mod tests { panic!("esperava GlobalVar"); }; assert_eq!(name, "arr"); - assert!(matches!(qty.ty, Type::Array(_))); + assert!(matches!(qty.ty, Type::Array(_, Some(10)))); } #[test] @@ -1532,7 +1572,7 @@ mod tests { panic!("esperava VarDecl"); }; assert_eq!(name, "arr"); - assert!(matches!(qty.ty, Type::Array(_))); + assert!(matches!(qty.ty, Type::Array(_, Some(5)))); } #[test] @@ -1555,10 +1595,10 @@ mod tests { let Decl::GlobalVar(qty, _, None, _) = &prog.decls[0] else { panic!("esperava GlobalVar"); }; - let Type::Array(inner) = &qty.ty else { + let Type::Array(inner, Some(3)) = &qty.ty else { panic!("esperava Array externo"); }; - assert!(matches!(**inner, Type::Array(_))); + assert!(matches!(**inner, Type::Array(_, Some(4)))); } #[test] @@ -1579,7 +1619,7 @@ mod tests { let Decl::GlobalVar(qty, _, _, _) = &prog.decls[0] else { panic!("esperava GlobalVar"); }; - assert!(matches!(qty.ty, Type::Array(_))); + assert!(matches!(qty.ty, Type::Array(_, None))); } // ── struct ──────────────────────────────────────────────────────────────── diff --git 
a/src/tests/peephole_test.rs b/src/tests/peephole_test.rs new file mode 100644 index 0000000..1b0d6b2 --- /dev/null +++ b/src/tests/peephole_test.rs @@ -0,0 +1,48 @@ +use crate::codegen::last::peephole::PeepholePass; + +// Função auxiliar para rodar o Peephole rápido nos testes +fn run_peephole(instrs: Vec<&str>) -> Vec<String> { + let mut asm: Vec<String> = instrs.into_iter().map(|s| s.to_string()).collect(); + let pass = PeepholePass::new(); + pass.run(&mut asm); + asm +} + +#[test] +fn test_remove_add_sub_zero() { + // Padrão 3: Soma ou subtração por zero deve sumir + let asm = run_peephole(vec!["addq $0, %rax", "subq $0, %rcx", "movq %rax, %rbx"]); + assert_eq!(asm, vec!["movq %rax, %rbx"]); +} + +#[test] +fn test_remove_redundant_mov() { + // Padrões 1 e 2: Mov de A pra B seguido de B pra A + let asm = run_peephole(vec![ + "movq %rax, %rbx", + "movq %rbx, %rax", // Esse tem que sumir + "ret", + ]); + assert_eq!(asm, vec!["movq %rax, %rbx", "ret"]); +} + +#[test] +fn test_remove_jump_to_next_line() { + // Padrão 5: Pulo para a linha imediatamente abaixo + let asm = run_peephole(vec!["jmp .L_main_L1", ".L_main_L1:", "ret"]); + assert_eq!(asm, vec![".L_main_L1:", "ret"]); +} + +#[test] +fn test_optimize_mul_power_of_two() { + // Padrão 4: Multiplicação por 8 (2^3) deve virar shlq $3 + let asm = run_peephole(vec!["movq $8, %rcx", "imulq %rcx, %rax"]); + assert_eq!(asm, vec![" shlq $3, %rax"]); +} + +#[test] +fn test_optimize_cmp_zero() { + // Padrão 6: Comparar com 0 deve virar testq + let asm = run_peephole(vec!["movq $0, %rcx", "cmpq %rcx, %rax"]); + assert_eq!(asm, vec![" testq %rax, %rax"]); +} diff --git a/src/tests/semantic_test.rs b/src/tests/semantic_test.rs index 2e3b523..40c71f8 100644 --- a/src/tests/semantic_test.rs +++ b/src/tests/semantic_test.rs @@ -5,6 +5,7 @@ mod tests { use crate::common::ast::decl::Decl; use crate::common::ast::expr::{Expr, Literal, MemberAccess}; use crate::common::ast::stmt::Stmt; + use crate::common::builtins::BuiltinsLibs; use 
crate::common::errors::error_data::Span; use crate::common::errors::types::{ CompilerError, Diagnostic, SemanticErrorKind, SemanticWarningKind, @@ -43,10 +44,14 @@ mod tests { Expr::Ident(name.to_string(), span()) } + fn call(name: &str, args: Vec<Expr>) -> Expr { + Expr::Call(Box::new(ident(name)), args, span()) + } + fn program(stmts: Vec<Stmt>) -> Program { Program { decls: vec![Decl::Function( - qty(Type::Int), + qty(Type::Void), "main".into(), vec![], stmts, @@ -393,6 +398,55 @@ mod tests { assert!(analyse(&prog).is_empty()); } + #[test] + fn printf_without_stdio_header_emits_missing_header_error() { + let prog = program(vec![Stmt::ExprStmt( + call( + "printf", + vec![ + Expr::Literal(Literal::String("%d".into()), span()), + int_lit(1), + ], + ), + span(), + )]); + + let mut analyser = SemanticAnalyser::with_builtins(BuiltinsLibs::default()); + analyser.analyse_program(&prog); + + assert!(analyser.diagnostics.iter().any(|e| matches!( + e, + CompilerError::Semantic(se) + if matches!(&se.kind, SemanticErrorKind::MissingLibraryHeader { header, symbol } + if header == "stdio.h" && symbol == "printf") + ))); + } + + #[test] + fn printf_with_stdio_header_accepts_basic_call() { + let prog = program(vec![Stmt::ExprStmt( + call( + "printf", + vec![ + Expr::Literal(Literal::String("%d".into()), span()), + int_lit(1), + ], + ), + span(), + )]); + + let mut analyser = SemanticAnalyser::with_builtins(BuiltinsLibs { + stdbool: false, + stdio: true, + }); + analyser.analyse_program(&prog); + + assert!( + analyser.diagnostics.is_empty(), + "printf com stdio.h deve ser aceito" + ); + } + // ── Assign: verificação de compatibilidade de tipo ──────────────────────── #[test] @@ -524,11 +578,13 @@ mod tests { #[test] fn unused_variable_emits_warning() { - // int x = 5; return 0; -> x declarada mas nunca lida - let prog = program(vec![ - Stmt::VarDecl(qty(Type::Int), "x".into(), Some(int_lit(5)), span()), - Stmt::Return(Some(int_lit(0)), span()), - ]); + // int x = 5; -> x declarada mas nunca 
lida + let prog = program(vec![Stmt::VarDecl( + qty(Type::Int), + "x".into(), + Some(int_lit(5)), + span(), + )]); let diags = analyse(&prog); assert!(errors(&diags).is_empty(), "não deve haver erros"); assert!( @@ -545,7 +601,7 @@ mod tests { fn used_variable_emits_no_warning() { let prog = program(vec![ Stmt::VarDecl(qty(Type::Int), "x".into(), Some(int_lit(5)), span()), - Stmt::Return(Some(ident("x")), span()), + Stmt::ExprStmt(ident("x"), span()), ]); let diags = analyse(&prog); assert!( @@ -579,10 +635,10 @@ mod tests { #[test] fn uninitialized_use_emits_warning() { - // int x; return x; -> x lida sem inicialização + // int x; _ = x; -> x lida sem inicialização let prog = program(vec![ Stmt::VarDecl(qty(Type::Int), "x".into(), None, span()), - Stmt::Return(Some(ident("x")), span()), + Stmt::ExprStmt(ident("x"), span()), ]); let diags = analyse(&prog); assert!(errors(&diags).is_empty(), "não deve haver erros"); @@ -598,10 +654,10 @@ mod tests { #[test] fn initialized_then_used_emits_no_warning() { - // int x = 0; return x; -> x inicializada, nenhum warning + // int x = 0; _ = x; -> x inicializada, nenhum warning let prog = program(vec![ Stmt::VarDecl(qty(Type::Int), "x".into(), Some(int_lit(0)), span()), - Stmt::Return(Some(ident("x")), span()), + Stmt::ExprStmt(ident("x"), span()), ]); assert!(analyse(&prog).is_empty()); } @@ -616,7 +672,7 @@ mod tests { qty(Type::Int), "f".into(), vec![(qty(Type::Int), "a".into())], - vec![], + vec![Stmt::Return(Some(ident("a")), span())], span(), ), Decl::Function( @@ -711,7 +767,7 @@ mod tests { .sym .declare(crate::analyser::symbol_table::Symbol { name: name.into(), - ty: qty(Type::Array(Box::new(inner))), + ty: qty(Type::Array(Box::new(inner), Some(4))), mutable: true, params: None, decl_span: span(), @@ -998,7 +1054,7 @@ mod tests { #[test] fn prototype_valid_forward_call_is_ok() { // int soma(int a, int b); - // int main() { return soma(1, 2); } + // void main() { soma(1, 2); } // int soma(int a, int b) { return a + b; 
} let prog = Program { decls: vec![ @@ -1009,15 +1065,15 @@ mod tests { span(), ), Decl::Function( - qty(Type::Int), + qty(Type::Void), "main".into(), vec![], - vec![Stmt::Return( - Some(Expr::Call( + vec![Stmt::ExprStmt( + Expr::Call( Box::new(ident("soma")), vec![int_lit(1), int_lit(2)], span(), - )), + ), span(), )], span(), @@ -1026,7 +1082,7 @@ mod tests { qty(Type::Int), "soma".into(), vec![(qty(Type::Int), "a".into()), (qty(Type::Int), "b".into())], - vec![], + vec![Stmt::Return(Some(ident("a")), span())], span(), ), ], @@ -1178,4 +1234,313 @@ mod tests { "atribuição antes do uso inicializa a variável" ); } + + // ── return: verificação de tipo com coerção numérica ────────────────────── + + #[test] + fn return_int_from_long_fn_is_ok() { + // long f() { return 1; } → coerção int→long permitida + let prog = Program { + decls: vec![Decl::Function( + qty(Type::Long), + "f".into(), + vec![], + vec![Stmt::Return(Some(int_lit(1)), span())], + span(), + )], + }; + assert!( + errors(&analyse(&prog)).is_empty(), + "return int em função long deve ser válido (coerção numérica)" + ); + } + + #[test] + fn return_string_from_int_fn_emits_error() { + // int f() { return "oi"; } → incompatível + let prog = Program { + decls: vec![Decl::Function( + qty(Type::Int), + "f".into(), + vec![], + vec![Stmt::Return( + Some(Expr::Literal(Literal::String("oi".into()), span())), + span(), + )], + span(), + )], + }; + let diags = analyse(&prog); + assert!( + errors(&diags).iter().any(|e| matches!( + e, + CompilerError::Semantic(se) + if matches!(&se.kind, SemanticErrorKind::TypeMismatch { .. 
}) + )), + "return char* em função int deve emitir TypeMismatch" + ); + } + + #[test] + fn return_missing_in_non_void_fn_emits_warning() { + // int f() { } → aviso de missing return + let prog = Program { + decls: vec![Decl::Function( + qty(Type::Int), + "f".into(), + vec![], + vec![], + span(), + )], + }; + let diags = analyse(&prog); + assert!( + errors(&diags).is_empty(), + "missing return não deve ser um erro" + ); + assert!( + diags.iter().any(|d| matches!( + d, + Diagnostic::Warning(crate::common::errors::types::CompilerWarning::Semantic(w)) + if matches!(&w.kind, SemanticWarningKind::MissingReturn(n) if n == "f") + )), + "função int sem return deve emitir MissingReturn" + ); + } + + #[test] + fn return_void_fn_no_return_is_ok() { + // void f() { } → sem aviso de missing return + let prog = Program { + decls: vec![Decl::Function( + qty(Type::Void), + "f".into(), + vec![], + vec![], + span(), + )], + }; + let diags = analyse(&prog); + assert!( + !diags.iter().any(|d| matches!( + d, + Diagnostic::Warning(crate::common::errors::types::CompilerWarning::Semantic(w)) + if matches!(&w.kind, SemanticWarningKind::MissingReturn(_)) + )), + "função void sem return não deve emitir MissingReturn" + ); + } + + #[test] + fn return_with_explicit_return_suppresses_missing_return_warning() { + // int f() { return 0; } → sem aviso + let prog = Program { + decls: vec![Decl::Function( + qty(Type::Int), + "f".into(), + vec![], + vec![Stmt::Return(Some(int_lit(0)), span())], + span(), + )], + }; + let diags = analyse(&prog); + assert!( + !diags.iter().any(|d| matches!( + d, + Diagnostic::Warning(crate::common::errors::types::CompilerWarning::Semantic(w)) + if matches!(&w.kind, SemanticWarningKind::MissingReturn(_)) + )), + "função com return explícito não deve emitir MissingReturn" + ); + } + + // ── switch: tipo do discriminante ───────────────────────────────────────── + + #[test] + fn switch_int_expr_is_ok() { + let prog = program(vec![Stmt::Switch(int_lit(1), vec![], span())]); 
+ assert!( + errors(&analyse(&prog)).is_empty(), + "switch(int) deve ser válido" + ); + } + + #[test] + fn switch_float_expr_emits_invalid_switch_type() { + let prog = program(vec![Stmt::Switch( + Expr::Literal(Literal::Double(1.5), span()), + vec![], + span(), + )]); + let diags = analyse(&prog); + assert!( + errors(&diags).iter().any(|e| matches!( + e, + CompilerError::Semantic(se) + if matches!(&se.kind, SemanticErrorKind::InvalidSwitchType { .. }) + )), + "switch(double) deve emitir InvalidSwitchType" + ); + } + + #[test] + fn switch_case_exprs_are_analysed() { + // switch(x) { case y: break; } onde y é indefinido → UndefinedVariable + let prog = program(vec![ + Stmt::VarDecl(qty(Type::Int), "x".into(), Some(int_lit(0)), span()), + Stmt::Switch( + ident("x"), + vec![crate::common::ast::stmt::SwitchCase { + label: crate::common::ast::stmt::SwitchLabel::Case(ident("y_undefined")), + stmts: vec![Stmt::Break(span())], + span: span(), + }], + span(), + ), + ]); + let diags = analyse(&prog); + assert!( + errors(&diags).iter().any(|e| matches!( + e, + CompilerError::Semantic(se) + if matches!(&se.kind, SemanticErrorKind::UndefinedVariable(n) if n == "y_undefined") + )), + "expressão de case indefinida deve emitir UndefinedVariable" + ); + } + + // ── break / continue fora de contexto ──────────────────────────────────── + + #[test] + fn break_inside_while_is_ok() { + let prog = program(vec![Stmt::While( + int_lit(1), + Box::new(Stmt::Break(span())), + span(), + )]); + assert!( + errors(&analyse(&prog)).is_empty(), + "break dentro de while deve ser válido" + ); + } + + #[test] + fn break_inside_switch_is_ok() { + let prog = program(vec![Stmt::Switch( + int_lit(0), + vec![crate::common::ast::stmt::SwitchCase { + label: crate::common::ast::stmt::SwitchLabel::Default, + stmts: vec![Stmt::Break(span())], + span: span(), + }], + span(), + )]); + assert!( + errors(&analyse(&prog)).is_empty(), + "break dentro de switch deve ser válido" + ); + } + + #[test] + fn 
break_outside_loop_emits_error() { + let prog = program(vec![Stmt::Break(span())]); + let diags = analyse(&prog); + assert!( + errors(&diags).iter().any(|e| matches!( + e, + CompilerError::Semantic(se) + if matches!(&se.kind, SemanticErrorKind::BreakOutsideLoop) + )), + "break fora de loop/switch deve emitir BreakOutsideLoop" + ); + } + + #[test] + fn continue_inside_for_is_ok() { + let prog = program(vec![Stmt::For( + None, + Some(int_lit(1)), + None, + Box::new(Stmt::Continue(span())), + span(), + )]); + assert!( + errors(&analyse(&prog)).is_empty(), + "continue dentro de for deve ser válido" + ); + } + + #[test] + fn continue_outside_loop_emits_error() { + let prog = program(vec![Stmt::Continue(span())]); + let diags = analyse(&prog); + assert!( + errors(&diags).iter().any(|e| matches!( + e, + CompilerError::Semantic(se) + if matches!(&se.kind, SemanticErrorKind::ContinueOutsideLoop) + )), + "continue fora de loop deve emitir ContinueOutsideLoop" + ); + } + + #[test] + fn continue_inside_switch_emits_error() { + // `continue` dentro de switch sem loop externo é inválido em C + let prog = program(vec![Stmt::Switch( + int_lit(0), + vec![crate::common::ast::stmt::SwitchCase { + label: crate::common::ast::stmt::SwitchLabel::Default, + stmts: vec![Stmt::Continue(span())], + span: span(), + }], + span(), + )]); + let diags = analyse(&prog); + assert!( + errors(&diags).iter().any(|e| matches!( + e, + CompilerError::Semantic(se) + if matches!(&se.kind, SemanticErrorKind::ContinueOutsideLoop) + )), + "continue dentro de switch (sem loop externo) deve emitir ContinueOutsideLoop" + ); + } + + // ── VarDecl local: verificação de tipo do inicializador ─────────────────── + + #[test] + fn local_var_init_type_mismatch_emits_error() { + // int x = "hello"; → TypeMismatch + let prog = program(vec![Stmt::VarDecl( + qty(Type::Int), + "x".into(), + Some(Expr::Literal(Literal::String("hello".into()), span())), + span(), + )]); + let diags = analyse(&prog); + assert!( + 
errors(&diags).iter().any(|e| matches!( + e, + CompilerError::Semantic(se) + if matches!(&se.kind, SemanticErrorKind::TypeMismatch { .. }) + )), + "int x = string deve emitir TypeMismatch" + ); + } + + #[test] + fn local_var_init_numeric_coercion_is_ok() { + // int x = 1.5; → coerção double→int, válido em C + let prog = program(vec![Stmt::VarDecl( + qty(Type::Int), + "x".into(), + Some(Expr::Literal(Literal::Double(1.5), span())), + span(), + )]); + assert!( + errors(&analyse(&prog)).is_empty(), + "int x = 1.5 deve ser válido (coerção numérica implícita)" + ); + } } diff --git a/src/tests/tac_test.rs b/src/tests/tac_test.rs new file mode 100644 index 0000000..a200e43 --- /dev/null +++ b/src/tests/tac_test.rs @@ -0,0 +1,32 @@ +use crate::common::ast::ast::Type; +use crate::common::ast::expr::BinOp; +use crate::ir::tac::{LabelGen, LabelId, Operand, TacInstr, TempGen, TempId}; + +#[test] +fn tac_instr_display_binop() { + let instr = TacInstr::BinOp { + dst: TempId(0), + op: BinOp::Add, + lhs: Operand::Temp(TempId(1)), + rhs: Operand::Temp(TempId(2)), + ty: Type::Int, + }; + + assert_eq!(instr.to_string(), "t0 = t1 + t2"); +} + +#[test] +fn temp_gen_increments() { + let mut gen = TempGen::new(); + + assert_eq!(gen.fresh(), TempId(0)); + assert_eq!(gen.fresh(), TempId(1)); +} + +#[test] +fn label_gen_unique() { + let mut gen = LabelGen::new(); + + assert_eq!(gen.fresh(), LabelId(0)); + assert_eq!(gen.fresh(), LabelId(1)); +} diff --git a/tests/codegen_smoke.rs b/tests/codegen_smoke.rs new file mode 100644 index 0000000..fb2af3f --- /dev/null +++ b/tests/codegen_smoke.rs @@ -0,0 +1,357 @@ +//! Testes de smoke do backend x86-64. +//! +//! Estes testes montam e (quando possivel) executam a saida do codegen com o +//! `gcc` do sistema para garantir que o assembly gerado nao apenas e +//! sintaticamente valido, mas tambem produz o resultado esperado em tempo de +//! execucao. Se o `gcc` nao estiver disponivel no ambiente, os testes sao +//! ignorados (skip) em vez de falhar. 
+ +#![cfg_attr(not(unix), allow(unused_variables))] + +use std::path::PathBuf; +use std::process::Command; + +use crusty::codegen::last::emit_program; +use crusty::common::ast::ast::Type; +use crusty::common::ast::expr::BinOp; +use crusty::ir::tac::{ConstValue, LabelId, Operand, TacFunction, TacInstr, TacProgram, TempId}; + +fn gcc_available() -> bool { + Command::new("gcc") + .arg("--version") + .output() + .map(|o| o.status.success()) + .unwrap_or(false) +} + +/// Ignora o teste quando nao ha `gcc` no ambiente. +macro_rules! require_gcc { + () => { + if !gcc_available() { + eprintln!("gcc indisponivel: pulando teste de smoke"); + return; + } + }; +} + +fn build_soma_program() -> TacProgram { + let soma = TacFunction { + name: "soma".to_string(), + params: vec!["a".to_string(), "b".to_string()], + instrs: vec![ + TacInstr::BinOp { + dst: TempId(0), + op: BinOp::Add, + lhs: Operand::Var("a".to_string()), + rhs: Operand::Var("b".to_string()), + ty: Type::Int, + }, + TacInstr::Return { + val: Some(Operand::Temp(TempId(0))), + ty: None, + }, + ], + var_sizes: Default::default(), + }; + + let main = TacFunction { + name: "main".to_string(), + params: Vec::new(), + instrs: vec![ + TacInstr::Call { + dst: Some(TempId(0)), + fn_name: "soma".to_string(), + args: vec![ + Operand::Const(ConstValue::Int(2)), + Operand::Const(ConstValue::Int(3)), + ], + }, + TacInstr::Return { + val: Some(Operand::Temp(TempId(0))), + ty: None, + }, + ], + var_sizes: Default::default(), + }; + + TacProgram { + globals: Vec::new(), + functions: vec![soma, main], + } +} + +fn write_temp_source(name: &str, contents: &str) -> PathBuf { + let mut path = std::env::temp_dir(); + path.push(format!("crusty_smoke_{name}_{}.s", std::process::id())); + std::fs::write(&path, contents).expect("falha ao escrever arquivo .s temporario"); + path +} + +#[test] +fn smoke_assembles_with_gcc() { + require_gcc!(); + + let asm = emit_program(&build_soma_program()).unwrap(); + let source = 
write_temp_source("assemble", &asm); + let object = source.with_extension("o"); + + let status = Command::new("gcc") + .args(["-c", "-x", "assembler-with-cpp"]) + .arg(&source) + .arg("-o") + .arg(&object) + .status() + .expect("falha ao invocar gcc"); + + assert!( + status.success(), + "gcc nao conseguiu montar a saida do codegen" + ); + + let _ = std::fs::remove_file(&source); + let _ = std::fs::remove_file(&object); +} + +#[test] +fn smoke_links_and_runs_with_expected_exit_code() { + require_gcc!(); + + let asm = emit_program(&build_soma_program()).unwrap(); + let source = write_temp_source("run", &asm); + let exe = source.with_extension("bin"); + + let link = Command::new("gcc") + .arg(&source) + .arg("-o") + .arg(&exe) + .status() + .expect("falha ao invocar gcc"); + assert!( + link.success(), + "gcc nao conseguiu linkar a saida do codegen" + ); + + let exit = Command::new(&exe) + .status() + .expect("falha ao executar o binario gerado"); + + // soma(2, 3) == 5: o valor de retorno de `main` vira o exit code. 
+ #[cfg(unix)] + assert_eq!(exit.code(), Some(5), "esperava exit code 5 (soma de 2 + 3)"); + + let _ = std::fs::remove_file(&source); + let _ = std::fs::remove_file(&exe); +} + +#[test] +fn smoke_simple_return_const_runs() { + require_gcc!(); + + let prog = TacProgram { + globals: Vec::new(), + functions: vec![TacFunction { + name: "main".to_string(), + params: Vec::new(), + instrs: vec![TacInstr::Return { + val: Some(Operand::Const(ConstValue::Int(42))), + ty: None, + }], + var_sizes: Default::default(), + }], + }; + + let asm = emit_program(&prog).unwrap(); + let source = write_temp_source("const", &asm); + let exe = source.with_extension("bin"); + + let link = Command::new("gcc") + .arg(&source) + .arg("-o") + .arg(&exe) + .status() + .expect("falha ao invocar gcc"); + assert!(link.success()); + + let exit = Command::new(&exe).status().expect("falha ao executar"); + + #[cfg(unix)] + assert_eq!(exit.code(), Some(42)); + + let _ = std::fs::remove_file(&source); + let _ = std::fs::remove_file(&exe); +} + +#[test] +fn smoke_call_with_more_than_six_args_runs() { + require_gcc!(); + + // sum9(1..9) = 45, exercitando 3 argumentos passados pela stack (alem + // dos 6 em registrador) e o padding de alinhamento de 8 bytes. 
+ let sum9 = TacFunction { + name: "sum9".to_string(), + params: (1..=9).map(|n| format!("a{n}")).collect(), + instrs: vec![ + TacInstr::BinOp { + dst: TempId(0), + op: BinOp::Add, + lhs: Operand::Var("a1".to_string()), + rhs: Operand::Var("a2".to_string()), + ty: Type::Int, + }, + TacInstr::BinOp { + dst: TempId(1), + op: BinOp::Add, + lhs: Operand::Temp(TempId(0)), + rhs: Operand::Var("a3".to_string()), + ty: Type::Int, + }, + TacInstr::BinOp { + dst: TempId(2), + op: BinOp::Add, + lhs: Operand::Temp(TempId(1)), + rhs: Operand::Var("a4".to_string()), + ty: Type::Int, + }, + TacInstr::BinOp { + dst: TempId(3), + op: BinOp::Add, + lhs: Operand::Temp(TempId(2)), + rhs: Operand::Var("a5".to_string()), + ty: Type::Int, + }, + TacInstr::BinOp { + dst: TempId(4), + op: BinOp::Add, + lhs: Operand::Temp(TempId(3)), + rhs: Operand::Var("a6".to_string()), + ty: Type::Int, + }, + TacInstr::BinOp { + dst: TempId(5), + op: BinOp::Add, + lhs: Operand::Temp(TempId(4)), + rhs: Operand::Var("a7".to_string()), + ty: Type::Int, + }, + TacInstr::BinOp { + dst: TempId(6), + op: BinOp::Add, + lhs: Operand::Temp(TempId(5)), + rhs: Operand::Var("a8".to_string()), + ty: Type::Int, + }, + TacInstr::BinOp { + dst: TempId(7), + op: BinOp::Add, + lhs: Operand::Temp(TempId(6)), + rhs: Operand::Var("a9".to_string()), + ty: Type::Int, + }, + TacInstr::Return { + val: Some(Operand::Temp(TempId(7))), + ty: None, + }, + ], + var_sizes: Default::default(), + }; + + let main = TacFunction { + name: "main".to_string(), + params: Vec::new(), + instrs: vec![ + TacInstr::Call { + dst: Some(TempId(0)), + fn_name: "sum9".to_string(), + args: (1..=9) + .map(|n| Operand::Const(ConstValue::Int(n))) + .collect(), + }, + TacInstr::Return { + val: Some(Operand::Temp(TempId(0))), + ty: None, + }, + ], + var_sizes: Default::default(), + }; + + let prog = TacProgram { + globals: Vec::new(), + functions: vec![sum9, main], + }; + + let asm = emit_program(&prog).unwrap(); + let source = write_temp_source("manyargs", 
&asm); + let exe = source.with_extension("bin"); + + let link = Command::new("gcc") + .arg(&source) + .arg("-o") + .arg(&exe) + .status() + .expect("falha ao invocar gcc"); + assert!(link.success()); + + let exit = Command::new(&exe).status().expect("falha ao executar"); + + #[cfg(unix)] + assert_eq!( + exit.code(), + Some(45), + "esperava exit code 45 (soma de 1..=9)" + ); + + let _ = std::fs::remove_file(&source); + let _ = std::fs::remove_file(&exe); +} + +#[test] +fn smoke_control_flow_if_else_runs() { + require_gcc!(); + + // main: if (1) return 10; else return 20; -> espera-se 10. + let prog = TacProgram { + globals: Vec::new(), + functions: vec![TacFunction { + name: "main".to_string(), + params: Vec::new(), + instrs: vec![ + TacInstr::CondJump { + cond: Operand::Const(ConstValue::Int(1)), + then_label: LabelId(0), + else_label: LabelId(1), + }, + TacInstr::Label(LabelId(0)), + TacInstr::Return { + val: Some(Operand::Const(ConstValue::Int(10))), + ty: None, + }, + TacInstr::Label(LabelId(1)), + TacInstr::Return { + val: Some(Operand::Const(ConstValue::Int(20))), + ty: None, + }, + ], + var_sizes: Default::default(), + }], + }; + + let asm = emit_program(&prog).unwrap(); + let source = write_temp_source("ifelse", &asm); + let exe = source.with_extension("bin"); + + let link = Command::new("gcc") + .arg(&source) + .arg("-o") + .arg(&exe) + .status() + .expect("falha ao invocar gcc"); + assert!(link.success()); + + let exit = Command::new(&exe).status().expect("falha ao executar"); + + #[cfg(unix)] + assert_eq!(exit.code(), Some(10)); + + let _ = std::fs::remove_file(&source); + let _ = std::fs::remove_file(&exe); +} diff --git a/tests/double_codegen_test.rs b/tests/double_codegen_test.rs new file mode 100644 index 0000000..b0b00f0 --- /dev/null +++ b/tests/double_codegen_test.rs @@ -0,0 +1,332 @@ +//! Smoke tests ponta-a-ponta para o codegen x86-64 de `double` (issue #172). +//! +//! Escopo coberto, conforme delimitado na issue: literais `double`, +//! 
variaveis locais, aritmetica basica (`+ - * /`), comparacoes e `return`. +//! Argumentos/parametros e retorno de `double` por uma funcao chamada pelo +//! proprio codegen (em vez de por um `main` escrito a mao) permanecem fora +//! de escopo, pois exigiriam estender `codegen/last/abi.rs` para cobrir +//! `xmm0..xmm7` na convencao de chamada — deixado para uma proxima etapa. +//! +//! Como nem toda expressao com `double` pode ser verificada via exit code +//! (o valor de retorno de um processo e sempre truncado a um inteiro de 8 +//! bits), os testes usam duas estrategias: +//! - quando o resultado observavel e naturalmente inteiro (comparacoes, que +//! este backend ja normaliza para 0/1 em `%rax`), o programa inteiro e +//! gerado por este compilador e o exit code e verificado diretamente; +//! - quando o resultado e um `double` em si (ex.: o proprio criterio de +//! aceite da issue, `double x = 1.5; return x + 2.5;`), apenas a funcao +//! `double` e gerada por este compilador (`--emit=obj`); um pequeno +//! programa C convencional, compilado pelo `gcc` do sistema, chama essa +//! funcao e verifica o resultado — exercitando o lado "callee" da ABI +//! (retorno em `%xmm0`) sem depender do lado "caller" deste backend. +//! +//! Se `gcc` nao estiver disponivel no ambiente, os testes sao ignorados +//! (skip) em vez de falhar, espelhando `tests/exe_smoke_test.rs`. + +#![cfg_attr(not(unix), allow(unused_variables))] + +use std::path::PathBuf; +use std::process::{Command, ExitStatus}; + +use crusty::analyser::analyse_with_builtins; +use crusty::codegen::last::emit_program; +use crusty::common::input::source::SourceFile; +use crusty::ir::lower::lower_program; +use crusty::lexer::scanner::Scanner; +use crusty::parser::Parser; + +fn gcc_available() -> bool { + Command::new("gcc") + .arg("--version") + .output() + .map(|o| o.status.success()) + .unwrap_or(false) +} + +/// Ignora o teste quando nao ha `gcc` no ambiente. +macro_rules! 
require_gcc { + () => { + if !gcc_available() { + eprintln!("gcc indisponivel: pulando teste de smoke"); + return; + } + }; +} + +/// Roda o pipeline completo (lexer -> parser -> semantic -> IR -> codegen) +/// sobre `source` e retorna o assembly x86-64 gerado. Falha o teste (panic) +/// se qualquer estagio reportar diagnosticos, ja que os fixtures usados aqui +/// sao sempre programas C validos. +fn compile_to_asm(source: &str) -> String { + let mut scanner = Scanner::new(SourceFile::from_string(source)); + scanner.scan(); + assert!( + scanner.diagnostics.is_empty(), + "erros de lexer inesperados: {:?}", + scanner.diagnostics + ); + + let mut parser = Parser::new(scanner.tokens); + let program = parser + .parse_program() + .unwrap_or_else(|errors| panic!("erros de parser inesperados: {errors:?}")); + + let sem_diagnostics = analyse_with_builtins(&program, scanner.builtins); + let sem_errors: Vec<_> = sem_diagnostics.iter().filter(|d| d.is_error()).collect(); + assert!( + sem_errors.is_empty(), + "erros semanticos inesperados: {sem_errors:?}" + ); + + let tac_program = lower_program(&program).unwrap(); + emit_program(&tac_program).unwrap() +} + +/// Compila `source` (C) ate um executavel real via `gcc` e o executa, +/// retornando o `ExitStatus` do processo filho. Limpa os arquivos +/// temporarios (.s e binario) ao final. 
+fn compile_and_run(name: &str, source: &str) -> ExitStatus { + let asm = compile_to_asm(source); + + let mut asm_path = std::env::temp_dir(); + asm_path.push(format!( + "crusty_double_smoke_{name}_{}.s", + std::process::id() + )); + std::fs::write(&asm_path, asm).expect("falha ao escrever .s temporario"); + let exe_path: PathBuf = asm_path.with_extension("bin"); + + let link = Command::new("gcc") + .arg(&asm_path) + .arg("-o") + .arg(&exe_path) + .status() + .expect("falha ao invocar gcc"); + assert!( + link.success(), + "gcc nao conseguiu linkar a saida do codegen" + ); + + let status = Command::new(&exe_path) + .status() + .expect("falha ao executar o binario gerado"); + + let _ = std::fs::remove_file(&asm_path); + let _ = std::fs::remove_file(&exe_path); + + status +} + +/// Compila uma funcao `double` isolada (`source`) com `--emit=obj`-equivalente +/// (aqui via `emit_program`, montado a `.o` pelo `gcc`), e linka com um +/// pequeno harness C escrito a mao que chama `compute()` e verifica o +/// resultado. Retorna o `ExitStatus` do harness. 
+fn compile_double_fn_and_check(name: &str, source: &str, harness_body: &str) -> ExitStatus { + let asm = compile_to_asm(source); + + let dir = std::env::temp_dir(); + let asm_path = dir.join(format!("crusty_double_fn_{name}_{}.s", std::process::id())); + let obj_path = dir.join(format!("crusty_double_fn_{name}_{}.o", std::process::id())); + let harness_path = dir.join(format!( + "crusty_double_harness_{name}_{}.c", + std::process::id() + )); + let exe_path = dir.join(format!( + "crusty_double_fn_{name}_{}.bin", + std::process::id() + )); + + std::fs::write(&asm_path, asm).expect("falha ao escrever .s temporario"); + + let assemble = Command::new("gcc") + .arg("-c") + .arg(&asm_path) + .arg("-o") + .arg(&obj_path) + .status() + .expect("falha ao invocar gcc -c"); + assert!( + assemble.success(), + "gcc nao conseguiu montar a saida do codegen" + ); + + std::fs::write(&harness_path, harness_body).expect("falha ao escrever harness .c"); + + let link = Command::new("gcc") + .arg(&harness_path) + .arg(&obj_path) + .arg("-o") + .arg(&exe_path) + .status() + .expect("falha ao invocar gcc para linkar harness + objeto"); + assert!( + link.success(), + "gcc nao conseguiu linkar o harness com o objeto gerado pelo codegen" + ); + + let status = Command::new(&exe_path) + .status() + .expect("falha ao executar o binario gerado"); + + let _ = std::fs::remove_file(&asm_path); + let _ = std::fs::remove_file(&obj_path); + let _ = std::fs::remove_file(&harness_path); + let _ = std::fs::remove_file(&exe_path); + + status +} + +/// Criterio de aceite da issue #172: `double x = 1.5; return x + 2.5;` +/// compila e roda via gcc com resultado correto (4.0). +#[test] +fn smoke_double_literal_local_and_addition_returns_correct_value() { + require_gcc!(); + + let status = compile_double_fn_and_check( + "literal_add", + "double compute() { double x = 1.5; return x + 2.5; }", + r#" + double compute(void); + int main() { + double r = compute(); + return (r == 4.0) ? 
0 : 1; + } + "#, + ); + + #[cfg(unix)] + assert_eq!(status.code(), Some(0)); +} + +#[test] +fn smoke_double_subtraction_multiplication_division_return_correct_values() { + require_gcc!(); + + let status = compile_double_fn_and_check( + "arith", + "double compute() { \ + double a = 10.0; \ + double b = 4.0; \ + double sub = a - b; \ + double mul = sub * 2.0; \ + double div = mul / 3.0; \ + return div; \ + }", + r#" + double compute(void); + int main() { + double r = compute(); + return (r == 4.0) ? 0 : 1; + } + "#, + ); + + #[cfg(unix)] + assert_eq!(status.code(), Some(0)); +} + +#[test] +fn smoke_double_multiple_locals_runs() { + require_gcc!(); + + let status = compile_double_fn_and_check( + "multi_locals", + "double compute() { \ + double a = 1.5; \ + double b = 2.25; \ + double c = 0.25; \ + return a + b + c; \ + }", + r#" + double compute(void); + int main() { + double r = compute(); + return (r == 4.0) ? 0 : 1; + } + "#, + ); + + #[cfg(unix)] + assert_eq!(status.code(), Some(0)); +} + +/// Comparacoes entre `double` ja produzem um resultado inteiro (0/1) neste +/// backend, entao podem ser verificadas direto pelo exit code de um `int +/// main()` gerado integralmente por este compilador, sem depender de +/// conversao double<->int (ainda nao suportada). 
+#[test] +fn smoke_double_less_than_comparison_runs() { + require_gcc!(); + + let status = compile_and_run( + "double_less_than", + "int main() { \ + double a = 1.5; \ + double b = 2.5; \ + if (a < b) { return 1; } \ + return 0; \ + }", + ); + + #[cfg(unix)] + assert_eq!(status.code(), Some(1)); +} + +#[test] +fn smoke_double_equality_after_addition_runs() { + require_gcc!(); + + let status = compile_and_run( + "double_eq_after_add", + "int main() { \ + double a = 1.5; \ + double b = 2.5; \ + double c = a + b; \ + if (c == 4.0) { return 1; } \ + return 0; \ + }", + ); + + #[cfg(unix)] + assert_eq!(status.code(), Some(1)); +} + +#[test] +fn smoke_double_greater_or_equal_false_branch_runs() { + require_gcc!(); + + let status = compile_and_run( + "double_geq_false", + "int main() { \ + double a = 1.0; \ + double b = 9.0; \ + if (a >= b) { return 1; } \ + return 0; \ + }", + ); + + #[cfg(unix)] + assert_eq!(status.code(), Some(0)); +} + +/// Garante que o assembly gerado para `double` realmente usa o caminho de +/// ponto flutuante (registradores `xmm`/instrucoes `sd`) em vez do caminho +/// inteiro (`rax`/`rcx`), o que a issue #172 identificou como o gap real do +/// backend antes desta feature. +#[test] +fn double_codegen_emits_xmm_instructions() { + let asm = compile_to_asm("double compute() { double x = 1.5; return x + 2.5; }"); + + assert!( + asm.contains("movsd"), + "esperado mov de double via movsd no assembly gerado:\n{asm}" + ); + assert!( + asm.contains("addsd"), + "esperada soma de double via addsd no assembly gerado:\n{asm}" + ); + assert!( + asm.contains(".double"), + "esperada diretiva .double para os literais double na .rodata:\n{asm}" + ); +} diff --git a/tests/exe_smoke_test.rs b/tests/exe_smoke_test.rs new file mode 100644 index 0000000..8173814 --- /dev/null +++ b/tests/exe_smoke_test.rs @@ -0,0 +1,541 @@ +//! Smoke tests ponta-a-ponta da issue #130: fonte C real passa por todo o +//! 
pipeline (lexer -> parser -> semantic -> IR -> codegen x86-64) e o +//! assembly resultante e montado/linkado com `gcc` em um executavel ELF +//! real, que e entao executado para checar o exit code. +//! +//! Diferente de `tests/codegen_smoke.rs` (que monta `TacProgram` a mao), +//! aqui o ponto de entrada e codigo-fonte C, exercitando o front-end +//! completo. Se `gcc` nao estiver disponivel no ambiente, os testes sao +//! ignorados (skip) em vez de falhar. + +#![cfg_attr(not(unix), allow(unused_variables))] + +use std::path::PathBuf; +use std::process::{Command, ExitStatus}; + +use crusty::analyser::analyse_with_builtins; +use crusty::codegen::last::emit_program; +use crusty::common::input::source::SourceFile; +use crusty::ir::lower::lower_program; +use crusty::lexer::scanner::Scanner; +use crusty::parser::Parser; + +fn gcc_available() -> bool { + Command::new("gcc") + .arg("--version") + .output() + .map(|o| o.status.success()) + .unwrap_or(false) +} + +/// Ignora o teste quando nao ha `gcc` no ambiente. +macro_rules! require_gcc { + () => { + if !gcc_available() { + eprintln!("gcc indisponivel: pulando teste de smoke"); + return; + } + }; +} + +/// Roda o pipeline completo (lexer -> parser -> semantic -> IR -> codegen) +/// sobre `source` e retorna o assembly x86-64 gerado. Falha o teste (panic) +/// se qualquer estagio reportar diagnosticos, ja que os fixtures usados +/// aqui sao sempre programas C validos. 
+fn compile_to_asm(source: &str) -> String { + let mut scanner = Scanner::new(SourceFile::from_string(source)); + scanner.scan(); + assert!( + scanner.diagnostics.is_empty(), + "erros de lexer inesperados: {:?}", + scanner.diagnostics + ); + + let mut parser = Parser::new(scanner.tokens); + let program = parser + .parse_program() + .unwrap_or_else(|errors| panic!("erros de parser inesperados: {errors:?}")); + + let sem_diagnostics = analyse_with_builtins(&program, scanner.builtins); + let sem_errors: Vec<_> = sem_diagnostics.iter().filter(|d| d.is_error()).collect(); + assert!( + sem_errors.is_empty(), + "erros semanticos inesperados: {sem_errors:?}" + ); + + let tac_program = lower_program(&program).unwrap(); + emit_program(&tac_program).unwrap() +} + +/// Compila `source` (C) ate um executavel real via `gcc` e o executa, +/// retornando o `ExitStatus` do processo filho. Limpa os arquivos +/// temporarios (.s e binario) ao final. +fn compile_and_run(name: &str, source: &str) -> ExitStatus { + let asm = compile_to_asm(source); + + let mut asm_path = std::env::temp_dir(); + asm_path.push(format!("crusty_exe_smoke_{name}_{}.s", std::process::id())); + std::fs::write(&asm_path, asm).expect("falha ao escrever .s temporario"); + let exe_path: PathBuf = asm_path.with_extension("bin"); + + let link = Command::new("gcc") + .arg(&asm_path) + .arg("-o") + .arg(&exe_path) + .status() + .expect("falha ao invocar gcc"); + assert!( + link.success(), + "gcc nao conseguiu linkar a saida do codegen" + ); + + let status = Command::new(&exe_path) + .status() + .expect("falha ao executar o binario gerado"); + + let _ = std::fs::remove_file(&asm_path); + let _ = std::fs::remove_file(&exe_path); + + status +} + +#[test] +fn smoke_minimal_program_exit_0() { + require_gcc!(); + + let status = compile_and_run("exit0", "int main() { return 0; }"); + + #[cfg(unix)] + assert_eq!(status.code(), Some(0)); +} + +#[test] +fn smoke_minimal_program_exit_42() { + require_gcc!(); + + let status = 
compile_and_run("exit42", "int main() { return 42; }"); + + #[cfg(unix)] + assert_eq!(status.code(), Some(42)); +} + +#[test] +fn smoke_local_variable_arithmetic_runs() { + require_gcc!(); + + let status = compile_and_run( + "arith", + "int main() { int a = 10; int b = 32; return a + b; }", + ); + + #[cfg(unix)] + assert_eq!(status.code(), Some(42)); +} + +#[test] +fn smoke_function_call_runs() { + require_gcc!(); + + let status = compile_and_run( + "call", + "int soma(int a, int b) { return a + b; } int main() { return soma(19, 23); }", + ); + + #[cfg(unix)] + assert_eq!(status.code(), Some(42)); +} + +#[test] +fn smoke_blocks_and_local_scopes_run() { + require_gcc!(); + + let status = compile_and_run( + "blocks", + "int main() { \ + int total = 0; \ + { int a = 5; total = total + a; } \ + { int b = 7; total = total + b; } \ + return total; \ + }", + ); + + #[cfg(unix)] + assert_eq!(status.code(), Some(12)); +} + +#[test] +fn smoke_if_else_chain_runs() { + require_gcc!(); + + let status = compile_and_run( + "if_else_chain", + "int classify(int n) { \ + if (n < 0) { return -1; } \ + else if (n == 0) { return 0; } \ + else { return 1; } \ + } \ + int main() { return classify(-5) + classify(0) + classify(5) + 10; }", + ); + + #[cfg(unix)] + assert_eq!(status.code(), Some(10)); +} + +#[test] +fn smoke_while_loop_runs() { + require_gcc!(); + + let status = compile_and_run( + "while", + "int main() { \ + int i = 0; \ + int total = 0; \ + while (i < 5) { total = total + i; i = i + 1; } \ + return total; \ + }", + ); + + #[cfg(unix)] + assert_eq!(status.code(), Some(10)); +} + +#[test] +fn smoke_for_loop_runs() { + require_gcc!(); + + let status = compile_and_run( + "for", + "int main() { \ + int total = 0; \ + for (int i = 0; i < 5; i = i + 1) { total = total + i; } \ + return total; \ + }", + ); + + #[cfg(unix)] + assert_eq!(status.code(), Some(10)); +} + +#[test] +fn smoke_do_while_loop_runs() { + require_gcc!(); + + let status = compile_and_run( + "do_while", + "int 
main() { \ + int i = 0; \ + int total = 0; \ + do { total = total + i; i = i + 1; } while (i < 5); \ + return total; \ + }", + ); + + #[cfg(unix)] + assert_eq!(status.code(), Some(10)); +} + +#[test] +fn smoke_recursive_fibonacci_runs() { + require_gcc!(); + + let status = compile_and_run( + "fib", + "int fib(int n) { \ + if (n <= 1) { return n; } \ + return fib(n - 1) + fib(n - 2); \ + } \ + int main() { return fib(10); }", + ); + + #[cfg(unix)] + assert_eq!(status.code(), Some(55)); +} + +#[test] +fn smoke_switch_with_default_runs() { + require_gcc!(); + + let status = compile_and_run( + "switch_default", + "int classify(int n) { \ + int result = 0; \ + switch (n) { \ + case 1: result = 1; break; \ + case 2: result = 2; break; \ + default: result = -1; break; \ + } \ + return result; \ + } \ + int main() { return classify(1) + classify(2) + classify(9) + 100; }", + ); + + #[cfg(unix)] + assert_eq!(status.code(), Some(102)); +} + +#[test] +fn smoke_address_of_and_deref_read_runs() { + require_gcc!(); + + let status = compile_and_run( + "addrof_deref_read", + "int main() { \ + int x = 21; \ + int *p = &x; \ + int y = *p; \ + return y * 2; \ + }", + ); + + #[cfg(unix)] + assert_eq!(status.code(), Some(42)); +} + +#[test] +fn smoke_deref_assignment_writes_through_pointer() { + require_gcc!(); + + let status = compile_and_run( + "deref_assign", + "int main() { \ + int x = 10; \ + int *p = &x; \ + *p = 20; \ + return x; \ + }", + ); + + #[cfg(unix)] + assert_eq!(status.code(), Some(20)); +} + +#[test] +fn smoke_deref_compound_assign_through_function_param_runs() { + require_gcc!(); + + let status = compile_and_run( + "deref_compound", + "void inc(int *p) { *p = *p + 1; } \ + int main() { \ + int x = 10; \ + inc(&x); \ + inc(&x); \ + return x; \ + }", + ); + + #[cfg(unix)] + assert_eq!(status.code(), Some(12)); +} + +#[test] +fn smoke_deref_increment_operators_run() { + require_gcc!(); + + let status = compile_and_run( + "deref_incr", + "int main() { \ + int x = 5; \ + 
int *p = &x; \ + (*p)++; \ + *p += 10; \ + return x; \ + }", + ); + + #[cfg(unix)] + assert_eq!(status.code(), Some(16)); +} + +#[test] +fn smoke_pointer_index_read_and_write_runs() { + require_gcc!(); + + let status = compile_and_run( + "pointer_index", + "int sum_via_index(int *p) { \ + p[0] = 10; \ + return p[0] + 5; \ + } \ + int main() { \ + int x = 1; \ + return sum_via_index(&x); \ + }", + ); + + #[cfg(unix)] + assert_eq!(status.code(), Some(15)); +} + +#[test] +fn smoke_fixed_array_index_read_and_write_runs() { + require_gcc!(); + + let status = compile_and_run( + "fixed_array_index", + "int main() { \ + int arr[3]; \ + arr[0] = 1; \ + arr[1] = 2; \ + arr[2] = 3; \ + return arr[0] + arr[1] + arr[2]; \ + }", + ); + + #[cfg(unix)] + assert_eq!(status.code(), Some(6)); +} + +#[test] +fn smoke_struct_member_read_and_write_runs() { + require_gcc!(); + + let status = compile_and_run( + "struct_member", + "struct Point { int x; int y; }; \ + int main() { \ + struct Point p; \ + p.x = 3; \ + p.y = 4; \ + return p.x + p.y; \ + }", + ); + + #[cfg(unix)] + assert_eq!(status.code(), Some(7)); +} + +#[test] +fn smoke_struct_member_via_pointer_arrow_runs() { + require_gcc!(); + + let status = compile_and_run( + "struct_member_arrow", + "struct Point { int x; int y; }; \ + void move_point(struct Point *p, int dx, int dy) { \ + p->x = p->x + dx; \ + p->y = p->y + dy; \ + } \ + int main() { \ + struct Point p; \ + p.x = 1; \ + p.y = 2; \ + move_point(&p, 10, 20); \ + return p.x + p.y; \ + }", + ); + + #[cfg(unix)] + assert_eq!(status.code(), Some(33)); +} + +#[test] +fn smoke_struct_larger_than_eight_bytes_runs() { + require_gcc!(); + + let status = compile_and_run( + "struct_big", + "struct Big { long a; long b; long c; }; \ + int main() { \ + struct Big big; \ + big.a = 1; \ + big.b = 2; \ + big.c = 3; \ + return big.a + big.b + big.c; \ + }", + ); + + #[cfg(unix)] + assert_eq!(status.code(), Some(6)); +} + +#[test] +fn smoke_sizeof_of_variables_runs() { + require_gcc!(); 
+ + let status = compile_and_run( + "sizeof_vars", + "int main() { \ + int x = 7; \ + long l = 3; \ + char c = 'a'; \ + int *p = &x; \ + return sizeof(x) + sizeof(l) + sizeof(c) + sizeof(p); \ + }", + ); + + // sizeof(int) + sizeof(long) + sizeof(char) + sizeof(int*) = 4+8+1+8. + #[cfg(unix)] + assert_eq!(status.code(), Some(21)); +} + +#[test] +fn smoke_switch_fallthrough_runs() { + require_gcc!(); + + let status = compile_and_run( + "switch_fallthrough", + "int main() { \ + int n = 2; \ + int total = 0; \ + switch (n) { \ + case 1: total = total + 1; \ + case 2: total = total + 2; \ + case 3: total = total + 3; break; \ + case 4: total = total + 100; \ + } \ + return total; \ + }", + ); + + #[cfg(unix)] + assert_eq!(status.code(), Some(5)); +} + +#[test] +fn smoke_zero_initialized_global_read_and_write_runs() { + require_gcc!(); + + let status = compile_and_run( + "global_counter", + "int counter; int main(void) { counter = 41; return counter + 1; }", + ); + + assert_eq!(status.code(), Some(42)); +} + +#[test] +fn smoke_constant_initialized_global_and_local_shadowing_run() { + require_gcc!(); + + let status = compile_and_run( + "global_init_shadow", + "int value = 8 * 5; int main(void) { int observed = 0; { int value = 11; observed = value; } return value + observed - 9; }", + ); + + assert_eq!(status.code(), Some(42)); +} + +#[test] +fn smoke_global_struct_fixture_runs() { + require_gcc!(); + + let status = compile_and_run( + "global_struct_fixture", + include_str!("integration/valid/structs.c"), + ); + + assert_eq!(status.code(), Some(1)); +} + +#[test] +fn smoke_global_typedef_struct_fixture_runs() { + require_gcc!(); + + let status = compile_and_run( + "global_typedef_fixture", + include_str!("integration/valid/typedef.c"), + ); + + assert_eq!(status.code(), Some(10)); +} diff --git a/tests/integration/invalid/arity_mismatch.c b/tests/integration/invalid/arity_mismatch.c new file mode 100644 index 0000000..039db0f --- /dev/null +++ 
b/tests/integration/invalid/arity_mismatch.c @@ -0,0 +1,7 @@ +int add(int a, int b) { + return a + b; +} + +int main(void) { + return add(1); +} diff --git a/tests/integration/invalid/assign_const.c b/tests/integration/invalid/assign_const.c new file mode 100644 index 0000000..f1b4f32 --- /dev/null +++ b/tests/integration/invalid/assign_const.c @@ -0,0 +1,5 @@ +int main(void) { + const int PI = 3; + PI = 4; + return PI; +} diff --git a/tests/integration/invalid/call_non_function.c b/tests/integration/invalid/call_non_function.c new file mode 100644 index 0000000..9dbfaae --- /dev/null +++ b/tests/integration/invalid/call_non_function.c @@ -0,0 +1,8 @@ +void f(void) { + int x = 1; + x(); +} + +int main(void) { + return 0; +} diff --git a/tests/integration/invalid/redeclaration.c b/tests/integration/invalid/redeclaration.c new file mode 100644 index 0000000..4d177ea --- /dev/null +++ b/tests/integration/invalid/redeclaration.c @@ -0,0 +1,5 @@ +int main(void) { + int x = 1; + int x = 2; + return x; +} diff --git a/tests/integration/invalid/return_in_void.c b/tests/integration/invalid/return_in_void.c new file mode 100644 index 0000000..6294c3b --- /dev/null +++ b/tests/integration/invalid/return_in_void.c @@ -0,0 +1,7 @@ +void f(void) { + return 1; +} + +int main(void) { + return 0; +} diff --git a/tests/integration/invalid/type_mismatch_assign.c b/tests/integration/invalid/type_mismatch_assign.c new file mode 100644 index 0000000..dcdda6c --- /dev/null +++ b/tests/integration/invalid/type_mismatch_assign.c @@ -0,0 +1,5 @@ +int main(void) { + int x; + x = "hello"; + return x; +} diff --git a/tests/integration/invalid/undeclared_var.c b/tests/integration/invalid/undeclared_var.c new file mode 100644 index 0000000..d1a1e3c --- /dev/null +++ b/tests/integration/invalid/undeclared_var.c @@ -0,0 +1,4 @@ +int main(void) { + int y = x; + return y; +} diff --git a/tests/integration/valid/arithmetic.c b/tests/integration/valid/arithmetic.c new file mode 100644 index 
0000000..6203a16 --- /dev/null +++ b/tests/integration/valid/arithmetic.c @@ -0,0 +1,10 @@ +int main(void) { + int a = 10; + int b = 3; + int sum = a + b; + int diff = a - b; + int prod = a * b; + int quot = a / b; + int rem = a % b; + return sum + diff + prod + quot + rem; +} diff --git a/tests/integration/valid/bool_literals.c b/tests/integration/valid/bool_literals.c new file mode 100644 index 0000000..1141772 --- /dev/null +++ b/tests/integration/valid/bool_literals.c @@ -0,0 +1,9 @@ +#include <stdbool.h> + +int main() { + bool ok = true; + if (ok) { + return false; + } + return true; +} diff --git a/tests/integration/valid/control_flow.c b/tests/integration/valid/control_flow.c new file mode 100644 index 0000000..35453bf --- /dev/null +++ b/tests/integration/valid/control_flow.c @@ -0,0 +1,30 @@ +int classify(int n) { + int result = 0; + switch (n) { + case 1: + result = 1; + break; + case 2: + result = 2; + break; + default: + result = -1; + break; + } + return result; +} + +int main(void) { + int total = 0; + for (int i = 0; i < 3; i = i + 1) { + total = total + classify(i); + } + while (total > 0) { + total = total - 1; + } + if (total == 0) { + return 0; + } else { + return 1; + } +} diff --git a/tests/integration/valid/enum_usage.c b/tests/integration/valid/enum_usage.c new file mode 100644 index 0000000..63a4801 --- /dev/null +++ b/tests/integration/valid/enum_usage.c @@ -0,0 +1,13 @@ +enum Color { + RED, + GREEN, + BLUE +}; + +int main(void) { + int c = GREEN; + if (c == RED) { + return c; + } + return BLUE; +} diff --git a/tests/integration/valid/fibonacci.c b/tests/integration/valid/fibonacci.c new file mode 100644 index 0000000..d98f31b --- /dev/null +++ b/tests/integration/valid/fibonacci.c @@ -0,0 +1,10 @@ +int fib(int n) { + if (n <= 1) { + return n; + } + return fib(n - 1) + fib(n - 2); +} + +int main(void) { + return fib(10); +} diff --git a/tests/integration/valid/hello_world.c b/tests/integration/valid/hello_world.c new file mode 100644 index
0000000..9b6bdc2 --- /dev/null +++ b/tests/integration/valid/hello_world.c @@ -0,0 +1,3 @@ +int main(void) { + return 0; +} diff --git a/tests/integration/valid/pointers.c b/tests/integration/valid/pointers.c new file mode 100644 index 0000000..413aa19 --- /dev/null +++ b/tests/integration/valid/pointers.c @@ -0,0 +1,10 @@ +int main(void) { + int x = 5; + int *p = &x; + int *q = p; + p[0] = x + 1; + int offset = 1; + int *r = q + offset; + int v = r[0]; + return v + p[0] + offset; +} diff --git a/tests/integration/valid/structs.c b/tests/integration/valid/structs.c new file mode 100644 index 0000000..faa87d7 --- /dev/null +++ b/tests/integration/valid/structs.c @@ -0,0 +1,13 @@ +struct Point { + int x; + int y; +}; + +struct Point origin; + +int main(void) { + origin.x = 0; + origin.y = 0; + origin.x = origin.x + 1; + return origin.x + origin.y; +} diff --git a/tests/integration/valid/typedef.c b/tests/integration/valid/typedef.c new file mode 100644 index 0000000..d32310d --- /dev/null +++ b/tests/integration/valid/typedef.c @@ -0,0 +1,16 @@ +typedef int MyInt; + +struct Point { + int x; + int y; +}; + +typedef struct Point Point; + +Point origin; + +int main(void) { + MyInt a = 5; + origin.x = a; + return origin.x + a; +} diff --git a/tests/integration_test.rs b/tests/integration_test.rs new file mode 100644 index 0000000..c5ef9ac --- /dev/null +++ b/tests/integration_test.rs @@ -0,0 +1,165 @@ +use std::path::PathBuf; + +use crusty::analyser::analyse_with_builtins; +use crusty::common::errors::types::{CompilerError, Diagnostic, SemanticErrorKind}; +use crusty::common::input::source::SourceFile; +use crusty::lexer::scanner::Scanner; +use crusty::parser::Parser; + +struct CompileResult { + diagnostics: Vec<Diagnostic>, +} + +fn compile_file(rel: &str) -> CompileResult { + let path = PathBuf::from(env!("CARGO_MANIFEST_DIR")).join(rel); + let src = SourceFile::from_path(path) + .unwrap_or_else(|e| panic!("failed to read fixture '{rel}': {:?}", e.to_report())); + + let mut
scanner = Scanner::new(src); + scanner.scan(); + let lex_diags = std::mem::take(&mut scanner.diagnostics); + let tokens = std::mem::take(&mut scanner.tokens); + + let mut diagnostics: Vec<Diagnostic> = lex_diags.into_iter().map(Diagnostic::Error).collect(); + + let mut parser = Parser::new(tokens); + match parser.parse_program() { + Ok(program) => diagnostics.extend(analyse_with_builtins(&program, scanner.builtins)), + Err(parse_errs) => diagnostics.extend(parse_errs.into_iter().map(Diagnostic::Error)), + } + + CompileResult { diagnostics } +} + +fn has_semantic_error(diags: &[Diagnostic], pred: impl Fn(&SemanticErrorKind) -> bool) -> bool { + diags.iter().any(|d| match d { + Diagnostic::Error(CompilerError::Semantic(se)) => pred(&se.kind), + _ => false, + }) +} + +fn compile_valid(name: &str) { + let result = compile_file(&format!("tests/integration/valid/{name}.c")); + assert!( + result.diagnostics.is_empty(), + "valid program '{name}' should produce zero diagnostics, got: {:#?}", + result.diagnostics + ); +} + +fn assert_invalid(name: &str, pred: impl Fn(&SemanticErrorKind) -> bool, label: &str) { + let result = compile_file(&format!("tests/integration/invalid/{name}.c")); + assert!( + has_semantic_error(&result.diagnostics, pred), + "invalid program '{name}' should emit {label}, got: {:#?}", + result.diagnostics + ); +} + +#[test] +fn valid_hello_world() { + compile_valid("hello_world"); +} + +#[test] +fn valid_bool_literals() { + compile_valid("bool_literals"); +} + +#[test] +fn valid_arithmetic() { + compile_valid("arithmetic"); +} + +#[test] +fn valid_fibonacci() { + compile_valid("fibonacci"); +} + +#[test] +fn valid_structs() { + compile_valid("structs"); +} + +#[test] +fn valid_pointers() { + compile_valid("pointers"); +} + +#[test] +fn valid_typedef() { + compile_valid("typedef"); +} + +#[test] +fn valid_enum_usage() { + compile_valid("enum_usage"); +} + +#[test] +fn valid_control_flow() { + compile_valid("control_flow"); +} + +#[test] +fn
invalid_undeclared_var_emits_error() { + assert_invalid( + "undeclared_var", + |k| matches!(k, SemanticErrorKind::UndefinedVariable(_)), + "UndefinedVariable", + ); +} + +#[test] +fn invalid_type_mismatch_assign_emits_error() { + assert_invalid( + "type_mismatch_assign", + |k| matches!(k, SemanticErrorKind::TypeMismatch { .. }), + "TypeMismatch", + ); +} + +#[test] +fn invalid_assign_const_emits_error() { + assert_invalid( + "assign_const", + |k| matches!(k, SemanticErrorKind::AssignToConst(_)), + "AssignToConst", + ); +} + +#[test] +fn invalid_redeclaration_emits_error() { + assert_invalid( + "redeclaration", + |k| matches!(k, SemanticErrorKind::Redeclaration(_)), + "Redeclaration", + ); +} + +#[test] +fn invalid_return_in_void_emits_error() { + assert_invalid( + "return_in_void", + |k| matches!(k, SemanticErrorKind::ReturnInVoid), + "ReturnInVoid", + ); +} + +#[test] +fn invalid_arity_mismatch_emits_error() { + assert_invalid( + "arity_mismatch", + |k| matches!(k, SemanticErrorKind::ArityMismatch { .. 
}), + "ArityMismatch", + ); +} + +#[test] +fn invalid_call_non_function_emits_error() { + assert_invalid( + "call_non_function", + |k| matches!(k, SemanticErrorKind::CallNonFunction(_)), + "CallNonFunction", + ); +} diff --git a/tests/licm_test.rs b/tests/licm_test.rs new file mode 100644 index 0000000..dc0f72a --- /dev/null +++ b/tests/licm_test.rs @@ -0,0 +1,175 @@ +use crusty::codegen::inter::opt::{LoopInvariantCodeMotionPass, OptPass}; +use crusty::codegen::inter::{BasicBlock, BinaryOp, Cfg, Instruction, Value}; + +#[test] +fn test_licm_integration_basic() { + let mut cfg = Cfg::new(); + + // Block 0: entry + let mut entry = BasicBlock::new("entry"); + entry.instructions.push(Instruction::Assign { + dst: "x".to_string(), + value: Value::Int(10), + }); + entry.successors.push(1); + cfg.add_block(entry); + + // Block 1: loop header + let mut header = BasicBlock::new("header"); + header.instructions.push(Instruction::Binary { + dst: "t1".to_string(), + op: BinaryOp::Add, + lhs: Value::Int(2), + rhs: Value::Int(3), + }); + header.instructions.push(Instruction::Binary { + dst: "t2".to_string(), + op: BinaryOp::Add, + lhs: Value::Temp("t1".to_string()), + rhs: Value::Temp("x".to_string()), + }); + header.instructions.push(Instruction::Binary { + dst: "t3".to_string(), + op: BinaryOp::Add, + lhs: Value::Temp("t2".to_string()), + rhs: Value::Temp("y".to_string()), + }); + header.successors.push(2); + header.successors.push(3); + cfg.add_block(header); + + // Block 2: loop body + let mut body = BasicBlock::new("body"); + body.instructions.push(Instruction::Assign { + dst: "y".to_string(), + value: Value::Int(5), + }); + body.successors.push(1); + cfg.add_block(body); + + // Block 3: exit + let mut exit = BasicBlock::new("exit"); + exit.instructions.push(Instruction::Assign { + dst: "res".to_string(), + value: Value::Temp("y".to_string()), + }); + cfg.add_block(exit); + + let pass = LoopInvariantCodeMotionPass; + let mutated = pass.run(&mut cfg); + + assert!(mutated); + 
assert_eq!(cfg.blocks.len(), 5); + + // Preheader should be at index 1 + assert_eq!(cfg.blocks[1].label, "header_preheader"); + assert_eq!(cfg.blocks[1].instructions.len(), 2); + assert_eq!(cfg.blocks[1].successors, vec![2]); // points to header (now index 2) + + // Entry block should point to preheader (index 1) + assert_eq!(cfg.blocks[0].successors, vec![1]); + + // Loop body should point to header (index 2) + assert_eq!(cfg.blocks[3].successors, vec![2]); +} + +#[test] +fn test_licm_loop_with_invariant() { + let mut cfg = Cfg::new(); + + // Block 0: entry + let mut entry = BasicBlock::new("entry"); + entry.instructions.push(Instruction::Assign { + dst: "a".to_string(), + value: Value::Int(5), + }); + entry.instructions.push(Instruction::Assign { + dst: "b".to_string(), + value: Value::Int(10), + }); + entry.instructions.push(Instruction::Assign { + dst: "i".to_string(), + value: Value::Int(0), + }); + entry.instructions.push(Instruction::Assign { + dst: "result".to_string(), + value: Value::Int(0), + }); + entry.successors.push(1); + cfg.add_block(entry); + + // Block 1: loop header + let mut header = BasicBlock::new("header"); + header.instructions.push(Instruction::Binary { + dst: "t1".to_string(), + op: BinaryOp::Add, + lhs: Value::Temp("a".to_string()), + rhs: Value::Temp("b".to_string()), + }); + header.instructions.push(Instruction::Binary { + dst: "t2".to_string(), + op: BinaryOp::Mul, + lhs: Value::Temp("i".to_string()), + rhs: Value::Temp("t1".to_string()), + }); + header.instructions.push(Instruction::Binary { + dst: "result".to_string(), + op: BinaryOp::Add, + lhs: Value::Temp("result".to_string()), + rhs: Value::Temp("t2".to_string()), + }); + header.instructions.push(Instruction::Binary { + dst: "i".to_string(), + op: BinaryOp::Add, + lhs: Value::Temp("i".to_string()), + rhs: Value::Int(1), + }); + header.successors.push(2); + header.successors.push(3); + cfg.add_block(header); + + // Block 2: loop body/latch + let mut body = 
BasicBlock::new("body"); + body.instructions.push(Instruction::Nop); + body.successors.push(1); + cfg.add_block(body); + + // Block 3: exit + let mut exit = BasicBlock::new("exit"); + exit.instructions.push(Instruction::Assign { + dst: "res".to_string(), + value: Value::Temp("result".to_string()), + }); + cfg.add_block(exit); + + let pass = LoopInvariantCodeMotionPass; + let mutated = pass.run(&mut cfg); + + assert!(mutated); + assert_eq!(cfg.blocks.len(), 5); + + // Preheader should contain "t1 = a + b" + assert_eq!(cfg.blocks[1].label, "header_preheader"); + assert_eq!(cfg.blocks[1].instructions.len(), 1); + assert_eq!( + cfg.blocks[1].instructions[0], + Instruction::Binary { + dst: "t1".to_string(), + op: BinaryOp::Add, + lhs: Value::Temp("a".to_string()), + rhs: Value::Temp("b".to_string()), + } + ); + + // Header block (index 2) should no longer contain "t1 = a + b" + assert_eq!(cfg.blocks[2].instructions.len(), 3); + assert_eq!( + cfg.blocks[2].instructions[0], + Instruction::Binary { + dst: "t2".to_string(), + op: BinaryOp::Mul, + lhs: Value::Temp("i".to_string()), + rhs: Value::Temp("t1".to_string()), + } + ); +}