From d43fe6980d37e0eec0db2ea7d45592b1b712b98b Mon Sep 17 00:00:00 2001 From: g4titanx Date: Tue, 22 Jul 2025 08:35:28 +0100 Subject: [PATCH 1/7] chore(docs): correct transforms README and include README for api/ --- crates/api/README.md | 3 +++ 1 file changed, 3 insertions(+) create mode 100644 crates/api/README.md diff --git a/crates/api/README.md b/crates/api/README.md new file mode 100644 index 0000000..94cf316 --- /dev/null +++ b/crates/api/README.md @@ -0,0 +1,3 @@ +# azoth-api + +HTTP API server for bytecode obfuscation using Azoth's transformation pipeline. Exposes a REST endpoint at `/obfuscate` that accepts EVM bytecode and returns obfuscated versions with configurable transforms. The API provides detailed metrics including gas cost analysis, size impact, and obfuscation metadata for integration into development workflows and automated deployment pipelines. \ No newline at end of file From a8baed08ab1d8262ec7a49dce9347a716576f0c2 Mon Sep 17 00:00:00 2001 From: g4titanx Date: Tue, 28 Oct 2025 20:44:59 +0100 Subject: [PATCH 2/7] feat(docs): mdbook init --- docs/book.toml | 5 +++++ docs/src/SUMMARY.md | 3 +++ docs/src/chapter_1.md | 1 + 3 files changed, 9 insertions(+) create mode 100644 docs/book.toml create mode 100644 docs/src/SUMMARY.md create mode 100644 docs/src/chapter_1.md diff --git a/docs/book.toml b/docs/book.toml new file mode 100644 index 0000000..7b218da --- /dev/null +++ b/docs/book.toml @@ -0,0 +1,5 @@ +[book] +authors = ["g4titanx"] +language = "en" +src = "src" +title = "Azoth" diff --git a/docs/src/SUMMARY.md b/docs/src/SUMMARY.md new file mode 100644 index 0000000..7390c82 --- /dev/null +++ b/docs/src/SUMMARY.md @@ -0,0 +1,3 @@ +# Summary + +- [Chapter 1](./chapter_1.md) diff --git a/docs/src/chapter_1.md b/docs/src/chapter_1.md new file mode 100644 index 0000000..b743fda --- /dev/null +++ b/docs/src/chapter_1.md @@ -0,0 +1 @@ +# Chapter 1 From c2d33370f963f14e1dd279a7016c03f0738d3518 Mon Sep 17 00:00:00 2001 From: g4titanx Date: Tue, 28 
Oct 2025 20:55:00 +0100 Subject: [PATCH 3/7] wip: rm api/ --- crates/api/README.md | 3 --- 1 file changed, 3 deletions(-) delete mode 100644 crates/api/README.md diff --git a/crates/api/README.md b/crates/api/README.md deleted file mode 100644 index 94cf316..0000000 --- a/crates/api/README.md +++ /dev/null @@ -1,3 +0,0 @@ -# azoth-api - -HTTP API server for bytecode obfuscation using Azoth's transformation pipeline. Exposes a REST endpoint at `/obfuscate` that accepts EVM bytecode and returns obfuscated versions with configurable transforms. The API provides detailed metrics including gas cost analysis, size impact, and obfuscation metadata for integration into development workflows and automated deployment pipelines. \ No newline at end of file From 0d7f86f82ca69aff2eba5e67481cef1b7a1905c3 Mon Sep 17 00:00:00 2001 From: g4titanx Date: Tue, 28 Oct 2025 22:22:38 +0100 Subject: [PATCH 4/7] wip: update file structure --- docs/src/SUMMARY.md | 18 +++++++++++++++++- docs/src/analysis.md | 0 docs/src/appendix/README.md | 1 + docs/src/chapter_1.md | 1 - docs/src/cli.md | 0 docs/src/contributing/README.md | 0 docs/src/contributing/architecture.md | 0 docs/src/contributing/bounties.md | 1 + docs/src/contributing/building.md | 1 + docs/src/contributing/shortcuts.md | 0 docs/src/contributing/testing.md | 0 docs/src/core/README.md | 0 docs/src/core/cfg_ir.md | 0 docs/src/core/decoder.md | 0 docs/src/core/detection.md | 0 docs/src/core/encoder.md | 0 docs/src/core/strip.md | 0 docs/src/examples.md | 0 docs/src/formal_verification.md | 0 docs/src/introduction.md | 3 +++ docs/src/transforms/README.md | 0 docs/src/transforms/dispatcher.rs | 0 docs/src/transforms/shuffle.rs | 0 23 files changed, 23 insertions(+), 2 deletions(-) create mode 100644 docs/src/analysis.md create mode 100644 docs/src/appendix/README.md delete mode 100644 docs/src/chapter_1.md create mode 100644 docs/src/cli.md create mode 100644 docs/src/contributing/README.md create mode 100644 
docs/src/contributing/architecture.md create mode 100644 docs/src/contributing/bounties.md create mode 100644 docs/src/contributing/building.md create mode 100644 docs/src/contributing/shortcuts.md create mode 100644 docs/src/contributing/testing.md create mode 100644 docs/src/core/README.md create mode 100644 docs/src/core/cfg_ir.md create mode 100644 docs/src/core/decoder.md create mode 100644 docs/src/core/detection.md create mode 100644 docs/src/core/encoder.md create mode 100644 docs/src/core/strip.md create mode 100644 docs/src/examples.md create mode 100644 docs/src/formal_verification.md create mode 100644 docs/src/introduction.md create mode 100644 docs/src/transforms/README.md create mode 100644 docs/src/transforms/dispatcher.rs create mode 100644 docs/src/transforms/shuffle.rs diff --git a/docs/src/SUMMARY.md b/docs/src/SUMMARY.md index 7390c82..154f0dc 100644 --- a/docs/src/SUMMARY.md +++ b/docs/src/SUMMARY.md @@ -1,3 +1,19 @@ # Summary -- [Chapter 1](./chapter_1.md) +- [Introduction](./introduction.md) +- [Core Implementation](./core/README.md) + - [Control Flow Graph Intermediate Representation](./core/cfg_ir.md) + - [Decoder](./core/decoder.md) + - [Detection](./core/detection.md) + - [Encoder](./core/encoder.md) + - [Strip Pipeline](./core/strip.md) +- [Transform Passes](./transforms/README.md) +- [Program Analysis](./analysis.md) +- [Formal Verification](./formal_verification.md) +- [Command Line Interface](./cli.md) +- [Examples](./examples.md) +- [Contributing](./contributing/README.md) + - [Architecture](./contributing/architecture.md) + - [Shortcuts](./contributing/shortcuts.md) + - [Testing](./contributing/testing.md) +- [Appendix](./appendix/README.md) diff --git a/docs/src/analysis.md b/docs/src/analysis.md new file mode 100644 index 0000000..e69de29 diff --git a/docs/src/appendix/README.md b/docs/src/appendix/README.md new file mode 100644 index 0000000..fad5ae4 --- /dev/null +++ b/docs/src/appendix/README.md @@ -0,0 +1 @@ +# Appendix diff 
--git a/docs/src/chapter_1.md b/docs/src/chapter_1.md deleted file mode 100644 index b743fda..0000000 --- a/docs/src/chapter_1.md +++ /dev/null @@ -1 +0,0 @@ -# Chapter 1 diff --git a/docs/src/cli.md b/docs/src/cli.md new file mode 100644 index 0000000..e69de29 diff --git a/docs/src/contributing/README.md b/docs/src/contributing/README.md new file mode 100644 index 0000000..e69de29 diff --git a/docs/src/contributing/architecture.md b/docs/src/contributing/architecture.md new file mode 100644 index 0000000..e69de29 diff --git a/docs/src/contributing/bounties.md b/docs/src/contributing/bounties.md new file mode 100644 index 0000000..a51ddb7 --- /dev/null +++ b/docs/src/contributing/bounties.md @@ -0,0 +1 @@ +# Bounties diff --git a/docs/src/contributing/building.md b/docs/src/contributing/building.md new file mode 100644 index 0000000..e9a07cf --- /dev/null +++ b/docs/src/contributing/building.md @@ -0,0 +1 @@ +# Building diff --git a/docs/src/contributing/shortcuts.md b/docs/src/contributing/shortcuts.md new file mode 100644 index 0000000..e69de29 diff --git a/docs/src/contributing/testing.md b/docs/src/contributing/testing.md new file mode 100644 index 0000000..e69de29 diff --git a/docs/src/core/README.md b/docs/src/core/README.md new file mode 100644 index 0000000..e69de29 diff --git a/docs/src/core/cfg_ir.md b/docs/src/core/cfg_ir.md new file mode 100644 index 0000000..e69de29 diff --git a/docs/src/core/decoder.md b/docs/src/core/decoder.md new file mode 100644 index 0000000..e69de29 diff --git a/docs/src/core/detection.md b/docs/src/core/detection.md new file mode 100644 index 0000000..e69de29 diff --git a/docs/src/core/encoder.md b/docs/src/core/encoder.md new file mode 100644 index 0000000..e69de29 diff --git a/docs/src/core/strip.md b/docs/src/core/strip.md new file mode 100644 index 0000000..e69de29 diff --git a/docs/src/examples.md b/docs/src/examples.md new file mode 100644 index 0000000..e69de29 diff --git a/docs/src/formal_verification.md 
b/docs/src/formal_verification.md new file mode 100644 index 0000000..e69de29 diff --git a/docs/src/introduction.md b/docs/src/introduction.md new file mode 100644 index 0000000..7390c82 --- /dev/null +++ b/docs/src/introduction.md @@ -0,0 +1,3 @@ +# Summary + +- [Chapter 1](./chapter_1.md) diff --git a/docs/src/transforms/README.md b/docs/src/transforms/README.md new file mode 100644 index 0000000..e69de29 diff --git a/docs/src/transforms/dispatcher.rs b/docs/src/transforms/dispatcher.rs new file mode 100644 index 0000000..e69de29 diff --git a/docs/src/transforms/shuffle.rs b/docs/src/transforms/shuffle.rs new file mode 100644 index 0000000..e69de29 From db27feff6ad9bc59bd710c2f2e848d0ddc608157 Mon Sep 17 00:00:00 2001 From: g4titanx Date: Tue, 28 Oct 2025 22:23:20 +0100 Subject: [PATCH 5/7] wip: intro. --- docs/src/introduction.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/docs/src/introduction.md b/docs/src/introduction.md index 7390c82..49678e7 100644 --- a/docs/src/introduction.md +++ b/docs/src/introduction.md @@ -1,3 +1,5 @@ -# Summary +Azoth is a deterministic EVM bytecode obfuscator designed to make Mirage execution contracts indistinguishable from ordinary, unverified deployments on Ethereum. The project takes its name from the alchemical "universal solvent", reflecting its goal of transforming bytecode while preserving the intent of the original program. -- [Chapter 1](./chapter_1.md) +The toolchain dissects contract bytecode, reconstructs control-flow, and applies deterministic rewrites that reshape structure without inflating gas usage or breaking deployability. 
+ +Within this documentation you will find guidance on the command-line interface, core architecture, transforms, analysis, and verification systems, and the source for the book lives alongside Azoth on GitHub at https://github.com/MiragePrivacy/azoth/tree/master/docs From 7db34c7ebd55d41f2b001de3ff14780da3178065 Mon Sep 17 00:00:00 2001 From: g4titanx Date: Tue, 28 Oct 2025 22:24:01 +0100 Subject: [PATCH 6/7] wip: update intro. --- docs/src/introduction.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/src/introduction.md b/docs/src/introduction.md index 49678e7..88a916f 100644 --- a/docs/src/introduction.md +++ b/docs/src/introduction.md @@ -2,4 +2,4 @@ Azoth is a deterministic EVM bytecode obfuscator designed to make Mirage executi The toolchain dissects contract bytecode, reconstructs control-flow, and applies deterministic rewrites that reshape structure without inflating gas usage or breaking deployability. -Within this documentation you will find guidance on the command-line interface, core architecture, transforms, analysis, and verification systems, and the source for the book lives alongside Azoth on GitHub at https://github.com/MiragePrivacy/azoth/tree/master/docs +Within this documentation you will find guidance on the command-line interface, core architecture, transforms, analysis, and verification systems, and the source for the book [lives alongside](https://github.com/MiragePrivacy/azoth/tree/master/docs) Azoth on GitHub From 8277e0a212bf699702db0f4abb073e9d88e49cb4 Mon Sep 17 00:00:00 2001 From: g4titanx Date: Tue, 28 Oct 2025 23:04:59 +0100 Subject: [PATCH 7/7] wip: docs --- docs/src/analysis.md | 19 +++++++++++++++ docs/src/cli.md | 25 ++++++++++++++++++++ docs/src/contributing/README.md | 11 +++++++++ docs/src/contributing/architecture.md | 12 ++++++++++ docs/src/contributing/bounties.md | 9 ++++++++ docs/src/contributing/building.md | 33 +++++++++++++++++++++++++++ docs/src/contributing/shortcuts.md | 10 ++++++++ 
docs/src/contributing/testing.md | 21 +++++++++++++++++ docs/src/core/README.md | 13 +++++++++++ docs/src/core/cfg_ir.md | 10 ++++++++ docs/src/core/decoder.md | 10 ++++++++ docs/src/core/detection.md | 7 ++++++ docs/src/core/encoder.md | 5 ++++ docs/src/core/strip.md | 5 ++++ docs/src/examples.md | 20 ++++++++++++++++ docs/src/formal_verification.md | 21 +++++++++++++++++ docs/src/transforms/README.md | 27 ++++++++++++++++++++++ 17 files changed, 258 insertions(+) diff --git a/docs/src/analysis.md b/docs/src/analysis.md index e69de29..6622ef2 100644 --- a/docs/src/analysis.md +++ b/docs/src/analysis.md @@ -0,0 +1,19 @@ +# Program Analysis + +`azoth-analysis` computes quantitative metrics before and after each obfuscation pass so we can reason about size, CFG complexity, and stack pressure. Transforms and the CLI use these measurements to decide whether a candidate rewrite is worth keeping. + +## Metrics collected + +`metrics::collect_metrics` consumes a `CfgIrBundle` plus the `CleanReport` from `strip::strip_bytecode` and returns: + +- `byte_len` – length of the cleaned runtime. +- `block_cnt` / `edge_cnt` – number of body blocks and edges in the CFG (entry/exit excluded). +- `max_stack_peak` – maximum recorded stack height across blocks. +- `dom_overlap` – fraction of nodes whose immediate dominator equals their immediate post-dominator (lower overlap ⇒ less linear control flow). +- `potency` – heuristic score derived from block/edge counts and dominator overlap (based on Wroblewski’s potency metric). + +Consumers can compare two metric snapshots via `metrics::compare`, which highlights potency gains while accounting for bytecode growth. The CLI’s `obfuscate` subcommand prints these deltas after each run. + +## Dominator utilities + +The crate also exposes helpers such as `dominator_pairs`, `dom_overlap`, and `max_stack_per_block`. 
They are useful when you need deeper inspection during transform development or custom acceptance heuristics (e.g., rejecting passes that blow past a stack threshold). diff --git a/docs/src/cli.md b/docs/src/cli.md index e69de29..94019c6 100644 --- a/docs/src/cli.md +++ b/docs/src/cli.md @@ -0,0 +1,25 @@ +# Command Line Interface + +The `azoth` binary (crate `azoth-cli`) exposes the full pipeline from the terminal. Install with `cargo install --path crates/cli` or run in-place via `cargo run -p azoth-cli -- `. + +## Shared conventions + +- Inputs accept either a hex literal (with or without `0x`) or a path to a `.hex`/binary file. +- Output is printed to stdout unless an explicit `--output`/`--emit` flag is provided. +- All commands normalise whitespace and underscores in hex payloads. + +## Subcommands + +- `azoth decode ` + Runs the Heimdall disassembler and prints annotated assembly plus the structured instruction list. Useful for quick inspection or seeding tests. + +- `azoth strip [--raw]` + Removes init code, constructor args, padding, and auxdata. By default it emits a JSON payload mirroring `strip::CleanReport`; pass `--raw` to dump just the cleaned runtime hex. + +- `azoth cfg [--output ]` + Builds the runtime CFG and writes a Graphviz `.dot` representation (stdout by default). Pair with `dot -Tpng` for visualisation. + +- `azoth obfuscate [--seed HEX] [--passes list] [--emit path] [--emit-debug path]` + Executes the unified obfuscation pipeline. The optional `--seed` fixes RNG output (deterministic replays); `--passes` controls the user-facing transforms (default `shuffle,jump_transform,opaque_pred`). When a Solidity dispatcher is detected the hardened dispatcher transform runs automatically. `--emit` writes gas/size metrics to JSON, and `--emit-debug` exports the recorded CFG trace. + +Each command returns a non-zero exit code on failure, so you can integrate the CLI into scripts or CI jobs easily. 
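The input conventions listed under "Shared conventions" in `docs/src/cli.md` (optional `0x` prefix, whitespace and underscores ignored) can be sketched as below. This is a minimal illustration, not the CLI's actual code: `normalize_hex` and `hex_val` are hypothetical names, and the error type is a plain `String` for brevity.

```rust
// Illustrative sketch only: `normalize_hex` and `hex_val` are hypothetical
// names, not functions exported by azoth-cli.

/// Convert one hex digit to its numeric value.
fn hex_val(c: char) -> Result<u8, String> {
    c.to_digit(16)
        .map(|d| d as u8)
        .ok_or_else(|| format!("invalid hex digit {:?}", c))
}

/// Strip an optional 0x prefix, drop whitespace and underscores, decode to bytes.
fn normalize_hex(input: &str) -> Result<Vec<u8>, String> {
    let trimmed = input.trim();
    let body = trimmed
        .strip_prefix("0x")
        .or_else(|| trimmed.strip_prefix("0X"))
        .unwrap_or(trimmed);
    // Separators are removed before the digits are paired up.
    let digits: Vec<char> = body
        .chars()
        .filter(|c| !c.is_whitespace() && *c != '_')
        .collect();
    if digits.len() % 2 != 0 {
        return Err(format!("odd number of hex digits: {}", digits.len()));
    }
    let mut bytes = Vec::with_capacity(digits.len() / 2);
    for pair in digits.chunks(2) {
        bytes.push((hex_val(pair[0])? << 4) | hex_val(pair[1])?);
    }
    Ok(bytes)
}

fn main() {
    let bytes = normalize_hex("0x60_80 6040").expect("valid hex");
    println!("{:02x?}", bytes);
}
```

Rejecting odd-length payloads up front keeps a truncated paste from being silently decoded into the wrong bytes.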
diff --git a/docs/src/contributing/README.md b/docs/src/contributing/README.md index e69de29..ef7fcc1 100644 --- a/docs/src/contributing/README.md +++ b/docs/src/contributing/README.md @@ -0,0 +1,11 @@ +# Contributing + +Thanks for helping harden Azoth. This section documents how the codebase is organised, how to build and test it locally, and where to focus contributions. + +- [Architecture](./architecture.md) — crate layout and data flow. +- [Building](./building.md) — setting up toolchains and compiling the workspace. +- [Testing](./testing.md) — running unit/integration suites and linting. +- [Shortcuts](./shortcuts.md) — handy commands for day-to-day development. +- [Bounties](./bounties.md) — guidance for proposal-driven contributions. + +Before opening a pull request, please skim `CONTRIBUTING.md` at the repository root and match the Rust style guidelines enforced there (formatting, Clippy, etc.). diff --git a/docs/src/contributing/architecture.md b/docs/src/contributing/architecture.md index e69de29..9f9f0c7 100644 --- a/docs/src/contributing/architecture.md +++ b/docs/src/contributing/architecture.md @@ -0,0 +1,12 @@ +# Architecture + +Azoth is organised as a collection of focused crates that share a common `azoth_core` foundation: + +- `azoth_core` — decoding, section detection, bytecode stripping, CFG construction, and re-encoding. Every other crate consumes the data structures defined here. +- `azoth_transform` — transform trait implementations plus the orchestration pipeline (`obfuscator.rs`). Passes operate on `CfgIrBundle` and rely on `azoth_analysis` metrics to gauge impact. +- `azoth_analysis` — metrics (size, CFG complexity, stack peaks) and helper utilities (dominators). Used by transforms and the CLI to evaluate rewrites. +- `azoth_cli` — command-line interface exposing decode/strip/cfg/obfuscate workflows. It stitches the other crates together for end users. 
+- `azoth_verification` — formal verification scaffold that builds SMT queries to prove equivalence between original and obfuscated bytecode. +- `examples` — executable that demonstrates a full Mirage escrow obfuscation run, useful as an integration test bed. + +`target/` isn’t checked in, so CI and local builds share the same cargo workspace semantics. New functionality typically lands in `azoth_core` first, then bubbles up through transforms, CLI, and docs. diff --git a/docs/src/contributing/bounties.md b/docs/src/contributing/bounties.md index a51ddb7..34d235b 100644 --- a/docs/src/contributing/bounties.md +++ b/docs/src/contributing/bounties.md @@ -1 +1,10 @@ # Bounties + +We occasionally tag issues in the main repository with `bounty` when we want focused help on a feature, optimisation, or documentation push. To participate: + +1. **Claim the issue** — comment on the ticket so we know you are working on it. Maintainers will confirm assignment or coordinate duplicates. +2. **Discuss scope early** — if the acceptance criteria are unclear, open a short design note in the issue before writing code. This avoids surprises when it is time to review. +3. **Work in the open** — draft PRs are encouraged; the team can give feedback on direction, dependencies, or missing context. +4. **Submit a PR** — reference the bounty issue number, describe the implemented approach, and include test/run notes (commands, outputs, benchmarks). + +If you would like to propose a new bounty, open an issue prefixed with `[Bounty Proposal]` explaining the problem, expected deliverables, and suggested payout or contact the maintainers via the email listed in `CONTRIBUTING.md`. We'll review, tag, and announce it if it fits the roadmap. 
diff --git a/docs/src/contributing/building.md b/docs/src/contributing/building.md index e9a07cf..4dbc6f0 100644 --- a/docs/src/contributing/building.md +++ b/docs/src/contributing/building.md @@ -1 +1,34 @@ # Building + +Azoth uses a pinned stable toolchain (`rust-toolchain.toml` requests Rust 1.90.0 plus rustfmt and clippy). Install via `rustup` if you have not already: + +```bash +rustup toolchain install 1.90.0 --component rustfmt --component clippy +``` + +## Compile the workspace + +```bash +cargo build --workspace +``` + +This command builds every crate (core, transforms, analysis, CLI, verification, examples, tests). Add `--release` if you want optimised binaries. + +### Building individual crates + +- Core library only: `cargo build -p azoth-core` +- CLI binary: `cargo build -p azoth-cli --bin azoth` +- Example workflow: `cargo run -p azoth-examples` + +The build pulls dependencies such as Heimdall, REVM, and SMT utilities; make sure you have network access the first time you compile. + +## Formatting and linting + +Run formatting and clippy before submitting a change: + +```bash +cargo fmt --all +cargo clippy --workspace --all-targets -- -D warnings +``` + +Both commands use the pinned toolchain, keeping CI and local development consistent. diff --git a/docs/src/contributing/shortcuts.md b/docs/src/contributing/shortcuts.md index e69de29..0196acf 100644 --- a/docs/src/contributing/shortcuts.md +++ b/docs/src/contributing/shortcuts.md @@ -0,0 +1,10 @@ +# Shortcuts + +Handy commands while iterating locally: + +- `cargo run -p azoth-cli -- decode 0x...` — quick opcode inspection. +- `cargo run -p azoth-cli -- obfuscate --seed 0xdeadbeef --passes shuffle,jump_transform` — dry-run a transform combo deterministically. +- `cargo test -p azoth-core --lib` — run core unit tests without touching the rest of the workspace. +- `cargo test -p azoth-transform --all-targets` — exercise transform logic plus property tests. 
+- `cargo fmt --all && cargo clippy --workspace --all-targets -- -D warnings` — one-liner style/lint check before committing. +- `cargo doc --workspace --no-deps --open` — build rustdoc for all crates to cross-check API descriptions against this book. diff --git a/docs/src/contributing/testing.md b/docs/src/contributing/testing.md index e69de29..8d7f724 100644 --- a/docs/src/contributing/testing.md +++ b/docs/src/contributing/testing.md @@ -0,0 +1,21 @@ +# Testing + +Azoth ships with unit tests inside each crate plus an integration test crate under `tests/`. To run everything: + +```bash +cargo test --workspace +``` + +### Targeted suites + +- Core strip/detection/CFG tests: `cargo test -p azoth-core` +- Analysis metrics: `cargo test -p azoth-analysis` +- Transform behaviour: `cargo test -p azoth-transform` +- CLI smoke tests: `cargo test -p azoth-cli` +- Multi-crate integration (escrow fixtures, e2e): `cargo test -p azoth-tests` + +Several transform tests rely on determinism. When adding new passes, seed the RNG explicitly (`StdRng::seed_from_u64`) so assertions stay stable on CI. + +### Additional checks + +Use `cargo clippy --workspace --all-targets -- -D warnings` to catch lints and `cargo fmt --all` to enforce formatting prior to submitting a PR. diff --git a/docs/src/core/README.md b/docs/src/core/README.md index e69de29..93bedfb 100644 --- a/docs/src/core/README.md +++ b/docs/src/core/README.md @@ -0,0 +1,13 @@ +# Core Crate + +The Azoth core crate implements the deterministic pipeline that turns raw bytecode into a transformable control-flow graph and then reassembles the final artifact. It exposes reusable building blocks that other crates (CLI, transforms, verification) consume so every stage shares the same understanding of program layout. + +At a high level the crate: + +- Normalizes and decodes bytecode into structured instructions (`decoder`). +- Detects init/runtime/auxiliary regions and dispatcher metadata (`detection`). 
+- Produces a cleaned runtime blob while preserving reassembly data (`strip`). +- Constructs the control-flow graph intermediate representation that powers transforms (`cfg_ir`). +- Encodes modified instruction streams back into deployable bytecode (`encoder`). + +Each module is documented in the following pages to show how they interlock and which data structures they introduce. diff --git a/docs/src/core/cfg_ir.md b/docs/src/core/cfg_ir.md index e69de29..b089b38 100644 --- a/docs/src/core/cfg_ir.md +++ b/docs/src/core/cfg_ir.md @@ -0,0 +1,10 @@ +# Control-Flow Graph IR + +The `cfg_ir` module turns decoded runtime instructions into a stable directed graph that every transform mutates. `build_cfg_ir` receives the runtime slice, detected sections, and `CleanReport` metadata; it splits instructions into basic blocks, wires edges based on control flow, annotates jump encodings, and records runtime bounds so that transforms know which blocks belong to the deployed code. + +Key data structures include: +- `Block`/`BlockBody`: graph nodes that store the first program counter, copied instructions, stack height, and control descriptor. +- `BlockControl` and `JumpTarget`: describe how a block exits (fallthrough, branch, terminal) and whether immediates are absolute PCs, runtime-relative offsets, or symbolic. +- `CfgIrBundle`: the container returned to transforms, holding the graph, PC-to-node map, detected sections, original bytecode, runtime bounds, and a trace log of structural edits for downstream tooling. + +During assembly the module validates that every `JUMPDEST` begins a block, adds entry/exit sentinels, emits edges with semantic labels (`Fallthrough`, `Jump`, `BranchTrue`, `BranchFalse`), and assigns simple SSA-style identifiers for stack tracking. Helper routines snapshot graph state and compute diffs so transforms can report their mutations and the encoder can rebuild a coherent runtime after rewriting blocks. 
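The block-splitting step described for `build_cfg_ir` in `docs/src/core/cfg_ir.md` can be sketched roughly as follows. This is a simplified illustration under stated assumptions: `Instr`, `ends_block`, `split_blocks`, and `sample_program` are hypothetical names, opcodes are raw EVM bytes rather than `eot::UnifiedOpcode`, and edges, stack heights, and sentinels are left out.

```rust
// Illustrative sketch only: these types and functions are hypothetical and
// much simpler than the real `cfg_ir` module.

#[derive(Debug, Clone, PartialEq)]
struct Instr {
    pc: usize,
    opcode: u8,
}

/// Opcodes that terminate a basic block:
/// STOP, JUMP, JUMPI, RETURN, REVERT, INVALID, SELFDESTRUCT.
fn ends_block(op: u8) -> bool {
    matches!(op, 0x00 | 0x56 | 0x57 | 0xf3 | 0xfd | 0xfe | 0xff)
}

/// Split a linear instruction stream into basic blocks.
fn split_blocks(instrs: &[Instr]) -> Vec<Vec<Instr>> {
    let mut blocks = Vec::new();
    let mut current: Vec<Instr> = Vec::new();
    for ins in instrs {
        // A JUMPDEST (0x5b) always begins a new block.
        if ins.opcode == 0x5b && !current.is_empty() {
            blocks.push(std::mem::take(&mut current));
        }
        current.push(ins.clone());
        // A terminator closes the block it belongs to.
        if ends_block(ins.opcode) {
            blocks.push(std::mem::take(&mut current));
        }
    }
    if !current.is_empty() {
        blocks.push(current);
    }
    blocks
}

/// PUSH1 0x04; JUMP; STOP; JUMPDEST; STOP
fn sample_program() -> Vec<Instr> {
    vec![
        Instr { pc: 0, opcode: 0x60 },
        Instr { pc: 2, opcode: 0x56 },
        Instr { pc: 3, opcode: 0x00 },
        Instr { pc: 4, opcode: 0x5b },
        Instr { pc: 5, opcode: 0x00 },
    ]
}

fn main() {
    for block in split_blocks(&sample_program()) {
        println!("block at pc {} with {} instruction(s)", block[0].pc, block.len());
    }
}
```

Wiring edges would be a second pass over these blocks: a fallthrough edge after `JUMPI` or a non-terminated block, and jump edges wherever the target can be resolved.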
diff --git a/docs/src/core/decoder.md b/docs/src/core/decoder.md index e69de29..ae39a79 100644 --- a/docs/src/core/decoder.md +++ b/docs/src/core/decoder.md @@ -0,0 +1,10 @@ +# Decoder + +The decoder module is the first stop in the pipeline: it normalizes raw hex or file input, runs Heimdall to disassemble the byte stream, and turns the output into structured `Instruction` values. Each instruction records the program counter, parsed opcode (mapped to `eot::UnifiedOpcode`), and any immediate operand so later stages can reason about stack effects or rewrite PUSH data. + +`decode_bytecode` returns four artifacts in one call: +- the instruction stream, +- `DecodeInfo` metadata (length, Keccak-256 hash, and whether the source was inline hex or a file), +- the raw assembly text for debugging, and +- the original byte vector. +Parsing is intentionally strict—missing PCs, malformed opcodes, and empty output produce explicit errors—because downstream modules assume the stream is well formed. Unknown opcodes are tagged as `Opcode::UNKNOWN` or `Opcode::INVALID` placeholders; the encoder relies on the preserved program counter to recover the original byte when rebuilding. diff --git a/docs/src/core/detection.md b/docs/src/core/detection.md index e69de29..fac12c2 100644 --- a/docs/src/core/detection.md +++ b/docs/src/core/detection.md @@ -0,0 +1,7 @@ +# Detection + +The detection module classifies bytecode regions and extracts Solidity dispatcher metadata so later passes know which bytes belong to deployment scaffolding versus runtime logic. `locate_sections` walks the disassembled instructions, identifies auxdata, padding, init/runtime boundaries, and optional constructor arguments, and emits an ordered list of `Section { kind, offset, len }`. The logic combines strict deployment-pattern matching with heuristics (e.g. CODECOPY+RETURN, CALLDATASIZE prologues) to stay resilient against obfuscated inputs while still validating that sections are gap-free and inside bounds. 
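The "gap-free and inside bounds" check mentioned above can be sketched as a single pass over the ordered section list. This is only an illustration: `check_sections` and `Kind` are hypothetical names, the `Section` fields follow the `{ kind, offset, len }` shape quoted above, and the assumption that sections must cover the whole bytecode is mine rather than something the source states.

```rust
// Illustrative sketch only: `check_sections` and `Kind` are hypothetical and
// do not mirror the real `validate_sections` helper.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Init,
    Runtime,
    Auxdata,
}

#[derive(Debug, Clone, Copy)]
struct Section {
    kind: Kind,
    offset: usize,
    len: usize,
}

/// Verify that ordered sections are contiguous and end exactly at `total_len`.
fn check_sections(sections: &[Section], total_len: usize) -> Result<(), String> {
    let mut cursor = 0usize;
    for s in sections {
        // Each section must start where the previous one ended.
        if s.offset != cursor {
            return Err(format!("{:?} starts at {}, expected {}", s.kind, s.offset, cursor));
        }
        cursor = s
            .offset
            .checked_add(s.len)
            .ok_or_else(|| format!("{:?} length overflows", s.kind))?;
        if cursor > total_len {
            return Err(format!("{:?} ends at {}, past length {}", s.kind, cursor, total_len));
        }
    }
    if cursor == total_len {
        Ok(())
    } else {
        Err(format!("{} trailing byte(s) are unclassified", total_len - cursor))
    }
}

fn main() {
    let layout = [
        Section { kind: Kind::Init, offset: 0, len: 10 },
        Section { kind: Kind::Runtime, offset: 10, len: 20 },
        Section { kind: Kind::Auxdata, offset: 30, len: 5 },
    ];
    println!("{:?}", check_sections(&layout, 35));
}
```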
+ +For dispatcher analysis, `detect_function_dispatcher` tracks the stack across the function selector prologue and pairs PUSHed selectors with their jump destinations. The result is a `DispatcherInfo` structure that records extraction style (standard, alternative, fallback-only, etc.) plus the selector-to-target mapping, which drives both transform heuristics and verification. + +Utility helpers such as `extract_runtime_instructions` and `validate_sections` allow other modules (`strip`, `cfg_ir`) to operate exclusively on the runtime slice or assert structural soundness before mutating the program. diff --git a/docs/src/core/encoder.md b/docs/src/core/encoder.md index e69de29..a78116e 100644 --- a/docs/src/core/encoder.md +++ b/docs/src/core/encoder.md @@ -0,0 +1,5 @@ +# Encoder + +Once transforms have rewritten the CFG, the encoder module turns the updated instruction stream back into raw bytes. `encode` walks each `Instruction`, emits the opcode byte, and for PUSH instructions validates and appends the immediate payload. When the decoder previously marked an opcode as `INVALID` because Heimdall could not name it, the encoder preserves the original byte by looking up the program counter in the reference bytecode instead of emitting `0xfe`, ensuring round-trips do not corrupt unknown instructions. + +The module also exposes `rebuild`, a thin wrapper over `CleanReport::reassemble`, which stitches the modified runtime back together with the removed sections (init, constructor args, auxdata) recorded by `strip`. Together these functions guarantee that transforms can operate at the instruction level while still producing a deployable payload after rewriting control flow. 
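The encoding walk described in `docs/src/core/encoder.md` can be sketched as follows. This is a minimal illustration, not the module's real signature: the `Instr` type is hypothetical, and an unnamed opcode is modelled here as `None` instead of the `INVALID` placeholder the decoder uses.

```rust
// Illustrative sketch only: `Instr` and this `encode` are hypothetical
// simplifications of the real encoder.

struct Instr {
    pc: usize,
    /// `None` stands in for an opcode the disassembler could not name.
    opcode: Option<u8>,
    /// Immediate bytes; non-empty only for PUSH1..PUSH32.
    imm: Vec<u8>,
}

/// Re-emit bytes, recovering unnamed opcodes from the reference bytecode.
fn encode(instrs: &[Instr], reference: &[u8]) -> Result<Vec<u8>, String> {
    let mut out = Vec::new();
    for ins in instrs {
        let byte = match ins.opcode {
            Some(b) => b,
            // Preserve the original byte instead of emitting 0xfe.
            None => *reference
                .get(ins.pc)
                .ok_or_else(|| format!("pc {} is outside the reference bytecode", ins.pc))?,
        };
        out.push(byte);
        // PUSH1..PUSH32 (0x60..=0x7f) carry 1..=32 immediate bytes.
        let expected = if (0x60..=0x7f).contains(&byte) {
            (byte - 0x5f) as usize
        } else {
            0
        };
        if ins.imm.len() != expected {
            return Err(format!(
                "pc {}: expected {} immediate byte(s), found {}",
                ins.pc,
                expected,
                ins.imm.len()
            ));
        }
        out.extend_from_slice(&ins.imm);
    }
    Ok(out)
}

fn main() {
    // PUSH1 0x01; <unassigned 0x0c>; STOP
    let reference = [0x60, 0x01, 0x0c, 0x00];
    let instrs = vec![
        Instr { pc: 0, opcode: Some(0x60), imm: vec![0x01] },
        Instr { pc: 2, opcode: None, imm: vec![] },
        Instr { pc: 3, opcode: Some(0x00), imm: vec![] },
    ];
    println!("{:02x?}", encode(&instrs, &reference));
}
```

The program-counter lookup only works while the instruction still sits at its original offset, which is one reason to keep the reference bytecode alongside the instruction stream.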
diff --git a/docs/src/core/strip.md b/docs/src/core/strip.md index e69de29..f258344 100644 --- a/docs/src/core/strip.md +++ b/docs/src/core/strip.md @@ -0,0 +1,5 @@ +# Strip + +The strip module removes non-runtime data from the detected sections so transforms work on the smallest possible surface area. `strip_bytecode` receives the raw byte vector and `Section` list, copies only the runtime spans into a clean buffer, and records everything else as `Removed` entries. The companion `CleanReport` captures enough metadata to reassemble the original artifact: runtime offsets and lengths, stripped bytes with their offsets and kinds, hashes of the clean runtime, and a program-counter mapping that relates original PCs to the stripped layout. + +If no runtime section is found the function fails fast, preventing downstream passes from operating on empty input. After transforms produce a new runtime blob, `CleanReport::reassemble` (invoked through `encoder::rebuild`) restores the removed init code, constructor arguments, padding, and auxdata, updating CODECOPY/RETURN parameters when necessary so the deployment payload stays coherent with the modified runtime size. diff --git a/docs/src/examples.md b/docs/src/examples.md index e69de29..9b49dc3 100644 --- a/docs/src/examples.md +++ b/docs/src/examples.md @@ -0,0 +1,20 @@ +# Examples + +The `examples/` crate demonstrates Azoth in a Mirage Protocol workflow. It pulls the escrow contract bytecode, applies the standard transform stack with a fixed seed, and validates determinism, size/gas overhead, and placeholder functional checks. 
+ +## Quick start + +```bash +cd examples +chmod +x run_escrow.sh # optional helper that refreshes the escrow submodule +cargo run # or ./run_escrow.sh to automate the steps +``` + +The binary loads `escrow-bytecode/artifacts/bytecode.hex`, runs `azoth_transform::obfuscator::obfuscate_bytecode` with shuffle/jump-address/opaque-predicate transforms, and writes a `mirage_report.json` summary containing: + +- sizes before/after, +- applied transforms and unknown opcode counts, +- gas estimates derived from byte length, and +- deterministic recompilation checks. + +Use this project as a template for integrating Azoth into larger build pipelines or for writing regression tests around specific contracts. diff --git a/docs/src/formal_verification.md b/docs/src/formal_verification.md index e69de29..b0fac74 100644 --- a/docs/src/formal_verification.md +++ b/docs/src/formal_verification.md @@ -0,0 +1,21 @@ +# Formal Verification + +The `azoth-verification` crate provides a scaffold for proving that an obfuscated contract behaves exactly like the original one. It models bytecode semantics, encodes equivalence properties in SMT-LIB, and delegates solving to an SMT backend (Z3-compatible). + +## Engine overview + +- `FormalVerifier::prove_equivalence` is the main entry point. It extracts semantics from both versions of the bytecode, then constructs proof obligations for: + - **Bisimulation** – step-by-step execution traces match. + - **State Equivalence** – final storage/balance state is identical for any transaction. + - **Property Preservation** – user-supplied security properties (`SecurityProperty`) continue to hold. + - **Gas Bounds** – obfuscated execution stays within an acceptable overhead. +- Each obligation is represented as a `ProofStatement` that records the SMT query, solver verdict, and runtime. +- Results aggregate into a `FormalProof` tagged with the combined proof types. 
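To make the idea of an SMT-encoded obligation concrete, the sketch below builds an SMT-LIB query for a toy equivalence check. It is an assumption-heavy illustration: `ProofQuery` and `equivalence_query` are hypothetical names, the real `ProofStatement` fields are not shown in this document, and the "semantics" here are two hand-written bit-vector expressions, not anything extracted from bytecode.

```rust
// Illustrative sketch only: `ProofQuery` and `equivalence_query` are
// hypothetical and unrelated to the crate's actual types.

struct ProofQuery {
    name: String,
    smt: String,
}

/// Build a query that is unsatisfiable exactly when the two expressions
/// agree for every 256-bit input `v`.
fn equivalence_query(name: &str, original: &str, obfuscated: &str) -> ProofQuery {
    let smt = format!(
        "(set-logic QF_BV)\n\
         (declare-const x (_ BitVec 256))\n\
         (define-fun original ((v (_ BitVec 256))) (_ BitVec 256) {})\n\
         (define-fun obfuscated ((v (_ BitVec 256))) (_ BitVec 256) {})\n\
         (assert (not (= (original x) (obfuscated x))))\n\
         (check-sat)\n",
        original, obfuscated
    );
    ProofQuery {
        name: name.to_string(),
        smt,
    }
}

fn main() {
    // x + 2 versus (x + 1) + 1 over 256-bit words.
    let query = equivalence_query(
        "add-split",
        "(bvadd v (_ bv2 256))",
        "(bvadd (bvadd v (_ bv1 256)) (_ bv1 256))",
    );
    println!("; obligation: {}\n{}", query.name, query.smt);
}
```

A solver answering `unsat` to such a query means no input distinguishes the two expressions; `sat` comes with a counterexample assignment for `x`.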
+ +## Current status + +The infrastructure builds the SMT problems and plumbing, but the actual solver calls are still stubbed (`TODO` markers set `proven = true`). Integrating concrete semantics and feeding them to the solver is in progress. Until then, treat the proofs as scaffolding suitable for development/testing rather than production guarantees. + +## Extending properties + +`properties.rs` defines reusable arithmetic/security predicates. You can add project-specific invariants by extending the enum and teaching `to_smt_formula` how to render them. Once the solver integration is complete these formulas become part of the combined proof output. diff --git a/docs/src/transforms/README.md b/docs/src/transforms/README.md index e69de29..ffa9878 100644 --- a/docs/src/transforms/README.md +++ b/docs/src/transforms/README.md @@ -0,0 +1,27 @@ +# Transform Passes + +Azoth’s transform crate wraps every obfuscation pass in a shared pipeline so we can mutate runtime bytecode deterministically and replay results with a fixed seed. The entry point is `obfuscator::obfuscate_bytecode`, which: + +1. decodes input into CFG IR and section metadata via `azoth_core`, +2. detects whether a Solidity-style dispatcher is present, +3. applies the requested passes (plus dispatcher hardening when available), +4. records `TraceEvent`s for each change, and +5. reassembles deployable bytecode while tracking size/stack metrics. + +Transforms implement the `Transform` trait (`fn name(&self) -> &'static str` and `fn apply(&self, ir: &mut CfgIrBundle, rng: &mut StdRng) -> Result`). A pass returns `true` when it actually changed the CFG, allowing the obfuscator to skip no-op metrics and keep acceptance logic simple. Errors bubble up through a shared `Error` enum that wraps core/encoder/metrics failures. + +## Available passes + +- **Function Dispatcher** (`function_dispatcher.rs`) + Automatically activated when the runtime contains a Solidity dispatcher. 
It remaps every selector to a keccak-derived token, updates jump tables and internal `PUSH4` call sites, and leaves overall layout untouched so downstream analysis still lines up with the original CFG structure. + +- **Shuffle** (`shuffle.rs`) + Reorders basic blocks inside the runtime while preserving entry/exit edges. Every jump target is recalculated from the CFG, so layout changes without affecting correctness. + +- **Jump Address Transformer** (`jump_address_transformer.rs`) + Replaces direct `PUSH ; JUMP/I` patterns with runtime arithmetic (e.g., split immediates plus `ADD`). This forces tooling to execute stack operations to recover destinations, complicating static recovery. + +- **Opaque Predicate** (`opaque_predicate.rs`) + Injects always-true branches built from random constants and cheap arithmetic, inflating node/edge counts to confuse control-flow analysis while preserving fallthrough behaviour. + +Additional passes can be added by implementing `Transform` and wiring them into the CLI/obfuscator config. Metrics from `azoth_analysis` are collected before and after each pass to gate acceptance thresholds in higher-level workflows.
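The "split immediates plus `ADD`" idea behind the Jump Address Transformer can be sketched as below. This is a simplified illustration, not the pass itself: `split_target` and `obfuscated_jump` are hypothetical names, the seed mixing is an ad hoc stand-in for the seeded `StdRng` the real pipeline uses, and targets are limited to 16 bits so that `PUSH2` suffices.

```rust
// Illustrative sketch only: hypothetical helpers, not the code in
// `jump_address_transformer.rs`.

/// Split `target` into two parts whose sum is `target`, derived from `seed`.
fn split_target(target: u16, seed: u64) -> (u16, u16) {
    // Ad hoc deterministic mixing; the same seed always yields the same split.
    let mixed = seed
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    let a = if target == 0 {
        0
    } else {
        ((mixed >> 33) as u16) % target
    };
    (a, target - a)
}

/// Emit PUSH2 a; PUSH2 b; ADD; JUMP in place of PUSH2 target; JUMP.
fn obfuscated_jump(target: u16, seed: u64) -> Vec<u8> {
    let (a, b) = split_target(target, seed);
    let mut code = vec![0x61]; // PUSH2
    code.extend_from_slice(&a.to_be_bytes());
    code.push(0x61); // PUSH2
    code.extend_from_slice(&b.to_be_bytes());
    code.push(0x01); // ADD
    code.push(0x56); // JUMP
    code
}

fn main() {
    println!("{:02x?}", obfuscated_jump(0x0123, 0xdead_beef));
}
```

The rewritten sequence is four bytes longer than `PUSH2 target; JUMP`, so every later program counter shifts; any real pass has to recompute jump targets after the rewrite, which this sketch does not attempt.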