5 changes: 5 additions & 0 deletions docs/book.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
[book]
authors = ["g4titanx"]
language = "en"
src = "src"
title = "Azoth"
19 changes: 19 additions & 0 deletions docs/src/SUMMARY.md
@@ -0,0 +1,19 @@
# Summary

- [Introduction](./introduction.md)
- [Core Implementation](./core/README.md)
- [Control Flow Graph Intermediate Representation](./core/cfg_ir.md)
- [Decoder](./core/decoder.md)
- [Detection](./core/detection.md)
- [Encoder](./core/encoder.md)
- [Strip Pipeline](./core/strip.md)
- [Transform Passes](./transforms/README.md)
- [Program Analysis](./analysis.md)
- [Formal Verification](./formal_verification.md)
- [Command Line Interface](./cli.md)
- [Examples](./examples.md)
- [Contributing](./contributing/README.md)
- [Architecture](./contributing/architecture.md)
- [Shortcuts](./contributing/shortcuts.md)
- [Testing](./contributing/testing.md)
- [Appendix](./appendix/README.md)
19 changes: 19 additions & 0 deletions docs/src/analysis.md
@@ -0,0 +1,19 @@
# Program Analysis

`azoth-analysis` computes quantitative metrics before and after each obfuscation pass so we can reason about size, CFG complexity, and stack pressure. Transforms and the CLI use these measurements to decide whether a candidate rewrite is worth keeping.

## Metrics collected

`metrics::collect_metrics` consumes a `CfgIrBundle` plus the `CleanReport` from `strip::strip_bytecode` and returns:

- `byte_len` – length of the cleaned runtime.
- `block_cnt` / `edge_cnt` – number of body blocks and edges in the CFG (entry/exit excluded).
- `max_stack_peak` – maximum recorded stack height across blocks.
- `dom_overlap` – fraction of nodes whose immediate dominator equals their immediate post-dominator (lower overlap ⇒ less linear control flow).
- `potency` – heuristic score derived from block/edge counts and dominator overlap (based on Wroblewski’s potency metric).

Consumers can compare two metric snapshots via `metrics::compare`, which highlights potency gains while accounting for bytecode growth. The CLI’s `obfuscate` subcommand prints these deltas after each run.

## Dominator utilities

The crate also exposes helpers such as `dominator_pairs`, `dom_overlap`, and `max_stack_per_block`. They are useful when you need deeper inspection during transform development or custom acceptance heuristics (e.g., rejecting passes that blow past a stack threshold).
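The comparison idea above can be sketched in a few lines. This is a hedged illustration, not the actual `azoth-analysis` definitions: the `Metrics` field names follow the descriptions in this page, and `compare` mirrors the idea (potency gain discounted by bytecode growth) rather than the exact formula.

```rust
// Hypothetical shape of a metrics snapshot; field names follow the
// descriptions above, not the real azoth-analysis structs.
#[derive(Debug, Clone, Copy)]
struct Metrics {
    byte_len: usize,
    block_cnt: usize,
    edge_cnt: usize,
    max_stack_peak: u32,
    dom_overlap: f64, // fraction in [0, 1]
    potency: f64,
}

/// Potency gain normalised by size growth, sketching the intent of
/// `metrics::compare` (not its exact formula).
fn compare(before: &Metrics, after: &Metrics) -> f64 {
    let growth = after.byte_len as f64 / before.byte_len as f64;
    (after.potency - before.potency) / growth
}

fn main() {
    let before = Metrics { byte_len: 100, block_cnt: 4, edge_cnt: 5, max_stack_peak: 3, dom_overlap: 0.9, potency: 1.0 };
    let after = Metrics { byte_len: 150, block_cnt: 9, edge_cnt: 14, max_stack_peak: 4, dom_overlap: 0.4, potency: 2.5 };
    // 1.5 potency gained at 1.5x size => normalised score of exactly 1.0.
    assert!((compare(&before, &after) - 1.0).abs() < 1e-9);
}
```

A transform whose normalised score is near zero grew the bytecode without buying complexity, which is exactly the kind of rewrite the acceptance logic rejects.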
1 change: 1 addition & 0 deletions docs/src/appendix/README.md
@@ -0,0 +1 @@
# Appendix
25 changes: 25 additions & 0 deletions docs/src/cli.md
@@ -0,0 +1,25 @@
# Command Line Interface

The `azoth` binary (crate `azoth-cli`) exposes the full pipeline from the terminal. Install with `cargo install --path crates/cli` or run in-place via `cargo run -p azoth-cli -- <command>`.

## Shared conventions

- Inputs accept either a hex literal (with or without `0x`) or a path to a `.hex`/binary file.
- Output is printed to stdout unless an explicit `--output`/`--emit` flag is provided.
- All commands strip whitespace and underscores from hex payloads before parsing.
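The hex-input convention above can be sketched as a small helper. The function name is illustrative, not the CLI's actual internal API:

```rust
/// Sketch of the CLI's input cleaning: drop whitespace and underscores,
/// lowercase, and strip an optional 0x prefix. Name is hypothetical.
fn normalize_hex(input: &str) -> String {
    let cleaned: String = input
        .chars()
        .filter(|c| !c.is_whitespace() && *c != '_')
        .collect::<String>()
        .to_lowercase();
    cleaned.strip_prefix("0x").unwrap_or(&cleaned).to_string()
}

fn main() {
    // "0x60_80 6040" and "60806040" refer to the same payload.
    assert_eq!(normalize_hex("0x60_80 6040"), "60806040");
}
```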

## Subcommands

- `azoth decode <INPUT>`
Runs the Heimdall disassembler and prints annotated assembly plus the structured instruction list. Useful for quick inspection or seeding tests.

- `azoth strip <INPUT> [--raw]`
Removes init code, constructor args, padding, and auxdata. By default it emits a JSON payload mirroring `strip::CleanReport`; pass `--raw` to dump just the cleaned runtime hex.

- `azoth cfg <INPUT> [--output <path>]`
Builds the runtime CFG and writes a Graphviz `.dot` representation (stdout by default). Pair with `dot -Tpng` for visualisation.

- `azoth obfuscate <INPUT> [--seed HEX] [--passes list] [--emit path] [--emit-debug path]`
Executes the unified obfuscation pipeline. The optional `--seed` fixes RNG output (deterministic replays); `--passes` controls the user-facing transforms (default `shuffle,jump_transform,opaque_pred`). When a Solidity dispatcher is detected the hardened dispatcher transform runs automatically. `--emit` writes gas/size metrics to JSON, and `--emit-debug` exports the recorded CFG trace.

Each command returns a non-zero exit code on failure, so you can integrate the CLI into scripts or CI jobs easily.
11 changes: 11 additions & 0 deletions docs/src/contributing/README.md
@@ -0,0 +1,11 @@
# Contributing

Thanks for helping harden Azoth. This section documents how the codebase is organised, how to build and test it locally, and where to focus contributions.

- [Architecture](./architecture.md) — crate layout and data flow.
- [Building](./building.md) — setting up toolchains and compiling the workspace.
- [Testing](./testing.md) — running unit/integration suites and linting.
- [Shortcuts](./shortcuts.md) — handy commands for day-to-day development.
- [Bounties](./bounties.md) — guidance for proposal-driven contributions.

Before opening a pull request, please skim `CONTRIBUTING.md` at the repository root and match the Rust style guidelines enforced there (formatting, Clippy, etc.).
12 changes: 12 additions & 0 deletions docs/src/contributing/architecture.md
@@ -0,0 +1,12 @@
# Architecture

Azoth is organised as a collection of focused crates that share a common `azoth_core` foundation:

- `azoth_core` — decoding, section detection, bytecode stripping, CFG construction, and re-encoding. Every other crate consumes the data structures defined here.
- `azoth_transform` — transform trait implementations plus the orchestration pipeline (`obfuscator.rs`). Passes operate on `CfgIrBundle` and rely on `azoth_analysis` metrics to gauge impact.
- `azoth_analysis` — metrics (size, CFG complexity, stack peaks) and helper utilities (dominators). Used by transforms and the CLI to evaluate rewrites.
- `azoth_cli` — command-line interface exposing decode/strip/cfg/obfuscate workflows. It stitches the other crates together for end users.
- `azoth_verification` — formal verification scaffold that builds SMT queries to prove equivalence between original and obfuscated bytecode.
- `examples` — executable that demonstrates a full Mirage escrow obfuscation run, useful as an integration test bed.

`target/` isn’t checked in, so CI and local builds share the same cargo workspace semantics. New functionality typically lands in `azoth_core` first, then bubbles up through transforms, CLI, and docs.
10 changes: 10 additions & 0 deletions docs/src/contributing/bounties.md
@@ -0,0 +1,10 @@
# Bounties

We occasionally tag issues in the main repository with `bounty` when we want focused help on a feature, optimisation, or documentation push. To participate:

1. **Claim the issue** — comment on the ticket so we know you are working on it. Maintainers will confirm assignment or coordinate duplicates.
2. **Discuss scope early** — if the acceptance criteria are unclear, open a short design note in the issue before writing code. This avoids surprises when it is time to review.
3. **Work in the open** — draft PRs are encouraged; the team can give feedback on direction, dependencies, or missing context.
4. **Submit a PR** — reference the bounty issue number, describe the implemented approach, and include test/run notes (commands, outputs, benchmarks).

If you would like to propose a new bounty, open an issue prefixed with `[Bounty Proposal]` explaining the problem, expected deliverables, and suggested payout or contact the maintainers via the email listed in `CONTRIBUTING.md`. We'll review, tag, and announce it if it fits the roadmap.
34 changes: 34 additions & 0 deletions docs/src/contributing/building.md
@@ -0,0 +1,34 @@
# Building

Azoth uses a pinned stable toolchain (`rust-toolchain.toml` requests Rust 1.90.0 plus rustfmt and clippy). Install via `rustup` if you have not already:

```bash
rustup toolchain install 1.90.0 --component rustfmt --component clippy
```

## Compile the workspace

```bash
cargo build --workspace
```

This command builds every crate (core, transforms, analysis, CLI, verification, examples, tests). Add `--release` if you want optimised binaries.

### Building individual crates

- Core library only: `cargo build -p azoth-core`
- CLI binary: `cargo build -p azoth-cli --bin azoth`
- Example workflow: `cargo run -p azoth-examples`

The build pulls dependencies such as Heimdall, REVM, and SMT utilities; make sure you have network access the first time you compile.

## Formatting and linting

Run formatting and clippy before submitting a change:

```bash
cargo fmt --all
cargo clippy --workspace --all-targets -- -D warnings
```

Both commands use the pinned toolchain, keeping CI and local development consistent.
10 changes: 10 additions & 0 deletions docs/src/contributing/shortcuts.md
@@ -0,0 +1,10 @@
# Shortcuts

Handy commands while iterating locally:

- `cargo run -p azoth-cli -- decode 0x...` — quick opcode inspection.
- `cargo run -p azoth-cli -- obfuscate <HEX> --seed 0xdeadbeef --passes shuffle,jump_transform` — dry-run a transform combo deterministically.
- `cargo test -p azoth-core --lib` — run core unit tests without touching the rest of the workspace.
- `cargo test -p azoth-transform --all-targets` — exercise transform logic plus property tests.
- `cargo fmt --all && cargo clippy --workspace --all-targets -- -D warnings` — one-liner style/lint check before committing.
- `cargo doc --workspace --no-deps --open` — build rustdoc for all crates to cross-check API descriptions against this book.
21 changes: 21 additions & 0 deletions docs/src/contributing/testing.md
@@ -0,0 +1,21 @@
# Testing

Azoth ships with unit tests inside each crate plus an integration test crate under `tests/`. To run everything:

```bash
cargo test --workspace
```

### Targeted suites

- Core strip/detection/CFG tests: `cargo test -p azoth-core`
- Analysis metrics: `cargo test -p azoth-analysis`
- Transform behaviour: `cargo test -p azoth-transform`
- CLI smoke tests: `cargo test -p azoth-cli`
- Multi-crate integration (escrow fixtures, e2e): `cargo test -p azoth-tests`

Several transform tests rely on determinism. When adding new passes, seed the RNG explicitly (`StdRng::seed_from_u64`) so assertions stay stable on CI.
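The determinism pattern can be shown without the `rand` dependency — any seeded generator yields a reproducible sequence. Here a toy LCG stands in for `StdRng`; the point is the pattern (seed explicitly, assert against a fixed draw), not the algorithm:

```rust
/// Toy linear congruential generator standing in for `StdRng`.
struct Lcg(u64);

impl Lcg {
    fn seed_from_u64(seed: u64) -> Self {
        Lcg(seed)
    }
    fn next_u64(&mut self) -> u64 {
        // Constants from Knuth's MMIX LCG.
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0
    }
}

fn main() {
    let mut a = Lcg::seed_from_u64(0xdead_beef);
    let mut b = Lcg::seed_from_u64(0xdead_beef);
    // Same seed, same sequence: assertions stay stable across CI runs.
    assert_eq!(a.next_u64(), b.next_u64());
    assert_eq!(a.next_u64(), b.next_u64());
}
```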

### Additional checks

Use `cargo clippy --workspace --all-targets -- -D warnings` to catch lints and `cargo fmt --all` to enforce formatting prior to submitting a PR.
13 changes: 13 additions & 0 deletions docs/src/core/README.md
@@ -0,0 +1,13 @@
# Core Crate

The Azoth core crate implements the deterministic pipeline that turns raw bytecode into a transformable control-flow graph and then reassembles the final artifact. It exposes reusable building blocks that other crates (CLI, transforms, verification) consume so every stage shares the same understanding of program layout.

At a high level the crate:

- Normalizes and decodes bytecode into structured instructions (`decoder`).
- Detects init/runtime/auxiliary regions and dispatcher metadata (`detection`).
- Produces a cleaned runtime blob while preserving reassembly data (`strip`).
- Constructs the control-flow graph intermediate representation that powers transforms (`cfg_ir`).
- Encodes modified instruction streams back into deployable bytecode (`encoder`).

Each module is documented in the following pages to show how they interlock and which data structures they introduce.
10 changes: 10 additions & 0 deletions docs/src/core/cfg_ir.md
@@ -0,0 +1,10 @@
# Control-Flow Graph IR

The `cfg_ir` module turns decoded runtime instructions into a stable directed graph that every transform mutates. `build_cfg_ir` receives the runtime slice, detected sections, and `CleanReport` metadata; it splits instructions into basic blocks, wires edges based on control flow, annotates jump encodings, and records runtime bounds so that transforms know which blocks belong to the deployed code.

Key data structures include:
- `Block`/`BlockBody`: graph nodes that store the first program counter, copied instructions, stack height, and control descriptor.
- `BlockControl` and `JumpTarget`: describe how a block exits (fallthrough, branch, terminal) and whether immediates are absolute PCs, runtime-relative offsets, or symbolic.
- `CfgIrBundle`: the container returned to transforms, holding the graph, PC-to-node map, detected sections, original bytecode, runtime bounds, and a trace log of structural edits for downstream tooling.

During assembly the module validates that every `JUMPDEST` begins a block, adds entry/exit sentinels, emits edges with semantic labels (`Fallthrough`, `Jump`, `BranchTrue`, `BranchFalse`), and assigns simple SSA-style identifiers for stack tracking. Helper routines snapshot graph state and compute diffs so transforms can report their mutations and the encoder can rebuild a coherent runtime after rewriting blocks.
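The exit descriptors described above might look roughly like the following. These are hypothetical shapes for illustration only; the real `cfg_ir` definitions carry more fields and metadata:

```rust
/// Illustrative jump-target classification (not the real definition):
/// immediates can be absolute PCs, runtime-relative offsets, or symbolic.
#[derive(Debug, PartialEq)]
enum JumpTarget {
    AbsolutePc(usize),
    RuntimeOffset(usize),
    Symbolic(String),
}

/// Illustrative block-exit descriptor mirroring the edge labels above.
#[derive(Debug, PartialEq)]
enum BlockControl {
    Fallthrough,
    Jump(JumpTarget),
    Branch { on_true: JumpTarget, on_false: JumpTarget },
    Terminal, // STOP, RETURN, REVERT, INVALID, SELFDESTRUCT
}

fn main() {
    // A JUMPI exit pairs a BranchTrue target with a fallthrough successor.
    let exit = BlockControl::Branch {
        on_true: JumpTarget::AbsolutePc(0x48),
        on_false: JumpTarget::RuntimeOffset(0x12),
    };
    assert_ne!(exit, BlockControl::Terminal);
}
```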
10 changes: 10 additions & 0 deletions docs/src/core/decoder.md
@@ -0,0 +1,10 @@
# Decoder

The decoder module is the first stop in the pipeline: it normalizes raw hex or file input, runs Heimdall to disassemble the byte stream, and turns the output into structured `Instruction` values. Each instruction records the program counter, parsed opcode (mapped to `eot::UnifiedOpcode`), and any immediate operand so later stages can reason about stack effects or rewrite PUSH data.

`decode_bytecode` returns four artifacts in one call:
- the instruction stream,
- `DecodeInfo` metadata (length, Keccak-256 hash, and whether the source was inline hex or a file),
- the raw assembly text for debugging, and
- the original byte vector.
Parsing is intentionally strict—missing PCs, malformed opcodes, and empty output produce explicit errors—because downstream modules assume the stream is well formed. Unknown opcodes are tagged as `Opcode::UNKNOWN` or `Opcode::INVALID` placeholders; the encoder relies on the preserved program counter to recover the original byte when rebuilding.
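To make the strictness concrete, here is a minimal sketch of PUSH-aware decoding over raw bytes. The `Instruction` shape approximates the description above; real decoding goes through Heimdall and `eot::UnifiedOpcode`, so treat this as an assumption-laden illustration:

```rust
/// Illustrative decoded instruction: program counter, raw opcode byte,
/// optional immediate (approximates, but is not, the real type).
#[derive(Debug)]
struct Instruction {
    pc: usize,
    opcode: u8,
    immediate: Option<Vec<u8>>,
}

/// Strict decode: a truncated PUSH immediate is an explicit error, never
/// a silently shortened instruction.
fn decode(bytes: &[u8]) -> Result<Vec<Instruction>, String> {
    let mut out = Vec::new();
    let mut pc = 0;
    while pc < bytes.len() {
        let op = bytes[pc];
        // PUSH1..PUSH32 occupy 0x60..=0x7f and carry 1..=32 immediate bytes.
        let imm_len = if (0x60..=0x7f).contains(&op) { (op - 0x5f) as usize } else { 0 };
        let end = pc + 1 + imm_len;
        if end > bytes.len() {
            return Err(format!("truncated PUSH immediate at pc {pc}"));
        }
        let immediate = (imm_len > 0).then(|| bytes[pc + 1..end].to_vec());
        out.push(Instruction { pc, opcode: op, immediate });
        pc = end;
    }
    Ok(out)
}

fn main() {
    // PUSH1 0x80 PUSH1 0x40 MSTORE
    let instrs = decode(&[0x60, 0x80, 0x60, 0x40, 0x52]).unwrap();
    assert_eq!(instrs.len(), 3);
    assert_eq!(instrs[2].pc, 4);
}
```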
7 changes: 7 additions & 0 deletions docs/src/core/detection.md
@@ -0,0 +1,7 @@
# Detection

The detection module classifies bytecode regions and extracts Solidity dispatcher metadata so later passes know which bytes belong to deployment scaffolding versus runtime logic. `locate_sections` walks the disassembled instructions, identifies auxdata, padding, init/runtime boundaries, and optional constructor arguments, and emits an ordered list of `Section { kind, offset, len }`. The logic combines strict deployment-pattern matching with heuristics (e.g. CODECOPY+RETURN, CALLDATASIZE prologues) to stay resilient against obfuscated inputs while still validating that sections are gap-free and inside bounds.

For dispatcher analysis, `detect_function_dispatcher` tracks the stack across the function selector prologue and pairs PUSHed selectors with their jump destinations. The result is a `DispatcherInfo` structure that records extraction style (standard, alternative, fallback-only, etc.) plus the selector-to-target mapping, which drives both transform heuristics and verification.

Utility helpers such as `extract_runtime_instructions` and `validate_sections` allow other modules (`strip`, `cfg_ir`) to operate exclusively on the runtime slice or assert structural soundness before mutating the program.
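The gap-free/bounds invariant can be sketched as follows. `SectionKind` variants and field names are illustrative stand-ins for the real `Section` type:

```rust
/// Hypothetical section kinds mirroring the regions named above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SectionKind { Init, Runtime, ConstructorArgs, Padding, Auxdata }

#[derive(Debug, Clone, Copy)]
struct Section { kind: SectionKind, offset: usize, len: usize }

/// Sketch of the structural check: sections must tile the bytecode with
/// no gaps, no overlaps, and no bytes past the end.
fn validate_sections(sections: &[Section], total_len: usize) -> Result<(), String> {
    let mut cursor = 0;
    for s in sections {
        if s.offset != cursor {
            return Err(format!("gap or overlap at offset {cursor}"));
        }
        cursor += s.len;
    }
    if cursor != total_len {
        return Err(format!("sections cover {cursor} bytes, bytecode has {total_len}"));
    }
    Ok(())
}

fn main() {
    let sections = [
        Section { kind: SectionKind::Init, offset: 0, len: 11 },
        Section { kind: SectionKind::Runtime, offset: 11, len: 51 },
        Section { kind: SectionKind::Auxdata, offset: 62, len: 43 },
    ];
    assert!(validate_sections(&sections, 105).is_ok());
    assert!(validate_sections(&sections, 200).is_err());
}
```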
5 changes: 5 additions & 0 deletions docs/src/core/encoder.md
@@ -0,0 +1,5 @@
# Encoder

Once transforms have rewritten the CFG, the encoder module turns the updated instruction stream back into raw bytes. `encode` walks each `Instruction`, emits the opcode byte, and for PUSH instructions validates and appends the immediate payload. When the decoder previously marked an opcode as `INVALID` because Heimdall could not name it, the encoder preserves the original byte by looking up the program counter in the reference bytecode instead of emitting `0xfe`, ensuring round-trips do not corrupt unknown instructions.

The module also exposes `rebuild`, a thin wrapper over `CleanReport::reassemble`, which stitches the modified runtime back together with the removed sections (init, constructor args, auxdata) recorded by `strip`. Together these functions guarantee that transforms can operate at the instruction level while still producing a deployable payload after rewriting control flow.
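The PUSH validation step can be sketched in isolation. This is a simplified illustration of the idea, with the INVALID-byte fallback and the rest of `encode` omitted:

```rust
/// Sketch of PUSH re-encoding: validate the immediate length, then emit
/// the opcode byte followed by the payload. PUSH1 is 0x60, so PUSHn is
/// 0x5f + n for n in 1..=32.
fn encode_push(n: u8, immediate: &[u8]) -> Result<Vec<u8>, String> {
    if !(1..=32).contains(&n) || immediate.len() != n as usize {
        return Err(format!(
            "PUSH{n} expects {n} immediate bytes, got {}",
            immediate.len()
        ));
    }
    let mut out = vec![0x5f + n];
    out.extend_from_slice(immediate);
    Ok(out)
}

fn main() {
    // PUSH2 0x0148 encodes as 0x61 0x01 0x48.
    assert_eq!(encode_push(2, &[0x01, 0x48]).unwrap(), vec![0x61, 0x01, 0x48]);
    // A mismatched immediate length is rejected rather than truncated.
    assert!(encode_push(2, &[0x01]).is_err());
}
```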
5 changes: 5 additions & 0 deletions docs/src/core/strip.md
@@ -0,0 +1,5 @@
# Strip

The strip module removes non-runtime data from the detected sections so transforms work on the smallest possible surface area. `strip_bytecode` receives the raw byte vector and `Section` list, copies only the runtime spans into a clean buffer, and records everything else as `Removed` entries. The companion `CleanReport` captures enough metadata to reassemble the original artifact: runtime offsets and lengths, stripped bytes with their offsets and kinds, hashes of the clean runtime, and a program-counter mapping that relates original PCs to the stripped layout.

If no runtime section is found the function fails fast, preventing downstream passes from operating on empty input. After transforms produce a new runtime blob, `CleanReport::reassemble` (invoked through `encoder::rebuild`) restores the removed init code, constructor arguments, padding, and auxdata, updating CODECOPY/RETURN parameters when necessary so the deployment payload stays coherent with the modified runtime size.
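A minimal sketch of the split-and-record step, assuming a single runtime span. Types here are illustrative, not the actual `CleanReport` definition, which also tracks kinds, hashes, and the PC mapping:

```rust
/// Illustrative record of a removed span: its original offset and bytes,
/// enough to put it back during reassembly.
struct Removed {
    offset: usize,
    bytes: Vec<u8>,
}

/// Copy the runtime span into a clean buffer and record everything else.
fn strip(bytecode: &[u8], runtime: std::ops::Range<usize>) -> (Vec<u8>, Vec<Removed>) {
    let clean = bytecode[runtime.clone()].to_vec();
    let mut removed = Vec::new();
    if runtime.start > 0 {
        removed.push(Removed { offset: 0, bytes: bytecode[..runtime.start].to_vec() });
    }
    if runtime.end < bytecode.len() {
        removed.push(Removed { offset: runtime.end, bytes: bytecode[runtime.end..].to_vec() });
    }
    (clean, removed)
}

fn main() {
    let code = [0xAA, 0xBB, 0x60, 0x80, 0xFE]; // init | runtime | auxdata
    let (clean, removed) = strip(&code, 2..4);
    assert_eq!(clean, vec![0x60, 0x80]);
    assert_eq!(removed.len(), 2);
    assert_eq!(removed[1].offset, 4);
}
```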
20 changes: 20 additions & 0 deletions docs/src/examples.md
@@ -0,0 +1,20 @@
# Examples

The `examples/` crate demonstrates Azoth in a Mirage Protocol workflow. It pulls the escrow contract bytecode, applies the standard transform stack with a fixed seed, and validates determinism, size/gas overhead, and placeholder functional checks.

## Quick start

```bash
cd examples
chmod +x run_escrow.sh # optional helper that refreshes the escrow submodule
cargo run # or ./run_escrow.sh to automate the steps
```

The binary loads `escrow-bytecode/artifacts/bytecode.hex`, runs `azoth_transform::obfuscator::obfuscate_bytecode` with shuffle/jump-address/opaque-predicate transforms, and writes a `mirage_report.json` summary containing:

- sizes before/after,
- applied transforms and unknown opcode counts,
- gas estimates derived from byte length, and
- deterministic recompilation checks.

Use this project as a template for integrating Azoth into larger build pipelines or for writing regression tests around specific contracts.
21 changes: 21 additions & 0 deletions docs/src/formal_verification.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
# Formal Verification

The `azoth-verification` crate provides a scaffold for proving that an obfuscated contract behaves exactly like the original one. It models bytecode semantics, encodes equivalence properties in SMT-LIB, and delegates solving to an SMT backend (Z3-compatible).

## Engine overview

- `FormalVerifier::prove_equivalence` is the main entry point. It extracts semantics from both versions of the bytecode, then constructs proof obligations for:
- **Bisimulation** – step-by-step execution traces match.
- **State Equivalence** – final storage/balance state is identical for any transaction.
- **Property Preservation** – user-supplied security properties (`SecurityProperty`) continue to hold.
- **Gas Bounds** – obfuscated execution stays within an acceptable overhead.
- Each obligation is represented as a `ProofStatement` that records the SMT query, solver verdict, and runtime.
- Results aggregate into a `FormalProof` tagged with the combined proof types.
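To give a flavour of the obligations above, here is a hedged sketch of how a single storage-slot state-equivalence query might be rendered as SMT-LIB text. The real `ProofStatement` encoding is richer; the function and variable names are hypothetical:

```rust
/// Build an SMT-LIB query asserting the *negation* of slot equality:
/// an unsat verdict means the original and obfuscated slots are
/// provably equal. Illustrative only.
fn state_equivalence_query(slot: &str) -> String {
    format!(
        "(declare-const storage_orig_{slot} (_ BitVec 256))\n\
         (declare-const storage_obf_{slot} (_ BitVec 256))\n\
         (assert (not (= storage_orig_{slot} storage_obf_{slot})))\n\
         (check-sat)"
    )
}

fn main() {
    let q = state_equivalence_query("0");
    assert!(q.contains("(check-sat)"));
    assert!(q.contains("storage_orig_0"));
}
```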

## Current status

The infrastructure builds the SMT problems and plumbing, but the actual solver calls are still stubbed (`TODO` markers set `proven = true`). Integrating concrete semantics and feeding them to the solver is in progress. Until then, treat the proofs as scaffolding suitable for development/testing rather than production guarantees.

## Extending properties

`properties.rs` defines reusable arithmetic/security predicates. You can add project-specific invariants by extending the enum and teaching `to_smt_formula` how to render them. Once the solver integration is complete these formulas become part of the combined proof output.
5 changes: 5 additions & 0 deletions docs/src/introduction.md
@@ -0,0 +1,5 @@
Azoth is a deterministic EVM bytecode obfuscator designed to make Mirage execution contracts indistinguishable from ordinary, unverified deployments on Ethereum. The project takes its name from the alchemical "universal solvent", reflecting its goal of transforming bytecode while preserving the intent of the original program.

The toolchain dissects contract bytecode, reconstructs control-flow, and applies deterministic rewrites that reshape structure without inflating gas usage or breaking deployability.

Within this documentation you will find guidance on the command-line interface, core architecture, transforms, analysis, and verification systems. The source for the book [lives alongside](https://github.com/MiragePrivacy/azoth/tree/master/docs) Azoth on GitHub.