LLVM-based static analysis tool that detects stack overflows, unsafe stack operations, and recursion-related vulnerabilities in C and C++ code

coretrace-stack-analyzer

BUILD (macOS/Linux)

./build.sh

The build script auto-detects LLVM/Clang using Homebrew (macOS) or llvm-config (Linux). If detection fails, set LLVM_DIR and Clang_DIR.

Options:

  • --build-dir <dir> (default: build)
  • --type <Release|Debug|RelWithDebInfo> (default: Release)
  • --generator <Ninja|Unix Makefiles>
  • --jobs <n>
  • --llvm-dir <path> / --clang-dir <path>
  • --clean
  • --configure-only

Examples:

./build.sh --type Release
./build.sh --type Debug --build-dir out/build
LLVM_DIR=/opt/llvm/lib/cmake/llvm Clang_DIR=/opt/llvm/lib/cmake/clang ./build.sh --generator Ninja

CI/CD integration (GitHub Actions)

For CI usage as a code analyzer, use a two-layer setup:

  • stack_usage_analyzer remains the analysis engine.
  • scripts/ci/run_code_analysis.py is the CI adapter (report export + policy gate).

Why this architecture:

  • The analyzer stays CI-agnostic and reusable everywhere (CLI, local scripts, CI).
  • CI policy (fail-on=error|warning|none) is isolated in one place.
  • It provides stable artifacts for platforms (JSON + SARIF) without changing analyzer core logic.

Quick example (same repository):

python3 scripts/ci/run_code_analysis.py \
  --analyzer ./build/stack_usage_analyzer \
  --compdb ./build/compile_commands.json \
  --fail-on error \
  --json-out artifacts/stack-usage.json \
  --sarif-out artifacts/stack-usage.sarif

GitHub Actions consumer example is available at:

  • docs/ci/github-actions-consumer.yml
  • docs/ci/github-actions-module-consumer.yml (consume this repo directly via uses:)
  • Analyzer architecture notes: docs/architecture/analyzer-modules.md

Reusable GitHub Action module (for other repositories)

If you publish tags for this repository, other projects can consume it directly:

name: Stack Analysis

on:
  pull_request:
  workflow_dispatch:

jobs:
  analyze:
    runs-on: ubuntu-24.04
    permissions:
      contents: read
      security-events: write
    steps:
      - uses: actions/checkout@v4

      - name: Generate compile_commands.json
        run: cmake -S . -B build -DCMAKE_EXPORT_COMPILE_COMMANDS=ON

      - name: Run CoreTrace action module
        uses: CoreTrace/coretrace-stack-analyzer@v0
        with:
          compile-commands: build/compile_commands.json
          analysis-profile: fast
          resource-model: default
          resource-cache-memory-only: "true"
          fail-on: error
          sarif-file: artifacts/coretrace-stack-analysis.sarif
          json-file: artifacts/coretrace-stack-analysis.json
          upload-sarif: "true"

Notes:

  • SARIF is generated by default (sarif-file) and can be uploaded automatically (upload-sarif: "true").
  • If compile-commands is not provided, the action tries common locations: build/compile_commands.json, compile_commands.json, .coretrace/build-linux/compile_commands.json.
  • If no compile database is found, it can fall back to git-tracked sources (inputs-from-git-fallback, enabled by default).

Docker image for registry-based CI

When you want a reusable analyzer image in CI (instead of rebuilding the tool each run), build and publish:

  • Dockerfile: analyzer runtime image with sensible defaults for full-repo analysis.
  • Dockerfile.ci: CI gate image (entrypoint = run_code_analysis.py).

Default behavior of Dockerfile runtime entrypoint:

  • auto-detect compile_commands.json from /workspace/build/compile_commands.json (fallback: /workspace/compile_commands.json)
  • --analysis-profile=fast
  • --compdb-fast (drops heavy/platform-specific compile flags from compile DB)
  • --resource-summary-cache-memory-only
  • --resource-model=/models/resource-lifetime/generic.txt
  • if compile_commands.json contains stale absolute paths (e.g. /tmp/evan/...) while the repo is mounted at /workspace, a compatibility symlink is created automatically when safe (so analysis can still run without extra Docker flags)

Runtime image is intentionally analyzer-only (toolchain/runtime + analyzer models). Project-specific SDKs/headers must be installed in the target CI job or in a derived image.

Simple local run (analyze whole repo from compile database):

docker build -t coretrace-stack-analyzer .
docker run --rm -v "$PWD:/workspace" coretrace-stack-analyzer

Override defaults:

docker run --rm -v "$PWD:/workspace" coretrace-stack-analyzer \
  --analysis-profile=full \
  --warnings-only

Bypass defaults entirely:

docker run --rm -v "$PWD:/workspace" coretrace-stack-analyzer --raw --help

Build and push:

docker build -f Dockerfile.ci \
  --build-arg VERSION=0.1.0 \
  --build-arg VCS_REF="$(git rev-parse --short HEAD)" \
  -t ghcr.io/<org>/coretrace-stack-analyzer-ci:0.1.0 .

docker push ghcr.io/<org>/coretrace-stack-analyzer-ci:0.1.0

Run in CI (entrypoint already targets run_code_analysis.py):

docker run --rm \
  -u "$(id -u):$(id -g)" \
  -v "$PWD:/workspace" -w /workspace \
  ghcr.io/<org>/coretrace-stack-analyzer-ci:0.1.0 \
  --inputs-from-git --repo-root /workspace \
  --compdb /workspace/build/compile_commands.json \
  --analyzer-arg=--analysis-profile=fast \
  --analyzer-arg=--resource-summary-cache-memory-only \
  --analyzer-arg=--resource-model=/models/resource-lifetime/generic.txt \
  --exclude _deps/ \
  --base-dir /workspace \
  --fail-on error \
  --print-diagnostics warning \
  --json-out /workspace/artifacts/stack-usage.json \
  --sarif-out /workspace/artifacts/stack-usage.sarif

GitHub Actions: how to get compile_commands.json in CI

Most C/C++ repos generate it during CMake configure:

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=ON

Then run the Docker image with /workspace/build/compile_commands.json.

Important for CI reliability:

  • Generate compile_commands.json in the same OS/toolchain family as the analyzer run.
  • Reusing a macOS compile database in Linux CI often fails (-arch, -isysroot, Apple SDK paths).
  • --compdb-fast improves portability by dropping many heavy/platform-specific flags, but cannot replace missing third-party headers/SDKs.
  • If your project needs extra dependencies, extend the analyzer image:
FROM ghcr.io/<org>/coretrace-stack-analyzer:0.1.0
RUN apt-get update && apt-get install -y --no-install-recommends \
    <project-dev-packages> \
    && rm -rf /var/lib/apt/lists/*

Ready-to-adapt workflow examples:

  • non-Docker consumer: docs/ci/github-actions-consumer.yml
  • Docker consumer: docs/ci/github-actions-docker-consumer.yml

Code style (clang-format)

  • Target version: clang-format 20 (used in CI).
  • Format locally: ./scripts/format.sh
  • Check without modifying: ./scripts/format-check.sh
  • CMake: cmake --build build --target format or --target format-check
  • CI: the GitHub Actions clang-format job fails if a file is not formatted.

CORETRACE-STACK-USAGE CLI

./stack_usage_analyzer --mode=[abi/ir] test.[ll/c/cpp] other.[ll/c/cpp]
./stack_usage_analyzer main.cpp -I./include
./stack_usage_analyzer main.cpp -I./include --compile-arg=-I/opt/homebrew/opt/llvm@20/include
./stack_usage_analyzer main.cpp --compile-commands=build/compile_commands.json
./stack_usage_analyzer main.cpp -I./include --only-file=./main.cpp --only-function=main
./stack_usage_analyzer main.cpp --dump-ir=./debug/main.ll
./stack_usage_analyzer a.c b.c --dump-ir=./debug

Options:

--format=json|sarif|human
--analysis-profile=fast|full selects analysis precision/performance profile (default: full)
--quiet disables diagnostics entirely
--warnings-only hides info-level diagnostics; in human output it also lists only functions with warnings/errors
--stack-limit=<value> overrides stack limit (bytes, or KiB/MiB/GiB)
--compile-arg=<arg> passes an extra argument to the compiler
--compile-commands=<path> uses compile_commands.json (file or directory)
--compdb=<path> alias for --compile-commands
--compdb-fast drops heavy build flags for faster analysis
--include-compdb-deps includes `_deps` entries when inputs are auto-discovered from compile_commands.json
--jobs=<N> parallel jobs for multi-file loading/analysis and cross-TU resource summary build (default: 1)
--escape-model=<path> loads external noescape rules for stack pointer escape analysis (`noescape_arg`)
--resource-model=<path> loads external acquire/release rules for generic resource lifetime checks
--resource-cross-tu enables cross-TU resource summaries for resource lifetime analysis (default: on)
--no-resource-cross-tu disables cross-TU resource summaries
--resource-summary-cache-dir=<path> sets cache directory for cross-TU resource summaries (default: .cache/resource-lifetime)
--resource-summary-cache-memory-only keeps cross-TU summary cache in memory only (process-local, no files)
--timing prints compile/analysis timings to stderr
--dump-ir=<path> writes LLVM IR to a file (or directory for multiple inputs)
-I<dir> or -I <dir> adds an include directory
-D<name>[=value] or -D <name>[=value] defines a macro
--only-file=<path> or --only-file <path> filters by file
--only-dir=<path> or --only-dir <path> filters by directory
--exclude-dir=<dir0,dir1> excludes input files under one or more directories
--only-function=<name> or --only-function <name> filters by function
--only-func=<name> alias for --only-function
--STL includes STL/system library functions (default excludes them)
--dump-filter prints filter decisions (stderr)

To generate compile_commands.json with CMake, configure with -DCMAKE_EXPORT_COMPILE_COMMANDS=ON and point to the resulting file (often under build/).

If analysis feels slow, --compdb-fast disables heavy flags (optimizations, sanitizers, profiling) while keeping include paths and macros. For multi-file runs, --jobs=<N> parallelizes input loading; with resource lifetime cross-TU enabled it also parallelizes summary construction. When inputs are auto-discovered from compile_commands.json, _deps entries are skipped by default to keep analysis focused on project code; use --include-compdb-deps to opt back in.

Analysis profiles (fast vs full)

Use --analysis-profile=full (default) or --analysis-profile=fast.

Examples:

./build/stack_usage_analyzer --compile-commands=build/compile_commands.json --analysis-profile=fast
./build/stack_usage_analyzer --compile-commands=build/compile_commands.json --analysis-profile=full

  • fast:
    • For StackBufferOverflow and MultipleStores checks, functions bigger than 1200 IR instructions are skipped.
    • StackBufferOverflow analyzes at most 16 getelementptr sites per function.
    • MultipleStores analyzes at most 32 store sites per function.
    • Alias backtracking through pointer stores is disabled for these two checks.
    • Result: significantly faster runs, with possible false negatives on very large/complex functions.
  • full:
    • No instruction-count skip for these checks.
    • No per-function GEP/store budget limit.
    • Full alias backtracking is enabled for these checks.
    • Result: better coverage/precision, but potentially much slower on large translation units.

When inputs are auto-discovered from compile_commands.json and multiple files are analyzed, the CLI auto-selects fast unless you explicitly pass --analysis-profile=full.

Library mode: forward analyzer args from another CLI

If you embed the analyzer as a library and still want to reuse analyzer-style arguments (--mode=..., --jobs=..., etc.), use the CLI parser bridge:

  • ctrace::stack::cli::parseArguments(const std::vector<std::string>&)
  • ctrace::stack::cli::parseCommandLine(const std::string&)

Example:

#include "cli/ArgParser.hpp"

auto parsed = ctrace::stack::cli::parseCommandLine(
    "--mode=abi --analysis-profile=fast --warnings-only --jobs=4"
);
if (parsed.status == ctrace::stack::cli::ParseStatus::Error) {
    // handle parsed.error
}

ctrace::stack::AnalysisConfig cfg = parsed.parsed.config;

This keeps a single source of truth for option semantics between CLI and library consumers.

When --compile-commands is provided and no input file is passed on the CLI, the analyzer automatically uses compile_commands.json as the source of truth:

  • it analyzes supported entries (.c, .cc, .cpp, .cxx, .ll)
  • it skips unsupported entries (e.g. Objective-C .m) with an explicit status line
  • it skips _deps entries by default (override with --include-compdb-deps)
  • duplicate file entries are merged deterministically, preferring the most informative command
  • translation units with no analyzable functions are reported as informational skips (not fatal errors)
  • --exclude-dir is applied before analysis to skip selected directory trees (works with explicit inputs and compdb-driven inputs)

Generic resource lifetime analysis (model-driven)

The analyzer can detect:

  • missing release in a function (ResourceLifetime.MissingRelease, CWE-772)
  • double release in a function (ResourceLifetime.DoubleRelease, CWE-415)
  • constructor acquisition not released in destructor for class fields (ResourceLifetime.MissingDestructorRelease, CWE-772)

Why this architecture:

  • API ownership semantics are defined in an external model file instead of hardcoded rules.
  • The same analysis engine stays reusable across libraries (Vulkan, file handles, sockets, custom APIs).
  • Extending coverage does not require modifying analyzer core logic.
  • Cross-TU summaries propagate ownership effects across translation units without requiring whole-program linking.
  • Incremental summary caching keeps multi-file analysis scalable in CI by reusing unchanged module summaries.

Cross-TU summary behavior:

  • Active when --resource-model is provided and multiple input files are analyzed.
  • --resource-cross-tu keeps this behavior enabled (default).
  • --no-resource-cross-tu forces local-only (single-file) resource reasoning.
  • --resource-summary-cache-dir=<path> controls where per-module summary cache files are stored.
  • --resource-summary-cache-memory-only disables filesystem cache writes and uses an in-process cache only.
  • --jobs=<N> parallelizes module loading/compilation and per-module summary extraction during each fixpoint iteration.
  • The CLI prints an explicit status line to stderr to indicate whether resource inter-procedural analysis is enabled or unavailable/disabled (with reason).
  • If a local release depends on an unmodeled/external callee and no summary is available, the tool emits ResourceLifetime.IncompleteInterproc as a warning to make precision limits visible.

Model format (--resource-model=<path>):

acquire_out <function-pattern> <out-arg-index> <resource-kind>
acquire_ret <function-pattern> <resource-kind>
release_arg <function-pattern> <arg-index> <resource-kind>

Function pattern matching supports exact names and glob patterns (*, ?, [ ... ]) and is applied to symbol names and demangled names.

Example model:

acquire_out acquire_handle 0 GenericHandle
release_arg release_handle 0 GenericHandle
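
The example above uses exact names only. A variant using glob patterns and `acquire_ret` (function names and resource kind are hypothetical, for illustration) could look like:

```text
acquire_ret open_*_session MySession
release_arg close_session 0 MySession
```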

Example run:

./build/stack_usage_analyzer \
  test/resource-lifetime/local-missing-release.c \
  --resource-model=models/resource-lifetime/generic.txt \
  --warnings-only

./build/stack_usage_analyzer \
  test/resource-lifetime/cross-tu-wrapper-def.c \
  test/resource-lifetime/cross-tu-wrapper-use.c \
  --resource-model=models/resource-lifetime/generic.txt \
  --resource-summary-cache-memory-only \
  --warnings-only

./build/stack_usage_analyzer \
  test/resource-lifetime/cross-tu-wrapper-def.c \
  test/resource-lifetime/cross-tu-wrapper-use.c \
  --resource-model=models/resource-lifetime/generic.txt \
  --resource-summary-cache-dir=.cache/resource-lifetime \
  --warnings-only

For test files, run_test.py also supports per-file model selection with: // resource-model: <path>.

Example

Given this code:

#define SIZE_LARGE 8192000000
#define SIZE_SMALL (SIZE_LARGE / 2)

int main(void)
{
    char test[SIZE_SMALL];

    return 0;
}

You can pass either the .c file or the corresponding .ll file to the analyzer. You may receive the following output:

Language: C
Compiling source file to LLVM IR...
Mode: ABI

Function: main
  local stack: 4096000016 bytes
  max stack (including callees): 4096000016 bytes
  [!] potential stack overflow: exceeds limit of 8388608 bytes

Given this code:

int foo(void)
{
    char test[8192000000];
    return 0;
}

int bar(void)
{
    return 0;
}

int main(void)
{
    foo();
    bar();

    return 0;
}

Depending on the selected --mode, you may obtain the following results:

Language: C
Compiling source file to LLVM IR...
Mode: ABI

Function: foo
  local stack: 8192000000 bytes
  max stack (including callees): 8192000000 bytes
  [!] potential stack overflow: exceeds limit of 8388608 bytes

Function: bar
  local stack: 16 bytes
  max stack (including callees): 16 bytes

Function: main
  local stack: 32 bytes
  max stack (including callees): 8192000032 bytes
  [!] potential stack overflow: exceeds limit of 8388608 bytes
Language: C
Compiling source file to LLVM IR...
Mode: IR

Function: foo
  local stack: 8192000000 bytes
  max stack (including callees): 8192000000 bytes
  [!] potential stack overflow: exceeds limit of 8388608 bytes

Function: bar
  local stack: 0 bytes
  max stack (including callees): 0 bytes

Function: main
  local stack: 16 bytes
  max stack (including callees): 8192000016 bytes
  [!] potential stack overflow: exceeds limit of 8388608 bytes

Stack pointer leak detection

Examples:

char buf[10];
return buf;    // returns pointer to stack -> use-after-return

Or storing:

global = buf; // leaking address of stack variable

Stack escape API contracts (--escape-model=<path>)

  • Why this exists:
    • Some external APIs consume pointer arguments immediately during the call.
    • Their declarations often do not carry LLVM nocapture-like attributes.
    • A model lets you encode this behavior without hardcoding library names in analyzer code.
  • Resolution order used by the analyzer:
    • LLVM call-site attributes (nocapture / byval / byref)
    • Inter-procedural summary (for analyzed definitions)
    • External stack-escape model (noescape_arg)
    • Opaque external call without proof/model: no strong escape diagnostic is emitted.

Model format (--escape-model=<path>):

noescape_arg <function-pattern> <arg-index>

Function pattern matching supports exact names and glob patterns (*, ?, [ ... ]) and is applied to symbol names and demangled names.

Example model:

noescape_arg vkUpdateDescriptorSets 2
noescape_arg vkUpdateDescriptorSets 4

For test files, run_test.py supports per-file selection with: // escape-model: <path>.


Implemented so far:

    1. Multi-file CLI inputs with deterministic ordering and aggregated output.
    2. Per-result file attribution in JSON/SARIF and diagnostics.
    3. Filters: --only-file, --only-dir, --exclude-dir, --only-function/--only-func, plus --dump-filter.
    4. Compile args passthrough: -I, -D, --compile-arg.
    5. Dynamic alloca / VLA detection, including user-controlled sizes, upper-bound inference, and recursion-aware severity (errors for infinite recursion or oversized allocations, warnings for other dynamic sizes).
    6. Deriving human-friendly names for unnamed allocas in diagnostics.
    7. Detection of memcpy/memset overflows on stack buffers.
    8. Warning when a function performs multiple stores into the same stack buffer.
    9. Deeper traversal analysis: constraint propagation.
    10. Detection of deep indirection in aliasing.
    11. Detection of overflow in a struct containing an internal array.
    12. Detection of stack pointer leaks:
    • store_unknown -> storing the pointer in a non-local location (typically out-parameter, heap, etc.)
    • call_callback -> passing it to a callback (indirect call)
    • call_arg -> passing it as an argument to a direct function, potentially capturable
    13. Generic resource lifetime analysis using external API models (acquire_out, acquire_ret, release_arg), including missing release, double release, and constructor/destructor lifecycle mismatches.
