Architecture Overview

Hugo edited this page Feb 26, 2026 · 1 revision

High-Level Design

The analyzer is structured in four layers:

 CLI / Library API
       |
   App Orchestrator (AnalyzerApp)
       |
   Analysis Pipeline (AnalysisPipeline)
       |
   Analysis Modules (stack safety + lifetime + quality checks)

Layer 1: Entry Points

  • CLI (main.cpp + cli/ArgParser.cpp): parses command-line arguments into ParsedArguments
  • Library API (StackUsageAnalyzer.hpp): exposes analyzeModule(), analyzeFile(), toJson(), toSarif()

Layer 2: App Orchestrator

src/app/AnalyzerApp.cpp handles:

  • Input discovery (explicit files + compile database auto-discovery)
  • Compilation database loading
  • Multi-file parallel loading with --jobs
  • Cross-TU summary index construction (resource lifetime + uninitialized variables)
  • Output strategy selection (human, JSON, SARIF)
  • Filter application and result merging

Layer 3: Analysis Pipeline

src/analyzer/AnalysisPipeline.cpp coordinates:

  1. Module preparation (metadata computation)
  2. Analysis pass execution
  3. Reachability filtering
  4. Diagnostic emission

Layer 4: Analysis Modules

src/analysis/ contains the check implementations and shared analysis utilities. Findings from diagnostic-oriented checks are reported through DiagnosticEmitter, which keeps reporting concerns centralized.


Module Map

include/
  StackUsageAnalyzer.hpp          # Public API
  analysis/                        # Analysis check/service headers
  analyzer/                        # Pipeline + services
    AnalysisPipeline.hpp
    DiagnosticEmitter.hpp
    LocationResolver.hpp
    ModulePreparationService.hpp
  app/AnalyzerApp.hpp              # App orchestrator
  cli/ArgParser.hpp                # CLI parser

src/
  StackUsageAnalyzer.cpp           # Public API implementation
  main.cpp                         # CLI entry point
  app/AnalyzerApp.cpp              # Orchestrator
  analyzer/
    AnalysisPipeline.cpp           # Facade over analysis services
    DiagnosticEmitter.cpp          # Findings -> Diagnostics
    LocationResolver.cpp           # LLVM debug info -> source locations
    ModulePreparationService.cpp   # Builds ModuleAnalysisContext
  analysis/
    StackBufferAnalysis.cpp        # Buffer overflow detection
    DynamicAlloca.cpp              # VLA / dynamic alloca
    StackPointerEscape.cpp         # Stack address leaks
    StackPointerEscapeModel.cpp    # External escape model parsing
    StackPointerEscapeResolver.cpp # Escape resolution pipeline
    ResourceLifetimeAnalysis.cpp   # Resource leak detection
    UninitializedVarAnalysis.cpp   # Uninitialized reads
    MemIntrinsicOverflow.cpp       # memcpy/memset overflow
    StackComputation.cpp           # Call graph + stack sizes
    Reachability.cpp               # Dead code filtering
    DuplicateIfCondition.cpp       # Duplicate conditions in if/else chains
    ConstParamAnalysis.cpp         # Const parameter detection
    SizeMinusKWrites.cpp           # Off-by-one detection
    InvalidBaseReconstruction.cpp  # Pointer arithmetic issues
    AllocaUsage.cpp                # Alloca usage classification
    IntRanges.cpp                  # Integer range analysis
    IRValueUtils.cpp               # LLVM IR utilities
    AnalyzerUtils.cpp              # Shared utilities
    CompileCommands.cpp            # compile_commands.json parser
    FunctionFilter.cpp             # STL/system function filter
    InputPipeline.cpp              # Module loading pipeline
  report/ReportSerialization.cpp   # JSON + SARIF output
  passes/ModulePasses.cpp          # LLVM module passes
  mangle.cpp                       # Symbol demangling

Data Flow

Source files (.c/.cpp)
    |
    v
[InputPipeline] -- compile, load LLVM module
    |
    v
[ModulePreparationService] -- compute local stack sizes, call graph, recursion metadata
    |                          produces: PreparedModule / ModuleAnalysisContext
    v
[Analysis Passes] -- StackBufferAnalysis, DynamicAlloca, ResourceLifetime, etc.
    |                 produces: raw findings
    v
[Reachability] -- filter unreachable findings
    |
    v
[LocationResolver] -- LLVM debug info -> source file/line/column
    |
    v
[DiagnosticEmitter] -- findings -> Diagnostic structs (ruleId, severity, message)
    |
    v
[Output] -- human text, JSON, or SARIF

Key Design Patterns

Facade: AnalysisPipeline

AnalysisPipeline is the single entry point for module-level analysis. It coordinates preparation, passes, and emission without exposing internal services.
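A minimal sketch of the facade idea, not the project's actual class: one public `run()` call sequences preparation, a pass, reachability filtering, and emission, while the individual steps stay private. Every name here (`facade_sketch::Pipeline`, `FunctionInfo`, the `StackLimitExceeded` rule ID, the stack-limit check itself) is a hypothetical stand-in chosen for illustration.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

namespace facade_sketch {

// Hypothetical stand-ins for the real module, finding, and diagnostic types.
struct FunctionInfo { std::string name; std::size_t stackBytes; bool reachable; };
struct PreparedModule { std::vector<FunctionInfo> functions; std::size_t largestFrame; };
struct Finding { std::string function; bool reachable; std::string message; };
struct Diagnostic { std::string ruleId; std::string function; std::string message; };

class Pipeline {
public:
    explicit Pipeline(std::size_t stackLimit) : stackLimit_(stackLimit) {}

    // Single entry point: callers never touch the individual steps.
    std::vector<Diagnostic> run(const std::vector<FunctionInfo>& module) const {
        PreparedModule prepared = prepare(module);
        std::vector<Finding> findings = runPasses(prepared);
        findings = dropUnreachable(std::move(findings));
        return emit(findings);
    }

private:
    // Step 1: derive metadata only; no diagnostics are produced here.
    static PreparedModule prepare(const std::vector<FunctionInfo>& module) {
        PreparedModule p{module, 0};
        for (const auto& fn : module) {
            if (fn.stackBytes > p.largestFrame) p.largestFrame = fn.stackBytes;
        }
        return p;
    }

    // Step 2: a single toy pass that flags frames above the limit.
    std::vector<Finding> runPasses(const PreparedModule& p) const {
        std::vector<Finding> out;
        if (p.largestFrame <= stackLimit_) return out;  // nothing can exceed the limit
        for (const auto& fn : p.functions) {
            if (fn.stackBytes > stackLimit_) {
                out.push_back({fn.name, fn.reachable, "stack frame exceeds limit"});
            }
        }
        return out;
    }

    // Step 3: drop findings located in unreachable code.
    static std::vector<Finding> dropUnreachable(std::vector<Finding> findings) {
        std::vector<Finding> kept;
        for (auto& f : findings) {
            if (f.reachable) kept.push_back(std::move(f));
        }
        return kept;
    }

    // Step 4: convert findings into report-facing diagnostics.
    static std::vector<Diagnostic> emit(const std::vector<Finding>& findings) {
        std::vector<Diagnostic> out;
        for (const auto& f : findings) {
            out.push_back({"StackLimitExceeded", f.function, f.message});
        }
        return out;
    }

    std::size_t stackLimit_;
};

}  // namespace facade_sketch
```

With this shape, a caller only writes `facade_sketch::Pipeline(1024).run(functions)`; reordering or replacing an internal step does not change that call site.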

Application Service: ModulePreparationService

Produces PreparedModule containing computed metadata (stack sizes, call graph, recursion flags). Pure module state derivation with no diagnostic side effects.

Domain Service: LocationResolver

Stateless service that converts LLVM debug locations to source coordinates. Handles multiple LLVM-specific fallbacks (debug intrinsics, alloca metadata).
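The fallback behaviour can be pictured as a stateless function that tries each location source in turn. This is only a sketch under assumptions: `DebugInfoView` is a hypothetical flattened view of the debug information, it does not use the LLVM API, and the fallback order shown (instruction location, then debug intrinsic, then alloca metadata) is a guess rather than the resolver's documented order.

```cpp
#include <string>

namespace location_sketch {

// Source coordinates; line 0 is used here to mean "unknown".
struct SourceLoc {
    std::string file;
    unsigned line = 0;
    unsigned column = 0;
    bool known() const { return line != 0; }
};

// Hypothetical flattened view of the debug info available for one IR value.
struct DebugInfoView {
    SourceLoc instructionLoc;     // location attached to the instruction itself
    SourceLoc debugIntrinsicLoc;  // location recovered from a debug intrinsic
    SourceLoc allocaMetadataLoc;  // location recovered from alloca metadata
};

// Stateless: the result depends only on the argument.
// Returns the first known location, or an unknown SourceLoc if none exists.
inline SourceLoc resolveLocation(const DebugInfoView& view) {
    if (view.instructionLoc.known()) return view.instructionLoc;
    if (view.debugIntrinsicLoc.known()) return view.debugIntrinsicLoc;
    return view.allocaMetadataLoc;
}

}  // namespace location_sketch
```

Because the function holds no state, it is safe to call from any pass or thread without coordination.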

Adapter: DiagnosticEmitter

Bridges between analysis model objects (findings) and output/report models (Diagnostic structs). Central place for rule IDs, severities, and formatting.
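As an illustration of the adapter role, the sketch below maps a finding to a diagnostic in one function, so rule IDs, severities, and message wording live in a single place. The types, the two finding kinds, and the rule ID strings are all hypothetical; the real `DiagnosticEmitter` interface is not shown on this page.

```cpp
#include <string>

namespace emitter_sketch {

// Analysis-side model (hypothetical).
enum class FindingKind { StackBufferOverflow, DynamicAlloca };
struct Finding { FindingKind kind; std::string function; };

// Report-side model (hypothetical).
enum class Severity { Warning, Error };
struct Diagnostic { std::string ruleId; Severity severity; std::string message; };

// The only place that knows rule IDs, severities, and message wording.
inline Diagnostic toDiagnostic(const Finding& f) {
    switch (f.kind) {
    case FindingKind::StackBufferOverflow:
        return {"StackBufferOverflow", Severity::Error,
                "possible stack buffer overflow in " + f.function};
    case FindingKind::DynamicAlloca:
        return {"DynamicAlloca", Severity::Warning,
                "dynamically sized alloca in " + f.function};
    }
    // Only reached if a new FindingKind is added without a case above.
    return {"Unknown", Severity::Warning, "unclassified finding in " + f.function};
}

}  // namespace emitter_sketch
```

Analysis passes then produce only `Finding` values and never format text themselves, which is what keeps reporting concerns out of the checks.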

Policy: Reachability

Isolated reachability heuristics for filtering findings. Can evolve independently of analysis passes.

Strategy: Execution + Output

AnalyzerApp uses the Strategy pattern for:

  • Execution: SharedModuleLoadingExecutionStrategy (cross-TU) vs DirectModuleLoadingExecutionStrategy (independent files)
  • Output: JsonOutputStrategy, SarifOutputStrategy, HumanOutputStrategy
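The output side of this pattern might look roughly like the following. The class names above come from the source tree, but their interfaces are not shown here, so this sketch uses its own hypothetical names (`OutputStrategy`, `render`, `makeOutput`) and covers only human and JSON output. The JSON writer is deliberately naive and assumes its inputs need no escaping.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

namespace output_sketch {

struct Diagnostic { std::string ruleId; std::string message; };

enum class OutputFormat { Human, Json };

// Common interface: the orchestrator depends only on this.
class OutputStrategy {
public:
    virtual ~OutputStrategy() = default;
    virtual std::string render(const std::vector<Diagnostic>& diags) const = 0;
};

class HumanOutput final : public OutputStrategy {
public:
    std::string render(const std::vector<Diagnostic>& diags) const override {
        std::string out;
        for (const auto& d : diags) out += "[" + d.ruleId + "] " + d.message + "\n";
        return out;
    }
};

class JsonOutput final : public OutputStrategy {
public:
    // Assumption: ruleId and message contain no characters that need JSON escaping.
    std::string render(const std::vector<Diagnostic>& diags) const override {
        std::string out = "[";
        for (std::size_t i = 0; i < diags.size(); ++i) {
            if (i != 0) out += ",";
            out += "{\"ruleId\":\"" + diags[i].ruleId +
                   "\",\"message\":\"" + diags[i].message + "\"}";
        }
        return out + "]";
    }
};

// Selection happens once, typically right after argument parsing.
inline std::unique_ptr<OutputStrategy> makeOutput(OutputFormat format) {
    if (format == OutputFormat::Json) {
        return std::unique_ptr<OutputStrategy>(new JsonOutput());
    }
    return std::unique_ptr<OutputStrategy>(new HumanOutput());
}

}  // namespace output_sketch
```

Adding another format under this design means adding one class and one branch in the factory; the code that calls `render()` stays unchanged.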

Cross-TU Analysis

For multi-file analysis with cross-translation-unit features:

  1. All modules are loaded in parallel (using --jobs)
  2. A global summary index is built via fixed-point iteration (max 12 rounds):
    • Each module extracts local summaries
    • Summaries are merged into a global index
    • Process repeats until convergence
  3. The converged global index is shared with all analysis passes
  4. Individual module analysis uses the global index for inter-procedural reasoning

This applies to both resource lifetime and uninitialized variable cross-TU analysis.
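The fixed-point loop in step 2 can be sketched as follows. This is a simplified model under stated assumptions, not the analyzer's code: a summary is reduced to a single boolean per function ("may acquire a resource"), and `Module`, `FunctionDef`, `extractLocal`, and `buildGlobalIndex` are hypothetical names. Only the round limit of 12 comes from the description above.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

namespace crosstu_sketch {

// Hypothetical summary: function name -> "may acquire a resource".
using SummaryIndex = std::map<std::string, bool>;

struct FunctionDef {
    std::string name;
    bool acquiresDirectly;
    std::vector<std::string> callees;  // may be defined in other modules
};

struct Module { std::vector<FunctionDef> functions; };

// Each module extracts local summaries, consulting the global index for callees.
inline SummaryIndex extractLocal(const Module& m, const SummaryIndex& global) {
    SummaryIndex local;
    for (const auto& f : m.functions) {
        bool acquires = f.acquiresDirectly;
        for (const auto& callee : f.callees) {
            SummaryIndex::const_iterator it = global.find(callee);
            if (it != global.end() && it->second) acquires = true;
        }
        local[f.name] = acquires;
    }
    return local;
}

struct FixedPointResult { SummaryIndex index; std::size_t rounds; bool converged; };

// Repeat extract + merge until the index stops changing or maxRounds is hit.
inline FixedPointResult buildGlobalIndex(const std::vector<Module>& modules,
                                         std::size_t maxRounds = 12) {
    FixedPointResult result{SummaryIndex(), 0, false};
    while (result.rounds < maxRounds) {
        ++result.rounds;
        SummaryIndex next = result.index;
        for (const auto& m : modules) {
            const SummaryIndex local = extractLocal(m, result.index);
            for (const auto& entry : local) {
                // Monotone merge: a fact never flips back to false.
                next[entry.first] = next[entry.first] || entry.second;
            }
        }
        if (next == result.index) {
            result.converged = true;
            break;
        }
        result.index = std::move(next);
    }
    return result;
}

}  // namespace crosstu_sketch
```

In this model a fact travels one call edge per round, so a chain `f -> g -> h` spread across three modules needs three rounds to propagate and a fourth to confirm that nothing changed. The round cap bounds the cost when chains are long; if it is hit, `converged` stays false and the index is whatever the last round produced.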

Summary caching is available for resource lifetime analysis (filesystem or memory-only). Cross-TU uninitialized-variable analysis currently rebuilds its summaries in memory on each run.


Dependencies

Dependency          Purpose
LLVM/Clang 19+      IR parsing, module loading, debug info
coretrace-compiler  C/C++ to LLVM IR compilation
coretrace-logger    Structured logging
