Analysis Pipeline

Hugo edited this page Feb 26, 2026 · 1 revision

The analysis pipeline is the core orchestration layer that coordinates how a single LLVM module is analyzed.


Entry Point

The public API analyzeModule(llvm::Module&, const AnalysisConfig&) in StackUsageAnalyzer.cpp delegates to AnalysisPipeline.

AnalysisResult analyzeModule(llvm::Module& mod, const AnalysisConfig& config);

Pipeline Stages

Stage 1: Module Preparation

File: src/analyzer/ModulePreparationService.cpp

Builds a ModuleAnalysisContext (or PreparedModule) that contains:

  • Local stack sizes for every function
  • Filtered call graph (edges between analyzed functions)
  • Recursion metadata (cycle detection, infinite self-recursion heuristic via DominatorTree)
  • Function-level metadata (has dynamic alloca, is recursive, etc.)

This stage is pure computation with no diagnostic side effects. The PreparedModule is reusable and testable independently.
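The recursion metadata can be illustrated with a small, self-contained sketch. It is not the code in ModulePreparationService.cpp: the `CallGraph` alias and `findRecursiveFunctions` are hypothetical names, and the sketch only covers cycle detection on a call graph, not the DominatorTree-based heuristic for infinite self-recursion.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the filtered call graph: caller -> callees.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Returns true if `target` can be reached from `current` by following
// one or more call edges.
static bool reaches(const CallGraph& graph, const std::string& current,
                    const std::string& target, std::set<std::string>& visited) {
    auto it = graph.find(current);
    if (it == graph.end()) return false;
    for (const std::string& callee : it->second) {
        if (callee == target) return true;
        // Only descend into callees that have not been explored yet.
        if (visited.insert(callee).second &&
            reaches(graph, callee, target, visited)) {
            return true;
        }
    }
    return false;
}

// A function is treated as recursive if it lies on a call-graph cycle,
// i.e. it can reach itself (directly or through other functions).
std::set<std::string> findRecursiveFunctions(const CallGraph& graph) {
    std::set<std::string> recursive;
    for (const auto& entry : graph) {
        std::set<std::string> visited;
        if (reaches(graph, entry.first, entry.first, visited)) {
            recursive.insert(entry.first);
        }
    }
    return recursive;
}
```

Because the function only reads its input and returns a value, it matches the "pure computation" property described above and can be unit-tested without emitting any diagnostics.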

Stage 2: Analysis Pass Execution

Each analysis pass is invoked on the prepared module. Passes are independent of one another and can run on any function:

| Pass | What it does |
| --- | --- |
| StackComputation | Computes max stack including callees, detects overflow |
| StackBufferAnalysis | Detects buffer overflow via GEP + store analysis |
| DynamicAlloca | Detects VLAs, user-controlled alloca, oversized alloca |
| AllocaUsage | Classifies alloca usage patterns |
| MemIntrinsicOverflow | Checks memcpy/memset sizes against buffer bounds |
| StackPointerEscape | Detects stack address leaks (store, callback, return) |
| ResourceLifetimeAnalysis | Model-driven acquire/release checking |
| UninitializedVarAnalysis | Detects reads before writes |
| DuplicateIfCondition | Detects duplicate conditions in if-else chains |
| ConstParamAnalysis | Detects parameters that could be const |
| SizeMinusKWrites | Detects off-by-one patterns |
| InvalidBaseReconstruction | Detects unsafe pointer arithmetic |
| IntRanges | Integer range analysis (used by other passes) |
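As an illustration of what StackComputation describes ("max stack including callees"), here is a minimal sketch over a toy data model. All names (`FunctionInfo`, `ToyModule`, `StackResult`, `analyzeStack`, `overflows`) are hypothetical and do not come from the project, and treating a recursive call as "unbounded" is an assumption made for the sketch.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical per-function data, as Stage 1 might provide it.
struct FunctionInfo {
    std::uint64_t localStack = 0;       // bytes used by the function's own frame
    std::vector<std::string> callees;   // edges in the filtered call graph
};
using ToyModule = std::map<std::string, FunctionInfo>;

struct StackResult {
    std::uint64_t maxStack = 0;  // worst-case bytes along any call path
    bool unbounded = false;      // true if a call-graph cycle was encountered
};

static StackResult computeMaxStack(const ToyModule& mod, const std::string& name,
                                   std::set<std::string>& onPath) {
    auto it = mod.find(name);
    if (it == mod.end()) return {};              // external function: no data
    if (onPath.count(name)) return {0, true};    // cycle: depth is not bounded
    onPath.insert(name);
    StackResult result;
    std::uint64_t worstCallee = 0;
    for (const std::string& callee : it->second.callees) {
        StackResult sub = computeMaxStack(mod, callee, onPath);
        worstCallee = std::max(worstCallee, sub.maxStack);
        result.unbounded = result.unbounded || sub.unbounded;
    }
    onPath.erase(name);
    // Own frame plus the deepest callee chain.
    result.maxStack = it->second.localStack + worstCallee;
    return result;
}

StackResult analyzeStack(const ToyModule& mod, const std::string& entry) {
    std::set<std::string> onPath;
    return computeMaxStack(mod, entry, onPath);
}

// Overflow check against a caller-supplied limit (assumed to be in bytes).
bool overflows(const StackResult& r, std::uint64_t limit) {
    return r.unbounded || r.maxStack > limit;
}
```

The sketch recomputes shared callees on every path; a real pass would likely memoize per-function results, which is omitted here to keep the example short.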

Stage 3: Reachability Filtering

File: src/analysis/Reachability.cpp

Filters out findings located in unreachable code. Findings are annotated using static reachability heuristics that flag:

  • Detached basic blocks (no predecessors)
  • Blocks dominated by unreachable terminator paths
  • Multi-predecessor blocks where all paths are unreachable
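The three heuristics above can be approximated by a single forward traversal from the entry block: any block that is never visited is either detached or reachable only through other unreachable blocks. The sketch below uses a hypothetical `Cfg` type (block index to successor indices, with block 0 as the entry) and is a simplification, not the logic in Reachability.cpp.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical CFG: cfg[i] lists the successor block indices of block i.
// Block 0 is assumed to be the function's entry block.
using Cfg = std::vector<std::vector<std::size_t>>;

// Breadth-first traversal from the entry block. Blocks left unmarked are
// unreachable: they have no predecessors, or all of their predecessors are
// themselves unreachable.
std::vector<bool> reachableBlocks(const Cfg& cfg) {
    std::vector<bool> reachable(cfg.size(), false);
    if (cfg.empty()) return reachable;

    std::queue<std::size_t> worklist;
    reachable[0] = true;
    worklist.push(0);
    while (!worklist.empty()) {
        std::size_t block = worklist.front();
        worklist.pop();
        for (std::size_t succ : cfg[block]) {
            // Ignore malformed edges and blocks that were already visited.
            if (succ < cfg.size() && !reachable[succ]) {
                reachable[succ] = true;
                worklist.push(succ);
            }
        }
    }
    return reachable;
}
```

A finding attached to block `b` would then be dropped or annotated when `reachableBlocks(cfg)[b]` is false.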

Stage 4: Location Resolution

File: src/analyzer/LocationResolver.cpp

Converts LLVM debug locations (!dbg metadata) to source coordinates:

  • Primary: instruction debug location
  • Fallback: debug intrinsics (dbg.declare, dbg.value) for allocas
  • Normalization: line/column numbers, file paths
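The primary/fallback/normalization order can be sketched as a small resolution function. `SourceLocation` and `resolveLocation` are hypothetical names, and the specific normalization rules (stripping a leading "./" and mapping column 0 to 1) are assumptions chosen for illustration. Treating line 0 as "no usable location" follows the LLVM convention that line 0 denotes an unknown source line.

```cpp
#include <optional>
#include <string>

// Hypothetical source coordinate produced by location resolution.
struct SourceLocation {
    std::string file;
    unsigned line = 0;
    unsigned column = 0;
};

// A location is usable only if it exists and has a real line number.
static bool isUsable(const std::optional<SourceLocation>& loc) {
    return loc.has_value() && loc->line != 0;
}

// Assumed normalization rules, for illustration only.
static SourceLocation normalize(SourceLocation loc) {
    if (loc.file.rfind("./", 0) == 0) loc.file.erase(0, 2);  // drop leading "./"
    if (loc.column == 0) loc.column = 1;                     // unknown column -> 1
    return loc;
}

// Prefer the instruction's own debug location; fall back to the location
// recovered from a debug intrinsic (e.g. for an alloca).
std::optional<SourceLocation> resolveLocation(
    const std::optional<SourceLocation>& instructionLoc,
    const std::optional<SourceLocation>& debugIntrinsicLoc) {
    if (isUsable(instructionLoc)) return normalize(*instructionLoc);
    if (isUsable(debugIntrinsicLoc)) return normalize(*debugIntrinsicLoc);
    return std::nullopt;
}
```

Returning `std::nullopt` when both sources fail lets the caller decide whether to emit a finding without coordinates or to drop it.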

Stage 5: Diagnostic Emission

File: src/analyzer/DiagnosticEmitter.cpp

Converts raw analysis findings into Diagnostic structs:

  • Assigns rule IDs (e.g., StackBufferOverflow, ResourceLifetime.MissingRelease)
  • Sets severity (Info, Warning, Error)
  • Formats human-readable messages with alias paths, index values, etc.
  • Assigns CWE identifiers
  • Computes SARIF-compatible source ranges
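A rough sketch of the finding-to-diagnostic conversion follows. The `Finding` and `Diagnostic` layouts and the `emitDiagnostic` function are hypothetical. The two rule IDs are the examples named above, while the severities and CWE mappings shown (CWE-121 for stack-based buffer overflow, CWE-772 for a missing resource release) are assumptions about what a reasonable mapping could look like, not the tool's confirmed table.

```cpp
#include <string>

enum class Severity { Info, Warning, Error };

// Hypothetical raw finding produced by an analysis pass.
struct Finding {
    std::string ruleId;    // e.g. "StackBufferOverflow"
    std::string function;  // function in which the issue was found
    std::string detail;    // pass-specific description
};

// Hypothetical diagnostic record handed to the output layer.
struct Diagnostic {
    std::string ruleId;
    Severity severity = Severity::Info;
    std::string message;
    std::string cwe;  // empty when no CWE is assigned
};

Diagnostic emitDiagnostic(const Finding& finding) {
    Diagnostic diag;
    diag.ruleId = finding.ruleId;
    diag.message = "[" + finding.function + "] " + finding.detail;

    // Assumed rule -> (severity, CWE) mapping, for illustration only.
    if (finding.ruleId == "StackBufferOverflow") {
        diag.severity = Severity::Error;
        diag.cwe = "CWE-121";
    } else if (finding.ruleId == "ResourceLifetime.MissingRelease") {
        diag.severity = Severity::Warning;
        diag.cwe = "CWE-772";
    }
    // Unknown rules keep the defaults: Info severity, no CWE.
    return diag;
}
```

Source ranges are left out of the sketch; in a fuller version they would come from the location resolved in Stage 4.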

Function Filtering

The pipeline applies function filtering at two points:

  1. STL filter (FunctionFilter.cpp): applied before analysis; excludes standard library, system, and third-party functions by default (override with --STL)
  2. Name/path filters: --only-function, --only-file, and --only-dir are applied after analysis, during result filtering
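The post-analysis name/path filtering might look like the sketch below. `ResultFilter`, `FunctionResult`, and `filterResults` are hypothetical names. The matching semantics are assumptions: an empty list means "no restriction", all non-empty criteria must match, and a directory matches by path prefix on a `/` boundary.

```cpp
#include <string>
#include <vector>

// Hypothetical mirror of the --only-function / --only-file / --only-dir options.
struct ResultFilter {
    std::vector<std::string> onlyFunctions;
    std::vector<std::string> onlyFiles;
    std::vector<std::string> onlyDirs;
};

// Hypothetical per-function result carrying just what the filter needs.
struct FunctionResult {
    std::string name;
    std::string file;
};

// True if `file` lies under `dir`, matching only on a path-separator boundary
// so that "src/app" does not match "src/application/x.c".
static bool inDirectory(const std::string& file, const std::string& dir) {
    if (dir.empty() || file.compare(0, dir.size(), dir) != 0) return false;
    return dir.back() == '/' ||
           (file.size() > dir.size() && file[dir.size()] == '/');
}

static bool contains(const std::vector<std::string>& list, const std::string& value) {
    for (const std::string& item : list) {
        if (item == value) return true;
    }
    return false;
}

static bool matches(const ResultFilter& filter, const FunctionResult& result) {
    if (!filter.onlyFunctions.empty() && !contains(filter.onlyFunctions, result.name))
        return false;
    if (!filter.onlyFiles.empty() && !contains(filter.onlyFiles, result.file))
        return false;
    if (!filter.onlyDirs.empty()) {
        bool inAny = false;
        for (const std::string& dir : filter.onlyDirs) {
            inAny = inAny || inDirectory(result.file, dir);
        }
        if (!inAny) return false;
    }
    return true;
}

// Keeps only the results that satisfy every active criterion.
std::vector<FunctionResult> filterResults(const std::vector<FunctionResult>& results,
                                          const ResultFilter& filter) {
    std::vector<FunctionResult> kept;
    for (const FunctionResult& result : results) {
        if (matches(filter, result)) kept.push_back(result);
    }
    return kept;
}
```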

Analysis Profile Impact

The fast profile modifies pass behavior:

  • StackBufferAnalysis: skips functions with more than 1200 IR instructions, limits analysis to 16 GEP sites per function, and disables alias backtracking
  • MultipleStores (within StackBufferAnalysis): limits analysis to 32 store sites per function

All other passes run identically in both profiles.
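The limits listed above could be captured in a small configuration struct, sketched below. The numeric values for the fast profile come from this page; everything else is hypothetical, including the `Profile::Full` name for the non-fast profile and the assumption that it imposes no limits.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical profile selector; only "fast" is named on this page.
enum class Profile { Full, Fast };

// Hypothetical limits consulted by StackBufferAnalysis.
struct StackBufferLimits {
    std::size_t maxInstructions;  // functions above this size are skipped
    std::size_t maxGepSites;      // GEP sites examined per function
    std::size_t maxStoreSites;    // store sites examined per function (MultipleStores)
    bool aliasBacktracking;       // whether alias backtracking is enabled
};

StackBufferLimits limitsFor(Profile profile) {
    if (profile == Profile::Fast) {
        // Values taken from the fast-profile description above.
        return {1200, 16, 32, false};
    }
    // Assumption: the non-fast profile is unrestricted.
    const std::size_t unlimited = std::numeric_limits<std::size_t>::max();
    return {unlimited, unlimited, unlimited, true};
}

// "Skips functions with more than N IR instructions": exactly N is still analyzed.
bool shouldSkipFunction(const StackBufferLimits& limits, std::size_t instructionCount) {
    return instructionCount > limits.maxInstructions;
}
```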


Parallelism

Within a single module, analysis is sequential. Parallelism occurs at the multi-file level:

  • Module loading is parallelized with --jobs
  • Cross-TU summary extraction is parallelized per module
  • Without cross-TU, each file can be analyzed independently in parallel
