Architecture Overview
This document provides a comprehensive overview of the Lotus architecture, describing how the different components interact and how they are organized.
System Architecture
Lotus is organized into several major subsystems that work together to provide a comprehensive program analysis framework:
┌──────────────────────────────────────────────────────────────┐
│ Command-Line Tools │
│ (aser-aa, dyck-aa, lotus-aa, lotus-kint, clam, etc.) │
└────────────────┬─────────────────────────────────────────────┘
│
┌────────────────┴─────────────────────────────────────────────┐
│ Analysis Applications │
│ - Bug Checkers (Kint, Taint, Concurrency, Pulse) │
│ - Abstract Interpreters (CLAM, SymAbsAI) │
│ - Fuzzing Support (Titan) │
└────┬────────────┬────────────┬────────────┬─────────────────┘
│ │ │ │
┌────┴────┐ ┌───┴────┐ ┌───┴────┐ ┌────┴─────┐
│ Alias │ │ IR │ │ Data │ │ Solvers │
│ Analysis│ │ (PDG, │ │ Flow │ │ (SMT, │
│ │ │ VFG) │ │ │ │ BDD) │
└────┬────┘ └───┬────┘ └───┬────┘ └────┬─────┘
│ │ │ │
┌────┴───────────┴───────────┴────────────┴─────────────────┐
│ LLVM Infrastructure │
│ (Module, Function, BasicBlock, etc.) │
└────────────────────────────────────────────────────────────┘
Core Components
Alias Analysis (
lib/Alias/,include/Alias/)Multiple pointer analysis algorithms with different precision-performance trade-offs:
DyckAA: Unification-based, exhaustive alias analysis with Dyck-CFL reachability
AserPTA: Constraint-based pointer analysis with multiple context sensitivities
LotusAA: Flow-sensitive, field-sensitive interprocedural analysis
Sea-DSA: Context-sensitive DSA-based memory analysis
SparrowAA: Inclusion-based pointer analysis
FPA: Function pointer analysis (FLTA, MLTA, MLTADF, KELP)
SRAA: Strict Relation Alias Analysis using range analysis
OriginAA: Origin-sensitive analysis for thread creation tracking
DynAA: Dynamic validation of static alias analysis results
Intermediate Representations (
lib/IR/,include/IR/)Specialized graph representations for program analysis:
PDG (Program Dependence Graph): Captures data and control dependencies
DyckVFG (Value Flow Graph): Tracks value flow for Dyck-based analyses
Call Graphs: Multiple call graph implementations (PDG, Dyck, AserPTA)
Data Flow Analysis (
lib/Dataflow/,include/Dataflow/)IFDS/IDE framework for interprocedural data flow problems:
Taint Analysis: Tracks tainted data from sources to sinks
Reaching Definitions: Identifies definition-use chains
Constraint Solving (
lib/Solvers/,include/Solvers/)Multiple solver backends for different analysis needs:
SMT Solver: Z3-based reasoning for precise constraint solving
BDD Solver: CUDD-based symbolic set operations
WPDS: Weighted pushdown systems for interprocedural reachability
Abstract Interpretation (
lib/Verification/clam/,lib/Verification/SymAbsAI/)Numerical and symbolic abstract domains:
CLAM: Abstract interpretation with multiple domains (intervals, zones, octagons, polyhedra)
SymAbsAI: Configurable abstract interpretation with domain composition
Analysis Utilities (
lib/Analysis/,include/Analysis/)Supporting analysis components:
CFG Analysis: Control flow graph utilities
Concurrency Analysis: Thread-aware analysis (MHP, lock sets)
Null Pointer Analysis: Multiple null checking algorithms
Range Analysis: Value range analysis for integers
Bug Detection (
lib/Checker/)Security and safety bug detection:
Kint: Integer overflow, division by zero, array bounds checking
Taint: Information flow and taint-style vulnerabilities
Concurrency: Race conditions and deadlock detection
CFL Reachability (
lib/CFL/,include/CFL/)Context-Free Language reachability engines:
CSR: graph indexing for CFL Reachability
Analysis Flow
Typical Analysis Pipeline
LLVM IR Input: Start with LLVM bitcode (
.bc) or IR (.ll)Preprocessing (Optional):
Lower unsupported constructs
Canonicalize representations
Devirtualization
Memory operation normalization
Core Analysis:
Build call graph
Perform alias analysis
Construct intermediate representations (PDG, VFG)
Run specialized analyses (taint, null pointer, etc.)
Bug Detection/Verification:
Apply checkers
Validate properties
Generate reports
Output:
Text reports
JSON/SARIF output
DOT graphs for visualization
Example: Null Pointer Detection
Input: program.bc
↓
[Preprocessing]
↓
[Alias Analysis] → Points-to sets
↓
[Build VFG] → Value flow graph
↓
[Null Flow Analysis] → Track null values
↓
[Check Dereferences] → Identify potential NPDs
↓
Output: Bug report
Pointer Analysis Architectures
AserPTA Architecture
AserPTA uses a constraint-based approach with multiple solver strategies:
LLVM IR
↓
[Preprocessing]
├─ Canonicalize GEP
├─ Lower memcpy
├─ Remove exception handlers
└─ Normalize heap APIs
↓
[Constraint Collection]
├─ addr_of: p = &obj
├─ copy: p = q
├─ load: p = *q
├─ store: *p = q
└─ offset: p = &obj->field
↓
[Constraint Graph]
├─ Nodes: Pointers and Objects
└─ Edges: Constraints
↓
[Solver] (WavePropagation / DeepPropagation)
├─ SCC detection
├─ Differential propagation
└─ Cycle elimination
↓
Points-to Sets
LotusAA Architecture
LotusAA provides flow-sensitive, field-sensitive analysis:
LLVM Module
↓
[Global Analysis]
↓
[Iterative Analysis Loop]
├─ [Bottom-Up Traversal]
│ ├─ Process each function
│ ├─ Build summaries
│ └─ Collect interfaces
│
├─ [Call Graph Construction]
│ ├─ Resolve indirect calls
│ └─ Update call graph
│
└─ [Fixpoint Check]
├─ Detect changes
└─ Reanalyze if needed
↓
Final Points-to Sets + Call Graph
DyckAA Architecture
DyckAA uses Dyck-CFL reachability for precise alias analysis:
LLVM IR
↓
[Build Dyck Graph]
├─ Field edges: *[field]*
├─ Dereference edges: *(*
└─ Assignment edges: *()*
↓
[Dyck-CFL Reachability]
├─ Compute reachability
└─ Unify nodes
↓
[Alias Sets]
↓
[Applications]
├─ Call graph construction
├─ ModRef analysis
└─ Value flow analysis
Abstract Interpretation Flow (CLAM)
LLVM Bitcode
↓
[clam-pp Preprocessing]
├─ Lower unsigned comparisons
├─ Lower select instructions
└─ Devirtualization
↓
[CrabIR Translation]
├─ Convert LLVM IR to CrabIR
├─ Apply memory abstraction
└─ Integrate heap analysis (Sea-DSA)
↓
[Abstract Interpretation]
├─ Select domain (int, zones, oct, pk)
├─ Compute fixpoint
└─ Generate invariants
↓
[Property Checking]
├─ Check assertions
├─ Check null pointers
├─ Check buffer bounds
└─ Check use-after-free
↓
[Output]
├─ Invariants (text/JSON)
├─ Verification results
└─ Warnings/errors
Data Structures
Constraint Graphs
Nodes: Represent pointers and objects
Edges: Represent constraints (copy, load, store, offset)
Super Nodes: Collapsed SCCs for efficiency
Value Flow Graphs
Nodes: Values and memory locations
Edges: Data flow and pointer propagation
Contexts: Call-site or object sensitivity
Program Dependence Graphs
Nodes: Instructions, functions, parameters
Edges: Data dependencies, control dependencies, parameter bindings
Trees: Hierarchical parameter representations for field sensitivity
Standalone Tools
Each major component has standalone command-line tools:
aser-aa,dyck-aa,lotus-aa: Alias analysislotus-kint,lotus-taint,lotus-concur,lotus-pulse: Bug detectionclam,clam-pp,clam-diff: Abstract interpretationpdg-query: PDG queries
Python Integration
Python scripts provide high-level interfaces:
clam.py: Full analysis pipeline with compilationclam-yaml.py: Configuration-based analysisseapy: SeaHorn integration
Performance Considerations
Scalability Strategies
Context Pruning: Limit context depth (K in K-CFA)
SCC Collapsing: Merge strongly connected components
Lazy Evaluation: Compute results on demand
Extension Points
Adding New Analyses
Create LLVM ModulePass or FunctionPass
Implement
runOnModule()orrunOnFunction()Query existing analyses (AA, PDG, etc.)
Register pass in
CMakeLists.txtAdd tool in
tools/directory
Adding New Checkers
Extend
BugDetectorPassbase classImplement checker logic
Report bugs via
BugReportMgrAdd to
lotus-kintor create new tool
Adding New Abstract Domains
Extend CRAB domain interface
Implement lattice operations
Register with CLAM
Add command-line option
See Developer Guide for detailed instructions.