Architecture Overview

This document provides a comprehensive overview of the Lotus architecture, describing how the different components interact and how they are organized.

System Architecture

Lotus is organized into several major subsystems that work together to provide a comprehensive program analysis framework:

┌──────────────────────────────────────────────────────────────┐
│                      Command-Line Tools                       │
│  (lotus-alias-aser-aa, lotus-alias-dyck-aa, lotus-check, clam, etc.) │
└────────────────┬─────────────────────────────────────────────┘
                 │
┌────────────────┴─────────────────────────────────────────────┐
│                    Analysis Applications                      │
│  - Bug Checkers (Kint, Taint, Concurrency, Pulse, SymEx)    │
│  - Abstract Interpreters (CLAM, SymAbsAI)                               │
│  - Fuzzing Support (Titan)                                   │
└────┬────────────┬────────────┬────────────┬─────────────────┘
     │            │            │            │
┌────┴────┐  ┌───┴────┐  ┌───┴────┐  ┌────┴─────┐
│ Alias   │  │  IR    │  │ Data   │  │ Solvers  │
│ Analysis│  │ (PDG,  │  │ Flow   │  │ (SMT,    │
│         │  │  VFG)  │  │        │  │  BDD)    │
└────┬────┘  └───┬────┘  └───┬────┘  └────┬─────┘
     │           │           │            │
┌────┴───────────┴───────────┴────────────┴─────────────────┐
│                    LLVM Infrastructure                      │
│            (Module, Function, BasicBlock, etc.)            │
└────────────────────────────────────────────────────────────┘

Core Components

Alias Analysis (lib/Alias/, include/Alias/)

Multiple pointer analysis algorithms with different precision-performance trade-offs:
- DyckAA: Unification-based, exhaustive alias analysis with Dyck-CFL reachability
- AserPTA: Constraint-based pointer analysis with multiple context sensitivities (including k-callsite and k-origin/thread-creation sensitivity)
- LotusAA: Flow-sensitive, field-sensitive interprocedural analysis
- Sea-DSA: Context-sensitive DSA-based memory analysis
- SparrowAA: Inclusion-based pointer analysis
- FPA: Function pointer analysis (FLTA, MLTA, MLTADF, KELP)
- SRAA: Strict Relation Alias Analysis using range analysis
- DynAA: Dynamic validation of static alias analysis results
Intermediate Representations (lib/IR/, include/IR/)

Specialized graph representations for program analysis:
- PDG (Program Dependence Graph): Captures data and control dependencies
- DyckVFG (Value Flow Graph): Tracks value flow for Dyck-based analyses
- Call Graphs: Multiple call graph implementations (PDG, Dyck, AserPTA)
Data Flow Analysis (lib/Dataflow/, include/Dataflow/)

IFDS/IDE framework for interprocedural data flow problems:
- Taint Analysis: Tracks tainted data from sources to sinks
- Reaching Definitions: Identifies definition-use chains
Constraint Solving (lib/Solvers/, include/Solvers/)

Multiple solver backends for different analysis needs:
- SMT Solver: Z3-based reasoning for precise constraint solving
- BDD Solver: CUDD-based symbolic set operations
- WPDS: Weighted pushdown systems for interprocedural reachability
Abstract Interpretation (lib/Verification/clam/, lib/Verification/SymAbsAI/)

Numerical and symbolic abstract domains:
- CLAM: Abstract interpretation with multiple domains (intervals, zones, octagons, polyhedra)
- SymAbsAI: Configurable abstract interpretation with domain composition
Symbolic Execution (lib/SymbolicExecution/, include/SymbolicExecution/)

Path-sensitive symbolic execution infrastructure used by the symex checker frontend. See Symbolic Execution for the engine documentation.
- AnalysisDriver: Coordinates whole-module symbolic execution and summaries
- AnalysisState: Tracks symbolic memory, guarded facts, and path conditions
- PathCondSolver: Discharges symbolic path feasibility queries through SMT
- SymbolicExecutionWrapper: Connects the engine to the LLVM pass pipeline
Analysis Utilities (lib/Analysis/, include/Analysis/)

Supporting analysis components:
- CFG Analysis: Control flow graph utilities
- Concurrency Analysis: Thread-aware analysis (MHP, lock sets)
- Null Pointer Analysis: Multiple null checking algorithms
- Range Analysis: Value range analysis for integers
Bug Detection (lib/Checker/)

Security and safety bug detection:
- Kint: Integer overflow, division by zero, array bounds checking
- Taint: Information flow and taint-style vulnerabilities
- Concurrency: Race conditions and deadlock detection
- SymEx: Symbolic-execution checks through the top-level SymbolicExecution engine
CFL Reachability (lib/CFL/, include/CFL/)

Context-Free Language reachability engines:
- CSR: graph indexing for CFL Reachability

Analysis Flow

Typical Analysis Pipeline

LLVM IR Input: Start with LLVM bitcode (.bc) or IR (.ll)
Preprocessing (Optional):
- Lower unsupported constructs
- Canonicalize representations
- Devirtualization
- Memory operation normalization
Core Analysis:
- Build call graph
- Perform alias analysis
- Construct intermediate representations (PDG, VFG)
- Run specialized analyses (taint, null pointer, etc.)
Bug Detection/Verification:
- Apply checkers
- Validate properties
- Generate reports
Output:
- Text reports
- JSON/SARIF output
- DOT graphs for visualization

Example: Null Pointer Detection

Input: program.bc
   ↓
[Preprocessing]
   ↓
[Alias Analysis] → Points-to sets
   ↓
[Build VFG] → Value flow graph
   ↓
[Null Flow Analysis] → Track null values
   ↓
[Check Dereferences] → Identify potential NPDs
   ↓
Output: Bug report

Pointer Analysis Architectures

AserPTA Architecture

AserPTA uses a constraint-based approach with multiple solver strategies:

LLVM IR
   ↓
[Preprocessing]
   ├─ Canonicalize GEP
   ├─ Lower memcpy
   ├─ Remove exception handlers
   └─ Normalize heap APIs
   ↓
[Constraint Collection]
   ├─ addr_of: p = &obj
   ├─ copy: p = q
   ├─ load: p = *q
   ├─ store: *p = q
   └─ offset: p = &obj->field
   ↓
[Constraint Graph]
   ├─ Nodes: Pointers and Objects
   └─ Edges: Constraints
   ↓
[Solver] (WavePropagation / DeepPropagation)
   ├─ SCC detection
   ├─ Differential propagation
   └─ Cycle elimination
   ↓
Points-to Sets

LotusAA Architecture

LotusAA provides flow-sensitive, field-sensitive analysis:

LLVM Module
   ↓
[Global Analysis]
   ↓
[Iterative Analysis Loop]
   ├─ [Bottom-Up Traversal]
   │     ├─ Process each function
   │     ├─ Build summaries
   │     └─ Collect interfaces
   │
   ├─ [Call Graph Construction]
   │     ├─ Resolve indirect calls
   │     └─ Update call graph
   │
   └─ [Fixpoint Check]
         ├─ Detect changes
         └─ Reanalyze if needed
   ↓
Final Points-to Sets + Call Graph

DyckAA Architecture

DyckAA uses Dyck-CFL reachability for precise alias analysis:

LLVM IR
   ↓
[Build Dyck Graph]
   ├─ Field edges: *[field]*
   ├─ Dereference edges: *(*
   └─ Assignment edges: *()*
   ↓
[Dyck-CFL Reachability]
   ├─ Compute reachability
   └─ Unify nodes
   ↓
[Alias Sets]
   ↓
[Applications]
   ├─ Call graph construction
   ├─ ModRef analysis
   └─ Value flow analysis

Abstract Interpretation Flow (CLAM)

LLVM Bitcode
   ↓
[clam-pp Preprocessing]
   ├─ Lower unsigned comparisons
   ├─ Lower select instructions
   └─ Devirtualization
   ↓
[CrabIR Translation]
   ├─ Convert LLVM IR to CrabIR
   ├─ Apply memory abstraction
   └─ Integrate heap analysis (Sea-DSA)
   ↓
[Abstract Interpretation]
   ├─ Select domain (int, zones, oct, pk)
   ├─ Compute fixpoint
   └─ Generate invariants
   ↓
[Property Checking]
   ├─ Check assertions
   ├─ Check null pointers
   ├─ Check buffer bounds
   └─ Check use-after-free
   ↓
[Output]
   ├─ Invariants (text/JSON)
   ├─ Verification results
   └─ Warnings/errors

Data Structures

Constraint Graphs

Nodes: Represent pointers and objects
Edges: Represent constraints (copy, load, store, offset)
Super Nodes: Collapsed SCCs for efficiency

Value Flow Graphs

Nodes: Values and memory locations
Edges: Data flow and pointer propagation
Contexts: Call-site or object sensitivity

Program Dependence Graphs

Nodes: Instructions, functions, parameters
Edges: Data dependencies, control dependencies, parameter bindings
Trees: Hierarchical parameter representations for field sensitivity

Standalone Tools

Each major component has standalone command-line tools. The bug-detection frontends have been unified under a single lotus-check binary with subcommands:

Alias analysis: - lotus-alias-aser-aa, lotus-alias-dyck-aa, lotus-alias-lotus-aa

Unified checker frontend (``lotus-check``): - lotus-check kint — Integer overflow, division by zero, array bounds - lotus-check taint — Interprocedural taint analysis - lotus-check concur — Concurrency bug detection (races, deadlocks, OpenMP, MPI) - lotus-check pulse — Biabductive memory-safety analysis - lotus-check fitx — Typestate-based bug detection - lotus-check saber — Source-sink bug detection (leaks, double-free) - lotus-check ae — Abstract-execution-based checking - lotus-check symex — Symbolic-execution bug checking

PDG query: - lotus-ir-pdg-query — PDG Cypher queries, slicing, chopping, resource-flow analysis

Abstract interpretation: - clam, clam-pp, clam-diff

Python Integration

Python scripts provide high-level interfaces:

clam.py: Full analysis pipeline with compilation
clam-yaml.py: Configuration-based analysis
seapy: SeaHorn integration

Performance Considerations

Scalability Strategies

Context Pruning: Limit context depth (K in K-CFA)
SCC Collapsing: Merge strongly connected components
Lazy Evaluation: Compute results on demand

Extension Points

Adding New Analyses

Create LLVM ModulePass or FunctionPass
Implement runOnModule() or runOnFunction()
Query existing analyses (AA, PDG, etc.)
Register pass in CMakeLists.txt
Add tool in tools/ directory