Architecture Overview ===================== This document provides a comprehensive overview of the Lotus architecture, describing how the different components interact and how they are organized. System Architecture ------------------- Lotus is organized into several major subsystems that work together to provide a comprehensive program analysis framework: .. code-block:: text ┌──────────────────────────────────────────────────────────────┐ │ Command-Line Tools │ │ (aser-aa, dyck-aa, lotus-aa, lotus-kint, clam, etc.) │ └────────────────┬─────────────────────────────────────────────┘ │ ┌────────────────┴─────────────────────────────────────────────┐ │ Analysis Applications │ │ - Bug Checkers (Kint, Taint, Concurrency, Pulse) │ │ - Abstract Interpreters (CLAM, SymAbsAI) │ │ - Fuzzing Support (Titan) │ └────┬────────────┬────────────┬────────────┬─────────────────┘ │ │ │ │ ┌────┴────┐ ┌───┴────┐ ┌───┴────┐ ┌────┴─────┐ │ Alias │ │ IR │ │ Data │ │ Solvers │ │ Analysis│ │ (PDG, │ │ Flow │ │ (SMT, │ │ │ │ VFG) │ │ │ │ BDD) │ └────┬────┘ └───┬────┘ └───┬────┘ └────┬─────┘ │ │ │ │ ┌────┴───────────┴───────────┴────────────┴─────────────────┐ │ LLVM Infrastructure │ │ (Module, Function, BasicBlock, etc.) │ └────────────────────────────────────────────────────────────┘ Core Components --------------- 1. **Alias Analysis** (``lib/Alias/``, ``include/Alias/``) Multiple pointer analysis algorithms with different precision-performance trade-offs: - **DyckAA**: Unification-based, exhaustive alias analysis with Dyck-CFL reachability - **AserPTA**: Constraint-based pointer analysis with multiple context sensitivities - **LotusAA**: Flow-sensitive, field-sensitive interprocedural analysis - **Sea-DSA**: Context-sensitive DSA-based memory analysis - **SparrowAA**: Inclusion-based pointer analysis - **FPA**: Function pointer analysis (FLTA, MLTA, MLTADF, KELP) - **SRAA**: Strict Relation Alias Analysis using range analysis - **OriginAA**: Origin-sensitive analysis for thread creation tracking - **DynAA**: Dynamic validation of static alias analysis results 2. **Intermediate Representations** (``lib/IR/``, ``include/IR/``) Specialized graph representations for program analysis: - **PDG (Program Dependence Graph)**: Captures data and control dependencies - **DyckVFG (Value Flow Graph)**: Tracks value flow for Dyck-based analyses - **Call Graphs**: Multiple call graph implementations (PDG, Dyck, AserPTA) 3. **Data Flow Analysis** (``lib/Dataflow/``, ``include/Dataflow/``) IFDS/IDE framework for interprocedural data flow problems: - **Taint Analysis**: Tracks tainted data from sources to sinks - **Reaching Definitions**: Identifies definition-use chains 4. **Constraint Solving** (``lib/Solvers/``, ``include/Solvers/``) Multiple solver backends for different analysis needs: - **SMT Solver**: Z3-based reasoning for precise constraint solving - **BDD Solver**: CUDD-based symbolic set operations - **WPDS**: Weighted pushdown systems for interprocedural reachability 5. **Abstract Interpretation** (``lib/Verification/clam/``, ``lib/Verification/SymAbsAI/``) Numerical and symbolic abstract domains: - **CLAM**: Abstract interpretation with multiple domains (intervals, zones, octagons, polyhedra) - **SymAbsAI**: Configurable abstract interpretation with domain composition 6. **Analysis Utilities** (``lib/Analysis/``, ``include/Analysis/``) Supporting analysis components: - **CFG Analysis**: Control flow graph utilities - **Concurrency Analysis**: Thread-aware analysis (MHP, lock sets) - **Null Pointer Analysis**: Multiple null checking algorithms - **Range Analysis**: Value range analysis for integers 7. **Bug Detection** (``lib/Checker/``) Security and safety bug detection: - **Kint**: Integer overflow, division by zero, array bounds checking - **Taint**: Information flow and taint-style vulnerabilities - **Concurrency**: Race conditions and deadlock detection 8. **CFL Reachability** (``lib/CFL/``, ``include/CFL/``) Context-Free Language reachability engines: - **CSR**: graph indexing for CFL Reachability Analysis Flow ------------- Typical Analysis Pipeline ~~~~~~~~~~~~~~~~~~~~~~~~~~ 1. **LLVM IR Input**: Start with LLVM bitcode (``.bc``) or IR (``.ll``) 2. **Preprocessing** (Optional): - Lower unsupported constructs - Canonicalize representations - Devirtualization - Memory operation normalization 3. **Core Analysis**: - Build call graph - Perform alias analysis - Construct intermediate representations (PDG, VFG) - Run specialized analyses (taint, null pointer, etc.) 4. **Bug Detection/Verification**: - Apply checkers - Validate properties - Generate reports 5. **Output**: - Text reports - JSON/SARIF output - DOT graphs for visualization Example: Null Pointer Detection ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: text Input: program.bc ↓ [Preprocessing] ↓ [Alias Analysis] → Points-to sets ↓ [Build VFG] → Value flow graph ↓ [Null Flow Analysis] → Track null values ↓ [Check Dereferences] → Identify potential NPDs ↓ Output: Bug report Pointer Analysis Architectures ------------------------------- AserPTA Architecture ~~~~~~~~~~~~~~~~~~~~ AserPTA uses a constraint-based approach with multiple solver strategies: .. code-block:: text LLVM IR ↓ [Preprocessing] ├─ Canonicalize GEP ├─ Lower memcpy ├─ Remove exception handlers └─ Normalize heap APIs ↓ [Constraint Collection] ├─ addr_of: p = &obj ├─ copy: p = q ├─ load: p = *q ├─ store: *p = q └─ offset: p = &obj->field ↓ [Constraint Graph] ├─ Nodes: Pointers and Objects └─ Edges: Constraints ↓ [Solver] (WavePropagation / DeepPropagation) ├─ SCC detection ├─ Differential propagation └─ Cycle elimination ↓ Points-to Sets LotusAA Architecture ~~~~~~~~~~~~~~~~~~~~ LotusAA provides flow-sensitive, field-sensitive analysis: .. code-block:: text LLVM Module ↓ [Global Analysis] ↓ [Iterative Analysis Loop] ├─ [Bottom-Up Traversal] │ ├─ Process each function │ ├─ Build summaries │ └─ Collect interfaces │ ├─ [Call Graph Construction] │ ├─ Resolve indirect calls │ └─ Update call graph │ └─ [Fixpoint Check] ├─ Detect changes └─ Reanalyze if needed ↓ Final Points-to Sets + Call Graph DyckAA Architecture ~~~~~~~~~~~~~~~~~~~ DyckAA uses Dyck-CFL reachability for precise alias analysis: .. code-block:: text LLVM IR ↓ [Build Dyck Graph] ├─ Field edges: *[field]* ├─ Dereference edges: *(* └─ Assignment edges: *()* ↓ [Dyck-CFL Reachability] ├─ Compute reachability └─ Unify nodes ↓ [Alias Sets] ↓ [Applications] ├─ Call graph construction ├─ ModRef analysis └─ Value flow analysis Abstract Interpretation Flow (CLAM) ------------------------------------ .. code-block:: text LLVM Bitcode ↓ [clam-pp Preprocessing] ├─ Lower unsigned comparisons ├─ Lower select instructions └─ Devirtualization ↓ [CrabIR Translation] ├─ Convert LLVM IR to CrabIR ├─ Apply memory abstraction └─ Integrate heap analysis (Sea-DSA) ↓ [Abstract Interpretation] ├─ Select domain (int, zones, oct, pk) ├─ Compute fixpoint └─ Generate invariants ↓ [Property Checking] ├─ Check assertions ├─ Check null pointers ├─ Check buffer bounds └─ Check use-after-free ↓ [Output] ├─ Invariants (text/JSON) ├─ Verification results └─ Warnings/errors Data Structures --------------- Constraint Graphs ~~~~~~~~~~~~~~~~~ - **Nodes**: Represent pointers and objects - **Edges**: Represent constraints (copy, load, store, offset) - **Super Nodes**: Collapsed SCCs for efficiency Value Flow Graphs ~~~~~~~~~~~~~~~~~ - **Nodes**: Values and memory locations - **Edges**: Data flow and pointer propagation - **Contexts**: Call-site or object sensitivity Program Dependence Graphs ~~~~~~~~~~~~~~~~~~~~~~~~~~ - **Nodes**: Instructions, functions, parameters - **Edges**: Data dependencies, control dependencies, parameter bindings - **Trees**: Hierarchical parameter representations for field sensitivity Standalone Tools ~~~~~~~~~~~~~~~~ Each major component has standalone command-line tools: - ``aser-aa``, ``dyck-aa``, ``lotus-aa``: Alias analysis - ``lotus-kint``, ``lotus-taint``, ``lotus-concur``, ``lotus-pulse``: Bug detection - ``clam``, ``clam-pp``, ``clam-diff``: Abstract interpretation - ``pdg-query``: PDG queries Python Integration ~~~~~~~~~~~~~~~~~~ Python scripts provide high-level interfaces: - ``clam.py``: Full analysis pipeline with compilation - ``clam-yaml.py``: Configuration-based analysis - ``seapy``: SeaHorn integration Performance Considerations -------------------------- Scalability Strategies ~~~~~~~~~~~~~~~~~~~~~~ 1. **Context Pruning**: Limit context depth (K in K-CFA) 2. **SCC Collapsing**: Merge strongly connected components 3. **Lazy Evaluation**: Compute results on demand Extension Points ---------------- Adding New Analyses ~~~~~~~~~~~~~~~~~~~ 1. Create LLVM ModulePass or FunctionPass 2. Implement ``runOnModule()`` or ``runOnFunction()`` 3. Query existing analyses (AA, PDG, etc.) 4. Register pass in ``CMakeLists.txt`` 5. Add tool in ``tools/`` directory Adding New Checkers ~~~~~~~~~~~~~~~~~~~ 1. Extend ``BugDetectorPass`` base class 2. Implement checker logic 3. Report bugs via ``BugReportMgr`` 4. Add to ``lotus-kint`` or create new tool Adding New Abstract Domains ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1. Extend CRAB domain interface 2. Implement lattice operations 3. Register with CLAM 4. Add command-line option See :doc:`../developer/developer_guide` for detailed instructions.