Architecture Overview
=====================

This document provides a comprehensive overview of the Lotus architecture, describing how the different components interact and how they are organized.

System Architecture
-------------------

Lotus is organized into several major subsystems that work together to provide a comprehensive program analysis framework:

.. code-block:: text

   ┌──────────────────────────────────────────────────────────────┐
   │                      Command-Line Tools                       │
   │  (aser-aa, dyck-aa, lotus-aa, lotus-kint, clam, etc.)       │
   └────────────────┬─────────────────────────────────────────────┘
                    │
   ┌────────────────┴─────────────────────────────────────────────┐
   │                    Analysis Applications                      │
   │  - Bug Checkers (Kint, Taint, Concurrency, Pulse)           │
   │  - Abstract Interpreters (CLAM, SymAbsAI)                               │
   │  - Fuzzing Support (Titan)                                   │
   └────┬────────────┬────────────┬────────────┬─────────────────┘
        │            │            │            │
   ┌────┴────┐  ┌───┴────┐  ┌───┴────┐  ┌────┴─────┐
   │ Alias   │  │  IR    │  │ Data   │  │ Solvers  │
   │ Analysis│  │ (PDG,  │  │ Flow   │  │ (SMT,    │
   │         │  │  VFG)  │  │        │  │  BDD)    │
   └────┬────┘  └───┬────┘  └───┬────┘  └────┬─────┘
        │           │           │            │
   ┌────┴───────────┴───────────┴────────────┴─────────────────┐
   │                    LLVM Infrastructure                      │
   │            (Module, Function, BasicBlock, etc.)            │
   └────────────────────────────────────────────────────────────┘

Core Components
---------------

1. **Alias Analysis** (``lib/Alias/``, ``include/Alias/``)
   
   Multiple pointer analysis algorithms with different precision-performance trade-offs:
   
   - **DyckAA**: Unification-based, exhaustive alias analysis with Dyck-CFL reachability
   - **AserPTA**: Constraint-based pointer analysis with multiple context sensitivities
   - **LotusAA**: Flow-sensitive, field-sensitive interprocedural analysis
   - **Sea-DSA**: Context-sensitive DSA-based memory analysis
   - **SparrowAA**: Inclusion-based pointer analysis
   - **FPA**: Function pointer analysis (FLTA, MLTA, MLTADF, KELP)
   - **SRAA**: Strict Relation Alias Analysis using range analysis
   - **OriginAA**: Origin-sensitive analysis for thread creation tracking
   - **DynAA**: Dynamic validation of static alias analysis results

2. **Intermediate Representations** (``lib/IR/``, ``include/IR/``)
   
   Specialized graph representations for program analysis:
   
   - **PDG (Program Dependence Graph)**: Captures data and control dependencies
   - **DyckVFG (Value Flow Graph)**: Tracks value flow for Dyck-based analyses
   - **Call Graphs**: Multiple call graph implementations (PDG, Dyck, AserPTA)

3. **Data Flow Analysis** (``lib/Dataflow/``, ``include/Dataflow/``)
   
   IFDS/IDE framework for interprocedural data flow problems:
   
   - **Taint Analysis**: Tracks tainted data from sources to sinks
   - **Reaching Definitions**: Identifies definition-use chains
4. **Constraint Solving** (``lib/Solvers/``, ``include/Solvers/``)
   
   Multiple solver backends for different analysis needs:
   
   - **SMT Solver**: Z3-based reasoning for precise constraint solving
   - **BDD Solver**: CUDD-based symbolic set operations
   - **WPDS**: Weighted pushdown systems for interprocedural reachability

5. **Abstract Interpretation** (``lib/Verification/clam/``, ``lib/Verification/SymAbsAI/``)
   
   Numerical and symbolic abstract domains:
   
   - **CLAM**: Abstract interpretation with multiple domains (intervals, zones, octagons, polyhedra)
   - **SymAbsAI**: Configurable abstract interpretation with domain composition

6. **Analysis Utilities** (``lib/Analysis/``, ``include/Analysis/``)
   
   Supporting analysis components:
   
   - **CFG Analysis**: Control flow graph utilities
   - **Concurrency Analysis**: Thread-aware analysis (MHP, lock sets)
   - **Null Pointer Analysis**: Multiple null checking algorithms
   - **Range Analysis**: Value range analysis for integers

7. **Bug Detection** (``lib/Checker/``)
   
   Security and safety bug detection:
   
   - **Kint**: Integer overflow, division by zero, array bounds checking
   - **Taint**: Information flow and taint-style vulnerabilities
   - **Concurrency**: Race conditions and deadlock detection

8. **CFL Reachability** (``lib/CFL/``, ``include/CFL/``)
   
   Context-Free Language reachability engines:
   
   - **CSR**: graph indexing for CFL Reachability

Analysis Flow
-------------

Typical Analysis Pipeline
~~~~~~~~~~~~~~~~~~~~~~~~~~

1. **LLVM IR Input**: Start with LLVM bitcode (``.bc``) or IR (``.ll``)

2. **Preprocessing** (Optional):
   
   - Lower unsupported constructs
   - Canonicalize representations
   - Devirtualization
   - Memory operation normalization

3. **Core Analysis**:
   
   - Build call graph
   - Perform alias analysis
   - Construct intermediate representations (PDG, VFG)
   - Run specialized analyses (taint, null pointer, etc.)

4. **Bug Detection/Verification**:
   
   - Apply checkers
   - Validate properties
   - Generate reports

5. **Output**:
   
   - Text reports
   - JSON/SARIF output
   - DOT graphs for visualization

Example: Null Pointer Detection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: text

   Input: program.bc
      ↓
   [Preprocessing]
      ↓
   [Alias Analysis] → Points-to sets
      ↓
   [Build VFG] → Value flow graph
      ↓
   [Null Flow Analysis] → Track null values
      ↓
   [Check Dereferences] → Identify potential NPDs
      ↓
   Output: Bug report

Pointer Analysis Architectures
-------------------------------

AserPTA Architecture
~~~~~~~~~~~~~~~~~~~~

AserPTA uses a constraint-based approach with multiple solver strategies:

.. code-block:: text

   LLVM IR
      ↓
   [Preprocessing]
      ├─ Canonicalize GEP
      ├─ Lower memcpy
      ├─ Remove exception handlers
      └─ Normalize heap APIs
      ↓
   [Constraint Collection]
      ├─ addr_of: p = &obj
      ├─ copy: p = q
      ├─ load: p = *q
      ├─ store: *p = q
      └─ offset: p = &obj->field
      ↓
   [Constraint Graph]
      ├─ Nodes: Pointers and Objects
      └─ Edges: Constraints
      ↓
   [Solver] (WavePropagation / DeepPropagation)
      ├─ SCC detection
      ├─ Differential propagation
      └─ Cycle elimination
      ↓
   Points-to Sets

LotusAA Architecture
~~~~~~~~~~~~~~~~~~~~

LotusAA provides flow-sensitive, field-sensitive analysis:

.. code-block:: text

   LLVM Module
      ↓
   [Global Analysis]
      ↓
   [Iterative Analysis Loop]
      ├─ [Bottom-Up Traversal]
      │     ├─ Process each function
      │     ├─ Build summaries
      │     └─ Collect interfaces
      │
      ├─ [Call Graph Construction]
      │     ├─ Resolve indirect calls
      │     └─ Update call graph
      │
      └─ [Fixpoint Check]
            ├─ Detect changes
            └─ Reanalyze if needed
      ↓
   Final Points-to Sets + Call Graph

DyckAA Architecture
~~~~~~~~~~~~~~~~~~~

DyckAA uses Dyck-CFL reachability for precise alias analysis:

.. code-block:: text

   LLVM IR
      ↓
   [Build Dyck Graph]
      ├─ Field edges: *[field]*
      ├─ Dereference edges: *(*
      └─ Assignment edges: *()*
      ↓
   [Dyck-CFL Reachability]
      ├─ Compute reachability
      └─ Unify nodes
      ↓
   [Alias Sets]
      ↓
   [Applications]
      ├─ Call graph construction
      ├─ ModRef analysis
      └─ Value flow analysis

Abstract Interpretation Flow (CLAM)
------------------------------------

.. code-block:: text

   LLVM Bitcode
      ↓
   [clam-pp Preprocessing]
      ├─ Lower unsigned comparisons
      ├─ Lower select instructions
      └─ Devirtualization
      ↓
   [CrabIR Translation]
      ├─ Convert LLVM IR to CrabIR
      ├─ Apply memory abstraction
      └─ Integrate heap analysis (Sea-DSA)
      ↓
   [Abstract Interpretation]
      ├─ Select domain (int, zones, oct, pk)
      ├─ Compute fixpoint
      └─ Generate invariants
      ↓
   [Property Checking]
      ├─ Check assertions
      ├─ Check null pointers
      ├─ Check buffer bounds
      └─ Check use-after-free
      ↓
   [Output]
      ├─ Invariants (text/JSON)
      ├─ Verification results
      └─ Warnings/errors

Data Structures
---------------


Constraint Graphs
~~~~~~~~~~~~~~~~~

- **Nodes**: Represent pointers and objects
- **Edges**: Represent constraints (copy, load, store, offset)
- **Super Nodes**: Collapsed SCCs for efficiency

Value Flow Graphs
~~~~~~~~~~~~~~~~~

- **Nodes**: Values and memory locations
- **Edges**: Data flow and pointer propagation
- **Contexts**: Call-site or object sensitivity

Program Dependence Graphs
~~~~~~~~~~~~~~~~~~~~~~~~~~

- **Nodes**: Instructions, functions, parameters
- **Edges**: Data dependencies, control dependencies, parameter bindings
- **Trees**: Hierarchical parameter representations for field sensitivity


Standalone Tools
~~~~~~~~~~~~~~~~

Each major component has standalone command-line tools:

- ``aser-aa``, ``dyck-aa``, ``lotus-aa``: Alias analysis
- ``lotus-kint``, ``lotus-taint``, ``lotus-concur``, ``lotus-pulse``: Bug detection
- ``clam``, ``clam-pp``, ``clam-diff``: Abstract interpretation
- ``pdg-query``: PDG queries

Python Integration
~~~~~~~~~~~~~~~~~~

Python scripts provide high-level interfaces:

- ``clam.py``: Full analysis pipeline with compilation
- ``clam-yaml.py``: Configuration-based analysis
- ``seapy``: SeaHorn integration

Performance Considerations
--------------------------

Scalability Strategies
~~~~~~~~~~~~~~~~~~~~~~

1. **Context Pruning**: Limit context depth (K in K-CFA)
2. **SCC Collapsing**: Merge strongly connected components
3. **Lazy Evaluation**: Compute results on demand


Extension Points
----------------

Adding New Analyses
~~~~~~~~~~~~~~~~~~~

1. Create LLVM ModulePass or FunctionPass
2. Implement ``runOnModule()`` or ``runOnFunction()``
3. Query existing analyses (AA, PDG, etc.)
4. Register pass in ``CMakeLists.txt``
5. Add tool in ``tools/`` directory

Adding New Checkers
~~~~~~~~~~~~~~~~~~~

1. Extend ``BugDetectorPass`` base class
2. Implement checker logic
3. Report bugs via ``BugReportMgr``
4. Add to ``lotus-kint`` or create new tool

Adding New Abstract Domains
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Extend CRAB domain interface
2. Implement lattice operations
3. Register with CLAM
4. Add command-line option

See :doc:`../developer/developer_guide` for detailed instructions.