=========================== TPA: Flow- and Contex-Sensitive Pointer Analysis =========================== Overview ======== TPA is an **inclusion-based**, **flow- and context-sensitive** pointer analysis framework with k-limiting support. It uses a **semi-sparse** program representation to achieve both precision and scalability for large C/C++ programs. * **Location**: ``lib/Alias/TPA`` Architecture ============ The analysis pipeline is organized as: .. code-block:: text LLVM IR ↓ [IR Transforms] ├─ Expand GEP instructions ├─ Expand constant expressions ├─ Expand indirect branches ├─ Fold int-to-ptr casts └─ Global cleanup ↓ [Front-End Processing] ├─ Type collection and analysis ├─ Array/pointer layout analysis ├─ Struct cast analysis ├─ CFG building and simplification ├─ Function translation └─ Semi-sparse program construction ↓ [Global Pointer Analysis] ├─ Initialize global variables ├─ Initialize functions └─ Set up special pointers (null, universal) ↓ [Semi-Sparse Analysis] ├─ Worklist-based propagation ├─ Transfer function evaluation ├─ Context management └─ Store pruning ↓ Points-to Sets (Env and Store) Core Components =============== 1. **Front-End** (``FrontEnd/``) - **TypeCollector**: Collects and analyzes types from the LLVM module - **TypeAnalysis**: Performs type-based analysis - **ArrayLayoutAnalysis**: Analyzes array memory layouts - **PointerLayoutAnalysis**: Analyzes pointer memory layouts - **StructCastAnalysis**: Handles structure casting - **CFGBuilder**: Builds control-flow graphs from LLVM functions - **CFGSimplifier**: Simplifies CFGs for efficiency - **FunctionTranslator**: Translates LLVM functions to CFG representation - **InstructionTranslator**: Translates LLVM instructions to CFG nodes - **SemiSparseProgramBuilder**: Constructs the semi-sparse program representation - **PriorityAssigner**: Assigns priorities to CFG nodes 2. **Program Representation** (``Program/``) - **SemiSparseProgram**: Main program representation containing CFGs - **CFG**: Control-flow graph with nodes for different instruction types - **CFGNode**: Base class for CFG nodes (Entry, Alloc, Copy, Offset, Load, Store, Call, Return) 3. **Memory Model** (``MemoryModel/``) - **MemoryManager**: Manages memory objects and allocations - **PointerManager**: Manages pointer values and their contexts - **TypeLayout**: Represents type layouts for memory modeling - **ArrayLayout**: Handles array memory layouts - **PointerLayout**: Handles pointer memory layouts 4. **Analysis Engine** (``Engine/``) - **SemiSparsePointerAnalysis**: Main analysis class - **GlobalPointerAnalysis**: Initializes global state - **TransferFunction**: Evaluates transfer functions for CFG nodes - **SemiSparsePropagator**: Propagates facts through the program - **Initializer**: Sets up initial analysis state - **StorePruner**: Prunes unnecessary store information - **ExternalCallAnalysis**: Handles external function calls - **Transfer/**: Specialized transfer functions: - **AllocTransfer**: Memory allocation - **CopyTransfer**: Pointer assignments - **OffsetTransfer**: Field/array offset operations - **LoadTransfer**: Pointer dereferences (loads) - **StoreTransfer**: Pointer stores - **CallTransfer**: Function calls - **ReturnTransfer**: Function returns - **EntryTransfer**: Function entry points 5. **Context Management** (``Context/``) - **Context**: Base context representation (call stack) - **KLimitContext**: K-limiting context sensitivity (limits call stack depth) - **AdaptiveContext**: Adaptive context sensitivity (selective tracking) 6. **Precision Tracking** (``Precision/``) - **PrecisionLossTracker**: Tracks precision loss during analysis - **ValueDependenceTracker**: Tracks value dependencies 7. **Output** (``Output/``) - **CFGNodePrinter**: Prints CFG nodes - **ContextPrinter**: Prints contexts - **MemoryPrinter**: Prints memory objects - **NodePrinter**: General node printing - **WriteDotFile**: Writes DOT graph files Transfer Functions ================== TPA uses transfer functions to model the effect of each CFG node on the points-to information: 1. **Entry**: Initialize function parameters 2. **Alloc**: Create new memory objects (heap/stack allocations) 3. **Copy**: Pointer assignment (``p = q``) 4. **Offset**: Field/array access (``p = &obj->field`` or ``p = &arr[i]``) 5. **Load**: Pointer dereference (``p = *q``) 6. **Store**: Store through pointer (``*p = q``) 7. **Call**: Function call handling with context sensitivity 8. **Return**: Return value propagation Context Sensitivity =================== TPA supports multiple context sensitivity models: 1. **Basic Context** (``Context``) - Full call stack tracking - Unbounded context depth - Most precise but potentially expensive 2. **K-Limited Context** (``KLimitContext``) - Limits call stack depth to K - Configurable limit (default can be set) - Balances precision and scalability 3. **Adaptive Context** (``AdaptiveContext``) - Selectively tracks important call sites - Reduces context explosion for less critical paths - Configurable tracking policy Memory Model ============ TPA uses a field-sensitive memory model: - **Type-based Layout**: Memory objects are organized by type - **Array Layout**: Arrays are modeled with element-level precision - **Pointer Layout**: Pointer fields are tracked separately - **Struct Layout**: Structure fields are modeled individually The memory model distinguishes between: - **Top-level pointers**: Variables and function parameters - **Memory objects**: Heap allocations, stack allocations, globals - **Store**: Maps memory objects to their points-to sets Semi-Sparse Representation =========================== The semi-sparse representation reduces the number of program points that need to be analyzed: - Only **def-use** chains are tracked (not all program points) - CFG nodes are classified by their effect on pointer information - The worklist algorithm processes only relevant nodes - Store pruning removes redundant information Analysis Algorithm ================== The analysis follows this workflow: 1. **Initialization**: - Build semi-sparse program from LLVM IR - Initialize global variables and functions - Set up special pointers (null, universal) 2. **Worklist Processing**: - Start from entry function with initial store - For each program point in worklist: - Evaluate transfer function - Update points-to information - Propagate to successors - Add changed successors to worklist 3. **Context Handling**: - At call sites: push new context - At returns: pop context and merge results - K-limiting or adaptive strategies control context growth 4. **Convergence**: - Analysis terminates when worklist is empty - Memoization prevents redundant computations Usage ===== TPA is typically used through the Lotus alias analysis wrapper. It can be configured via: * **Context sensitivity mode**: Basic, K-limited, or adaptive * **K-limit value**: For K-limited context sensitivity * **External pointer table**: For modeling external library functions * **Precision tracking**: Enable precision loss tracking Integration =========== TPA integrates with other Lotus components: * **Annotation System**: Uses external pointer annotations for library functions * **Alias Analysis Wrapper**: Provides unified interface for alias queries * **Type System**: Leverages LLVM type information for memory modeling The analysis results (points-to sets) can be queried through the ``PointerAnalysis`` base class interface, which provides methods to: - Get points-to sets for pointers - Resolve indirect call targets - Query alias relationships