TPA: Flow- and Contex-Sensitive Pointer Analysis
Overview
TPA is an inclusion-based, flow- and context-sensitive pointer analysis framework with k-limiting support. It uses a semi-sparse program representation to achieve both precision and scalability for large C/C++ programs.
Location:
lib/Alias/TPA
Architecture
The analysis pipeline is organized as:
LLVM IR
↓
[IR Transforms]
├─ Expand GEP instructions
├─ Expand constant expressions
├─ Expand indirect branches
├─ Fold int-to-ptr casts
└─ Global cleanup
↓
[Front-End Processing]
├─ Type collection and analysis
├─ Array/pointer layout analysis
├─ Struct cast analysis
├─ CFG building and simplification
├─ Function translation
└─ Semi-sparse program construction
↓
[Global Pointer Analysis]
├─ Initialize global variables
├─ Initialize functions
└─ Set up special pointers (null, universal)
↓
[Semi-Sparse Analysis]
├─ Worklist-based propagation
├─ Transfer function evaluation
├─ Context management
└─ Store pruning
↓
Points-to Sets (Env and Store)
Core Components
Front-End (
FrontEnd/) - TypeCollector: Collects and analyzes types from the LLVM module - TypeAnalysis: Performs type-based analysis - ArrayLayoutAnalysis: Analyzes array memory layouts - PointerLayoutAnalysis: Analyzes pointer memory layouts - StructCastAnalysis: Handles structure casting - CFGBuilder: Builds control-flow graphs from LLVM functions - CFGSimplifier: Simplifies CFGs for efficiency - FunctionTranslator: Translates LLVM functions to CFG representation - InstructionTranslator: Translates LLVM instructions to CFG nodes - SemiSparseProgramBuilder: Constructs the semi-sparse program representation - PriorityAssigner: Assigns priorities to CFG nodesProgram Representation (
Program/) - SemiSparseProgram: Main program representation containing CFGs - CFG: Control-flow graph with nodes for different instruction types - CFGNode: Base class for CFG nodes (Entry, Alloc, Copy, Offset, Load, Store, Call, Return)Memory Model (
MemoryModel/) - MemoryManager: Manages memory objects and allocations - PointerManager: Manages pointer values and their contexts - TypeLayout: Represents type layouts for memory modeling - ArrayLayout: Handles array memory layouts - PointerLayout: Handles pointer memory layoutsAnalysis Engine (
Engine/) - SemiSparsePointerAnalysis: Main analysis class - GlobalPointerAnalysis: Initializes global state - TransferFunction: Evaluates transfer functions for CFG nodes - SemiSparsePropagator: Propagates facts through the program - Initializer: Sets up initial analysis state - StorePruner: Prunes unnecessary store information - ExternalCallAnalysis: Handles external function calls - Transfer/: Specialized transfer functions:AllocTransfer: Memory allocation
CopyTransfer: Pointer assignments
OffsetTransfer: Field/array offset operations
LoadTransfer: Pointer dereferences (loads)
StoreTransfer: Pointer stores
CallTransfer: Function calls
ReturnTransfer: Function returns
EntryTransfer: Function entry points
Context Management (
Context/) - Context: Base context representation (call stack) - KLimitContext: K-limiting context sensitivity (limits call stack depth) - AdaptiveContext: Adaptive context sensitivity (selective tracking)Precision Tracking (
Precision/) - PrecisionLossTracker: Tracks precision loss during analysis - ValueDependenceTracker: Tracks value dependenciesOutput (
Output/) - CFGNodePrinter: Prints CFG nodes - ContextPrinter: Prints contexts - MemoryPrinter: Prints memory objects - NodePrinter: General node printing - WriteDotFile: Writes DOT graph files
Transfer Functions
TPA uses transfer functions to model the effect of each CFG node on the points-to information:
Entry: Initialize function parameters
Alloc: Create new memory objects (heap/stack allocations)
Copy: Pointer assignment (
p = q)Offset: Field/array access (
p = &obj->fieldorp = &arr[i])Load: Pointer dereference (
p = *q)Store: Store through pointer (
*p = q)Call: Function call handling with context sensitivity
Return: Return value propagation
Context Sensitivity
TPA supports multiple context sensitivity models:
Basic Context (
Context) - Full call stack tracking - Unbounded context depth - Most precise but potentially expensiveK-Limited Context (
KLimitContext) - Limits call stack depth to K - Configurable limit (default can be set) - Balances precision and scalabilityAdaptive Context (
AdaptiveContext) - Selectively tracks important call sites - Reduces context explosion for less critical paths - Configurable tracking policy
Memory Model
TPA uses a field-sensitive memory model:
Type-based Layout: Memory objects are organized by type
Array Layout: Arrays are modeled with element-level precision
Pointer Layout: Pointer fields are tracked separately
Struct Layout: Structure fields are modeled individually
The memory model distinguishes between: - Top-level pointers: Variables and function parameters - Memory objects: Heap allocations, stack allocations, globals - Store: Maps memory objects to their points-to sets
Semi-Sparse Representation
The semi-sparse representation reduces the number of program points that need to be analyzed:
Only def-use chains are tracked (not all program points)
CFG nodes are classified by their effect on pointer information
The worklist algorithm processes only relevant nodes
Store pruning removes redundant information
Analysis Algorithm
The analysis follows this workflow:
Initialization: - Build semi-sparse program from LLVM IR - Initialize global variables and functions - Set up special pointers (null, universal)
Worklist Processing: - Start from entry function with initial store - For each program point in worklist:
Evaluate transfer function
Update points-to information
Propagate to successors
Add changed successors to worklist
Context Handling: - At call sites: push new context - At returns: pop context and merge results - K-limiting or adaptive strategies control context growth
Convergence: - Analysis terminates when worklist is empty - Memoization prevents redundant computations
Usage
TPA is typically used through the Lotus alias analysis wrapper. It can be configured via:
Context sensitivity mode: Basic, K-limited, or adaptive
K-limit value: For K-limited context sensitivity
External pointer table: For modeling external library functions
Precision tracking: Enable precision loss tracking
Integration
TPA integrates with other Lotus components:
Annotation System: Uses external pointer annotations for library functions
Alias Analysis Wrapper: Provides unified interface for alias queries
Type System: Leverages LLVM type information for memory modeling
The analysis results (points-to sets) can be queried through the
PointerAnalysis base class interface, which provides methods to:
- Get points-to sets for pointers
- Resolve indirect call targets
- Query alias relationships