SVFG — Sparse Value-Flow Graph

Overview

Sparse Value-Flow Graph (SVFG) is a sparse representation of value flows across a whole program, enabling efficient interprocedural analysis. The SVFG combines Static Single Assignment (SSA) form with Memory SSA to track both top-level pointers (address-taken variables) and their value flows through memory operations.

The SVFG provides a unified graph structure where:

  • Top-level pointers (SSA variables) flow through copy, GEP, and arithmetic operations.

  • Address-taken variables (heap, stack, globals) flow through load/store via Memory SSA.

  • Interprocedural value flow is captured through parameter passing and return values.

Key Features

  • Sparse Value-Flow Representation: Only defines and uses are connected, avoiding exhaustive memory modeling.

  • Memory SSA Integration: Tracks value flows through memory via Mu (use) and Chi (def) nodes.

  • Points-To Set Association: Indirect edges carry points-to sets for precise alias analysis.

  • Interprocedural Flow: Complete call/return edges connecting actual and formal parameters.

  • On-the-Fly Call Graph Refinement: Supports demand-driven indirect call resolution.

  • AserPTA Integration: Uses AserPTA for field-sensitive pointer analysis.

Node Types

Statement Nodes (Top-Level Pointers)

These nodes represent operations on top-level pointers (SSA variables):

  • AddrSVFGNode: Address-taking instructions (alloca, malloc, address-of). Creates an abstract object and defines a pointer to it.

  • CopySVFGNode: Copy operations (bitcast, inttoptr, ptrtoint).

  • LoadSVFGNode: Load instructions. Reads from memory through a pointer.

  • StoreSVFGNode: Store instructions. Writes to memory through a pointer.

  • GepSVFGNode: GetElementPtr instructions. Computes derived pointers.

  • BinaryOpSVFGNode: Binary operations on pointer values.

  • CmpSVFGNode: Comparison instructions involving pointers.

  • BranchSVFGNode: Branch instructions with pointer-dependent conditions.

PHI Nodes

  • IntraPhiSVFGNode: Intra-procedural PHI nodes for merging values at control-flow joins.

  • InterPhiSVFGNode: Inter-procedural PHI nodes for parameter/return value aggregation.

Memory SSA Nodes (Address-Taken Variables)

These nodes track value flows through memory locations:

  • FormalInSVFGNode: Memory state at function entry (formal parameter memory).

  • FormalOutSVFGNode: Memory state at function exit (formal return memory).

  • ActualInSVFGNode: Memory state before call site (actual parameter memory).

  • ActualOutSVFGNode: Memory state after call site (actual return memory).

  • LoadMuSVFGNode: Memory use (load from address-taken location).

  • StoreChiSVFGNode: Memory def (store to address-taken location).

  • CallMuSVFGNode: Memory use at call site (for functions that may read memory).

  • CallChiSVFGNode: Memory def at call site (for functions that may write memory).

  • RetMuSVFGNode: Memory use at function return.

  • EntryChiSVFGNode: Memory def at function entry.

Memory PHI Nodes

  • IntraMSSAPhiSVFGNode: Intra-procedural memory PHI for merging memory states.

  • InterMSSAPhiSVFGNode: Inter-procedural memory PHI node kind (currently not materialized by default builder construction).

Parameter Nodes

  • FormalParmSVFGNode: Formal parameter at function entry.

  • ActualParmSVFGNode: Actual parameter at call site.

  • FormalRetSVFGNode: Formal return value at function exit.

  • ActualRetSVFGNode: Actual return value at call site.

Edge Types

Intra-Procedural Edges

  • IntraDirect: Direct value flow (copy, bitcast).

  • IntraCopy: Copy of pointer value.

  • IntraLoad: Load through pointer (defines loaded value).

  • IntraStore: Store through pointer (uses stored value).

  • IntraGep: GEP operation (derived pointer).

  • IntraPhi: PHI node value flow.

  • IntraMu: Memory use edge (LoadMu).

  • IntraChi: Memory def edge (StoreChi).

  • IntraIndirect: Indirect value flow with points-to guard.

Inter-Procedural Edges

  • CallDir: Direct call parameter passing.

  • CallInd: Indirect call parameter passing (with points-to guard).

  • CallAIn: Actual-in to formal-in edge (memory).

  • CallFIn: Formal-in edge (memory).

  • RetDir: Direct return value flow.

  • RetInd: Indirect return value flow (with points-to guard).

  • RetAOut: Actual-out to formal-out edge (memory).

  • RetFOut: Formal-out edge (memory).

  • ParamCall: Parameter passing edge without callsite context.

  • ParamRet: Return value edge without callsite context.

Thread Edges

  • ThreadMHPIndirectVF: May-happen-in-parallel indirect value flow for concurrent program analysis.

Architecture

The SVFG builder follows a phased construction approach:

  1. Pointer Analysis Phase

    Run AserPTA to compute points-to sets for all pointers:

    • Field-sensitive or field-insensitive memory model.

    • Andersen’s algorithm, Wave Propagation, or Deep Propagation.

    • On-the-fly call graph construction for indirect calls.

  2. Node Construction Phase

    Create SVFG nodes for all program values:

    • Top-level pointer nodes (Addr, Copy, Load, Store, Gep, etc.).

    • Parameter nodes (FormalParm, ActualParm, FormalRet, ActualRet).

    • Memory SSA nodes (FormalIn/Out, ActualIn/Out, LoadMu, StoreChi).

  3. Edge Construction Phase

    Connect nodes based on value flow:

    • Intra-procedural direct edges.

    • Intra-procedural indirect edges (with points-to sets).

    • Inter-procedural call/return edges.

  4. Memory SSA Phase

    Build memory SSA form for address-taken variables:

    • Memory region identification from points-to sets.

    • SSA versioning for memory locations.

    • Memory PHI nodes at control-flow merges.

    • Interprocedural memory flow through call/return edges (CallAIn/RetAOut).

Usage

Basic Construction

#include "IR/SVFG/SVFG.h"
#include "IR/SVFG/SVFGBuilder.h"

// Build SVFG from ICFG
SVFGBuilder builder;
SVFG* svfg = builder.build(icfg);

// Iterate over nodes
for (auto& pair : *svfg) {
    SVFGNode* node = pair.second;
    // Process node...
}

Configuration

#include "IR/SVFG/SVFGBuilder.h"

SVFGBuilderConfig config;
config.buildMSSA = true;
config.usePointerAnalysis = true;
config.resolveIndirectCalls = true;
config.solverType = SVFGBuilderConfig::SolverType::Andersen;
config.memModelType = SVFGBuilderConfig::MemModelType::FieldSensitive;

SVFGBuilder builder(config);
SVFG* svfg = builder.build(icfg);

Querying Value Flows

#include "IR/SVFG/SVFG.h"

// Get definition site for an instruction
SVFGNode* def = svfg->getDef(instruction);

// Get successors (forward reachability)
SVFGNodeSet succs = svfg->getSuccs(node);

// Get predecessors (backward reachability)
SVFGNodeSet preds = svfg->getPreds(node);

// Check path existence
bool hasPath = svfg->hasPath(srcNode, dstNode);

Memory SSA Queries

#include "IR/SVFG/SVFG.h"

// Get LoadMu nodes for a load instruction
const SVFGNodeSet& mus = svfg->getLoadMus(loadInst);

// Get StoreChi nodes for a store instruction
const SVFGNodeSet& chis = svfg->getStoreChis(storeInst);

// Get points-to set for a memory node
const SVFGNodeBS* pts = memNode->getPointsTo();

Interprocedural Queries

#include "IR/SVFG/SVFG.h"

// Get formal parameters for a function
const SVFGNodeSet& formalParms = svfg->getFormalParms(func);

// Get actual parameters for a call site
const SVFGNodeSet& actualParms = svfg->getActualParms(callSite);

// Get formal return node
const SVFGNodeSet& formalRets = svfg->getFormalRets(func);

// Get actual return node
const SVFGNodeSet& actualRets = svfg->getActualRets(callSite);

On-the-Fly Call Graph Refinement

#include "IR/SVFG/SVFGBuilder.h"

// Configure SVFG to defer indirect call resolution
SVFGBuilderConfig config;
config.resolveIndirectCalls = false;

SVFGBuilder builder(config);
SVFG* svfg = builder.build(icfg);

// Later, when DDA discovers an indirect call target:
std::vector<SVFGEdge*> newEdges;
builder.connectCallSiteToCalleeOnTheFly(svfg, callSite, callee, newEdges);

Type-Safe Node Casting

#include "IR/SVFG/SVFGNode.h"

SVFGNode* node = ...;

if (auto* addrNode = llvm::dyn_cast<AddrSVFGNode>(node)) {
    uint32_t objId = addrNode->getObjectId();
}

if (auto* loadNode = llvm::dyn_cast<LoadSVFGNode>(node)) {
    uint32_t ptrId = loadNode->getLoadFromPtr();
}

if (auto* mssaNode = llvm::dyn_cast<MSSASVFGNode>(node)) {
    uint32_t memReg = mssaNode->getMemReg();
    uint32_t version = mssaNode->getSSAVersion();
    const SVFGNodeBS* pts = mssaNode->getPointsTo();
}

Serialization

#include "IR/SVFG/SVFG.h"

// Write to file
svfg->writeToFile("output.svfg");

// Read from file
SVFG* loaded = new SVFG();
loaded->readFromFile("output.svfg");

// Dump to DOT format
svfg->dump("svfg.dot");

Integration

AserPTA Integration

The SVFG builder uses AserPTA as its pointer analysis engine:

  • Location: lib/Alias/AserPTA/

  • Solvers: Andersen, WavePropagation, DeepPropagation, PartialUpdate

  • Memory Models: Field-sensitive (FSMemModel), Field-insensitive (FIMemModel)

  • Context Sensitivity: Context-insensitive (NoCtx), k-call-site sensitive (KCallSite<N>)

ICFG Integration

The SVFG is built on top of the Interprocedural Control-Flow Graph:

  • Each SVFG node is associated with an ICFGNode.

  • Call sites and function entries/exits are derived from ICFG.

DDA Integration

The Demand-Driven Alias Analysis (DDA) uses SVFG for:

  • Value-flow backtracing: Follow SVFG edges backward to find definitions.

  • Indirect call resolution: On-the-fly edge creation for discovered callees.

  • Object tracking: Points-to sets on edges enable precise alias reasoning.

Components

Core Files

  • SVFG.h/cpp: Main graph class with node/edge management.

  • SVFGBase.h: Node/edge kind enumerations and type-checking helpers.

  • SVFGNode.h: Complete node type hierarchy.

  • SVFGEdge.h: Complete edge type hierarchy.

  • SVFGBuilder.h/cpp: Builder with AserPTA integration.

  • SVFGBuilderNodes.cpp: Node construction logic.

  • SVFGBuilderEdges.cpp: Edge construction logic.

  • SVFGBuilderMemorySSA.cpp: Memory SSA construction.

  • SVFGOPT.h/cpp: Optimization passes (dead node elimination, etc.).

  • SVFGSerializer.h/cpp: Serialization support.

  • SVFGStats.h/cpp: Statistics collection.

  • GraphTraits.h: LLVM graph traits for traversal algorithms.

  • PointsToSetHash.h: Hash functions for points-to sets.

Dependencies

  • CanaryICFG: Control-flow graph infrastructure.

  • AserPTA: Pointer analysis engine.

  • LLVM Core/Support: LLVM IR and utilities.

References

  • Yulei Sui, Ding Ye, Jingling Xue. “Detecting Memory Leaks Statically with Full-Sparse Value-Flow Analysis”. IEEE Transactions on Software Engineering (TSE), 2014.

  • Yulei Sui, Jingling Xue. “On-Demand Strong Update Analysis via Value-Flow Refinement”. ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE), 2016.

  • SVF Framework: https://github.com/SVF-tools/SVF