Newtonian Program Analysis (NPA)
Overview
The Newtonian Program Analysis (NPA) engine in lib/Dataflow/NPA
implements advanced, research-oriented techniques for compositional and
recurrence-based data-flow reasoning.
It is designed for numeric and relational analyses that go
beyond classical bit-vector or IFDS/IDE formulations.
Location:
lib/Dataflow/NPA(implementation),include/Dataflow/NPA(headers)Purpose: library infrastructure for Newton-style program analyses
Conceptual Background
NPA is based on a sequence of works that recast program analysis as a form of Newton iteration over suitable abstract domains:
Newtonian Program Analysis (JACM 2010) [EsparzaKieferLuttenberger2010]: algebraic differential \(Df|_\nu\), Newton sequence, convergence to least fixed point.
Newtonian Program Analysis via Tensor Product (TOPLAS 2016) [RepsEtAlTOPLAS2016]: solving the LCFL (linear context-free language) sub-problems produced each Newton round by “regularizing” them via a tensor product (paired semiring).
Compositional Recurrence Analysis Revisited, PLDI 2017.
The foundational paper [EsparzaKieferLuttenberger2010] (JACM) presents a novel generic technique for solving interprocedural dataflow equations by generalizing Newton’s method to ω-continuous semirings. When the semiring multiplication is non-commutative (e.g. function composition in dataflow), the linearized system at each Newton round is an LCFL equation system (coefficients on both sides of variables). [RepsEtAlTOPLAS2016] shows how to convert such systems into left-linear (regular) form over a paired semiring, then solve and project back (Algorithm 3.4).
Key Insight: Algebraic Generalization
The key insight is that Newton’s method can be reformulated purely algebraically, without requiring subtraction, division, or limits. This allows the method to be applied to arbitrary semirings, not just the real numbers.
The method works as follows:
Programs are encoded as systems of (possibly non-linear) recurrence equations over an ω-continuous semiring.
At each iteration, Newton’s method linearizes the system at the current approximation using algebraic differentials.
The differential of a power series is defined inductively using algebraic rules (product rule, sum rule) rather than limits.
Each Newton step solves a linear system to obtain the next approximation.
This yields:
Faster convergence than classical Kleene iteration (often exponentially faster).
Robustness: guaranteed convergence for any ω-continuous semiring.
Termination guarantees: for idempotent commutative semirings, Newton’s method terminates in at most n iterations for a system of n equations. In code, this bound is applied automatically only when the domain declares
static constexpr bool commutative_extend = true.
Mathematical Foundation
ω-Continuous Semirings
An ω-continuous semiring is a tuple \(\langle S, +, \cdot, 0, 1 \rangle\) where:
\((S, +, 0)\) is a commutative monoid (addition/join operation)
\((S, \cdot, 1)\) is a monoid (multiplication/sequencing operation)
Multiplication distributes over addition
The relation \(a \sqsubseteq b\) (defined as \(\exists d : a + d = b\)) is a partial order
Every ω-chain has a supremum with respect to \(\sqsubseteq\)
Infinite sums satisfy standard continuity properties
This generalizes:
Lattices (classical dataflow analysis): idempotent semirings where \(a + a = a\)
Language semirings: languages over an alphabet with union and concatenation
Counting semirings: sets of vectors (Parikh images) for counting occurrences
Probabilistic semirings: nonnegative reals for probability and expected runtime analysis
Relational semirings: relations over program states for may-alias and summary relations
Comparison with Kleene’s Method
Traditional dataflow analysis uses Kleene iteration, which:
Starts at \(\kappa^{(0)} = 0\) (or \(f(0)\))
Iterates: \(\kappa^{(i+1)} = f(\kappa^{(i)})\)
Converges to the least fixed point \(\mu f = \sup_i \kappa^{(i)}\)
Newton’s method instead:
Starts at \(\nu^{(0)} = f(0)\)
Iterates: \(\nu^{(i+1)} = \nu^{(i)} + \Delta^{(i)}\) where \(\Delta^{(i)}\) is the least solution of the linearized system \(Df|_{\nu^{(i)}}(X) + \delta^{(i)} = X\)
Converges at least as fast as Kleene: \(\kappa^{(i)} \sqsubseteq \nu^{(i)} \sqsubseteq \mu f\)
For probabilistic programs, Newton’s method can achieve linear convergence (one bit of precision per iteration) compared to logarithmic convergence for Kleene iteration. For commutative idempotent semirings, Newton’s method guarantees termination in at most n iterations for n equations.
Examples and Applications
The NPA framework supports both qualitative and quantitative analyses:
May-Alias Analysis
Using the counting semiring (Parikh abstraction), NPA can perform may-alias analysis that tracks how many times each data access path is traversed. Unlike Kleene iteration, which may not terminate for recursive procedures, Newton’s method terminates in one iteration for this class of problems.
Average Runtime Analysis
Using probabilistic semirings, NPA can compute: * The probability that a procedure terminates * The expected runtime (conditional on termination)
For probabilistic programs with recursive procedures, Newton’s method converges substantially faster than Kleene iteration, achieving linear rather than logarithmic convergence.
LCFL Sub-Problems and Tensor-Product Regularization
When the semiring extend (multiplication) is non-commutative, the linearized system at each Newton round has the form of an LCFL equation system \(Y_j = c_j \oplus \bigoplus_{i,k} (a_{i,j,k} \otimes Y_i \otimes b_{i,j,k})\) (Reps et al. TOPLAS 2016, Definition 3.1): coefficients appear on both sides of variables, corresponding to path problems described by linear context-free languages. Conventional intraprocedural (regular) solvers cannot be applied directly.
The tensor-product approach (Algorithm 3.4 in [RepsEtAlTOPLAS2016]):
(1) Convert the LCFL system into a left-linear system over a paired
semiring \((a,b) \otimes_p (a',b') = (a' \otimes a, b \otimes b')\),
\(R((w_1,w_2)) = w_1 \otimes w_2\); (2) solve the left-linear system
(e.g. by worklist or path expressions); (3) project back via \(R\).
The paired structure maintains the mirrored correlation of left/right
coefficients. The implementation uses TensorProductDomain and
solve_linear_tensor_impl when the system has LCFL structure (Concat/Star).
Star is the paper-faithful Kleene-star construct used by Newton/tensor
regularization. Mu remains available as a generic least-fixpoint construct
for plain evaluation, but it is intentionally rejected on Newton/tensor paths.
Implementation note: to preserve left/right correlation during projection, the tensor solver uses an exact correlated representation for idempotent domains (modeling sums as finite sets of pairs) rather than componentwise addition over a single pair.
Other Applications
Language analysis: Computing the set of execution paths (context-free languages)
Relational analysis: Computing summary relations for interprocedural analysis
Cost analysis: Expected resource consumption under probabilistic execution models
Distributive vs. Non-Distributive Analyses
NPA supports both distributive and non-distributive program analyses:
Distributive analyses: Multiplication distributes over addition (\(a \cdot (b + c) = a \cdot b + a \cdot c\)). In this case, the least fixed point coincides with the join-over-all-paths (JOP) solution.
Non-distributive analyses: Only subdistributivity holds (\(a \cdot (b + c) \sqsupseteq a \cdot b + a \cdot c\)). In this case, the least fixed point is an overapproximation of the JOP solution, but still provides a sound analysis result. In-tree clients such as constant propagation and interval analysis use
SummaryTransformerDomainas NPA’s current abstract-summary path for this fragment.
Core Implementation (include/Dataflow/NPA/Core)
The core headers implement the algorithms from Esparza et al. (JACM) and Reps et al. (TOPLAS 2016):
NPACommon.h: Domain concept (ω-continuous semiring),
LinearStrategy(Naive, Worklist, SCC, TensorProduct).Expressions.h:
Exp0(polynomial equation AST),Exp1(linearized AST);Concatencodes the LCFL form \(a \cdot X \cdot b\);Staris the Newton/tensor Kleene-star fragment andMuis a generic least-fixpoint node.Diff.h: Builds the differential \(Df|_\nu\) from a polynomial expression (Esparza et al. JACM, Defn. 3.1, 3.5) and caches differential shape plans across Newton rounds.
Eval.h:
I0evaluates Exp0 (full system);I1evaluates Exp1 (linear RHS).Fixpoint.h:
fix/fix_vecfor Kleene-like iteration.LCFLDetector.h: Detects Concat/Star (LCFL structure).
LinearSolvers.h:
solve_linear_worklist_impl,solve_linear_scc_impl,solve_linear_tensor_implfor the linearized system.TensorLinearSolve.h: Tensor-product solver (Reps et al. Alg. 3.4).
Solver.h:
KleeneIter(κ^(i+1) = f(κ^(i))),NewtonIter(ν^(i+1) = ν^(i) ⊔ Δ^(i)).
Usage Notes
In the current code base, NPA is primarily exposed as an internal library component. Clients that use NPA are expected to be familiar with the above papers and to instantiate the provided abstractions for their specific numeric domains and recurrence schemes.
Interprocedural clients currently assume a closed-world LLVM module for indirect calls: if no stronger resolver is attached, unknown call sites are approximated by the set of type-compatible defined functions in the current module.
Practical notes for numeric domains
For floating-point / numeric semirings, exact equality often prevents termination
in iterative solvers. Domains may provide an optional method
approx_equal(a,b); when present, NPA uses it for convergence checks instead
of equal(a,b).
Domains may also opt into bounding iteration:
max_fixpoint_iterscaps generic fixpoint loops (e.g.Star/Mubodies).max_linear_stepscaps worklist/SCC steps for the linearized system.
If a domain implements project(), Newton/tensor paths require an additional
opt-in project_newton_safe contract. That contract is the domain author’s
assertion that projection is monotone and compatible with combine and the
linearized summary equations used by the Newton pipeline.
This engine is not currently wired into a dedicated command-line tool; instead, it serves as a building block for experimental analyses within Lotus.