Newtonian Program Analysis (NPA) ================================ Overview ======== The **Newtonian Program Analysis (NPA)** engine in ``lib/Dataflow/NPA`` implements advanced, research-oriented techniques for compositional and recurrence-based data-flow reasoning. It is designed for **numeric** and **relational** analyses that go beyond classical bit-vector or IFDS/IDE formulations. * **Location**: ``lib/Dataflow/NPA`` (implementation), ``include/Dataflow/NPA`` (headers) * **Purpose**: library infrastructure for Newton-style program analyses Conceptual Background ===================== NPA is based on a sequence of works that recast program analysis as a form of **Newton iteration** over suitable abstract domains: * *Newtonian Program Analysis* (JACM 2010) [EsparzaKieferLuttenberger2010]_: algebraic differential :math:`Df|_\nu`, Newton sequence, convergence to least fixed point. * *Newtonian Program Analysis via Tensor Product* (TOPLAS 2016) [RepsEtAlTOPLAS2016]_: solving the **LCFL** (linear context-free language) sub-problems produced each Newton round by "regularizing" them via a **tensor product** (paired semiring). * *Compositional Recurrence Analysis Revisited*, PLDI 2017. The foundational paper [EsparzaKieferLuttenberger2010]_ (JACM) presents a novel generic technique for solving interprocedural dataflow equations by generalizing Newton's method to **ω-continuous semirings**. When the semiring multiplication is **non-commutative** (e.g. function composition in dataflow), the linearized system at each Newton round is an **LCFL equation system** (coefficients on both sides of variables). [RepsEtAlTOPLAS2016]_ shows how to convert such systems into left-linear (regular) form over a paired semiring, then solve and project back (Algorithm 3.4). Key Insight: Algebraic Generalization -------------------------------------- The key insight is that Newton's method can be reformulated **purely algebraically**, without requiring subtraction, division, or limits. This allows the method to be applied to arbitrary semirings, not just the real numbers. The method works as follows: * Programs are encoded as systems of (possibly non-linear) recurrence equations over an ω-continuous semiring. * At each iteration, Newton's method linearizes the system at the current approximation using **algebraic differentials**. * The differential of a power series is defined inductively using algebraic rules (product rule, sum rule) rather than limits. * Each Newton step solves a linear system to obtain the next approximation. This yields: * **Faster convergence** than classical Kleene iteration (often exponentially faster). * **Robustness**: guaranteed convergence for any ω-continuous semiring. * **Termination guarantees**: for idempotent commutative semirings, Newton's method terminates in at most *n* iterations for a system of *n* equations. In code, this bound is applied automatically only when the domain declares ``static constexpr bool commutative_extend = true``. Mathematical Foundation ======================== ω-Continuous Semirings ---------------------- An **ω-continuous semiring** is a tuple :math:`\langle S, +, \cdot, 0, 1 \rangle` where: * :math:`(S, +, 0)` is a commutative monoid (addition/join operation) * :math:`(S, \cdot, 1)` is a monoid (multiplication/sequencing operation) * Multiplication distributes over addition * The relation :math:`a \sqsubseteq b` (defined as :math:`\exists d : a + d = b`) is a partial order * Every ω-chain has a supremum with respect to :math:`\sqsubseteq` * Infinite sums satisfy standard continuity properties This generalizes: * **Lattices** (classical dataflow analysis): idempotent semirings where :math:`a + a = a` * **Language semirings**: languages over an alphabet with union and concatenation * **Counting semirings**: sets of vectors (Parikh images) for counting occurrences * **Probabilistic semirings**: nonnegative reals for probability and expected runtime analysis * **Relational semirings**: relations over program states for may-alias and summary relations Comparison with Kleene's Method -------------------------------- Traditional dataflow analysis uses **Kleene iteration**, which: * Starts at :math:`\kappa^{(0)} = 0` (or :math:`f(0)`) * Iterates: :math:`\kappa^{(i+1)} = f(\kappa^{(i)})` * Converges to the least fixed point :math:`\mu f = \sup_i \kappa^{(i)}` **Newton's method** instead: * Starts at :math:`\nu^{(0)} = f(0)` * Iterates: :math:`\nu^{(i+1)} = \nu^{(i)} + \Delta^{(i)}` where :math:`\Delta^{(i)}` is the least solution of the linearized system :math:`Df|_{\nu^{(i)}}(X) + \delta^{(i)} = X` * Converges at least as fast as Kleene: :math:`\kappa^{(i)} \sqsubseteq \nu^{(i)} \sqsubseteq \mu f` For probabilistic programs, Newton's method can achieve linear convergence (one bit of precision per iteration) compared to logarithmic convergence for Kleene iteration. For commutative idempotent semirings, Newton's method guarantees termination in at most *n* iterations for *n* equations. Examples and Applications ========================= The NPA framework supports both **qualitative** and **quantitative** analyses: May-Alias Analysis ------------------ Using the **counting semiring** (Parikh abstraction), NPA can perform may-alias analysis that tracks how many times each data access path is traversed. Unlike Kleene iteration, which may not terminate for recursive procedures, Newton's method terminates in one iteration for this class of problems. Average Runtime Analysis ------------------------ Using **probabilistic semirings**, NPA can compute: * The probability that a procedure terminates * The expected runtime (conditional on termination) For probabilistic programs with recursive procedures, Newton's method converges substantially faster than Kleene iteration, achieving linear rather than logarithmic convergence. LCFL Sub-Problems and Tensor-Product Regularization -------------------------------------------------- When the semiring extend (multiplication) is **non-commutative**, the linearized system at each Newton round has the form of an **LCFL equation system** :math:`Y_j = c_j \oplus \bigoplus_{i,k} (a_{i,j,k} \otimes Y_i \otimes b_{i,j,k})` (Reps et al. TOPLAS 2016, Definition 3.1): coefficients appear on **both sides** of variables, corresponding to path problems described by linear context-free languages. Conventional intraprocedural (regular) solvers cannot be applied directly. The **tensor-product** approach (Algorithm 3.4 in [RepsEtAlTOPLAS2016]_): (1) Convert the LCFL system into a **left-linear** system over a **paired** semiring :math:`(a,b) \otimes_p (a',b') = (a' \otimes a, b \otimes b')`, :math:`R((w_1,w_2)) = w_1 \otimes w_2`; (2) solve the left-linear system (e.g. by worklist or path expressions); (3) **project** back via :math:`R`. The paired structure maintains the mirrored correlation of left/right coefficients. The implementation uses ``TensorProductDomain`` and ``solve_linear_tensor_impl`` when the system has LCFL structure (Concat/Star). ``Star`` is the paper-faithful Kleene-star construct used by Newton/tensor regularization. ``Mu`` remains available as a generic least-fixpoint construct for plain evaluation, but it is intentionally rejected on Newton/tensor paths. Implementation note: to preserve left/right correlation during projection, the tensor solver uses an exact correlated representation for idempotent domains (modeling sums as finite sets of pairs) rather than componentwise addition over a single pair. Other Applications ------------------ * **Language analysis**: Computing the set of execution paths (context-free languages) * **Relational analysis**: Computing summary relations for interprocedural analysis * **Cost analysis**: Expected resource consumption under probabilistic execution models Distributive vs. Non-Distributive Analyses =========================================== NPA supports both **distributive** and **non-distributive** program analyses: * **Distributive analyses**: Multiplication distributes over addition (:math:`a \cdot (b + c) = a \cdot b + a \cdot c`). In this case, the least fixed point coincides with the join-over-all-paths (JOP) solution. * **Non-distributive analyses**: Only subdistributivity holds (:math:`a \cdot (b + c) \sqsupseteq a \cdot b + a \cdot c`). In this case, the least fixed point is an **overapproximation** of the JOP solution, but still provides a sound analysis result. In-tree clients such as constant propagation and interval analysis use ``SummaryTransformerDomain`` as NPA's current abstract-summary path for this fragment. Core Implementation (include/Dataflow/NPA/Core) =============================================== The core headers implement the algorithms from Esparza et al. (JACM) and Reps et al. (TOPLAS 2016): * **NPACommon.h**: Domain concept (ω-continuous semiring), ``LinearStrategy`` (Naive, Worklist, SCC, TensorProduct). * **Expressions.h**: ``Exp0`` (polynomial equation AST), ``Exp1`` (linearized AST); ``Concat`` encodes the LCFL form :math:`a \cdot X \cdot b`; ``Star`` is the Newton/tensor Kleene-star fragment and ``Mu`` is a generic least-fixpoint node. * **Diff.h**: Builds the differential :math:`Df|_\nu` from a polynomial expression (Esparza et al. JACM, Defn. 3.1, 3.5) and caches differential shape plans across Newton rounds. * **Eval.h**: ``I0`` evaluates Exp0 (full system); ``I1`` evaluates Exp1 (linear RHS). * **Fixpoint.h**: ``fix`` / ``fix_vec`` for Kleene-like iteration. * **LCFLDetector.h**: Detects Concat/Star (LCFL structure). * **LinearSolvers.h**: ``solve_linear_worklist_impl``, ``solve_linear_scc_impl``, ``solve_linear_tensor_impl`` for the linearized system. * **TensorLinearSolve.h**: Tensor-product solver (Reps et al. Alg. 3.4). * **Solver.h**: ``KleeneIter`` (κ^(i+1) = f(κ^(i))), ``NewtonIter`` (ν^(i+1) = ν^(i) ⊔ Δ^(i)). Usage Notes =========== In the current code base, NPA is primarily exposed as an internal library component. Clients that use NPA are expected to be familiar with the above papers and to instantiate the provided abstractions for their specific numeric domains and recurrence schemes. Interprocedural clients currently assume a closed-world LLVM module for indirect calls: if no stronger resolver is attached, unknown call sites are approximated by the set of type-compatible defined functions in the current module. Practical notes for numeric domains =================================== For floating-point / numeric semirings, exact equality often prevents termination in iterative solvers. Domains may provide an optional method ``approx_equal(a,b)``; when present, NPA uses it for convergence checks instead of ``equal(a,b)``. Domains may also opt into bounding iteration: * ``max_fixpoint_iters`` caps generic fixpoint loops (e.g. ``Star`` / ``Mu`` bodies). * ``max_linear_steps`` caps worklist/SCC steps for the linearized system. If a domain implements ``project()``, Newton/tensor paths require an additional opt-in ``project_newton_safe`` contract. That contract is the domain author's assertion that projection is monotone and compatible with ``combine`` and the linearized summary equations used by the Newton pipeline. This engine is **not** currently wired into a dedicated command-line tool; instead, it serves as a building block for experimental analyses within Lotus. References ========== .. [RepsEtAlTOPLAS2016] Thomas Reps, Emma Turetsky, and Prathmesh Prabhu. Newtonian Program Analysis via Tensor Product. ACM Transactions on Programming Languages and Systems (TOPLAS), 2016. (Portions in POPL 2016.) .. [EsparzaKieferLuttenberger2010] Javier Esparza, Stefan Kiefer, and Michael Luttenberger. Newtonian Program Analysis. Journal of the ACM, 57(6):1-47, 2010.