Please note that this service will be discontinued after December 31, 2013
The MFP algorithm — theoretical background

### Introduction

The MFP (minimum fixed point) algorithm is used in dataflow analysis to determine properties of a program at compile time without actually executing it. The program is represented by its control flow graph. In this abstraction of the program, the nodes correspond to the statements of the program and the edges represent the possible control transitions. This graph has a unique start node s and an end node e. Every edge is associated with a transfer function which specifies how information is propagated along this edge. The control flow graph and the transfer functions form a recursive equation system, which is solved iteratively by the MFP algorithm.

The MOP (merge-over-all-paths) solution computes for every node n of the CFG (control flow graph) the incoming informations from all paths that lead from the start node s to n. These informations are combined and the result is used as input by the transfer functions associated with the outgoing edges of n. In this way, information is gathered for all nodes in the CFG and stored there.

Unfortunately the MOP solution is only computable in special cases. To get a good approximation of the MOP solution, the MFP algorithm is used, which works whenever the transfer functions are monotonic. The difference between these two approaches is that the MFP algorithm only combines the informations from the predecessors of a node without checking the whole path from s to that node.

### Theoretical Background

The theory behind the MFP algorithm will be explained via the following definitions:

1. Control Flow Graph (CFG)
The control flow graph (CFG) is a directed graph G = (N,E,s,e) with a finite set N of nodes, a set E of edges which is a subset of NxN, a start node s and an end node e, where s and e are elements of N. If there exists an edge (n,m), then m is called successor of n and n is called predecessor of m. The start node s is required to have no predecessor and e no successor.
In addition, the graph has to be connected: Every node must be reachable from s and e is reachable from any node.

2. Path
A path p from node n0 to node nk in a CFG (N,E,s,e) is a sequence of edges (elements of E) of the form
p = (n0,n1), (n1,n2), ..., (nk-1,nk). A path may contain the same edge many times. Thus, there are infinitely many paths if the CFG contains circles (loops).

3. Complete Lattice
A partially ordered set (A,) is called a complete lattice, iff every subset of A has a greatest lower bound () and a least upper bound (). The elements = A, = A are called bottom and top elements of A. The application of the greatest lower bound to two elements {a,b} is written as ab. Analogously, {a,b} is written as ab.

4. Chain
An (ascending) chain (xi)i is a sequence x0, x1, ..., such that xj xj+1 holds for all j. A chain (xi)i is called strongly ascending, iff for all j, xj and xj+1 are different. A chain eventually stabilizes iff there exists an index j such that for all indexes n > j, xj = xn holds. Similar definitions apply to descending chains.

5. Pointwise order, Monotone and Distributive Functions
Functions from a set A to a partially ordered set B are usually ordered by the pointwise ordering: fg iff f(x)g(x) for all x in A.
Let A and B be complete lattices. A function f: AB is called monotone, iff for all elements x, y of A, xy implies f(x) f(y) in B. The function f is called distributive, iff for all elements x, y of A the equality f(xy) = f(x)f(y) holds. All distributive functions are monotone.

6. Data Flow Problems
A data flow problem is a CFG together with a complete lattice D of abstract values, a family of transfer functions, and a start value i in D. The transfer functions express the local (abstract) semantics of the CFG. They assign a meaning function to every edge of the CFG:  tf: EDD,  i.e.  tf(e): DD  for all edges e in E.

The path semantics along some path is the composition of the transfer functions tf along the path (and the identity for the empty path):

7. Merge Over All Path (MOP) Solution
The MOP solution of a data flow problem is the lub of the path semantics, applied to the start value i:

Thus, all possible executions of a program which reach the node n are combined with . This combination ensures that the information is valid for every program execution and therefore is independent of the input of the program. Since the MOP solution is in general not computable if there is an infinite number of paths and the lattice D is infinite, the MFP solution is used instead.

8. Minimum Fixed Point (MFP) Solution
The MFP solution is the least function (in the pointwise ordering) MFP: ND satisfying the following system of equations:
 MFP(s) =  i, MFP(n) =  { tf(e)(MFP(m)) | e = (m,n) in E } for n different from s.
For the computation of the MFP solution, see below.

9. Example: The Difference between MOP and MFP

Consider the CFG with node set N = {s, a1, a2, b, e} and edge set E = {(s,a1), (s,a2), (a1,b), (a2,b), (b,e)}. Let D be any complete lattice, and i, u, and v any three elements of D. We take i as the start value, and the following transfer functions:
tf(s,a1) = cu, the constant function with value u,   tf(s,a2) = cv, the constant function with value v,
tf(a1,b) = tf(a2,b) = id,   tf(b,e) = some function f.

In this case, the MOP solution is easily computable since there are at most 2 paths from s to any node n. The result is
MOP(s) = iMOP(a1) = uMOP(a2) = vMOP(b) = uvMOP(e) = f(u)f(v).
The value MOP(e) is the lub of the semantics of the two paths (s,a1),(a1,b),(b,e) and (s,a2),(a2,b),(b,e) from s to e.

The MFP solution is the least solution of the following system of equations:
MFP(s) = iMFP(a1) = cu(MFP(s)),  MFP(a2) = cv(MFP(s)),
MFP(b) = MFP(a1)MFP(a2),  MFP(e) = f(MFP(b)).
Because of the absence of loops in the CFG, this system has exactly one solution, namely
MFP(s) = iMFP(a1) = uMFP(a2) = vMFP(b) = uvMFP(e) = f(uv),
which is in general different from  MOP(e) = f(u)f(v).

10. The General Relationship between MOP and MFP
According to the coincidence theorem (Kam and Ullman 1997), MOPMFP holds if the transfer functions are monotone. This means that MFP is a correct approximation of the MOP solution. If the transfer functions are even distributive, then MOP = MFP follows. Hence in this special situation, the MOP solution is computable by the MFP algorithm presented below.

11. Computation of the MFP Solution (Fixed Point Iteration)
The MFP solution is computable by the following fixed point iteration if all transfer functions are monotone and every ascending chain of the lattice D eventually stabilizes.
The algorithm consists of the inductive computation of a sequence (Fk)k of functions Fk: ND. The function F0 maps the start node s to the start value i, and all other nodes n to , the least element of the complete lattice D. The function Fk+1 is computed from Fk as follows using the equation system for MFP:
 Fk+1(s) =  i, Fk+1(n) =  { tf(e)(Fk(m)) | e = (m,n) in E } for n different from s.
The algorithm stops once Fk+1 = Fk. In this case, Fk is the MFP solution.
Monotonicity of the transfer functions implies FkFk+1 for all k. Hence, (Fk(n))k is an ascending chain for every node n. If ascending chains in D eventually stabilize, the chain (Fk)k of functions eventually stabilizes as well, i.e. the algorithm terminates.
The fixed point algorithm can be modified in several ways. In the computation of Fk+1(n), the value of Fk+1(m) may be used if it is already available. This will change the individual functions Fk, but not the final result. In this case, it is useful to traverse the nodes in an order which follows the edges of the CFG as far as possible. Moreover, it is useful to recompute the value at a node only if the value at one of his predecessors has changed. All these ideas are incorporated in the PAG/WWW implementation of the fixed point algorithm.

### Search

 PAG/WWW all AbsInt sites for