Inside Bound-T

How does Bound-T work, and how is it implemented? This page tells you a bit about the internals of Bound-T:

The overall architecture of Bound-T.
How this architecture is implemented.
Acknowledgements for tools, programs, and libraries that Bound-T uses.

The "Analysis Process" chapter of the Bound-T Reference Manual goes into more detail about the analysis methods.

The decription below focuses on the worst-case execution time (WCET) analysis. Stack usage analysis is a part of WCET analysis, but Bound-T can also be configured as a pure stack-usage analyser.

Architecture of Bound-T

Here we sketch the architecture of Bound-T, concentrating on what Bound-T does and dividing it as follows:

an overview of the inputs, the processing, and the outputs of Bound-T;
how Bound-T constructs the flow-graphs and the call-graph of the program under analysis;
modelling and analysing the computations in the target program and deducing loop iteration bounds;
finding an upper bound on the execution time of each instruction in the program; and
calculating an overall WCET bound from the control-flow analysis, loop bounds, and instruction execution times.

Input, processing, output

As presently implemented, Bound-T is a classical computer program in the sense that it reads some inputs, performs some processing on the inputs, and emits some outputs.

The inputs to Bound-T are the target program to be analysed, as a compiled and linked (executable) file; the command-line arguments and options; and the user-written assertions in one or more files of text.

The processing falls into five main phases, each supported by the relevant assertions:

Extracting the code of the target program from the executable file, organising it into the control-flow graphs for each subprogram, and linking the flow-graphs by the call-graph.
Modelling and analysing the computations performed by the instructions in the target program. A necessary part of the computation model is an analysis of the stack usage. Several types of data-flow analysis are used, the most powerful (and time-consuming) being based on a decision procedure for Presburger Arithmetic.
Using the computation model to study the data-dependent flow of control and thereby find the possible execution paths. The main point is finding loop bounds: bounds on the number of iterations of each loop in the code.
Modelling the time taken to execute each instruction in the target program, and thus finding an upper bound on the execution time of each basic block in the flow-graph of each subprogram.
Calculating the worst-case execution time as constrained by the loop bounds and assertions. This phase uses the Implicit Path Enumeration Technique (IPET) which applies Integer Linear Programming to maximise the execution time.

The outputs from Bound-T are simple representations of the results of the analysis: text on the standard output channel giving the computed WCET bounds and stack usage bounds, and text files that represent the flow-graphs and the call-graph in a form that can be viewed with the "dot" graph-drawing tool from the GraphViz kit.

The subsections below go into each processing phase in some more detail.

Building flow-graphs and call-graphs

The first analysis phase in Bound-T constructs the flow-graphs of the subprograms to be analysed and the call-graph that links the flow-graphs. The input is the program's code in the executable file.

Executable files and their content

The target program is usually given as a compiled and linked executable file, for example in ELF form. Such files contain the initial memory image of the program, as loaded into the target processor's memory (or memories) before the processor is booted. The memory image contains the code (machine instructions) and initial values for initialised data variables. However, the initial values of variables are usually not placed (loaded) directly into the variables (data memory locations); rather, the startup code copies the initial values from their load-time locations (often in the program memory) to the variables.

The executable files generally (and hopefully) also contain information that connects the machine-level program, where parts are numerically identified by code and data addresses, to the symbolically identified source-code entities such as subprograms, variables, and source-code files. This information is necessary for source-level debugging of the program and is therefore usually called debugging information. Bound-T uses the debugging information to accept input (especially assertions) written in source-code terms and to display outputs in source-code terms.

One of the main assumptions that Bound-T relies on is that the program code given in the memory image is immutable, that is, it does not change when the program is executed. Self-modifying programs are not in scope for Bound-T.

Instructions one by one

Bound-T starts its analysis from the entry point of the root subprogram named on the command line. Bound-T looks up this address in the target program's memory image, reads the machine code (binary form) of the instruction at that address, decodes the instruction to find out what it does, and inserts a (model of) the instruction in the flow-graph under construction for this subprogram. If the instruction is not a branch (jump) instruction Bound-T goes on to the next address and reads, decodes, and models the next instruction in the same way.

When Bound-T finds a branch instruction there may be two, or sometimes more, possible "next addresses" and their next instructions. Bound-T creates a branch point in the flow-graph and follows all the paths. Likewise, when Bound-T finds a return instruction, there is no next instruction (in this subprogram) and the exploration of this execution path stops with the return instruction, creating a return point — a childless node — in the flow-graph.

When Bound-T finds a call instruction it creates a special kind of node in the flow-graph to represent the execution of the called subprogram (the callee) and then goes on with the construction of the flow-graph of the calling subprogram (the caller) from the return point of the call instruction. Bound-T also adds to the call-graph a node that represents this call, and includes the callee in the analysis. That is, Bound-T later analyses the code of the callee in the same way, starting from the callee's entry point and analysing instructions one at a time to build the flow-graph of the callee.

Dynamic control flow

Constructing flow-graphs instruction by instruction in the above way is generally easy as long as each instruction specifies its possible successors statically. This is true even if the condition that selects one of these successors at run time depends dynamically on the run-time values of variables, as happens in conditional branches.

However, many programs contain branch instructions where the actual address of the successor — the branch target — is computed dynamically at run time. For example, many instruction sets contain a register-indirect branch instruction where the target address is taken from a register.

When Bound-T finds such a dynamic branch instruction, it suspends the exploration of this execution path and thus leaves the flow-graph incomplete. Later data-flow analysis, in particular the constant propagation and the Presburger analysis, may model the computation of the branch target well enough to find all the possible target addresses and thus resolve the dynamic branch. If so, Bound-T resumes the construction of the flow-graph from these resolved target addresses, and hopefully completes it. However, when the subprogram has several dynamic branches, Bound-T may find more dynamic branches and thus require several cycles of flow-graph construction interleaved with data-flow analysis until the flow-graph is completed.

The target address of a call instruction can also be computed dynamically, resulting in a dynamic call.

Programs typically use dynamic branches or calls for two purposes:

for switch-case statements that are implemented by means of constant tables of static addresses, and
for calling subprograms via pointers (function pointers).

Bound-T is usually able to resolve the switch-case kind of dynamic branch by analysing the local code to find the bounds of the address table, and then reading the contents of the address table from the memory image. This of course assumes that the address table is constant during execution.

Bound-T usually fails to resolve calls via pointers, because the run-time value of the pointer is usually set at some distant point in the program, and Bound-T does not do the necessary global analysis to find the possible pointer values. Thus you must usually find the possible callees manually and list them in an assertion.

Modelling and analysing the computations

One of the main problems in program analysis is data-dependent control flow &mdash that is, conditional branches. For WCET analysis, in particular, the conditional branches that create loops are very important, because an upper bound on the number of loop iterations must be known before any WCET bound can be found. While you can, and often must, write assertions to set loop bounds, most WCET tools try to find loop bounds automatically, by analysing the computations that decide when the loop terminates.

Modelling the computation

When Bound-T decodes instructions and builds the flow-graphs, it also creates a model of the computational effect of each instruction. In this model, the effect of an instruction is a set of assignments of values to data cells — registers, flags, memory locations. The assigned value is defined by an arithmetic expression that uses arithmetic operations such as addition and subtraction, logical bit-level operations such as bitwise "and", "or", "xor" or shift and rotate operations. The operands in the expressions are constants or data cells.

When Bound-T adds a conditional branch to a flow-graph, as an edge between two nodes of the flow-graph, it models the logical condition for the branch as a Boolean expression, similar to an arithmetic expression but returning a Boolean value.

These models of the computation turn the flow-graphs into full models of the program under analysis. In principle, the flow-graphs could be simulated and would then perform the same computation as the original program. However, in practice some computations are not modelled, or are modelled approximately. For example, Bound-T does not model floating-point computations because loop bounds seldom depend on floating-point values.

There are several knotty problems in the modelling of the computations, for example data cells that overlap, dynamically computed pointers to data cells, the differences between signed and unsigned computations, and the modelling of underflows and overflows. A deeper discussion of these problems does not fit on this page (not even in the margin :-).

Basic data-flow analyses

Given a flow-graph with the model of the computation, Bound-T applies several forms of data-flow analysis to simplify the model and deduce constraints on the values of variables. These analyses include:

Constant propagation.
Value-origin analysis.
Presburger Arithmetic analysis.

Constant propagation simplifies arithmetic and Boolean expressions by replacing data-cell operands with their values, when the values are statically known, and evaluating operations that have constant operands. For example, if a basic block consists of two instructions, where the first instruction assigns the value 14 to register r2, and the second instruction assigns the expression r2+7 to register r4, constant propagation simplifies the expression r2+7 to 14+7 and then evaluates the sum to give 21. After constant propagation, the model of the second instruction simply assigns 21 to r4.

Value-origin analysis applies to every use (occurrence) of a data cell as an operand in an expression and identifies the instructions that can have defined (assigned) the value of the data cell that is used in this expression. This is also called "def-use" analysis or "value numbering". For example, if a subprogram saves and restores a "callee-save" register by first pushing it onto the stack, then popping it off the stack before returning, value-origin analysis can show that the popped value originates in the pushed value, and thus that the value of the register on return is the same as the value on entry, so the register's value is preserved over any call of this subprogram.

The Presburger Arithmetic (PA) analysis models the relations between data-cell values, even when the actual values are not statically known. For example, if an instruction assigns the expression 2*r2+7 to r4, where r2 and r4 are registers, the PA analysis records that the relation r4=2*r2+7 holds from this instruction on, until either r2 or r4 is changed by a new assignment. Each assignment implies a relation between data-cell values. The PA analysis propagates these relations over the whole subprogram. The resulting relation set can be queried in various ways to reveal bounds on data-cell values and other things.

The Bound-T manuals tend to call the PA analysis "arithmetic analysis".

Loop bounds analysis

Bound-T uses the Presburger Arithmetic analysis to identify loop induction variables, which are data cells that change (increase or decrease) by a known step on every repetition of the loop. The value of an induction variable on iteration number n can thus be computed as the initial value plus n times the step. Combining this formula with the Presburger Arithmetic model of the loop's termination condition can give bounds on the iteration count n, thus bounds on the number of times the loop is repeated.

Modelling the execution time of instructions

After the control-flow and loop-bounds analysis, the second main problem in WCET analysis is to find upper bounds on the execution time of all instructions in the target program. This depends on the instructions and on the particular target processor or target system and can be very difficult for complex processors with several levels of cache, external memory banks, busses with several competing masters, and so on.

The target processors that Bound-T currently supports are simple enough that most instructions have a fixed execution time — a fixed number of clock cycles. The time can depend on the kind of instruction, for example a load instruction or an addition instruction, and on the kinds of the operands, for example registers or memory cells, but the execution time usually does not depend on the values of the operands, the address of the instruction, or the earlier history of execution.

For processors with pipelines, such as the SPARC/ERC32 or the ADSP-21020 (SHARC), Bound-T models the pipeline and includes the effect of pipeline "stalls" (interlocks) in the timing. This is aided by Bound-T's unusual view of control-flow graphs. In the usual view, each instruction occurs once in the flow-graph, and the flow-graph is actually a graph of the possible values of the Program Counter. Each PC value identifies a single instruction. In Bound-T, however, the flow-graph is a graph of the possible states of the pipeline, and a pipeline state (and thus a flow-graph node) identifies a set or sequence of instructions, one in each stage of the pipeline. This makes it easier to locate the points in the flow-graph where the pipeline may stall.

For processors that support external memory, Bound-T may let the user choose a number of wait-state cycles to be assumed for memory accesses. Some versions of Bound-T allow different wait-state choices for different sorts of memory accesses: code fetch, stack data read or write, other data read or write.

At present, Bound-T's most complex model of instruction timing is used for the SPARC/ERC32 target. This processor has separate computation units for integer computations and for floating-point computations — the Integer Unit (IU) and Floating-Point Unit (FPU). The IU and FPU can run in parallel. When the processor fetches and decodes a floating-point instruction, the FPU starts to execute this instruction. Meanwhile, the IU can fetch, decode, and execute any number of integer instructions. When the IU encounters a new FPU instruction, or an instruction that needs the results of the earlier FPU instruction, the IU stalls until the FPU has finished the earlier instruction. Bound-T models the worst-case effect of this IU/FPU synchronisation by distributing stall delays in the flow-graph according to an heuristic but safe procedure, guided by an ILP optimisation.

For any target processor, the end result of this processing phase in Bound-T is an upper bound on the execution time of each basic block and each control-flow edge in the flow-graphs.

Calculating the overall bounds

Bound-T uses the Implicit Path Enumeration Technique (IPET) to compute an upper bound on the WCET of each subprogram, once all loops have iteration bounds and the execution times of all basic blocks and control-edge edges are bounded. IPET uses an Integer Linear Programming formulation to maximise the execution time bound over all possible execution paths (see the TECS survey paper for references on IPET and the Wikipedia entry for more about ILP).

IPET considers the total execution time of a flow-graph as a function of the number of times each part of the flow-graph (each basic block and each edge) is executed. Let x_p stand for the number of times the flow-graph part p is executed. Let t_p stand for the WCET bound on part p. An upper bound T(x) on the total execution time is then given by the sum of the products t_p·x_p over all parts p. This sum T(x) is a linear function of the non-negative integer variables x_p.

The problem is thus to find the maximum value of T(x) over all values of x that are possible for one execution of the flow-graph. The x variables are constrained by the structure of the flow-graph and by the results of the control-flow analysis. Let the entry point of the flow-graph be the target of a notional "entry edge" e that represents a call to the subprogram, and let each return point in the flow-graph be the source of a notional "return edge" that represents a return from the subprogram. Then:

x_p > 0 for all p.
x_e = 1.
The sum x_r over all return edges r is 1.
For every basic block b: x_b equals the sum of x_in over all edges in that enter b, and also equals the sum of x_out over all edges out that leave b.
For every loop in the flow-graph, the upper bound on the number of loop iterations implies a constraint on the number of times the loop body is executed, relative to the number of times the loop is started.
Other constraints on x may be derived by flow analysis (infeasible paths, for example) or come from user-written assertions that constrain the number of executions of specific parts of the flow-graph, for example calls to other subprograms.

Given the expression for T(x) and the affine constraints on x an ILP solver like LP_Solve can compute the maximum of T(x), which is the desired overall WCET bound. The ILP solver also gives a value of x that reaches this maximum, which is not necessarily unique, but gives some idea of the contribution of each part of the flow-graph to the overall WCET bound.

Bound-T applies IPET to each subprogram's flow-graph separately, traversing the call-graph in a bottom-up direction from the leaf subprograms to the root subprogram.

Applying IPET per subprogram, and not to the whole extended flow-graph at once, has the advantage of keeping the ILP problems small, but the disadvantage of preventing any inter-subprogram flow constraints. In partial mitigation, Bound-T performs context-specific analyses for subprograms. Thus, the loop bounds in a subprogram can depend on its parameters, and Bound-T can find different loop bounds and different WCET bounds for this subprogram in different contexts (different call paths).

Calculating an overall stack-usage bound from the stack-usage bounds of each subprogram is much simpler. Again proceeding bottom-up in the call-graph, Bound-T computes the stack usage for a subprogram call by adding the local stack height at the call instruction to the total stack usage of the callee. The total stack usage of a non-leaf (calling) subprogram is computed as the maximum of its local stack height and the stack usages of its calls. When the root subprogram is reached we have both a stack usage bound for the root, including its callees, and a call-path that reaches this stack-usage (if each subprogram on the call path reaches its local worst-case stack usage).

Stack usage can also be context-specific, although this is much rarer than contex-specific WCET bounds.

Implementation of Bound-T

As in architecture, so also in implementation is Bound-T a classical computer program. Bound-T is compiled and linked from manually written source code into a monolithic executable file. The only external, dynamically linked or activated components are the operating system libraries and the two supporting tools: the Omega Calculator program (oc) for solving problems in Presburger Arithmetic, and the LP_Solve program (lp_solve) for solving Integer Linear Programming problems.

Programming language

Bound-T is written in the Ada language and compiled with the GNU Ada compiler GNAT. The main development platform is Debian Linux which has excellent Ada support thanks to Ludovic Brenta and others.

Why Ada?

For many, many reasons, both low-level and high-level. To take a fairly trivial low-level point as a first example, analysis and simulation of binary machine code requires well-defined and portable binary integer types, of various bit-widths. Ada is excellent at this, as its modular types can be defined to have exactly the number of bits desired.

On a slightly higher level, one of the main steps in porting Bound-T to a new target processor is to define an internal, abstracted representation of the instruction set: the types of instructions, operands, and other things that make up the instruction set. The strong support in Ada for enumeration types and structured types such as arrays and records is very helpful here. In particular, arrays indexed by enumeration values, and discriminated records with variant forms, are often a close fit to this problem domain.

On an intermediate level, the Ada feature of generic programming lets the programmer write a complex, general algorithm once, and then instantiate it for the particular data types and operations required by a given problem. For example, Bound-T contains a generic package for least-fixpoint iteration, a very common and powerful tool in program analysis. Different instances of this generic package implement different phases of the analysis, such as constant propagation and value-origin analysis.

Finally, to take an example from the highest level, Ada makes a clear separation between the interface of a module, and the implementation of the module. For Bound-T, in particular, this helps to separate the general, target-independent modules from the target-specific modules.

Ada features used

Bound-T relies mostly on the 1995 Ada standard but increasingly uses the features added in the 2005 standard.

Object-oriented coding, in the form of Ada's tagged types, is used more and more. We find it to be a good new way to reduce dependencies between modules without hampering further development and refinement of each module. The exception-handling feature of Ada is also useful here, in addition to its primary (and important) role of catching programming errors and errors or unforeseen properties of the input data.

The stream-oriented Ada I/O packages are exactly what we need for reading compiled and linked executables, for example ELF or COFF files. The ability to override the stream input operations for scalar types lets us support both little-endian and big-endian files very easily, as well as compute check-sums on the fly. For structured types (arrays and records) Ada automatically constructs input operations by combining the input operations of the components of the structured type. This avoids a lot of manual programming and greatly reduces the risk of errors in such code.

One feature of Ada that Bound-T does not yet use is tasking: the ability to divide the program into concurrently executing tasks and thus, for example, make good use of the several CPU cores now often available in workstations. However, static program analysis can be very time consuming — for Bound-T, this holds especially for the Presburger Arithmetic analysis. In the future, we may well use tasking to spread the computations over all the available cores.

Modular architecture

Bound-T relies heavily on modularity through the Ada package feature. Counting both the general parts of Bound-T and the target-specific parts for all current targets, Bound-T contains a total of some 480 Ada packages. These packages can be grouped into four subsets as follows:

Some 150 general, target-independent packages that form the core of Bound-T and are used for every target processor.
Some 80 library-type packages, mainly for reading various kinds of executable files and their debugging information. The libraries needed for a given target processor depend on the executable-file formats commonly used with that processor. For example, programs for the Intel-8051 processor often use a variant of the Intel AOMF-51 format, while programs for ARM7TDMI favour the ELF format.
About 10 library-type packages that encapsulate functions of the operating system of the host PC (workstation) on which Bound-T runs: Linux, MS-Windows, Mac OS-X.
Typically about 40 target-specific packages for each target processor.

A separate Bound-T executable is compiled for each target processor. Target-specific packages define fundamental data types for the target processor, such as the code- and data-address types, the structure of the pipeline state that identifies nodes in the flow-graphs, and the set of data storage cells. These target-specific types and their operations are used in the general, target-independent packages that build flow-graphs, analyse arithmetic expressions of data cells, and so on.

Portability

Source-level portability to the most common host PC (workstation) types is an important goal for Bound-T. The Ada language has worked very well for this: only a very small part of Bound-T is host-specific. The main host-specific code implements the child-process creation and communication through which Bound-T uses its support programs: the Omega Calculator and LP_Solve. This host-specific code exists in two versions, one for Unix/Linux/Mac and one for MS-Windows. Each version consists of about 300 Ada declarations and statements, an insignificant amount compared to the 51,000 declarations and statements in the core Bound-T (the general, target-independent packages and the library packages) which is portable to all these hosts. Not to mention a similar amount of host-portable but target-specific code.

Grateful acknowledgements

Bound-T was first developed at Space Systems Finland Ltd (SSF) with support from the European Space Agency (ESA/ESTEC). Free software has played an important role: we are grateful to Ada Core Technology and GNU for the GNAT compiler, to William Pugh and others for the Omega system, to Michel Berkelaar for the LP_Solve program, to Mats Weber, Ted Dennison and EPFL-DI-LGL for Ada programming component libraries, and to Marc Criley for the XML EZ_Out package that helps to generate XML outputs. Call-graphs and flow-graphs from Bound-T are displayed with the dot tool from AT&T Bell Laboratories.