7. Improving Performance

This chapter presents several topics related to program performance. It first describes some of the tradeoffs that need to be considered and some of the techniques for making your program run faster. It then documents the gnatelim tool and unused subprogram/data elimination feature, which can reduce the size of program executables.

7.1 Performance Considerations

7.2 Text_IO Suggestions

7.3 Reducing Size of Ada Executables with gnatelim

7.4 Reducing Size of Executables with Unused Subprogram/Data Elimination

7.1 Performance Considerations

The GNAT system provides a number of options that allow a trade-off between

- performance of the generated code
- speed of compilation
- minimization of dependences and recompilation
- the degree of run-time checking.

The defaults (if no options are selected) aim at improving the speed of compilation and minimizing dependences, at the expense of performance of the generated code:

- no optimization
- no inlining of subprogram calls
- all run-time checks enabled except overflow and elaboration checks

These options are suitable for most program development purposes. This chapter describes how you can modify these choices, and also provides some guidelines on debugging optimized code.

7.1.1 Controlling Run-Time Checks

7.1.2 Use of Restrictions

7.1.3 Optimization Levels

7.1.4 Debugging Optimized Code

7.1.5 Inlining of Subprograms

7.1.6 Vectorization of loops

7.1.7 Other Optimization Switches

7.1.8 Optimization and Strict Aliasing

7.1.1 Controlling Run-Time Checks

By default, GNAT generates all run-time checks, except integer overflow checks, stack overflow checks, and checks for access before elaboration on subprogram calls. The latter are not required in default mode, because all necessary checking is done at compile time. Two GNAT switches, `-gnatp' and `-gnato', allow this default to be modified. See section 3.2.6 Run-Time Checks.

Our experience is that the default is suitable for most development purposes.

We treat integer overflow checks specially because they are quite expensive and, in our experience, are not as important as other run-time checks in the development process. Note that division by zero is not considered an overflow check, and divide by zero checks are generated where required by default.

Elaboration checks are off by default, and also not needed by default, since GNAT uses a static elaboration analysis approach that avoids the need for run-time checking. This manual contains a full chapter discussing the issue of elaboration checks, and if the default is not satisfactory for your use, you should read this chapter.

For validity checks, the minimal checks required by the Ada Reference Manual (for case statements and assignments to array elements) are on by default. These can be suppressed by use of the `-gnatVn' switch. Note that in Ada 83, there were no validity checks, so if the Ada 83 mode is acceptable (or when comparing GNAT performance with an Ada 83 compiler), it may be reasonable to routinely use `-gnatVn'. Validity checks are also suppressed entirely if `-gnatp' is used.

Note that the setting of the switches controls the default setting of the checks. They may be modified using either pragma Suppress (to remove checks) or pragma Unsuppress (to add back suppressed checks) in the program source.
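
As a minimal sketch of how these pragmas adjust the switch defaults from within the source, the following hypothetical procedure suppresses index checks in one subprogram and adds back overflow checks in another. The names Checks_Demo, Sum, and Checked_Add are illustrative only and are not part of GNAT.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Checks_Demo is

   type Vector is array (1 .. 8) of Integer;

   --  Index checks are suppressed inside Sum only. The loop index
   --  always lies in V'Range, so the check is redundant here.
   function Sum (V : Vector) return Integer is
      pragma Suppress (Index_Check);
      Total : Integer := 0;
   begin
      for I in V'Range loop
         Total := Total + V (I);
      end loop;
      return Total;
   end Sum;

   --  Overflow checks are added back inside Checked_Add, whatever
   --  default the compilation switches selected for the unit.
   function Checked_Add (A, B : Integer) return Integer is
      pragma Unsuppress (Overflow_Check);
   begin
      return A + B;
   end Checked_Add;

begin
   Put_Line (Integer'Image (Sum ((others => 1))));
   Put_Line (Integer'Image (Checked_Add (2, 3)));
end Checks_Demo;
```

Placing each pragma in the declarative part of a single subprogram keeps its effect local, so the rest of the unit still follows the defaults chosen on the command line.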

7.1.2 Use of Restrictions

The use of pragma Restrictions allows you to control which features are permitted in your program. Apart from the obvious point that avoiding relatively expensive features like finalization (enforceable by the use of pragma Restrictions (No_Finalization)) removes their cost, the use of this pragma does not affect the generated code in most cases.

One notable exception to this rule is that the possibility of task abort results in some distributed overhead, particularly if finalization or exception handlers are used. The reason is that certain sections of code have to be marked as non-abortable.

If you use neither the abort statement nor asynchronous transfer of control (select ... then abort), then this distributed overhead is removed, which may improve overall performance. Code that makes frequent use of tasking constructs and controlled types benefits the most. The relevant restrictions pragmas are

pragma Restrictions (No_Abort_Statements);
pragma Restrictions (Max_Asynchronous_Select_Nesting => 0);

It is recommended that these restriction pragmas be used if possible. Note that this also means that you can write code without worrying about the possibility of an immediate abort at any point.

7.1.3 Optimization Levels

Without any optimization option, the compiler's goal is to reduce the cost of compilation and to make debugging produce the expected results. Statements are independent: if you stop the program with a breakpoint between statements, you can then assign a new value to any variable or change the program counter to any other statement in the subprogram and get exactly the results you would expect from the source code.

Turning on optimization makes the compiler attempt to improve the performance and/or code size at the expense of compilation time and possibly the ability to debug the program.

If you use multiple -O options, with or without level numbers, the last such option is the one that is effective.

The default is optimization off. This results in the fastest compile times, but GNAT makes absolutely no attempt to optimize, and the generated programs are considerably larger and slower than when optimization is enabled. You can use the `-O' switch (the permitted forms are `-O0', `-O1', `-O2', `-O3', and `-Os') to gcc to control the optimization level:

`-O0'

No optimization (the default); generates unoptimized code but has the fastest compilation time.

Note that many other compilers do fairly extensive optimization even if "no optimization" is specified. With gcc, it is very unusual to use -O0 for production if execution time is of any concern, since -O0 really does mean no optimization at all. This difference between gcc and other compilers should be kept in mind when doing performance comparisons.

`-O1'

Moderate optimization; optimizes reasonably well but does not degrade compilation time significantly.

`-O2'

Full optimization; generates highly optimized code and has the slowest compilation time.

`-O3'

Full optimization as in `-O2'; also uses more aggressive automatic inlining of subprograms within a unit (see section 7.1.5 Inlining of Subprograms) and attempts to vectorize loops.

`-Os'

Optimize space usage (code and data) of resulting program.

Higher optimization levels perform more global transformations on the program and apply more expensive analysis algorithms in order to generate faster and more compact code. The price in compilation time, and the resulting improvement in execution time, both depend on the particular application and the hardware environment. You should experiment to find the best level for your application.

Since the precise set of optimizations done at each level will vary from release to release (and sometimes from target to target), it is best to think of the optimization settings in general terms. See section `Options That Control Optimization' in Using the GNU Compiler Collection (GCC), for details about the `-O' settings and a number of `-f' options that individually enable or disable specific optimizations.

Unlike some other compilation systems, gcc has been tested extensively at all optimization levels. There are some bugs which appear only with optimization turned on, but there have also been bugs which show up only in unoptimized code. Selecting a lower level of optimization does not improve the reliability of the code generator, which in practice is highly reliable at all optimization levels.

Note regarding the use of `-O3': The use of this optimization level is generally discouraged with GNAT, since it often results in larger executables which may run more slowly. See further discussion of this point in 7.1.5 Inlining of Subprograms.

7.1.4 Debugging Optimized Code

Although it is possible to do a reasonable amount of debugging at nonzero optimization levels, the higher the level the more likely that source-level constructs will have been eliminated by optimization. For example, if a loop is strength-reduced, the loop control variable may be completely eliminated and thus cannot be displayed in the debugger. This can only happen at `-O2' or `-O3'. Explicit temporary variables that you code might be eliminated at level `-O1' or higher.

The use of the `-g' switch, which is needed for source-level debugging, affects the size of the program executable on disk, and indeed the debugging information can be quite large. However, it has no effect on the generated code (and thus does not degrade performance).

Since the compiler generates debugging tables for a compilation unit before it performs optimizations, the optimizing transformations may invalidate some of the debugging data. You therefore need to anticipate certain anomalous situations that may arise while debugging optimized code. These are the most common cases:

- The "hopping Program Counter": Repeated step or next commands show the PC bouncing back and forth in the code. This may result from any of the following optimizations:
  - Common subexpression elimination: using a single instance of code for a quantity that the source computes several times. As a result you may not be able to stop on what looks like a statement.
  - Invariant code motion: moving an expression that does not change within a loop, to the beginning of the loop.
  - Instruction scheduling: moving instructions so as to overlap loads and stores (typically) with other code, or in general to move computations of values closer to their uses. Often this causes you to pass an assignment statement without the assignment happening and then later bounce back to the statement when the value is actually needed. Placing a breakpoint on a line of code and then stepping over it may, therefore, not always cause all the expected side-effects.
- The "big leap": More commonly known as cross-jumping, in which two identical pieces of code are merged and the program counter suddenly jumps to a statement that is not supposed to be executed, simply because it (and the code following) translates to the same thing as the code that was supposed to be executed. This effect is typically seen in sequences that end in a jump, such as a goto, a return, or a break in a C switch statement.
- The "roving variable": The symptom is an unexpected value in a variable. There are various reasons for this effect:
  - In a subprogram prologue, a parameter may not yet have been moved to its "home".
  - A variable may be dead, and its register re-used. This is probably the most common cause.
  - As mentioned above, the assignment of a value to a variable may have been moved.
  - A variable may be eliminated entirely by value propagation or other means. In this case, GCC may incorrectly generate debugging information for the variable.

In general, when an unexpected value appears for a local variable or parameter you should first ascertain if that value was actually computed by your program, as opposed to being incorrectly reported by the debugger. Record fields or array elements in an object designated by an access value are generally less of a problem, once you have ascertained that the access value is sensible. Typically, this means checking variables in the preceding code and in the calling subprogram to verify that the value observed is explainable from other values (one must apply the procedure recursively to those other values); or re-running the code and stopping a little earlier (perhaps before the call) and stepping to better see how the variable obtained the value in question; or continuing to step from the point of the strange value to see if code motion had simply moved the variable's assignments later.

In light of such anomalies, a recommended technique is to use `-O0' early in the software development cycle, when extensive debugging capabilities are most needed, and then move to `-O1' and later `-O2' as the debugger becomes less critical. Whether to use the `-g' switch in the release version is a release management issue. Note that if you use `-g' you can then use the strip program on the resulting executable, which removes both debugging information and global symbols.

7.1.5 Inlining of Subprograms

A call to a subprogram in the current unit is inlined if all the following conditions are met:

- The optimization level is at least `-O1'.
- The called subprogram is suitable for inlining: It must be small enough and not contain something that gcc cannot support in inlined subprograms.
- Any one of the following applies: pragma Inline is applied to the subprogram and the `-gnatn' switch is specified; the subprogram is local to the unit and called once from within it; the subprogram is small and optimization level `-O2' is specified; optimization level `-O3' is specified.

Calls to subprograms in with'ed units are normally not inlined. To achieve actual inlining (that is, replacement of the call by the code in the body of the subprogram), the following conditions must all be true.

- The optimization level is at least `-O1'.
- The called subprogram is suitable for inlining: It must be small enough and not contain something that gcc cannot support in inlined subprograms.
- The call appears in a body (not in a package spec).
- There is a pragma Inline for the subprogram.
- The `-gnatn' switch is used on the command line.

Even if all these conditions are met, it may not be possible for the compiler to inline the call, due to the length of the body, or features in the body that make it impossible for the compiler to do the inlining.

Note that specifying the `-gnatn' switch causes additional compilation dependencies. Consider the following:

package R is
   procedure Q;
   pragma Inline (Q);
end R;

package body R is
   ...
end R;

with R;
procedure Main is
begin
   ...
   R.Q;
end Main;

With the default behavior (no `-gnatn' switch specified), the compilation of the Main procedure depends only on its own source, `main.adb', and the spec of the package in file `r.ads'. This means that editing the body of R does not require recompiling Main.

On the other hand, the call R.Q is not inlined under these circumstances. If the `-gnatn' switch is present when Main is compiled, the call will be inlined if the body of Q is small enough, but now Main depends on the body of R in `r.adb' as well as on the spec. This means that if this body is edited, the main program must be recompiled. Note that this extra dependency occurs whether or not the call is in fact inlined by gcc.

The use of front end inlining with `-gnatN' generates similar additional dependencies.

Note: The `-fno-inline' switch can be used to prevent all inlining. This switch overrides all other conditions and ensures that no inlining occurs. The extra dependences resulting from `-gnatn' will still be active, even if this switch is used to suppress the resulting inlining actions.

Note: The `-fno-inline-functions' switch can be used to prevent automatic inlining of subprograms if `-O3' is used.

Note: The `-fno-inline-small-functions' switch can be used to prevent automatic inlining of small subprograms if `-O2' is used.

Note: The `-fno-inline-functions-called-once' switch can be used to prevent inlining of subprograms local to the unit and called once from within it if `-O1' is used.

Note regarding the use of `-O3': There is no difference in inlining behavior between `-O2' and `-O3' for subprograms with an explicit pragma Inline assuming the use of `-gnatn' or `-gnatN' (the switches that activate inlining). If you have used pragma Inline in appropriate cases, then it is usually much better to use `-O2' and `-gnatn' and avoid the use of `-O3' which in this case only has the effect of inlining subprograms you did not think should be inlined. We often find that the use of `-O3' slows down code by performing excessive inlining, leading to increased instruction cache pressure from the increased code size. So the bottom line here is that you should not automatically assume that `-O3' is better than `-O2', and indeed you should use `-O3' only if tests show that it actually improves performance.

7.1.6 Vectorization of loops

You can take advantage of the auto-vectorizer present in the gcc back end to vectorize loops with GNAT. The corresponding command line switch is `-ftree-vectorize' but, as it is enabled by default at `-O3' and other aggressive optimizations helpful for vectorization also are enabled by default at this level, using `-O3' directly is recommended.

You also need to make sure that the target architecture features a supported SIMD instruction set. For example, for the x86 architecture, you should at least specify `-msse2' to get significant vectorization (but you don't need to specify it for x86-64 as it is part of the base 64-bit architecture). Similarly, for the PowerPC architecture, you should specify `-maltivec'.

The preferred loop form for vectorization is the for iteration scheme. Loops with a while iteration scheme can also be vectorized if they are very simple, but the vectorizer will quickly give up otherwise. With either iteration scheme, the flow of control must be straight, in particular no exit statement may appear in the loop body. The loop may however contain a single nested loop, if it can be vectorized when considered alone:

A : array (1..4, 1..4) of Long_Float;
S : array (1..4) of Long_Float;

procedure Sum is
begin
   for I in A'Range(1) loop
      for J in A'Range(2) loop
         S (I) := S (I) + A (I, J);
      end loop;
   end loop;
end Sum;

The vectorizable operations depend on the targeted SIMD instruction set, but the adding and some of the multiplying operators are generally supported, as well as the logical operators for modular types. Note that, in the former case, enabling overflow checks, for example with `-gnato', totally disables vectorization. The other checks are not supposed to have the same definitive effect, although compiling with `-gnatp' might well reveal cases where some checks do thwart vectorization.
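
As an illustration of the modular case, here is a small sketch of a loop shape that fits the description above: a for iteration scheme, a straight-line body with no exit statement, and a logical operator applied to modular values, which carry no overflow checks. The names Mask_Demo, Apply_Mask, Word, and Word_Array are hypothetical, and whether the loop is actually vectorized depends on the target, the switches, and the compiler version.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Mask_Demo is

   type Word is mod 2 ** 32;
   type Word_Array is array (1 .. 1024) of Word;

   --  Candidate loop: for scheme, no exit, one "and" per element
   procedure Apply_Mask (Data, Mask : Word_Array; Result : out Word_Array) is
   begin
      for I in Result'Range loop
         Result (I) := Data (I) and Mask (I);
      end loop;
   end Apply_Mask;

   Input  : constant Word_Array := (others => 16#FFFF_0000#);
   Bits   : constant Word_Array := (others => 16#00FF_FF00#);
   Output : Word_Array;

begin
   Apply_Mask (Input, Bits, Output);
   Put_Line (Word'Image (Output (Output'First)));
end Mask_Demo;
```

The array type is constrained with static bounds, which is the form for which the compiler has the most information about the iteration space.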

Type conversions may also prevent vectorization if they involve semantics that are not directly supported by the code generator or the SIMD instruction set. A typical example is direct conversion from floating-point to integer types. The solution in this case is to use the following idiom:

Integer (S'Truncation (F))

if S is the subtype of floating-point object F.
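
A short sketch of the idiom inside a loop follows, assuming Long_Float elements; the names Convert_Demo, Float_Array, and Int_Array are illustrative only. Note that the idiom also changes the result: a direct Ada conversion from a floating-point type to an integer type rounds to the nearest integer, whereas this form truncates toward zero, so it is only appropriate where truncation is acceptable.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Convert_Demo is

   type Float_Array is array (1 .. 4) of Long_Float;
   type Int_Array   is array (1 .. 4) of Integer;

   F : constant Float_Array := (1.9, -2.7, 3.2, 4.5);
   N : Int_Array;

begin
   for I in F'Range loop
      --  Truncate explicitly, then convert. A bare Integer (F (I))
      --  would round to the nearest integer instead.
      N (I) := Integer (Long_Float'Truncation (F (I)));
   end loop;

   for I in N'Range loop
      Put_Line (Integer'Image (N (I)));
   end loop;
end Convert_Demo;
```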

In most cases, the vectorizable loops are loops that iterate over arrays. All kinds of array types are supported, i.e. constrained array types with static bounds:

type Array_Type is array (1 .. 4) of Long_Float;

constrained array types with dynamic bounds:

type Array_Type is array (1 .. Q.N) of Long_Float;
type Array_Type is array (Q.K .. 4) of Long_Float;
type Array_Type is array (Q.K .. Q.N) of Long_Float;

or unconstrained array types:

type Array_Type is array (Positive range <>) of Long_Float;

The quality of the generated code decreases when the dynamic aspect of the array type increases, the worst code being generated for unconstrained array types. This is so because, the less information the compiler has about the bounds of the array, the more fallback code it needs to generate in order to fix things up at run time.

You can obtain information about the vectorization performed by the compiler by specifying `-ftree-vectorizer-verbose=N'. For more details of this switch, see section `Options for Debugging Your Program or GCC' in Using the GNU Compiler Collection (GCC).

7.1.7 Other Optimization Switches

Since GNAT uses the gcc back end, all the specialized gcc optimization switches are potentially usable. These switches have not been extensively tested with GNAT but can generally be expected to work. Examples of switches in this category are `-funroll-loops' and the various target-specific `-m' options (in particular, it has been observed that `-march=xxx' can significantly improve performance on appropriate machines). For full details of these switches, see section `Hardware Models and Configurations' in Using the GNU Compiler Collection (GCC).

7.1.8 Optimization and Strict Aliasing

The strong typing capabilities of Ada allow an optimizer to generate efficient code in situations where other languages would be forced to make worst case assumptions preventing such optimizations. Consider the following example:

procedure R is
   type Int1 is new Integer;
   type Int2 is new Integer;
   type Int1A is access Int1;
   type Int2A is access Int2;
   Int1V : Int1A;
   Int2V : Int2A;
   ...

begin
   ...
   for J in Data'Range loop
      if Data (J) = Int1V.all then
         Int2V.all := Int2V.all + 1;
      end if;
   end loop;
   ...
end R;

In this example, since the variable Int1V can only access objects of type Int1, and Int2V can only access objects of type Int2, there is no possibility that the assignment to Int2V.all affects the value of Int1V.all. This means that the compiler optimizer can "know" that the value Int1V.all is constant for all iterations of the loop and avoid the extra memory reference required to dereference it each time through the loop.

This kind of optimization, called strict aliasing analysis, is triggered by specifying an optimization level of `-O2' or higher or `-Os' and allows GNAT to generate more efficient code when access values are involved.

However, although this optimization is always correct in terms of the formal semantics of the Ada Reference Manual, difficulties can arise if features like Unchecked_Conversion are used to break the typing system. Consider the following complete program example:

package p1 is
   type int1 is new integer;
   type int2 is new integer;
   type a1 is access int1;
   type a2 is access int2;
end p1;

with p1; use p1;
package p2 is
   function to_a2 (Input : a1) return a2;
end p2;

with Unchecked_Conversion;
package body p2 is
   function to_a2 (Input : a1) return a2 is
      function to_a2u is
        new Unchecked_Conversion (a1, a2);
   begin
      return to_a2u (Input);
   end to_a2;
end p2;

with p2; use p2;
with p1; use p1;
with Text_IO; use Text_IO;
procedure m is
   v1 : a1 := new int1;
   v2 : a2 := to_a2 (v1);
begin
   v1.all := 1;
   v2.all := 0;
   put_line (int1'image (v1.all));
end;

This program prints out 0 in `-O0' or `-O1' mode, but it prints out 1 in `-O2' mode. That's because in strict aliasing mode, the compiler can and does assume that the assignment to v2.all could not affect the value of v1.all, since different types are involved.

This behavior is not a case of non-conformance with the standard, since the Ada RM specifies that an unchecked conversion where the resulting bit pattern is not a correct value of the target type can result in an abnormal value and attempting to reference an abnormal value makes the execution of a program erroneous. That's the case here since the result does not point to an object of type int2. This means that the effect is entirely unpredictable.

However, although that explanation may satisfy a language lawyer, in practice an applications programmer expects an unchecked conversion involving pointers to create true aliases and the behavior of printing 1 seems plain wrong. In this case, the strict aliasing optimization is unwelcome.

Indeed the compiler recognizes this possibility, and the unchecked conversion generates a warning:

p2.adb:5:07: warning: possible aliasing problem with type "a2"
p2.adb:5:07: warning: use -fno-strict-aliasing switch for references
p2.adb:5:07: warning: or use "pragma No_Strict_Aliasing (a2);"

Unfortunately the problem is recognized when compiling the body of package p2, but the actual "bad" code is generated while compiling the body of m and this latter compilation does not see the suspicious Unchecked_Conversion.

As implied by the warning message, there are approaches you can use to avoid the unwanted strict aliasing optimization in a case like this.

One possibility is to simply avoid the use of `-O2', but that is a bit drastic, since it throws away a number of useful optimizations that do not involve strict aliasing assumptions.

A less drastic approach is to compile the program using the option `-fno-strict-aliasing'. Actually it is only the unit containing the dereferencing of the suspicious pointer that needs to be compiled. So in this case, if we compile unit m with this switch, then we get the expected value of zero printed. Analyzing which units might need the switch can be painful, so a more reasonable approach is to compile the entire program with options `-O2' and `-fno-strict-aliasing'. If the performance is satisfactory with this combination of options, then the advantage is that the entire issue of possible "wrong" optimization due to strict aliasing is avoided.

To avoid the use of compiler switches, the configuration pragma No_Strict_Aliasing with no parameters may be used to specify that for all access types, the strict aliasing optimization should be suppressed.
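
For instance, assuming the program is built with the GNAT configuration file `gnat.adc' in the current directory (an assumption about the build setup, not something required by the pragma), the file would contain the parameterless form:

```ada
--  gnat.adc: configuration pragmas applied to every unit compiled here.
--  Suppresses the strict aliasing optimization for all access types.
pragma No_Strict_Aliasing;
```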

However, these approaches are still overkill, in that they cause all manipulations of all access values to be deoptimized. A more refined approach is to concentrate attention on the specific access type identified as problematic.

First, if a careful analysis of uses of the pointer shows that there are no possible problematic references, then the warning can be suppressed by bracketing the instantiation of Unchecked_Conversion to turn the warning off:

pragma Warnings (Off);
function to_a2u is new Unchecked_Conversion (a1, a2);
pragma Warnings (On);

Of course that approach is not appropriate for this particular example, since indeed there is a problematic reference. In this case we can take one of two other approaches.

The first possibility is to move the instantiation of unchecked conversion to the unit in which the type is declared. In this example, we would move the instantiation of Unchecked_Conversion from the body of package p2 to the spec of package p1. Now the warning disappears. That's because any use of the access type knows there is a suspicious unchecked conversion, and the strict aliasing optimization is automatically suppressed for the type.
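
One way to picture this rearrangement, as a sketch based on the example above rather than a definitive layout, is the following pair of units. The spec of p2 and the main procedure m are assumed to stay as they were.

```ada
with Unchecked_Conversion;
package p1 is
   type int1 is new integer;
   type int2 is new integer;
   type a1 is access int1;
   type a2 is access int2;

   --  Instantiation moved here from the body of p2, so every unit
   --  that sees type a2 also sees the suspicious conversion.
   function to_a2u is new Unchecked_Conversion (a1, a2);
end p1;

--  The body of p2 no longer needs its own instantiation; it relies
--  on the "with p1; use p1;" context clause of the p2 spec.
package body p2 is
   function to_a2 (Input : a1) return a2 is
   begin
      return to_a2u (Input);
   end to_a2;
end p2;
```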

If it is not practical to move the unchecked conversion to the same unit in which the destination access type is declared (perhaps because the source type is not visible in that unit), you may use pragma No_Strict_Aliasing for the type. This pragma must occur in the same declarative sequence as the declaration of the access type:

type a2 is access int2;
pragma No_Strict_Aliasing (a2);

Here again, the compiler now knows that the strict aliasing optimization should be suppressed for any reference to type a2 and the expected behavior is obtained.

Finally, note that although the compiler can generate warnings for simple cases of unchecked conversions, there are trickier and more indirect ways of creating type incorrect aliases which the compiler cannot detect. Examples are the use of address overlays and unchecked conversions involving composite types containing access types as components. In such cases, no warnings are generated, but there can still be aliasing problems. One safe coding practice is to forbid the use of address clauses for type overlaying, and to allow unchecked conversion only for primitive types. This is not really a significant restriction since any possible desired effect can be achieved by unchecked conversion of access values.

The aliasing analysis done in strict aliasing mode can certainly have significant benefits. We have seen cases of large scale application code where the execution time is increased by up to 5% by turning this optimization off. If you have code that includes significant usage of unchecked conversion, you might want to just stick with `-O1' and avoid the entire issue. If you get adequate performance at this optimization level, that's probably the safest approach. If tests show that you really need higher levels of optimization, then you can experiment with `-O2' and `-O2 -fno-strict-aliasing' to see how much effect this has on size and speed of the code. If you really need to use `-O2' with strict aliasing in effect, then you should review any uses of unchecked conversion of access types, particularly if you are getting the warnings described above.

7.2 `Text_IO` Suggestions

The Ada.Text_IO package has fairly high overheads due in part to the requirement of maintaining page and line counts. If performance is critical, a recommendation is to use Stream_IO instead of Text_IO for volume output, since this package has less overhead.

If Text_IO must be used, note that by default output to the standard output and standard error files is unbuffered (this provides better behavior when output statements are used for debugging, or if the progress of a program is observed by tracking the output, e.g. by using the Unix tail -f command to watch redirected output).

If you are generating large volumes of output with Text_IO and performance is an important factor, use a designated file instead of the standard output file, or change the standard output file to be buffered using Interfaces.C_Streams.setvbuf.
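
The buffering change might look like the sketch below. It assumes that Interfaces.C_Streams declares setvbuf with the parameters (stream, buffer, mode, size), a stdout function, and an IOFBF mode value, as in the GNAT run-time sources we are aware of; check the spec shipped with your compiler before relying on it. The procedure name Buffered_Output and the 64 KB buffer size are arbitrary choices.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Interfaces.C_Streams;
with System;

procedure Buffered_Output is

   package CS renames Interfaces.C_Streams;

   Status : CS.int;

begin
   --  Must be done before the first output to standard output.
   --  A null buffer address asks the C library to allocate the buffer.
   Status := CS.setvbuf (CS.stdout, System.Null_Address, CS.IOFBF, 65_536);

   if Integer (Status) /= 0 then
      Put_Line (Standard_Error, "setvbuf failed; output stays unbuffered");
   end if;

   for I in 1 .. 100_000 loop
      Put_Line ("line" & Integer'Image (I));
   end loop;
end Buffered_Output;
```

With full buffering in effect, output no longer appears line by line as it is produced, which is the trade-off described above for debugging and for watching redirected output.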

7.3 Reducing Size of Ada Executables with `gnatelim`

This section describes gnatelim, a tool which detects unused subprograms and helps the compiler to create a smaller executable for your program.

7.3.1 About gnatelim

7.3.2 Running gnatelim

7.3.3 Processing Precompiled Libraries

7.3.4 Correcting the List of Eliminate Pragmas

7.3.5 Making Your Executables Smaller

7.3.6 Summary of the gnatelim Usage Cycle

7.3.1 About `gnatelim`

When a program shares a set of Ada packages with other programs, it may happen that this program uses only a fraction of the subprograms defined in these packages. The code created for these unused subprograms increases the size of the executable.

gnatelim tracks unused subprograms in an Ada program and outputs a list of GNAT-specific pragmas Eliminate marking all the subprograms that are declared but never called. By placing the list of Eliminate pragmas in the GNAT configuration file `gnat.adc' and recompiling your program, you may decrease the size of its executable, because the compiler will not generate the code for 'eliminated' subprograms. See section `Pragma Eliminate' in GNAT Reference Manual, for more information about this pragma.

gnatelim needs as its input data the name of the main subprogram.

If a set of source files is specified as gnatelim arguments, it treats these files as a complete set of sources making up a program to analyse, and analyses only these sources.

After a full successful build of the main subprogram, gnatelim can be called without specifying sources to analyse; in this case it computes the source closure of the main unit from the `ALI' files.

The following command will create the set of `ALI' files needed for gnatelim:

$ gnatmake -c Main_Prog

Note that gnatelim does not need object files.

7.3.2 Running `gnatelim`

gnatelim has the following command-line interface:

$ gnatelim [switches] -main=main_unit_name {filename} [-cargs gcc_switches]

main_unit_name should be the name of the source file that contains the main subprogram of a program (partition).

Each filename is the name (including the extension) of a source file to process. "Wildcards" are allowed, and the file name may contain path information.

`gcc_switches' is a list of switches for gcc. They will be passed on to all compiler invocations made by gnatelim to generate the ASIS trees. Here you can provide `-I' switches to form the source search path, use the `-gnatec' switch to set the configuration file, use the `-gnat05' switch if sources should be compiled in Ada 2005 mode etc.

gnatelim has the following switches:

`-files=filename': Take the argument source files from the specified file. This file should be an ordinary text file containing file names separated by spaces or line breaks. You can use this switch more than once in the same call to gnatelim. You also can combine this switch with an explicit list of files.
`-log': Duplicate all the output sent to `stderr' into a log file. The log file is named `gnatelim.log' and is located in the current directory.
`-log=filename': Duplicate all the output sent to `stderr' into a specified log file.
`--no-elim-dispatch': Do not generate pragmas for dispatching operations.
`--ignore=filename': Do not generate pragmas for subprograms declared in the sources listed in the specified file.
`-o=report_file': Put gnatelim output into the specified file. If this file already exists, it is overwritten. If this switch is not used, gnatelim outputs its results into `stderr'.
`-q': Quiet mode: by default gnatelim outputs to the standard error stream the number of program units left to be processed. This option turns this trace off.
`-t': Print out execution time.
`-v': Verbose mode: gnatelim version information is printed as Ada comments to the standard output stream. Also, in addition to the number of program units left gnatelim will output the name of the current unit being processed.
`-wq': Quiet warning mode: some warnings are suppressed. In particular, warnings indicating that the analysed set of sources is insufficient to make up a partition, or that some subprogram bodies are missing, are not generated.

Note: to invoke gnatelim with a project file, use the gnat driver (see 12.2 The GNAT Driver and Project Files).
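
As an illustration of how the switches above combine, the following sketch prepares a list file for `-files=' and assembles a gnatelim command line. The project layout, the file names (`main.adb', `src/aux.ads', `src/aux.adb', `sources.txt', `elim.out') and the choice of switches are assumptions made for this example only. The command is printed rather than executed, so the sketch can be tried without a GNAT installation.

```shell
# Scratch directory standing in for a project (hypothetical layout).
proj=$(mktemp -d)
mkdir "$proj/src"
touch "$proj/main.adb" "$proj/src/aux.ads" "$proj/src/aux.adb"
cd "$proj"

# Write one file name per line; -files= accepts names separated
# by spaces or line breaks.
printf '%s\n' main.adb src/*.adb src/*.ads > sources.txt

# Assemble the call from the switches described above:
#   -q      suppress the progress trace
#   -main=  source file containing the main subprogram
#   -files= take the argument sources from sources.txt
#   -o=     write the gnatelim output to elim.out (hypothetical name)
#   -cargs  pass -Isrc to the compiler to form the source search path
cmd="gnatelim -q -main=main.adb -files=sources.txt -o=elim.out -cargs -Isrc"

# Dry run: show the command instead of running it.
echo "$cmd"
```

To run the tool for real, execute the printed command in the project directory instead of echoing it.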

7.3.3 Processing Precompiled Libraries

If a program uses a precompiled Ada library, gnatelim can process it in the usual way. gnatelim will never generate an Eliminate pragma for a subprogram whose body has not been analysed, which is the typical case for subprograms from precompiled libraries. The `-wq' switch may be used to suppress the warnings about missing source files and non-analysed subprogram bodies that can be generated when processing precompiled Ada libraries.

7.3.4 Correcting the List of Eliminate Pragmas

In some rare cases gnatelim may try to eliminate subprograms that are actually called in the program. In this case, the compiler will generate an error message of the form:

main.adb:4:08: cannot reference subprogram "P" eliminated at elim.out:5

You will need to manually remove the wrong Eliminate pragmas from the configuration file indicated in the error message. After that, recompile your program from scratch, because the configuration files must stay consistent throughout the entire compilation.
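
The manual correction can be partly scripted. The sketch below extracts the file name and line number from an error message of the form shown above and deletes that line from the configuration file, keeping a backup copy. It assumes that each Eliminate pragma occupies exactly one line. The contents of `elim.out' are invented for illustration and are not real gnatelim output.

```shell
# Work in a scratch directory so that no real file is touched.
cd "$(mktemp -d)"

# Hypothetical configuration file; the pragma arguments are illustrative.
cat > elim.out <<'EOF'
--  illustrative list of pragmas
pragma Eliminate (Aux, Unused, Source_Location => "aux.ads:5");
pragma Eliminate (Aux, Helper, Source_Location => "aux.ads:6");
pragma Eliminate (Aux, Extra, Source_Location => "aux.ads:7");
pragma Eliminate (Main, P, Source_Location => "main.adb:2");
EOF

# Compiler message in the form shown above.
msg='main.adb:4:08: cannot reference subprogram "P" eliminated at elim.out:5'

# Split the trailing "file:line" reference into its two parts.
ref=${msg##* eliminated at }
file=${ref%:*}
line=${ref##*:}

# Delete the offending line (assumes one pragma per line), keeping a backup.
cp "$file" "$file.bak"
sed "${line}d" "$file.bak" > "$file"

cat "$file"
```

Inspect the result before rebuilding; a pragma that spans several lines would need to be removed by hand.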

7.3.5 Making Your Executables Smaller

To get a smaller executable for your program, you now have to recompile the program completely with the configuration file containing the Eliminate pragmas generated by gnatelim. If these pragmas are placed in the `gnat.adc' file located in your current directory, just do:

$ gnatmake -f main_prog

(Use the `-f' option for gnatmake to recompile everything with the set of pragmas Eliminate that you have obtained with gnatelim).

Be aware that the set of Eliminate pragmas is specific to each program. It is not recommended to merge sets of Eliminate pragmas created for different programs in one configuration file.

7.3.6 Summary of the `gnatelim` Usage Cycle

Here is a quick summary of the steps to be taken in order to reduce the size of your executables with gnatelim. You may use other GNAT options to control the optimization level, to produce debugging information, to set the search path, etc.

Create a complete set of `ALI' files (if the program has not been built already):
$ gnatmake -c main_prog

Generate a list of Eliminate pragmas in the default configuration file `gnat.adc' in the current directory:
$ gnatelim main_prog >[>] gnat.adc

Recompile the application:
$ gnatmake -f main_prog

7.4 Reducing Size of Executables with Unused Subprogram/Data Elimination

This section describes how you can eliminate unused subprograms and data from your executable just by setting options at compilation time.

7.4.1 About unused subprogram/data elimination

7.4.2 Compilation options

7.4.3 Example of unused subprogram/data elimination

7.4.1 About unused subprogram/data elimination

By default, an executable contains all the code and data of its constituent objects (directly linked or coming from statically linked libraries), even data or code never used by this executable.

Unused subprogram/data elimination allows you to remove such unused code and data from your executable, making it smaller (on disk and in memory).

This functionality is available on all Linux platforms except for the IA-64 architecture, and on all cross platforms using the ELF binary file format. In both cases GNU binutils version 2.16 or later is required to enable it.

7.4.2 Compilation options

The operation of eliminating the unused code and data from the final executable is directly performed by the linker.

In order to do this, the linker has to work with objects compiled with the following options: `-ffunction-sections' `-fdata-sections'. These options are usable with C and Ada files. They place each function or data item, respectively, in its own section in the resulting object file.

Once the objects and static libraries are created with these options, the linker can perform the dead code elimination. You can do this by passing the `-Wl,--gc-sections' option to the gcc command or in the `-largs' section of gnatmake. This will perform a garbage collection of code and data never referenced.

If the linker performs a partial link (`-r' ld linker option), then you will need to provide one or more entry points using the `-e' / `--entry' ld option.

Note that objects compiled without the `-ffunction-sections' and `-fdata-sections' options can still be linked with the executable. However, no dead code elimination will be performed on those objects (they will be linked as is).

The GNAT static library is now compiled with -ffunction-sections and -fdata-sections on some platforms. This allows you to eliminate the unused code and data of the GNAT library from your executable.

7.4.3 Example of unused subprogram/data elimination

Here is a simple example:

with Aux;
procedure Test is
begin
   Aux.Used (10);
end Test;

package Aux is
   Used_Data   : Integer;
   Unused_Data : Integer;

   procedure Used   (Data : Integer);
   procedure Unused (Data : Integer);
end Aux;

package body Aux is
   procedure Used (Data : Integer) is
   begin
      Used_Data := Data;
   end Used;

   procedure Unused (Data : Integer) is
   begin
      Unused_Data := Data;
   end Unused;
end Aux;

Unused and Unused_Data are never referenced in this code excerpt, and hence they may be safely removed from the final executable.

$ gnatmake test

$ nm test | grep used
020015f0 T aux__unused
02005d88 B aux__unused_data
020015cc T aux__used
02005d84 B aux__used_data

$ gnatmake test -cargs -fdata-sections -ffunction-sections \
   -largs -Wl,--gc-sections

$ nm test | grep used
02005350 T aux__used
0201ffe0 B aux__used_data

It can be observed that the procedure Unused and the object Unused_Data are removed by the linker when using the appropriate options.
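
For larger programs, comparing the two symbol listings by eye is impractical. One possible approach, sketched here with standard POSIX tools, is to reduce each nm listing to its sorted symbol names and let comm report the names present only in the first build. The listings below are copied from the example above; the file names `before.syms' and `after.syms' are arbitrary.

```shell
# Symbol listings copied from the two nm runs in the example above.
before='020015f0 T aux__unused
02005d88 B aux__unused_data
020015cc T aux__used
02005d84 B aux__used_data'
after='02005350 T aux__used
0201ffe0 B aux__used_data'

tmp=$(mktemp -d)

# Keep only the symbol name (last field); comm needs sorted input,
# so both files are sorted under the same locale.
printf '%s\n' "$before" | awk '{print $NF}' | LC_ALL=C sort > "$tmp/before.syms"
printf '%s\n' "$after"  | awk '{print $NF}' | LC_ALL=C sort > "$tmp/after.syms"

# comm -23 keeps the lines found only in the first file,
# that is, the symbols removed by the linker.
removed=$(LC_ALL=C comm -23 "$tmp/before.syms" "$tmp/after.syms")
printf '%s\n' "$removed"
```

With real executables, the two here-strings would be replaced by the output of nm on the executable built without and with the elimination options.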

This document was generated by GNAT Mailserver on May, 10 2012 using texi2html
