DivisionByConstElimination
Replace all integer divisions by constants that are powers of two with equivalent shifts, where applicable.
Example:
| Original C Code | Code After Optimization |
| a = b / 16; | a = b>>4; |
| a = b / 4; | a = b>>2; |
| a = b / 2; | a = b>>1; |
Explanation:
Dividing two integers can be an expensive operation in hardware, whereas bit shifts of integers are usually much cheaper. This optimization can therefore significantly speed up modules and systems that divide integers by constants that are powers of two.
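The C sketch below illustrates the rewrite and why it only applies in some cases. It assumes unsigned operands, and the helper name `div_by_pow2` is hypothetical. For unsigned values, `b >> k` equals `b / 2^k` exactly. For negative signed values, C division truncates toward zero while an arithmetic right shift rounds toward negative infinity (and right-shifting a negative value is implementation-defined in C), so a plain shift is not a drop-in replacement there.

```c
#include <assert.h>
#include <stdint.h>

/* Compute b / 2^k with a right shift.
   The rewrite is exact here because b is unsigned. */
static uint32_t div_by_pow2(uint32_t b, unsigned k) {
    assert(k < 32);  /* shifting a 32-bit value by 32 or more is undefined in C */
    return b >> k;
}
```

For example, `div_by_pow2(b, 4)` computes the same value as `b / 16` from the first table row.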
How To Call:
To call this optimization, add it to the selected flags list. No arguments are needed.
MultiplyByConstElimination
Replace all integer multiplications by constants with equivalent shifts and additions.
Example:
| Original C Code | Code After Optimization |
| a = b * 16; | a = b<<4; |
| a = b * 5; | a = (b<<2) + b; |
| a = b * 3 + a * 7; | a = ((b<<1) + b) + ((a<<2) + a + a + a); |
Explanation:
Multiplying two integers can be an expensive operation in hardware, whereas additions and bit shifts of integers are usually much cheaper. This optimization can therefore significantly speed up modules and systems that multiply integers by constants.
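As a sketch of the idea, the C code below splits a constant multiplier into its set bits and adds one shifted copy of the operand per bit. This binary decomposition is an assumption chosen for illustration; the table above shows the tool may pick a different split (for example, 7a as (a<<2) + a + a + a). Because a compiler knows the constant at compile time, it only has to emit the shift and add terms for the bits that are set; the loop here performs the same decomposition at run time purely for demonstration. The helpers `times5` and `times3_plus_7` hand-expand two table rows, with the parentheses C needs because `+` binds tighter than `<<`. All names are hypothetical, and unsigned arithmetic is used so that overflow wraps instead of being undefined.

```c
#include <stdint.h>

/* Multiply b by the constant c using only shifts and additions:
   for every set bit k of c, add (b << k) to the result. */
static uint32_t mul_by_const_shift_add(uint32_t b, uint32_t c) {
    uint32_t result = 0;
    for (unsigned k = 0; c != 0; k++, c >>= 1) {
        if (c & 1u) {
            result += b << k;
        }
    }
    return result;
}

/* Hand-expanded table rows: b * 5 and b * 3 + a * 7. */
static uint32_t times5(uint32_t b) {
    return (b << 2) + b;
}

static uint32_t times3_plus_7(uint32_t b, uint32_t a) {
    return ((b << 1) + b) + ((a << 2) + a + a + a);
}
```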
How To Call:
To call this optimization, add it to the selected flags list. No arguments are needed.
LoopUnrolling
Unroll the loop at the given C label by a given amount. If the loop has constant bounds, it can be fully unrolled.
Note: This optimization is only available for systems.
Example:

Explanation:
Unrolling a loop places several copies of the loop body in each iteration, so more data is accessed and computed in parallel per iteration, which speeds up the hardware. The trade-off is that the generated hardware also grows larger.
This optimization lets you unroll a loop by any amount without modifying the C code. For instance, the example above, which was unrolled once into two loop bodies, could just as easily have been unrolled into five loop bodies by changing the number of loop bodies from 2 to 5 on the optimizations page.
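Since the original example is not reproduced here, the following plain C sketch shows a hypothetical loop labeled `L0` next to a hand-written equivalent of unrolling it into two loop bodies. The array size `N`, the loop body, and the function names are assumptions; LoopUnrolling performs this transformation for you, so the second function only illustrates the resulting structure.

```c
#define N 8

/* Original loop, marked with the C label L0 (any label name works). */
static void scale_original(const int in[N], int out[N]) {
L0:
    for (int i = 0; i < N; i++) {
        out[i] = in[i] * 3;
    }
}

/* Equivalent of unrolling L0 to 2 loop bodies: each iteration
   handles two elements, so half as many iterations are needed.
   Assumes N is even. */
static void scale_unrolled_by_2(const int in[N], int out[N]) {
    for (int i = 0; i < N; i += 2) {
        out[i] = in[i] * 3;
        out[i + 1] = in[i + 1] * 3;
    }
}
```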
How To Call:
First, the loop you want to unroll must be marked with a C label. In the example above, we labeled the loop to be unrolled “L0”; you can choose any label name you want.
Next, on the high-level optimizations page, add LoopUnrolling to the selected flags list. Then, in the arguments at the bottom of the page, enter the C label of the loop you want to unroll. In this example, we entered L0.
Finally, enter the number of loop bodies you want after unrolling. To fully unroll a loop with constant bounds, enter FULLY instead. In this example we entered 2, since there were 2 loop bodies after unrolling.
FullyUnroll
Fully unroll all loops in the program.
Note: This optimization is automatically called on modules. Therefore, you can only choose this on systems.
Example:

Explanation:
FullyUnroll fully unrolls all loops. Any stream array accesses are turned into scalar inputs and outputs instead of remaining streams. Systems that are successfully fully unrolled also become modules.
This is purely a shortcut: the same result could be obtained by applying LoopUnrolling to each loop in your program. Full unrolling is also applied automatically to modules, since modules do not support loops in hardware.
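A minimal sketch, assuming a hypothetical four-element window: the loop with constant bounds becomes straight-line code once fully unrolled, and each array element becomes a separate scalar input. Function names are illustrative only.

```c
/* Before: a loop with constant bounds over a 4-element array. */
static int sum4_loop(const int x[4]) {
    int sum = 0;
    for (int i = 0; i < 4; i++) {
        sum += x[i];
    }
    return sum;
}

/* After full unrolling: no loop remains, and each array access
   has become its own scalar input. */
static int sum4_unrolled(int x0, int x1, int x2, int x3) {
    return x0 + x1 + x2 + x3;
}
```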
How To Call:
Add this flag to the selected optimizations list and it will be called. No arguments are needed.
LoopFusion
Merge successive loops with the same bounds.
Note: This optimization is only available for systems.
Example:

Explanation:
LoopFusion combines any successive loops that have the same bounds. This simplifies control in the hardware, since there are fewer loops to manage. This optimization is also called automatically after any loop unrolling.
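The C sketch below shows two hypothetical successive loops with identical bounds and the single loop they become after fusion. The two bodies are independent of each other, which keeps the fused version trivially equivalent; the names and the bound `N` are assumptions.

```c
#define N 6

/* Before: two successive loops with the same bounds. */
static void before_fusion(const int a[N], int b[N], int c[N]) {
    for (int i = 0; i < N; i++) {
        b[i] = a[i] + 1;
    }
    for (int i = 0; i < N; i++) {
        c[i] = a[i] * 2;
    }
}

/* After: one loop carries both bodies, so only one loop
   counter and one loop controller are needed. */
static void after_fusion(const int a[N], int b[N], int c[N]) {
    for (int i = 0; i < N; i++) {
        b[i] = a[i] + 1;
        c[i] = a[i] * 2;
    }
}
```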
How To Call:
Add this flag to the optimizations list. No loop labels or arguments are needed.
InlineModule
Removes the architectural restrictions imposed by black-box construction, allowing the current optimizations to transform the internals of the inlined module.
Example:

Explanation:
When a module is not inlined, all of its internals are hidden from the calling component. Inlining the module call removes the black-box construction and integrates the module's internals into the calling component. All prior optimizations done on the compiled module are lost; instead, the optimizations done on the calling component now affect the integrated internals of the inlined module. Inlining the module also removes the registers that were needed at its inputs and outputs when it was treated as a black box.
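To illustrate the concept, the sketch below models a module as a plain C function (a hypothetical multiply-accumulate `mac_module`) and shows a calling component before and after inlining. Once the module body is visible, optimizations on the calling component, such as MultiplyByConstElimination on the `* 2`, can reach it. This is a conceptual illustration, not the compiler's actual output.

```c
/* Hypothetical module: a multiply-accumulate unit. */
static int mac_module(int a, int b, int c) {
    return a * b + c;
}

/* Calling component with the module treated as a black box. */
static int component_blackbox(int x, int y) {
    return mac_module(x, 2, y) + mac_module(y, 2, x);
}

/* Same component after inlining: the module body is now part of
   the caller, so caller-level optimizations can rewrite x * 2. */
static int component_inlined(int x, int y) {
    return (x * 2 + y) + (y * 2 + x);
}
```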
How To Call:
Add InlineModule to the selected flags list on the optimizations page. For the argument at the bottom of the page, enter the name of the module you want to inline. All modules with that name in the component you are compiling will be inlined.
InlineAllModules
Inlines all module calls, repeating up to a specified depth.
Example:

Explanation:
Inlines all modules found in a given component. Because inlining can expose new module calls from inside the inlined modules, this is repeated up to depth times, as long as new modules keep being introduced. If the specified depth is INFINITE, modules continue to be inlined until none remain in the compiled component.
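The depth rule can be modeled with a small C sketch. It assumes a hypothetical `Module` type that records which modules each module calls, and counts how many module calls are still left as black boxes after inlining to a given depth, with a negative depth standing in for INFINITE. This models only the behavior described above, not how the compiler represents modules internally.

```c
#define MAX_CALLS 4

/* Hypothetical representation: a module and the modules it calls. */
typedef struct Module {
    const char *name;
    int num_calls;
    const struct Module *calls[MAX_CALLS];
} Module;

/* Number of module calls still present as black boxes after inlining
   `depth` levels into m. A negative depth models INFINITE. */
static int remaining_calls(const Module *m, int depth) {
    int count = 0;
    for (int i = 0; i < m->num_calls; i++) {
        if (depth == 0) {
            count += 1;  /* this call is not inlined */
        } else {
            /* inlined: the callee's own calls are now exposed */
            count += remaining_calls(m->calls[i], depth < 0 ? -1 : depth - 1);
        }
    }
    return count;
}
```

For a component that calls modules A and C, where A itself calls B, a depth of 1 leaves only the newly exposed call to B, while a depth of 2 or INFINITE leaves none.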
How To Call:
Add InlineAllModules to the selected flags list on the optimizations page. For the argument at the bottom of the page, enter how many levels deep modules should be inlined. Entering INFINITE as the depth makes it keep inlining modules until none remain.
Redundancy
Enable dual or triple redundancy for a module at a given C label.
Example:

Explanation:
The user can specify for individual modules to be replicated either two or three times. The data fed into this module is also replicated and a voter is instantiated to collect the results from the redundant modules. If the values match, then the data is passed on to the rest of the circuit. If the values don’t match, then an error flag is raised to the outside world to handle the error.
When two modules have data flow between them and both are specified as redundant, then the voters are also replicated to reduce the chance of a single event upset causing an unrecoverable error.
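The voter behavior described above can be sketched in C. The `VoteResult` type, the `vote_double` and `vote_triple` functions, and `example_module` are hypothetical. The sketch follows the description literally, passing the value on only when all copies match and raising the error flag otherwise; a classic triple-modular-redundancy voter would instead forward the majority value, and whether the generated voter does so is not stated here.

```c
#include <stdbool.h>

/* Voter output: the agreed value plus an error flag for the outside world. */
typedef struct {
    int value;   /* meaningful only when error is false */
    bool error;  /* raised when the redundant copies disagree */
} VoteResult;

/* DOUBLE redundancy: both copies must agree. */
static VoteResult vote_double(int a, int b) {
    VoteResult r = { a, a != b };
    return r;
}

/* TRIPLE redundancy: all three copies must agree. */
static VoteResult vote_triple(int a, int b, int c) {
    VoteResult r = { a, !(a == b && b == c) };
    return r;
}

/* Hypothetical module, replicated three times with replicated inputs. */
static int example_module(int x) {
    return x * x + 1;
}

static VoteResult example_module_triple(int x) {
    return vote_triple(example_module(x), example_module(x), example_module(x));
}
```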
How To Call:
First, the module you want to make redundant must be marked with a C label; you can choose any label name you want. Next, on the high-level optimizations page, add Redundancy to the selected flags list. Then, in the arguments at the bottom of the page, enter the C label of the module as the value.
Lastly, specify whether you want the module call to be DOUBLE or TRIPLE redundant. In the example above, we were using the value TRIPLE.
SystolicArrayGeneration
Transform a wave front algorithm that works over a 2-dimensional array into a one-dimensional hardware structure with feedback at every stage in order to increase throughput while reducing hardware.
Note: This optimization is only available for systems and cannot be combined with other optimizations.
Example:

Explanation:
SystolicArrayGeneration takes a wave front algorithm operating on a two-dimensional array and converts it into hardware consisting of a one-dimensional array of elements that feed back to each other. The original C code must be a doubly nested for loop that computes each element of a two-dimensional array as a function of previously computed elements of that array.
This greatly reduces the hardware size while also increasing the throughput of the system.
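The following C sketch shows the loop shape described above: a doubly nested loop in which each element of a two-dimensional array is computed from previously computed neighbors, with the outer loop carrying a C label. The recurrence (each element is the sum of the element above and the element to the left), the size `N`, and the label name `L0` are assumptions for illustration.

```c
#define N 4

/* Wave front form: A[i][j] depends only on elements computed earlier.
   Row 0 and column 0 hold boundary values set by the caller. */
static void wavefront(int A[N][N]) {
L0:  /* C label on the outer loop; the name is arbitrary */
    for (int i = 1; i < N; i++) {
        for (int j = 1; j < N; j++) {
            A[i][j] = A[i - 1][j] + A[i][j - 1];
        }
    }
}
```

With every boundary element set to 1, this recurrence fills the array with binomial coefficients, so `A[3][3]` ends up as 20.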
How To Call:
Your C code must have the form described in the explanation. Place a C label, with any name you like, on the outer loop of the system. Next, on the high-level optimizations page, add SystolicArrayGeneration to the selected flags list. Then, for the argument at the bottom of the page, enter the C label of the outer loop.
No other high-level optimizations can be used with this optimization.
TemporalCommonSubExpressionElimination (TCSE)
Detect and remove common code across loop iterations to reduce the size of the generated code, requiring initial values for each piece of hardware eliminated.
Note: This optimization is only available for systems.
Example:

Explanation:
Temporal common sub-expression elimination analyzes loops and detects common code across loop iterations. For example, if the same value is calculated in loop iteration 1 and loop iteration 2, this will be detected. When generating hardware, TCSE takes advantage of this by creating feedback variables that eliminate the redundant computations, reducing the hardware size. Every piece of hardware that is eliminated requires an initial value on startup.
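As a hedged illustration, in the hypothetical loop below, iteration i computes `in[i+1] + in[i+2]`, which is exactly the `in[i] + in[i+1]` that iteration i+1 needs. The second function shows the effect TCSE describes: one adder is replaced by a feedback variable, and that variable needs an initial value (`in[0] + in[1]`) on startup. Function names are illustrative, and `in` is assumed to hold at least `n + 2` elements.

```c
/* Before: each iteration computes two sums, one of which the
   next iteration recomputes. */
static void original_loop(const int *in, int *out, int n) {
    for (int i = 0; i < n; i++) {
        out[i] = (in[i] + in[i + 1]) * (in[i + 1] + in[i + 2]);
    }
}

/* After: the repeated sum is carried to the next iteration through
   a feedback variable, which must be initialized on startup. */
static void after_tcse(const int *in, int *out, int n) {
    int fb = in[0] + in[1];  /* initial value for the eliminated adder */
    for (int i = 0; i < n; i++) {
        int next = in[i + 1] + in[i + 2];
        out[i] = fb * next;
        fb = next;  /* feedback into the next iteration */
    }
}
```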
How To Call:
To call TCSE, simply add TemporalCommonSubExpressionElimination to the selected flags list and it is ready to run. No labels or arguments are necessary.