This test involved timing different implementations of functionally identical code. The code consists of an inner and outer ‘for’ loop, which has been implemented as inline code, repeated function calls, recursion, and forking child processes that executed the inner loop. The different implementations were proved to be functionally identical, by calculating a sum as a by-product of the nested looping, and making sure that all implementations returned identical sums.
We tried to ensure that the compiler optimization does not override our attempts to measure the time differences, by ensuring reuse of variables that get changed inside the loops. Our goal here was not to time the actual function invocation or forking times, but to measure the relative execution times for different implementations of the same functionality.
There were a simple pair of nested loops in the main(), with sums being calculated within the inner loop and accumulated in the outer loop. (The malloc is redundant in all these tests since the malloc-free operations are always performed within the same scope)
The outer loop was in the main() while the rest of the computation was performed as a function call. The overhead incurred here is the cost of 'ntimes' function calls.
The outer loop corresponds to stepping down through the recursion, and the inner loop was implemented on the unwinding. This corresponds to a recursion that was 'ntimes' levels deep.
The outer loop was in the main() while the inner loop computation was done in a child process. The child process would only perform the inner loop computation, send a return value (partial sum) to the parent process by writing an integer into a system pipe and terminate using exit(0).The parent would wait for the child to terminate, and after executing 'wait(&stat)' it would read the integer from the pipe and continue the outer loop. The 'wait()' was required to ensure that the process table entries for the child processes were deleted and did not remain as zombie processes.
/*** Process - inline vs fn vs recursion vs fork ***/ /* Inline */
The following graphs shows the results for Linux and Windows
These plots show the execution times for the different implementations as a function of the number of iterations/fn calls/recursion levels/processes forked.
All three implementation schemes scale linearly with the number of repetitions. The larger slope of the line for recursion indicates it does not scale as well as the inline and function implementations.
An interesting observation is that without compiler optimization the inline implementation was faster than function calls. But with compiler optimization (O3) the function call implementation ran faster than the optimized inline code!
Figure 29 : Graph for Inline Vs Function Vs Recursion on Windows
No such linear trends were observed for the different implementations on Windows beyond a certain number of repetitions. This could be attributed to the ‘cygwin’ environment on which the tests were run.