Israel Cidon, Leonidas Georgiadis, et al.
IEEE/ACM Transactions on Networking
Modern single-CPU microprocessors exploit instruction-level parallelism (ILP), deriving their performance advantage mainly from the parallel execution of ALU and memory instructions within a single clock cycle. This advantage, obtained by exploiting data ILP, is severely offset by the sequential execution of conditional branches, especially in branch-intensive non-numerical code. Consequently, branch ILP must also be exploited by executing branches and data instructions in parallel. This requires compiler support for scheduling branches as well as architectural support for executing branches and data instructions in the same cycle. This paper presents a comprehensive empirical study that evaluates the performance impact of exploiting branch ILP using a representation of ILP code called the tree representation, which was proposed by Nicolau [A. Nicolau (1985), Technical Report TR-85-678, Cornell University, Ithaca, NY] and Ebcioglu to exploit branch ILP in its most general form. Our results indicate that exploiting branch ILP can enhance performance substantially (a geometric-mean speedup of as much as 4.5 on the 16-ALU machine, compared with a base speedup of 3.0), and that the benefit comes not only from the intended parallel execution but also from a reduction in useless speculative execution due to earlier scheduling of branches.
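The abstract reports its headline result as a geometric mean of speedups. As a minimal sketch of how such a summary figure is computed, the following Python function takes per-benchmark speedup ratios and returns their geometric mean. The function name and the sample values are hypothetical and chosen only for illustration; they are not data from the paper.

```python
import math


def geometric_mean(speedups):
    """Return the geometric mean of a sequence of positive speedup ratios."""
    if not speedups:
        raise ValueError("need at least one speedup")
    if any(s <= 0 for s in speedups):
        raise ValueError("speedups must be positive")
    # Average the logarithms, then exponentiate: equivalent to the
    # n-th root of the product, but less prone to overflow.
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


# Hypothetical per-benchmark speedups (illustrative only, not from the paper).
sample_speedups = [2.0, 4.0, 8.0]
print(round(geometric_mean(sample_speedups), 6))
```

The geometric mean is the usual choice for averaging ratios such as speedups, because one unusually fast benchmark pulls it up less than it would pull up an arithmetic mean.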
Fan Zhang, Junwei Cao, et al.
IEEE TETC
B. Wagle
EJOR
S.F. Fan, W.B. Yun, et al.
Proceedings of SPIE 1989