Fetch engine performance is a key topic in superscalar processors, since it limits the instruction-level parallelism that can be exploited by the execution core. In the search for high performance, the fetch engine has evolved toward more efficient designs, but its complexity has also increased. In this paper, we present the stream fetch engine, a novel architecture based on the execution of long streams of sequential instructions, taking maximum advantage of code layout optimizations. We describe our design in detail, showing that it achieves high fetch performance, while requiring less complexity than other state-of-the-art fetch architectures.

Additional Metadata
Keywords: Branch prediction, Design, Fetch architecture, High performance, Instruction stream, Low complexity, Performance
Persistent URL: dx.doi.org/10.1145/1011528.1011532
Journal: ACM Transactions on Architecture and Code Optimization
Citation
Santana, O.J. (Oliverio J.), Ramirez, A., Larriba-Pey, J.L. (Josep L.), & Valero, M. (Mateo). (2004). A Low-Complexity Fetch Architecture for High-Performance Superscalar Processors. ACM Transactions on Architecture and Code Optimization, 1(2), 220–245. doi:10.1145/1011528.1011532