The access latency of branch predictors is a well-known problem in fetch engine design. Prediction overriding techniques are commonly used to overcome it. However, prediction overriding requires a complex recovery mechanism to discard the wrong speculative work that was based on overridden predictions. In this paper, we show that stream and trace predictors, which use long basic prediction units, can tolerate access latency without overriding, thus reducing fetch engine complexity. We show that both the stream fetch engine and the trace cache architecture, without overriding, outperform other efficient fetch engines, such as an EV8-like fetch architecture or the FTB fetch engine, even when those engines do use overriding.
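The intuition behind tolerating latency with long prediction units can be illustrated with a toy model. The sketch below is not the simulation methodology of the paper; it assumes, purely for illustration, that the prediction for the next unit is requested when fetch of the current unit begins, that the predictor answers after `predictor_latency` cycles, and that a unit of `n` instructions takes `ceil(n / fetch_width)` cycles to fetch. The function name, parameters, and example numbers are all hypothetical.

```python
import math


def fetch_stall_cycles(unit_lengths, fetch_width, predictor_latency):
    """Toy model: cycles the fetch stage idles while waiting for predictions.

    Assumption (not from the paper): the next prediction is requested when
    fetch of the current unit starts, so any predictor latency beyond the
    time spent fetching the current unit shows up as stall cycles.
    """
    stalls = 0
    for n in unit_lengths:
        # Cycles needed to fetch this prediction unit at the given width.
        fetch_cycles = math.ceil(n / fetch_width)
        # Latency not hidden behind the fetch of this unit becomes a stall.
        stalls += max(0, predictor_latency - fetch_cycles)
    return stalls


# Hypothetical workload of 20 instructions, split two ways.
short_units = [5, 5, 5, 5]   # basic-block-sized prediction units
long_units = [20]            # one long, stream-like prediction unit

print(fetch_stall_cycles(short_units, fetch_width=4, predictor_latency=3))
print(fetch_stall_cycles(long_units, fetch_width=4, predictor_latency=3))
```

Under these assumptions, the short units expose part of the predictor latency on every prediction, while the single long unit hides it entirely behind its own fetch time, which is the effect the abstract attributes to stream and trace predictors.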

Additional Metadata
Persistent URL dx.doi.org/10.1109/IWIA.2003.1262780
Conference Innovative Architecture for Future Generation High-Performance Processors and Systems, IWIA 2003
Citation
Santana, O.J. (Oliverio J.), Ramirez, A., & Valero, M. (Mateo). (2003). Latency tolerant branch predictors. In Proceedings of the Innovative Architecture for Future Generation High-Performance Processors and Systems (pp. 30–39). doi:10.1109/IWIA.2003.1262780