Workload characterization and optimization of high-performance text indexing on the Cell Broadband Engine™ (Cell/B.E.)
Abstract
In this paper we examine text indexing on the Cell Broadband Engine™ (Cell/B.E.), an emerging workload on an emerging multicore architecture. The Cell Broadband Engine is a microprocessor jointly developed by Sony Computer Entertainment, Toshiba, and IBM (herein, we refer to it simply as the "Cell"). The importance of text indexing is growing not only because it is the core task of commercial and enterprise-level search engines, but also because it appears more and more frequently in desktop and mobile applications and on network appliances. Text indexing is a computationally intensive task. Multicore processors promise a multiplicative increase in compute power, but this power is fully available only if workloads exhibit the right amount and kind of parallelism. We present the challenges and the results of mapping text indexing tasks to the Cell processor. The Cell has become known as a platform capable of impressive performance, but only when algorithms have been parallelized with attention paid to its hardware peculiarities (expensive branching, wide SIMD units, small local memories). We propose a parallel software design that provides essential text indexing features at high throughput (161 Mbyte/s per chip on Wikipedia inputs), and we present a performance analysis that details the resources absorbed by each subtask. Not only does this result affect traditional applications, but it also enables new ones such as live network traffic indexing for security forensics, until now believed to be too computationally demanding to be performed in real time. We conclude that, at the cost of a radical algorithmic redesign, our Cell-based solution delivers a 4× performance advantage over recent commodity machines such as the Intel Q6600. In a per-chip comparison, ours is the fastest text indexer that we are aware of. © 2009 IEEE.