Abstract
Cell/B.E. is a heterogeneous multicore processor that was designed for the efficient execution of parallel and vectorizable applications with high computation and memory requirements. The transition to multicores introduces the challenge of providing tools that help programmers tune the code running on these architectures. Tracing tools, in particular, often help locate performance problems related to thread and process communication. A major impediment to implementing tracing on Cell is the absence of a common clock that can be accessed at low cost from all cores. The OS clock is costly to access from the auxiliary cores and the hardware timers cannot be simultaneously set on all the cores. In this paper, we describe an offline trace analysis algorithm that assigns wall-clock time to trace records based on their thread-local time stamps and event order. Our experiments on several Cell SDK workloads show that the indeterminism in assigning wall-clock time to events is low, on average 20-40 clock ticks (translating into 1.4-2.8μs on the system used in our experiments). We also show how various practical problems, such as the imprecision of time measurement, can be overcome. © 2009 John Wiley & Sons, Ltd.