Saurabh Paul, Christos Boutsidis, et al.
JMLR
We perform a detailed flop and bandwidth analysis of Jos Stam's Stable Fluids algorithm on the CPU, GPU, and Cell. In all three cases, we find that the algorithm is bandwidth bound, with the cores sitting idle up to 96% of the time. Knowing this, we propose two modifications to accelerate the algorithm. The first is a Mehrstellen discretization for the pressure solver, which reduces the running time of the solver by a third. The second is a static caching scheme that eliminates roughly 99% of the random lookups in the advection stage; with it, we observe a 2x speedup in that stage. Both modifications apply equally well to all three architectures. Copyright © 2008 by the Association for Computing Machinery, Inc.
Joxan Jaffar
Journal of the ACM
Rakesh Mohan, Ramakant Nevatia
IEEE Transactions on Pattern Analysis and Machine Intelligence
Cristina Cornelio, Judy Goldsmith, et al.
JAIR