Abstract
Remote Direct Memory Access (RDMA) is a mechanism whereby data is moved directly between the application memory of the local and the remote computer. By bypassing the operating system, RDMA significantly reduces the CPU cost of large data transfers and eliminates intermediate copying across buffers, making it very attractive for implementing distributed applications. With the advent of hardware implementations of RDMA over Ethernet (iWARP), its advantages have become even more apparent. In this paper we analyze the applicability of RDMA and identify hidden costs in the setup of its interactions that, if not handled carefully, remove any performance advantage, especially in hardware implementations. From an application's point of view, the major difference from TCP/IP-based communication is that buffer management must be done explicitly by the application; without the proper optimizations, RDMA loses all of its advantages. We discuss the problem in detail, analyze which applications can profit from RDMA, present a number of optimization strategies, and show through extensive performance experiments that these optimizations make a substantial difference in the overall performance of RDMA-based applications.