SDFS: Secure distributed file system for data-at-rest security for Hadoop-as-a-service
Abstract
Cloud service providers offer the popular Hadoop analytics platform under an "as-a-service" model, i.e. as clusters of machines in their cloud infrastructure that come pre-configured with Hadoop software. Such offerings cost less and are less complex than deploying a comparable system on-premises. However, security concerns, and data confidentiality in particular, hamper wider adoption of these services by enterprises that handle sensitive data. In this paper, we describe our efforts to secure data-at-rest (i.e. stored data) when Hadoop is offered as a cloud service. We analyze the requirements and architecture of such a service and describe SDFS, a new distributed file system for Hadoop that we developed to support this goal. We analyze parameter tuning for SDFS and evaluate its performance through experiments on a real test-bed. We further present simulation results that explore the parameter space and can guide tuning.