A distributed file system (DFS) is a key component in cloud and data center networking. Frequent hardware failures and network bottlenecks in the underlying infrastructure degrade performance significantly. Replication improves fault tolerance by storing multiple replicas of each data block. In the Hadoop platform, the Hadoop Distributed File System (HDFS) handles data storage and provides replica placement services. The default HDFS replica engine adopts a simple rack-aware policy designed to improve fault tolerance by storing data blocks in multiple racks. However, it does not consider key performance indicators of data center resources such as rack utilization and node storage utilization. Furthermore, HDFS stores data as uniformly divided small blocks, which increases traffic flow when an entire file is accessed and therefore degrades response time. In this research, we propose a Storage and Rack Sensitive (SRS) replica placement algorithm that aims to improve the rack and storage utilization of data center resources. The proposed algorithm also attempts to optimize traffic flow during file access by storing data as original files instead of small uniform blocks. Experimental results for the proposed SRS algorithm are compared against the default HDFS replica distribution, and significant improvements in rack utilization and storage utilization were observed. Furthermore, recent literature confirms that the 'Data as a File' approach decreases the amount of data flow caused by file access traffic.
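
The abstract does not specify how SRS selects targets, so the following is only an illustrative sketch of what a storage- and rack-sensitive placement could look like, not the authors' algorithm. It assumes a greedy rule: visit racks from least to most utilized, pick the least-utilized node in each rack that can hold the whole file, and place one replica per rack. The names `Node`, `rack_utilization`, and `place_file` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    rack: str
    capacity: int  # total storage, arbitrary units
    used: int = 0  # storage already occupied

    @property
    def free(self) -> int:
        return self.capacity - self.used

    @property
    def utilization(self) -> float:
        return self.used / self.capacity


def rack_utilization(nodes, rack):
    """Fraction of a rack's total capacity that is in use."""
    members = [n for n in nodes if n.rack == rack]
    return sum(n.used for n in members) / sum(n.capacity for n in members)


def place_file(nodes, file_size, replicas=3):
    """Greedy placement (an assumption, not the published SRS rule).

    Stores the file whole (no block splitting), one replica per rack,
    preferring lightly used racks and, within a rack, lightly used nodes.
    """
    # Least-utilized racks first; rack name breaks ties deterministically.
    racks = sorted({n.rack for n in nodes},
                   key=lambda r: (rack_utilization(nodes, r), r))
    chosen = []
    for rack in racks:
        if len(chosen) == replicas:
            break
        # Only nodes that can hold the entire file are candidates.
        candidates = [n for n in nodes
                      if n.rack == rack and n.free >= file_size]
        if not candidates:
            continue
        chosen.append(min(candidates, key=lambda n: (n.utilization, n.name)))
    if len(chosen) < replicas:
        raise RuntimeError("not enough racks with sufficient free space")
    for node in chosen:
        node.used += file_size  # commit the placement
    return [node.name for node in chosen]
```

One replica per rack keeps the fault-tolerance intent of the default rack-aware policy, while the two sort keys stand in for the rack-utilization and storage-utilization indicators the abstract mentions. The sketch requires at least as many racks with free space as replicas.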

2020 International Conference on COMmunication Systems and NETworkS, COMSNETS 2020
Department of Systems and Computer Engineering

Venkataramanachary, V. (Vinay), Reveron, E. (Enrique), & Shi, W. (2020). Storage and Rack Sensitive Replica Placement Algorithm for Distributed Platform with Data as Files. In 2020 International Conference on COMmunication Systems and NETworkS, COMSNETS 2020 (pp. 535–538). doi:10.1109/COMSNETS48256.2020.9027494