Project

Data staging in HPC clusters

Supercomputers, essential computational infrastructure in modern science, are mostly built as high-performance computing (HPC) clusters, in which many computing nodes are connected by a network. HPC clusters often equip the computing nodes with high-speed local storage to provide higher I/O performance than a shared file system. Such clusters need data staging, a function that moves input data from the shared file system to the computing nodes and moves output data from the computing nodes back to the shared file system. Data staging will play an increasingly important role as data-intensive computing such as machine learning attracts attention.
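
As a concrete illustration, the following is a minimal sketch of stage-in and stage-out around a compute step. The paths /shared/project/dataset and /local/scratch/dataset and the application binary ./app are hypothetical names chosen for this example, not part of any particular system.

    import shutil
    import subprocess
    from pathlib import Path

    # Assumed paths: SHARED is on the shared file system, LOCAL is the
    # node-local high-speed storage of the computing node.
    SHARED = Path("/shared/project/dataset")
    LOCAL = Path("/local/scratch/dataset")

    def stage_in():
        # Move input data from the shared file system to the computing node.
        shutil.copytree(SHARED / "input", LOCAL / "input", dirs_exist_ok=True)

    def stage_out():
        # Move output data from the computing node to the shared file system.
        shutil.copytree(LOCAL / "output", SHARED / "output", dirs_exist_ok=True)

    if __name__ == "__main__":
        stage_in()
        (LOCAL / "output").mkdir(parents=True, exist_ok=True)
        # The application performs all its I/O against node-local storage,
        # so the shared file system is touched only during staging.
        subprocess.run(["./app", "--input", str(LOCAL / "input"),
                        "--output", str(LOCAL / "output")], check=True)
        stage_out()

Because the application itself reads and writes only node-local storage, all traffic to the shared file system is concentrated in the stage-in and stage-out phases, which is exactly the burst traffic discussed below.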

Research subject

Data staging generates a large amount of traffic on the network connecting the computing nodes, because input and output data are moved between the shared file system and the computing nodes. Inter-process communication traffic also flows over this network, so the two kinds of traffic may interfere with each other and degrade network performance. For example, burst traffic caused by data staging increases the delay of inter-process communication, and the two kinds of traffic compete for network bandwidth, which lengthens communication time. In this research, we tackle the problems caused by this mutual interference, based on the idea that they can be avoided by using software-defined networking (SDN).
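
As one possible realization of this idea, the following is a minimal sketch of an OpenFlow controller application written with the Ryu framework. It identifies staging traffic and steers it into a separate switch queue so that staging bursts cannot saturate the link used by inter-process communication. The TCP port number (7100) and the existence of a rate-limited queue 1 on the switch are assumptions made for illustration, not properties of any specific staging system.

    from ryu.base import app_manager
    from ryu.controller import ofp_event
    from ryu.controller.handler import CONFIG_DISPATCHER, set_ev_cls
    from ryu.ofproto import ofproto_v1_3

    class StagingIsolation(app_manager.RyuApp):
        # Sketch: separate staging traffic from inter-process communication
        # by assigning it to a dedicated, bandwidth-limited switch queue.
        OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

        @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
        def on_switch_features(self, ev):
            dp = ev.msg.datapath
            ofp = dp.ofproto
            parser = dp.ofproto_parser
            # Assumption: staging traffic is identifiable by TCP port 7100.
            match = parser.OFPMatch(eth_type=0x0800, ip_proto=6, tcp_dst=7100)
            # Assumption: queue 1 is pre-configured on the switch with a
            # bandwidth limit; OFPP_NORMAL delegates forwarding to the
            # switch's ordinary L2/L3 pipeline (hybrid switches only).
            actions = [parser.OFPActionSetQueue(1),
                       parser.OFPActionOutput(ofp.OFPP_NORMAL)]
            inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS,
                                                 actions)]
            dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=10,
                                          match=match, instructions=inst))

Such an application would be started with ryu-manager; the key design point is that the SDN controller can install traffic-separation rules on every switch in the cluster network, rather than relying on per-node configuration.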