A parallel Back-Propagation (BP) neural network training technique using the Compute Unified Device Architecture (CUDA) on multiple Graphics Processing Units (GPUs) is proposed. To exploit the maximum performance of the GPUs, we propose to implement batch-mode BP training by organizing the input, hidden, and output neurons into matrix form. The implementation uses CUDA Basic Linear Algebra Subroutines (cuBLAS) functions to perform the matrix and vector operations, together with custom CUDA kernels. The proposed technique utilizes multiple GPUs to achieve further acceleration. Each GPU holds the same neural network structure and weight parameters, and the training samples are distributed among the GPUs. Each GPU calculates its local training error and the gradients at each layer; these are then transferred to the first GPU, where their summations are computed. The summations are transferred back to each GPU to update the local weights, and this process is repeated until the training goal is reached. A cavity microwave bandpass filter example is used to illustrate the validity of this technique.
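
The abstract does not include code, so the following is a minimal sketch of the described scheme: batch-mode forward propagation as a single cuBLAS GEMM over the whole batch, and the multi-GPU exchange in which local gradients are summed on the first GPU, broadcast back, and used for identical local weight updates. The function names, buffer layout, and the use of cublasSaxpy for the summation and learning-rate update are assumptions for illustration, not the authors' implementation.

```cpp
// Hypothetical sketch of the batch-mode BP step and multi-GPU gradient
// exchange described in the abstract; names and layer sizes are illustrative.
#include <cuda_runtime.h>
#include <cublas_v2.h>
#include <vector>

// Batch-mode forward step on one GPU: pre-activations for the whole batch
// are produced by a single GEMM, Z = W * X, where X is (nIn x batch) and
// W is (nHid x nIn), both stored column-major as cuBLAS expects.
void forward_layer(cublasHandle_t h, const float* d_W, const float* d_X,
                   float* d_Z, int nHid, int nIn, int batch) {
    const float one = 1.0f, zero = 0.0f;
    cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N,
                nHid, batch, nIn,
                &one, d_W, nHid, d_X, nIn, &zero, d_Z, nHid);
}

// Gradient exchange: each GPU holds a local gradient computed from its
// subset of the training samples; GPU 0 accumulates the sum, the summed
// gradient is copied back, and every GPU applies the same weight update.
void exchange_and_update(std::vector<cublasHandle_t>& handles,
                         std::vector<float*>& d_grad,  // per-GPU local gradients
                         std::vector<float*>& d_W,     // per-GPU weight copies
                         float* d_scratch0,            // scratch buffer on GPU 0
                         int nWeights, int nGpus, float lr) {
    const float one = 1.0f;
    const float neg_lr = -lr;

    // Accumulate gradients from GPUs 1..n-1 into GPU 0's gradient buffer.
    cudaSetDevice(0);
    for (int g = 1; g < nGpus; ++g) {
        cudaMemcpyPeer(d_scratch0, 0, d_grad[g], g, nWeights * sizeof(float));
        cublasSaxpy(handles[0], nWeights, &one, d_scratch0, 1, d_grad[0], 1);
    }

    // Broadcast the summed gradient back and update each GPU's local copy
    // of the weights: W := W - lr * sum_of_gradients.
    for (int g = 0; g < nGpus; ++g) {
        if (g != 0)
            cudaMemcpyPeer(d_grad[g], g, d_grad[0], 0, nWeights * sizeof(float));
        cudaSetDevice(g);
        cublasSaxpy(handles[g], nWeights, &neg_lr, d_grad[g], 1, d_W[g], 1);
    }
}
```

Because every GPU applies the same summed gradient to an identical copy of the weights, the replicas stay synchronized across iterations without transferring the weights themselves.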

Additional Metadata
Keywords Back Propagation, cuBLAS, CUDA, GPGPU, Neural Network
Persistent URL dx.doi.org/10.1109/NEMO.2015.7415056
Conference IEEE MTT-S International Conference on Numerical Electromagnetic and Multiphysics Modeling and Optimization, NEMO 2015
Citation
Zhang, S. (Shunlu), Gunupudi, P., & Zhang, Q. J. (2016). Parallel back-propagation neural network training technique using CUDA on multiple GPUs. Presented at the IEEE MTT-S International Conference on Numerical Electromagnetic and Multiphysics Modeling and Optimization, NEMO 2015. doi:10.1109/NEMO.2015.7415056