|Name:||(RP08) Pascal vs KNL: Performance Evaluation with ICCG Solver|
|Time:||Tuesday, June 20, 2017, 08:35 am - 09:45 am|
|Breaks:||07:30 am - 10:00 am Welcome Coffee|
|Presenter:||Tetsuya Hoshino, The University of Tokyo|
The Preconditioned Conjugate Gradient method with Incomplete Cholesky factorization (ICCG) is widely used in scientific computing to solve sparse systems of simultaneous linear equations. Parallelizing an ICCG solver requires coloring and reordering; however, the most suitable coloring and matrix storage methods depend on the performance characteristics of the target device. In this poster, to evaluate two new many-core architectures, the NVIDIA Pascal GPU and the Intel Xeon Phi Knights Landing (KNL), we parallelized an ICCG solver with OpenACC and OpenMP. The experimental results show that the most suitable ordering and matrix storage methods differ between the two devices: the SELL-64-1 format with coalesced numbering is best for the Pascal GPU, and the SELL-8-1 format with sequential numbering is best for Knights Landing. We also applied a series of optimizations that reduce the synchronization costs of OpenACC and OpenMP, achieving 1.21x and 1.20x performance improvements on the P100 and KNL, respectively.
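As a rough illustration of the storage formats named in the abstract, the sketch below assumes the usual SELL-C-σ convention, in which rows are grouped into chunks of C rows, each chunk is padded to its longest row and stored column-major, and σ = 1 means no row sorting. This is not the poster's implementation: the function names `to_sell` and `sell_spmv` are hypothetical, the code is serial Python rather than OpenACC or OpenMP, and it shows only a sparse matrix-vector product, not the ICCG solver or its coloring.

```python
def to_sell(rows, chunk):
    """Pack a sparse matrix into a SELL-<chunk>-1 layout (assumed convention).

    rows  : list of rows, each a list of (column, value) pairs
    chunk : number of rows per chunk (C in SELL-C-sigma)
    """
    chunk_ptr = [0]   # start offset of each chunk in cols/vals
    chunk_len = []    # padded row length of each chunk
    cols, vals = [], []
    for start in range(0, len(rows), chunk):
        block = rows[start:start + chunk]
        width = max((len(r) for r in block), default=0)
        chunk_len.append(width)
        # Column-major inside a chunk: the j-th entries of all rows in the
        # chunk are contiguous, so adjacent lanes read adjacent memory.
        for j in range(width):
            for i in range(chunk):
                if i < len(block) and j < len(block[i]):
                    c, v = block[i][j]
                else:
                    c, v = 0, 0.0  # zero padding
                cols.append(c)
                vals.append(v)
        chunk_ptr.append(len(vals))
    return chunk_ptr, chunk_len, cols, vals


def sell_spmv(n_rows, chunk, chunk_ptr, chunk_len, cols, vals, x):
    """Compute y = A @ x for a matrix stored by to_sell."""
    y = [0.0] * n_rows
    for k, width in enumerate(chunk_len):
        base = chunk_ptr[k]
        for j in range(width):
            for i in range(chunk):  # this loop maps to SIMD lanes or threads
                row = k * chunk + i
                if row < n_rows:
                    idx = base + j * chunk + i
                    y[row] += vals[idx] * x[cols[idx]]
    return y


# Small tridiagonal example with chunk size 2 (the poster uses 64 and 8).
rows = [
    [(0, 4.0), (1, -1.0)],
    [(0, -1.0), (1, 4.0), (2, -1.0)],
    [(1, -1.0), (2, 4.0)],
]
packed = to_sell(rows, 2)
print(sell_spmv(3, 2, *packed, [1.0, 2.0, 3.0]))  # prints [2.0, 4.0, 10.0]
```

In this layout the chunk size sets how many rows are processed side by side, which is one plausible reason a wide chunk (64) would suit a GPU and a narrow one (8) would suit KNL's vector units; the poster itself should be consulted for the actual explanation.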
Tetsuya Hoshino, The University of Tokyo
RP08_Hoshino.pdf (12372 KB)