JUNE 18–22, 2017

Session Details

Name: Tutorial 08: Introduction to Manycore Programming on Intel’s 2nd Generation Xeon Phi (Knights Landing)
Time: Sunday, June 18, 2017
09:00 am - 01:00 pm
Room:   Expose  
Breaks: 08:00 am - 10:00 am Welcome Coffee
11:00 am - 11:30 am Coffee Break
01:00 pm - 02:00 pm Lunch
Presenters:   John Cazes, TACC
  Robert Evans, TACC
  Kent Milfeld, TACC
  Cyrus Proctor, TACC
Abstract:   This introductory training is designed for experienced programmers who are familiar with OpenMP and MPI and want to learn Intel’s second-generation manycore processor, the Xeon Phi “Knights Landing” (KNL). We will discuss the evolution of processors from multicore to manycore architectures and cover the basics of vectorization, multi-threaded programming, memory affinity, load balance, and hybrid execution. We will also provide an overview of the KNL hardware and its various modes of operation. The session includes hands-on exercises, run on the KNL-upgraded Stampede system at the Texas Advanced Computing Center (TACC), that demonstrate the techniques discussed. The KNL processor brings many changes from the first generation, Knights Corner (KNC). The new processor supports self-hosted nodes, connects cores via a mesh topology rather than a ring, and uses a new memory technology, MCDRAM. Many of the lessons learned from using KNC still apply, such as efficient multi-threading, optimized vectorization, and strided memory access. We will discuss how KNL resembles and differs from both KNC and regular Xeon processors. In lab sessions, students will examine how performance varies across the different cluster modes and MCDRAM configurations, experiment with affinity settings to properly bind processes and threads, and investigate the effects of vectorization.

Content Level 
40% Beginner, 40% Intermediate, 20% Advanced

Targeted Audience 
This tutorial is intended for application developers who wish to port their codes to supercomputers consisting of manycore processors; it will also help users take advantage of multicore processors that support vectorization.