JUNE 18–22, 2017

Presentation Details

Name: Evaluating Parallel Application Resiliency with the Software Fault Injector, PFSEFI
Time: Wednesday, June 21, 2017, 02:05 pm - 02:25 pm
Room: Panorama 1, Messe Frankfurt
Speaker: Nathan DeBardeleben, Los Alamos National Laboratory
Abstract: Application resiliency to faults is a concern as supercomputers grow to ever larger sizes while the semiconductor industry shrinks components and carefully reduces voltage to minimize power use. Users of today's supercomputers need to plan for tomorrow's systems, and the parallel software fault injection tool, PFSEFI, can help by evaluating application resiliency and vulnerability. In this talk we will discuss PFSEFI, show how insights gained from the tool can quantify application vulnerability to silent data corruption, and look at techniques to improve application resilience.