JUNE 18–22, 2017

Presentation Details

Name: Support for Resilience in Parallel Applications
Time: Wednesday, June 21, 2017, 02:45 pm – 03:05 pm
Room: Panorama 1, Messe Frankfurt
Speaker: George Bosilca, University of Tennessee
Abstract: Over the last few years, the resilience question has shifted from whether faults will occur to a more pragmatic concern: how frequently such disruptive events occur during the execution of applications at scale. Solutions that transparently manage faults at the system level exist, each with its benefits and drawbacks. Empowering developers to handle failure events themselves is a much more revolutionary approach. It offers greater opportunities for efficiency but requires holistic support from all layers: hardware, software, and the parallel programming paradigm. This talk will highlight application-driven techniques for surviving faults and their expected costs at scale, as well as the support required from programming paradigms and their runtimes.