Name: The Best of Both Worlds? Combining HPC & Big Data Methods for Bioinformatics Applications
Time: Tuesday, June 20, 2017, 03:45 pm - 04:05 pm
Breaks: 03:15 pm - 03:45 pm Coffee Break
Speaker: Andreas Hildebrandt, University of Mainz
Recent years have seen a tremendous increase in the volume of data generated in the life sciences. The
analysis of these data sets poses difficult computational challenges and is an active field of research. Currently,
a popular strategy in data rich scenarios across many areas of science and industry is to adopt
big data technology, a class of highly scalable and fault-tolerant systems for handling huge data sets.
However, the characteristics of typical biological data sets and their intended usage differ significantly
from most other application areas of big data technologies: while biological data sets are
often remarkably large, they tend to be smaller than typical 'big' data sets. On the other hand, biological
data processing often requires more complex analysis techniques than can be afforded by big data technology,
which is often constrained to algorithms with linear or sub-linear complexity.
Consequently, the computational life sciences today tend to rely on classical high-performance computing
(HPC) paradigms instead. Conceptually, most HPC methods work at a lower level of abstraction than common
big data techniques and often demand a much deeper understanding of the intricacies of parallel computing.
In my talk, I will demonstrate how hybrid approaches, combining ideas from big data with HPC methodologies,
can help flexible and highly performant HPC methods scale to large data sets. At the same time, much of
the domain specific programming can be performed at a high level of abstraction, similar to classical big data
approaches, enabling the user to focus on the biological problem at hand rather than on low-level primitives.
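As a loose illustration of the abstraction gap described above (a generic sketch, not code from the talk): in a map-style interface, the domain scientist writes only a pure function over individual records, while the runtime decides how to distribute the work. The `gc_content` function and the toy sequences below are invented for illustration.

```python
from multiprocessing import Pool

def gc_content(seq):
    # Domain logic only: fraction of G/C bases in a DNA sequence.
    return (seq.count("G") + seq.count("C")) / len(seq)

if __name__ == "__main__":
    sequences = ["ACGT", "GGCC", "ATAT"]
    # High-level 'map' abstraction: parallel execution without
    # explicit threads, locks, or message passing in user code.
    with Pool(2) as pool:
        results = pool.map(gc_content, sequences)
    print(results)  # [0.5, 1.0, 0.0]
```

The same user-facing `map` pattern could equally be backed by an HPC runtime, which is the kind of hybrid the abstract points toward: big-data-style programming interfaces on top of HPC execution machinery.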