JUNE 18–22, 2017

Presentation Details

Name: (RP03) A Portable Distributed Sparse Grid Density Estimation for Big Data Clustering
Time: Tuesday, June 20, 2017
08:35 am - 09:45 am
Room: Substanz 1+2
Breaks: 07:30 am - 10:00 am Welcome Coffee
Presenter:   David Pfander, University of Stuttgart
The clustering of data points is one of the central tasks in data mining. For Big Data scenarios with millions to billions of data points, highly efficient algorithms are required. We present an accelerator-enabled distributed clustering algorithm. It is based on a spatial discretization using sparse grids. Our clustering algorithm uses density estimation of the dataset to prune a nearest neighbor graph of the dataset. A key benefit of the sparse grid density estimation is that it scales linearly in the size of the dataset and is therefore well-suited for vast datasets. We have realized implementations in OpenCL that efficiently exploit CPUs and accelerator cards of different vendors. First results show good scaling behavior on 64 nodes of Piz Daint, a large Nvidia Pascal installation, for synthetic datasets with up to 10 dimensions and 10 million data points. On the node level, we achieve between 23% and 50% of the peak performance on hardware platforms of different vendors. As we are limited to two thirds of the peak performance due to the instruction mix, we achieve up to 76% of the practically possible peak performance. Our approach displays good scalability, high node-level performance and performance portability.
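
The idea of pruning a nearest neighbor graph with a density estimate can be illustrated with a small sketch. This is not the presented implementation: it swaps the sparse grid density estimation for a plain Gaussian kernel estimate (which scales quadratically, not linearly), runs on a single CPU core, and assumes a particular pruning rule (drop an edge if either endpoint or the edge midpoint lies in a low-density region). The function names and the parameters `k`, `bandwidth`, and `threshold` are hypothetical and chosen for illustration only.

```python
import math


def knn_graph(points, k):
    """Brute-force k-nearest-neighbor graph as a set of undirected edges (i, j), i < j."""
    edges = set()
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def kernel_density(x, points, bandwidth):
    """Unnormalized Gaussian kernel density at x (stand-in for a sparse grid estimate)."""
    total = sum(math.exp(-math.dist(x, p) ** 2 / (2.0 * bandwidth ** 2)) for p in points)
    return total / len(points)


def cluster(points, k=4, bandwidth=0.5, threshold=0.1):
    """Label points by the connected components of the density-pruned k-NN graph."""
    density = [kernel_density(p, points, bandwidth) for p in points]

    # Assumed pruning rule: keep an edge only if both endpoints and the
    # midpoint of the edge lie in a region of sufficiently high density.
    kept = []
    for i, j in knn_graph(points, k):
        midpoint = tuple((a + b) / 2.0 for a, b in zip(points[i], points[j]))
        if (density[i] >= threshold and density[j] >= threshold
                and kernel_density(midpoint, points, bandwidth) >= threshold):
            kept.append((i, j))

    # Connected components of the pruned graph via union-find.
    parent = list(range(len(points)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i, j in kept:
        parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]


# Two small, well-separated groups of points in 2D.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
labels = cluster(points, k=4)
```

With `k=4`, every point has one neighbor in the other group, so the unpruned graph is connected. The density at the midpoints of those long edges is close to zero, so the pruning step removes them and the two groups end up in separate components.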

David Pfander, Universität Stuttgart
Gregor Daiß, Universität Stuttgart
Dirk Pflüger, Universität Stuttgart

RP03_Pfander.pdf (14147 KB)