With Python
Jason Champion
@Xangis
github.com/Xangis
Supercomputing with high performance per watt.
Physics simulations, medical, oil and gas exploration.
Weather, particle systems, gravity.
(Approximate)
1993: 60 GFLOPs (~ Intel Core i5)
1995: 220 GFLOPs (~ Modern Dual Xeon)
1997: 1.3 TFLOPs (~ Intel Xeon Phi)
1999: 2.4 TFLOPs (~ Radeon 6950)
2002: 35 TFLOPs (~ Quad Radeon 7990)
2005: 280 TFLOPs
2008: 1 PFLOP
2010: 10 PFLOPs
2013: 33 PFLOPs
A $100 million 1998-era supercomputer can be had for $200.
Intel Xeon Phi
NVIDIA GeForce GTX Titan
AMD Radeon 7990
Win32 Threads
The "dinner from a diaper" of parallelism. In C.
Being replaced by C++ AMP, but still Windows-only.
Pthreads
"Old reliable" works great for non-GPU uses once you know it.
OpenMP
Makes life easy in the multicore CPU world.
(If you're doing multiprocessor work in something other than Python, it's well worth learning.)
CUDA
A modified C programming language.
The first general-purpose GPU computing API.
Runs only on NVIDIA hardware.
OpenCL
More complex than CUDA.
A general-purpose, C-based computing API.
A standard from the Khronos Group (the same group behind OpenGL; the API style is similar).
Runs on CPUs and GPUs.
PyOpenCL was used for poclbm OpenCL bitcoin miner.
Fortran, C, and Assembly consistently benchmark as
the fastest programming languages.
Python's global interpreter lock (GIL) keeps CPU-bound threads
from running in parallel, so OpenMP-style multithreading gains nothing in pure Python.
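Because the GIL rules out thread-based parallelism for CPU-bound Python code, the standard CPU-side workaround is the standard-library `multiprocessing` module, which sidesteps the GIL with separate processes. A minimal sketch (the `square` helper is just an illustration):

```python
import multiprocessing

def square(x):
    """CPU-bound work; runs in a worker process, outside the main GIL."""
    return x * x

if __name__ == "__main__":
    # Each worker process has its own interpreter (and its own GIL),
    # so the work genuinely runs in parallel across cores.
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(square, range(10))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

`pool.map` preserves input order, so it drops in wherever a plain `map` over CPU-heavy work is the bottleneck.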
Python is widely known as being slow,
so why use it for supercomputing?
It's optimized for the developer, and programmer time often costs more than CPU time.
Python has awesome libraries, especially for
data visualization.
Fast prototyping.
Can be ported to C fairly easily if more speed is needed
after the idea is proven, but you probably won't need to.
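One reason you often won't need the C port: NumPy already pushes the inner loops down into compiled C, so idiomatic Python stays competitive. A small sketch comparing a pure-Python loop with the vectorized equivalent:

```python
import numpy as np

def saxpy_loop(a, x, y):
    # Pure-Python loop: every iteration goes through the interpreter.
    return [a * xi + yi for xi, yi in zip(x, y)]

def saxpy_numpy(a, x, y):
    # Vectorized: the loop runs in compiled C inside NumPy.
    return a * x + y

x = np.arange(100_000, dtype=np.float64)
y = np.ones(100_000)
# Same result, but the NumPy version is orders of magnitude faster.
assert np.allclose(saxpy_numpy(2.0, x, y), saxpy_loop(2.0, x, y))
```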
AMD:
The AMD APP SDK
NVIDIA:
The NVIDIA CUDA SDK
Intel:
The Intel Xeon Phi SDK
Get them from the manufacturers' websites (see resources).
NumPy, SciPy (pip handles these)
For the lazy Linux user:
sudo apt-get install python-pyopencl
It's on PyPI:
https://pypi.python.org/pypi/pyopencl
(pip install pyopencl)
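Once installed, the canonical PyOpenCL "hello world" is element-wise vector addition; note that the device code itself is OpenCL C passed in as a string. A sketch, with a NumPy fallback (my addition, not part of PyOpenCL) so it still produces a result on machines without an OpenCL runtime:

```python
import numpy as np

# The device code is plain OpenCL C, compiled at runtime.
KERNEL_SRC = """
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *out)
{
    int gid = get_global_id(0);
    out[gid] = a[gid] + b[gid];
}
"""

def vec_add(a, b):
    """Add two float32 arrays on an OpenCL device if one is available;
    otherwise fall back to NumPy on the CPU."""
    try:
        import pyopencl as cl
        ctx = cl.create_some_context(interactive=False)
        queue = cl.CommandQueue(ctx)
        mf = cl.mem_flags
        # Explicit host-to-device copies: you always know where memory is.
        a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
        b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
        out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)
        prg = cl.Program(ctx, KERNEL_SRC).build()
        prg.vec_add(queue, a.shape, None, a_buf, b_buf, out_buf)
        out = np.empty_like(a)
        cl.enqueue_copy(queue, out, out_buf)  # device-to-host copy
        return out
    except Exception:
        return a + b  # CPU fallback (assumed helper behavior, not PyOpenCL API)

a = np.arange(8, dtype=np.float32)
b = np.arange(8, dtype=np.float32)
print(vec_add(a, b))
```

The broad `except` is deliberate for the sketch: missing drivers surface as several different exception types.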
Apple + AMD users are out of luck.
Ubuntu 13.10 with NVIDIA Optimus (Bumblebee) is not awesome. One of the many reasons Linus gave NVIDIA the finger.
For the lazy Linux user:
sudo apt-get install python-pycuda
It's on PyPI:
https://pypi.python.org/pypi/pycuda
(pip install pycuda)
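The PyCUDA equivalent follows the same shape: the device code is CUDA C in a string, compiled at runtime by `SourceModule`. A sketch, again with a NumPy fallback (my addition) for machines without an NVIDIA GPU:

```python
import numpy as np

# Device code is CUDA C, compiled by nvcc at runtime.
KERNEL_SRC = """
__global__ void double_them(float *a)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    a[idx] *= 2.0f;
}
"""

def double_them(a):
    """Double a small float32 array on an NVIDIA GPU via PyCUDA if
    available; otherwise fall back to NumPy on the CPU."""
    try:
        import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
        import pycuda.driver as cuda
        from pycuda.compiler import SourceModule
        mod = SourceModule(KERNEL_SRC)
        func = mod.get_function("double_them")
        a = a.copy()
        # One thread per element (fine for arrays up to the block-size
        # limit); cuda.InOut handles the host<->device copies.
        func(cuda.InOut(a), block=(a.size, 1, 1), grid=(1, 1))
        return a
    except Exception:
        return a * 2.0  # CPU fallback (sketch behavior, not PyCUDA API)

print(double_them(np.arange(4, dtype=np.float32)))
```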
PyCUDA
[Source in Console]
From:
http://craneium.net/index.php?option=com_content&view=category&layout=blog&id=37&Itemid=97
CUDA
Ocean Simulator
OpenCL
[See video]
GPUs are much better than CPUs at the hash calculations used in mining.
Bitcoin mining no longer uses GPUs; it has moved to custom ASIC hardware.
Litecoin and Feathercoin still make reasonably good use of GPUs.
Your return on investment, including hardware amortization and electricity costs, will break even at best.
Just buy the coins on an exchange. Or create a better cryptocurrency that doesn't depend on environment-damaging power use.
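The core of the mining workload is just brute-force hashing, which is why it maps so well to GPUs. A toy proof-of-work in pure Python, using a single SHA-256 and a tiny difficulty (real Bitcoin uses double SHA-256 and a vastly harder target):

```python
import hashlib

def mine(data: bytes, prefix: str = "000") -> int:
    """Find a nonce so that sha256(data + nonce) starts with `prefix`.
    Every attempt is independent of the others, which is what makes
    the real workload embarrassingly parallel on a GPU."""
    nonce = 0
    while True:
        digest = hashlib.sha256(data + str(nonce).encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce
        nonce += 1

nonce = mine(b"block header")
print(nonce, hashlib.sha256(b"block header" + str(nonce).encode()).hexdigest())
```

Each extra hex zero in `prefix` multiplies the expected work by 16, which is the knob real proof-of-work difficulty turns.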
Your data is already on the GPU. Why not just render it?
Memory copies are suddenly not a problem.
That's what's happening with the ocean simulator.
You have to know where your memory is.
You have to learn a good deal about GPU architecture to use it well.
You have to think way too much about what memory you're using and what data you're copying where.
The actual GPU programs are still in C.
Did I mention thinking too much about memory?
Lots of books; pretty much all of them focus on C:
CUDA by Example
The CUDA Handbook
OpenCL Programming Guide
Website of Andreas Klöckner, creator of PyCUDA and PyOpenCL:
http://mathema.tician.de/software/
Good documentation and examples on the wiki.
This presentation and trivial test apps for CUDA and OpenCL:
https://slid.es/xangis/parallel-computing/ ; http://github.com/Xangis
NVIDIA CUDA SDK:
https://developer.nvidia.com/cuda-downloads
LOTS of great code samples in the SDK.
AMD APP SDK (OpenCL):
http://developer.amd.com/tools-and-sdks/heterogeneous-computing/amd-accelerated-parallel-processing-app-sdk/
Intel OpenCL SDK:
http://software.intel.com/en-us/vcsource/tools/opencl-sdk