With Python
Jason Champion
@Xangis
github.com/Xangis
Supercomputing with high performance per watt.
Physics simulations, medical, oil and gas exploration.
Weather, particle systems, gravity.
(Approximate)
1993: 60 GFLOPs (~ Intel Core i5)
1995: 220 GFLOPs (~ Modern Dual Xeon)
1997: 1.3 TFLOPs (~ Intel Xeon Phi)
1999: 2.4 TFLOPs (~ Radeon 6950)
2002: 35 TFLOPs (~ Quad Radeon 7990)
2005: 280 TFLOPs
2008: 1 PFLOP
2010: 10 PFLOPs
2013: 33 PFLOPs
A $100 million 1998-era supercomputer can be had for $200.
Intel Xeon Phi
NVIDIA GeForce GTX Titan
AMD Radeon 7990
Win32 Threads
The "dinner from a diaper" of parallelism. In C.
Being replaced by C++ AMP, but still Windows-only.
Pthreads
"Old reliable" works great for non-GPU uses once you know it.
OpenMP
Makes life easy in the multicore CPU world.
(If you're doing multiprocessor work in something other than Python, it's well worth learning.)
CUDA
A modified C programming language.
The first general-purpose GPU computing API.
Runs only on NVIDIA hardware.
OpenCL
More complex than CUDA.
A general-purpose, C-based computing API.
A standard from the Khronos Group (the same group behind OpenGL; the API style is similar).
Runs on CPUs and GPUs.
PyOpenCL was used for poclbm OpenCL bitcoin miner.
Fortran, C, and Assembly consistently benchmark as
the fastest programming languages.
Python's global interpreter lock (GIL) keeps CPU-bound threads
from running in parallel, so OpenMP-style multithreading gains nothing in pure Python.
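Because the GIL rules out thread-based parallelism for CPU-bound Python code, the standard CPU-side workaround is the standard-library `multiprocessing` module, which sidesteps the GIL with separate processes. A minimal sketch (the `square` helper is just an illustration):

```python
import multiprocessing

def square(x):
    """CPU-bound work; runs in a worker process, outside the main GIL."""
    return x * x

if __name__ == "__main__":
    # Each worker process has its own interpreter (and its own GIL),
    # so the work genuinely runs in parallel across cores.
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(square, range(10))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

`pool.map` preserves input order, so it drops in wherever a plain `map` over CPU-heavy work is the bottleneck.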
Python is widely known as being slow,
so why use it for supercomputing?
It's optimized for the developer, and programmer time often costs more than CPU time.
Python has awesome libraries, especially for
data visualization.
Fast prototyping.
Can be ported to C fairly easily if more speed is needed
after the idea is proven, but you probably won't need to.
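One reason you often won't need the C port: NumPy already pushes the inner loops down into compiled C, so idiomatic Python stays competitive. A small sketch comparing a pure-Python loop with the vectorized equivalent:

```python
import numpy as np

def saxpy_loop(a, x, y):
    # Pure-Python loop: every iteration goes through the interpreter.
    return [a * xi + yi for xi, yi in zip(x, y)]

def saxpy_numpy(a, x, y):
    # Vectorized: the loop runs in compiled C inside NumPy.
    return a * x + y

x = np.arange(100_000, dtype=np.float64)
y = np.ones(100_000)
# Same result, but the NumPy version is orders of magnitude faster.
assert np.allclose(saxpy_numpy(2.0, x, y), saxpy_loop(2.0, x, y))
```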
AMD:
The AMD APP SDK
NVIDIA:
The NVIDIA CUDA SDK
Intel:
The Intel Xeon Phi SDK
Get them from the manufacturers' websites (see resources).
NumPy, SciPy (pip handles these)
For the lazy Linux user:
sudo apt-get install python-pyopencl
It's on PyPI:
https://pypi.python.org/pypi/pyopencl
(pip install pyopencl)
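Once installed, the canonical PyOpenCL "hello world" is element-wise vector addition; note that the device code itself is OpenCL C passed in as a string. A sketch, with a NumPy fallback (my addition, not part of PyOpenCL) so it still produces a result on machines without an OpenCL runtime:

```python
import numpy as np

# The device code is plain OpenCL C, compiled at runtime.
KERNEL_SRC = """
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *out)
{
    int gid = get_global_id(0);
    out[gid] = a[gid] + b[gid];
}
"""

def vec_add(a, b):
    """Add two float32 arrays on an OpenCL device if one is available;
    otherwise fall back to NumPy on the CPU."""
    try:
        import pyopencl as cl
        ctx = cl.create_some_context(interactive=False)
        queue = cl.CommandQueue(ctx)
        mf = cl.mem_flags
        # Explicit host-to-device copies: you always know where memory is.
        a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
        b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
        out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)
        prg = cl.Program(ctx, KERNEL_SRC).build()
        prg.vec_add(queue, a.shape, None, a_buf, b_buf, out_buf)
        out = np.empty_like(a)
        cl.enqueue_copy(queue, out, out_buf)  # device-to-host copy
        return out
    except Exception:
        return a + b  # CPU fallback (assumed helper behavior, not PyOpenCL API)

a = np.arange(8, dtype=np.float32)
b = np.arange(8, dtype=np.float32)
print(vec_add(a, b))
```

The broad `except` is deliberate for the sketch: missing drivers surface as several different exception types.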
Apple + AMD users are out of luck.
Ubuntu 13.10 with NVIDIA Optimus (Bumblebee) is not awesome. One of the many reasons Linus gave NVIDIA the finger.
For the lazy Linux user:
sudo apt-get install python-pycuda
It's on PyPI:
https://pypi.python.org/pypi/pycuda
(pip install pycuda)
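The PyCUDA equivalent follows the same shape: the device code is CUDA C in a string, compiled at runtime by `SourceModule`. A sketch, again with a NumPy fallback (my addition) for machines without an NVIDIA GPU:

```python
import numpy as np

# Device code is CUDA C, compiled by nvcc at runtime.
KERNEL_SRC = """
__global__ void double_them(float *a)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    a[idx] *= 2.0f;
}
"""

def double_them(a):
    """Double a small float32 array on an NVIDIA GPU via PyCUDA if
    available; otherwise fall back to NumPy on the CPU."""
    try:
        import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
        import pycuda.driver as cuda
        from pycuda.compiler import SourceModule
        mod = SourceModule(KERNEL_SRC)
        func = mod.get_function("double_them")
        a = a.copy()
        # One thread per element (fine for arrays up to the block-size
        # limit); cuda.InOut handles the host<->device copies.
        func(cuda.InOut(a), block=(a.size, 1, 1), grid=(1, 1))
        return a
    except Exception:
        return a * 2.0  # CPU fallback (sketch behavior, not PyCUDA API)

print(double_them(np.arange(4, dtype=np.float32)))
```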
PyCUDA
[Source in Console]
From:
http://craneium.net/index.php?option=com_content&view=category&layout=blog&id=37&Itemid=97
CUDA
Ocean Simulator
OpenCL
[See video]
GPUs are much better than CPUs at the hash calculations used in mining.
Bitcoin mining no longer uses GPUs; it has moved to custom ASIC hardware.
Litecoin and Feathercoin still make reasonably good use of GPUs.
Your return on investment, including hardware amortization and electricity costs, will break even at best.
Just buy the coins on an exchange. Or create a better cryptocurrency that doesn't depend on environment-damaging power use.
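The core of the mining workload is just brute-force hashing, which is why it maps so well to GPUs. A toy proof-of-work in pure Python, using a single SHA-256 and a tiny difficulty (real Bitcoin uses double SHA-256 and a vastly harder target):

```python
import hashlib

def mine(data: bytes, prefix: str = "000") -> int:
    """Find a nonce so that sha256(data + nonce) starts with `prefix`.
    Every attempt is independent of the others, which is what makes
    the real workload embarrassingly parallel on a GPU."""
    nonce = 0
    while True:
        digest = hashlib.sha256(data + str(nonce).encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce
        nonce += 1

nonce = mine(b"block header")
print(nonce, hashlib.sha256(b"block header" + str(nonce).encode()).hexdigest())
```

Each extra hex zero in `prefix` multiplies the expected work by 16, which is the knob real proof-of-work difficulty turns.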
Your data is already on the GPU. Why not just render it?
Memory copies are suddenly not a problem.
That's what's happening with the ocean simulator.
You have to know where your memory is.
You have to learn a good deal about GPU architecture to use it well.
You have to think way too much about what memory you're using and what data you're copying where.
The actual GPU programs are still in C.
Did I mention thinking too much about memory?
Lots of books; pretty much all of them focus on C:
CUDA by Example
The CUDA Handbook
OpenCL Programming Guide
Website of Andreas Klöckner, creator of PyCUDA and PyOpenCL:
http://mathema.tician.de/software/
Good documentation and examples on the wiki.
This presentation and trivial test apps for CUDA and OpenCL:
https://slid.es/xangis/parallel-computing/ ; http://github.com/Xangis
NVIDIA CUDA SDK:
https://developer.nvidia.com/cuda-downloads
LOTS of great code samples in the SDK.
AMD APP SDK (OpenCL):
http://developer.amd.com/tools-and-sdks/heterogeneous-computing/amd-accelerated-parallel-processing-app-sdk/
Intel OpenCL SDK:
http://software.intel.com/en-us/vcsource/tools/opencl-sdk