Parallel computing has become an essential technique in both industry and academia. For a programmer, understanding the theory of parallel computing is important, but learning how to work with a parallel computing library is just as critical. Today I am going to introduce a list of parallel computing libraries for C++, and hopefully it will be of benefit to C++ programmers!

First, I would like to divide parallel computing libraries into several categories so that I can introduce them one by one. Based on their implementation strategy, the many parallel computing libraries can be divided into:

  • Data-centric parallel computing libraries
  • Task-based parallel computing libraries
  • Message passing parallel computing libraries
  • Shared memory parallel computing libraries
  • Algorithmic skeleton based parallel computing libraries
  • Other

I will start with the Other category. This group includes pthreads, plus std::thread, std::future, and std::async from the C++ standard library. Pthreads, or POSIX Threads, is a threading API and execution model defined by the POSIX standard, independent of any particular programming language. Experienced programmers who have worked with C may know pthreads very well. Though it is not a C++-specific library, it still works well with modern C++ after all these years! Most Unix-like operating systems (Linux, macOS, the BSDs) implement POSIX threads, which makes pthreads portable across them; Windows does not ship pthreads natively, although third-party ports exist.
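To make the pthreads style concrete, here is a minimal sketch that sums a vector with two POSIX threads. The names `SumTask`, `sum_range`, and `pthread_sum` are made up for this illustration; only `pthread_create` and `pthread_join` come from the pthreads API. It assumes a POSIX system, and on some toolchains you may need to compile with `-pthread`.

```cpp
#include <pthread.h>
#include <cstddef>
#include <vector>

// Work description handed to each thread (hypothetical helper type).
struct SumTask {
    const int* data;
    std::size_t begin;
    std::size_t end;
    long long result;
};

// Thread entry point: pthreads expects the signature void* (*)(void*).
extern "C" void* sum_range(void* arg) {
    SumTask* task = static_cast<SumTask*>(arg);
    long long sum = 0;
    for (std::size_t i = task->begin; i < task->end; ++i) {
        sum += task->data[i];
    }
    task->result = sum;
    return nullptr;
}

// Sum a vector using two POSIX threads, one per half.
long long pthread_sum(const std::vector<int>& values) {
    const std::size_t mid = values.size() / 2;
    SumTask tasks[2] = {
        {values.data(), 0, mid, 0},
        {values.data(), mid, values.size(), 0},
    };
    pthread_t threads[2];
    bool created[2] = {false, false};
    for (int i = 0; i < 2; ++i) {
        created[i] = pthread_create(&threads[i], nullptr, sum_range, &tasks[i]) == 0;
        if (!created[i]) {
            sum_range(&tasks[i]);  // fall back to the calling thread
        }
    }
    for (int i = 0; i < 2; ++i) {
        if (created[i]) {
            pthread_join(threads[i], nullptr);  // wait for the worker to finish
        }
    }
    return tasks[0].result + tasks[1].result;
}
```

Calling `pthread_sum` on a vector from `main` returns the same total as a sequential loop. Notice how much is manual here: the `void*` casts, the task struct, and the explicit joins.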

std::thread can be viewed as a wrapper that makes native threads work better with modern C++. On POSIX systems it is typically implemented on top of pthreads, with nearly no overhead (in memory or time) compared with using pthreads directly. As a result, I would recommend choosing std::thread over pthreads if you want to manipulate threads directly. (Note that the native thread APIs on Windows are quite different from those on Unix systems, which is another reason to prefer the portable std::thread.)
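As a rough illustration of std::thread, the sketch below splits a vector into chunks and sums each chunk on its own thread. The function name `thread_sum` and the chunking scheme are my own choices for the example, not anything prescribed by the standard library.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sum a vector by splitting it into `num_threads` contiguous chunks.
long long thread_sum(const std::vector<int>& values, unsigned num_threads) {
    if (num_threads == 0) num_threads = 1;
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    const std::size_t chunk = (values.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(values.size(), t * chunk);
        const std::size_t end = std::min(values.size(), begin + chunk);
        // Each worker writes only to its own slot, so no lock is needed.
        workers.emplace_back([&values, &partial, t, begin, end] {
            partial[t] = std::accumulate(values.data() + begin,
                                         values.data() + end, 0LL);
        });
    }
    for (std::thread& worker : workers) {
        worker.join();  // a joinable std::thread must be joined or detached
    }
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

For example, `thread_sum(values, 4)` uses four worker threads. Lambdas and captures replace the `void*` plumbing that pthreads needs.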

Furthermore, there are also std::future and std::async, which provide APIs for asynchronous operations. For simple cases there is then no need to manage threads, locks, or atomics by hand!
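A small sketch of that style: one half of a sum is handed to std::async while the calling thread handles the other half. The function name `async_sum` is hypothetical; `std::async`, `std::launch::async`, and `std::future::get` are the standard library pieces.

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Sum a vector, computing the first half asynchronously.
long long async_sum(const std::vector<int>& values) {
    const auto mid = values.begin() + static_cast<std::ptrdiff_t>(values.size() / 2);
    // std::launch::async requests execution on a separate thread.
    std::future<long long> first_half =
        std::async(std::launch::async, [&values, mid] {
            return std::accumulate(values.begin(), mid, 0LL);
        });
    // Meanwhile, the calling thread sums the second half.
    const long long second_half = std::accumulate(mid, values.end(), 0LL);
    // get() blocks until the asynchronous result is ready.
    return first_half.get() + second_half;
}
```

The future carries the result (or an exception) back to the caller, so there is no shared variable to protect and no explicit join.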

We can see that the libraries in the Other category are pretty low-level, and many programmers may not choose them even though they offer the potential for high performance.

Message passing is a popular parallel programming paradigm, and MPI is the standard for message-passing programming. The shared memory programming model is another popular one, of which OpenMP is the de facto standard. It is quite difficult to compare the popularity of MPI and OpenMP, as both are widely used! Nowadays developers also combine the two in hybrid programs: MPI passes messages between the machines of a cluster, while OpenMP parallelizes the work inside each machine's shared memory. Quite fantastic work!
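To give a feel for the shared memory side, here is a minimal OpenMP sketch of a parallel sum. The function name `openmp_sum` is made up for the example. It assumes a compiler with OpenMP support (for instance `-fopenmp` with GCC or Clang); without that flag the pragma is ignored and the loop simply runs sequentially, producing the same result.

```cpp
#include <cstddef>
#include <vector>

// Sum a vector with an OpenMP parallel loop.
long long openmp_sum(const std::vector<int>& values) {
    long long sum = 0;
    const long long n = static_cast<long long>(values.size());
    // Each thread accumulates a private copy of `sum`;
    // the reduction clause combines the copies when the loop ends.
    #pragma omp parallel for reduction(+ : sum)
    for (long long i = 0; i < n; ++i) {
        sum += values[static_cast<std::size_t>(i)];
    }
    return sum;
}
```

The appeal of OpenMP shows here: the loop body is ordinary sequential code, and one directive describes the parallelism. An MPI version would instead split the data across processes and combine the partial sums with an explicit communication call.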

Below are the other categories, which are important but seem to be less popular.

Taskflow is a task-based parallel computing library. It is a general-purpose library that supports heterogeneous computing, so programs can run on both CPU and GPU! Tasks are easy to implement: you describe them as the nodes of a directed graph whose edges are the dependencies between them, and your task-based parallel program is there!
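To illustrate the task graph idea without depending on an external library, the sketch below wires up a diamond-shaped graph (A feeds B and C, which both feed D) using only std::async and std::shared_future. This is deliberately not Taskflow's API, just a standard-library approximation of the same dependency structure; the function name `run_diamond_graph` and the arithmetic inside each task are invented for the example.

```cpp
#include <future>

// Diamond-shaped task graph: A runs first, B and C depend on A,
// and D depends on both B and C.
int run_diamond_graph(int input) {
    // Task A: its result is shared by two downstream tasks.
    std::shared_future<int> a =
        std::async(std::launch::async, [input] { return input + 1; }).share();

    // Tasks B and C: each waits on A, then they may run in parallel.
    std::future<int> b =
        std::async(std::launch::async, [a] { return a.get() * 2; });
    std::future<int> c =
        std::async(std::launch::async, [a] { return a.get() * 3; });

    // Task D: joins the two branches on the calling thread.
    return b.get() + c.get();
}
```

A task-based library takes over exactly this wiring: you declare the tasks and the edges, and a scheduler decides which thread runs what and when.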

For algorithmic skeleton based libraries, eSkel was one of the earliest (though it was originally designed for C, not C++!). FunctionalPlus is a modern functional programming library that also provides algorithmic skeletons. For example, you can map a function over a collection with its transform function. There is also an awesome library called SkePU, which provides containers as well as algorithmic skeletons for CPU and GPU.

Concurrent data structure libraries such as libcds provide concurrent containers, which fall into the data-centric category.
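The map skeleton mentioned above can be sketched in plain standard C++. The `parallel_map` function below is a hypothetical helper written for this post, not the API of eSkel, FunctionalPlus, or SkePU: the caller supplies only the per-element function, and the skeleton owns the chunking and the parallelism. It assumes the result type is default-constructible and is not `bool`, and it needs C++14 or later.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <type_traits>
#include <vector>

// Apply `f` to every element of `input`, processing chunks in parallel.
template <typename T, typename F>
auto parallel_map(const std::vector<T>& input, F f, std::size_t num_chunks = 4) {
    using R = std::decay_t<decltype(f(input[0]))>;
    std::vector<R> output(input.size());
    if (num_chunks == 0) num_chunks = 1;
    const std::size_t chunk = (input.size() + num_chunks - 1) / num_chunks;

    std::vector<std::future<void>> pending;
    for (std::size_t begin = 0; begin < input.size(); begin += chunk) {
        const std::size_t end = std::min(input.size(), begin + chunk);
        // Each task writes to a disjoint slice of `output`, so no locking is needed.
        pending.push_back(std::async(std::launch::async,
            [&input, &output, &f, begin, end] {
                std::transform(input.data() + begin, input.data() + end,
                               output.data() + begin, f);
            }));
    }
    for (std::future<void>& task : pending) {
        task.get();  // wait for the chunk and rethrow any exception from `f`
    }
    return output;
}
```

For instance, `parallel_map(numbers, [](int x) { return x * x; })` squares every element. Real skeleton libraries offer many more patterns (reduce, scan, pipeline, farm) and, in SkePU's case, GPU backends.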

I do not know how to categorise the following two: SYCL and STAPL. SYCL enables code for heterogeneous and offload processors to be written in modern C++. It is more of a standard than a library, as there are several implementations of it. SYCL is not easy to work with. STAPL (Standard Template Adaptive Parallel Library), as the name suggests, is a framework for developing parallel C++ programs. It can be used almost like the STL we use in everyday programming, such as std::vector or std::transform, but there are some differences between the ordinary versions and the parallel ones.

The list will be extended/updated occasionally.