Intel oneAPI is a unified programming model and software development toolkit that provides a common platform for developing applications that can run on various architectures, including CPUs, GPUs, FPGAs, and other accelerators. The goal of Intel oneAPI is to simplify software development by providing a consistent programming model across different hardware platforms and to help developers get the best performance out of their applications. With Intel oneAPI, developers can use a single codebase to target multiple hardware platforms, saving time and effort compared to developing separate code for each platform. Additionally, Intel oneAPI includes a comprehensive set of libraries and tools to help developers optimize their applications and take full advantage of the capabilities of different hardware platforms.

Before oneAPI there was another great library, Intel TBB (Threading Building Blocks), which is now part of Intel oneAPI under the name oneTBB. oneTBB is a widely used C++ template library that provides a high-level, efficient, and portable abstraction for parallelism. It offers concurrent containers and parallel algorithms, as well as low-level synchronization primitives, to help developers write parallel applications. The library hides much of the complexity of parallel programming: its task scheduler handles load balancing automatically, so developers do not have to deal with thread creation and management or with distributing work among threads. The resulting code is easier to understand, maintain, and debug, while still delivering high performance. oneTBB is used in a wide range of applications, from scientific simulations and deep learning to computer graphics and video processing. It is highly optimised, with performance competitive with hand-written parallel code from domain experts, and it is designed to scale across modern multi-core and many-core systems.

In my several years of parallel and distributed programming, I’ll never forget the time I spent debugging a static library issue - “libtbb not found.” Many parallel libraries depend on Intel TBB. When programming in C++ on a Linux machine, managing threads by hand is not easy: locks, synchronization, and semaphores all bring potential pitfalls. For example, performing concurrent operations on an STL vector is a challenge, because std::vector is not safe to modify from several threads at once. With concurrent_vector from Intel TBB, life becomes much easier. I can append to it and read from it concurrently, using it much as I would an STL vector, and the job is done.

Intel oneAPI is a wonderful framework, but it raises an interesting question: can I use it on an Arm machine, given how different the two architectures are? I googled keywords such as “Intel OneAPI on Arm” and “Intel TBB on Arm”, but I couldn’t find a clear answer. Since I have an Oracle Arm Ampere A1 Compute instance at hand, I gave it a try.

Below is the system information for my Arm machine:

  • OS - Oracle Linux (kernel 5.4.17-2102.204.4.4.el8uek.aarch64)
  • RAM - 24GB
  • CPU - 4 cores

I started by downloading the installation package from the Intel oneAPI website, selecting Linux and the offline installer. The Yum package should also work with my operating system.

wget https://registrationcenter-download.intel.com/akdlm/irc_nas/19079/l_BaseKit_p_2023.0.0.25537_offline.sh
sudo sh ./l_BaseKit_p_2023.0.0.25537_offline.sh

To my surprise, the installer returned an error message. The package seemed to extract correctly on my Arm machine, but one of the extracted executables could not be run. This is likely because the installer ships binaries built for x86-64, which an Arm (aarch64) CPU cannot execute.

Do not worry, there is an alternative: we can go to the oneTBB repository and build it from source. Intel also provides an installation guide, so I followed their instructions:

mkdir build
cd build
cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX=/opt/library/oneTBB -DCMAKE_BUILD_TYPE=RELEASE ..
make

Here I set my installation path to “/opt/library/oneTBB”. Release mode can reduce the build time. I also left parallel builds off by running plain make without the -j option, so the build used a single job. After some time, it built successfully on my machine! I was really happy to install and test it:

make install

When writing a Hello World with CMake, add the following lines to CMakeLists.txt and oneTBB will be configured:

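include_directories(/opt/library/oneTBB/include)
set(TBB_DIR /opt/library/oneTBB/lib64/cmake/TBB/)
# Assumed addition: find_package is what defines the TBB::tbb target used below
find_package(TBB REQUIRED)
link_directories("/opt/library/oneTBB/lib64")
target_link_libraries(Your_project_name TBB::tbb pthread)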

Because we installed oneTBB from source, TBB_DIR may not be configured automatically, and CMake could not detect the library on my first attempt, so we need to set TBB_DIR manually. Replace the paths in include_directories and link_directories with your own; they point to the header files and the libraries. Finally, we link the project against TBB::tbb and pthread. Although Intel’s documentation suggests linking with TBB::Thread, pthread works fine for me.

That’s it, done! Give oneTBB a try on your Arm machine, and happy coding! This is also the first time I have written a post with ChatGPT as a copilot.