Getting Started with OpenCL and C++ on macOS Catalina

I’m currently working on a parallel processing problem and decided to give OpenCL a try. macOS still ships with OpenCL 1.2, but not with the C++ bindings header for it.

I wanted to use C++ instead of C because the C++ bindings wrap the raw C API with conveniences such as RAII resource management and optional exception-based error handling, which helps get a solution up and running more quickly.
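
To give a concrete idea of what I mean by safeguards, here’s a minimal sketch (compilable once the header is installed as described below) that turns on the bindings’ optional CL_HPP_ENABLE_EXCEPTIONS switch, so failed calls throw cl::Error instead of returning raw cl_int error codes:

// Minimal sketch: with CL_HPP_ENABLE_EXCEPTIONS defined before the include,
// the C++ bindings throw cl::Error on failure instead of returning cl_int.
#define CL_HPP_ENABLE_EXCEPTIONS
#define CL_HPP_TARGET_OPENCL_VERSION 120
#define CL_HPP_MINIMUM_OPENCL_VERSION 120
#include <OpenCL/opencl.hpp>

#include <iostream>
#include <vector>

int main() {
    try {
        std::vector<cl::Platform> platforms;
        cl::Platform::get(&platforms);  // throws cl::Error if the call fails
        std::cout << "Platforms found: " << platforms.size() << std::endl;
    } catch (const cl::Error& err) {
        std::cerr << "OpenCL error " << err.err() << " in " << err.what() << std::endl;
        return 1;
    }
    return 0;
}

The demo program later in this post sticks to plain return codes and asserts, but it’s good to know the exception mode is there.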

Before we get started, make sure you have the Xcode Command Line Tools installed; they provide the SDK that contains the OpenCL framework we’ll compile against. If they’re missing, install them with:

$ xcode-select --install
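
If you’re not sure whether the tools are already installed, xcode-select can print the active developer directory (it exits with an error if they’re missing):

$ xcode-select -p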

Now we’re going to need the C++ bindings header for OpenCL. The best place to get it is the Khronos Group’s GitHub repository. The header is backwards compatible: a single opencl.hpp covers all OpenCL versions, with the target version selected through the CL_HPP_TARGET_OPENCL_VERSION macro we’ll pass to the compiler later.

$ curl -o /tmp/opencl.hpp https://raw.githubusercontent.com/KhronosGroup/OpenCL-CLHPP/master/include/CL/opencl.hpp

Note: the command above downloads the latest version of the header file from the master branch, which may differ slightly from the snapshot I was using at the time of writing this post.

Then move the header file to the OpenCL framework directory. In my case, it’s under MacOSX10.15.sdk:

$ sudo mv /tmp/opencl.hpp /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk/System/Library/Frameworks/OpenCL.framework/Headers/opencl.hpp
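
The exact SDK path varies with your Command Line Tools version, so if you’re unsure which one applies on your machine, xcrun can print the active SDK path:

$ xcrun --show-sdk-path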

And that’s pretty much it! Let’s give it a crack. Make a file called clDemo.cpp with the following contents:

#include <OpenCL/opencl.hpp>
#include <iostream>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

int main() {
    std::vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);

    assert(platforms.size() > 0);

    auto platform = platforms.front();

    // Ask the first platform for its GPU devices.
    std::vector<cl::Device> devices;
    platform.getDevices(CL_DEVICE_TYPE_GPU, &devices);

    assert(devices.size() > 0);

    auto device = devices.front();
    auto vendor = device.getInfo<CL_DEVICE_VENDOR>();
    auto version = device.getInfo<CL_DEVICE_VERSION>();

    std::cout << "Device Vendor: " << vendor << std::endl;
    std::cout << "Device Version: " << version << std::endl;

    cl::Context context(device);
    cl::Program::Sources sources;

    // OpenCL C source for a kernel that squares every element of the input
    // array; each work item handles one element, indexed by its global ID.
    std::string kernelCode =
        "   kernel void squareArray(global int* input, global int* output) {"
        "       size_t gid = get_global_id(0);"
        "       output[gid] = input[gid] * input[gid];"
        "   }";
    sources.push_back({kernelCode.c_str(), kernelCode.length()});

    cl_int exitcode = 0;

    cl::Program program(context, sources, &exitcode);
    assert(exitcode == CL_SUCCESS);

    // Check the result of the build itself as well, not just the construction
    // of the cl::Program object; this is where kernel compile errors show up.
    exitcode = program.build();
    assert(exitcode == CL_SUCCESS);

    cl::Kernel kernel(program, "squareArray", &exitcode);
    assert(exitcode == CL_SUCCESS);

    auto workGroupSize = kernel.getWorkGroupInfo<CL_KERNEL_WORK_GROUP_SIZE>(device);
    std::cout << "Kernel Work Group Size: " << workGroupSize << std::endl;

    // Host-side storage: inVec holds the values 1..1024, outVec will receive
    // their squares.
    std::vector<int> outVec(1024);
    std::vector<int> inVec(1024);
    std::iota(inVec.begin(), inVec.end(), 1);

    // Device buffers: the input is copied from inVec when the buffer is
    // created and never touched by the host again; the output is only read
    // back by the host once the kernel has finished.
    cl::Buffer inBuf(context,
                     CL_MEM_READ_ONLY | CL_MEM_HOST_NO_ACCESS | CL_MEM_COPY_HOST_PTR,
                     sizeof(int) * inVec.size(),
                     inVec.data());
    cl::Buffer outBuf(context,
                      CL_MEM_WRITE_ONLY | CL_MEM_HOST_READ_ONLY,
                      sizeof(int) * outVec.size());
    kernel.setArg(0, inBuf);
    kernel.setArg(1, outBuf);

    cl::CommandQueue queue(context, device);

    // Launch one work item per element of the input, then do a blocking read
    // of the results back into host memory.
    queue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(inVec.size()));
    queue.enqueueReadBuffer(outBuf, CL_TRUE, 0, sizeof(int) * outVec.size(), outVec.data());

    for (int value : outVec)
        std::cout << value << std::endl;

    return 0;
}
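
One note before compiling: the -D flags below tell the bindings header which OpenCL API version to target. If you’d rather not pass them on the command line, the same macros can be defined in the source before the include; here’s a sketch of the equivalent top of clDemo.cpp:

// Equivalent to the -D compiler flags used below: target the OpenCL 1.2 API
// that macOS ships by defining the version macros before including the header.
#define CL_HPP_TARGET_OPENCL_VERSION 120
#define CL_HPP_MINIMUM_OPENCL_VERSION 120
#include <OpenCL/opencl.hpp>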

Compile and execute it. The -D flags pin the C++ bindings to the OpenCL 1.2 API, which is the version that macOS ships:

$ clang++ clDemo.cpp \
        -framework OpenCL \
        -std=c++14 \
        -DCL_HPP_TARGET_OPENCL_VERSION=120 \
        -DCL_HPP_MINIMUM_OPENCL_VERSION=120 \
        -o clDemo
$ ./clDemo | head
Device Vendor: Intel Inc.
Device Version: OpenCL 1.2
Kernel Work Group Size: 256
1
4
9
16
25
36
49
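
If the assert after program.build() ever fires, the OpenCL C compiler’s log is the first thing to look at. Here’s a minimal sketch of dumping it, assuming the program and device objects from clDemo.cpp are in scope:

// Sketch: print the OpenCL C build log when the kernel fails to compile.
if (program.build() != CL_SUCCESS) {
    std::cerr << program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(device) << std::endl;
}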

That’s it. Good luck.