Desktop GPUs and Mali GPUs have different hardware architectures, so some functions are subject to restrictions when used on a Mali GPU.
Read the ARM® Mali™ GPU OpenCL Developer Guide; following its recommendations can have a huge impact on performance.
Mali memory model
A Mali GPU uses cached global memory rather than dedicated local or private memory. Memory declared as local or private is allocated in global memory, so using it is unlikely to improve performance. It can also cause unnecessary data movement, which degrades performance.
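As a small illustration of the point above, the sketch below holds an OpenCL C kernel as a C string. The kernel reads and writes global memory directly instead of first staging data in local memory. The kernel name scale, its arguments, and the variable name kScaleKernelSrc are assumptions made for this example and do not come from the Developer Guide.

```c
/* OpenCL C kernel source held as a C string, ready to pass to
 * clCreateProgramWithSource. Illustrative only: the kernel name and
 * its arguments are hypothetical. */
static const char *kScaleKernelSrc =
    "__kernel void scale(__global const float *in,\n"
    "                    __global float *out,\n"
    "                    const float factor)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    /* Work on global memory directly; no staging copy is made. */\n"
    "    out[i] = in[i] * factor;\n"
    "}\n";
```

Whether a staging copy helps or hurts depends on the kernel, so it is worth measuring both variants on the target device.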
Whenever possible, create buffers with clCreateBuffer and the CL_MEM_ALLOC_HOST_PTR flag.
Avoid creating buffers with clCreateBuffer and the CL_MEM_USE_HOST_PTR flag where possible.
Creating a buffer with CL_MEM_USE_HOST_PTR results in two buffers in global memory: one accessed by the host program and one accessed by the GPU. Data then has to be copied between them, which is unnecessary copying.
This is because part of global memory is accessible to the host program but not to the Mali GPU.