Renderscript from the perspective of an OpenCL/CUDA/C++ AMP programmer
Now that Renderscript Compute supposedly works on GPUs, here are some points to ponder about this strange “compute” API:
1. In OpenCL or CUDA, you specify a thread grid to launch a kernel. In Renderscript, there is no concept of a thread grid. Instead, you specify input and/or output arrays and each “thread” processes one output item. It reminds me of the limitations of the very old GPGPU technologies like the original Brook language, and is essentially similar to a pixel shader model (each shader thread writes one item). You can’t even query the thread ID (similar to say get_global_id() in OpenCL) in Renderscript.
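To make the contrast concrete, here is a minimal sketch in plain Python. It is only a toy model of the two launch styles, not real OpenCL or Renderscript code: the helper names `launch_grid` and `for_each` are made up for illustration, and ordinary loops stand in for parallel hardware.

```python
# Toy model only: plain Python loops standing in for parallel hardware.
# launch_grid and for_each are hypothetical helpers, not real API calls.

def launch_grid(kernel, global_size, *args):
    """Grid-style launch: the caller picks the number of threads,
    and every thread receives its own ID (like get_global_id(0))."""
    for gid in range(global_size):
        kernel(gid, *args)

def reverse_kernel(gid, src, dst):
    # The thread knows its ID, so it can choose where to write.
    dst[len(src) - 1 - gid] = src[gid]

def for_each(kernel, src):
    """Per-item launch: one invocation per element, no thread ID,
    and the only write is the returned value for that element."""
    return [kernel(x) for x in src]

def square_kernel(x):
    return x * x

src = [1, 2, 3, 4]
dst = [0] * len(src)
launch_grid(reverse_kernel, len(src), src, dst)
squared = for_each(square_kernel, src)
print(dst)      # [4, 3, 2, 1]
print(squared)  # [1, 4, 9, 16]
```

In the second style, something as simple as reversing an array cannot be written directly, because the kernel never learns which element it is working on.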
Even scatter is really complicated and inefficient. You cannot really scatter writes to the output array. However, you can do scatter to separately bound arrays and so you have to adopt the following hack:
a) Do not pass in the actual input and output arrays directly. Bind the input and output arrays separately as dynamic pointers.
b) Pass an array containing the output indices as input.
c) For each index in the passed array, do the computation and write to the index.
This is just INEFFICIENT. There is no need for such inefficiency on modern hardware. (See also this stackoverflow thread: http://stackoverflow.com/questions/10576583/passing-array-to-rsforeach-in-renderscript-compute )
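Steps a) to c) can be sketched as follows. This is again a plain Python toy model under stated assumptions, not Renderscript code: the module-level lists stand in for separately bound arrays, `for_each` is a hypothetical stand-in for the per-item launch, and the doubling is an arbitrary placeholder computation.

```python
# Toy model of the scatter workaround; all names here are hypothetical.

# Step a: the real data is not passed to the launch. These module-level
# lists stand in for arrays bound separately to the script.
bound_input = [10, 20, 30, 40]
bound_output = [0, 0, 0, 0]

def scatter_kernel(out_index):
    # Step c: do the computation and write through the bound array
    # at the index that was passed in as the "input" element.
    bound_output[out_index] = 2 * bound_input[out_index]

def for_each(kernel, items):
    """One kernel invocation per element of the passed array."""
    for item in items:
        kernel(item)

# Step b: the array handed to the launch holds output indices, not data.
indices = [3, 0, 2]
for_each(scatter_kernel, indices)
print(bound_output)  # [20, 0, 60, 80]
```

The extra index array is pure overhead: it exists only to tell each invocation where to write, which a thread ID would provide for free.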
2. In Renderscript, the API chooses which device to run your code on. That’s right, you have no idea if your code is running on the CPU or GPU or DSP etc. The work is supposedly automatically distributed between processors by the Renderscript runtime according to the driver implemented by the SoC vendor, and currently no guidelines are given about how to ensure code runs on the GPU beyond “simple code should run on GPU”.
3. Renderscript’s philosophy is to not expose the actual hardware information and properties to the programmer. OpenCL lets you query a lot of information about the hardware properties, like the amount of local memory available. I guess given that the programmer can’t even decide where to run the code, this is not surprising.
4. CUDA introduced on-chip shared memory, and that concept has been adopted by almost every GPGPU API today, including OpenCL, C++ AMP etc. However, Renderscript does not have any concept of on-chip shared memory. Thus, performance will be lower than that of well-optimized OpenCL kernels on many families of GPUs.
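A rough back-of-the-envelope count shows why on-chip shared memory matters. The sketch below is an idealized model, assuming a 1D stencil in which each output reads `2 * radius + 1` neighbouring inputs; it ignores hardware caches and array boundaries, and the function names and sizes are invented for illustration.

```python
# Idealized count of global-memory loads for a 1D stencil.
# Assumptions: no hardware caches, boundary effects ignored.

def naive_loads(n, radius):
    # Without shared memory, every work item fetches its whole
    # neighbourhood from global memory.
    return n * (2 * radius + 1)

def tiled_loads(n, radius, group_size):
    # With shared memory, each work group stages its tile plus a halo
    # of `radius` elements per side once, then reuses it on chip.
    groups = -(-n // group_size)  # ceiling division
    return groups * (group_size + 2 * radius)

naive = naive_loads(1024, 1)
tiled = tiled_loads(1024, 1, 256)
print(naive)  # 3072
print(tiled)  # 1032
```

Real GPUs have caches that narrow this gap, so the numbers are an upper bound on the benefit, but the staging pattern itself is something Renderscript gives you no way to express.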
5. Renderscript is not available directly from the Android NDK. This is a significant limitation because high-performance applications (such as performance sensitive games) will often be written using the NDK.
Overall I do not think that the current iteration of Renderscript is meant for writing high performance code. Well optimized OpenCL/CUDA/C++ AMP kernels will always significantly outperform Renderscript code simply because Renderscript tries to present a simple abstraction and gives no control over performance. Performance will be entirely dependent upon the Renderscript compiler and driver, and will only come close to an API like OpenCL, CUDA or C++ AMP in very simple cases where the compiler may have the right heuristics built in.
At the same time, Renderscript has very weird programming model limitations, such as the scatter limitation outlined above. I think Renderscript was designed with only one application in mind: simple image processing filters. And as @jimrayvaughn pointed out on Twitter, many of those can be done efficiently in GLSL using well-understood techniques.
I hope that the SoC vendors and mobile handset vendors are reading this blog, and I hope that GPGPU on Android does not remain limited to Renderscript. Mobile vendors are wasting the power and potential of modern GPUs by not exposing the full power of the hardware to the developers. If you want to unlock the performance of your GPU, Renderscript is not the solution you are looking for.
Disclaimer: I am not a Renderscript expert. Finding documentation on Renderscript has been very tough, and my comments here are based upon what I could glean from the docs. If you find errors in this article, please point them out and I will update the article.
edited: Added NDK issue.
edited: Originally stated gather requires similar hack. However, gather works just fine. Only scatter is problematic.
Posted on February 1, 2013, in Uncategorized and tagged gpgpu, opencl, renderscript. 12 Comments.
Thank you for your thoughts on Renderscript. I too am finding it daunting to get real information without writing my own benchmarking/discovery suite, and I find the API quite limiting (even with my very little knowledge of het-compute). On the other hand, it feels as if Google is attempting to make Renderscript platform agnostic like the Dalvik VM, which probably explains the inability to choose where kernels execute, or to query underlying hardware. As for the design, I suspect these are still early days, and the language will increase in functionality as time progresses. I think things will get better over time.
But I too would really like an OpenCL implementation, and feel it is a mistake that this standard is not supported. Thankfully, a select few vendors have included support for OpenCL in Android which is encouraging; hopefully this is a trend that will continue. I would imagine that NVidia will eventually support CUDA on its Tegra platform. If Renderscript is lacking, who knows? Others may try to compete with the mature OpenCL.
I hope the next iteration of Android brings with it a far improved compute API.
Sean Lumly says: “Thankfully, a select few vendors have included support for OpenCL in Android which is encouraging”
Please list these vendors who have included OpenCL drivers on Android. I have looked heels over mountains for an ARM device on the market with OpenCL drivers installed.
Sorry for the late reply!
I know that the ZiiLABS ZMS line supports CL, and certainly many chipsets have or are getting Linux drivers (and thus Android drivers). Unfortunately it is not officially supported by the Android project. Something tells me that many vendors will build it in and developers will unofficially target it. On the up-side, apparently the Nexus 10 has a fully functional undocumented CL driver! Kishonti’s CLBenchmark has been successfully run on stock hardware.
http://clbenchmark.com/device-info.jsp?config=14669863&test=CLB10101
They are not the highest scores, but it appears to be executing on the GPU, which is something.
Thank you Sean for the Nexus 10 heads-up. It looks like we can add the Nexus 4 to the OpenCL-enabled list as well:
http://www.anandtech.com/show/6804/opencl-drivers-discovered-on-nexus-4-and-nexus-10-devices
“On Nexus 4, drivers appear to be present for both the quad-core Krait CPU and Adreno 320 GPU”
Same comment here. There is also the fact that Renderscript does not even support 3D textures, for example. OK, on a mobile that might not be used very often, but some nice effects can be done with it. And on a tablet it will make sense for some applications.
About the documentation, yes it is poor and why do we need to wait 2 or 3 years until it matures when OpenCL could be used right now. If Google was allowing the OpenCL drivers on their Android, the problem would go away straight away… This is really a political decision and the current situation brings no benefit at all. I think Google should get their act together on that one.
I suggested OpenCL for Android:
http://code.google.com/p/android/issues/detail?id=36361
Status = Declined
That is very unfortunate. There is no technical reason why a driver cannot support both APIs. I am hoping mobile vendors will get around to shipping APIs like OpenCL or CUDA or C++ AMP soon, except perhaps on Nexus devices where Google dictates the terms.
Well, apparently Amazon was experimenting with it. But then again, Amazon maintains its own branch of Android; I would not expect mobile vendors using the default Google code to get the authorization to add OpenCL support. I can’t remember where I read that Google would not support any images providing OpenCL support. That is really bad news for OpenCL. I can imagine the outcry if Microsoft had forbidden CUDA or OpenCL on its software (OK, they did their best to remove OpenGL, but it is still here)… But hey, Google can still get away with stuff like that.
It may be of interest that the Nexus 10 indeed has an undocumented OpenCL driver, and in fact Kishonti’s OpenCL benchmark is running on the unit: http://www.youtube.com/watch?v=GrqKJehawr8
This insight was gleaned from a question answered in the video comments.
Renderscript SUX. We need OpenCL on each mobile, desktop and server device. Period.
100% agreed. I’ve also looked at Renderscript a bit and it seemed like a dead end. I just don’t see the point. Embedded has the late-mover advantage here, so there is no need to revisit the childhood years of GPGPU. And both GPU vendors and developer audiences will be happier with OpenCL, as they already have experience with it from desktop products (and sure, NVidia may have had success with pushing proprietary GPU languages such as Cg and CUDA, but that was mostly before there was a portable alternative).