More thoughts on Renderscript
(This was originally posted on a forum in reply to Tim Murray of Google. I am posting it here for readers who come to my blog for Renderscript information.)
(TLDR: Renderscript is a fine idea, and I think it has good intentions, but let people experiment with alternate solutions too.)
Tim, I see and partially agree with your vision. I understand that mobile architectures are complicated. There are considerations, such as dynamic power distribution and shared memory bandwidth, that one had not seen on desktop (before, say, Ivy Bridge). I agree that trying to write 6 different codepaths for 6 different architectures is hard. But I do not agree that Renderscript solves these issues either. At present, all it does (compared to, say, CUDA) is prevent the programmer from specifying certain parameters.
Think about it this way. You can divide programming tools into two categories:
a) Close-to-metal. OpenCL is fairly close to metal, exposing individual devices, the memory hierarchy and thread dispatch mechanics.
Even OpenCL, however, leaves some room for optimization by the driver. For example, you can leave out the thread group size and let the driver choose a suitable one. Vectorization may also be performed by the driver (for example, Intel’s driver does this on Ivy Bridge). It is also possible to write OpenCL drivers that automatically use local memory (on-chip shared memory in CUDA parlance) if the programmer does not. But let us ignore even OpenCL’s compiler optimization possibilities and think of it as close to metal.
b) Middleware solutions that offer higher-level programming languages. This typically includes a supposedly smart compiler and some kind of scheduler+runtime. Renderscript falls in this category. My current research area happens to be this exact field, and I am a big proponent of the need for more productive languages in parallel computing, so I sympathize with your goals.
But I think Renderscript is essentially just one particular middleware solution. The Renderscript compiler and drivers will have one particular set of heuristics. You asked how a developer can code for an architecture he has not seen before. As a compiler writer, I often face the opposite issue: how can you ensure that your compiler+driver has the right heuristics for algorithms and use-cases you have never seen?
You said yourself that mobile architectures are more complicated than typical workstations, and people still haven’t solved the problem of building good middleware for workstations. A compiler that attempts to automatically schedule computations on the right hardware needs to have at least some performance model for the application and an idea of how that will map to the architecture. This is very hard (and completely unsolved) on “simple” desktop architectures. How do you think this will be done automatically by a driver on mobile, where things are more complicated by your own admission? We have been trying to solve some of the same problems as you in our lab, and I think years of work remain to be done.
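To make the "performance model" point concrete, here is a deliberately naive sketch of the kind of heuristic a driver would need. Everything in it is an assumption for illustration: the `Device` and `Kernel` fields, the device numbers, and the additive cost formula are hypothetical and do not describe how Renderscript or any real driver decides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    flops_per_sec: float        # sustained compute throughput (made-up number)
    bytes_per_sec: float        # bandwidth for getting data to the device
    launch_overhead_sec: float  # fixed cost of dispatching any work at all


@dataclass(frozen=True)
class Kernel:
    flops: float        # total arithmetic work
    bytes_moved: float  # data that must reach the device


def estimated_time(kernel, device):
    """Toy additive model: launch overhead + transfer time + compute time."""
    return (device.launch_overhead_sec
            + kernel.bytes_moved / device.bytes_per_sec
            + kernel.flops / device.flops_per_sec)


def pick_device(kernel, devices):
    """Choose the device with the lowest estimated time."""
    return min(devices, key=lambda d: estimated_time(kernel, d))


# Hypothetical devices: the GPU is faster at arithmetic but costlier to reach.
CPU = Device("cpu", flops_per_sec=1e10, bytes_per_sec=1e10, launch_overhead_sec=1e-6)
GPU = Device("gpu", flops_per_sec=1e11, bytes_per_sec=5e9, launch_overhead_sec=1e-4)

small = Kernel(flops=1e5, bytes_moved=1e4)
large = Kernel(flops=1e10, bytes_moved=1e7)

print(pick_device(small, [CPU, GPU]).name)
print(pick_device(large, [CPU, GPU]).name)
```

Even this toy needs per-kernel work estimates and per-device constants that a driver cannot simply read off arbitrary code, and it ignores memory access patterns, branch divergence, contention for shared bandwidth and power budgets entirely. That gap is the part I am calling unsolved.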
There is also the issue of over-optimizing for a particular architecture, which you said can happen with OpenCL. This can certainly happen, but I think it can very well happen with Renderscript too. Just to give an example: let’s say the Renderscript driver on my machine always happens to choose the CPU. It is entirely possible to write an algorithm that performs wonderfully on my particular device’s CPU and ship that, while the algorithm itself may perform very badly on other devices where the driver happens to choose the GPU. CPU algorithms are not always suitable for GPUs, so having the same source code for both can be more disastrous than trying to run code optimized for one CPU on another. No matter how smart the compiler is, it cannot replace the algorithm with another one.
I think a better approach is to let people build different middleware for different types of applications. Let a thousand middlewares bloom, and more particularly, let a thousand domain-specific tools bloom. This cannot be done on top of Renderscript. Building middleware on top of another layer of middleware with an undocumented (and potentially ever-changing) set of optimizations is a very bad idea, but it can potentially be done on top of lower-level interfaces (like OpenCL). For example, let game engine programmers decide where and how they want to run the physics code. Let people build domain-specific tools like Halide and let them choose how and when to optimize for which architecture. Let people build their own dynamic schedulers (like StarPU) and experiment with what scheduling algorithm suits them best. I understand Googlers are smart, but so are people at Unity or MIT or many other places.
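As a rough illustration of what "experiment with what scheduling algorithm suits them best" means, here is a minimal sketch of a scheduler whose placement policy is swappable. It is not StarPU's API or algorithm; the function names, the two policies and the task costs are all hypothetical, and each task is assumed to come with a known cost on every worker.

```python
def round_robin(task_index, task_cost, busy_until):
    """Ignore costs entirely and cycle through the workers."""
    return task_index % len(busy_until)


def earliest_finish(task_index, task_cost, busy_until):
    """Greedy: place the task on the worker where it would finish soonest."""
    return min(range(len(busy_until)),
               key=lambda w: busy_until[w] + task_cost[w])


def schedule(tasks, n_workers, policy):
    """Place tasks in order; return (worker chosen per task, makespan)."""
    busy_until = [0.0] * n_workers
    placement = []
    for i, cost in enumerate(tasks):
        w = policy(i, cost, busy_until)
        busy_until[w] += cost[w]
        placement.append(w)
    return placement, max(busy_until)


# Each task is (cost on worker 0, cost on worker 1); think CPU and GPU.
tasks = [(4.0, 1.0), (1.0, 4.0), (4.0, 1.0), (1.0, 4.0)]

print(schedule(tasks, 2, round_robin))
print(schedule(tasks, 2, earliest_finish))
```

With these made-up costs the greedy policy finishes much sooner than round-robin, but a different workload could easily favor a different policy. The point is only that with a low-level interface underneath, the policy is a function the application or tool author can replace.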
I am not saying Renderscript is bad. I think Renderscript is a good idea that tries to tackle a real issue, and I think you should continue developing it further. But it is not, and can never be, a good solution for everyone, and forcing people to use only this tool will limit the exploration of alternate technical solutions. This is why choice is important. Limiting the choice to only one middleware solution, which happens to choose one particular set of parameters in this vast unexplored design space of middleware solutions, will be bad for everyone in the long run.