UPDATE: The issues related to texture arrays appear to have been an application error. Michael Marks provides a fork that corrects some of the issues.
UPDATE 2: I reran some of the Linux benchmarks; the earlier Linux results appear to have been affected by a bug. Performance on Linux and Windows is now similar.
The strengths and weaknesses of OpenGL compared to other APIs (such as D3D11, D3D12 and Mantle) and the recent talk Approaching Zero Driver Overhead (AZDO) have become topics of hot discussion. The AZDO talk included a nice tool called “apitest” that lets us compare a number of solutions in OpenGL and D3D. Hard data is always better than hand-wavy arguments. In the AZDO talk, data from “apitest” was shown for Nvidia hardware, but no numbers were given for either Intel or AMD hardware. Michael Marks ran the tool on Linux and had some interesting results to report, which imply that AMD’s driver has higher overhead than Nvidia’s.
However, I wanted to answer slightly different questions. For example, if we restrict ourselves to AMD hardware, how does OpenGL performance compare to D3D? What are the performance and compatibility differences between Windows and Linux? And how do the various approaches perform across hardware generations? With these questions in mind, I built and ran apitest on some AMD hardware on both Linux and Windows.
Hardware: AMD A10-5750M APU with 8650G graphics (VLIW4) plus 8750M (GCN) switchable graphics. Catalyst 14.4 was installed on both Linux and Windows; Catalyst allows explicit selection of the graphics processor. The laptop has a 1366×768 screen.
Build: On Windows, built for Win32 (i.e. 32-bit) using VS 2012 Express and the DirectX SDK (June 2010), with the Release configuration. On Linux, built for 64-bit using G++ 4.8 on openSUSE 13.1. This required one patch to the SDL CMake file.
Run: The tool was run using “apitest.exe -a oglcore -b -t 15”, the same settings used by Michael Marks. On Linux, it was run under KDE with desktop effects disabled, in case that makes a difference.
Issues encountered:
I encountered some issues. I am not sure if the error is in the application, the user (i.e. me) or the driver.
- Solutions using shader draw parameters (often abbreviated as SDP in the talk) appear to lead to driver hangs on GCN and are unsupported on VLIW4. Therefore I have not reported any SDP results here. Michael Marks also saw the same driver hangs on GCN on Linux, did some investigation and has posted some discussion here.
- Solutions involving ARB_shader_image_load_store (which is core in OpenGL 4.2, not some arcane extension) appear to be broken on Windows but work on Linux, despite the same Catalyst version being installed on both. On Windows, the driver reports a compilation error for some shaders, saying that “readonly” is not supported unless you enable the extension (a minimal illustration of the qualifier follows this list). UPDATE: This was an application bug.
- The GCN based 8750M should support bindless textures. However, some of the bindless based solutions failed to work. For example, GLBindlessMultiDraw failed, and sparse bindless also failed.
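For context, “readonly” is a standard memory qualifier for image variables in GLSL 4.20, so no #extension directive should be needed. Below is a minimal, hypothetical shader illustrating the usage; it is not the actual apitest shader:

```cpp
// Hypothetical fragment shader using the GLSL 4.20 "readonly" image
// qualifier. This is only an illustration of the feature in question,
// not the shader that apitest ships.
const char* kReadonlyImageShader = R"GLSL(
#version 420 core
// ARB_shader_image_load_store is core in 4.20, so "readonly" should not
// require an #extension directive here.
layout(binding = 0, rgba8) readonly uniform image2D srcImage;
out vec4 fragColor;
void main() {
    fragColor = imageLoad(srcImage, ivec2(gl_FragCoord.xy));
}
)GLSL";
```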
Data:
I did not test the 8750M on Linux, partly because I am lazy and partly because I did not want to disturb the Linux setup that I use for my university work. Anyway, here is the data for the three problems covered by apitest. Note that the D3D11 solutions show 0 on Linux simply because Direct3D is not available there.
Dynamic streaming
| Solution | 8650G Windows (FPS) | 8650G Linux (FPS) | 8750M Windows (FPS) |
| --- | --- | --- | --- |
| D3D11MapNoOverwrite | 14.629 | 0 | 19.6 |
| D3D11UpdateSubresource | 0.978 | 0 | 1.198 |
| GLMapPersistent | 19.987 | 19.471 | 20.885 |
| GLBufferSubData | 0.89 | 1.015 | 0.843 |
| GLMapUnsynchronized | 0.397 | 0.409 | 0.362 |
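For readers unfamiliar with the technique, here is a minimal sketch of the persistent mapping approach that GLMapPersistent represents, assuming an OpenGL 4.4 context (ARB_buffer_storage) and GLEW for function loading; the buffer size and function names are illustrative:

```cpp
// Minimal sketch of persistent buffer mapping (ARB_buffer_storage, core in
// OpenGL 4.4). Assumes a current GL 4.4 context and GLEW; the size and
// function names are illustrative, and per-region fencing is omitted.
#include <GL/glew.h>
#include <cstring>

const GLsizeiptr kBufSize = 4 * 1024 * 1024; // 4 MiB ring buffer (arbitrary)

GLuint gStreamBuf = 0;
void*  gMappedPtr = nullptr;

void createPersistentBuffer() {
    glGenBuffers(1, &gStreamBuf);
    glBindBuffer(GL_ARRAY_BUFFER, gStreamBuf);
    // Immutable storage that stays mapped for the lifetime of the buffer.
    GLbitfield flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;
    glBufferStorage(GL_ARRAY_BUFFER, kBufSize, nullptr, flags);
    gMappedPtr = glMapBufferRange(GL_ARRAY_BUFFER, 0, kBufSize, flags);
}

// Per frame: write fresh vertex data straight through the mapped pointer.
// Real code must fence so it never overwrites a region the GPU is still
// reading; a synchronization sketch appears in the conclusion.
void streamVertices(const void* data, size_t bytes, size_t offset) {
    std::memcpy(static_cast<char*>(gMappedPtr) + offset, data, bytes);
}
```

The win over GLBufferSubData-style updates is that the map is created once, so the per-frame driver work reduces to little more than the memcpy.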
Textured quads
| Solution | 8650G Windows (FPS) | 8650G Linux (FPS) | 8750M Windows (FPS) |
| --- | --- | --- | --- |
| D3D11Naive | 65.11 | 0 | 42.463 |
| GLTextureArrayMultiDraw-NoSDP | 346 | 400.25 | 492 |
| GLTextureArray | 235 | 276.67 | 350 |
| GLNoTex | 215.94 | 275.57 | 347.608 |
| GLTextureArrayMultiDrawBuffer-NoSDP | 212 | 239 | 472 |
| GLNoTexUniform | 81.429 | 93.38 | 133.115 |
| GLTextureArrayUniform | 80.75 | 92.72 | 109.5 |
| GLNaiveUniform | 32.717 | 31.64 | 32.059 |
| GLNaive | 27.3 | 15.02 | 27.21 |
| GLBindless | Unsupported | Unsupported | 112.7 |
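The large gap between the MultiDraw solutions and GLNaive comes from batching: instead of one draw call and texture bind per quad, every quad is submitted in a single multi-draw-indirect call against a texture array. Below is a rough sketch of that submission, assuming OpenGL 4.3 (ARB_multi_draw_indirect) and an indirect buffer filled up front; the names are illustrative:

```cpp
// Sketch of the multi-draw-indirect submission behind the
// GLTextureArrayMultiDraw* solutions (ARB_multi_draw_indirect, core in 4.3).
// Assumes the indirect buffer holds one command per quad; the shader picks
// the texture array layer per draw (e.g. via gl_DrawID with SDP, or via a
// baseInstance-driven vertex attribute for the -NoSDP variants).
#include <GL/glew.h>

// Command layout mandated by the spec for indirect indexed draws.
struct DrawElementsIndirectCommand {
    GLuint count;         // indices per quad (6)
    GLuint instanceCount; // 1
    GLuint firstIndex;
    GLuint baseVertex;
    GLuint baseInstance;  // can carry a per-draw ID, e.g. the array layer
};

void drawAllQuads(GLuint indirectBuf, GLsizei numQuads) {
    glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirectBuf);
    // One API call issues every draw; no state changes between quads.
    glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_SHORT,
                                nullptr /* offset 0 into indirect buffer */,
                                numQuads, 0 /* tightly packed commands */);
}
```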
Untextured quads
| Solution | 8650G Windows (FPS) | 8650G Linux (FPS) | 8750M Windows (FPS) |
| --- | --- | --- | --- |
| D3D11Naive | 4.078 | 0 | 2.16 |
| GLMultiDraw-NoSDP | 17.221 | 17.661 | 19.93 |
| GLMapPersistent | 10.844 | 11.089 | 13.687 |
| GLDrawLoop | 10.615 | 10.45 | 13.59 |
| GLBufferStorage-NoSDP | 9.862 | 10.041 | 5.096 |
| GLMultiDrawBuffer-NoSDP | 9.069 | 9.404 | 7.703 |
| GLMapUnsynchronized | 5.908 | 5.726 | 7.281 |
| GLTexCoord | 5.702 | 5.382 | 4.963 |
| GLUniform | 3.282 | 3.53 | 4.399 |
| GLBufferRange | 3.031 | 3.509 | 3.119 |
| GLDynamicBuffer | 0.361 | 0.515 | 0.37 |
Conclusion:
- The theoretical principles discussed in the AZDO talk appear to be sound. The “modern GL” techniques it discusses do substantially reduce driver overhead compared to older GL techniques. The reduction was seen on AMD hardware on both Windows and Linux, and on two different architectures (a VLIW4 based APU and a GCN based discrete GPU). In particular, persistent buffer mapping (sometimes called PBM) and multi-draw-indirect (MDI) based techniques seem useful; a sketch of the synchronization that PBM needs in practice follows this list.
- On Windows, the best OpenGL solutions do appear to significantly outperform D3D11. I am not a D3D expert, so I am not sure whether better D3D11 solutions exist.
- If a test ran successfully on both Windows and Linux, then the performance was qualitatively similar in most cases.
- However, while things look good in theory, in practice some issues were encountered. Some solutions failed to execute despite being theoretically supported by the hardware. In particular, shader draw parameters, as well as some variations of bindless textures, appear to be problematic. I am not sure whether the fault lies with the application, the user (me) or the driver.
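As a footnote to the PBM point above: the persistent mapping sketch earlier omitted synchronization, so here is a minimal sketch of the fencing a real streaming loop needs, assuming GL 3.2+ sync objects and a triple-buffered ring; the region bookkeeping is illustrative:

```cpp
// Minimal fencing sketch for persistent mapping: never rewrite a ring-buffer
// region until the GPU has finished the draws that read from it. Assumes
// GLsync objects (core since GL 3.2); the 3-region split is illustrative.
#include <GL/glew.h>

GLsync gRegionFence[3] = {};

// Call after submitting the draws that consume `region`.
void finishRegion(int region) {
    gRegionFence[region] = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
}

// Call before writing into `region` again.
void waitForRegion(int region) {
    if (!gRegionFence[region]) return; // region never submitted yet
    for (;;) {
        GLenum r = glClientWaitSync(gRegionFence[region],
                                    GL_SYNC_FLUSH_COMMANDS_BIT,
                                    1000000); // up to 1 ms per iteration
        if (r == GL_ALREADY_SIGNALED || r == GL_CONDITION_SATISFIED) break;
    }
    glDeleteSync(gRegionFence[region]);
    gRegionFence[region] = nullptr;
}
```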