Very complex shaders and graphic drivers
category: code [glöplog]
I have noticed that on several video cards, when a shader takes too much time to compute (e.g. 5 sec), the driver usually cancels the operation and resets the display (and then the intro / demo just crashes).
Some other machines just freeze or become totally unresponsive (e.g. loading heavy glslheroku scripts on Mac OS + ATI).
Are there any tricks to avoid that? This would be particularly useful for procedural gfx, where a single frame can take many seconds to compute.
For procedural graphics, you can avoid this by not rendering a single fullscreen quad. Instead, just render smaller tiles to fill the screen.
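To illustrate the tiling idea, here is a minimal C++/OpenGL sketch (my own, not from this thread), assuming identity modelview/projection matrices and the procedural fragment shader already bound; it replaces the single fullscreen quad with a grid of small quads, one draw call per tile:
Code:
#include <GL/gl.h>

// Draw the "fullscreen quad" as a grid of small quads, one glBegin/glEnd
// pair (i.e. one small submission) per tile, so no single draw keeps the
// GPU busy long enough to trip the driver watchdog.
void drawTiledFullscreenQuad(int tiles)
{
    for (int ty = 0; ty < tiles; ++ty) {
        for (int tx = 0; tx < tiles; ++tx) {
            float x0 = -1.0f + 2.0f *  tx      / tiles;
            float y0 = -1.0f + 2.0f *  ty      / tiles;
            float x1 = -1.0f + 2.0f * (tx + 1) / tiles;
            float y1 = -1.0f + 2.0f * (ty + 1) / tiles;
            glBegin(GL_QUADS);
            glVertex2f(x0, y0);
            glVertex2f(x1, y0);
            glVertex2f(x1, y1);
            glVertex2f(x0, y1);
            glEnd();
        }
    }
}
Whether splitting the geometry alone is enough depends on the driver; as discussed further down the thread, some drivers also want a flush or even a SwapBuffers() between the tiles.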
Just for testing purposes, these shader timeout values can be tuned by setting a registry value. But you can of course not assume those registry keys to be set on the compo machine. :)
Tigrou: the reset is to protect the user - what happens if a shader takes infinite time? You never get back to the desktop :) Actually it used to be a big problem on mac that it *didn't* do that reset, I'd regularly cock up a for() loop and end up with infinite iterations, resulting in a hard reset. Not sure what to do to avoid it though.
The unresponsive thing on OSX: use chrome for stability, safari for speed. I've no idea why, but safari will become unresponsive on some shaders, often crashing some minutes later if you force-close the tab, while chrome will work fine. But safari is ~50% faster at executing the same shader. Strange.
So the maximum allowed time is not for a full frame (between two distinct SwapBuffers() calls) but per triangle? I guess this is a trade-off: rendering too many small quads would just be slower than rendering a single one?
i think when you submit the triangle to the GPU and it starts drawing, it effectively stops responding until it's done. If it doesn't respond for, say, 5 seconds, the OS assumes something went badly wrong and resets the GPU. So even if lots of small triangles take longer overall, it doesn't matter, because the GPU stays responsive.
Ok. Anyway, when a shader is taking a very long time, why can't the GPU just yield and do other vital tasks (like drawing the desktop and such)? For example, on any modern OS, if a process takes 100% of the CPU, the system is still responsive and the process doesn't need to be "reset" (terminated).
It would certainly be nice if GPU cores had pre-emptive logic; that would avoid issues like this.
windows WDDM drivers do now have pre-emption built into the drivers, but this is controlled by windows to keep the desktop responsive, and as far as I remember this happens on a per-work-packet basis, so your fullscreen quad would most likely still end up being a single work packet. I think microsoft are trying to reduce the granularity as much as possible in this area, but I forget how granular it really is.
in windows you can change the TDR timeout value with a registry key, but that's not really something you should be doing because you are removing a safety net. You can find those at http://msdn.microsoft.com/en-us/library/windows/hardware/ff569918(v=vs.85).aspx
I'm not sure how most GPUs handle the situation internally though - powerVR can self-detect and recover (hence that 'lockup' value in the xcode gpu stats that hopefully always stays at 0), so even without TDR running it should still be okay, but I don't know how the big boys handle things.
My vote would be for sending a large batch of triangles to avoid the situation - the overhead can't be that bad compared to the raw rendering work required...
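For reference, a small Win32 sketch (again my own, not from the thread) that only reads the TdrDelay value described at that MSDN link, e.g. to warn the user instead of silently dying; reading HKLM needs no admin rights, writing it does:
Code:
#include <windows.h>
#include <stdio.h>

// Query the WDDM timeout (TdrDelay, in seconds). If the value is absent,
// the driver default of 2 seconds is in effect.
int main(void)
{
    DWORD delay = 0, size = sizeof(delay);
    LSTATUS r = RegGetValueW(HKEY_LOCAL_MACHINE,
                             L"SYSTEM\\CurrentControlSet\\Control\\GraphicsDrivers",
                             L"TdrDelay", RRF_RT_REG_DWORD, NULL, &delay, &size);
    if (r == ERROR_SUCCESS)
        printf("TdrDelay is set to %lu seconds\n", delay);
    else
        printf("TdrDelay not set, the default (2 seconds) applies\n");
    return 0;
}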
Why can't the shader compiler just use static analysis on the shader to determine how long it will take to run? (</troll>).
Quote:
But you can of course not assume those registry keys to be set on the compo machine. :)
There is an API for that :D but it will prolly result in this dialog box: "strobo requires admin privileges to run"
Quote:
It would certainly be nice if GPU cores had pre-emptive logic; that would avoid issues like this.
hell no. imagine you are a game developer and shaders were preemptive - no way to predict the performance of a frame? hell no. adding hints like preemptiveness could fix it, but still, the default has to be non-preemptive.
Personally, I haven't been able to run most recent procedural gfx because Windows resets the card driver after a few seconds (on a GTX 560). (Waiting a little bit longer to get a picture would not have been a problem.)
Is the timeout value applied per-pixel, per-triangle or per-drawcall?
Quote:
So the maximum allowed time is not for a full frame (between two distinct SwapBuffers() calls) but per triangle?
It actually counts for each draw call. So if you render a vertex array full of small tiles, you won't actually win anything. For immediate mode, it works though.
@urs: rendering small tiles in immediate mode doesn't help (the driver still resets). The only thing which works is rendering very small tiles + SwapBuffers() between each call (without clearing the buffer, of course). But then the user can see the actual progression, which is not good. Other suggestions?
render it to a rendertarget ?
@pantaloon: Neat idea. Is there any code to start with? I have found iq's framework to be very useful. I had a look at pouet procedural graphics prods but it's hard to tell which ones have source.
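A rough sketch of the rendertarget approach (my own code; assumes a GL 3.0-level context, screen-sized width/height, and drawTile() standing for one small draw call like in the sketch near the top): render the tiles into an offscreen texture, keep the visible buffer untouched in between so the viewer sees no progression, and blit the finished image in one go at the end.
Code:
// Offscreen target for the slow procedural shader.
GLuint fbo, tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, tex, 0);

for (int i = 0; i < tileCount; ++i) {
    drawTile(i);        // one small draw call into the FBO (placeholder)
    glFlush();          // let the driver split the work between tiles
    SwapBuffers(hdc);   // nothing was drawn to the default framebuffer, so no progression is shown
}

// Everything is rendered: copy the result to the backbuffer and show it once.
glBindFramebuffer(GL_READ_FRAMEBUFFER, fbo);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);
glBlitFramebuffer(0, 0, width, height, 0, 0, width, height, GL_COLOR_BUFFER_BIT, GL_NEAREST);
SwapBuffers(hdc);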
tigrou: older nvidia drivers had that problem (it wasn't enough to split drawcalls, you also had to sync - I don't remember if glFlush was enough), but I think that has been fixed for more than a year now.
It obviously SHOULD be enough to ensure that your draw calls don't take too long.
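To make explicit what "sync" between the split draw calls can mean in code (how much of this a given driver actually needs is exactly the unclear part):
Code:
// After each small draw call, pick one of these, in increasing order of cost:
glFlush();           // cheap: hands the queued commands to the driver immediately
// glFinish();       // heavier: blocks until the tile has actually finished rendering
// SwapBuffers(hdc); // heaviest: present a frame (what tigrou reports above as the only thing that worked)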
Quote:
hell no. imagine you are a game developer and shaders were preemptive - no way to predict the performance of a frame? hell no. adding hints like preemptiveness could fix it, but still, the default has to be non-preemptive.
As far as I can tell, they are adding mid-shader preemption in Windows 8, of course only if the hardware supports it. But with more and more GPU compute used in apps, this was inevitable.
And yes - there always was mid-frame context switching between draw calls (sometimes requiring massive pipeline and cache flushes), so this doesn't change the overall picture of performance prediction.