HLSL variable order bug?
category: code [glöplog]
Please, does someone have an explanation for this? I'm going to lose my mind soon.
I have this piece of HLSL code, which works and does exactly what I need it to do..
Code:
float o(float3 a)
{
    float b=.4;
    float c=0;
    float d=2;
    while(b>.01)
    {
        c=max(c,-m(n(a,d*b),b));
        b/=d=3;
    }
    return c;
}
That's great and all but if I change the position of the "d" var, everything goes to hell!
I.e., changing the order to this completely breaks it:
Code:
float d=2;
float b=.4;
float c=0;
To make matters even worse, the result is also different depending on where I put it... But consistent as long as it stays in the same place.
Not redefining it at the end of the loop also breaks it. Even if I declare it as d=3 (which gives a different but working result), I have to keep the "d=3" in "b/=d=3;" in order not to break it.
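For reference, a rough CPU-side sketch of what the loop is expected to do, written in plain C rather than HLSL so it can be run outside the shader compiler. The helper name trace_loop and its output arrays are made up for illustration, and m() and n() are left out because only b and d matter here. Since "b/=d=3;" parses as "b/=(d=3);", d should be 2 on the first pass only and 3 afterwards, and the loop should run four times before b drops below .01:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: records the (d, b) pair seen by each pass of the
   loop from o(). Returns the number of passes. */
static size_t trace_loop(float *d_seen, float *b_seen, size_t cap)
{
    float b = .4f;
    float d = 2;
    size_t passes = 0;
    while (b > .01f) {
        assert(passes < cap);   /* caller must provide enough room */
        d_seen[passes] = d;     /* 2 on the first pass, 3 afterwards */
        b_seen[passes] = b;     /* .4, then .4/3, .4/9, .4/27 */
        passes++;
        b /= d = 3;             /* same as: d = 3; b /= d; */
    }
    return passes;
}
```

If a compiler produces anything other than these four (d, b) pairs depending on declaration order, that points at the compiler rather than at the source.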
The preprocessed code is exactly the same other than the variable order, but the compiled code will have a different number of instructions depending on the order.
This is with compiler version 9.12.589.0000, target ps_3_0.
I know there are newer compilers (haven't tested them yet though) but this is for an intro (maybe ;)) and I guess this is the best I can use without losing too much compatibility.
Anyone?
Update.. It's got something to do with the loop. If I manually unroll the loop everything works fine no matter the placement. Obviously not great for size though...
How long is the compile and how is the memory usage of the compiler?
It compiles instantly, no endless unrolling or anything going on there. The entire shader uses 122 instructions after manually unrolling the loop 4 times, 109 without unrolling.
I tried changing the while(b>.01) to for(int i=0; i<4; i++) and that didn't help either.
Anyway I just tested a couple of newer compilers. The one I was using was the one that d3dx9_30.dll uses.
If I use d3dx9_32 or newer, everything works as expected. Compile time is a bit longer but still almost instant.
So next question then as I'm horribly out of date when it comes to this stuff, would it be safe to assume that anyone who's going to watch a 4k intro would have d3dx9_32.dll? ;)
More likely the IHV compiler.. what driver and hardware?
snq: if I were you I'd invest 30 minutes to use the recent D3DCompile (comes with its own lib in the Windows 8 SDK). At least your compiler will be up to date (less prone to the kind of trouble psycho mentioned) and likely installed.
AMD/ATI 6850 with the latest drivers. Not sure if that's it though, what worries me is that the resulting asm code has a different length depending on where I declare the variable. You could be right of course, I'll test on some other machines.
Plek, I need the shader to compile at runtime so me having the latest compiler won't help much. It's bad enough having to drag in d3dx but at least nowadays I guess everyone has the required dlls.
D3DX is deprecated these days, D3DCompiler is where the runtime functions are at now. See http://msdn.microsoft.com/en-us/library/windows/desktop/dd607340%28v=vs.85%29.aspx
I know it's deprecated. But in 4k or 1k you gotta do what you gotta do ;) With 4k I guess maybe I could live with the longer DLL name..
Besides, all the small stuff is always deprecated yet still works. Like _lopen() was supposed to be for 16 bit compatibility and has been deprecated for about 2 decades but Win8 still has it, even in its 64 bit version of kernel32.
"d3dx9_32" - oldschool and buggy as hell.
d3dcompiler_47.dll is recent and shipped with Windows 8.1 (at least via an update). Sometimes you have to sacrifice compatibility. d3dcompiler_43.dll is also kind of OK.
These compilers are really starting to piss me off now.. God forbid they make one that doesn't bug. d3dcompiler_47.dll introduces other unexplainable bugs that are possibly even weirder.
Is it supposed to be like this, having to change the order of declarations and add random const values to unbug the compiler, or am I just having the worst luck?
Something that could be related to your problem (considering the bugginess of _32), but doesn't have to be, is the way you declare floating-point values... you need to write them strictly as floats.
I.e. your "float d=2;" should read "float d=2.0;", which you can shorten to "float d=2.;"
...not doing so will keep your intro from running on many setups. Forgetting it in just one place in your whole shader code can be enough to break it.
Your "float c=0;" should read "float c=0.;" or even "float c=.0;"... both will work.
And rest assured that all of us 4k coders have to do it this way, so it's not a waste of bytes. Crinkler will take good care of those extra dots anyway!
I've been using _39 in my intros for years now. The shader compiler is a bit slower than in _32 but also not as buggy, and compile times are still OK, I'd say!
I'll try that snippet later today.
I'd suggest trying the shader in a different environment (easy: take the SDK spinning triangle w/HLSL example) - might be your D3D code is causing the ruckus (something with constant uploads if there are any, or perhaps something "cleverly" left out which certainly won't default to a desirable state any longer).
I use the June 2010-DirectX9-SDK btw.
additional crinkler-options:
/REPLACEDLL:d3dx9_43=d3dx9_39
I went through a lot of shit with these shader compilers myself and settled on this setup, as it has the fastest compile times and has been working for years now.
Thanks for the tips hardy, I'll look into that :)
Re: periods though, wouldn't it work everywhere as long as it compiles to the same code? Eg I have this somewhere:
t = 3*saturate(t);
The 3 shows up in the asm code as:
Code:
def c8, xx, xx, xx, 3
no matter whether I declare it as 3 or 3.0.
I would assume that as long as the byte (or dword) code is the same it should work the same on all setups? I think the conversion from int to float will be done by the compiler so the driver will never know the difference.
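The same reasoning can be checked on the CPU with a small C analogue. This only demonstrates C's conversion rules, not the HLSL compiler itself, and it assumes 32-bit IEEE 754 floats. The helper names float_bits, from_int_literal and from_float_literal are invented for this sketch:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns the raw 32-bit pattern of a float (assuming IEEE 754 single
   precision, which is what the constant ends up as in this sketch). */
static uint32_t float_bits(float f)
{
    uint32_t u;
    assert(sizeof u == sizeof f);   /* float must be 32 bits wide */
    memcpy(&u, &f, sizeof u);       /* copy the bytes, no conversion */
    return u;
}

/* Hypothetical stand-ins for the two spellings of the literal. */
static float from_int_literal(void)   { float t = 3;   return t; }
static float from_float_literal(void) { float t = 3.0; return t; }
```

Both helpers yield the bit pattern 0x40400000, so once the conversion has happened at compile time nothing downstream can tell which spelling was used.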
Plek: I'm actually testing in 2 different environments already, one is my shader tester tool written in C# and the other one is my intro which of course has somewhat hackier D3D code ;)
The tool updates the C0-11 constants with a timer value in C0 and other values I enter in the UI; the intro code only updates C0 with a timer for now.
Where are m & n coming from, and are there any constant packing rules involved with sm3.0?
snq: take a look at the actual generated code, instead of only the IL, using the shaderanalyzer
@Psycho: thanks for reminding me of that one :) I've got a few performance bottlenecks to look at.