win32 1k compression for dummies
category: general [glöplog]
here is my question:
what is actually the best way to compress a win32 1k?
some years ago people used an exe dropper (which creates a com or cab file) together with a compressor (like upx or apack).
the problem is that this trick is not very clean (it writes files to the hard drive), it is unsafe, and some parties strictly forbid it (no registry or hdd access).
even worse: sometimes import by ordinal was used, which reduces the intro size dramatically but also pretty much guarantees that the intro won't work on an os different from the one it was built on.
then crinkler arrived, where the idea was to offer good compression and compatibility, working directly on obj files and using only safe tricks (e.g. imports are resolved by hash).
the problem is that crinkler is not so efficient on 1k files: my minimal directx framework (init directx, use a timer, draw a flat square, send an empty shader), written in x86 asm, is 855 bytes, which leaves me "only" 256 bytes compressed for the shader part.
i have seen intros with large shaders (700 bytes and more). one of them runs on linux (you massive clod). it uses shell commands for compression (which allows more compression tricks than win32); some others seem to use a custom/LZ compressor (like himalaya).
what i have in mind:
- using a text compressor for the shader only (something huffman-like, since a shader contains only a reduced charset: "a-z () +-*/ ; {}" and so on)
gain = shader uncompressed - shader compressed - size of the decompressor
then compress everything normally
- writing a very, very small win32 exe (like on this page: http://www.phreedom.org/solar/code/tinype/) that will decompress all the binary data (shader + init dx stuff) and run it (i have never done this and have no idea how to do it)
currently i'm working on an intro where the shader is 600 bytes and the result is 1100-1200 bytes compressed (crinkler), and i found no way of reducing it (except using unsafe stuff)
any help will be appreciated
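A rough way to check the first idea before writing any decoder is to measure how many bits an order-0 Huffman code would need for the shader text and plug that into the gain formula above. This is only a sketch in C: the names `huffman_bits` and `shader_gain` are made up for illustration, the code table is not counted, and the decoder size is a number you have to supply yourself.

```c
#include <stddef.h>
#include <string.h>

/* Size in bits of `text` when coded with an optimal Huffman code built
   from its own byte frequencies (the code table itself is not counted). */
size_t huffman_bits(const char *text)
{
    size_t freq[256] = {0};
    size_t weight[256];
    size_t count = 0, total = 0, i;

    for (i = 0; text[i] != '\0'; i++)
        freq[(unsigned char)text[i]]++;

    /* The non-zero frequencies are the initial tree nodes. */
    for (i = 0; i < 256; i++)
        if (freq[i] > 0)
            weight[count++] = freq[i];

    if (count == 1)
        return weight[0];            /* one distinct symbol: 1 bit each */

    /* Repeatedly merge the two lightest nodes. The sum of all merged
       weights equals the total encoded length in bits. */
    while (count > 1) {
        size_t a = 0, b = 1, merged;   /* a = lightest, b = second lightest */
        if (weight[b] < weight[a]) { a = 1; b = 0; }
        for (i = 2; i < count; i++) {
            if (weight[i] < weight[a]) { b = a; a = i; }
            else if (weight[i] < weight[b]) { b = i; }
        }
        merged = weight[a] + weight[b];
        total += merged;
        weight[a] = merged;              /* merged node takes slot a      */
        weight[b] = weight[count - 1];   /* last node fills the freed slot */
        count--;
    }
    return total;
}

/* gain = uncompressed - compressed - decoder size, all in bytes.
   A negative result means the text compressor costs more than it saves. */
long shader_gain(const char *shader, long decoder_bytes)
{
    long raw = (long)strlen(shader);
    long packed = (long)((huffman_bits(shader) + 7) / 8);
    return raw - packed - decoder_bytes;
}
```

For example, `shader_gain(src, 60)` tells you whether a (hypothetical) 60-byte decoder would pay for itself on the shader text `src`. Already-packed data is harder for the final compressor to squeeze further, so the real gain has to be measured on the finished exe.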
how about doing all your rendering in software then? Mode 13h still works..
I think crinkler is still the best option for a 1k intro, because it's compatible with Windows 2000/XP/Vista. Its compression ratio is really amazing.
Quote:
the problem is that crinkler is not so efficient on 1k files: my minimal directx framework (init directx, use a timer, draw a flat square, send an empty shader), written in x86 asm, is 855 bytes, which leaves me "only" 256 bytes compressed for the shader part.
You are wasting bytes somewhere, because I have the same framework and its final size is 745 bytes (NOT using ordinal import), and it can probably get a bit smaller just by rearranging some data and code. Btw, I'm using FASM.
FASM or MASM: since in the end both produce the same opcodes, does it really change anything?
Here is a list of things that (i think) consume space in my framework:
Code:
Square real4 -1.0, -1.0, 0.0, -1.0, -1.0
       real4 -1.0,  1.0, 0.0, -1.0,  1.0
       real4  1.0,  1.0, 0.0,  1.0,  1.0
       real4  1.0, -1.0, 0.0,  1.0, -1.0
(maybe there is some glRect() equivalent?)
The main rendering loop:
Code:
call [ebx + IDirect3DDevice9.BeginScene]
call [ebx + IDirect3DDevice9.SetPixelShaderConstantF]
call [ebx + IDirect3DDevice9.SetPixelShader]
call [ebx + IDirect3DDevice9.SetFVF]
call [ebx + IDirect3DDevice9.DrawPrimitiveUP]
call [ebx + IDirect3DDevice9.EndScene]
call [ebx + IDirect3DDevice9.Present]
For the timer:
I call QueryPerformanceCounter(), subtract oldtime (the result of the previous QueryPerformanceCounter() call), then divide by QueryPerformanceFrequency(), which gives me the time delta.
the delta is added to a global variable (which is then passed to the shader)
would it be lighter to use timeGetTime()?
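The delta computation described above, written out as a small C sketch. The function names are hypothetical and the counter values are passed in as plain integers so the snippet does not depend on windows.h. With a millisecond timer such as timeGetTime() the same idea reduces to one unsigned subtraction, which also survives the 32-bit counter wrapping around.

```c
#include <stdint.h>

/* Seconds elapsed between two performance-counter readings.
   `frequency` is the number of counter ticks per second. */
double counter_delta_seconds(int64_t now, int64_t previous, int64_t frequency)
{
    return (double)(now - previous) / (double)frequency;
}

/* Same thing for a 32-bit millisecond tick count. Unsigned subtraction
   is modulo 2^32, so the result stays correct across a wraparound. */
float ticks_delta_seconds(uint32_t now_ms, uint32_t previous_ms)
{
    return (float)(uint32_t)(now_ms - previous_ms) / 1000.0f;
}
```

In size terms the millisecond variant needs one import and no 64-bit division, which is presumably why it tends to come out smaller; the actual saving is best read off the compression report.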
List of imported functions :
_ExitProcess
_QueryPerformanceCounter
_QueryPerformanceFrequency
_CreateWindowExA
_GetAsyncKeyState
_ReleaseDC
_ShowCursor
_D3DXCompileShader
_Direct3DCreate9
The way i compile it :
Code:
C:\masm32\bin\Link.exe /STACK:0x200000,0x200000 /LIBPATH:"C:\masm32\lib" /SUBSYSTEM:WINDOWS framework.obj
crinkler framework.obj kernel32.lib user32.lib d3dx9.lib d3d9.lib /SUBSYSTEM:WINDOWS /ENTRY:EntryPoint /LIBPATH:"C:\masm32\lib" /CRINKLER /UNSAFEIMPORT /COMPMODE:SLOW /HASHSIZE:200 /HASHTRIES:50 /ORDERTRIES:6000
You can use GetTickCount() for time. I think it's the lightest option.
GetTickCount: very low precision
QueryPerformanceCounter: problems with AMD's Cool'n'Quiet technology
have you tried timeGetTime?
Do you need a matrix if you use xyzrhw? Also, are you drawing one gigantic triangle that covers the whole screen, or 2 triangles?
Why ReleaseDC?
You can draw a triangle instead of a quad (a triangle that covers the whole screen, of course).
I think, as bartman says, that timeGetTime() should be more than enough.
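The single-triangle trick can be checked on paper: a right triangle with legs twice the screen width and height has its hypotenuse passing exactly through the opposite screen corner, so three vertices cover the same pixels as the four-vertex quad. A minimal C sketch, assuming pre-transformed pixel coordinates with the origin at the top left (the helper names are made up, and culling order and texture coordinates are left out):

```c
typedef struct { float x, y; } Vec2;

/* Three screen-space vertices of one triangle that covers a
   width x height screen: legs are twice the screen size. */
void fullscreen_triangle(float width, float height, Vec2 out[3])
{
    out[0].x = 0.0f;          out[0].y = 0.0f;
    out[1].x = 2.0f * width;  out[1].y = 0.0f;
    out[2].x = 0.0f;          out[2].y = 2.0f * height;
}

/* Signed area test: which side of edge a->b the point p lies on. */
static float edge(Vec2 a, Vec2 b, Vec2 p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

/* 1 if p is inside or on the border of triangle t, else 0. */
int triangle_contains(const Vec2 t[3], Vec2 p)
{
    float e0 = edge(t[0], t[1], p);
    float e1 = edge(t[1], t[2], p);
    float e2 = edge(t[2], t[0], p);
    return (e0 >= 0 && e1 >= 0 && e2 >= 0) ||
           (e0 <= 0 && e1 <= 0 && e2 <= 0);
}
```

Checking the four screen corners with `triangle_contains` confirms the coverage; the parts of the triangle outside the screen are simply clipped away.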
timeGetTime has exactly the same precision as GetTickCount (same timer) unless you use timeBeginPeriod first, in which case both timers are more accurate.
@iq :
i don't know where that ReleaseDC comes from
the list i gave was from the .obj file; in my source i don't have any call to ReleaseDC
also one question:
SetPixelShaderConstantF links a program variable to a shader register, right?
if i do
Code:
// host side, arguments are (StartRegister, pConstantData, Vector4fCount)
SetPixelShaderConstantF(0, &myvar, 1);
// shader side
float t;
fct_shader(...)
{
...
}
it sets var t correctly
since i now use timeGetTime() i tried
Code:
call timeGetTime
mov myvar, eax // timeGetTime returns its result in eax
// arguments are (StartRegister, pConstantData, Vector4iCount)
SetPixelShaderConstantI(0, &myvar, 1);
int t;
fct_shader(...)
{
...
}
but it doesn't seem to work, any idea?
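One thing worth trying here, offered as an assumption rather than a confirmed diagnosis: as far as I know the D3D9 HLSL compiler usually places ordinary globals, even ones declared `int`, in float constant registers, and SetPixelShaderConstantI writes a separate integer register bank. Under that assumption the simplest route is to keep `float t` in the shader, convert the millisecond count to a float on the CPU, and upload it with SetPixelShaderConstantF, whose documented argument order is (StartRegister, pConstantData, Vector4fCount). The C sketch below only shows the conversion and the 16-byte layout; `Float4` and `make_time_constant` are hypothetical names.

```c
#include <stdint.h>

/* One float constant register holds four floats, so the buffer handed
   to SetPixelShaderConstantF must be at least 16 bytes per register. */
typedef struct { float x, y, z, w; } Float4;

/* Put elapsed time in seconds into x; y, z and w are left at zero. */
Float4 make_time_constant(uint32_t elapsed_ms)
{
    Float4 c = { 0.0f, 0.0f, 0.0f, 0.0f };
    c.x = (float)elapsed_ms / 1000.0f;
    return c;
}

/* Intended use (not compiled here, needs d3d9.h and a live device):
 *
 *   Float4 t = make_time_constant(timeGetTime() - start_ms);
 *   IDirect3DDevice9_SetPixelShaderConstantF(device, 0, &t.x, 1);
 *
 * `device` and `start_ms` are placeholders for your own variables.
 */
```

In asm the conversion is a `fild` / `fdiv` / `fstp` sequence, so it costs a few bytes, but it avoids depending on how the compiler maps an `int` global.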
welcome to the frustrating world of windows 1k.
1. Don't use import by ordinal - you'll get murdered by people in the scene - it's a hanging offence. Better kill small children: it's much more acceptable round here. Oh, and it's not really compatible anyway once you get past the religious fervour.
2. Cab dropping =/= import by ordinal. A lot of people equate them and shout for you to be hanged anyway. It's ugly, it's messy - it's out of fashion. It's the Mick Jagger of the teeny-pop world. It's the midi school of music. However tempted you are, don't do it.
So crinkler is used. It's clean, compatible and gorgeous. Crinkler compresses staggeringly well but has a large header. Cab dropping compresses a lot less well but has a much smaller header. In my experiments, cab dropping wins... AT 1K... for my code. AT 1.2K it's a very different story.
So back to 1k exes. People use crinkler to compress. If you want to find out what's eating space, use the
#pragma data_seg(".name1")
#pragma code_seg(".name2")
so crinkler can give you a report and also so it can compress better.
Oh ...
Remember that TBC's recent 1k exes ARE NOT CRINKLERED. They use a specially tuned version of crinkler, better suited to 1k compression. This tuned version uses a different approach to compression and a much smaller header. It gives TBC about 200 bytes of advantage at 1k. That's a killer advantage at this size, so don't expect to match them unless you write your own compressor. That's why TBC rule - they write killer compressors.
Glad to see someone at ind trying this: maybe even Pirx will come out of this with more respect for small windows coding :-).
Hi Tigrou,
Don't try to guess what takes up space in your exe and what doesn't. Use the /REPORT option of Crinkler and see for yourself. Verify that all your code goes into the code section and all data into the data section.
You are importing the call-stub versions of the functions rather than the function pointers themselves. This causes a small call-stub to appear for each function, which takes up space.
Instead of
extern _ExitProcess
and
call _ExitProcess
do
extern __imp__ExitProcess@4
call [__imp__ExitProcess@4]
(Nasm syntax - adapt to your assembler.)
You can see the names of the function pointer imports in the Crinkler compression report.
Also, use more HASHTRIES. They are cheap in terms of compression time and can only improve the size at no cost in memory overhead. This might give you those last 2 bytes you need. :)
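To put rough numbers on the call-stub point: on 32-bit x86 a direct `call` to a stub is 5 bytes (E8 + rel32), the stub itself is a 6-byte `jmp [__imp__X]` (FF 25 + abs32), and an indirect `call [__imp__X]` is 6 bytes (FF 15 + abs32). The C sketch below just does that arithmetic on uncompressed bytes; the function names are made up, and the example figure of nine imports called once each is an assumption based on the import list posted earlier in the thread.

```c
enum {
    CALL_REL32_BYTES    = 5, /* E8 rel32    : call _Function (goes to stub) */
    JMP_INDIRECT_BYTES  = 6, /* FF 25 abs32 : the stub, jmp [__imp__X]      */
    CALL_INDIRECT_BYTES = 6  /* FF 15 abs32 : call [__imp__X]               */
};

/* Uncompressed code bytes when every imported function gets a stub. */
int bytes_with_stubs(int functions, int call_sites)
{
    return functions * JMP_INDIRECT_BYTES + call_sites * CALL_REL32_BYTES;
}

/* Uncompressed code bytes when calling through the import pointers. */
int bytes_with_pointers(int call_sites)
{
    return call_sites * CALL_INDIRECT_BYTES;
}
```

With nine functions called once each this gives 99 bytes against 54. A stub only breaks even once the same function is called six times, which rarely happens in a 1k. The saving after compression will be smaller than the raw difference, so the report is still the final word.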
Quote:
FASM or MASM since at the end it produce asm opcodes, doest it really change something???
It shouldn't make a difference. But: if you don't specify the CPU instruction set in your MASM/TASM program, the assembler will assume that you're using 8086. Since some commonly used instructions are not supported by the 8086, they are converted. So an instruction that occupies only 1-2 bytes might be converted to several instructions occupying 10 bytes in total. Try .286, .386, .486 etc. and see how it affects the size of your .com file.
I think you missed the point by a few miles.
And yes, there's an alternative to the public version of crinkler for Windows 1k.
Quote:
I think you missed the point by a few miles.
Adok missing the point? I refuse to believe it!
Maybe this could help you, the framework is 668 bytes:
ftp://ftp.untergrund.net/users/hitchhikr/1kpack.zip
Quote:
1kPack - An experimental packer for Windows 1k effects
Written by Franck "hitchhikr" Charlet / Neural
-------
How to use:
All you have to worry about is to insert your code in the framework.asm file, then assemble it with nasm and pack it with 1kpack.
-------
About:
The basic principle is to use the zlib embedded inside each d3dx9 dll (via the png library) to depack a compressed stream.
The depacker/importer itself is relatively small as it only takes 220 bytes (including the complete PE structure).
The imported dll (d3dx) is also used to extract the few APIs we need in order to import the necessary functions to open a window and initialize DirectX (the provided framework.asm file takes advantage of that).
The functions in the framework.asm aren't imported by any hash method because the code of such importer (+ the hash data) would take more size than the way it is done now.
The result should work on Windows 2000, XP and Vista and give slightly better results than the public version of crinkler.
Just keep in mind that this wasn't *really tested* & is *experimental* software.
f.
@hitchhikr: The "framework.exe" crashes on Vista 32 bits :/
Quote:
"Not valid Win32 application"
what about this one: http://pagesperso-orange.fr/franck.charlet/temp/test.zip - does it work?
Same problem...
Quote:
The basic principle is to use the zlib embedded inside each d3dx9 dll (via the png library) to depack a compressed stream.
Brilliant!
That's puzzling, somebody told me it worked under vista. Or perhaps i modified it afterwards and it doesn't work under it anymore, too bad.
many, many thanks to hitchhikr, Blueberry and the others ...
i'll try out all of this as soon as i can
rbz: also, are you sure it isn't some sort of protection activated in your vista because of the unorthodoxy of the file?
Nope, I checked the DEP config and it's OK, unless there's something else..
The intros packed with Mentor's special packer don't work either.
Anyway, great work, this technique is really smart, and I'm sure it can be fixed.