Amiga 500: New blitter c2p routine implemented. Need help to speedtest
category: general [glöplog]
Doom:
Quote:
The decruncher should run fast on a mc68000
touché I'd say. :D
Pah. Real coders optimize with regards to UAE's JIT translation.
Anyway, even on the 68000 those extra 50k or so clock cycles at load time aren't noticeable. It's for 4ks, right?
Anyway I think overall my optimizations helped on the 68000 too. So shut up.
WD08 is coming! And then you will cry like a little girl! Hah!
Doom: iirc LZW is patented. Do you want us to break the law? ;)
No, LZW is not patented any more; the patent expired in 2003.
lzw is not that good.. lz77+huff (deflate, like zip or gzip) are generally better. arithmetic coding was tried on atari st by MrPink/RG but he said it was slow as snails (i can imagine). arithmetic coding is protected by patent (last thing i heard) but patentless replacement called range coding (much better name) is available too. anyway, if it's really for 4K, range coding might be an option..
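For anyone curious what the lz77 half of that combo boils down to, here is a minimal sketch in Python (an illustration of the scheme only, not Ray's packer and not deflate; the window size, minimum match length and token format are arbitrary choices of mine):

```python
# Minimal LZ77 sketch: greedy match search over a sliding window.
# Not a real packer -- window, limits and token format are arbitrary.

def lz77_compress(data, window=4096, max_len=18):
    tokens = []
    i = 0
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            l = 0
            while l < max_len and i + l < len(data) and data[j + l] == data[i + l]:
                l += 1
            if l > best_len:
                best_off, best_len = i - j, l
        if best_len >= 3:                 # shorter matches cost more than literals
            tokens.append(('M', best_off, best_len))
            i += best_len
        else:
            tokens.append(('L', data[i]))
            i += 1
    return tokens

def lz77_decompress(tokens):
    out = bytearray()
    for t in tokens:
        if t[0] == 'L':
            out.append(t[1])
        else:                             # byte-by-byte copy handles overlapping matches
            _, off, length = t
            for _ in range(length):
                out.append(out[-off])
    return bytes(out)
```

A real packer would then entropy-code the tokens with bit-packed offsets and lengths (that's the "lz77+huff" combo), but the roundtrip above is the core loop.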
about lz78: i tested depack speed on a plain falcon (030@16) and it was ~200 KByte/sec output speed. as i'm now looking for streaming depacking this is important to me.
IIRC Jam packer 4 on ST offered LZH and LZW compression. Result pack ratios were close between the two. I can't really remember right now but I think that lzh was a teeny bit better.
I had an idea for a packer for 4k intros (or small programs like that) which would separate the opcodes from their data (I mean separate the first word from the rest -if any- that follows), store these at the end and then compress that using something like a fixed table (which is pre-weighted) and lz + huffman. But I fear that it would not produce that good a pack ratio and that the depacker might be biggish. Any thoughts?
btw, Shrimp: are you Shrimp from New Core?
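The split-streams idea above, sketched in Python (the instruction list is hand-made here; a real tool would need a 68k instruction-length decoder to split an actual binary):

```python
# Split-streams sketch: keep all first (opcode) words in one stream and
# all extension words in another, so each stream is more self-similar
# and should pack better. Instruction data below is hypothetical.

def split_streams(instructions):
    opcodes, extensions, counts = [], [], []
    for op, ext in instructions:
        opcodes.append(op)          # first word of the instruction
        extensions.extend(ext)      # any extension words that follow
        counts.append(len(ext))     # needed to interleave again when depacking
    return opcodes, extensions, counts

def merge_streams(opcodes, extensions, counts):
    out, pos = [], 0
    for op, n in zip(opcodes, counts):
        out.append((op, extensions[pos:pos + n]))
        pos += n
    return out
```

The counts stream is the catch: it is extra data the depacker needs to interleave the streams again, which eats into the gain and feeds the worry about the pack ratio.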
Concerning arithmetic coding, it is slower to depack than huffman encoding, but not that much slower. About two or three times slower, if I remember correctly. But if you want higher speed and better compression performance at once, you can build new symbols by combining two or more symbols and then use huffman compression on the new larger alphabet. The more you combine, the closer to the limit of arithmetic coding you get (or even better than the limit if you take advantage of the symbols not being independent of each other), and the speed should also increase slightly (but the initial time taken to construct a larger huffman tree is longer, of course).
And yes ggn, there is only one Shrimp. =)
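The symbol-pairing trick is easy to see in numbers: zeroth-order entropy per byte drops when pairs are treated as single symbols, because pairs capture the dependence between neighbouring bytes. A toy Python check (example data is made up):

```python
# Compare per-byte entropy of a byte alphabet vs a pair alphabet.
from collections import Counter
from math import log2

def entropy_bits_per_byte(chunks):
    counts = Counter(chunks)
    total = sum(counts.values())
    h = -sum(c / total * log2(c / total) for c in counts.values())
    return h / len(chunks[0])       # normalise to bits per original byte

data = b"ababababababab" + b"cdcdcdcdcdcd"
singles = [data[i:i + 1] for i in range(len(data))]
pairs = [data[i:i + 2] for i in range(0, len(data) - 1, 2)]

print(entropy_bits_per_byte(singles))   # ~2.0 bits/byte
print(entropy_bits_per_byte(pairs))     # ~0.5 bits/byte
```

A huffman code built over the pair alphabet approaches the lower figure, which is the point: you get close to (or past) the single-symbol arithmetic-coding limit while keeping huffman's decode speed.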
In that case, damn you! I yet have to figure out how the intro screen in coreflakes works ;)
Oh, I can explain, the guy on the telly drinks some pepsi coke and then his head explodes... =)
D'oh! I meant the zoomer thingy (memory's not what it used to be etc etc :)
Quote:
Real coders optimize with regards to UAE's JIT translation.
Real coders optimize UAE's JIT translation. ;)
Hahaha :P That made my day. Thanks Blueberry. :-)
ggn: Aha, the zoomy thingy. =)
(I sort of guessed you meant that, but I like to play stupid. It's a hobby of mine ;).
Well, mostly it's a result of self-generating code that stretches the two bitmaps along the x-axis.
The y-axis of the foreground bitmap is stretched by repeating scanlines, and the background zoomed bitmap is copied with the cpu to the current scanline in a race with the electron beam.
But I'm not that proud of the screen really, sure the code is optimized in absurdum as usual (I like optimizing stuff). But I was a bit lazy and all cpu-time of the scanlines in overscan is just wasted (except the few scanlines displaying both zoomers), and the top half of the screen is always in overscan even when not needed. So between 45% and 75% of the cpu-time of each frame is just wasted.
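For reference, the scanline-repeat part of a vertical zoom boils down to a destination-to-source line table like this (a Python sketch of the idea only; function name and shape are my own, not Shrimp's code):

```python
# Scanline-repeat vertical zoom: map each destination scanline to a
# source scanline. Repeated entries are exactly the repeated scanlines.

def scanline_table(src_height, zoom):
    dst_height = int(src_height * zoom)
    return [min(int(y / zoom), src_height - 1) for y in range(dst_height)]

print(scanline_table(4, 2.0))   # [0, 0, 1, 1, 2, 2, 3, 3]
```

On the real hardware the table decides which source line gets fetched or copied for each display line, while the x-stretch is handled separately by the generated code.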
@ggn, @Shrimp: which prod (demo :) ) are you talking about?
They are talking about this demo. Which is pretty cool. :) Nice work, Shrimp. =)
shrimp: Yeah ok, I just had in mind a lot of stuff that need to be done in synchro (mod player, fullscreen, zoom), so I started coming up with weird ideas, like not using fullscreen at all and just doing palette changes when the graphic goes beyond the border, but that would mean other problems etc etc. :)
calimero: You actually mean that you didn't know coreflakes? Shame :)
Thanks for the kind words guys =)
ggn: Another part of the answer is that the modreplayer just took about 10% cpu, and in fact the x-scaling of the zoom isn't synchronized, it's working to a buffer a few frames in advance so that the peaks in processor time are averaged out.
Those are the threads I really love at Pouet. More of these please!
Pity I can't participate or even manage to follow yet, because I am not into 68000 yet..
Hmm, it seems like Optimus killed the thread. :)
Yes. He ruined it with optimism. Go figure.
Well, it was getting WAY OT anyway :)
i used Ray's lz77 packer in my 4k http://www.pouet.net/prod.php?which=24961, combined with a lot of pure code generation. it should have been possible to optimise it more in terms of size, and put in one or two more effects - but you know how it is with time and demo party pressures.
my favourite 4k http://www.pouet.net/prod.php?which=10872 uses a cut down version of the ice pack algorithm, which extracts directly into low memory, and also means a relocation table is not needed - very cool :D
defjam's way definitely saves a hundred bytes, but on the other hand, it doesn't return to the desktop - just what you like I suppose. 4getful is written entirely with pc-relative code, and also needs no relocation table, or relocator.
in my opinion, i think the key to a good 68k 4k demo is a good demo system (like the one DHS uses for example), and to generate as much stuff as possible. i was experimenting with a lot of routines, and there was not a single case where a table was smaller than the code needed to generate it, or something like it.
the most fun part though is surely writing small audio routines :)
Quote:
the most fun part though is surely writing small audio routines :)
Provided you're musically impaired ... as I am u_u
+not
although my audio routines are loads of fun.
Quote:
i was experimenting with a lot of routines, and there was not a single case where a table was smaller than the code needed to generate it, or something like it.
I've found that generating audio by interpolating between waveforms is pretty neat and also space-efficient. But it goes without saying that data is what you generally don't want in a 4k.
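A toy version of the waveform-interpolation idea in Python (my own illustration, not the actual routine): two small base waveforms and a blend factor that sweeps over the sound's duration, morphing one into the other.

```python
# Waveform interpolation sketch: blend between a sine and a square wave.
# All names and parameters here are made up for illustration.
import math

def morph_sample(t, blend, size=256):
    a = math.sin(2 * math.pi * t / size)            # base waveform 1: sine
    b = 1.0 if (t % size) < size // 2 else -1.0     # base waveform 2: square
    return (1.0 - blend) * a + blend * b            # linear interpolation

# blend sweeps 0..1 over 1024 samples: starts as a sine, ends as a square
wave = [morph_sample(t, t / 1024) for t in range(1024)]
```

On a 68000 the same thing would be done with table lookups and integer blends, of course; the space win is that two small tables plus an interpolator replace many stored waveforms.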
I agree about the demosystem. Some simple scripting is only a few bytes:
.mainloop move.l current_frame(pc),d6 ; d6 = current frame number
 lea script(pc),a0 ; a0 walks the script entries
 move.l a0,a1 ; a1 = base for the routine offsets
.next movem.w (a0)+,d0-d2 ; d0 = routine offset, d1 = start frame, d2 = end frame
 tst.w d0
 bmi.b .end ; negative offset marks the end of the script
 cmp.w d1,d6
 blo.b .skip ; effect hasn't started yet
 cmp.w d2,d6
 bhs.b .skip ; effect is already over
 movem.l a0-a1/d6,-(a7) ; could be left out
 sub.w d1,d6 ; could be left out (d6 = frame relative to effect start)
 jsr (a1,d0.w) ; call the effect routine
 movem.l (a7)+,a0-a1/d6 ; could be left out
.skip bra.b .next
.end bsr.b buffer ; swap/show buffers, wait for the next frame
 bra.b .mainloop
script dc.w clearscreen-script, 0, 1000 ; offset relative to the base in a1, start frame, end frame
 dc.w effect1-script, 0, 500
 dc.w effect2-script, 500, 1000
 dc.w -1 ; end marker