nulstein by Nulstein

screenshot added by nulstein on 2009-10-16 11:08:05

platform :	Windows
type :	demotool
release date :	october 2009
release party :	Evoke 2009
compo :	none
ranked :	n/a

popularity : 62%

0.67

alltime top: #7834

added on the 2009-10-16 11:08:05 by nulstein

comments

Sorry for the delay in releasing this code, the process took much longer than I initially expected...

There is an article explaining how this all works that will be published soon, but I really wanted this to be posted to the scene first as, really, it was meant to be released *at* the party.

So, here you go !

added on the 2009-10-16 11:16:52 by nulstein

Very cool, thanks. :)

rulez added on the 2009-10-16 12:04:22 by rc55

Yay, thank you !

rulez added on the 2009-10-16 12:49:08 by MsK`

Hmm, it seems hard to work at intel : nulstein.exe, kkcompress.bat, compileVSH.bat and compilePSH.bat have been blocked and are missing from the zip. (beautiful warnings :)

added on the 2009-10-16 12:53:57 by MsK`

sounds cool, judging from the infofile. pardon me asking, but where was that seminar, and is there a video?

added on the 2009-10-16 12:59:13 by skrebbel

rulez added on the 2009-10-16 13:00:40 by panic

argh... Virus checkers everywhere...

nulstein.exe is the result of the build, so we can make do...
kkcompress.bat is the batch to invoke kkrunchy so that's not too important either... You know how to use that, right ?

On the other hand compileVSH.bat and compilePSH.bat are the scripts that take care of shader compilation, and without them, you can't build the thing. Damn Damn Damn.

Let's see what I can do.

added on the 2009-10-16 13:32:58 by nulstein

rename them to .txt :P

added on the 2009-10-16 16:18:34 by Gargaj

I've uploaded the correct file to this other URL until we fix the file on the intel servers... Sorry about that.
www.gpuviewer.com/download/nulstein.zip

skrebbel: I presented this seminar at Evoke 2009 in Cologne. There was no video: it happened on the Sunday morning and the cameraman was nowhere to be found (probably crashed somewhere, fast asleep ;) )

added on the 2009-10-16 16:30:41 by nulstein

haven't looked at the code yet, but will the meat in this port to other platforms? i.e. mac / bsd / linux...?

added on the 2009-10-16 16:38:53 by jaw

jaw: definitely, this should port to other platforms without much sweat.

The only caveat is that the whole approach revolves around the assumption of "shared memory", and this makes it not very suitable for PS3. Otherwise, I can imagine this being ported to the systems you mention and others like XBox 360.

I'd love any of these ports to happen, drop me a line if you head down that route, I'll help where I can.

added on the 2009-10-16 16:47:45 by nulstein

Cool. I was looking for something like this.

rulez added on the 2009-10-16 23:49:57 by xernobyl

thumbs up...
In case this seminar was right after the 4klang seminar - I guess the camera man escaped (to get a cold beer!) from that - by far too warm room.

rulez added on the 2009-10-17 10:47:52 by las

so? intel tbb is the fatty while nulstein or jobswarm are lightweight alternatives?

added on the 2009-10-17 11:57:29 by guardian ٩๏̯͡๏۶

What rc55 said

rulez added on the 2009-10-17 21:00:45 by Defiance

TBB is only fat if you look at it from a 64K perspective... nulstein is fat if you look at it from a 4K perspective, too :)

There are two goals to this:
- make a "working scale model" of TBB that makes it easier to understand the basic concepts of task scheduling
- explore simple ways to make a game engine scale over more than a few cores

note to self: look jobswarm up

added on the 2009-10-18 10:07:33 by nulstein

Thank you for sharing!

rulez added on the 2009-10-18 22:07:23 by vestige

Quote:

note to self: look jobswarm up

please do, i'm curious about the comparison from someone familiar with this type of code

added on the 2009-10-19 17:56:50 by guardian ٩๏̯͡๏۶

My article explaining how this works is now online on Intel Software Network:
http://software.intel.com/en-us/articles/do-it-yourself-game-task-scheduling/

added on the 2009-11-06 15:52:10 by nulstein

useful

rulez added on the 2009-11-23 01:30:38 by T$

the article on how this works has been published on Gamasutra too, now.
http://www.gamasutra.com/view/feature/4287/sponsored_feature_doityourself_.php

This has finally prompted me to comment on the difference with Jobswarm:
"JobSwarm (http://code.google.com/p/jobswarm/) is the simplest approach possible: there is one circular buffer that serves as a job queue, and worker threads pump from it as they need. What happens with this sort of configuration is that the queue soon becomes the main contention point. As the size of the jobs gets smaller, the overhead of accessing it increases. The impact is minimal in JobSwarm because of another aspect of it: only the main thread can submit work. If this is enough for your application, then this is pretty much as simple as it can get.

In TBB (and nulstein), a task can be further subdivided (i.e. it can spawn more tasks). This makes it almost trivial to cut&dice your workload: tasks spawn more tasks and split further until we have workloads that don't benefit from being split further. There are two big consequences to this feature: you can't have one big centralized queue as it would cause too much contention, you need a queue per worker thread. This leads to the second issue, imbalance: some tasks take more time than others and this implies some queues empty faster than others. The solution is "work stealing" which, in effect, ends up load-balancing the system. "
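
The per-worker-queue idea in the quote above can be sketched roughly as follows. This is only a minimal illustration, assuming mutex-protected deques and a trivial "sum a range" workload; it is not how nulstein or TBB actually implement it, and `Range`, `WorkerQueue` and `parallel_sum` are made-up names. Each task splits itself until it is no bigger than `grain`, pushes the halves it does not keep onto its own queue, and idle workers steal from the other queues.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// A task is a half-open index range [begin, end) still to be summed.
struct Range { std::size_t begin, end; };

// One queue per worker. A plain mutex keeps the sketch simple (not lock-free).
struct WorkerQueue {
    std::mutex lock;
    std::deque<Range> tasks;

    void push(Range r) {
        std::lock_guard<std::mutex> g(lock);
        tasks.push_back(r);
    }
    // The owner takes its newest task (back of the deque)...
    bool pop(Range& out) {
        std::lock_guard<std::mutex> g(lock);
        if (tasks.empty()) return false;
        out = tasks.back();
        tasks.pop_back();
        return true;
    }
    // ...while thieves take the oldest one (front), usually the largest chunk.
    bool steal(Range& out) {
        std::lock_guard<std::mutex> g(lock);
        if (tasks.empty()) return false;
        out = tasks.front();
        tasks.pop_front();
        return true;
    }
};

long long parallel_sum(const std::vector<int>& data, unsigned workers, std::size_t grain) {
    assert(workers > 0 && grain > 0);
    std::vector<WorkerQueue> queues(workers);
    std::atomic<std::size_t> remaining(data.size()); // elements not yet summed
    std::atomic<long long> total(0);

    queues[0].push({0, data.size()}); // all work starts on worker 0

    auto run = [&](unsigned self) {
        while (remaining.load() > 0) {
            Range r;
            bool found = queues[self].pop(r);
            // Own queue is empty: try to steal from the others in turn.
            for (unsigned i = 1; !found && i < workers; ++i)
                found = queues[(self + i) % workers].steal(r);
            if (!found) { std::this_thread::yield(); continue; }

            // Split until the piece is small enough; the upper halves become
            // new tasks on our own queue, where other workers can steal them.
            while (r.end - r.begin > grain) {
                std::size_t mid = r.begin + (r.end - r.begin) / 2;
                queues[self].push({mid, r.end});
                r.end = mid;
            }
            long long local = 0;
            for (std::size_t i = r.begin; i < r.end; ++i) local += data[i];
            total += local;
            remaining -= (r.end - r.begin);
        }
    };

    std::vector<std::thread> pool;
    for (unsigned w = 1; w < workers; ++w) pool.emplace_back(run, w);
    run(0); // the calling thread acts as worker 0
    for (auto& t : pool) t.join();
    return total.load();
}
```

Calling `parallel_sum(values, 4, 64)` sums `values` on four workers in pieces of at most 64 elements. With a single shared queue instead, every push and pop would go through the same lock, which is the contention point described in the quote.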

Thought I might as well copy the answer here, as the question was asked here first.

added on the 2010-03-05 13:42:44 by nulstein

Quote:

he previously was Technical Director at Bits Studios

Ah-HAH! ;)

added on the 2010-03-05 13:51:42 by gloom

the talk was fantastic, i learned a lot. i have implemented my own work stealing system, and 500 lines of code seems to be the breaking point between something too simple and unnecessarily complex.

rulez added on the 2010-03-05 15:56:07 by chaos

chaos > did you diverge much from what's done in nulstein?

added on the 2010-03-05 16:13:53 by guardian ٩๏̯͡๏۶

rulez added on the 2010-03-05 16:24:16 by iks

:§: it is pretty much the same, but i have cooler examples:

BB Image

each job calculates one of the 1024 lines of this mandelbrot. the color code at the left identifies the hardware thread, this is a core i7 with 8 threads.

you can see how each thread works in its initially assigned segment from top to bottom. the segments with the dark spots take longer, and the other threads come for help. this is really impressive in animation, when you see how the load gets balanced.

stealing happens 19 times, and none of the locks stalls. In my implementation, a "steal" may fail when two threads try to steal the same thing at the same time, and that almost never happens.

note that the mandelbrot in this example is not optimized, the whole point of choosing fractals is to find something that is slow enough to be worth the effort.
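
A rough sketch of the setup described above, assuming one job per image line and one contiguous segment of lines per thread. It is a simplification and not the implementation discussed here: a thread that has finished its own segment simply claims lines from the other segments through the same atomic counter, so a "steal" cannot fail. `mandel`, `Image` and `render` are made-up names, and the view window is an arbitrary choice.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Escape-time iteration count for c = (cr, ci); deliberately unoptimized.
int mandel(double cr, double ci, int max_iter) {
    double x = 0.0, y = 0.0;
    int n = 0;
    while (n < max_iter && x * x + y * y <= 4.0) {
        double t = x * x - y * y + cr;
        y = 2.0 * x * y + ci;
        x = t;
        ++n;
    }
    return n;
}

struct Image {
    std::vector<int> iterations; // width * height escape counts
    std::vector<int> row_thread; // which worker computed each line
    int steals = 0;              // lines taken from another worker's segment
};

Image render(int width, int height, int workers, int max_iter) {
    assert(width > 0 && height > 0 && workers > 0);
    Image img;
    img.iterations.assign(static_cast<std::size_t>(width) * height, 0);
    img.row_thread.assign(height, -1);

    // Segment s covers lines [first[s], first[s + 1]).
    std::vector<int> first(workers + 1);
    for (int s = 0; s <= workers; ++s)
        first[s] = static_cast<int>(static_cast<long long>(height) * s / workers);

    // next[s] is the next unclaimed line of segment s.
    std::unique_ptr<std::atomic<int>[]> next(new std::atomic<int>[workers]);
    for (int s = 0; s < workers; ++s) next[s].store(first[s]);
    std::atomic<int> steals(0);

    auto run = [&](int self) {
        // Work through the own segment first, then help with all the others.
        for (int k = 0; k < workers; ++k) {
            int s = (self + k) % workers;
            for (;;) {
                int row = next[s].fetch_add(1); // claim one line
                if (row >= first[s + 1]) break; // segment exhausted
                if (s != self) ++steals;
                double ci = -1.25 + 2.5 * row / height;
                for (int px = 0; px < width; ++px) {
                    double cr = -2.0 + 3.0 * px / width;
                    img.iterations[static_cast<std::size_t>(row) * width + px] =
                        mandel(cr, ci, max_iter);
                }
                img.row_thread[row] = self;
            }
        }
    };

    std::vector<std::thread> pool;
    for (int w = 1; w < workers; ++w) pool.emplace_back(run, w);
    run(0); // the calling thread acts as worker 0
    for (auto& t : pool) t.join();
    img.steals = steals.load();
    return img;
}
```

`render(1024, 1024, 8, 256)` would correspond to the 1024-line, 8-thread case; coloring each line by `row_thread` gives the kind of per-thread color code mentioned above, and `steals` counts how many lines were picked up outside a thread's own segment.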

added on the 2010-03-05 21:43:47 by chaos

missed that seminar :/
thumb for releasing it, although late !
@chaos: this really reminds me a lot of the good old amiga ! ( move.l #$f00,$dff180; // after everything's called in the main-loop, to determine how many cpu-cycles are left for the frame ! )

rulez added on the 2010-03-05 22:15:39 by ɧ4ɾɗվ.

chaos > really slick, would you share it by chance?

added on the 2010-03-06 18:36:39 by guardian ٩๏̯͡๏۶

the talk was fantastic? Man, I wouldn't have thought anyone would say that...

Fractals are a good example for the reason you state, chaos, but also because splitting is awkward: you can't break the load in equal cpu-time chunks... You have to load balance as you go and task-stealing is really good in that sort of case.

note to self: need to work on my examples-coolness skillz

note to others: next iteration of nulstein still has the big cubes but also has the lil' cubes replaced by point-lights (much cooler :) )

added on the 2010-03-08 14:34:39 by nulstein

@nulstein, very inspiring code. while(StealTasks()); is clever. thank you

rulez added on the 2010-03-10 20:47:23 by neoneye
