nulstein by Nulstein
popularity: 62%
alltime top: #7834
added on the 2009-10-16 11:08:05 by nulstein
comments
Very cool, thanks. :)
Yay, thank you !
Hmm, it seems hard to work at Intel: nulstein.exe, kkcompress.bat, compileVSH.bat and compilePSH.bat have been blocked and are missing from the zip. (beautiful warnings :)
sounds cool, judging from the infofile. pardon me asking, but where was that seminar, and is there a video?
.
argh... Virus checkers everywhere...
nulstein.exe is the result of the build, so we can make do...
kkcompress.bat is the batch to invoke kkrunchy so that's not too important either... You know how to use that, right ?
On the other hand compileVSH.bat and compilePSH.bat are the scripts that take care of shader compilation, and without them, you can't build the thing. Damn Damn Damn.
Let's see what I can do.
rename them to .txt :P
I've uploaded the correct file to this other URL until we fix the file on the intel servers... Sorry about that.
www.gpuviewer.com/download/nulstein.zip
skrebbel: I presented this seminar at Evoke 2009 in Cologne. There was no video: it happened on the Sunday morning and the cameraman was nowhere to be found (probably crashed somewhere, fast asleep ;) )
haven't looked at the code yet, but will the meat in this port to other platforms? i.e. mac / bsd / linux...?
jaw: definitely, this should port to other platforms without much sweat.
The only caveat is that the whole approach revolves around the assumption of "shared memory", and this makes it not very suitable for PS3. Otherwise, I can imagine this being ported to the systems you mention and others like XBox 360.
I'd love any of these ports to happen, drop me a line if you head down that route, I'll help where I can.
Cool. I was looking for something like this.
thumbs up...
If this seminar was right after the 4klang seminar, I guess the cameraman escaped from that far too warm room (to get a cold beer!).
so? intel tbb is the fatty while nulstein or jobswarm are lightweight alternatives?
What rc55 said
TBB is only fat if you look at it from a 64K perspective... nulstein is fat if you look at it from a 4K perspective, too :)
There are two goals to this:
- make a "working scale model" of TBB that makes it easier to understand the basic concepts of task scheduling
- explore simple ways to make a game engine scale over more than a few cores
note to self: look jobswarm up
Thank you for sharing!
Quote:
note to self: look jobswarm up
please do, i'm curious about the comparison from someone familiar with this type of code
My article explaining how this works is now online on Intel Software Network:
http://software.intel.com/en-us/articles/do-it-yourself-game-task-scheduling/
useful
the article on how this works has been published on Gamasutra too, now.
http://www.gamasutra.com/view/feature/4287/sponsored_feature_doityourself_.php
This has finally prompted me to comment on the difference with JobSwarm:
"JobSwarm (http://code.google.com/p/jobswarm/) is the simplest approach possible: there is one circular buffer that serves as a job queue, and worker threads pump from it as they need. What happens with this sort of configuration is that the queue soon becomes the main contention point. As the size of the jobs gets smaller, the overhead of accessing it increases. The impact is minimal in JobSwarm because of another aspect of it: only the main thread can submit work. If this is enough for your application, then this is pretty much as simple as it can get.
In TBB (and nulstein), a task can be further subdivided (i.e. it can spawn more tasks). This makes it almost trivial to cut&dice your workload: tasks spawn more tasks and split further until we have workloads that don't benefit from being split further. There are two big consequences to this feature: you can't have one big centralized queue as it would cause too much contention, you need a queue per worker thread. This leads to the second issue, imbalance: some tasks take more time than others and this implies some queues empty faster than others. The solution is "work stealing" which, in effect, ends up load-balancing the system. "
Thought I might as well copy the answer here, as the question was asked here first.
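For readers who want to see the shape of it, here is a minimal sketch of the "queue per worker plus work stealing" idea described above. It is not nulstein's or TBB's actual code: `Scheduler`, `spawn`, `sum_range`, `parallel_sum` and the grain size of 1024 are all made-up names and values for illustration, and plain mutex-guarded deques stand in for whatever queue a real scheduler would use. A task splits its range in two, spawns one half onto its own queue and keeps going with the other; idle workers steal from the far end of someone else's queue.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is not the nulstein or TBB API.
class Scheduler;
using Task = std::function<void(Scheduler&, std::size_t /*worker index*/)>;

class Scheduler {
public:
    explicit Scheduler(std::size_t workers) : queues_(workers) { assert(workers > 0); }

    // Push a task onto the given worker's own queue (tasks may call this too).
    void spawn(std::size_t worker, Task t) {
        pending_.fetch_add(1);
        std::lock_guard<std::mutex> lock(queues_[worker].m);
        queues_[worker].tasks.push_back(std::move(t));
    }

    // Run until every spawned task has finished; the caller acts as worker 0.
    void run() {
        std::vector<std::thread> threads;
        for (std::size_t w = 1; w < queues_.size(); ++w)
            threads.emplace_back([this, w] { work(w); });
        work(0);
        for (auto& t : threads) t.join();
    }

private:
    struct Queue { std::mutex m; std::deque<Task> tasks; };

    // The owner pops the most recently pushed task from its own queue.
    bool pop_own(std::size_t w, Task& out) {
        std::lock_guard<std::mutex> lock(queues_[w].m);
        if (queues_[w].tasks.empty()) return false;
        out = std::move(queues_[w].tasks.back());
        queues_[w].tasks.pop_back();
        return true;
    }

    // A thief takes the oldest task from the first non-empty victim queue.
    bool steal(std::size_t thief, Task& out) {
        for (std::size_t i = 1; i < queues_.size(); ++i) {
            Queue& victim = queues_[(thief + i) % queues_.size()];
            std::lock_guard<std::mutex> lock(victim.m);
            if (victim.tasks.empty()) continue;
            out = std::move(victim.tasks.front());
            victim.tasks.pop_front();
            return true;
        }
        return false;
    }

    void work(std::size_t w) {
        Task t;
        while (pending_.load() > 0) {
            if (pop_own(w, t) || steal(w, t)) {
                t(*this, w);
                pending_.fetch_sub(1);  // children were counted before this point
            } else {
                std::this_thread::yield();
            }
        }
    }

    std::vector<Queue> queues_;
    std::atomic<std::size_t> pending_{0};
};

// Split [begin, end) until a chunk is not worth splitting, then sum it.
void sum_range(Scheduler& s, std::size_t worker, const std::vector<int>& data,
               std::size_t begin, std::size_t end, std::atomic<long long>& total) {
    const std::size_t kGrain = 1024;  // assumed cut-off, tune for real workloads
    while (end - begin > kGrain) {
        const std::size_t mid = begin + (end - begin) / 2;
        s.spawn(worker, [&data, mid, end, &total](Scheduler& s2, std::size_t w) {
            sum_range(s2, w, data, mid, end, total);
        });
        end = mid;  // keep working on the first half ourselves
    }
    long long local = 0;
    for (std::size_t i = begin; i < end; ++i) local += data[i];
    total.fetch_add(local);
}

long long parallel_sum(const std::vector<int>& data, std::size_t workers) {
    Scheduler s(workers);
    std::atomic<long long> total{0};
    s.spawn(0, [&](Scheduler& s2, std::size_t w) {
        sum_range(s2, w, data, 0, data.size(), total);
    });
    s.run();
    return total.load();
}
```

Calling `parallel_sum(data, 4)` runs the whole thing on four workers. Only worker 0 starts with any work, so the others have nothing to do until they steal, which is the load balancing the quote talks about. A JobSwarm-style design would instead have every worker lock the same single queue.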
Quote:
he previously was Technical Director at Bits Studios
Ah-HAH! ;)
the talk was fantastic, i learned a lot. i have implemented my own work stealing system, and 500 lines of code seems to be the breaking point between something too simple and unnecessarily complex.
chaos > did you diverge much from what's done in nulstein?
<3
:§: it is pretty much the same, but i have two cooler examples:
each job calculates one of the 1024 lines of this mandelbrot. the color code at the left identifies the hardware thread, this is a core i7 with 8 threads.
you can see how each thread works in its initially assigned segment from top to bottom. the segments with the dark spots take longer, and the other threads come for help. this is really impressive in animation, when you see how the load gets balanced.
stealing happens 19 times, and none of the locks stall. In my implementation, a "steal" may fail when two threads try to steal the same thing at the same time, and that almost never happens.
note that the mandelbrot in this example is not optimized, the whole point of choosing fractals is to find something that is slow enough to be worth the effort.
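A rough sketch of that experiment, for anyone who wants to try it. This is not chaos's code: `mandel`, `Segment`, `take_own`, `steal` and `render` are invented names, the view window and the 256-iteration limit are assumptions, and simple mutexes stand in for whatever locking the real thing uses. Each thread starts with an equal slice of rows and works through it top to bottom; a thread that runs out takes a row from the bottom of somebody else's slice. The returned vector says which thread computed each row, which is what the color code in the picture shows.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Iteration count for one point; deliberately unoptimized.
int mandel(double cx, double cy, int max_iter) {
    double x = 0.0, y = 0.0;
    int i = 0;
    while (i < max_iter && x * x + y * y <= 4.0) {
        const double t = x * x - y * y + cx;
        y = 2.0 * x * y + cy;
        x = t;
        ++i;
    }
    return i;
}

// Rows [next, end) still waiting to be computed in one thread's slice.
struct Segment {
    std::mutex m;
    int next = 0;
    int end = 0;
};

// The owner takes the next row from the top of its own slice.
bool take_own(Segment& s, int& row) {
    std::lock_guard<std::mutex> lock(s.m);
    if (s.next >= s.end) return false;
    row = s.next++;
    return true;
}

// A thief takes one row from the bottom of the first non-empty other slice.
bool steal(std::vector<Segment>& segs, int thief, int& row) {
    const int n = static_cast<int>(segs.size());
    for (int i = 1; i < n; ++i) {
        Segment& victim = segs[(thief + i) % n];
        std::lock_guard<std::mutex> lock(victim.m);
        if (victim.next < victim.end) {
            row = --victim.end;
            return true;
        }
    }
    return false;
}

// Fills `pixels` (width * height iteration counts) and returns, for each row,
// the index of the thread that computed it.
std::vector<int> render(int width, int height, int threads, std::vector<int>& pixels) {
    assert(width > 0 && height > 0 && threads > 0);
    pixels.assign(static_cast<std::size_t>(width) * height, 0);
    std::vector<int> owner(height, -1);
    std::vector<Segment> segs(threads);
    for (int t = 0; t < threads; ++t) {
        segs[t].next = height * t / threads;
        segs[t].end = height * (t + 1) / threads;
    }
    auto worker = [&](int t) {
        int row = 0;
        // No new rows ever appear, so a failed take and a failed steal mean done.
        while (take_own(segs[t], row) || steal(segs, t, row)) {
            for (int x = 0; x < width; ++x) {
                pixels[static_cast<std::size_t>(row) * width + x] =
                    mandel(-2.0 + 3.0 * x / width, -1.5 + 3.0 * row / height, 256);
            }
            owner[row] = t;
        }
    };
    std::vector<std::thread> pool;
    for (int t = 1; t < threads; ++t) pool.emplace_back(worker, t);
    worker(0);
    for (auto& th : pool) th.join();
    return owner;
}
```

Something like `render(1024, 1024, 8, pixels)` would match the 1024 lines and 8 hardware threads mentioned above. Rows where `owner[row]` differs from the thread that started with that slice are the stolen ones.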
missed that seminar :/
thumb for releasing it, although late!
@chaos: this really reminds me a lot of the good old amiga! ( move.l #$f00,$dff180; // after everything's called in the main loop, to determine how many cpu cycles are left for the frame! )
chaos > really slick, would you share it by chance?
the talk was fantastic? Man, I wouldn't have thought anyone would say that...
Fractals are a good example for the reason you state, chaos, but also because splitting is awkward: you can't break the load in equal cpu-time chunks... You have to load balance as you go and task-stealing is really good in that sort of case.
note to self: need to work on my examples-coolness skillz
note to others: next iteration of nulstein still has the big cubes but also has the lil' cubes replaced by point-lights (much cooler :) )
@nulstein, very inspiring code. while(StealTasks()); is clever. thank you
There is an article explaining how this all works that will be published soon, but I really wanted this to be posted to the scene first as, really, it was meant to be released *at* the party.
So, here you go !