Realtime 3D rendering engine driven by a diffusion model
category: code [glöplog]
Also, you do understand the goal of the paper was to demonstrate a method for "stable auto-regressive generation over long trajectories". The Doom thing is merely a marketing stunt.
This, as so much about this tech, seems so predictably stupid and perversely backward to me. So if I understand it right (let’s go past marketing stunts), this is akin to training an “AI” on recorded jet flights. Then creating a steering system that will not transfer the movement of the flight stick directly, but instead a system that will take input from the flight stick and, based on that input and the training data, predict what the next movement of the flaps should be. Either completely oblivious to other realtime sensors, or just including their readout data in the decision/prediction mix through some elaborate formula (but nothing too complex, we don’t want to lose that millisecond gain over a pure human pilot we got by prediction). It seems to me that this cannot work in the sense of being truly useful and beyond a parlor trick. Not now, nor ever. Because the thinking behind it seems flawed and stacked backward from the start. Like how that series “Dark Matter” cannot work as they envisioned it, because the authors don’t seem to understand Schrödinger’s thought experiment. However, it’s very possible I got it all wrong, because I can’t even pretend to understand what “stable auto-regressive generation over long trajectories” means. :-)
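A rough sketch of what "auto-regressive generation over long trajectories" means here (illustrative only, not the paper's actual code; the stand-in model below just returns noise so the loop runs): the model predicts each new frame from its own previous outputs plus the player's input, so any error feeds back into later predictions, and keeping that stable over thousands of frames is the hard part.

Code:
# Illustrative sketch only, not the paper's method; helper values are made up.
# The point is the loop shape: generated frames are fed back in as context.
import numpy as np

N = 8  # hypothetical window of recent frames used as conditioning context

def predict_next_frame(context, action):
    # Stand-in for the learned model; random noise here, just so the loop runs.
    return np.random.rand(120, 160, 3)

frames = [np.zeros((120, 160, 3))]   # start from one initial frame
for step in range(100):              # a "long trajectory" of many steps
    action = 0                       # stand-in for the player's input this frame
    context = frames[-N:]            # condition on the model's own recent outputs
    frames.append(predict_next_frame(context, action))
    # Any error in a generated frame feeds back into every later prediction,
    # which is why keeping this stable over long runs is the research problem.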
Some thoughts about headlines and publishing: it's not really the fault of researchers that nowadays certain topics get more attention than they should, and sometimes the researchers are even ashamed of it
Maybe you know the news about the new formula for pi, where the authors of the paper were annoyed by the attention it got. The formula is new, but not better than existing ones, and it wasn't the main topic of the paper, so no wonder the researchers think the press reaction was over the top.
With AI Doom, of course, stirring up attention around it was more deliberate, as I said, even a deliberate marketing stunt, but still, all those stupid reports that "game devs are now on life support" are completely over the top and entirely miss the point of it.
Some of the reactions are not totally unexpected, but still a bit disappointing. The demoscene I recall would have taken in any new technology and made a demo with it / on it / about it.
There are some fairly interesting underlying ideas in ML/"AI" that can also be used on a much smaller scale.
- Gradient descent basically lets you fit any differentiable function to the data you are interested in, and there are really powerful frameworks out there to do this efficiently (see the short sketch after this list).
- Many people above mentioned style transfer - actually already "old" on ML timescales, but quite interesting for demos.
- Diffusion models are a really interesting technology for image synthesis.
- Transformers are powerful at learning spatial relationships and order, but they really need scale.
- Neural radiance fields and Gaussian splatting are technologies that keep coming up for 3D rendering. The ideas are well known to the demoscene, but there seems to be quite some progress in efficient model training.
Nobody forces you to feed the entire internet into a 100k GPU training cluster that runs on a nuclear power plant, just to train another incarnation of a barely differentiated LLM.
I am also only scratching the surface as an outsider to the field. But in contrast to some earlier technology hypes (Blockchain, Big data, etc...) I am quite enthralled by the technology behind it.
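To make the gradient descent point above concrete, here is a minimal sketch (assuming PyTorch is available; any autodiff framework works the same way) that fits a tiny differentiable function to some data:

Code:
# Minimal sketch of "fit any differentiable function to data" (assumes PyTorch).
import torch

x = torch.linspace(-3, 3, 256).unsqueeze(1)   # inputs
y = torch.sin(2 * x)                           # the data we want to fit

model = torch.nn.Sequential(                   # any differentiable function
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(2000):
    loss = ((model(x) - y) ** 2).mean()        # how wrong we currently are
    opt.zero_grad()
    loss.backward()                            # autodiff gives the gradients
    opt.step()                                 # one gradient descent step

The same loop works for anything differentiable, e.g. the parameters of a procedural texture or a distance field, not just neural networks.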
Indeed, "nice demo" is the first thing i thought...
Hehe, yeah, I have been inactive for a long time. But of course still peeking at what is going on.
They took an interesting and recent technology and used it to implement something that is technically unexpected, yet is well known to many people so they can grasp the effort and challenge behind it.
Sounds a bit like a demo :)
Publishing in AI/ML is quite different from how it used to in science in general, people have to rely much more on personal marketing. Also considering that the most impactful recent paper is titled "Attention is all you need", there seems to be a competition for the most whimsical paper title.
There are some fairly interesting underlying Ideas in ML/"AI" that can also be used on a much smaller scale.
- Gradient descent basically allows to fit any differentiable function to the data your are interested in, and there are really powerful frameworks out there to do this efficiently.
- Many people above mentioned style transfer - actually already "old" by ML timescale, but quite interesting for demos.
- Diffusion models are really interesting technology for image synthesis.
- Transformers are powerful in learning spatial relationships and order, but they really need scale.
- Neural radiance fields and gaussian splatting are technologies that come up for 3d rendering. The ideas are well known to the demoscene, but there seems to be quite some progress in efficient model training.
Nobody forces you to feed the entire internet into a 100k GPU training cluster that runs on a nuclear power plaint, just to train another incarnation of a barely differentiated LLM.
I am also only scratching the surface as an outsider to the field. But in contrast to some earlier technology hypes (Blockchain, Big data, etc...) I am quite enthralled by the technology behind it.
Quote:
I was already wondering why no one had opened a thread here. :]
Indeed, "nice demo" is the first thing i thought...
Quote:
It makes me happy to see a post by Azure in 2024.. :)
Hehe, yeah, I have been inactive for a long time. But of course still peeking at what is going on.
Quote:
Also, you do understand the goal of the paper was to demonstrate a method for "stable auto-regressive generation over long trajectories". The Doom thing is merely a marketing stunt.
They took an interesting and recent technology and used it to implement something that is technically unexpected, yet is well known to many people so they can grasp the effort and challenge behind it.
Sounds a bit like a demo :)
Quote:
Some thoughts about headlines and publishing: it's not really the fault of researchers that nowadays certain topics get more attention than they should, and sometimes the researchers are even ashamed of it
Publishing in AI/ML is quite different from how it used to be in science in general; people have to rely much more on personal marketing. Also, considering that the most impactful recent paper is titled "Attention Is All You Need", there seems to be a competition for the most whimsical paper title.
Azure: I'm not against good marketing of a research paper, it's indeed a necessity. I was actually trying to tone down voices that discredit research based on that. I personally blame the press here. It is not very surprising that all those flashy headlines with outrageous claims can irritate people.
On this note it's also ironic that the people criticising were doing very similar "stunts", just in a different field.
Quote:
Azure: I'm not against good marketing of a research paper, it's indeed a necessity. I was actually trying to tone down voices that discredit research based on that. I personally blame the press here. It is not very surprising that all those flashy headlines with outrageous claims can irritate people.
Agreed. The way anything AI is reported on in the press is sensationalized and often only focusses on the most controversial parts. Easy to fall prey to that.
I basically only added to your point that this behavior is not because the researchers are particular hypesters, but because it is more or less the norm in academic publishing in ML nowadays (and also in other fields).
the shitload of youtube videos with that AI Doom video and 'THIS WILL CHANGE EVERYTHING!' is indeed ridiculous. Killer robots should do something about that AI hypetrain!
Quote:
Back in the day, Doom in a fragment shader was the shit and got some decent spotlight; even Carmack was impressed. Today, it seems AI-Doom gets a pretty big spotlight. Both are equally useless *as is* for games. You don't see Doom Eternal rendering as a full-screen quad raycasting per pixel, do you? How is that different?
The point of the insane full-screen visuals in demos and shadertoy is not to replace any current tech. Instead, they are a creative demonstration, an invitation for developers to explore other ways of creating content and solving problems.
They play the same role as Haute Couture plays in fashion - the point there isn't to wear those crazy garments in your day-to-day life as you go to the supermarket, but to express creativity, set trends and inspire the industry to explore new design ideas.
And in the case of demos and shadertoy experiments we know for a fact it's moving the needle. We often see technology inspired by (if not fully based on) these "insane" techniques applied to maps in a cab service app, to data visualization on some space agency's website, and all sorts of places (there's lots of rendering happening that is not gamedev).
So it's not fair to say these "wasteful" techniques are "useless"; I think they do play a very practical and important role. Although it would also be fine if all they were was pure creative expression.
iq: nicely put, and I agree, however... I may be wrong, but I got the impression that you think Doom in a fragment shader is a better achievement and more "Haute Couture" than AI-Doom?
“Can it play Doom?” is an important piece of culture. It’s a digital cultural heritage imho. I find the direction in which this sorta subculture was going to be perhaps the most important part. That direction is and was ‘down’, in the sense of ever smaller technological complexity of the prospective platform. It is and was like: can this coffee maker run Doom, wait, can this calculator from the 90s run Doom, wait, can this ancient console run Doom, wait, can this industrial microcontroller run Doom. This is the pure spirit of what I’m after in this hobby. So, a real achievement would be something like: can a C1541 disk drive run Doom, or can this Speak&Spell run Doom if I connect a screen instead of the speaker, can Stonehenge run Doom? Proving that a 5th gen fighter jet or the newest space station can run Doom is totally missing the point; making a huge-ass AI server farm fake that it’s running Doom even more so. The complexity of the machinery needed to fake that it’s running Doom is such that it could simultaneously run 1000000 real Dooms with a million players. I can’t understand people who pretend they don’t see this idiocy. I’m sure there are people who pretend nothing, who genuinely can’t see it, and that terrifies me. One of the recognisable and defining points of the demoscene (as I see it) was never “opening cans with a nuclear bomb” thinking; it was always “making a calculator out of a can opener” thinking.
It seems to me you guys are just strongly biased against anything related to AI by default, so you try to rationalize it in various ways, but I don't really see a clear argument.
4gentE: following your logic, demonstrating that you can run Doom in a fragment shader on a modern beefy GPU that has orders of magnitude more flops than the original 486 or Pentium is not exactly an impressive feat either. Also, where does this AI server farm come from... do you mean the training? Because they say it runs on a single TPU.
Yes, of course I thought of the training and the amount of play data.
It was just a personal observation on the meaning of the whole “Can it run Doom?” cultural moment, and how this feels different from it.
Running the same base code, onto which you duct-tape more code (or training data), on ever more complex machinery with almost the same results has a name. It’s called bloat, and you can observe it in big OSes, on the web, etc. This whole LLM charade, especially in the creation of visuals, is exactly that - even more, orders of magnitude more, BLOAT. I don’t think the demoscene is the right place for bloat, that’s all.
And yes, more generally, you got it right. I think this current crop of crap around LLMs (please don’t call it AI, I have nothing against AI as a concept) is wrong and harmful from like 5 different angles. That’s exactly why I’m so opposed to its over-proliferation, trying to hold on to some safe haven, a safe space free of it. I’m not about to drop any of the angles just so that someone wouldn’t dismiss me as being biased. You know, the “the fact that I’m paranoid doesn’t mean I’m not being followed” drill…
All good, but I think we may be forgetting the generational aspect of it. AFAIK many younger devs/researchers look at AI as a new, exciting and still vastly unexplored territory, kind of like we looked at real-time rendering back in the day. Now imagine you are trying to appeal to oldschoolers like us with oldschool tropes like "can it play Doom", and then all you hear from the top dogs here is "your interpretation of 'can it play Doom' is wrong, we are not impressed, go home".
I understand.
Haha “top dogs” made me LOL. Man, I’m a literal nobody, I’m an underdog if anyone is!
It’s not “running Doom” - it’s generating imagery based on a training set of Doom imagery. But you can’t actually play Doom on it, right? As in, the actual game with the levels we remember? Aren’t you just seeing a faux-Doom?
But don't you feel everything about this tech is faux or fake? Because it's upside down? Like how rendering to a bitmap is the last and final step of every creation, while these things ingest already-bitmapped pixels and, via noise, wrangle them into other bitmaps. They work in the opposite direction from the logical 'creation' vector.
How does it normally work? You all know the drill. You create points. Connect them to make polygons. Add more polygons. It becomes a mesh/surface. Colour it. Texture it. Assign materials. Light it. All of it is parametric. Add a camera. Render it. Only now do the pixels come into play. It's the last step and it's destructive. Entropy sets in. There's no going back. That's the normal flow. These things swim in the opposite direction to the normal flow of creation. That's why they need so many resources for the simplest tasks: because they are brute-forcing themselves, swimming in the wrong direction. Accelerating entropy. From a humongous amount of lossy, rendered, final pixel bitmaps, the machinery teaches itself to predict what new bitmaps should look like. These things could be said to "understand" less about anything than an advanced physical renderer does. They "understand" nothing. And yet we stubbornly call them AI.
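As a toy illustration of that forward flow (just the vertices of one triangle, nothing like a real renderer): geometry, material and camera come first and stay parametric, and pixels only appear at the very last step.

Code:
# Toy sketch of the classic forward pipeline: pixels only exist at the end.
import numpy as np

points = np.array([[-1, -1, 4], [1, -1, 4], [0, 1, 4]], float)  # 1. points
triangle = [0, 1, 2]                                            # 2. polygon/mesh
colour = np.array([1.0, 0.4, 0.1])                              # 3. colour/material
focal = 1.5                                                     # 4. camera

proj = points[:, :2] * focal / points[:, 2:3]   # 5. project (still parametric)

image = np.zeros((64, 64, 3))                   # 6. only now do pixels appear,
px = ((proj + 1) * 0.5 * 63).astype(int)        #    and the step is lossy
for u, v in px:
    image[63 - v, u] = colour                   # plotting just the vertices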
farfar: it's not the same game, yes, but it is playable. Doom on Shadertoy is also a different game.
4gentE: what you say reminds me of “Natural numbers were created by God, everything else is the work of men.”.
Wait, did men invent the concept of negative integers before or after they invented the concept of God?
4gentE: yeah, it was way after, and even later, they invented tensors and back propagation.
And now they're back to God I see.
Yeah, some of them with simulation theory bullshit for sure.
AI is amazing code trickery that we are not using a lot when making demos. Could any of the techniques be used to generate images for a 64K or a 4K, for example? Billions of parameters doesn’t sound very 64K, but perhaps there is something in these techniques that can be useful in demomaking?
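One direction, just as a sketch (not a claim about what actually fits in a 64k): a tiny coordinate-based network, a few hundred weights mapping (x, y) to a colour, is closer to a procedural texture than to a billion-parameter model, and the weights could be trained offline and baked into the binary as data.

Code:
# Sketch of a tiny coordinate-based image network (CPPN-style), ~350 weights.
# Weights here are random; in a real intro they would be trained offline.
import numpy as np

rng = np.random.default_rng(1)
w1 = rng.standard_normal((2, 16))
w2 = rng.standard_normal((16, 16))
w3 = rng.standard_normal((16, 3))      # 2*16 + 16*16 + 16*3 = 336 floats total

y, x = np.mgrid[-1:1:256j, -1:1:256j]  # pixel coordinates in [-1, 1]
coords = np.stack([x, y], -1).reshape(-1, 2)

h = np.tanh(coords @ w1)               # a few tiny fixed layers
h = np.tanh(h @ w2)
image = (np.tanh(h @ w3) * 0.5 + 0.5).reshape(256, 256, 3)   # RGB in [0, 1]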