Nvidia is working on a new text-to-image AI tool, eDiff-I, similar to Stable Diffusion, Midjourney, and DALL-E 2. Let's have a look and see if it can be a contender to Stable Diffusion.
https://deepimagination.cc/eDiffi/
Stable diffusion ultimate guide:
https://youtu.be/DHaL56P6f5M
Black Panther with partial scale armor fighting a tiger as a test image 🙂, and test if it can draw hands properly please!
Open source?
I gotta be honest with you Seba, I care not about Stable Diffusion or anything like it, I just watch your videos because of your hilarious jokes!
The reference to compare these tools should be "one hand with 4 fingers and 1 thumb lighting up a 'firequacker'"
Beautiful
"NIDCKA VIDA" is gibberish in Russian, also half of the letters used in these words doesn't even exist in Russian alphabet.
Once again, a video and a paper… we'll never see the code… is this channel turning into TWO PAPERS? "Oh, look how cool this is, wow, wow." We'll never get an implementation for users…
the new lighting is dope (y)
Dethrone SD? Unlikely. It is no doubt advanced, but SD will catch up eventually. The difference is that SD is (or can be) local, open source, free, requires no online mode, is customizable, and has endless potential for anyone or any purpose: an artist wanting a handy quick background generator, clipart needs, rapid visual novel generation, hobby and fun, or whatever, all on your desktop.
This is corporate run, which tells me immediately it will be online only, no open source, heavily censored, paid subscription based, etc.
I hope to be wrong, but chances are I am not. It might dethrone Midjourney, but SD will reign king simply for flexibility.
Unless it understands composition it won't be any better. The quality might differ, but they will all just be advanced tracing programs and not true AIs…
I think this kind of multi-mask workflow is totally possible with Stable Diffusion. The question is the current quality of its inpainting, but that's likely to be fixed by the 2.0 version.
Otherwise, Nvidia’s context understanding looks amazing.
I’m getting the latest NVIDIA graphics card next week, to help with generating a large number of images for a project. Is this a better cloud-based option from NVIDIA, built without Stable Diffusion?
There is already work on implementing paint-with-words for Stable Diffusion. It's quite rough around the edges right now, but it works.
Can’t wait to see what the future holds with all of this. By future I mean next week 😊
Is it open source? – No
Will it thus dethrone SD? – No
Thank you so much for the update, super happy to know there is another player in the industry.
The real question is: will people stop blaming the model and start understanding that they have poor imagination?
That cover photo is ❤
I am curious about one thing… CAN IT DO FINGERS PROPERLY? If yes, instant win over any other image generator
I love you
CLIP guidance will be implemented in all of them soon; I don't think that makes it more impressive, tbh.
I think any kind of work they do on this is great; I'm glad there are other companies working on this stuff. I'm going to tell you right now, though, that Stable Diffusion has the upper hand on everybody because it's free, it's open source, and you have a large community working on it. I guarantee you they're going to fix the fingers and hands issues, and they're going to fix being able to put words and text in the image. This will all get fixed.
If it ain't free then no, it won't dethrone SD.
Hey Sebastian, unrelated, but I really wanna thank you for your videos. They've helped me so much with the AIs, and your personality makes them so much more enjoyable than the rest 🙂
Is this model open source?
I miss half the video because I'm laughing at your jokes, you are going to have to calm down, sir.
Seriously though, this AI understands pronouns better than half the people in the world.
I love Nvidia and SD both
Is it free?
Think it's pronounced like the word edify, meaning to teach/instruct. Googling the word will give you the pronunciation.
Great video, thanks for sharing this update in ML with us all! 🙂
Hello Seb. You mention that you can teach a specific style with DreamBooth. You mean a specific art style? I know you can teach it faces, but I don't know how you can teach it the specific art style of a lesser-known artist. Do you have any advice on how to do that?
Only evolutionary, not revolutionary. The community behind Stable Diffusion will be able to implement the same things.
Another AI Lab flexing their closed solution?
The paint-with-words feature shouldn't be all that hard to implement for Stable Diffusion – it's mostly a GUI thing where different masks are created, then the image is generated sequentially with img2img/inpainting on each mask.
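A minimal sketch of that idea, assuming the diffusers inpainting pipeline; the mask files and region prompts here are hypothetical placeholders, not eDiff-I's or the video's actual workflow:

```python
# Sketch: a "paint with words"-style pass built from sequential SD inpainting.
# One prompt is applied per mask; each pass feeds its result into the next.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

image = Image.open("base.png").convert("RGB").resize((512, 512))

# Hypothetical regions: (prompt for the region, black/white mask marking it)
regions = [
    ("a red barn", Image.open("mask_barn.png").convert("L")),
    ("a snowy mountain", Image.open("mask_mountain.png").convert("L")),
]

# Inpaint one region at a time, sequentially building up the picture.
for prompt, mask in regions:
    image = pipe(prompt=prompt, image=image, mask_image=mask).images[0]

image.save("paint_with_words_sketch.png")
```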
If they have a different UNet denoiser at each step, won't the model be huge?
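For what it's worth, the eDiff-I paper describes an ensemble of expert denoisers specialized to noise-level intervals rather than one UNet per step, so the size scales with the number of experts, not the step count. A toy sketch of that routing (all names here are hypothetical stand-ins, not eDiff-I's actual code):

```python
# Toy sketch of an "ensemble of expert denoisers": a few experts each
# cover a range of noise levels; the sampler routes each timestep to
# the expert trained for that interval.
from typing import Callable, List

# Hypothetical denoiser signature: (noisy value, timestep) -> less noisy value
Denoiser = Callable[[float, int], float]

def pick_expert(t: int, total_steps: int, experts: List[Denoiser]) -> Denoiser:
    """Route timestep t to the expert covering its noise interval."""
    idx = min(t * len(experts) // total_steps, len(experts) - 1)
    return experts[idx]

def sample(x: float, experts: List[Denoiser], total_steps: int = 50) -> float:
    # Standard reverse loop; only the denoiser is swapped per noise range.
    for t in reversed(range(total_steps)):
        x = pick_expert(t, total_steps, experts)(x, t)
    return x

# Two dummy "experts": one for low-noise (late) steps, one for high-noise (early) steps.
expert_low_noise: Denoiser = lambda x, t: x * 0.99
expert_high_noise: Denoiser = lambda x, t: x * 0.9
print(sample(1.0, [expert_low_noise, expert_high_noise]))
```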
I think it's pronounced "e-Diff one". Maybe they're planning to release other versions too (cf. DALL-E, DALL-E 2). While this model is far better than Stable Diffusion, I hate the fact that they've restricted its use to themselves, unlike SD. SD is open source and available to everyone. If I were to choose between the two, I would plump for SD.
The throne is cake.
Sorry I don't understand your accent/english.
Aside from readable text, the only thing this has that Midjourney doesn't is the paint-with-words tool.
It's not going to kill anything… You will need to buy Nvidia's newest GPU for every new version of this… Nvidia is actively killing DLSS by segmenting it across different GPU generations.
Can't wait for paint-with-words and instant style recognition to make it to local stable diffusion!
But they were all deceived, for another text-to-image model was made, imbued with the cruelty and menace to govern all life. One text-to-image model to rule them all, one to find them, and to eternal generating bind them.
6:41 "they can't cherry pick every image"
The left one is the Stable Diffusion image, the middle one is DALL-E 2's, and the right one is their work.
If only they could fix eyes and hands.
assebastian it's Anvidia
I would have liked to see if it can render hands correctly, but it seems very promising. Maybe we can also get good faces without having to use a different model for the faces. But unless the model is released open source, it will not dethrone Stable Diffusion; nothing will until then.
These still feel like hacks. I bet there will soon be systems that use full 3D, then use that as a template for diffusion-style img2img detailing and surface materials.
PS it’s pronounced “envidia” (Spanish for envy)
If we can't have access, it's useless.
Curious if those results took lots of variations to achieve.
Unless they make it open source, it won't dethrone Stable Diffusion. If being better were the criterion for dethroning, then the art generators from Google, Microsoft, Facebook, OpenAI, and Baidu would have already dethroned Stable Diffusion.
Should have mentioned Elmer Fudd in the riddle.
You have a lot of good one-liners during your serious explanation of the t2i mechanics, but your intentional "jokes" are really bad. Love the irony.
Damn, your jokes are bad. Thanks for the vid; NVIDIA is on its way.
5:00 So I tested this prompt in Midjourney V4 and it actually produced very similar results. While there was no panda, it did create a very coherent dragon on one of the teapots, much more coherent than the one on display from eDiff-I. I would argue that only one of the animals asked for is on a teapot, which Midjourney matched.
"Nidcka vida", very Russian indeed, thanks 😂
4:16 According to my trusty GPT-3:
💬 "Nidcka vida is Swedish for 'pussy willow.' The Swedish word nidcka is derived from the Old Norse word hnykkja, which means 'to bend down.' The word vida comes from the Old Norse word viðr, meaning 'willow'." 🤡
Wow, it uses exactly the same principle (except for the prompt) as Nvidia Canvas (AI software to paint landscapes).
`prompt: "anthropic drumset playing a banjo"` – or would it be "anthropomorphized drumset playing a banjo"?
5:28 Anyone? No takers? Goddamn it.
Closed models need to die. If they open the model and the code, then fine, but otherwise I just don't care.
I think the first picture at 4:44 interpreted "having a picture of a panda" as having it overhead.
Is it free and downloadable on GitHub?
Doubt it will dethrone Stable Diffusion, as SD is out and free to all. The Nvidia one is not released (I'm not even sure it will be).
Yes, Sebastian, I did learn something from this video, thank you! I learned excellent dad jokes.
Holy moly xD You got me with the fire quacker joke x) Now I can't stop laughing. Such a basic joke, yet so effective!
Holy moly xD The crickets xD 🐼🦗
"edify"
Is this in beta?
8:42 LOL @ the aurora behind the moon. For all of its smarts, an absolute rookie Photoshopper mistake.
Unless it's open source, Stable Diffusion will catch up to it no matter what.
If it's not open source and therefore widely available to anyone, no, it won't dethrone SD.
I think eDiff-I is a play on ed·i·fy /ˈedəˌfī/ (verb, FORMAL): instruct or improve (someone) morally or intellectually. I know these tools improve my abilities significantly 😁
Ed – if – eye is the word "edify"
But they are saying E – Diff – eye
Unless I can natively train it on my own datasets like I do with Stable Diffusion, I don't care.
If it's open source and can be run locally, then sure.
If it can generate good hands and is uncensored, unlike DALL-E or Midjourney, then maybe.
Is it open source?
good tips and information. thanks
I have been playing with SD 1.5 for about a week. I have found that you need to think specifically about descriptions of the things you want to see in the image. There is little inference in most AI language interpretation. So, for example, you don't want to say "a girl standing inside of a house" because it doesn't infer "inside" of "the house"; it just interprets "house" as something that should be in the image.
So what you want to do is describe objects that would be found inside the room you are thinking of: "bed, pillow, night stand, lamp, curtains" – these are the objects you would commonly find inside a bedroom. I find that this strategy works very well for most things. If you want to see a dog, it may or may not show you the entire dog, so you add the words for the parts that are missing: dog ears, dog tail, dog hind legs, dog teeth.
You're painting a picture with words, so you need to include the words that describe what you are seeing. If you have an idea of an image, it's actually very fun to play with in this style.
Also, you can steal styles very well with SD. I found that if you jump to img2img and use your starting image as your style, you can describe an entirely different scene, and it will use a lot of what was in the starting image to create entirely new concepts.
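A minimal sketch of that img2img style trick, assuming the diffusers img2img pipeline; the file names, prompt, and strength value are hypothetical placeholders:

```python
# Sketch: reuse a starting image's look via img2img while describing
# a new scene with an object-list prompt, as the comment suggests.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

style_image = Image.open("starting_image.png").convert("RGB").resize((512, 512))

# Object-list prompting: name the things you want to see instead of
# relying on spatial inference like "inside of a house".
prompt = "bed, pillow, night stand, lamp, curtains"

# Lower strength keeps more of the starting image's style; higher
# strength follows the prompt more and the starting image less.
result = pipe(prompt=prompt, image=style_image, strength=0.6).images[0]
result.save("restyled_scene.png")
```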
If it isn't free / open-source it will likely not beat Stable Diffusion.
People like free and open-source stuff that doesn't come with rules or strings attached, which most AI tools can't offer.
Stable Diffusion has a huge advantage over any other AI because anyone with free time and a passion can work on it and improve it.
I'm not a good painter, but I feel like, the way my creativity works, all this AI stuff could get some of my ideas out in different forms. I love good ol' painting even though I suck, but I'm gonna deep dive into all this AI stuff now.
If it’s not free and open source, then no, it will not
The throne is not held by Stable Diffusion at the moment; it's Midjourney V4's, for a little while. DALL-E 2 is in the dust.
I just tested all of these prompts on my SD model.. My results were better lmao.. Piss off Nvidia.
The way I see it, they are only changing the interpreter while the base remains the same. Img2img can already work with your own drawing; you only need to make the actual picture in Paint or another program. Speaking of text, that's nice to have, although it seems it was only slapped onto the picture as the last step, and I don't think even Nvidia's interpreter could combine readable text with the blue/red shirt in just one prompt. Still, an improvement, but I don't think they will dethrone SD. Well, maybe DALL-E, which is now nearly obsolete and almost dead. The problem with Nvidia is that they make features like this available only on their newest GPUs or in their paid app. I think it won't be long before people code this into SD.
I love NVidia AI Audio2Face + Metahumans + Unreal.
My requests reflect more of the game dev community's desires. We need decent image-to-3D, and a free mocap AI that works on pre-recorded video, with foot locking and no jitter.
For 2D AI, I'm only interested in Stable Diffusion img2img improvements. If NVidia's eDiff-I doesn't have anything similar, I'm not interested.
Midjourney V4 is game-changing for MJ and really needs to be considered when talking about any new models and papers, as well as DALL-E 2 and SD. Plenty of limitations, but what it does well is incredible. Its detail level and coherency are SO much better than before.
No, those aren't Russian words XD In Russian it would look like "н-видиа скалы" (a literal "Nvidia rocks").
Yeah, let me see it make a penis… I'll let you know if it's the killer of Stable Diffusion or not.
Is it possible to create actual text in an image with SD, DALL-E, or the like?
Wow, it's smarter than Gen Z, because it actually knows how pronouns work.